Hi, I don't know if this is an A-Shell problem, I'm just posting here in case we have any workaround that can speed up the solution.
In summary, I'm importing CSV files generated by KOFAX (maybe some of you know this scanning package), and it produces them in Unicode. If I read them using "input csv" I get nothing; I have to open each file in Notepad and save it as ANSI or UTF-8, and from there everything runs smoothly. I already asked the Kofax guys if they can save it in one of those two formats, but two days have already gone by without a reply and, from the previous steps of this project, I can see how limited they are.
It's not a nightmare to open and save those files, but for what's supposed to be an automatic process, having to start with this kind of manual step is not very elegant.
The name Kofax brings back memories - way back in the 90's Ty and I were involved in an imaging product called Co-Star which Alpha Micro briefly sold, and Kofax was the preferred high-end scanning solution. But that's not much help to us now.
It seems like we should have already solved this problem, but I guess not. XTREE is now entirely in Unicode (thanks to a request by you relating to some Russian language translations), with translations back and forth between Unicode, UTF-8 and ANSI, so it doesn't seem like too much of a stretch. The question is exactly how to implement it.
From the application perspective, the ideal would be if it were transparent, i.e. the conversion was automatic within the INPUT statement. But I'm afraid that will be messy to implement, as INPUT is extremely complex, and the conversion from Unicode to ANSI/Latin1 (which we normally use) is not guaranteed. (UTF-8 would be, but existing applications are going to choke when they run into UTF-8 multi-byte characters.) So I'm thinking that some kind of standalone utility, perhaps an XCALL, to convert the file, or maybe one line of the file at a time, would be more straightforward, with less risk of ripple effects. You still have the problem of what to do with Unicode characters that require multiple bytes to translate into UTF-8, or which can't be handled in ANSI/Latin1 at all. Converting to UTF-8 and not worrying about that would be the simplest. But probably more practical would be to convert to ANSI/Latin1 and replace untranslatable characters with something like "?".
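To make the trade-off between the two targets concrete, here is a minimal sketch in Python (not A-Shell BASIC, and not how A-Shell does or would do it). The function name and sample string are made up for illustration. It decodes a UTF-16 buffer once, then encodes it both ways: UTF-8 keeps every character but needs multiple bytes for some of them, while Latin1 stays at one byte per character and substitutes "?" for anything it cannot represent.

```python
def convert_both_ways(raw):
    """Decode UTF-16 bytes and return (text, utf8_bytes, latin1_bytes)."""
    text = raw.decode("utf-16")  # the BOM, if present, selects the byte order
    as_utf8 = text.encode("utf-8")  # lossless, but multi-byte for non-ASCII
    # errors="replace" substitutes "?" for each character Latin1 lacks
    as_latin1 = text.encode("latin-1", errors="replace")
    return text, as_utf8, as_latin1


# Hypothetical sample line: Portuguese, Russian and a euro sign
raw = "São Paulo;Москва;€5\r\n".encode("utf-16")
text, as_utf8, as_latin1 = convert_both_ways(raw)
print(len(text), len(as_utf8), len(as_latin1))
print(as_latin1)
```

The Latin1 result has exactly one byte per input character, which is what an application expecting fixed single-byte text needs; the price is that the Cyrillic word and the euro sign come out as question marks.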
I'm kind of backed up right now but could imagine getting something done in the next few days, once we settle on just what it should be.
Hi Jack, I remember Co-Star and participated in one, and only one, implementation of it in Portugal, in the early 90's. And, yes, I believe we are talking about the same Kofax which, regarding data handling, looks like it is suspended in time.
As I mentioned, I believe they already have the option to save it in ANSI but, three days after my request, they must still be looking around for that option, the same one that I saw in their dashboard at the beginning of the project when they sent a screenshot (I believe it was a 2MB one, inside a Word document).
So, regarding this particular case, an XCALL to just save the file in ANSI would be more than enough; better still would be a GET_ENCODING to know if it needs conversion and a SAVE_ENCODING. But considering that I already have a few settings for importing the Kofax files, adding another one to specify the expected encoding would safely replace the need for GET_ENCODING, because they will not produce files in different formats.
Obviously, having this fully handled by INPUT and A-Shell globally would be the perfect option, but putting a lot of work into this and adding hypothetical problems to solve down the road, I'm quite sure it's not worth it.
Thanks for the reply and (usual) availability to take care of this.
PS: considering all the details I solved for them after designing the solution, with this one added, they will probably announce A-Shell as the best KOFAX integration platform.
Last edited by Jorge Tavares - UmZero; 30 Oct 20 09:18 AM.
Yes, I think a joint marketing campaign is the least they could do for us! We might even give them a discount if they bundle an A-Shell with every Kofax board.
But as for the GET_ENCODING, unfortunately, like so many other standards, there's some fuzziness around the outer edges. In this case, while the rules of the Unicode encoding are fairly rigid and precise, that's not true when it comes to how a file is supposed to indicate the encoding of the characters within. Ideally it would be part of the file metadata, but as there is no standard for that, it comes down to inserting some telltale magic bytes at the start of the file. Most Unicode files will have the so-called BOM (Byte Order Mark) at the start, but there's no particular enforcement of that rule.
I would suggest the creation of a Fn'GetEncoding(file$) function that uses XCALL GET to sample the first few bytes of the file to try to determine whether it is a valid BOM and what kind, or perhaps even to do some statistical sampling of the bytes to see if they "look like" Unicode. That would be more flexible than a routine embedded in A-Shell. Once you have confidence in the identification of the encoding, then it becomes more practical to create an embedded XCALL to convert the file from one encoding to another.
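As a rough illustration of that idea, here is a sketch in Python rather than A-Shell BASIC. The function name, return values and thresholds are all assumptions, not an existing A-Shell routine. It first looks for a BOM. If none is found, it falls back to a crude statistical check: in UTF-16 text that is mostly ASCII or Latin, every other byte is zero.

```python
import codecs

# Checked in order: the UTF-32 marks come first because the UTF-32LE BOM
# (FF FE 00 00) begins with the UTF-16LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def guess_encoding(head):
    """Return (encoding_name, has_bom) for the first bytes of a file."""
    for bom, name in BOMS:
        if head.startswith(bom):
            return name, True
    # No BOM: count zero bytes at even and odd offsets (thresholds are arbitrary)
    pairs = len(head) // 2
    if pairs >= 2:
        even_zeros = head[0:pairs * 2:2].count(0)
        odd_zeros = head[1:pairs * 2:2].count(0)
        if odd_zeros > 0.6 * pairs and even_zeros < 0.1 * pairs:
            return "utf-16-le", False
        if even_zeros > 0.6 * pairs and odd_zeros < 0.1 * pairs:
            return "utf-16-be", False
    return "unknown", False


print(guess_encoding(b"\xff\xfea\x00;\x00b\x00"))
print(guess_encoding(b"name;city"))
```

A file with no BOM and no zero bytes comes back as "unknown", since plain ANSI and BOM-less UTF-8 cannot be told apart from a short sample of ASCII text.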
In the meantime, I'm gearing up to head out to the polling station. First I have to pack my supplies (food, clothing, blankets, battery packs, masks, weapons, ...)
Well, I'm happy to say I survived voting in person. It wasn't actually that traumatic, which just goes to show how unevenly government services are distributed across the country. Even though it was the longest I've waited to vote in my entire life, it was still only about 30 minutes. And sadly, much of the delay was due to software problems in the check-in process! The name lookup logic apparently could not handle a complicated last name like McGregor - they tried every combination they could think of (MCG, MC G, MC, MAC, etc.) but it wasn't until the onsite tech support guy figured out how to get into the "advanced search" and managed to find me by just looking at all the M's and then filtering by address that I finally got my blank ballot and was directed to the actual voting stations. (There were about 50 of them, most of them vacant, since all of the bottleneck was at the check-in). And for better or worse (almost certainly the latter, as far as the taxpayer is concerned), they've done away with the old-fashioned ink pen marking devices and gone to giant touch-screens, each station with its own printer and scanner. So you scan in your blank ballot, use the touch-screen to fill it out, it then gets printed, spit out, re-inserted to be scanned and then transmitted probably via Google, the NSA, the FSB, and who knows who else before hopefully arriving in the official electoral bit bucket with most of the bits intact. (Hopefully they haven't thrown away the old ink-blot devices as we'll probably have to go back to them when these newfangled devices break down and become too expensive to maintain. Assuming of course they don't take the obvious step of re-implementing the entire thing in XTREE.)
Anyway, in the relative calm before the much-feared political storm to come, I've put together a function that converts Unicode (U16 Windows format) to either ANSI or UTF8. It could probably use some refinement, but the beauty of it is that it's in A-ShellBASIC, so it doesn't depend on updating A-Shell, and it's easy to modify. (It is limited to the Windows platform though, since it makes a call to the KERNEL32.DLL via DYNLIB.SBR.) I posted a preliminary copy here: fnunicode.bsi
There's a test routine included in the bsi which you can activate by compiling as follows:
To test, I loaded the ash65notes.txt into notepad and then saved it as ash65notes.u16 in Unicode format and then...
Code
.RUN FNUNICODE
Test Fn'U16'to'MCBS'File()
U16 file to convert from: SYS:ASH65NOTES.U16
BOM = FEFF Windows U16 - OK
File to convert to (blank to quit): test.txt
Codepage [0]:
Flags [0]:
ASCII value of default char (63):
Return value (bytes output) = 342874
Dfltused: 0
1) to view output file:
If all goes well (which it did for me) the result is identical to the original ash65notes.txt.
Note that the test program starts by calling a separate function to check the BOM (Byte Order Mark) of the file to make sure it's the expected FEFF (stored on disk as the bytes FF FE in the little-endian Windows U16 format). (If you try to convert a file that isn't in U16 format, the results won't be pretty.) Over time, I can imagine adding a function to go the other way, or maybe a more generalized one that will go from any format to any other format. But this should be enough to help you seal the worldwide joint-marketing deal with Kofax.
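For readers without access to the bsi, the overall flow described here (check the BOM, refuse anything else, convert, report bytes written and whether the default character was used) can be sketched in Python. This is not the code in fnunicode.bsi, which calls KERNEL32.DLL via DYNLIB.SBR. The function name, parameters and sample data below are assumptions made for illustration.

```python
import codecs
import os
import tempfile


def convert_u16_file(src, dst, target="cp1252", default_char="?"):
    """Convert a BOM-marked UTF-16LE file to `target`.

    Returns (bytes_written, default_used). Raises ValueError when the
    file does not start with the little-endian UTF-16 BOM (bytes FF FE).
    """
    with open(src, "rb") as f:
        raw = f.read()
    if not raw.startswith(codecs.BOM_UTF16_LE):
        raise ValueError("missing UTF-16LE BOM; refusing to convert")

    text = raw[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    out = bytearray()
    default_used = False
    for ch in text:
        try:
            out += ch.encode(target)
        except UnicodeEncodeError:
            # No equivalent in the target code page: emit the default char
            out += default_char.encode(target)
            default_used = True

    with open(dst, "wb") as f:
        f.write(out)
    return len(out), default_used


# Demo on a throwaway file in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "sample.u16")
    dst = os.path.join(tmp, "sample.txt")
    with open(src, "wb") as f:
        f.write(codecs.BOM_UTF16_LE + "a;é;Ж\r\n".encode("utf-16-le"))
    print(convert_u16_file(src, dst))
```

Refusing to run without the BOM is deliberately strict: it follows the warning above about converting files that are not really U16.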
I wonder about which version of the voting software is being used by Los Angeles County. The latest developer's version? The stable version? Not the latest developer's version but the one released a few months ago because they need the xyz feature? The one from four years ago because they just can't bear the idea of an update?
I’m surprised they haven't installed the latest multiple-checkbox xtree version. Choose as many candidates as you want... or Frank’s drop and drag to reorder your preference...
Now I'm really scared about the result of the elections. Are you saying that you start by scanning the blank form that you received from a guy who took half an hour to find you in the system? Then you put the crosses using a touch-screen. Then you print the filled form. And, finally, you have to scan that paper, whose digital interpretation is sent as your final and secret vote to be computed somewhere in the Universe!!!!!!
Do you have statistics about how many voters don't complete the process? How many left the initial form, filled in by pen, in the scanner? How many left their vote on the screen and walked away? How many gave up in one of the steps of the process? How many committed suicide in the process?
My God, only one thing comes to my mind: the cherry on top of the cake would be if the final step (scan, read and send the result) is a Kofax system.
PS: Maybe you should send an urgent email to the votes counting Central, asking if they need a function to convert the received result to something understandable.
By the way, thank you very much for the solution, after recovering my breath from the news, I'll try it and give you feedback. Many, many thanks.
And you wonder how our current President got elected????
Fortunately, I anticipated all of those concerns, which is why I skipped all of the checkboxes and then used my sweaty fingertip to handwrite "Jorge Tavares" in the write-in section for position of Election Czar. (I'm afraid they may need more than Kofax to read my touch-screen handwriting though.)
Sorry for the interruption but, considering the high probability of a lot of vote recounting around there, you probably won't have much to announce for a while, so I'll take the chance for some breaking news: this conversion ran like a charm. It's a little detail that made a huge difference for the user. I much appreciate the solution and the way you did it, which took me a couple of minutes to embed in my program.
I'm a little confused here -- we are talking about DYNLIB.SBR and not INPUT CSV, right? A/a parameter types?
There was definitely some recent tinkering in DYNLIB related to 64 bit support (parameter types l, L, h), but after some review, I don't see how it impacted use of the A/a parameter types (for ANSI to UTF8 conversions).
I don't have a handy test DLL that either consumes or returns UTF8 parameters, so I'm not quite sure how to test. (If you have one that can be used in a standalone way, please share it!) But I can verify that routines that use standard ANSI parameters (type Z/z) work fine when the parameter type is changed to A/a (which only proves that the ANSI-UTF8 conversion doesn't break ANSI strings that don't contain characters requiring multi-byte conversion).
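The limitation of that verification can be shown with a small Python sketch. It has nothing to do with DYNLIB itself; the helper name and the choice of code page 1252 as "ANSI" are assumptions. A pure-ASCII string is byte-for-byte identical before and after an ANSI-to-UTF-8 conversion, so only a string containing an accented character actually exercises the conversion.

```python
def ansi_to_utf8(data, codepage="cp1252"):
    """Re-encode single-byte ANSI data as UTF-8."""
    return data.decode(codepage).encode("utf-8")


plain = b"SYS:ASH65NOTES.U16"             # ASCII only
accented = "Conversão".encode("cp1252")   # one character above 0x7F

print(ansi_to_utf8(plain) == plain)       # identical bytes: proves very little
print(len(accented), len(ansi_to_utf8(accented)))
```

So a meaningful test of the A/a parameter types would need at least one character above 0x7F passing through in each direction.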
I've attached a CSV needing conversion, and below is my code, calling your functions (funicode.bpi) that take care of everything. I'm getting "A conversão não foi bem sucedida." ("The conversion was not successful.") after the call to Fn'U16'to'MCBS'File().
Apologies if I'm giving you just pieces instead of a ready example (you're much better than me at that), but let me know if you need anything more.
Code
! Check the BOM to see which encoding the CSV file uses
encoding = Fn'U16'BOM(csv'file$)
if encoding=BOM_U16_LSB then
    ! Message: "File is in Windows Unicode16. It will be converted to UTF8."
    xcall sbxmsg, MSG'EXIT, "Ficheiro em Windows Unicode16."+CRLF$+"Vai ser convertido para UTF8.", "Conversão de encoding", OK_CANCEL, INFORMACAO
    if MSG'EXIT=0 then
        if Fn'U16'to'MCBS'File(csv'file$, csv'file$[1, -4]+"txt", codepage=0, flags=0, dfltchar=63)=0 then
            ! Message: "The conversion was not successful."
            xcall sbxmsg, 0, "A conversão não foi bem sucedida.", "Conversão de encoding", 0, CRUZ
            EXITFUNCTION
        else
            csv'file$ = csv'file$[1, -4]+"txt"
        endif
    else
        EXITFUNCTION
    endif
elseif encoding=BOM_U16_MSB then
    ! Message: "File is in MSB Unicode16. Conversion not supported. Open the file in Notepad and save it as UTF-8."
    xcall sbxmsg, 0, "Ficheiro em MSB Unicode16."+CRLF$+"Conversão não suportada."+CRLF$+"Abra o ficheiro no notepad e grave em UTF-8.", "Conversão de encoding", 0, CRUZ
    EXITFUNCTION
endif
As for the half-marathon, yes, but not yesterday; it was on the previous Sunday, the 14th, and it was great. The weather conditions were perfect, sunny but not hot, between 17-20ºC considering the start (at 5:50 AM) and the finish at 7:45, so I reached my goal of breaking my record and the 2:00 hour barrier, with an official time of 1:55:01, corresponding to position 1268 in the Men's ranking out of a total of 3394 participants, and 33rd place out of 130 in my age group (55-59). I'm happy because that closed the season of drinks abstinence. No matter the results, the course is very beautiful; I've attached some pictures I bought from the official reporters.
Congrats Jorge, well done! You are very focused! You give hope to the old men of the world while at the same time giving me some reason to get off my tail... though I am more of a walker/hiker; my running days are far behind me! Keep up the good work!
Wow - awesome accomplishment! I'm sorry I couldn't have been there to cheer from the sidelines (and especially to help celebrate the end of abstinence!)
Kind of hard to get psyched up to dig into character set conversion after that, but thanks for reminding me of the function. (I had forgotten that it even used DYNLIB.) This week I'm juggling both jury duty and grandparent duty (kids now take the entire week of Thanksgiving off), so I'm not sure exactly when I'm going to get to it, but it shouldn't be too long...
Thank you guys, I'm anxious to drink with you all in person and celebrate many victories together. Until then, enjoy a fantastic Thanksgiving week with a lot of family at home to cheer, kiss, hug and do everything we love to do, which is celebrate life with family and friends. Jack, don't worry, take your time, enjoy the week with the kids, I'm not waiting for that fix. Big hug
Last edited by Jorge Tavares - UmZero; 22 Nov 21 08:48 PM.