Two Dimensional Ordered Maps?
#35982
03 Mar 23 08:17 PM
|
Joined: Jun 2001
Posts: 11,794
Jack McGregor
OP
Member
I'm putting out an idea here for a possible enhancement to the existing Ordered Map feature to get a sense of the degree of interest and any other comments/suggestions. I'm not sure about the rest of you, but for me, ordered maps are possibly the best addition to ASB since ... named parameters? dynamic structures? millirows? Seriously, just about any data processing task I undertake these days starts with building one or more ordered maps (typically of structures) and then manipulating them in memory. The days of one-record-at-a-time logic, particularly with SQL, but even with ISAM or other indexed files, are rapidly receding into the rear view mirror. With the massive amount of memory available today, it's almost always more efficient to assemble all the data relevant to a problem in memory (typically in ordered maps or other arrays) and work on it there.
But one recurring annoyance is how to deal with two-dimensional grids (e.g. spreadsheets, or query result sets) where you don't know the columns in advance. If you know the column layout, you can define a structure to represent one row and then load the entire data set into an array of those structures. But what do you do if the columns aren't known in advance? One option would be to first input the column information, then create a dynamic structure to represent it. But in many cases that seems like a little too much effort. What would seem better would be the ability to just load the entire data grid into a two-dimensional ordered map, indexed by row and keyed by column. That's basically equivalent to a two-dimensional array, except that here we can use alphanumeric indexing for the columns (and maybe even the rows, although that's probably less of an issue).
For example, you could load an arbitrary CSV (assuming the first row contained the column names) into such a map and then directly access any one value via $map(row,column'name). You could iterate across rows using something like foreach $$i in $map(row,""), or possibly down columns using foreach $$i in $map("",column'name).
To take it a step further, the rows might also be alphanumerically keyed. For example, consider a set of student records. For each student, we'd have a variable list of classes taken and grades received. The student/rows would be indexed by student ID and the columns by class ID. $strec(student'id$,class'id$) would directly tell you whether the student had taken that class and if so, what grade was received. foreach $$i in $strec(student'id$,"") would allow you to list all of the classes/grades for that student. foreach $$i in $strec("",class'id$) would allow you to list all the students that have taken that class. (We'd probably need a new variation of the .key($$i) function, perhaps .key1($$i) for the student'id$ (first key dimension) and .key2($$i) for the class'id$ (second dimension)?)
You can handle the direct lookup aspect with the existing one-dimensional ordered maps by creating a composite key for your map, e.g. student'id$ + class'id$. (I do this a lot, often concatenating several fields to make a composite key for temporary indexing of some dataset; two-dimensional maps won't eliminate the utility of that technique.) But it's a little awkward to iterate across rows or columns with composite keys. (Basically you have to embed some filtering logic inside your foreach loop to skip over the items outside your virtual row or column.) It might even be the case that if we implemented a more natural two-dimensional ordered map syntax, it would still be represented internally as a single map with a two-part key. (A single map with a composite key is more efficient for lookup and iterating across columns; a nested structure consisting of multiple linked maps might be faster for iterating down columns.)
So in summary, the main advantage of the two-dimensional map may just be another form of "syntax sugar", i.e. making the code easier/cleaner/simpler without necessarily expanding the range of capabilities. In terms of implementation, it would mostly consist of expanding the dimx syntax for declaring these maps...
dimx $map, ordmap(varstr;varstr) ! traditional one-dimensional map of string values
dimx $map2n, ordmap2(int;varstr;varstr) ! two-dimensional map of strings with numeric 1st dimension
dimx $map2S, ordmap2(varstr;varstr;varstr) ! two-dimensional map of strings with string 1st dimension
And we need the variations of the foreach and .key syntax alluded to above. And probably a few other quirks here and there. Definitely non-trivial to implement; definitely complicating the logic involving passing maps to functions; possibly contributing to the likelihood of coding errors related to getting the dimensions mixed up, confusion over when you need one or two arguments to the map references, etc.
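For readers who want to experiment with the trade-off described above before any ASB syntax exists, here is a minimal sketch in Python (not ASB). It models the two-dimensional map as a single sorted collection of composite (row, column) keys, which is only one of the internal representations mentioned. The class name `Grid2D` and its methods are hypothetical and do not correspond to any ASB feature.

```python
import bisect


class Grid2D:
    """Toy model of a 2D ordered map stored as one map with a two-part key."""

    def __init__(self):
        self._keys = []   # sorted list of (row, col) tuples
        self._vals = {}   # (row, col) -> value

    def set(self, row, col, value):
        key = (row, col)
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def get(self, row, col, default=""):
        return self._vals.get((row, col), default)

    def across_row(self, row):
        """Yield (col, value) for one row; a row's keys are contiguous."""
        start = bisect.bisect_left(self._keys, (row,))
        for index in range(start, len(self._keys)):
            r, c = self._keys[index]
            if r != row:
                break
            yield c, self._vals[(r, c)]

    def down_column(self, col):
        """Yield (row, value) for one column; this has to scan every key."""
        for r, c in self._keys:
            if c == col:
                yield r, self._vals[(r, c)]


# Usage with the student/class example (IDs and grades are made up)
strec = Grid2D()
strec.set("S001", "MATH101", "A")
strec.set("S001", "ENG201", "B+")
strec.set("S002", "MATH101", "C")

classes_for_s001 = list(strec.across_row("S001"))
students_in_math = list(strec.down_column("MATH101"))
```

In this model, iterating across a row can jump straight to the first matching key, while iterating down a column needs the filtering pass described above. A nested map-of-maps representation would shift that balance.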
Last edited by Jack McGregor; 06 Mar 23 09:26 PM. Reason: Fix typo in $map2S code comment
|
|
|
Re: Two Dimensional Ordered Maps?
[Re: Jack McGregor]
#35984
03 Mar 23 09:10 PM
|
Joined: Sep 2003
Posts: 4,158
Steve - Caliq
Member
I'm not sure anything will possibly beat millirows, right? But it's a close second. I like your idea! Not that I can think of a use for it at my end (yet). If it's a toss-up between moving virtual code about in your head while enjoying a pint and a BBQ, or hard work at a desk actually doing it, go for the former. Then again, it could well be useful... I'll let you and others ponder on what task this weekend. :-)
|
|
|
Re: Two Dimensional Ordered Maps?
[Re: Jack McGregor]
#35985
03 Mar 23 11:39 PM
|
Herman Roehm
Unregistered
|
I doubt I’ll be doing a lot of programming, but can certainly see uses for it. If I do some more for Gregg, I’d use it. Maybe not as much as millirows, but I certainly think it would be a very useful tool.
|
|
|
Re: Two Dimensional Ordered Maps?
[Re: Jack McGregor]
#35986
04 Mar 23 11:50 AM
|
Joined: Jun 2001
Posts: 3,406
Jorge Tavares - UmZero
Member
Wow!!! After millirows, I put Functions/Procedures in the same position as ordmap, but the list of "the best ones" is so huge that we would need a ranking by year. Now, back to the subject under discussion: this would be a must and I would use it a lot, lot, lot. For now I work around this using both techniques that you've mentioned: multiple ordmaps, one for each "column", and composite keys in a single ordmap. Like you, ordmap for me is the starting point of any program; I design all the logic around how I will fill the ordmaps, how many, and what kind of sort, to prepare everything to load into some xtree. I would imagine something like:
dimx $row'and'columns, ordmap(int; varstr; varstr)
input csv #channel, $row'and'columns()
total'rows = .rows($row'and'columns())
total'columns = .cols($row'and'columns())
This would be very useful and amazing, thanks for bringing it up. I'll spend the rest of the day on the beach having several Caipirinhas while wondering about possible uses for this.
Last edited by Jorge Tavares - UmZero; 04 Mar 23 11:53 AM.
Jorge Tavares
UmZero - SoftwareHouse Brasil/Portugal
|
|
|
Re: Two Dimensional Ordered Maps?
[Re: Jack McGregor]
#35987
04 Mar 23 11:59 AM
|
Joined: Jun 2001
Posts: 3,406
Jorge Tavares - UmZero
Member
I remembered now that I use ordmap to build combo lists, that's practical because I can use the same technique to build <code,description> or <description> lists and get them ordered the way I want in a single step. It would be nice that we could use an ordmap for combos (AUI and XTREE) instead of a string or the function FN'build'combo$($mylist()) Would it be complicated?
Jorge Tavares
UmZero - SoftwareHouse Brasil/Portugal
|
|
|
Re: Two Dimensional Ordered Maps?
[Re: Jack McGregor]
#35988
04 Mar 23 05:37 PM
|
Joined: Jun 2001
Posts: 11,794
Jack McGregor
OP
Member
Anyone else out there surprised that Jorge likes this idea? Your suggested dimx and input csv syntax was exactly what I was thinking. (Great minds...?) I'm undecided about the advantages of introducing new dot-functions .rows() and .cols(), vs. making them both variations of .extent(), but that's a minor detail.
I hadn't really thought about the idea of using an ordmap to specify the contents of a combo, but that seems reasonable. I guess for the simple version (description only), we would use the key of the map and ignore the value?
As an aside, an ordmap with only keys and no values, or vice versa (i.e. a 'set') might logically be declared as dimx $set, ordmap(varstr). (If ordmap(int;varstr;varstr) is a two-dimensional ordmap, would that make the set version zero dimensional? Or is that "one dimensional", making our new proposed idea "three dimensional"?) Whatever you call it, I didn't think that it was worth adding special syntactic support for the set case since it is so easy to just represent as an ordmap with values set to "" (not to be confused with .null, which would delete the key from the set). Particularly since if we didn't use the assignment $set(key) = "" to add items to the set, we'd need yet another special add-to-set operator.
But back to our new toy, the only obstacle left is for me to figure out how to switch places with you, Jorge. I'll take care of the Caipirinhas and you get to work ....!
|
|
|
Re: Two Dimensional Ordered Maps?
[Re: Jack McGregor]
#35989
06 Mar 23 05:20 PM
|
Joined: Jun 2001
Posts: 3,406
Jorge Tavares - UmZero
Member
Good morning, still with coffee ... Regarding the combo builder, yes, an ordmap with "key+value" solves both flavours of combos, where key=description and value="" for "description only" lists. As for the "zero dimensional" ordmap, I can't see much benefit vs. the blank assignment to a one-dimensional ordmap and, like you mentioned, we would need a different syntax for assignments. Also, if we have a one-dimensional w/o value, why not have a two-dimensional w/o value too? In summary, my suggestion is to not change anything in the existing ordmap, just jump into the two-dimensional one.
Last edited by Jorge Tavares - UmZero; 06 Mar 23 05:20 PM.
Jorge Tavares
UmZero - SoftwareHouse Brasil/Portugal
|
|
|
Re: Two Dimensional Ordered Maps?
[Re: Jack McGregor]
#35990
06 Mar 23 07:24 PM
|
Joined: Jun 2001
Posts: 11,794
Jack McGregor
OP
Member
You'll be happy to know that I implemented the ordmap option for the INFLD setdef parameter over the weekend (while you were lying in your hammock sipping Caipirinhas!). So you'll soon be able to just specify your $map() variable in place of the setdef parameter and INFLD will automatically expand it into a list of valid inputs or a list of paired inputs, depending on the type codes. I don't think it makes sense for any of the other setdef variations (and the implementation probably could use some further error handling and testing to make sure that a mismatch doesn't send it off a cliff).
I don't see how that can easily work with XTREE though, since the list specification is embedded in the larger coldef parameter. (I suppose we could create a .ordmap'to'delimited'list$($map()) function to make it easier to insert your map/list into the coldef, but there wouldn't be that much advantage to such an embedded function compared to a user-defined one.)
And let's drop the zero-dimension (aka set) discussion for now, and concentrate on the 2D ordmap. At the moment I'm leaning towards calling it ordmap2 to reduce the chance for confusion...
dimx $map, ordmap2(int; varstr; varstr)
As for counting the number of rows and columns, the existing .extent() syntax appears to already handle multiple dimensions, so it makes sense to just stick with that, i.e.
total'rows = .extent($map(),1) ! extent of the first dimension (rows)
total'cols = .extent($map(),2) ! extent of the second dimension (cols)
You could always create a function wrapper if you prefer to call it fn'rows()...
function fn'rows($m() as ordmap(int;varstr;varstr)) as i4
.fn = .extent($m(),1)
endfunction
The syntax for loading a CSV directly into an ordmap2 would simply be: input csv #ch, $csvmap(). The syntax for referencing an individual value would be: $csvmap(row,column$), where row is an integer 1-n, and column$ is the column header/title.
The main problems yet to be resolved involve iteration. First, two limitations of ordmaps as a container for a CSV need to be acknowledged:
- The column order will always be determined by their titles. The only way to preserve the original column order would be to use column numbers instead of names/titles. But in that case, you might as well just represent it as a regular two-dimensional array instead of an ordmap.
- The columns have to have unique titles/names. Actually I suppose we could also support an ordmap2m variation, but that would be awkward and unlikely to really be useful, since it would obfuscate the original separation of columns with the same title. Note that this issue would not likely apply to query result sets, nor for that matter, to 'well-formed' CSVs, but if it did, the input csv operation would end up saving only the last of any duplicate column titles.
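The duplicate-title limitation in the second point can be illustrated with a small Python sketch (not ASB). It assumes the load simply assigns each cell to its title key in file order, so a later duplicate overwrites an earlier one. The function name `row_to_map` and the sample data are made up for illustration.

```python
def row_to_map(headers, fields):
    """Key one row's cells by column title; duplicate titles keep the last value."""
    cells = {}
    for title, value in zip(headers, fields):
        cells[title] = value   # a repeated title overwrites the earlier cell
    return cells


# Two columns share the title "name"; only the last one survives.
row = row_to_map(["id", "name", "name"], ["7", "first", "second"])
```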
Assuming you can accept those limitations, the remaining issue is exactly how to iterate. Since the rows are identified by integers, we can just use a regular for/next loop counter for the rows, but we do need to extend our foreach syntax to specify which row we are iterating across. One option would be to just repurpose the existing starting key parameter to mean the row when applied to an ordmap2, i.e.
foreach $$c in $csvmap(row) ! iterate across the columns of specified row
That might invite confusion though, since without knowing whether $csvmap() was an ordmap or an ordmap2, you wouldn't be able to tell from the syntax alone whether (row) was meant to be a starting key or a row number. I guess the other alternative would be new variations of foreach -- foreach'row and foreach'col? And we need an enhanced version of .key() to extract the row and column from the iterator. Again, one possibility would be to just make .key() be context sensitive, i.e. returning the column when iterating across the columns in one row. Or maybe introduce new functions .key1() and .key2() (reminiscent of Thing1 and Thing2 from The Cat in the Hat)? Some possibilities then for iterating through the entire grid might be:
! context-sensitive without new keywords...
for row = 1 to .extent($csvmap()) ! default is 1st dimension, so this is rows
foreach $$c in $csvmap(row) ! here row is 1st dimension of key, not starting key
? row, .key($$c), $$c ! row #, column title, value
next $$c
next row
! using explicit new keywords...
for row = 1 to .extent($csvmap())
foreach'col $$c in $csvmap(row)
? row, .key2($$c), $$c
next $$c
next row
It's tempting to use a nested pair of foreach loops...
foreach'row $$r in $csvmap()
foreach'col $$c in $csvmap()
? $$r, $$c ! ???
next $$c
next $$r
... but that raises the problem of what the $$r (row iterator) value is. The entire row? At one level that makes sense, but if so, then the nested foreach statement should be foreach'col $$c in $$r. There's a certain elegance there, but I'm not sure it's worth all of the complications that it introduces. I think I would prefer that we stick with the idea that the value of an iterator $$i is always a scalar value. But in that case it doesn't make sense at all to iterate through the rows, and you'll have to stick with the for row = 1 to n approach for the outer level. That also means that the only ordmap-style iteration that makes sense in the ordmap2 is across a row, i.e. through the columns. And in that case, we probably don't need .key1() and .key2() functions; the regular .key() would always refer to the column titles, and you'd always know what row you were on since you'd need to specify the row to start the iteration. The only exception would be if you wanted to iterate through the entire grid in one pass, as if it were a regular ordmap. In that case, the ability to extract the row # and column name for each value might be useful.
|
|
|
Re: Two Dimensional Ordered Maps?
[Re: Jack McGregor]
#35991
06 Mar 23 11:24 PM
|
Joined: Jun 2001
Posts: 3,406
Jorge Tavares - UmZero
Member
Before I open the wine for dinner and download "The Cat in the Hat", let me thank you for the INFLD upgrade and comment on the topics mentioned.
The ordmap usage in XTREE for combos is not that important because I already use a function there, so maybe it isn't worth too much effort to invent a different but similar solution.
In my opinion, it's easier to forget the 2 in ordmap2 than to get into any confusion; I think that ordmap() for all the variants will be fine.
Regarding the .extent() variants to get the rows and columns, it's perfect.
My idea for input csv #ch, $csvmap() doesn't have restrictions, at least not those you described, because the returned result would use the numeric rows and the alphabetic sequence of letters for the columns. The header, if present, could be returned in $csvmap(0, A), ... $csvmap(0, ZZ). We could tell input csv that we want the header by specifying zero in the argument, like: input csv #ch, $csvmap(0)
For the iteration topic, it would be presumptuous of me to propose something without thinking more about all the things involved, because what first crosses my mind, influenced by the .extent() logic above, is something like:
foreach'cell $$i in $csvmap()
? $$i ! the cell value
? .key($$i, 1) ! the row number
? .key($$i, 2) ! the column letter
next $$i
I'll elaborate on this better with each sip. Many thanks
Jorge Tavares
UmZero - SoftwareHouse Brasil/Portugal
|
|
|
Re: Two Dimensional Ordered Maps?
[Re: Jack McGregor]
#35992
07 Mar 23 01:46 AM
|
Joined: Jun 2001
Posts: 11,794
Jack McGregor
OP
Member
I'm not sure if it's wise to get into a debate over ordmap vs ordmap2 as it might distract you from the more important business at hand (wine)! I don't feel that strongly about it, and I'm sure that over the course of a few glasses you could easily convince me, but just to explain my reasoning better:
- (Human engineering) When two things are syntactically very similar but semantically different, it's often a good idea to separate them in a couple of ways, lest it becomes too easy to think about one while typing (or reading) the other. (I know you would rather run that risk than have to type an extra character, but we have to think of others.) An example would be instr() vs .instrr() -- the dot messes up the symmetry, but it significantly decreases the chances of mistaking one for the other. In this case since we are only talking about int as the type of the first dimension, arguably that's enough to avoid confusion. But it's conceivable that we'll get to a (varstr; varstr; varstr) version, and maybe even an ordmap(int; varstr).
- (Nomenclature) Since the keyword itself often becomes the name used to refer to the topic (for example, we talk about "dimx" arrays rather than describe them as "dynamically allocated arrays"), having a standardized different name may make it easier to document without having to add a qualifier like "two dimensional ordmap". And I think one could argue that this "2D ordmap" is different enough from the other 4 variations of ordmap that the distinction will come up in the documentation.
- (Laziness) It makes it slightly easier for the compiler. (Ok, that's a bad excuse - the compiler should always be made to work harder so the developers can work easier.)
Let's set that aside, at least until the bottle is empty, and focus on the more functional issues...
Regarding the "restrictions", I think we agree here. But I just wanted to make clear that if the columns are indexed by name, then the order of the columns is going to be determined by the collating sequence, and not the original order. For example, if we start with this source...
animal, mineral, Vegetable, fruit, language
dog, gold, rutabaga, apple, english
cat, silver, kale, orange, spanish
squirrel, aluminum, potato, kiwi, french
and read it into an ordmap(2)... and then iterate through the map to write it back out again, the output version would look like this...
Vegetable, animal, fruit, language, mineral
rutabaga, dog, apple, english, gold
kale, cat, orange, spanish, silver
potato, squirrel, kiwi, french, aluminum
(Vegetable is first because upper case sorts before lower case; if we don't like that we might add some kind of flag to fold the keys to upper case, but we've never discussed that with the standard ordmap. And fruit and language sort before mineral.)
I don't think this is a major problem, since a large part of the attraction of the ordmap over the array is the ability to index values by their keys rather than by their ordinal position. But it could matter to some people or some applications. One example might be ASQL. If we were to return the entire result set in a two dimensional ordmap (see, it would have been easier to just say "ordmap2" there), then we wouldn't be able to preserve the original order of the result columns, and that might be a slight annoyance to the user (since alphabetic column order isn't necessarily the one that makes the most sense to the user, who probably prefers to have the most important columns first, similar columns in proximity, etc.) The application could overcome that and put the columns in any order by retrieving the cells via explicit $sqlmap(row,colname$) references, but that would be a bit less efficient than iterating.
As for whether to include a zero row containing just the headers, perhaps that should be an option. But it seems unnecessary, since you can extract that same information by just iterating across any row...
foreach'col $$c in $csvmap()
? .key($$c,2) ! or .key2($$c)
next $$c
I'm also not sure it makes sense to provide an option to not return the headers, since without headers, we can't build the map. (The headers are the 2nd part of each key.) You would have to supply a separate ordered map associating the column names with the column numbers, and make sure it really did match the data. Seems like the best way to minimize confusion is to just require that the first row of the data set contain the headers.
Iterating over the entire map using your foreach'cell makes perfect sense to me. In fact, here I think we could even let you get by with less typing and just use the regular foreach $$i in $csvmap() syntax. Here's a slight variation of your loop and the output when applied to the above example set:
foreach $$i in $csvmap()
? .key($$i, 1);", ";.key($$i,2);" --> ";$$i
next $$i
1, Vegetable --> rutabaga
1, animal --> dog
1, fruit --> apple
1, language --> english
1, mineral --> gold
2, Vegetable --> kale
2, animal --> cat
2, fruit --> orange
2, language --> spanish
2, mineral --> silver
3, Vegetable --> potato
3, animal --> squirrel
3, fruit --> kiwi
3, language --> french
3, mineral --> aluminum
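The listing above can be reproduced outside ASB with a short Python sketch, which may help when reasoning about the column-order effect. It assumes keys are ordered first by row number and then by column title using plain code-point comparison (so upper case sorts before lower case). The actual ASB collating sequence may differ, and the function names here are hypothetical.

```python
import csv
import io

SOURCE = """animal, mineral, Vegetable, fruit, language
dog, gold, rutabaga, apple, english
cat, silver, kale, orange, spanish
squirrel, aluminum, potato, kiwi, french
"""


def load_grid(text):
    """Return {(row_number, column_title): value}; the first line holds the titles."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    headers = next(reader)
    grid = {}
    for row_number, fields in enumerate(reader, start=1):
        for title, value in zip(headers, fields):
            grid[(row_number, title)] = value
    return grid


def dump_cells(grid):
    """List every cell in key order: row number first, then column title."""
    return [f"{row}, {title} --> {value}"
            for (row, title), value in sorted(grid.items())]


lines = dump_cells(load_grid(SOURCE))
```

Folding the titles to one case before using them as keys would move Vegetable to the end of each row in this model, which is the kind of flag mentioned above.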
I still think it would be more common to iterate across one row at a time though. Going back to the result set example, a report program might start by retrieving the selected data into the map, then it might assemble the cells for one row at a time into some kind of print structure...
< -- get data set into $dataset() -- >
output$ = fn'output'headers$($dataset(),header'mask$) ! output the column headers
for row = 1 to .extent($dataset(),1) ! for each row
output$ = fn'output'row$($dataset(),row, row'mask$)
<inter-row processing -- subtotals, formatting, printing, etc.>
next row
...
function fn'output'headers$($d() as ordmap2(int;varstr;varstr), mask$ as s0) as s0
foreach'col $$c in $d()
.fn += .key($$c,2) + "," ! get the column headers
next $$c
endfunction
function fn'output'row$($d() as ordmap2(int;varstr;varstr), row as b4, mask$ as s0) as s0
foreach'col $$c in $d(row)
.fn += $$c + "," ! get the values
next $$c
endfunction
Or, maybe a report generator would build its own list of columns in the order it wanted to print them, and then would just pull each cell individually from the map...
dimx $dataset, ordmap2(int; varstr; varstr)
dimx coldata(0), s, 0, auto_extend ! array of field buffers, one per column
dimx colnames(0), s, 0, auto_extend ! array of column names
< -- get data set into $dataset() -- >
< -- construct the colnames() array based on report generation logic and $dataset() -- >
! output the rows, cell by cell
for row = 1 to .extent($dataset(),1) ! for each row
for col = 1 to .extent(colnames())
coldata(col) = $dataset(row,colnames(col))
next col
< print the row>
next row
Well, I've given you enough of a head start on the wine. It's time for me to open my own bottle...
|
|
|
Re: Two Dimensional Ordered Maps?
[Re: Jack McGregor]
#35993
07 Mar 23 08:33 AM
|
Joined: Jun 2001
Posts: 3,406
Jorge Tavares - UmZero
Member
Back to the coffee. I hope you had a nice wine after such (great) explanations, at least better than mine, which didn't happen.
In summary, I agree with almost everything. The model that you've already designed for ordmap2 is a huge upgrade to the existing ordmap (see how easy it is to reference each one). Anything we can add over it, if you prefer, could be considered for an upgrade after the first release and after playing with the real thing; we all know that new features are very different in our minds than in our hands.
The only thing that I'm still reluctant about is the specific case of input csv and:
1. the inability to get the columns in the exact order they are in the file
2. the possibility of losing columns due to duplicated headers
3. losing the first row of data in case we are handling a file w/o a header row
The only way I can see to avoid these issues is to:
1. consider the column letters
2. inform input csv whether the first row is a header
An alternative to your model, keeping it untouched in every respect, is to consider the possibility of using a regular 2D dimx array in input csv, but that would bring the need to AUTO_EXTEND the second dimension of the array, which could be an even bigger effort than the current project. Anyway, what I mean is:
1. input csv #ch, $csvmap() builds the keys for the ordmap2 from the row numbers and the content of the first row in the file
2. input csv #ch, array'csv() uses index1=row number, index2=col number
With the two options above, I could decide if I want to handle the columns by their names (option 1) or by their sequence in the file (option 2). An alternative to option 1 that would preserve the sequence and guarantee the uniqueness of each column is to add a prefix with the column letters in key2 (e.g. $csvmap(1, [A]animals)).
Now it's time to enter the real world and help my customers deal with basic problems, like configuring a new printer.
Jorge Tavares
UmZero - SoftwareHouse Brasil/Portugal
|
|
|
Re: Two Dimensional Ordered Maps?
[Re: Jack McGregor]
#35994
07 Mar 23 05:56 PM
|
Joined: Jun 2001
Posts: 11,794
Jack McGregor
OP
Member
Ah, so the order of the columns is a problem for you. (I guess by "column letters" you were referring to the Excel-style column identifiers A, B, ..., rather than the column titles. But even those column identifiers don't sort properly, unless we right justify them.) One option would be to prepend the column # to the column names, in which case my example above would become something like this (with a 2-digit column number prepended to the names)...
1, 01:animal --> dog
1, 02:mineral --> gold
1, 03:Vegetable --> rutabaga
1, 04:fruit --> apple
1, 05:language --> english
2, 01:animal --> cat
2, 02:mineral --> silver
2, 03:Vegetable --> kale
2, 04:fruit --> orange
2, 05:language --> spanish
3, 01:animal --> squirrel
3, 02:mineral --> aluminum
3, 03:Vegetable --> potato
3, 04:fruit --> kiwi
3, 05:language --> french
That preserves the order (animal,mineral,Vegetable,fruit,language), but does introduce some complications, namely how to specify such an option, and how to determine a suitable format for the column-number prefixes. Since it isn't necessarily unique to the INPUT CSV operation (it could for example apply to an ASQL query result set), the option probably should be associated with the map declaration rather than the operation that loads it. And in order for the column numbers to sort alphabetically, we need a sufficient number of right-justified digits, but how many? One possibility might be to allow the varstr keyword to include a format specification, something like...
dimx $csvmap, ordmap2(int; ##:varstr; varstr) ! column key prepended by two-digit column # and :
That would allow a lot of flexibility in how the column key gets formatted, but I'm afraid it's too much flexibility (if there is such a thing!). Ironically, it would actually reduce the flexibility inherent in passing ordmaps to functions, since this precise format would have to match. Perhaps the biggest obstacle is that the variable descriptor structure doesn't currently allow for open-ended attributes (like "##:") as part of the variable description in memory. So it would require a massive shakeup. We could however just offer a single format option, maybe one of these...
dimx $csvmap, ordmap2(int; #varstr; varstr) ! column key prepended by column #
dimx $csvmap, ordmap2(int; int+varstr; varstr) ! column key prepended by column #
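The question of how many digits the prefix needs can be checked with a small Python sketch (not ASB). It assumes the keys are compared as plain strings. The helper name `prefixed_keys` and the twelve column titles are made up for illustration.

```python
def prefixed_keys(titles, width):
    """Prepend a right-justified, zero-padded column number to each title."""
    return [f"{number:0{width}d}:{title}"
            for number, title in enumerate(titles, start=1)]


titles = [f"col{n}" for n in range(1, 13)]   # twelve hypothetical columns

one_digit = sorted(prefixed_keys(titles, 1))   # too narrow for 12 columns
two_digit = sorted(prefixed_keys(titles, 2))   # wide enough

order_preserved = two_digit == prefixed_keys(titles, 2)
```

With a one-character width, the key for column 10 sorts ahead of the key for column 1, so the original order is lost as soon as the column count outgrows the prefix width. A fixed, generous width avoids that at the cost of longer keys.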
If you think about it, the same problem already occurs with the first dimension (the row #'s). There, I figured we just have to allow up to 32 bit integers to be safe, even if it's overkill in most cases. We could adopt a similar compromise with the column #'s and treat them all as 16 bit integers (supporting up to 65536 columns).
Another consideration here is that this kind of automatic key formatting would only be practical in a context like loading a CSV or result set where the load operation knows the original column order. So it wouldn't make sense for ordinary ordmaps, adding another degree of separation between the ordmap2 and ordmap, although it now makes me wonder whether gridmap might be a better name.
Yet another problem is that regular assignment statements, e.g. $gridmap(row, colkey$) = value$, would not have any way to know what number to prepend to the colkey$. You'd have to be responsible for formatting the colkey$ yourself, which creates a bad dependency between internal implementation details and application details. Or, maybe the gridmap is really a 3-D ordmap and should be declared as:
dimx $csvmap, ordmap2(int; int; varstr; varstr) ! row#, col#, key$, value
In that case, assignment would require 3 arguments:
$csvmap(rowno, colno, colkey$) = cellvalue
This is starting to get overwhelming! Or, as you suggest, if you really care about the column numbers, then maybe a regular two-dimensional array makes more sense...
dimx ary(0,0), s, 0, auto_extend
input csv #ch, ary()
There are complications here too though. One is that we already have an input csv #ch, ary() statement, but it is meant to only input one row at a time. (Conveniently, this allows you the flexibility to handle the header row separately.) While it's true that the runtime can determine whether the specified array has one or two dimensions and act accordingly, this is another example of a situation where it's far too easy to misinterpret the code when reading it later. Another problem is that auto_extend only works for the first dimension. We could overcome that within the input csv routine by first counting the columns in the first row, and then internally using redimx to set the 2nd dimension. But it would mean that the returned array would have uniform-extent rows even if the original data source did not. (Any longer rows would be truncated, while shorter rows would be filled with null elements.) It's too early for wine. I think I'll have another cup of coffee and try to think of something else for a while!
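The uniform-extent behavior described above (size the second dimension from the first row, then truncate or pad the rest) can be sketched in Python as follows. This is only a model of the idea, not how input csv is or would be implemented; the function name `load_uniform` is hypothetical, and empty strings stand in for null elements.

```python
import csv
import io


def load_uniform(text):
    """Load CSV text into a list of rows, all sized to the first row's width."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return []
    width = len(rows[0])
    # Longer rows are truncated; shorter rows are padded with empty strings.
    return [(row + [""] * width)[:width] for row in rows]


# A ragged source: the second row is short, the third row is long.
grid = load_uniform("a,b,c\n1,2\n3,4,5,6\n")
```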
|
|
|
Re: Two Dimensional Ordered Maps?
[Re: Jack McGregor]
#35996
07 Mar 23 08:34 PM
|
Joined: Jun 2001
Posts: 3,406
Jorge Tavares - UmZero
Member
In fact I really meant the Excel-style column identifiers A, B, ..., for the prefix, but you're right, we must pad them to three characters with up to two spaces, [  A], [  B], ... [ AA], ..., [XFD], if we want to support the maximum range of columns in Excel. And I insist on the possibility of a file without a header row: how would we handle those without these identifiers? Note that I'm talking about these prefixes exclusively for the input csv case: when an ordmap2 is in place for that purpose, the column identifiers are prefixed to the key. But I agree that, even in input csv, there are cases where the column order is not important and the prefixes should not be there, so we should have an option to tell input csv not to include them. I would not go for complex syntaxes to allow many variations of this (maybe one day I will regret this). Taking your example, I could suggest ordmap2(int; prefheader; varstr) for this specific case and ordmap2(int; varstr; varstr) for the normal case.
Jorge Tavares
UmZero - SoftwareHouse Brasil/Portugal
|
|
|
Re: Two Dimensional Ordered Maps?
[Re: Jack McGregor]
#36003
14 Mar 23 11:46 PM
|
Joined: Jun 2001
Posts: 11,794
Jack McGregor
OP
Member
It's time for a status report on this little project. It's not quite ready to let out of the lab, but here's what I've got so far: two variations of ordmap2...
dimx $map2i, ordmap2(int; int; varstr) ! numeric row & column
dimx $map2s, ordmap2(int; varstr; varstr) ! numeric row, string column titles
Both variations use integers for the first dimension (rows). It would be possible to add a string version, but for the kinds of grids we are talking about (CSV files, spreadsheets, query result sets), rows typically are identified by number. The columns, on the other hand, could be indexed numerically, but are more likely to have alphanumeric identifiers (column or field names).
Loading the Map: You can do it one cell at a time, just like you do currently, only now you need to specify two indices:
$map2i(rowno, colno) = value$
$map2s(rowno, colname$) = value$
You can also load an entire CSV file into one of these maps using input csv #ch, $map(). Depending on whether the map was declared as (int; int; varstr) or (int; varstr; varstr), the result will either be indexed by row and column numbers, or by row numbers with column names (in which case it will assume that the first row contains the column names).
For the (int; varstr; varstr) version though, I'm not (yet) convinced that we need to have embedded support for Excel-style column identifiers (A-Z, AA-ZZ, AAA-ZZZ, etc.) as an alternative to names. I would say that those Excel-style column identifiers are really just numbers in Base26, so we might as well use the (int; int; varstr) variation, along with a pair of conversion functions, e.g. Fn'Dec'to'Base26() and Fn'Base26'to'Dec(). At some point it might make sense to internalize support for variations of the int key type (radix, maximum range, etc.). But to avoid getting bogged down in minutiae, I think for now we should just start with plain old decimal integer representation, probably with a range of something like +/- 10 million rows and +/- 100K columns. (It's not likely we would use negative row or column identifiers, but that's one advantage of the ordered map over the array: the indices are internally just alphanumeric keys and so are not limited in the way that array indices are.)
Iteration: I'm taking your advice here, thinking maybe we can get away without needing special foreach'row and foreach'col iteration types. Instead, if you want to iterate through the grid, just go through the entire thing, using the upgraded .key($$I,1) and .key($$I,2) functions to access the two key dimensions...
dimx $map2i, ordmap2(int; int; varstr)
...
input csv #ch, $map2i() ! load CSV file into the map
foreach $$i in $map2i() ! iterate entire map, working across each row and then down
? "Row #"; .key($$i, 1); ", Col #"; .key($$i, 2);" -> "; $$i
next $$i
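On the Excel-style column identifiers: a small sketch, in Python rather than ASB, of what a conversion pair along the lines of Fn'Dec'to'Base26() / Fn'Base26'to'Dec() might compute. The function names below are hypothetical stand-ins, not existing A-Shell functions. One wrinkle worth noting is that Excel letters have no zero digit (A=1 ... Z=26, AA=27), so the arithmetic is "bijective" base 26 rather than plain base 26.

```python
def dec_to_base26(n):
    """Convert a 1-based column number to Excel-style letters (1 -> 'A')."""
    if n < 1:
        raise ValueError("column numbers start at 1")
    letters = []
    while n > 0:
        # Subtract 1 first because there is no zero digit in this system.
        n, rem = divmod(n - 1, 26)
        letters.append(chr(ord("A") + rem))
    return "".join(reversed(letters))


def base26_to_dec(s):
    """Convert Excel-style letters back to a 1-based column number."""
    s = s.strip().upper()
    if not s or not all("A" <= ch <= "Z" for ch in s):
        raise ValueError("expected one or more letters A-Z")
    n = 0
    for ch in s:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n
```

With a pair like this, a grid declared with integer columns can still be addressed by spreadsheet letters at the application level, for example by converting "XFD" to 16384 before indexing.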
Extent: We previously discussed implementing a means of getting the number of rows and columns in the map, but I'm not sure it's really necessary if we aren't going to have special row and column iterators. I also suggested that for the rows (and also columns, if numeric), we might as well just use the traditional for row = 1 to x method. But the problem with that method is that for the ordmap, extent is not necessarily the same as the maximum row number (unlike the traditional array). It would be if we loaded the map using the input csv method, but in general, you could build this kind of map with arbitrary row (and column) index values. For example, imagine a grid where the rows are students, indexed by student #, and the columns are classes. The student #'s might be, say, 8 digit numbers, but the school has only a few thousand students. So the extent of the first dimension would be only a few thousand, but the maximum number might be 12345678. We avoid all of that confusion by just not trying to support separate by-dimension extents and iteration. So, the $64 question: is this enough functionality to make the ordmap2() useful enough to release?
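The student example can be modeled in a few lines of Python (not ASB) to show why extent and maximum row number diverge. The dict keyed by (row, column) tuples is only an analogy for the two-part key; the student numbers and class names are made up for illustration.

```python
# Sparse grid keyed by (student number, class name); values are grades.
grid = {}
grid[(12345678, "MATH101")] = "A"
grid[(10000042, "MATH101")] = "B"
grid[(10000042, "HIST200")] = "C"

# Whole-map iteration in key order: across each row, then down,
# similar in spirit to a single foreach over the entire map.
ordered = sorted(grid.items())

row_ids = {row for row, _col in grid}
row_extent = len(row_ids)   # number of distinct rows actually present
max_row = max(row_ids)      # largest row key, unrelated to the extent
```

A loop of the form "for row = 1 to max_row" would visit over twelve million indices to find two rows, which is why whole-map iteration is the safer default here.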
|
|
|
Re: Two Dimensional Ordered Maps?
[Re: Jack McGregor]
#36004
15 Mar 23 08:29 AM
|
Joined: Sep 2003
Posts: 4,158
Steve - Caliq
Member
Sure does, the CSV example is great!
dimx $map2i, ordmap2(int; int; varstr)
input csv #ch, $map2i() ! load CSV file into the map
foreach $$i in $map2i() ! iterate entire map, working across each row and then down
? "Row #"; .key($$i, 1); ", Col #"; .key($$i, 2);" -> "; $$i
next $$i
|
|
|
Re: Two Dimensional Ordered Maps?
[Re: Jack McGregor]
#36005
15 Mar 23 08:35 AM
|
Joined: Jun 2001
Posts: 3,406
Jorge Tavares - UmZero
Member
Definitely YES, it's probably all that we need for a long time
Jorge Tavares
UmZero - SoftwareHouse Brasil/Portugal
|
|
|
Re: Two Dimensional Ordered Maps?
[Re: Jack McGregor]
#36006
15 Mar 23 04:53 PM
|
Joined: Jun 2001
Posts: 11,794
Jack McGregor
OP
Member
Ok then! I need to do some further cleanup, testing, etc. but should have a beta version ready by tomorrow. In the meantime, here's your opportunity to vote on the best name for this storage type...
- ordmap2 - does a reasonable job of reflecting the fact that it is a variation of ordered map, and that it has two dimensions. (And if we ever decide we need a multi-map variety, we'd have to decide between ordmap2m and ordmapm2.)
- ordmap - avoids introducing a new keyword, but might introduce some ambiguity or confusion into any mention of the term without clarifying whether we are talking about the original one-dimension or new two-dimensional variety.
- gridmap - one less syllable than "ordmap two", and does a better job of evoking a mental image of its purpose; but might obscure its similarity to ordmap.
- _____________ (write-in candidate)
|
|
|
Re: Two Dimensional Ordered Maps?
[Re: Jack McGregor]
#36013
15 Mar 23 06:09 PM
|
Joined: Jun 2001
Posts: 3,406
Jorge Tavares - UmZero
Member
Jorge Tavares
UmZero - SoftwareHouse Brasil/Portugal
|
|
|
Re: Two Dimensional Ordered Maps?
[Re: Jack McGregor]
#36016
15 Mar 23 06:26 PM
|
Joined: Jun 2001
Posts: 11,794
Jack McGregor
OP
Member
gridmap is off to an early lead, but the polls are still open. And then we may have several days of counting the mail-in ballots (inevitably followed by challenges, accusations, lawsuits). So stay calm everyone.
|
|
|
Re: Two Dimensional Ordered Maps?
[Re: Jack McGregor]
#36036
18 Mar 23 12:33 PM
|
Joined: Jun 2001
Posts: 3,406
Jorge Tavares - UmZero
Member
Some hidden force pushed me into the ashnotes and made me read everything first, leaving the download for later today, the opposite of my usual order.
Jorge Tavares
UmZero - SoftwareHouse Brasil/Portugal
|
|
|
Re: Two Dimensional Ordered Maps?
[Re: Jack McGregor]
#36062
26 Mar 23 10:51 PM
|
Joined: Jun 2001
Posts: 11,794
Jack McGregor
OP
Member
Last edited by Jack McGregor; 26 Mar 23 10:57 PM.
|
|
|
Re: Two Dimensional Ordered Maps?
[Re: Jack McGregor]
#36063
27 Mar 23 01:51 PM
|
Joined: Jun 2001
Posts: 3,406
Jorge Tavares - UmZero
Member
I was out for a week in paradise, alone with Viviane (important information), in "rescue mode" for professional subjects, with no room for programming. Now, back to reality, it's time to put my hands on this beauty. Thank you!
Jorge Tavares
UmZero - SoftwareHouse Brasil/Portugal
|
|
|
Re: Two Dimensional Ordered Maps?
[Re: Jack McGregor]
#36065
27 Mar 23 06:04 PM
|
Joined: Jun 2001
Posts: 3,406
Jorge Tavares - UmZero
Member
Hi again, I tried this in a program that imports customer orders from XLS files. In short, most of the work was done in a few lines of (very simple) code using this new feature; like magic, everything worked smoothly! Now it's time to remove a lot of old-fashioned code from that function. Many thanks for this.
Jorge Tavares
UmZero - SoftwareHouse Brasil/Portugal
|
|
|
Re: Two Dimensional Ordered Maps?
[Re: Jack McGregor]
#36067
27 Mar 23 08:16 PM
|
Joined: Jun 2001
Posts: 11,794
Jack McGregor
OP
Member
Sounds like you've been working hard! On the XLS to gridmap conversion, I'm curious whether you first converted the XLS to CSV and then used INPUT CSV #CH, $GRIDMAP()? Or used the LIBXL functions to load the grid one cell at a time? (I have it on my to-do list to create a Fn'LIBXL'Import'To'Gridmap() function to import directly from XLS or XLSX into the gridmap, along with a variation of the ASQL SQLOP_FETCH_ROW operation to fetch an entire result set in one call.)
|
|
|
Re: Two Dimensional Ordered Maps?
[Re: Jack McGregor]
#36068
27 Mar 23 08:37 PM
|
Joined: Jun 2001
Posts: 3,406
Jorge Tavares - UmZero
Member
Curiously, today I'm feeling very efficient and productive. No doubt relaxing and enjoying life is far better than fighting with time to get things done; just let it flow. Now that you mention it, I hadn't realized that I converted the XLSX to CSV using my old "VB netcaller", which still handles a few things in Excel. I'm anxious to get rid of this old module, so it's time to implement this XLSX/CSV converter. But if you intend to do it, it will surely be far better than any function I could write, so just let me know if I should wait or go ahead; it's not urgent. As an aside, another function implemented in the VB Netcaller lists the existing pages in the Excel workbook; it would be interesting to cover that under this topic as well. Thanks
Jorge Tavares
UmZero - SoftwareHouse Brasil/Portugal
|
|
|
Re: Two Dimensional Ordered Maps?
[Re: Jack McGregor]
#36140
16 May 23 01:20 PM
|
Joined: Sep 2003
Posts: 4,158
Steve - Caliq
Member
Easy peasy question: is this compiler-only, or would I also need to update the runtime A-Shell at customers too?
============================================================================
A-Shell Release Notes Version 6.5.1728.0 (16 March 2023)
============================================================================
1. Language enhancement -- new storage class GRIDMAP is a two-dimensional
variation of ORDMAP with a two-part key (row and column), suitable for
handling datasets consisting of rows and columns.
- Declaration:
dimx $map, gridmap(int; varstr; varstr) ! num row, str col, str value
dimx $map, gridmap(int; int; varstr) ! num row and col, str value
I could use this in something I'm doing now, but may hold back if I need to update the customer A-Shell version.
|
|
|
Re: Two Dimensional Ordered Maps?
[Re: Jack McGregor]
#36141
16 May 23 01:30 PM
|
Joined: Sep 2003
Posts: 4,158
Steve - Caliq
Member
Actually, I'm more after the two-dimensional part of the gridmap...
so i can store: ITEM (key), Month1 value, Month2 value, Month3 value, Month4 value, Month5 value, Month6 value
dimx $items (varstr; int; float) ! Item , month , value
Last edited by Steve - Caliq; 16 May 23 01:37 PM.
|
|
|
Re: Two Dimensional Ordered Maps?
[Re: Jack McGregor]
#36142
16 May 23 02:29 PM
|
Joined: Jun 2001
Posts: 11,794
Jack McGregor
OP
Member
Sorry for lack of clarity there, but you definitely need an A-Shell update. (VERSYS on the run will give you the exact runtime edit required, which currently stands at 6.5.1728.)
|
|
|
Re: Two Dimensional Ordered Maps?
[Re: Jack McGregor]
#36143
16 May 23 02:32 PM
|
Joined: Sep 2003
Posts: 4,158
Steve - Caliq
Member
Thanks, I'll hold back then, as they're running 6.5.1654.1 and it's not worth the extra overhead of updating them (yet).
|
|
|
Re: Two Dimensional Ordered Maps?
[Re: Jack McGregor]
#36144
16 May 23 02:39 PM
|
Joined: Jun 2001
Posts: 11,794
Jack McGregor
OP
Member
As for your $items gridmap, currently the only two variants are:
gridmap(int; int; varstr) ! integer row, integer column, string value
gridmap(int; varstr; varstr) ! integer row, string column, string value
You could probably just swap your item and month in the key...
dimx $items, gridmap(int; varstr; varstr) ! month, item, value
... but the value would have to be a string. The original motivation behind gridmap was to represent objects like spreadsheets or result sets, where the individual cell values would naturally have some kind of string representation. And the row part of the index would naturally be integer, whereas the column part of the index might be numeric or it might be a string, e.g. column names. But perhaps there might be good reason to expand it out to allow for varx values so you could store arbitrary objects in the grid.
|
|
|
Re: Two Dimensional Ordered Maps?
[Re: Jack McGregor]
#36145
16 May 23 02:41 PM
|
Joined: Sep 2003
Posts: 4,158
Steve - Caliq
Member
I'll continue using my own ordmap two-dimensional feature...
DIMX $ITEM'POINTER, ordmap (varstr; varstr) ! item=array position
DIMX ITEM'ARRAY(500,6),F,6,AUTO_EXTEND ! array position ,month values
Last edited by Steve - Caliq; 16 May 23 02:42 PM.
|
|
|
Re: Two Dimensional Ordered Maps?
[Re: Jack McGregor]
#36146
16 May 23 02:44 PM
|
Joined: Sep 2003
Posts: 4,158
Steve - Caliq
Member
Posts crossed, OK. Thanks, one for the "would be nice but not totally necessary" list, please.
|
|
|
Re: Two Dimensional Ordered Maps?
[Re: Jack McGregor]
#36147
16 May 23 02:54 PM
|
Joined: Jun 2001
Posts: 11,794
Jack McGregor
OP
Member
Although there may be multiple ways to skin a cat, being a messy job, it's always nice to have just the right tool!
|
|
|
Re: Two Dimensional Ordered Maps?
[Re: Jack McGregor]
#36148
16 May 23 03:02 PM
|
Joined: Sep 2003
Posts: 4,158
Steve - Caliq
Member
Half a tool is better than no tool
|
|
|
Re: Two Dimensional Ordered Maps?
[Re: Jack McGregor]
#36149
16 May 23 03:12 PM
|
Joined: Sep 2003
Posts: 4,158
Steve - Caliq
Member
Slight diversion from the topic, but is there a feature I missed that sorts an ORDMAP key sequence?
DIMX $ITEM'POINTER, ordmap (varstr; varstr) ! item=array position
After populating it, I wanted to sort it before reading down it in item sequence.
foreach $$i in $ITEM'POINTER()
XCALL MADEVT,2,"VALUE="+$$i+" KEY="+.key($$i)
next $$i
|
|
|
Re: Two Dimensional Ordered Maps?
[Re: Jack McGregor]
#36150
16 May 23 04:05 PM
|
Joined: Jun 2001
Posts: 11,794
Jack McGregor
OP
Member
Hmmm.... Sorting the ordmap (or gridmap) key sequence would be like sorting an ISAM index. It's already sorted in the only way that makes sense for it. If you re-sorted it, it wouldn't be able to find anything in its own index. So the key here would be to format the keys so that they are in the desired sort order but can still be used randomly. Usually that's just a matter of right justifying the parts of the key. For example...
map1 kx,b,4
dimx $items, ordmap(varstr; varx)
$items(kx) = some'value ! index not in numeric order
$items(kx using "#####") = some'value ! index now in order
...
foreach $$i in $items()
! items now in numerical order
Gridmap, with its integer key(s), avoids this problem, but you can otherwise handle it yourself. In a similar vein but with different motivation you may want to create multi-part keys, e.g...
map1 key$,s,15 ! cust # + item #
dimx $custprices, ordmap(varstr;varx)
key$ = (custno using "#####") + (itemno using "#####") ! right-justify both parts so each sorts numerically
$custprices(key$) = special'price
...
foreach $$i in $custprices()
! items now in order of item# within cust#
Or did I completely misunderstand the question?
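For anyone who wants to see the effect of right-justifying keys outside of A-Shell, here is a small Python sketch of the same idea. It assumes that using "#####" produces a right-justified, space-padded field, which str.rjust imitates; the helper name sort_key is hypothetical.

```python
def sort_key(n, width=5):
    """Right-justify a number in a fixed-width field so that string
    order matches numeric order (for non-negative values that fit)."""
    return str(n).rjust(width)


numbers = (100, 9, 25)

# Plain string keys sort character by character, not numerically.
unpadded = sorted(str(n) for n in numbers)

# Fixed-width, right-justified keys sort in numeric order.
padded = sorted(sort_key(n) for n in numbers)

# Multi-part key: customer number followed by item number, both padded.
pairs = [(2, 10), (2, 9), (1, 100)]
by_key = sorted(pairs, key=lambda p: sort_key(p[0]) + sort_key(p[1]))
```

The padding width has to be at least as wide as the largest value, otherwise keys stop being fixed-width and the ordering breaks down again.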
|
|
|
|
|