Two Dimensional Ordered Maps? #35982 03 Mar 23 08:17 PM
Jack McGregor (OP)
Joined: Jun 2001
Posts: 11,794
I'm putting out an idea here for a possible enhancement to the existing Ordered Map feature to get a sense of the degree of interest and any other comments/suggestions. I'm not sure about the rest of you, but for me, ordered maps are possibly the best addition to ASB since ... named parameters? dynamic structures? millirows? ( laugh ) Seriously, just about any data processing task I undertake these days starts with building one or more ordered maps (typically of structures) and then manipulating them in memory. The days of one-record-at-a-time logic, particularly with SQL, but even with ISAM or other indexed files, are rapidly receding into the rear view mirror. With the massive amount of memory available today, it's almost always more efficient to assemble all the data relevant to a problem in memory (typically in ordered maps or other arrays) and work on it there.

But one recurring annoyance is how to deal with two-dimensional grids (e.g. spreadsheets, or query result sets) where you don't know the columns in advance. If you know the column layout, you can define a structure to represent one row and then load the entire data set into an array of those structures. But what do you do if the columns aren't known in advance? One option would be to first input the column information, then create a dynamic structure to represent it. But in many cases that seems like a little too much effort. What would seem better would be the ability to just load the entire data grid into a two-dimensional ordered map, indexed by row and keyed by column. That's basically equivalent to a two-dimensional array, except that here we can use alphanumeric indexing for the columns (and maybe even the rows, although that's probably less of an issue).

For example, you could load an arbitrary CSV (assuming the first row contained the column names) into such a map and then directly access any one value via $map(row,column'name). You could iterate across rows using something like foreach $$i in $map(row,""), or possibly down columns using foreach $$i in $map("",column'name)

To take it a step further, the rows might also be alphanumerically keyed. For example, consider a set of student records. For each student, we'd have a variable list of classes taken and grades received. The student/rows would be indexed by student ID and the columns by class ID. $strec(student'id$,class'id$) would directly tell you whether the student had taken that class and if so, what grade was received. foreach $$i in $strec(student'id$,"") would allow you to list all of the classes/grades for that student. foreach $$i in $strec("",class'id$) would allow you to list all the students that have taken that class. (We'd probably need a new variation of the .key($$i) function, perhaps .key1($$i) for the student'id$ (first key dimension) and .key2($$i) for the class'id$ (second dimension).)

You can handle the direct lookup aspect with the existing one-dimensional ordered maps by creating a composite key for your map, e.g. student'id$ + class'id$. (I do this a lot, often concatenating several fields to make a composite key for temporary indexing of some dataset; two-dimensional maps won't eliminate the utility of that technique.) But it's a little awkward to iterate across rows or columns with composite keys. (Basically you have to embed some filtering logic inside your foreach loop to skip over the items outside your virtual row or column.) It might even be the case that if we implemented a more natural two-dimensional ordered map syntax, it would still be represented internally as a single map with a two part key. (A single map with a composite key is more efficient for lookup and for iterating across a row; a nested structure consisting of multiple linked maps might be faster for iterating down columns.)
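
As a rough illustration of the composite-key approach, here is a small sketch in Python rather than ASB (the two-dimensional ordmap syntax is only a proposal at this point). A sorted list of (student, class) tuples stands in for an ordered map with a two-part key; the data and the helper names row() and column() are made up for the example.

```python
from bisect import bisect_left

# Hypothetical data: (student id, class id) -> grade. The sorted key list
# plays the role of an ordered map with a composite key.
records = {
    ("S001", "ENG1"): "B",
    ("S001", "MATH1"): "A",
    ("S002", "MATH1"): "C",
    ("S003", "ENG1"): "A",
}
keys = sorted(records)

def row(student):
    """Classes for one student: seek to the first key, stop when the row ends."""
    start = bisect_left(keys, (student, ""))
    out = []
    for key in keys[start:]:
        if key[0] != student:          # left the virtual row, stop early
            break
        out.append((key[1], records[key]))
    return out

def column(class_id):
    """Students in one class: the keys are scattered, so every key is filtered."""
    return [(s, records[(s, c)]) for (s, c) in keys if c == class_id]

print(row("S001"))
print(column("ENG1"))
```

In this model a row is a contiguous run of keys, so it can be reached with one seek, while a column requires the filtering pass over the whole map described above.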

So in summary, the main advantage of the two-dimensional map may just be another form of "syntax sugar", i.e. making the code easier/cleaner/simpler without necessarily expanding the range of capabilities. In terms of implementation, it would mostly consist of expanding the dimx syntax for declaring these maps...
Code
dimx $map, ordmap(varstr;varstr)             ! traditional one-dimensional map of string values
dimx $map2n, ordmap2(int;varstr;varstr)      ! two-dimensional map of strings with numeric 1st dimension
dimx $map2S, ordmap2(varstr;varstr;varstr)   ! two-dimensional map of strings with string 1st dimension

And we need the variations of the foreach and .key syntax alluded to above. And probably a few other quirks here and there. Definitely non-trivial to implement; definitely complicating the logic involving passing maps to functions; possibly contributing to the likelihood of coding errors related to getting the dimensions mixed up, confusion over when you need one or two arguments to the map references, etc.

Last edited by Jack McGregor; 06 Mar 23 09:26 PM. Reason: Fix typo in $map2S code comment
Re: Two Dimensional Ordered Maps? [Re: Jack McGregor] #35984 03 Mar 23 09:10 PM
Steve - Caliq
Joined: Sep 2003
Posts: 4,158
I’m not sure anything could possibly beat millirows, right? But it’s a close second.
I like your idea! Not that I can think of a use for it at my end (yet). If it’s a toss-up between moving virtual code about in your head while enjoying a pint and a BBQ, and hard work at a desk actually doing it, go for the former. Then again, it could well be useful...
I’ll let you and others ponder on what task this weekend. :-)

Re: Two Dimensional Ordered Maps? [Re: Jack McGregor] #35985 03 Mar 23 11:39 PM
Herman Roehm
Unregistered
I doubt I’ll be doing a lot of programming, but can certainly see uses for it. If I do some more for Gregg, I’d use it. Maybe not as much as millirows, but I certainly think it would be a very useful tool.

Re: Two Dimensional Ordered Maps? [Re: Jack McGregor] #35986 04 Mar 23 11:50 AM
Jorge Tavares - UmZero
Joined: Jun 2001
Posts: 3,406
Wow!!! cool
After millirows, I'd put Functions/Procedures in the same position as ordmap, but the list of "best ones" is so long that we would need a ranking by year.
Now, back to the subject under discussion: this would be a must, and I would use it a lot, lot, lot.
For now I work around this using both techniques that you mentioned: multiple ordmaps, one for each "column", and composite keys in a single ordmap.
Like you, for me the ordmap is the starting point of any program. I design all the logic around how I will fill the ordmaps, how many I need, and what kind of sort to use, to prepare everything for loading into some XTREE.

I would imagine something like:
dimx $row'and'columns, ordmap(int; varstr; varstr)
input csv #channel, $row'and'columns()
total'rows = .rows($row'and'columns())
total'columns = .cols($row'and'columns())

This would be very useful and amazing; thanks for bringing it up. I'll spend the rest of the day on the beach having several Caipirinhas while wondering about possible uses for this. (wink)


Last edited by Jorge Tavares - UmZero; 04 Mar 23 11:53 AM.

Jorge Tavares

UmZero - SoftwareHouse
Brasil/Portugal
Re: Two Dimensional Ordered Maps? [Re: Jack McGregor] #35987 04 Mar 23 11:59 AM
Jorge Tavares - UmZero
I remembered now that I use ordmaps to build combo lists. That's practical because I can use the same technique to build <code,description> or <description> lists and get them ordered the way I want in a single step.
It would be nice if we could use an ordmap for combos (AUI and XTREE) instead of a string or the function FN'build'combo$($mylist()).
Would it be complicated?


Re: Two Dimensional Ordered Maps? [Re: Jack McGregor] #35988 04 Mar 23 05:37 PM
Jack McGregor (OP)
Anyone else out there surprised that Jorge likes this idea? laugh

Your suggested dimx and input csv syntax was exactly what I was thinking. (Great minds...?) I'm undecided about the advantages of introducing new dot-functions .rows() and .cols(), vs. making them both variations of .extent(), but that's a minor detail.

I hadn't really thought about the idea of using an ordmap to specify the contents of a combo, but that seems reasonable. I guess for the simple version (description only), we would use the key of the map and ignore the value?

As an aside, an ordmap with only keys and no values, or vice versa (i.e. a 'set') might logically be declared as dimx $set, ordmap(varstr). (If ordmap(int;varstr;varstr) is a two-dimensional ordmap, would that make the set version zero dimensional? Or is that "one dimensional", making our new proposed idea "three dimensional"?) Whatever you call it, I didn't think that it was worth adding special syntactic support for the set case since it is so easy to just represent as an ordmap with values set to "" (not to be confused with .null, which would delete the key from the set). Particularly since if we didn't use the assignment $set(key) = "" to add items to the set, we'd need yet another special add-to-set operator.

But back to our new toy, the only obstacle left is for me to figure out how to switch places with you Jorge. I'll take care of the Caipirinhas and you get to work ....! cool

Re: Two Dimensional Ordered Maps? [Re: Jack McGregor] #35989 06 Mar 23 05:20 PM
Jorge Tavares - UmZero
Good morning, still with coffee ...

Regarding the combos builder, yes, ordmap with "key+value" solves both flavours for combos, where key=description and value="" for "description only" lists.

As for the "zero dimensional" ordmap, I can't see much benefit vs. the blank assignment to a one-dimensional ordmap and, as you mentioned, we would need a different syntax for assignments.
Also, if we have a one-dimensional map w/o values, why not have a two-dimensional one w/o values too?
In summary, my suggestion is not to change anything in the existing ordmap and just jump into the two-dimensional one (wink)


Last edited by Jorge Tavares - UmZero; 06 Mar 23 05:20 PM.

Re: Two Dimensional Ordered Maps? [Re: Jack McGregor] #35990 06 Mar 23 07:24 PM
Jack McGregor (OP)
You'll be happy to know that I implemented the ordmap option for the INFLD setdef parameter over the weekend (while you were lying in your hammock sipping Caipirinhas!). So you'll soon be able to just specify your $map() variable in place of the setdef parameter and INFLD will automatically expand it into a list of valid inputs or a list of paired inputs, depending on the type codes. I don't think it makes sense for any of the other setdef variations (and the implementation could probably use some further error handling and testing to make sure that a mismatch doesn't send it off a cliff).

I don't see how that can easily work with XTREE though, since the list specification is embedded in the larger coldef parameter. (I suppose we could create a .ordmap'to'delimited'list$($map()) function to make it easier to insert your map/list into the coldef, but there wouldn't be that much advantage to such an embedded function compared to a user-defined one.)

And let's drop the zero-dimension (aka set) discussion for now, and concentrate on the 2D ordmap.

At the moment I'm leaning towards calling it ordmap2 to reduce the chance for confusion...
Code
dimx $map, ordmap2(int; varstr; varstr)

As for counting the number of rows and columns, the existing .extent() syntax appears to already handle multiple dimensions, so it makes sense to just stick with that, i.e.
Code
total'rows = .extent($map(),1)   ! extent of the first dimension (rows)
total'cols = .extent($map(),2)   ! extent of the second dimension (cols)

You could always create a function wrapper if you prefer to call it fn'rows()...
Code
function fn'rows($m() as ordmap2(int;varstr;varstr)) as i4
    .fn = .extent($m(),1)
endfunction

The syntax for loading a CSV directly into an ordmap2 would simply be: input csv #ch, $csvmap().

The syntax for referencing an individual value would be: $csvmap(row,column$), where row is an integer 1-n, and column$ is the column header/title.

The main problems yet to be resolved involve iteration. First, two limitations of ordmaps as a container for a CSV need to be acknowledged:
  • The column order will always be determined by their titles. The only way to preserve the original column order would be to use column numbers instead of names/titles. But in that case, you might as well just represent it as a regular two-dimensional array instead of an ordmap.
  • The columns have to have unique titles/names. Actually I suppose we could also support an ordmap2m variation, but that would be awkward and unlikely to really be useful, since it would obfuscate the original separation of columns with the same title. Note that this issue would not likely apply to query result sets, nor for that matter, to 'well-formed' CSVs, but if it did, the input csv operation would end up saving only the last of any duplicate column titles.
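
To make the duplicate-title limitation concrete, here is a minimal model in Python (not ASB; the proposed input csv behavior is only being imitated). The function load_grid() and the sample data are assumptions made up for the illustration; a plain dictionary keyed by (row number, column title) stands in for the two-dimensional map.

```python
import csv
import io

def load_grid(text):
    """Load CSV text into a dict keyed by (row number, column title)."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    titles = next(reader)                  # first row supplies the column keys
    grid = {}
    for row_no, values in enumerate(reader, start=1):
        for title, value in zip(titles, values):
            grid[(row_no, title)] = value  # a repeated title overwrites the earlier cell
    return grid

# Hypothetical input with a duplicated "phone" column.
sample = "name, phone, phone\nann, 111, 222\nbob, 333, 444\n"
grid = load_grid(sample)

print(sorted(grid))          # only two distinct column keys survive per row
print(grid[(1, "phone")])    # the value from the last "phone" column
```

Because the key is the title itself, the second "phone" column silently replaces the first, which is the "saving only the last of any duplicate column titles" behavior described above.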


Assuming you can accept those limitations, the remaining issue is exactly how to iterate. Since the rows are identified by integers, we can just use a regular for/next loop counter for the rows, but we do need to extend our foreach syntax to specify which row we are iterating across. One option would be to just repurpose the existing starting key parameter to mean the row when applied to an ordmap2, i.e.
Code
foreach $$c in $csvmap(row)   ! iterate across the columns of specified row

That might invite confusion though, since without knowing whether $csvmap() was an ordmap or an ordmap2, you wouldn't be able to tell from the syntax alone whether (row) was meant to be a starting key or a row number. I guess the other alternative would be new variations of foreach -- foreach'row and foreach'col?

And we need an enhanced version of .key() to extract the row and column from the iterator. Again, one possibility would be to just make .key() be context sensitive, i.e. returning the column when iterating across the columns in one row. Or maybe introduce new functions .key1() and .key2() (reminiscent of Thing1 and Thing2 from The Cat in the Hat)?

Some possibilities then for iterating through the entire grid might be:
Code
! context-sensitive without new keywords...
for row = 1 to .extent($csvmap())   ! default is 1st dimension, so this is rows
    foreach $$c in $csvmap(row)     ! here row is 1st dimension of key, not starting key
        ? row, .key($$c), $$c       ! row #, column title, value
    next $$c
next row

! using explicit new keywords...
for row = 1 to .extent($csvmap())   
    foreach'col $$c in $csvmap(row)     
        ? row, .key2($$c), $$c       
    next $$c
next row

It's tempting to use a nested pair of foreach loops...
Code
foreach'row $$r in $csvmap()
    foreach'col $$c in $csvmap()
        ? $$r, $$c         ! ???
    next $$c
next $$r

... but that raises the problem of what the $$r (row iterator) value is. The entire row? At one level that makes sense, but if so, then the nested foreach statement should be foreach'col $$c in $$r. There's a certain elegance there, but I'm not sure it's worth all of the complications that it introduces. I think I would prefer that we stick with the idea that the value of an iterator $$i is always a scalar value. But in that case it doesn't make sense at all to iterate through the rows, and you'll have to stick with the for row = 1 to n approach for the outer level.

That also means that the only ordmap-style iteration that makes sense in the ordmap2 is across a row, i.e. through the columns. And in that case, we probably don't need .key1() and .key2() functions; the regular .key() would always refer to the column titles, and you'd always know what row you were on since you'd need to specify the row to start the iteration. The only exception would be if you wanted to iterate through the entire grid in one pass, as if it were a regular ordmap. In that case, the ability to extract the row # and column name for each value might be useful.

Re: Two Dimensional Ordered Maps? [Re: Jack McGregor] #35991 06 Mar 23 11:24 PM
Jorge Tavares - UmZero
Before I open the wine for dinner and download "The Cat in the Hat", let me thank you for the INFLD upgrade and comment on the topics you mentioned.

The ordmap usage in XTREE for combos is not that important because I already use a function there, so maybe it isn't worth too much effort to invent a different but similar solution.

In my opinion, it's easier to forget the 2 in ordmap2 than to get into any confusion; I think that ordmap() for all the variants will be fine.
Regarding the .extent() variants to get the rows and columns, it's perfect.

My idea for input csv #ch, $csvmap() doesn't have restrictions, at least not those you described, because the returned result would use the row numbers for the rows and the alphabetic sequence of letters for the columns.
The header, if present, could be returned in $csvmap(0, A), ... $csvmap(0, ZZ)
We could tell input csv that we want the header by specifying zero in the argument, like: input csv #ch, $csvmap(0)
On the iteration topic, it would be presumptuous of me to propose something w/o thinking more about everything involved, because what first crosses my mind, influenced by the .extent() logic above, is something like:

Code
foreach'cell $$i in $csvmap()
    ? $$i                 ! the cell value
    ? .key($$i, 1)        ! the row number
    ? .key($$i, 2)        ! the column letter
next $$i


I'll elaborate on this with each sip (cool)

Many thanks


Re: Two Dimensional Ordered Maps? [Re: Jack McGregor] #35992 07 Mar 23 01:46 AM
Jack McGregor (OP)
I'm not sure if it's wise to get into a debate over ordmap vs ordmap2 as it might distract you from the more important business at hand (wine)! I don't feel that strongly about it, and I'm sure that over the course of a few glasses you could easily convince me, but just to explain my reasoning better:

- (Human engineering) When two things are syntactically very similar but semantically different, it's often a good idea to separate them in a couple of ways, lest it becomes too easy to think about one while typing (or reading) the other. (I know you would rather run that risk than have to type an extra character, but we have to think of others.) An example would be instr() vs .instrr() -- the dot messes up the symmetry, but it significantly decreases the chances of mistaking one for the other. In this case since we are only talking about int as the type of the first dimension, arguably that's enough to avoid confusion. But it's conceivable that we'll get to a (varstr; varstr; varstr) version, and maybe even an ordmap(int; varstr).

- (Nomenclature) Since the keyword itself often becomes the name used to refer to the topic (for example, we talk about "dimx" arrays rather than describe them as "dynamically allocated arrays"), having a standardized different name may make it easier to document without having to add a qualifier like "two dimensional ordmap". And I think one could argue that this "2D ordmap" is different enough from the other 4 variations of ordmap that the distinction will come up in the documentation.

- (Laziness) It makes it slightly easier for the compiler. (Ok, that's a bad excuse - the compiler should always be made to work harder so the developers can work easier.)

Let's set that aside, at least until the bottle is empty, and focus on the more functional issues...

Regarding the "restrictions", I think we agree here. But I just wanted to make clear that if the columns are indexed by name, then the order of the columns is going to be determined by the collating sequence, and not the original order. For example, if we start with this source...
Code
animal, mineral, Vegetable, fruit, language
dog, gold, rutabaga, apple, english
cat, silver, kale, orange, spanish
squirrel, aluminum, potato, kiwi, french

and read it into an ordmap(2)...
Code
input csv #ch, $csvmap()

and then iterate through the map to write it back out again, the output version would look like this...
Code
Vegetable, animal, fruit, language, mineral
rutabaga, dog, apple, english, gold
kale, cat, orange, spanish, silver
potato, squirrel, kiwi, french, aluminum

(Vegetable is first because upper case sorts before lower case; if we don't like that we might add some kind of flag to fold the keys to upper case, but we've never discussed that with the standard ordmap. And fruit and language sort before mineral.)
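
The reordering is easy to reproduce outside ASB. The short Python sketch below assumes that the map's key order behaves like a plain code-point comparison (which matches the "upper case sorts before lower case" behavior described above) and shows what a hypothetical fold-to-upper-case flag would change.

```python
titles = ["animal", "mineral", "Vegetable", "fruit", "language"]

# Plain code-point order: "V" (upper case) sorts before every lower-case letter.
print(sorted(titles))

# Keys folded to upper case for comparison only; the stored titles are unchanged.
print(sorted(titles, key=str.upper))
```

The first line of output matches the header row of the round-tripped file above; with case folding, Vegetable would move to the end instead.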

I don't think this is a major problem, since a large part of the attraction of the ordmap over the array is the ability to index values by their keys rather than by their ordinal position. But it could matter to some people or some applications. One example might be ASQL. If we were to return the entire result set in a two dimensional ordmap (see, it would have been easier to just say "ordmap2" there), then we wouldn't be able to preserve the original order of the result columns, and that might be a slight annoyance to the user (since alphabetic column order isn't necessarily the one that makes the most sense to the user, who probably prefers to have the most important columns first, similar columns in proximity, etc.) The application could overcome that and put the columns in any order by retrieving the cells via explicit $sqlmap(row,colname$) references, but that would be a bit less efficient than iterating.

As for whether to include a zero row containing just the headers, perhaps that should be an option. But it seems unnecessary, since you can extract that same information by just iterating across any row...
Code
foreach'col $$c in $csvmap()
    ? .key($$c,2)        ! or .key2($$c)
next $$c

I'm also not sure it makes sense to provide an option to not return the headers, since without headers, we can't build the map. (The headers are the 2nd part of each key.) You would have to supply a separate ordered map associating the column names with the column numbers, and make sure it really did match the data. Seems like the best way to minimize confusion is to just require that the first row of the data set contain the headers.

Iterating over the entire map using your foreach'cell makes perfect sense to me. In fact, here I think we could even let you get by with less typing and just use the regular foreach $$i in $csvmap() syntax. Here's a slight variation of your loop and the output when applied to the above example set:
Code
foreach $$i in $csvmap()
    ? .key($$i, 1);", ";.key($$i,2);" --> ";$$i
next $$i

Code
1, Vegetable --> rutabaga
1, animal --> dog
1, fruit --> apple
1, language --> english
1, mineral --> gold
2, Vegetable --> kale
2, animal --> cat
2, fruit --> orange
2, language --> spanish
2, mineral --> silver
3, Vegetable --> potato
3, animal --> squirrel
3, fruit --> kiwi
3, language --> french
3, mineral --> aluminum

I still think it would be more common to iterate across one row at a time though. Going back to the result set example, a report program might start by retrieving the selected data into the map, then it might assemble the cells for one row at a time into some kind of print structure...
Code
< -- get data set into $dataset() -- >

output$ = fn'output'headers$($dataset(),header'mask$)   ! output the column headers
for row = 1 to .extent($dataset(),1)  ! for each row
    output$ = fn'output'row$($dataset(),row, row'mask$)
    <inter-row processing -- subtotals, formatting, printing, etc.>
next row
...
function fn'output'headers$($d() as ordmap2(int;varstr;varstr), mask$ as s0) as s0
    foreach'col $$c in $d()
        .fn += .key($$c,2) + ","   ! get the column headers
    next $$c
endfunction

function fn'output'row$($d() as ordmap2(int;varstr;varstr), row as b4, mask$ as s0) as s0
    foreach'col $$c in $d(row)
         .fn += $$c + ","          ! get the values
    next $$c
endfunction

Or, maybe a report generator would build its own list of columns in the order it wanted to print them, and then would just pull each cell individually from the map...
Code
dimx $dataset, ordmap2(int; varstr; varstr)   
dimx coldata(0), s, 0, auto_extend     ! array of field buffers, one per column
dimx colnames(0), s, 0, auto_extend  ! array of column names

< -- get data set into $dataset() -- >
< -- construct the colnames() array based on report generation logic and $dataset() -- >
! output the rows, cell by cell
for row = 1 to .extent($dataset(),1)  ! for each row
    for col = 1 to .extent(colnames())
      coldata(col) = $dataset(row,colnames(col))
    next col
    < print the row>
next row


Well, I've given you enough of a head start on the wine. It's time for me to open my own bottle...

Re: Two Dimensional Ordered Maps? [Re: Jack McGregor] #35993 07 Mar 23 08:33 AM
Jorge Tavares - UmZero
Back to the coffee. I hope you had a nice wine after such (great) explanations, or at least a better one than mine, which didn't happen.

In summary, I agree with almost everything. The model that you've already designed for ordmap2 is a huge upgrade to the existing ordmap (see how easy it is to reference each one).
Anything we might add on top of it could, if you prefer, be considered for an upgrade after the first release and after playing with the real thing; we all know that new features are very different in our minds than in our hands.

The only thing I'm still reluctant about is the specific case of input csv and:
1. the inability to get the columns in the exact order they are in the file
2. the possibility of losing columns due to duplicated headers
3. losing the first row of data in case we are handling a file w/o a header row

The only way I can see to avoid these issues is to:
1. consider the column letters
2. inform input csv whether the first row is a header

An alternative to your model, leaving it completely untouched, is to consider the possibility of using a regular 2D dimx array in input csv, but that would bring the need to AUTO_EXTEND the second dimension of the array, which could be an even bigger effort than the current project.
Anyway, what I mean is:
1. input csv #ch, $csvmap()
build the keys for the ordmap2 from the row numbers and the content of the first row in the file

2. input csv #ch, array'csv()
index1=row number, index2=col number

With the two options above, I could decide whether I want to handle the columns by their names (option 1) or by their sequence in the file (option 2).
An alternative to option 1 that would preserve the sequence and guarantee the uniqueness of each column is to prefix key2 with the column letters (e.g. $csvmap(1, [A]animals))

Now it's time to return to the real world and help my customers deal with basic problems, like configuring a new printer (cry)


Re: Two Dimensional Ordered Maps? [Re: Jack McGregor] #35994 07 Mar 23 05:56 PM
Jack McGregor (OP)
Ah, so the order of the columns is a problem for you. (I guess by "column letters" you were referring to the Excel-style column identifiers A, B, ..., rather than the column titles. But even those column identifiers don't sort properly, unless we right justify them.)
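
A quick way to see the sorting issue with Excel-style identifiers, sketched in Python on the assumption that keys compare character by character (the width of 3 is chosen only because Excel's last column, XFD, has three letters):

```python
letters = ["A", "B", "Z", "AA", "AB"]           # spreadsheet column order

plain = sorted(letters)                          # "AA" and "AB" jump ahead of "B"
padded = sorted(s.rjust(3) for s in letters)     # right-justified to 3 characters

print(plain)
print(padded)
```

With the identifiers right-justified, the leading spaces sort before any letter, so the single-letter columns stay ahead of the two-letter ones and the spreadsheet order is preserved.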

One option would be to prepend the column # to the column names, in which case my example above would become something like this (with a 2-digit column number prepended to the names)...
Code
1, 01:animal --> dog
1, 02:mineral --> gold
1, 03:Vegetable --> rutabaga
1, 04:fruit --> apple
1, 05:language --> english
2, 01:animal --> cat
2, 02:mineral --> silver
2, 03:Vegetable --> kale
2, 04:fruit --> orange
2, 05:language --> spanish
3, 01:animal --> squirrel
3, 02:mineral --> aluminum
3, 03:Vegetable --> potato
3, 04:fruit --> kiwi
3, 05:language --> french

That preserves the order (animal,mineral,Vegetable,fruit,language), but does introduce some complications, namely how to specify such an option, and how to determine a suitable format for the column-number prefixes. Since it isn't necessarily unique to the INPUT CSV operation (it could for example apply to an ASQL query result set), the option probably should be associated with the map declaration rather than the operation that loads it. And in order for the column numbers to sort alphabetically, we need a sufficient number of right-justified digits, but how many? One possibility might be to allow the varstr keyword to include a format specification, something like...
Code
dimx $csvmap, ordmap2(int; ##:varstr; varstr)    ! column key prepended by two-digit column # and :

That would allow a lot of flexibility in how the column key gets formatted, but I'm afraid it's too much flexibility (if there is such a thing!). Ironically, it would actually reduce the flexibility inherent in passing ordmaps to functions, since this precise format would have to match. Perhaps the biggest obstacle is that the variable descriptor structure doesn't currently allow for open-ended attributes (like "##:") as part of the variable description in memory. So it would require a massive shakeup.

We could however just offer a single format option, maybe one of these...
Code
dimx $csvmap, ordmap2(int; #varstr; varstr)    ! column key prepended by column #
dimx $csvmap, ordmap2(int; int+varstr; varstr) ! column key prepended by column #

If you think about it, the same problem already occurs with the first dimension (the row #'s). There, I figured we just have to allow up to 32 bit integers to be safe, even if it's overkill in most cases. We could adopt a similar compromise with the column #'s and treat them all as 16 bit integers (supporting up to 65536 columns).
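
The width question can be illustrated with a small Python sketch. It is only a model: the helper prefixed() and the column titles are hypothetical, and zero-padded decimal prefixes are assumed as the format. Five digits are enough for any 16-bit column number, since the largest such number, 65535, has five digits.

```python
def prefixed(titles, width):
    """Prepend a zero-padded, 1-based column number to each title."""
    return [f"{n:0{width}d}:{t}" for n, t in enumerate(titles, start=1)]

titles = [f"col{n}" for n in range(1, 12)]   # 11 hypothetical columns

too_narrow = prefixed(titles, 1)             # "1:col1" ... "11:col11"
wide_enough = prefixed(titles, 5)            # "00001:col1" ... "00011:col11"

print(sorted(too_narrow) == too_narrow)      # False: "10:..." sorts before "2:..."
print(sorted(wide_enough) == wide_enough)    # True: original order survives
```

A fixed, generous width avoids having to know the column count before the first key is built, at the cost of longer keys.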

Another consideration here is that this kind of automatic key formatting would only be practical in a context like loading a CSV or result set where the load operation knows the original column order. So it wouldn't make sense for ordinary ordmaps, adding another degree of separation between the ordmap2 and ordmap, although it now makes me wonder whether gridmap might be a better name.

Yet another problem is that regular assignment statements, e.g. $gridmap(row, colkey$) = value$ would not have any way to know what number to prepend to the colkey$. You'd have to be responsible for formatting the colkey$ yourself, which creates a bad dependency between internal implementation details and application details.

Or, maybe the gridmap is really a 3-D ordmap and should be declared as:
Code
dimx $csvmap, ordmap2(int; int; varstr; varstr)    ! row#, col#, key$, value

In that case, assignment would require 3 arguments:
Code
$csvmap(rowno, colno, colkey$) = cellvalue

This is starting to get overwhelming!

Or, as you suggest, if you really care about the column numbers, then maybe a regular two-dimensional array makes more sense...
Code
dimx ary(0,0), s, 0, auto_extend
input csv #ch, ary()

There are complications here too though. One is that we already have an input csv #ch, ary() statement, but it is meant to only input one row at a time. (Conveniently, this allows you the flexibility to handle the header row separately.) While it's true that the runtime can determine whether the specified array has one or two dimensions and act accordingly, this is another example of a situation where it's far too easy to misinterpret the code when reading it later. Another problem is that auto_extend only works for the first dimension. We could overcome that within the input csv routine by first counting the columns in the first row, and then internally using redimx to set the 2nd dimension. But it would mean that the returned array would have uniform-extent rows even if the original data source did not. (Any longer rows would be truncated, while shorter rows would be filled with null elements.)
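
The uniform-extent effect can be modeled in a few lines of Python. This is only a sketch of the behavior described above, not of the actual input csv routine: to_uniform_grid() is a hypothetical helper, and empty strings stand in for null elements.

```python
def to_uniform_grid(rows):
    """Force every row to the width of the first row."""
    width = len(rows[0])                      # column count taken from the first row
    return [row[:width] + [""] * (width - len(row)) for row in rows]

# Hypothetical ragged input: the second row is too long, the third too short.
ragged = [["a", "b", "c"], ["1", "2", "3", "4"], ["x"]]
grid = to_uniform_grid(ragged)

for row in grid:
    print(row)
```

The extra value "4" in the second row is dropped and the third row is padded, so a reader of the resulting array can no longer tell that the source rows had different lengths.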

It's too early for wine. I think I'll have another cup of coffee and try to think of something else for a while!

Re: Two Dimensional Ordered Maps? [Re: Jack McGregor] #35996 07 Mar 23 08:34 PM
Joined: Jun 2001
Posts: 3,406
Jorge Tavares - UmZero
In fact, I really meant the Excel-style column identifiers A, B, ... for the prefix, but you're right: we must pad them with leading spaces to a fixed width of three ([  A], [  B], ... [ AA], ..., [XFD]) if we want to support the maximum range of columns in Excel.
And I insist on the possibility of a file without a header row: how would we handle those without these identifiers?
Note that I'm talking about these prefixes exclusively for the input csv case: when an ordmap2 is in place for that purpose, the column identifiers are prefixed to the key.
But I agree that, even in input csv, there are cases where the column order is not important and the prefixes should not be there, so we should have an option to tell input csv not to include them.
I would not go for complex syntaxes to allow many variations of this (maybe one day I will regret this); taking your example, I could suggest ordmap2(int; prefheader; varstr) for this specific case and ordmap2(int; varstr; varstr) for the normal case.
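
A small Python sketch (not ASB) of why the padding matters, assuming the map orders its string keys by plain character comparison. The column list is just sample data.

```python
# Excel-style column identifiers, listed in true spreadsheet order
cols = ["A", "B", "Z", "AA", "AB", "XFD"]

# Plain string sort puts "AA" before "B" and "XFD" before "Z"
unpadded = sorted(cols)

# Right-justified to 3 characters, the string sort matches spreadsheet order,
# because a space compares lower than any letter
padded = sorted(c.rjust(3) for c in cols)

print(unpadded)
print(padded)
```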


Jorge Tavares

UmZero - SoftwareHouse
Brasil/Portugal
Re: Two Dimensional Ordered Maps? [Re: Jack McGregor] #36003 14 Mar 23 11:46 PM
Joined: Jun 2001
Posts: 11,794
Jack McGregor (OP)
It's time for a status report on this little project. It's not quite ready to let out of the lab, but here's what I've got so far: two variations of ordmap2...
Code
dimx $map2i, ordmap2(int; int; varstr)      ! numeric row & column
dimx $map2s, ordmap2(int; varstr; varstr)   ! numeric row, string column titles

Both variations use integers for the first dimension (rows). It would be possible to add a string version, but for the kinds of grids we are talking about (CSV files, spreadsheets, query result sets), rows typically are identified by number. The columns on the other hand, could be indexed numerically, but are more likely to have alphanumeric identifiers (column or field names).

Loading the Map - You can do it one cell at a time, just like you do currently, only now you need to specify two indices:
Code
    $map2i(rowno, colno) = value$
    $map2s(rowno, colname$) = value$

You can also load an entire CSV file into one of these maps using input csv #ch, $map(). Depending on whether the map was declared as (int; int; varstr) or (int; varstr; varstr), the result will either be indexed by row and column numbers, or by row numbers with column names (in which case it will assume that the first row contains the column names). For the (int; varstr; varstr) version though, I'm not (yet) convinced that we need to have embedded support for Excel-style column identifiers (A-Z, AA-ZZ, AAA-ZZZ, etc.) as an alternative to names. I would say that those Excel-style column identifiers are really just numbers in Base26, so we might as well use the (int; int; varstr) variation, along with a pair of conversion functions, e.g. Fn'Dec'to'Base26() and Fn'Base26'to'Dec(). At some point it might make sense to internalize support for variations of the int key type (radix, maximum range, etc.). But to avoid getting bogged down in minutiae, I think for now we should just start with plain old decimal integer representation, probably with a range of something like +/- 10 million rows, and +/- 100K columns. (It's not likely we would use negative row or column identifiers, but that's one advantage of the ordered map over the array: the indices are internally just alphanumeric keys and so are not limited in the way that array indices are.)
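
For anyone wanting to see the conversion idea spelled out, here is a minimal sketch in Python rather than ASB. The names dec_to_base26 and base26_to_dec simply mirror the hypothetical Fn'Dec'to'Base26() / Fn'Base26'to'Dec() mentioned above; they are not existing library functions. One wrinkle: Excel's scheme has no zero digit (A=1 ... Z=26), so it is the "bijective" flavor of base 26, which is why the loop subtracts 1 before dividing.

```python
def dec_to_base26(n):
    """Convert a 1-based column number to Excel-style letters (1 -> 'A', 27 -> 'AA')."""
    if n < 1:
        raise ValueError("column numbers start at 1")
    letters = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)  # shift to 0-based so that 26 maps to 'Z'
        letters = chr(ord("A") + rem) + letters
    return letters

def base26_to_dec(s):
    """Convert Excel-style letters back to a 1-based column number ('XFD' -> 16384)."""
    n = 0
    for ch in s.strip().upper():
        if not "A" <= ch <= "Z":
            raise ValueError("invalid column identifier: " + s)
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n

for colno in (1, 26, 27, 702, 703, 16384):
    print(colno, dec_to_base26(colno))
```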

Iteration: I'm taking your advice here, thinking maybe we can get away without needing special foreach'row and foreach'col iteration types. Instead, if you want to iterate through the grid, just go through the entire thing, using the upgraded .key($$I,1) and .key($$I,2) functions to access the two key dimensions...
Code
dimx $map2i, ordmap2(int; int; varstr)
...
input csv #ch, $map2i()   ! load CSV file into the map

foreach $$i in $map2i()   ! iterate entire map, working across each row and then down
    ? "Row #"; .key($$i, 1); ", Col #"; .key($$i, 2);" -> "; $$i
next $$i

Extent: We previously discussed implementing a means of getting the number of rows and columns in the map, but I'm not sure it's really necessary if we aren't going to have special row and column iterators. I also suggested that for the rows (and also columns, if numeric), we might as well just use the traditional for row = 1 to x method. But the problem with that method is that for the ordmap, extent is not necessarily the same as the maximum row number (unlike the traditional array). It would be if we loaded the map using the input csv method, but in general, you could build this kind of map with arbitrary row (and column) index values. For example, imagine a grid where the rows are students, indexed by student #, and the columns are classes. The student #'s might be, say, 8-digit numbers, but the school has only a few thousand students. So the extent of the first dimension would be only a few thousand, but the maximum number might be 12345678. We avoid all of that confusion by just not trying to support separate by-dimension extents and iteration.
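
The student example can be modeled with a few lines of Python (a dictionary keyed by (row, column) standing in for the map; this is only an analogy, not the ASB implementation, and the student numbers and grades are invented):

```python
# Sparse grid: rows are student numbers, columns are class names
grid = {}
grid[(12345678, "Math")] = "A"
grid[(10000042, "Math")] = "B"
grid[(10000042, "Art")] = "C"

# Whole-map iteration in key order (row first, then column),
# comparable to a single foreach over the entire map
for (row, col), value in sorted(grid.items()):
    print(row, col, value)

row_extent = len({row for row, _ in grid})  # how many distinct rows exist
max_row = max(row for row, _ in grid)       # the largest row index in use

print(row_extent, max_row)
```

A for row = 1 to max_row loop over this data would visit over twelve million row numbers to find two students, which is the mismatch between extent and maximum index described above.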

So, the $64 question: is this enough functionality to make the ordmap2() useful enough to release?

Re: Two Dimensional Ordered Maps? [Re: Jack McGregor] #36004 15 Mar 23 08:29 AM
Joined: Sep 2003
Posts: 4,158
Steve - Caliq
Sure does, the CSV example is great!
Code
dimx $map2i, ordmap2(int; int; varstr)
input csv #ch, $map2i()   ! load CSV file into the map
foreach $$i in $map2i()   ! iterate entire map, working across each row and then down
    ? "Row #"; .key($$i, 1); ", Col #"; .key($$i, 2);" -> "; $$i
next $$i

Re: Two Dimensional Ordered Maps? [Re: Jack McGregor] #36005 15 Mar 23 08:35 AM
Jorge Tavares - UmZero
Definitely YES, it's probably all that we need for a long time


Jorge Tavares

UmZero - SoftwareHouse
Brasil/Portugal
Re: Two Dimensional Ordered Maps? [Re: Jack McGregor] #36006 15 Mar 23 04:53 PM
Jack McGregor (OP)
Ok then! I need to do some further cleanup, testing, etc. but should have a beta version ready by tomorrow.

In the meantime, here's your opportunity to vote on the best name for this storage type...

  • ordmap2 - does a reasonable job of reflecting the fact that it is a variation of ordered map, and that it has two dimensions. (And if we ever decide we need a multi-map variety, we'd have to decide between ordmap2m and ordmapm2.)
  • ordmap - avoids introducing a new keyword, but might introduce some ambiguity or confusion into any mention of the term without clarifying whether we are talking about the original one-dimensional or new two-dimensional variety.
  • gridmap - one less syllable than "ordmap two", and does a better job of evoking a mental image of its purpose; but might obscure its similarity to ordmap.
  • _____________ (write-in candidate)

Re: Two Dimensional Ordered Maps? [Re: Jack McGregor] #36008 15 Mar 23 05:36 PM
Joined: Jun 2001
Posts: 713
Steven Shatz
gridmap

Re: Two Dimensional Ordered Maps? [Re: Jack McGregor] #36013 15 Mar 23 06:09 PM
Jorge Tavares - UmZero
gridmap


Jorge Tavares

UmZero - SoftwareHouse
Brasil/Portugal
Re: Two Dimensional Ordered Maps? [Re: Jack McGregor] #36016 15 Mar 23 06:26 PM
Jack McGregor (OP)
gridmap is off to an early lead, but the polls are still open. And then we may have several days of counting the mail-in ballots (inevitably followed by challenges, accusations, lawsuits). So stay calm everyone.

Re: Two Dimensional Ordered Maps? [Re: Jack McGregor] #36026 16 Mar 23 09:20 PM
Jack McGregor (OP)
Ok, the polls have closed and gridmap is declared the winner. The official inauguration will have to wait for all of the court challenges to be settled, but for those anxious to meet with the new president, you can do it in the beta room ...

ash-6.5.1728.0-w32-upd.zip
ash-6.5.1728.0-w32c-upd.zip
compil-6.5.1016-w32.zip
ash-6.5.1728.0-el7-upd.tz
ash65notes.txt





Last edited by Jack McGregor; 17 Mar 23 12:45 AM. Reason: Added standalone compil, -el7 versions
Re: Two Dimensional Ordered Maps? [Re: Jack McGregor] #36035 18 Mar 23 01:28 AM
Jack McGregor (OP)
A wise man once said: "Never trust a dot zero." I'm not sure what he was talking about, but in this case, the 1728 dot zero beta release had a compiler bug related to ordmapm, so if you downloaded it, you definitely want to replace it with the 6.5.1728.1 (compiler 1018) version below:

ash-6.5.1728.1-w32-upd.zip
ash-6.5.1728.1-w32c-upd.zip
compil-6.5.1018-w32.zip
ash-6.5.1728.1-el7-upd.tz
ash65notes.txt

Re: Two Dimensional Ordered Maps? [Re: Jack McGregor] #36036 18 Mar 23 12:33 PM
Jorge Tavares - UmZero
Some hidden force pushed me into the ashnotes, and made me read everything and leave the download aside for today, the opposite of the usual


Jorge Tavares

UmZero - SoftwareHouse
Brasil/Portugal
Re: Two Dimensional Ordered Maps? [Re: Jack McGregor] #36062 26 Mar 23 10:51 PM
Jack McGregor (OP)
In case you've run out of reading material, here's a minor update and a few more sentences to read...

ash-6.5.1728.3-w32-upd.zip
ash-6.5.1728.3-w32c-upd.zip
ash-6.5.1728.3-el7-upd.tz
compil-6.5.1019-w32.zip
ash65notes.txt



Last edited by Jack McGregor; 26 Mar 23 10:57 PM.
Re: Two Dimensional Ordered Maps? [Re: Jack McGregor] #36063 27 Mar 23 01:51 PM
Jorge Tavares - UmZero
I was out for a week in paradise, alone with Viviane (important information grin), in "rescue mode" for professional subjects and no room for programming cool
Now, back to reality, it's time to put my hands on this beauty.

Thank you


Jorge Tavares

UmZero - SoftwareHouse
Brasil/Portugal
Re: Two Dimensional Ordered Maps? [Re: Jack McGregor] #36065 27 Mar 23 06:04 PM
Jorge Tavares - UmZero
Hi again,

I tried this in a program that imports customer orders from XLS files. In short, most of the work was done in a few lines of (very simple) code using this new feature; like magic, everything is working smoothly!
Now it's time to remove a lot of old-fashioned code from that function. cool

Many thanks for this.


Jorge Tavares

UmZero - SoftwareHouse
Brasil/Portugal
Re: Two Dimensional Ordered Maps? [Re: Jack McGregor] #36067 27 Mar 23 08:16 PM
Jack McGregor (OP)
Sounds like you've been working hard! laugh

On the XLS to gridmap conversion, I'm curious whether you first converted the XLS to CSV and then used INPUT CSV #CH, $GRIDMAP()? Or used the LIBXL functions to load the grid one cell at a time?

(I have it on my to-do list to create a Fn'LIBXL'Import'To'Gridmap() function to import directly from XLS or XLSX into the gridmap, along with a variation of the ASQL SQLOP_FETCH_ROW operation to fetch an entire result set in one call.)

Re: Two Dimensional Ordered Maps? [Re: Jack McGregor] #36068 27 Mar 23 08:37 PM
Jorge Tavares - UmZero
Curiously, today I'm feeling very efficient and productive, no doubt that relaxing and enjoying life is far better than fighting with time to get things done, just let it flow whistle

Now that you mention it, I hadn't realized that I converted the XLSX to CSV using my old "VB netcaller", which still handles a few things in Excel.
I'm anxious to get rid of this old module, so it's time to implement this XLSX/CSV converter; but if you intend to do it, it will surely be far better than any function I could write. Just let me know if I should wait or go ahead; it's not urgent.

As an aside, another function implemented in the VB Netcaller lists the existing pages in the Excel workbook; it would be interesting to cover this under this topic as well.

Thanks


Jorge Tavares

UmZero - SoftwareHouse
Brasil/Portugal
Re: Two Dimensional Ordered Maps? [Re: Jack McGregor] #36121 24 Apr 23 06:02 AM
Jack McGregor (OP)
Still haven't gotten to the Fn'LibXL'Import'To'Gridmap() function mentioned above, but on a related note, I did add a new ASQL opcode, SQLOP_FETCH_GRID, which loads an entire result set into a gridmap in one step. It's available for beta testing with the ODBC interface via the links below; the MySQL version will be along shortly.


ash-6.5.1730.0-w32-upd.zip
ash-6.5.1730.0-w32c-upd.zip
libashodbc-1.6.119-w32.zip
ash-6.5.1730.0-el7-upd.tz
libashodbc.so.1.6.119.el7.tz
ash65notes.txt

Re: Two Dimensional Ordered Maps? [Re: Jack McGregor] #36140 16 May 23 01:20 PM
Steve - Caliq
Easy-peasy question: is this compiler-only, or would I also need to update the runtime A-Shell at customers too?

Code
============================================================================
A-Shell Release Notes Version 6.5.1728.0 (16 March 2023)
============================================================================
1. Language enhancement -- new storage class GRIDMAP is a two-dimensional
variation of ORDMAP with a two-part key (row and column), suitable for 
handling datasets consisting of rows and columns. 

- Declaration:
    dimx $map, gridmap(int; varstr; varstr)  ! num row, str col, str value
    dimx $map, gridmap(int; int; varstr)     ! num row and col, str value


I could use this in something I'm doing now, but may hold back if I need to update the customer A-Shell version.

Re: Two Dimensional Ordered Maps? [Re: Jack McGregor] #36141 16 May 23 01:30 PM
Steve - Caliq
Actually, I'm more after the two-dimensional part of the gridmap...

so i can store: ITEM (key), Month1 value, Month2 value, Month3 value, Month4 value, Month5 value, Month6 value

dimx $items (varstr; int; float) ! Item , month , value

Last edited by Steve - Caliq; 16 May 23 01:37 PM.
Re: Two Dimensional Ordered Maps? [Re: Jack McGregor] #36142 16 May 23 02:29 PM
Jack McGregor (OP)
Sorry for lack of clarity there, but you definitely need an A-Shell update. (VERSYS on the run will give you the exact runtime edit required, which currently stands at 6.5.1728.)

Re: Two Dimensional Ordered Maps? [Re: Jack McGregor] #36143 16 May 23 02:32 PM
Steve - Caliq
Thanks, I'll hold back then, as they're running 6.5.1654.1 and it's not worth the extra overhead of updating them (yet).

Re: Two Dimensional Ordered Maps? [Re: Jack McGregor] #36144 16 May 23 02:39 PM
Jack McGregor (OP)
As for your $items gridmap, currently the only two variants are:
Code
gridmap(int; int; varstr)       ! integer row, integer column, string value
gridmap(int; varstr; varstr)    ! integer row, string column, string value

You could probably just swap your item and month in the key...
Code
dimx $items, gridmap(int; varstr; varstr)   ! month, item, value

... but the value would have to be a string.

The original motivation behind gridmap was to represent objects like spreadsheets or result sets, where the individual cell values would naturally have some kind of string representation. And the row part of the index would naturally be integer, whereas the column part of the index might be numeric or it might be a string, e.g. column names. But perhaps there might be good reason to expand it out to allow for varx values so you could store arbitrary objects in the grid.

Re: Two Dimensional Ordered Maps? [Re: Jack McGregor] #36145 16 May 23 02:41 PM
Steve - Caliq
I'll continue using my own ordmap two-dimensional feature...

DIMX $ITEM'POINTER, ordmap (varstr; varstr) ! item=array position
DIMX ITEM'ARRAY(500,6),F,6,AUTO_EXTEND ! array position ,month values
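
For readers following along, the pointer-map-plus-array pattern above can be sketched in Python like this (an analogy only, not ASB; add_value and the item codes are hypothetical):

```python
item_pointer = {}  # item code -> row position in item_array
item_array = []    # one row per item, six monthly values each

def add_value(item, month, value):
    """Accumulate a value for an item in a given month (1-6)."""
    pos = item_pointer.get(item)
    if pos is None:
        # First time we see this item: allocate the next array row for it
        pos = len(item_array)
        item_pointer[item] = pos
        item_array.append([0.0] * 6)
    item_array[pos][month - 1] += value

add_value("WIDGET", 1, 10.0)
add_value("GADGET", 3, 2.5)
add_value("WIDGET", 1, 5.0)

# Read back in item sequence by walking the pointer map in key order
for item in sorted(item_pointer):
    print(item, item_array[item_pointer[item]])
```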

Last edited by Steve - Caliq; 16 May 23 02:42 PM.
Re: Two Dimensional Ordered Maps? [Re: Jack McGregor] #36146 16 May 23 02:44 PM
Steve - Caliq
Posts crossed. OK, thanks; one for the "would be nice but not totally necessary" list please.

Re: Two Dimensional Ordered Maps? [Re: Jack McGregor] #36147 16 May 23 02:54 PM
Jack McGregor (OP)
Although there may be multiple ways to skin a cat, being a messy job, it's always nice to have just the right tool!

Re: Two Dimensional Ordered Maps? [Re: Jack McGregor] #36148 16 May 23 03:02 PM
Steve - Caliq
Half a tool is better than no tool smile

Re: Two Dimensional Ordered Maps? [Re: Jack McGregor] #36149 16 May 23 03:12 PM
Steve - Caliq
Slight diversion from topic, but is there a feature I missed that sorts an ORDMAP key sequence?

DIMX $ITEM'POINTER, ordmap (varstr; varstr) ! item=array position

But after populating it, I wanted to sort it first before I read down it in item sequence.

foreach $$i in $ITEM'POINTER()
XCALL MADEVT,2,"VALUE="+$$i+" KEY="+.key($$i)
next $$i

Re: Two Dimensional Ordered Maps? [Re: Jack McGregor] #36150 16 May 23 04:05 PM
Jack McGregor (OP)
Hmmm.... confused

Sorting the ordmap (or gridmap) key sequence would be like sorting an ISAM index. It's already sorted in the only way that makes sense for it. If you re-sorted it, it wouldn't be able to find anything in its own index. So the key here would be to format the keys so that they are in the desired sort order but can still be used randomly. Usually that's just a matter of right justifying the parts of the key. For example...
Code
map1 kx,b,4
dimx $items, ordmap(varstr; varx)
$items(kx) = some'value     ! index not in numeric order
$items(kx using "#####") = some'value   ! index now in order
...
foreach $$i in $items()    
    ! items now in numerical order

Gridmap, with its integer key(s), avoids this problem, but you can otherwise handle it yourself.
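
The effect of right-justifying is easy to demonstrate outside ASB. Here is a tiny Python sketch, assuming keys are compared as plain strings; the width of 5 matches the "#####" mask above and the numbers are just sample data.

```python
nums = [5, 40, 300, 12]

# Keys stored as unformatted strings sort character by character
plain = sorted(str(n) for n in nums)

# Right-justified to a fixed width of 5, string order matches numeric order
padded = sorted(f"{n:>5}" for n in nums)

print(plain)
print(padded)
```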

In a similar vein but with different motivation you may want to create multi-part keys, e.g...
Code
map key$,s,15    ! cust # + item #
dimx $custprices, ordmap(varstr;varx)
key$ = (custno using "#####") + str(itemno)
$custprices(key$) = special'price
...
foreach $$i in $custprices()
   ! items now in order of item# within cust#
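
The same multi-part key idea, modeled in Python (a sketch only; cust_item_key, the customer numbers, item codes and prices are all made up). The customer part is padded to a fixed width so it sorts numerically; the item part is appended as-is, so if item numbers are numeric and vary in length they would need the same fixed-width treatment to come out in numeric order.

```python
def cust_item_key(custno, itemno):
    """Build a composite key: customer # right-justified to 5, then the item #."""
    return f"{custno:>5}" + str(itemno)

prices = {}
prices[cust_item_key(100, "A1")] = 4.25
prices[cust_item_key(20, "B7")] = 9.50
prices[cust_item_key(20, "A3")] = 7.00

# Walking the keys in sorted order groups items within each customer
ordered = sorted(prices)
for key in ordered:
    print(repr(key), prices[key])
```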

Or did I completely misunderstand the question?


Moderated by  Jack McGregor, Ty Griffin 
