Hope everyone is doing well after the holiday break.
I am experiencing a DYNSTRUCT related BASIC error when running an SBX with 7.0.1765.9/64 on Debian 12 that was compiled with 7.0.1765.9 on EL7. The BASIC error is:
Quote
?Unable to bind dynstruct to defstruct at location counter &h15EF9 of SITESELECT7.SBX
When I compile the same program with 7.0.1765.9/64 on Debian 12, it will then work fine in Debian 12. It however will not run on EL7. In that case, it fails with this BASIC error (on EL7):
Quote
?Undefined dynstruct member at location counter &h15EF9 of SITESELECT12.SBX
These are the same program. I renamed the file from SITESELECT.SBX to SITESELECT7.SBX and SITESELECT12.SBX for testing.
Jack, I am going to email you the LSX files and the SBX files generated with each platform so you can look at it when you have time. I don't see any significant difference between the LSX generated with the EL7 A-Shell and Debian 12.
So far so good, but the year end to-do list still has many items to check off...
I've been able to confirm the error, and am pretty sure it's a 32/64 bit issue, since it also occurs between the Windows (32) and Debian (64) bit versions. I'm glad you spotted it now though as I was gearing up to release a couple of compiler updates to deal with the function wrapper and outputonly parameter issues.
I'm afraid you've uncovered a problem for which there is no painless fix, at least none that I can think of. The basic problem is that, due to packing differences related to 64 vs 32 bit pointers, the embedded defstructs in the 32 bit environment (both compiler and runtime) are different from the ones in the 64 bit environment. This means that if the program contains embedded defstructs, it has to be run in the same architecture in which it was compiled.
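To make the packing issue concrete, here is a minimal Python sketch (not A-Shell code) of how the same record lays out differently when a pointer field is 4 bytes versus 8 bytes. The field names (type_code, name_ptr, member_count) and the natural-alignment rule are assumptions for illustration only; the actual layout of an embedded defstruct is not shown in this thread.

```python
def layout(fields):
    """Compute field offsets and total size using natural alignment.

    fields is a list of (name, size_in_bytes). Each field is aligned to its
    own size, and the total is rounded up to the largest field size. This is
    a common C compiler convention, assumed here for illustration.
    """
    offset = 0
    offsets = {}
    max_align = 1
    for name, size in fields:
        offset = (offset + size - 1) // size * size  # round up to alignment
        offsets[name] = offset
        offset += size
        max_align = max(max_align, size)
    total = (offset + max_align - 1) // max_align * max_align
    return offsets, total


def descriptor(pointer_size):
    """Hypothetical descriptor record; only the pointer width varies."""
    return [("type_code", 2), ("name_ptr", pointer_size), ("member_count", 4)]


offsets_32, size_32 = layout(descriptor(4))
offsets_64, size_64 = layout(descriptor(8))
print(offsets_32, size_32)
print(offsets_64, size_64)
```

With these assumed fields the record is 12 bytes with a 4-byte pointer and 24 bytes with an 8-byte pointer, and every field after the pointer moves, which is the kind of mismatch that would prevent a runtime on one architecture from reading a structure written by a compiler on the other.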
With some effort (partially undertaken but now being re-evaluated), I could fix the 64 bit compilation so that the programs would be 32 bit compatible. But that would mean that any program with embedded defstructs compiled under 64 bit would no longer be compatible with earlier 64 bit run-times.
I could also set the minimum runtime version in the RUN header for any program containing embedded defstructs to be 1767, so that you would immediately know about the problem, but it would still mean that you would have to recompile all affected programs and update any affected run-time versions at the same time. We could further fine-tune that by allowing you to override the minimum runtime version, so in your case, for example, you might be able to get away with compiling under Debian 12 with A-Shell 7.0.1767 and running under older CentOS 7 versions, but it would help those with multiple 64 bit run-times.
The one bright side is that embedded defstructs are relatively new, rare and exotic. Combined with the slow migration to 64 bit, it may be that hardly anyone is affected. But I think this requires a bit more contemplation before taking action. As always, suggestions welcome!
The coding fix would be slightly simpler if it was compiled in 64 bit compatibility mode. (Adding filler to the 32 bit structure to make it line up with the 64 bit version is simpler than recompiling the 64 bit version to squeeze into the 32 bit structure.) The downside is that it breaks backward compatibility with the 32 bit version, which seems more likely to affect more installations than if we break backward compatibility with the 64 bit version.
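As a rough illustration of the "filler" idea in the parenthetical above, the following Python sketch uses the standard struct module to show that a 32 bit record with explicit pad bytes can be made byte-for-byte identical to a 64 bit record on a little-endian platform. The record layout (a 2-byte code, a pointer, a 4-byte count) is hypothetical and is not the actual embedded defstruct format.

```python
import struct

# Hypothetical 64 bit record: 2-byte code, 6 pad bytes, 8-byte pointer,
# 4-byte count, 4 trailing pad bytes. "<" means little-endian, no implicit
# padding, so every pad byte is spelled out with "x".
LAYOUT_64 = struct.Struct("<H6xQI4x")

# Hypothetical 32 bit record with filler added so that every field sits at
# the same offset as in LAYOUT_64 (the 4-byte pointer is followed by 4
# filler bytes).
LAYOUT_32_PADDED = struct.Struct("<H6xI4xI4x")

# The same 32 bit record without filler, for comparison.
LAYOUT_32_PACKED = struct.Struct("<H2xII")


def same_bytes(type_code, pointer, count):
    """Return True if the padded 32 bit and 64 bit encodings are identical."""
    padded = LAYOUT_32_PADDED.pack(type_code, pointer, count)
    wide = LAYOUT_64.pack(type_code, pointer, count)
    return padded == wide


print(LAYOUT_32_PACKED.size, LAYOUT_32_PADDED.size, LAYOUT_64.size)
print(same_bytes(1, 0x1000, 3))
```

The equality only holds because the byte order is little-endian and the pointer value fits in 32 bits; the zero filler after the 4-byte pointer then coincides with the high half of the 8-byte one.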
Either way, the number of affected installations is probably rather small, as I don't think embedded defstructs have been widely used. (I'm not even sure that there are any such 64 bit installations, and probably won't be able to determine that until everyone comes back to work next year.)
Given that the problem has been latent for at least a year and you were the only one to notice, depending on how much of a problem it would be for you to recompile and update all affected programs/installations, maybe that should be the determining factor.
As mentioned previously, in addition to deciding which kind of compatibility break is least painful, we also have the question of whether to set the minimum run version to 1767 for any affected program on recompilation. The upside is that it makes the issue impossible to overlook. The downside is that it breaks backward compatibility for everyone (as opposed to only the 32 bit or 64 bit subset of installations). The latest compiler will allow you to manually control that via a switch, but that's probably not going to be very practical for anyone except maybe in very specific circumstances.
Is there a 64-bit version of Windows A-Shell? I didn't find one in the downloads. We could migrate the K&K salesman laptops to 64-bit if needed. We don't want to be alpha/beta testers for these 60+ laptops if 64-bit is not already working in production Windows A-Shell.
Would it be a problem for the time being to make compilation 32-bit by default? Maybe setting the minimum run version to 1767 would be enough to solve this problem? I'm just thinking out loud here.
We are currently in the process of migrating from EL7 32-bit A-Shell to Debian 12 64-bit. We have 2 development VMs for the different operating systems to ensure we can test in both environments. It's pretty easy to make the mistake of compiling in the 64-bit environment and pushing programs to a 32-bit A-Shell install.
Sorry, no. There are a couple of DLLs which I'm not sure I can get 64 bit updates for, nor is there any apparent threat to terminate 32 bit apps under Windows, so there are no immediate plans to migrate the A-Shell/Windows.
But I'm happy to report that after checking around, I don't think anyone is going to have a problem with the previously described solution which will maintain 32 bit compatibility even if compiled under 64 bit. I think I have the coding done, but between that change and some tinkering with the parameter passing mechanism to fully implement the .ARG_PASSED(@arg) function (discussed in this thread), I think I need a full day of testing here before I let this out of the lab.
There's just one decision yet to make, and that is whether to automatically set the minimum runtime version (embedded in the RUN file) to 1767 for any newly compiled program with embedded defstructs, and allow you to override it via the new _MIN_RUN_VER symbol (i.e. compil prog/c:_MIN_RUN_VER=1600), or default to no change in the minimum version and let those who might be straddling the before-and-after versions in the 64 bit environment manually set the minimum version. I would be inclined toward the safer (former) option, but that largely undermines the convenience of maintaining backward 32 bit compatibility by forcing you to implement the new compiler switch in your compiler wrapper. Since I'm not aware of anyone on the other side of this problem, probably the latter approach is going to be the least disruptive for the most people.
If that's ok with you, it's ok with me. But it does mean that you'll have to either update all those laptops to 7.0.1767 if you plan to distribute any newly compiled programs (with embedded defstructs) to them, or use the new _MIN_RUN_VER mechanism to override the minimum run version.
Would it make sense to release the 32-bit compiler changes before the arg_passed() change? The arg_passed() change is going to require a fair amount of code fixes before we are able to compile.
Hi Jack, I did not mean to abandon the conversation here, but can see it looks like Stephen and you have come up with a reasonable solution. I agree that setting the minimum runtime version seems to be a good idea. Does this mean that one would be free to compile with either the new 32 or 64 bit versions of A-Shell and the compiled program would run on either one, or would we need to compile with only the 64 bit version going forward? Also, would there be any potential problems if the ARM version that you had released for Raspberry Pi a few years back was brought back?
Stephen - I think it's probably not practical at this point to release them separately. The change to make OUTPUTONLY ignore any parameter passed in was in compiler edit 1047, A-Shell 7.0.1765.9, and I'm about to release 7.0.1767.0, compiler edit 1053. But I guess what I could do is add a ++PRAGMA IGNORE_OUTPUTONLY that would effectively ignore the :OUTPUTONLY qualifier, which I think would give you back the behavior you were counting on. Perhaps you could insert that into your version of ashell.def as a temporary measure, or just into each program that you recompile prior to updating the code to use .ARG_PASSED()?
John - To be clear, once you update to 7.0.1767.0, any programs compiled under either 32 or 64 bit platforms would have the same hash, and thus work the same in either environment. But, because of the new minimum runtime version, any programs recompiled with embedded defstructs would require the 1767+ runtime version to run. (Even though technically, since we are sticking with the 32 bit embedded defstruct format, you could get away with overriding the minimum runtime version and running those programs on older 32 bit A-Shell.)
As for the Raspberry Pi, I think it needs to be updated in any case, since the last release of it was 6.4.1548, before embedded defstructs were introduced. I'll put it on the to-do list to try to bring it up to 7.0, after which, in theory, it should be compatible with the other platforms/architectures.
OK, thanks for the info Jack. Sounds good. Just to be clear, I'm certainly not requesting a new build for the Raspberry Pi. I did want to bring it up as I wasn't sure if the ARM architecture would have a bearing on the decision here.
Ahh - got it. I don't think ARM vs. x86 should matter. It's mainly the 32 vs 64 bit issue. The only Raspberry Pi release of A-Shell (so far) was 32 bit, but the newer Raspberry Pi's are 64 bit. Byte order might be an issue. But since we stopped producing the AIX version, all of the A-Shell platforms are little-endian. So even though the goal remains to be byte-order-independent, it's possible that a dependency has crept in somewhere without notice. Still, if we ever produce a new big-endian version, it would have to be made compatible with any existing byte-order-dependent constructs.
After further discussion with Stephen and reflection on the pros and cons, I've decided to make one further adjustment to the compiler (now edit 1055) which essentially backs out edit 1049 so as to allow any default value with OUTPUTONLY parameters. (Edit 1049 generated an error if the default value for an OUTPUTONLY parameter was anything other than 0 or "".) The only reason for that edit was to flag code that was depending on the misguided behavior introduced in edit 830 and then backed out in edit 1047, during which time the presence of a default value was effectively causing the OUTPUTONLY qualifier to be ignored. (That misguided behavior was motivated by a desire to use an unusual default value to help determine whether the parameter was actually specified, based on whether the unusual default value got overridden by the caller. That may have been a workaround for the lack of any other way to determine if a parameter was actually passed, but it's counter-intuitive, and was known to be triggering bugs when functions were called multiple times, with the initial value of such a parameter changing based on the caller even though coded as OUTPUTONLY.)
At this point, I think we are reasonably convinced that the universe of programs counting on the misguided behavior is small enough that it makes more sense to just clean it up to use the new .ARG_PASSED() function instead. And in order to find that code, you can use compiler 1054, which is available in standalone form below, as well as embedded in the earlier version of the 7.0.1767.0 executable posted above. (Those links have been updated to point to files with -1054 suffixes, and new versions of 7.0.1767.0 with compiler 1055 are below.)
Note that the version remains 7.0.1767.0; the only obvious difference between these and the ones posted a couple of days ago will be the release date, file date, and compil version (displayed if you execute COMPIL with no arguments).
I was about to casually respond that you had nothing to worry about, but then decided maybe I'd better do another test just to make sure. So I expanded your example a bit to...
And was promptly greeted with the following barrage of compiler errors...
Quote
?Illegal nesting (missing end to FN'APPLY'CHANGES'TO'SQL?) (37) - FUNCTION FN'APPLY'CHANGES'TO'SQL(F'APPLY'COUNT AS F6:OUTPUTONLY, F'ERRORS AS F6:OUTPUTONLY, F'SKIPPED AS F6:OUTPUTONLY)
?Unmapped variable: (39) - ? tab(5);"FN'APPLY'CHANGES'TO'SQL";"(";F'APPLY'COUNT;",";F'ERRORS;",";F'SKIPPED;")" << F'APPLY'COUNT >>
?Unmapped variable: (42) - F'ERRORS += 2 << F'ERRORS >>
?Unmapped variable: (43) - F'SKIPPED += 3 << F'SKIPPED >>
Phase 2 - Adjust object file and process errors
Undefined function or procedure - FN'APPLY'CHANGES'TO'SQL
Undefined function or procedure - FN'APPLY'CHANGES'TO'SQL
Undefined function or procedure - FN'APPLY'CHANGES'TO'SQL
Undefined function or procedure - FN'APPLY'CHANGES'TO'SQL
After starting down a rabbit hole trying to figure out what I'd broken, I decided to actually read the message ("missing end to FN'..."), upon which I realized you'd tricked me by replacing ENDFUNCTION with EXITFUNCTION! Once that was fixed, it all acts as it should...
Quote
.RUN TSTOUTONLY
First call, 0 , 0 , 0 ...
FN'APPLY'CHANGES'TO'SQL( 0 , 0 , 0 )
(Adding 1,2, and 3 respectively...)
Returned values: 1 , 2 , 3 ...
Second call, 1 , 2 , 3 ...
FN'APPLY'CHANGES'TO'SQL( 0 , 0 , 0 )
(Adding 1,2, and 3 respectively...)
Returned values: 1 , 2 , 3 ...
Third call (named params), F'APPLY'COUNT= 1 ,F'ERRORS= 2
FN'APPLY'CHANGES'TO'SQL( 0 , 0 , 0 )
(Adding 1,2, and 3 respectively...)
Returned values: 1 , 2 , 3 ...
1) The issue here only affected OUTPUTONLY parameters with default values, which your function didn't have. In earlier versions of the compiler, adding the default value was effectively causing the OUTPUTONLY to be ignored. That allowed clever programmers to detect whether a parameter had actually been passed by setting the default value to something that would never be passed. But the downside was that default values could be overridden by the caller, which led to functions acting differently on subsequent calls due to the default values not being respected. So the reason for the multiple calls in the example above was to verify that the function was essentially idempotent.
2) To illustrate the problem, I inserted a default value for the last two parameters...
Code
FUNCTION FN'APPLY'CHANGES'TO'SQL(F'APPLY'COUNT AS F6:OUTPUTONLY,&
F'ERRORS=0 AS F6:OUTPUTONLY,&
F'SKIPPED=0 AS F6:OUTPUTONLY)
I think everyone would agree that adding a default value of 0 shouldn't change the behavior, since that's the default anyway. But prior to this most recent series of compiler updates, it did. Here's what the program acted like in 1756.12 for example...
Quote
.run tstoutonly
First call, 0 , 0 , 0 ...
FN'APPLY'CHANGES'TO'SQL( 0 , 0 , 0 )
(Adding 1,2, and 3 respectively...)
Returned values: 1 , 2 , 3 ...
Second call, 1 , 2 , 3 ...
FN'APPLY'CHANGES'TO'SQL( 0 , 2 , 3 )
(Adding 1,2, and 3 respectively...)
Returned values: 1 , 4 , 6 ...
Third call (named params), F'APPLY'COUNT= 1 ,F'ERRORS= 4
FN'APPLY'CHANGES'TO'SQL( 0 , 4 , 0 )
(Adding 1,2, and 3 respectively...)
Returned values: 1 , 6 , 6 ...
Note that the first parameter was unaffected because it didn't have a default value. But the addition of the default value for the other two caused them to accept the values passed to them by the caller. So any such function was potentially at risk of misbehaving on subsequent calls, unless the function explicitly initialized the parameters within the function body.
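For anyone who wants to reason about the two behaviors without an A-Shell install handy, here is a small Python model of them. It is only a simulation of the semantics described in this thread, not compiler code; the function and variable names are made up for the sketch, and the "legacy" flag stands for the pre-1047 behavior in which a default value caused OUTPUTONLY to be ignored.

```python
PARAMS = ("apply_count", "errors", "skipped")
INCREMENTS = {"apply_count": 1, "errors": 2, "skipped": 3}
WITH_DEFAULT = {"errors", "skipped"}  # parameters declared with "=0"


def call_outputonly(caller_vars, passed, legacy):
    """Simulate one call; update caller_vars for the parameters passed.

    Returns the initial values the function body would see.
    """
    seen = {}
    for name in PARAMS:
        if legacy and name in WITH_DEFAULT and name in passed:
            # Legacy quirk: the caller's value leaks into the function.
            seen[name] = caller_vars[name]
        else:
            # OUTPUTONLY respected: the parameter always starts at 0.
            seen[name] = 0
    for name in passed:
        # Only parameters actually passed are written back to the caller.
        caller_vars[name] = seen[name] + INCREMENTS[name]
    return seen


def run(legacy):
    """Replay the three calls from the test program; return caller values."""
    caller_vars = dict.fromkeys(PARAMS, 0)
    history = []
    for passed in (PARAMS, PARAMS, ("apply_count", "errors")):
        call_outputonly(caller_vars, passed, legacy)
        history.append(tuple(caller_vars[name] for name in PARAMS))
    return history


print(run(legacy=False))  # [(1, 2, 3), (1, 2, 3), (1, 2, 3)]
print(run(legacy=True))   # [(1, 2, 3), (1, 4, 6), (1, 6, 6)]
```

The two result lists match the "Returned values" lines in the two runs quoted above: the current behavior is the same on every call, while the legacy behavior drifts as the caller's previous results feed back in.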
Stephen - are there different versions/databases for Windows Defender? I don't get any complaints downloading it on either of my W10 or W11 machines. I'm not sure this will be any different, but I've uploaded it as a bzip2 file in case the compression somehow is responsible ...
The zip file doesn't get quarantined anymore. May have been due to me restoring from quarantine.
I'm seeing a bind error.
Code
Unable to bind dynstruct to defstruct in line 805 (at location counter &h1ACA15) of OE01.RUN
Code
This is Debian 12
.ver
-- A-Shell 7.0.1767.0/64 Up and Running --
.compil/v
A-Shell Compiler Version 7.0(1055)
Syntax: COMPIL <file>{/switches} (/? for help)
.
I just compiled this program with this version to test. I'm not sure you want me to send the OE01.LSX. That's the largest program on our system.
I have an older copy of that program here, and it does generate the same RUN hash on both Debian 12 and Windows, but I can't actually run it so can't easily test this error.
Does the error happen when compiling/running in -el7?
Do the VERSYS stats change when compiling with the new version compared to the old?
I guess you should send me the LSX. I can adjust it to jump directly to the offending routine, which is probably the fastest way to get to the bottom of it.
The plot thickens -- although the LSX appears to compile, the output RUN is only 1 block long. That problem appears to go way back though - at least a year. That suggests that something about the LSX format has gotten messed with, since older LSX files do compile. Unless you have a simple example of embedded defstructs that fails, I may need to go off on this detour first to figure out what is going on with the LSX.
It looks like there's an unrelated issue in the LSX generator that has appeared in a recent version. The symptom is a function that has been shaken out but somehow still appears in the LSX without an endfunction, like this example...
Code
000357 FUNCTION fn'ashell_test'if'ate'at'least'given'version(test'ate'ver$ as s14) as f6
000357 !{shaken out by /px}
000357 ++IF CONF_COOP_SYSTEM <> 1 AND CONF_CHEMSTATION_SYSTEM <> 1
000357 ++ENDIF
000357 ++ENDIF
If you have an older version and can re-generate your OE01.LSX, I can maybe address both issues in one update. Otherwise I'll first fix the LSX issue and then we can return to the structure binding issue.
Never mind, here's an update that fixes the LSX generation bug. (It was a side effect of edit 1050, which involved eliminating error messages related to function bodies when the function header had an error.)
Use one of those to regenerate the OE01.LSX and send it to me again. Once I get the defstruct issue resolved, I'll post a more complete set of updates.
Ok, I can now compile the LSX and get the same hash on both Windows and Debian 12. The location 1aca15 where the error was reported is actually just the point where the error trap was triggered, which in this case is a call to an outer function...
Since untrapped errors within functions percolate up the stack to the caller, the actual error is somewhere nested below that. Its exact location can be seen in the ashlog.log file. I'll need those details to see if I can modify the program to go there directly, bypassing all of the file dependencies which are otherwise getting in the way of running it here.
08-Jan-25 13:36:01 [p263434-1]<OEMENU:35157> Exec(CHAIN): /var/data/kkvet/ashell/vm/miame/dsk40/160001/oe01.run
08-Jan-25 13:36:01 [p263434-1]<OE01:1b8198> Exec(AMOS): DO TSKAAA.CMD
08-Jan-25 13:36:01 [p267550-0]<:0> ----------------
08-Jan-25 13:36:01 [p267550-0]<:0> A-Shell 7.0.1767.1/64 launched on pts/1:267550 by ashvm_kkvet
08-Jan-25 13:36:01 [p267550-2]<:0> In: Nodes=1/2/85 [L], ip=192.168.17.1 0:0:0:0:0:0, (ashvm_kkvet) inodes: si=0,sm=1,rsv=0(0,0,0), j=3e804b2,q=3e80051, rc=0
08-Jan-25 13:36:01 [p267550-2]<:0> Exec(CMDLIN): DO TSKAAA.CMD
08-Jan-25 13:36:01 [p267550-2]<DO:135a> Exec(CHAIN): /var/data/kkvet/ashell/vm/miame/dsk0/001004/flit.lit
08-Jan-25 13:36:01 [p267550-2]<DO:135a> Exec(CMDLIN): DSK0:FLIT.LIT[1,4]
08-Jan-25 13:36:01 [p267550-2]<FLIT:0> Exec(CMDLIN): ISMBLD TSKAAA
08-Jan-25 13:36:01 [p267550-2]<ISMBLD:217b> Out: Nodes Remaining = 1P/1L, 15 reads, 31098 writes, 0 kbd bytes
08-Jan-25 13:36:01 [p267550-2]<ISMBLD:217b> Final exit
08-Jan-25 13:36:29 [p263434-1]<OE01:aa6b7> *** OUT OF MEMORY: UNABLE TO ALLOCATE -306195712 BYTES!!! ***
08-Jan-25 13:36:29 [p263434-1]<OE01:aa6b7> Trapped Basic Error #73 (Unable to bind dynstruct to defstruct) at location counter &hAA6B7, last proc/func: fn'order_trx_set_unit'cost
08-Jan-25 13:36:29 [p263434-1]<OE01:aa6b7> Call stack trace, from program OE01 :
From line #3041, loc 18e39c, Gosub @18e73d
From line #3041, loc 18fa13, Gosub @19f032
From line #3041, loc 19f077, Gosub @19f0dc
From line #3041, loc 19f0e4, Gosub @1abe35
From line #805, loc 1ac183, Gosub @1ac91f
From line #805, loc 1aca15, Call Proc() @147b27
From loc 147bb1, Call Proc() @14824e
From loc 148295, Func() @aa686
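If you need to dig the real error location out of a long ashlog.log, something like the following Python sketch can pull the "Trapped Basic Error" lines out of an excerpt like the one above. The regular expression is inferred from the two log lines shown in this thread only, so treat the line format as an assumption; other A-Shell versions or configurations may log differently.

```python
import re

# Pattern inferred from the ashlog.log excerpt in this thread (an assumption,
# not a documented format).
TRAP_RE = re.compile(
    r"<(?P<program>[^:>]*):(?P<where>[0-9a-fA-F]+)>\s+"
    r"Trapped Basic Error #(?P<number>\d+) \((?P<message>[^)]*)\) "
    r"at location counter &h(?P<location>[0-9A-Fa-f]+), "
    r"last proc/func: (?P<routine>\S+)"
)

SAMPLE = (
    "08-Jan-25 13:36:29 [p263434-1]<OE01:aa6b7> *** OUT OF MEMORY: "
    "UNABLE TO ALLOCATE -306195712 BYTES!!! ***\n"
    "08-Jan-25 13:36:29 [p263434-1]<OE01:aa6b7> Trapped Basic Error #73 "
    "(Unable to bind dynstruct to defstruct) at location counter &hAA6B7, "
    "last proc/func: fn'order_trx_set_unit'cost\n"
)


def find_trapped_errors(log_text):
    """Return one dict per trapped-error line found in log_text."""
    results = []
    for match in TRAP_RE.finditer(log_text):
        results.append({
            "program": match.group("program"),
            "number": int(match.group("number")),
            "message": match.group("message"),
            "location": int(match.group("location"), 16),
            "routine": match.group("routine"),
        })
    return results


for error in find_trapped_errors(SAMPLE):
    print(error["program"], hex(error["location"]), error["routine"])
```

On the sample this reports program OE01, location 0xaa6b7 and routine fn'order_trx_set_unit'cost, which is the innermost frame (Func() @aa686) rather than the outer trap point at 1aca15.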
Just a status update - I've been able to reproduce the problem by inserting a call to the function where the automatic binding is taking place. So it seems I have everything I need from you. There's a lot going on here today though, plus it's time for lunch, so I may not get it resolved before later today/tonight.
Another status update - I think I have it nailed down, but need to do more testing. (Obviously there wasn't enough the first time around!) Should be able to release some time Thursday...
WARNING: I just discovered a related problem. The compiler is now setting the minimum run version to 1767 regardless of whether there are embedded defstructs. That may not be a problem for you if you're planning to update all your production users at once. But it will likely not be appreciated by those not using embedded defstructs. Furthermore, I've already run into a site where they are using them, but since they're still in the 32 bit world, bumping up the minimum version when compiling one of the affected programs is making it difficult to roll out the new version incrementally.
So I think I need to: A) fix the minimum run version to only be set when using embedded defstructs (that was always the plan anyway). And B) add some kind of switch allowing that to be disabled. We have the /C:_MIN_RUN_VER capability, but that overrides the value to a specific version number, when what I think some people are going to want is to just disable the automatic increase to 1767. Since it's a temporary issue, maybe a regular switch is overkill; probably another special symbol. Maybe _NO_MIN_1767?
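To spell out the intended policy from A) and B), here is a Python sketch of the decision as I read it. It is a model of the proposal only, not the compiler's code: the function name, the base_min argument (whatever minimum the compiler would otherwise have chosen), and the precedence of _MIN_RUN_VER over _NO_MIN_1767 are all assumptions made for the sketch.

```python
EMBEDDED_DEFSTRUCT_MIN = 1767  # runtime build discussed in this thread


def minimum_run_version(base_min, has_embedded_defstructs, symbols):
    """Pick the minimum runtime version to stamp into the RUN header.

    symbols maps compile-time symbol names to values, for example
    {"_MIN_RUN_VER": 1600} or {"_NO_MIN_1767": 1}.
    """
    # An explicit override wins outright (assumed precedence).
    if "_MIN_RUN_VER" in symbols:
        return symbols["_MIN_RUN_VER"]
    # Only programs with embedded defstructs get bumped, unless disabled.
    if has_embedded_defstructs and not symbols.get("_NO_MIN_1767"):
        return max(base_min, EMBEDDED_DEFSTRUCT_MIN)
    return base_min


print(minimum_run_version(1600, False, {}))                      # 1600
print(minimum_run_version(1600, True, {}))                       # 1767
print(minimum_run_version(1600, True, {"_NO_MIN_1767": 1}))      # 1600
print(minimum_run_version(1500, True, {"_MIN_RUN_VER": 1600}))   # 1600
```

The point of the separate _NO_MIN_1767 case is that it leaves the otherwise-computed minimum alone instead of forcing a specific number, which is what sites rolling out incrementally on 32 bit appear to want.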
I just found a problem with the 1768.0 (actually it started with compiler 1051 in 1767.0, related to the new .ARG_PASSED() function) -- declaring a function parameter default value to be an actual variable results in a runtime illegal syntax error. That's a pretty rare case (I can only find one example in my code), but it reinforces Steve Evan's motto ("never trust a point zero"). I have a fix ready but am still evaluating it. Stay tuned for a 1768.1 update later tonight.
That problem should have been fixed by compiler edit 1057. If you execute COMPIL by itself at the dot prompt, it will display its edit number. The 1768.1 update should have contained compiler 1058.
Oops. That's a bug. To be fixed shortly. In the meantime you can work around it by defining _NO_MIN_1767=1, either in the source or on the command line.