I considered that but between the difficulties already encountered in getting this right, and the fact that the parameter passing rules for embedded subroutines are a lot more flexible than for functions/procedures, I wasn't sure it was a good idea to make the rules too tight and risk triggering a bunch of compiler complaints in programs that were formerly considered good.
One problem is that the compiler cannot force you to use the array notation in the DEFXCALL because it has no idea about the true nature of the parameters accepted by the XCALL. And especially in the case of embedded subroutines, as a result of evolution in parameter passing, we now have a variety of overlapping schemes for passing parameters that could be arrays or scalars. XTREE, for example, accepts either a scalar or an array element for the answer parameter, but the array element is actually treated as equivalent to an array passed by reference.
Most of those variations could probably be handled by the DEFXCALL |alias mechanism, but that needs a bit more review to be sure.
Assuming that the DEFXCALL syntax options can unambiguously cover all the scenarios, then I suppose it would make sense for the compiler to complain if you attempt to pass an array to a non-array-named parameter, or vice versa, as in either of these cases...
FYI - I discovered another problem which goes back a couple of months to the introduction of the embedded DEFSTRUCTs. The problem is that any DEFTYPEs used in place of the type,size specification of a structure member are resolved before the structure definition gets embedded, so the DEFTYPEs no longer show up in the DYNOP_INFO operation.
That's apparently not a very commonly used technique, but it does come in handy in conjunction with DYNFUNC to generically associate formatting or other logic with member field types.
It's expected to be fixed in 6.5.1715.9 later today...
If possible, please send an ashlog and corresponding LSX file(s) so I can try to identify where the issue lies.
On a related aside, the latest compiler edit (986) fixes a problem related to unmapped variables, but it has been around for a long time, and I don't think your programs use unmapped variables, so I don't think it's related to what you are reporting here.
That would appear to be within the instantiation of the ordmap, even before the first element assignment.
I tried reproducing that in a very simple test but without success (or without failure). But before digging further, let me just confirm that the problem also happens when you compile under 1715.9 (which should be compiler 985, as opposed to compiler 983 that your LSX files were compiled with).
Also, do you have any ashlog/location references for the SEGV and/or for whatever error the OE17B program is getting?
I'm still a little confused as to the OE17B program. Yes it is about 2 MB larger than the OE01, but it's a totally different program. And according to the LSX headers, both were compiled with compiler edit 983, which corresponds to 6.5.1715.7.
Going back to the error in OE01 at 10387, at least in the LSX generated by the 1715.7 (983) compiler, it occurs here...
The error doesn't make much sense, given that the procedure expects a string and is being passed one. Possibly there is a connection with the initialization of the PRIVATE section (which would take place at the start of the filedebug_set_trc() procedure, assuming this was the first call to a routine within that filedebug.bsi), but in that case I would have expected the error location to change. But if the 1715.7 (983) and 1715.9 (985) compilations are different, then perhaps the error location 10387 isn't on the CALL but within the procedure?
I think the OE01.LSX from the 1715.9 (985) compiler might be more helpful here.
I'm going to veer off on a tangent to debug some obscure issue with compiling these LSX files. Unfortunately they're rather extreme examples, making them difficult to work with, so that may lead to another detour finding or creating better examples to work with.
In the case of the OE01, the one thing that jumps out between the two versions is that there are three seemingly identical calls like this:
But in the 1715.9 version, one of them ends up occupying 3 extra bytes, which is almost certainly a sign of some kind of breakdown.
In the OE17 case, the SEGV appears to occur on a MAP initialization, almost at the very start of the program. For that one it might be useful to see the two RUN files, since the difference, and cause for the SEGV, should show up before the program requires any external files, meaning I could probably reproduce it here.
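For anyone wanting to check two RUN files before sending them, here is a minimal sketch (in Python, not A-Shell) that reports the offset of the first byte at which two files differ. It assumes nothing about the RUN file layout; the function name `first_difference` is just for illustration. A difference at a very low offset would be consistent with a problem near the start of the file.

```python
def first_difference(path_a, path_b, chunk_size=65536):
    """Return the offset of the first differing byte, or None if identical.

    If one file is a strict prefix of the other, the length of the
    shorter file is returned.
    """
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if a != b:
                # Locate the first mismatching byte within this chunk.
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return offset + i
                # No mismatch in the overlap: one chunk is shorter.
                return offset + min(len(a), len(b))
            if not a:
                # Both files ended at the same point with no difference.
                return None
            offset += len(a)
```

Calling `first_difference("good.run", "bad.run")` (hypothetical file names) gives a starting point for a hex dump comparison.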
I'm stuck on a detail which is interfering with my ability to compile your LSX files, but I haven't been able to identify the underlying cause or reproduce it. The proximate issue is the use of the ++INCLUDE'ONCE'IF'EXISTS statement (which seems to occur just once in all of the LSX files you've sent recently). It ends up like this in the LSX file ...
(The special <<<<<<<< and >>>>>>>> lines marking the beginning and end of included modules are critical to the LSX compilation. Although it doesn't actually need to include external modules, it does need to act as if it were including them so that it can handle module-dependent features like private variables.)
I'm curious if you can search through your LSX files to see if all of them have this problem, or whether it is something new and/or limited to particularly large programs. I've tried compiling some monster programs here, but they always come out with the right format.
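One way to do that search outside of A-Shell is sketched below in Python. It assumes the LSX files are plain text with a .lsx extension (any case) in a single directory, and it just lists each line containing the directive along with a couple of lines of context so the surrounding module markers can be compared by eye. The function name and parameters are made up for the example.

```python
from pathlib import Path

def find_directive(directory, directive="++INCLUDE'ONCE'IF'EXISTS", context=2):
    """List occurrences of a directive in the LSX files of one directory.

    Returns a list of (file name, 1-based line number, context lines).
    Assumes LSX files are text; latin-1 is used so no byte fails to decode.
    """
    hits = []
    needle = directive.lower()
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file() or path.suffix.lower() != ".lsx":
            continue
        lines = path.read_text(encoding="latin-1").splitlines()
        for i, line in enumerate(lines):
            if needle in line.lower():
                start = max(0, i - context)
                end = min(len(lines), i + context + 1)
                hits.append((path.name, i + 1, lines[start:end]))
    return hits
```

Printing the returned context lines for each hit should make it fairly obvious whether the format around the directive is the same in every file.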
This may be a red herring and/or a symptom of a deeper problem, but it is making it difficult to work with the LSX files you send. Although I could easily fix the format, it is also throwing off the file # and nesting level, which are critical in keeping private variables that have the same names but are in different files separate, leading to bogus duplicate variable errors. And fixing those would be a herculean task since your sample programs have literally thousands of ++includes nested up to a dozen levels deep.
Another possible option, since it appears you only use that directive in one place (in ddsas:directory.bsi[160,5]), and since the file probably always exists for you, would be to just change it to a regular ++INCLUDE and see if that bypasses the issue.
I should add that the OE17B.RUN compiled under 1715.9 definitely has a corrupted header, which explains why it bombs out immediately. That may or may not be enough of a clue to track down the issue, but it may be difficult without being able to recompile the LSX for testing.
In any case, the window for working on it this weekend is about closed; Mother's Day and visiting relatives beckon. Hopefully next week I can get this cleared up.
Sorry this is dragging on so long, but if nothing else it's resulting in some refinements to the LSX files. The issue getting in the way of being able to recompile your LSX file is that although the file in theory has everything needed to recompile, it failed to unambiguously deal with conditional compilation directives that were dependent on the environment. ++INCLUDE'IF'EXISTS was one example, but there are others, such as ++IF LOOKUP(...). Even conditionals that are explicitly dependent on just the prior source code (like ++IFDEF) could be indirectly dependent on prior conditional includes. So I've made a few refinements to the LSX format in this latest update, 6.5.1715.11 (compiler 987), to hopefully make it possible to reproduce a prior compilation in a different environment. So when convenient, please generate another LSX of your OE17 using this latest version...
With that maybe I'll be able to at least recompile the program and then focus on what the code generation issue is. (I've tried this with some other large LSX files but so far haven't been able to reproduce the kinds of issues you're seeing.)
As an aside, I'm somewhat surprised you don't archive the LSX files corresponding to those in production. It would seem that the ability to track down unexpected runtime errors based on the location reported in the ashlog would be worth the extra storage requirement.
Thanks - there's definitely some improvement here, as I only have a couple of spurious errors to deal with. But I forgot to ask for your RUN file, which will still be useful (unless it hasn't changed from the .9 version).
Status report: The good news: your LSX now compiles under A-Shell/Windows 6.5.1715.12 (compiler 988) and the RUN looks ok (as far as I can tell). The bad news: compiling under Linux (both CentOS 7/32 and Debian 11/64) generates the same bad RUN file that you have.
That suggests some difference in the Windows vs Linux C compilers or standard libraries, but I've only reproduced it so far in a couple of your very large LSX files. (We started to see this kind of issue a few patches back, with the workaround being to roll back the C compiler optimization levels. But now it happens even in a debug compilation. Which suggests... some kind of edge condition perhaps.)
I'm going to have to ponder the situation before coming up with an approach to narrowing it down. (Unfortunately, comparing against prior compiler versions isn't really feasible without the latest LSX handling adjustments.) But I might reapply the latest patches to the 983 version and work forward from there. It's going to take some time.
Correction: the compilation looks good under Debian 11/64 (Ubuntu 20).
But it's bad under CentOS 7 and 8 (in the same way). One other potential clue: it doesn't seem to matter under CentOS 7 or Debian/Ubuntu whether the C compilation uses full optimization or not; the RUN hash is the same. But under CentOS 8, it does (although both outputs are bad).
That all suggests some issue with the gcc compiler. CentOS 7 is using 4.8.5; CentOS 8 is using 8.2.1, and Debian/Ubuntu 20 is using 9.4.0.
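For comparing the RUN output of several builds in the same way, a rough Python sketch follows. It uses SHA-256 purely as an assumption; the hash referred to above may be a different one, and any hash works for an equal/not-equal check. The function names are illustrative only.

```python
import hashlib

def file_sha256(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def group_by_hash(paths):
    """Group file paths by digest; files in one group are byte-identical
    (barring a hash collision)."""
    groups = {}
    for p in paths:
        groups.setdefault(file_sha256(p), []).append(p)
    return groups
```

Feeding it the same program compiled on each platform and optimization level shows at a glance which builds produced identical output.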
Note that as part of the detour through the woods looking for clues I ended up adding a number of details to the VERSYS.LIT utility. I'm not sure the extra details are necessarily that interesting in general, but in this case they provided an immediate indication that the RUN file was corrupted, and some clues about exactly in what way. (Try it on your bad vs good RUN file.) (Use UPDCUR to bring down the LIT updates.)
Those programs compile and run now. We'll have to test other programs, but good so far.
Might be nice to have a mode for VERSYS.LIT, or another utility to process all .run/.sbx files in a directory and tell which ones are corrupted. The memory required is a nice addition as well.
Good point. I was thinking of making some kind of list file option that would be suitable for capturing relevant details for a batch of programs (particularly memory and minimum version requirements); adding a column that would show warnings would be helpful there. It's on my to-do list now...
Ok, if you run UPDCUR again you'll get version 3.1(110) of VERSYS.LIT which supports a /CSV switch that puts all the information into a CSV file rather than displaying it on the screen, with one row per wildcard match and columns for all the various details. Most of them correspond exactly to the display version, except there is a Notes column which will be blank normally, but otherwise might display error/warning messages like "*** NOT RUNNABLE ***" or "*** PROGRAM APPEARS TO BE CORRUPTED ***".
Other handy uses for the output would be to quickly check if any of your RUN programs require a higher version of A-Shell, or more memory, than expected. It also identifies programs with dynamic functions, embedded defstruct definitions, and automatic dynstruct binding (column DefStructIdx).
Example:
.versys *.run/csv
VERSYS.CSV created
.shlexc versys.csv
Command launched successfully
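If you'd rather not eyeball the spreadsheet, the CSV can also be filtered with a few lines of script. The Python sketch below assumes the file has a header row and that the Notes column is literally named "Notes"; I haven't confirmed the exact header text, so adjust `notes_column` to match. It returns every row whose Notes cell is non-blank.

```python
import csv

def flagged_programs(csv_path, notes_column="Notes"):
    """Return (row, note) pairs for rows with a non-blank Notes cell.

    Each row is a dict keyed by the CSV header names, so whatever
    columns the file contains are available to the caller.
    """
    flagged = []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            note = (row.get(notes_column) or "").strip()
            if note:
                flagged.append((row, note))
    return flagged
```

Looping over `flagged_programs("versys.csv")` and printing each row gives a quick list of programs carrying a warning such as "*** PROGRAM APPEARS TO BE CORRUPTED ***".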
These appear to be another variation of the previous problem, i.e. one that shows up in the -el7 version but not Windows or -deb11. Investigation in progress...
Here are a couple more that are failing with 6.5.1716.1.
I guess you didn't change the COMPIL version from 989. I double-checked the A-Shell version after seeing that in the new LSX files and it is 6.5.1716.1
Right -- I hadn't actually made any changes to it other than recompiling with optimizations disabled. And as feared, it appears that the band-aid didn't completely cover up the problem. Still investigating...
... is clearly a case of the P in PRINT having been lost, at least in the LSX file.
Can you check that it's there in the testdyn1.bas, and if so, maybe send me the original source to that function to see if I can find any clues as to how that may have happened.