As a rough comparison of the performance benefits of these two SBX enhancements (XGETARGS and the improved memory allocation), I ran 500 iterations of FACT(30) (i.e., 15000 subroutine calls, nested from 1 to 30 levels deep). On a 1.7GHz laptop, the elapsed time was about twelve seconds with the previous method versus about two seconds with the new method, i.e. a six-fold improvement, with the new method doing about 7500 SBX calls per second. At that rate, the overhead of SBX calls is low enough not to affect many kinds of operations. (But performing the same logic within a single program using global memory variables is still an order of magnitude faster, so you might not want to use nested SBX functions within a CPU-intensive calculation.)
On another test, I wrote a simple SBX which received a channel number and a string and output the string to the specified print file. I then compared the performance of outputting to the sequential file via the XCALL and directly, i.e.:
xcall XPRNT, channel, pline
versus
PRINT #channel, pline
This ran about 7K lines per second with the XCALL to XPRNT.SBX, and about 140K lines per second directly, roughly a 20-fold difference. Again, the message is that using an external XCALL is at least an order of magnitude slower than doing the same thing directly, but, depending on the situation, that may not be significant. (In an application that generates a 1-5 page printout based on user input, it won't make any difference; in an application that generates a 10,000 page report, the difference will be measured in seconds.)
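To put the two throughput figures in context, here is a minimal sketch of the arithmetic, written in Python purely for illustration (it is not A-Shell/Basic code). The 60 lines-per-page value and the function names are assumptions; the text does not state a page length.

```python
# Throughput figures quoted in the text (lines per second).
XCALL_LINES_PER_SEC = 7_000
DIRECT_LINES_PER_SEC = 140_000

# Assumption for illustration only: the text gives no page length.
LINES_PER_PAGE = 60


def output_seconds(pages, lines_per_sec, lines_per_page=LINES_PER_PAGE):
    """Estimated time to write `pages` pages at a given line rate."""
    return pages * lines_per_page / lines_per_sec


def report_comparison(pages):
    """Return (xcall_seconds, direct_seconds, difference) for a report."""
    via_xcall = output_seconds(pages, XCALL_LINES_PER_SEC)
    direct = output_seconds(pages, DIRECT_LINES_PER_SEC)
    return via_xcall, direct, via_xcall - direct


if __name__ == "__main__":
    for pages in (5, 10_000):
        via_xcall, direct, diff = report_comparison(pages)
        print(f"{pages:>6} pages: XCALL {via_xcall:.3f}s, "
              f"direct {direct:.3f}s, difference {diff:.3f}s")
```

Under that page-length assumption, the short printout differs by a few hundredths of a second, while the 10,000 page report differs by somewhat more than a minute.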
The effect of the overhead is progressively reduced as the amount of work done in each call of the subroutine increases. For example, if instead of merely outputting a line of text to a file, the XPRNT.SBX routine above created an entire page, then the total overhead of the SBX calls for even a 7000 page report would be only about one second. A more realistic example would be using a subroutine to format and display a field of data. Since you can hardly squeeze more than 100 or so fields of data on a single screen, the total SBX overhead for displaying the 100 fields would be on the order of 15 milliseconds, which is not going to matter in that context.
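The amortization argument can be checked with a short calculation, again sketched in Python for illustration only. It treats each SBX call as costing 1/7000 of a second, taken from the XPRNT measurement; since that measurement also includes the work of printing a line, this is an upper bound on the pure call overhead. The function name is hypothetical.

```python
# Rate taken from the XPRNT measurement in the text (calls per second).
# It includes the print itself, so it overstates the pure call overhead.
SBX_CALLS_PER_SEC = 7_000


def sbx_overhead_seconds(calls, calls_per_sec=SBX_CALLS_PER_SEC):
    """Approximate total overhead, treating each call as 1/rate seconds."""
    return calls / calls_per_sec


if __name__ == "__main__":
    # One call per page for a 7000 page report.
    print(f"7000 page calls: {sbx_overhead_seconds(7_000):.3f} s")
    # One call per field for a full screen of about 100 fields.
    print(f"100 field calls: {sbx_overhead_seconds(100) * 1000:.1f} ms")
```

The total overhead depends only on how many calls are made, so the same report costs far less when each call produces a page rather than a line.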
Given that raw performance is no longer a limiting factor in most interactive business applications, the modularization and reusability benefits of external subroutines, combined with their reasonably low overhead, argue for their greater use in A-Shell/Basic software development.