Optimizing File Sorting

If you do a lot of sorting, have very large files to sort, or would like more sorting capabilities, we recommend that you switch from BASORT and SORT to the OptTech sort routine, which was developed by a third party specializing in sorting. It supports up to 100 sort keys, as well as complex operations like removing or exporting duplicate records, and is much faster than BASORT (particularly with files that do not fit in memory). It has some downsides as well: it is not interface-compatible with BASORT, it does not support files that have block padding, and it carries a per-copy licensing fee. Contact us for more details if interested. Otherwise, read on for tips on getting the best performance out of SORT and BASORT.

The main trick to optimizing BASORT.SBR and SORT.LIT is to specify SBR=MALLOCSORT in miame.ini. This causes A-Shell to dynamically allocate a large chunk of memory so that the file can be quick-sorted or tag-sorted directly in memory, instead of falling back on the much slower, disk-based polyphase merge sort. We recommend that you always specify SBR=MALLOCSORT. However, there is a limit to how much memory the sort routine will allocate. By default, this limit is 8MB. If you plan to sort larger files, you might want to increase this limit using the MALLOCLIMIT statement in miame.ini.
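
The effect of the limit can be pictured with a small Python sketch. This is only a simplified model, not how BASORT is actually implemented: it assumes SBR=MALLOCSORT is set and that an in-memory sort needs roughly as many bytes as the file occupies (the real requirement, particularly for a tag sort, may well be different). The function name choose_sort_method is hypothetical.

```python
MB = 1024 * 1024
DEFAULT_MALLOCLIMIT = 8 * MB  # default limit stated in the text


def choose_sort_method(file_size, malloc_limit=DEFAULT_MALLOCLIMIT):
    """Return the kind of sort this simplified model would pick.

    Assumption: the in-memory sort needs about file_size bytes.
    The real routine's memory requirement is not documented here.
    """
    if file_size <= malloc_limit:
        return "in-memory"   # quick sort or tag sort in allocated memory
    return "disk merge"      # slower polyphase merge sort on disk


if __name__ == "__main__":
    for size_mb in (5, 20, 200):
        print(size_mb, "MB file, default limit:", choose_sort_method(size_mb * MB))
    print("20 MB file, 64 MB limit:", choose_sort_method(20 * MB, 64 * MB))
```

Under this model, a 20MB file misses the in-memory path at the default limit but qualifies once the limit is raised to 64MB.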

With modern Windows workstations (which typically have 256MB of memory), there is probably no harm in bumping MALLOCLIMIT up to 64MB. On UNIX boxes, or in Windows Telnet or Terminal Server environments where many users are sharing the same memory, there is definitely a tradeoff to be considered. If you allow one user to allocate too much memory, system performance may suffer, at least until that memory is fully returned to the system.

In many operating system environments where multiple users are sharing the same physical memory, even though a user frees up a temporary memory allocation, the system might not make the memory immediately available to others. This is due partly to historical experience suggesting that a process that allocates and then releases a chunk of memory is likely to re-allocate it before it exits (and thus it is more efficient to "reserve" the freed memory for re-use by the process that freed it). The other reason is that there may be significant overhead in the "garbage cleanup" necessary to consolidate the freed memory with other available segments. The one guaranteed way to force the system to re-use freed memory is for the process to exit. (For example, if you have a single program that needs to sort a huge file, let's say 256MB, then you could launch an A-Shell session using MALLOCLIMIT=260MB, sort the file, then exit that A-Shell session so that the memory will be immediately made available to the rest of the system.)

Also note that, as an alternative to changing MALLOCLIMIT, you can simply change your current memory partition size dynamically with the MEMORY.LIT command. (BASORT.SBR will use your partition if there is sufficient memory available there for a quick sort or tag sort.) So in the case just described, it would be easier to use a command file to set the partition size for the process, rather than using a special version of miame.ini. The one potential downside of that approach is that if the requested amount of memory is not available, MEMORY.LIT will simply fail, whereas in the case of MALLOCLIMIT, BASORT will try for a smaller allocation if the original request fails.
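
The difference between the two failure behaviors can be sketched in Python. This is an illustration only: the text does not say how BASORT chooses the smaller size, so the halving step below is an assumption, and try_alloc, allocate_with_fallback, and allocate_all_or_nothing are hypothetical names that stand in for the real allocation calls.

```python
def allocate_with_fallback(requested, try_alloc, minimum=1):
    """Model of the MALLOCLIMIT case: retry with smaller sizes on failure.

    Assumption: each retry halves the request. The actual reduction
    strategy used by BASORT is not described in the text.
    Returns the size obtained, or 0 if nothing could be allocated.
    """
    size = requested
    while size >= minimum:
        if try_alloc(size):
            return size
        size //= 2
    return 0


def allocate_all_or_nothing(requested, try_alloc):
    """Model of the MEMORY.LIT case: the request either succeeds or fails."""
    return requested if try_alloc(requested) else 0


if __name__ == "__main__":
    available_mb = 100  # pretend only 100MB can be obtained

    def try_alloc(size_mb):
        return size_mb <= available_mb

    print("fallback:", allocate_with_fallback(256, try_alloc), "MB")
    print("all or nothing:", allocate_all_or_nothing(256, try_alloc), "MB")
```

With 100MB obtainable and 256MB requested, the fallback model settles on a smaller allocation, while the all-or-nothing model gets no memory at all.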

As of Build 833, BASORT contains some minor optimizations to the tag sort routine that come into play when the file being sorted is opened with the span'blocks modifier or when the record size divides evenly into 512 (or is larger than 512, in which case span'blocks is automatic). So when laying out a file to be sorted, it makes sense to use record sizes like 16, 32, 64, 128, 256, etc., or to use span'blocks.
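
The rule above is easy to check when choosing a record size. The following Python sketch simply restates the stated condition. The function name qualifies_for_tag_sort_optimization is hypothetical, and the code says nothing about how BASORT tests the condition internally.

```python
BLOCK_SIZE = 512  # block size referred to in the text


def qualifies_for_tag_sort_optimization(record_size, span_blocks=False):
    """Return True if a file layout meets the condition described above.

    The condition holds when the file is opened with span'blocks, when the
    record size divides evenly into 512, or when the record size exceeds
    512 (in which case span'blocks is automatic).
    """
    if record_size <= 0:
        raise ValueError("record_size must be positive")
    if record_size > BLOCK_SIZE:
        return True  # span'blocks is automatic for records over 512 bytes
    return span_blocks or BLOCK_SIZE % record_size == 0


if __name__ == "__main__":
    for size in (64, 100, 128, 600):
        print(size, qualifies_for_tag_sort_optimization(size))
    print(100, "with span'blocks:", qualifies_for_tag_sort_optimization(100, span_blocks=True))
```

A 100-byte record, for example, qualifies only if the file is opened with span'blocks, whereas padding the record to 128 bytes qualifies either way.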