Compressed Indexes

As of A-Shell 4.9.948, compressed indexes are supported. They have no effect on your application code. Their only purpose is to decrease the size of the IDX file on disk, and they do so only when the keys are conducive to one or more of the following compression schemes:

• Compress leading duplicate characters. Here, a single byte is used to represent the number of leading characters that have been duplicated from the prior key. This is beneficial when the average number of leading duplicate characters is greater than one. (For example, an alphabetical name index.)

• Compress trailing blanks. The blanks at the end of the key are dropped, and a single byte is added to the key to record how many there were. This is beneficial when the average number of trailing blanks is greater than one.

• Compress duplicate keys. This only makes sense when duplicate keys are allowed. Each duplicate key is replaced by a two-byte duplicate key flag.

Combining all three gives you "maximum" compression.
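To make the three schemes concrete, here is a rough Python sketch that counts the key bytes needed for a sorted list of fixed-length keys under any combination of them. It is only a model of the ideas described above, not A-Shell's actual IDX layout. The one-byte counts and the two-byte duplicate flag follow the text, but the function names, the 20-byte key length, and the sample names are made up for illustration, and record pointers and index node overhead are ignored.

```python
from typing import Optional, Sequence

DUP_FLAG_BYTES = 2  # size of the duplicate key flag, per the text
COUNT_BYTES = 1     # size of each leading/trailing count byte, per the text


def common_prefix_len(a: str, b: str) -> int:
    """Count the leading characters that b shares with a."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def encoded_size(prev: Optional[str], key: str,
                 leading: bool, trailing: bool, dups: bool) -> int:
    """Bytes needed to store one key, given the key stored just before it."""
    if dups and prev == key:
        return DUP_FLAG_BYTES              # whole key replaced by the flag
    body = key
    size = 0
    if trailing:
        body = body.rstrip(" ")            # drop the trailing blanks...
        size += COUNT_BYTES                # ...and record how many there were
    if leading:
        shared = common_prefix_len(prev, body) if prev is not None else 0
        body = body[shared:]               # drop chars copied from the prior key...
        size += COUNT_BYTES                # ...and record how many there were
    return size + len(body)


def index_bytes(keys: Sequence[str], leading: bool = False,
                trailing: bool = False, dups: bool = False) -> int:
    """Total key bytes for a sorted key list (pointers and node overhead ignored)."""
    total = 0
    prev: Optional[str] = None
    for key in keys:
        total += encoded_size(prev, key, leading, trailing, dups)
        prev = key
    return total


# Hypothetical sample: an alphabetical name index with 20-byte keys.
KEY_LEN = 20
names = ["ANDERSON, JAMES", "ANDERSON, JANET", "ANDERSON, JANET", "ANDREWS, PAUL"]
keys = [name.ljust(KEY_LEN) for name in sorted(names)]

print(index_bytes(keys))                                          # 80
print(index_bytes(keys, leading=True, trailing=True, dups=True))  # 36
```

With the sample keys, maximum compression takes the key storage from 80 bytes down to 36. The second ANDERSON key, for instance, shrinks to 5 bytes (two count bytes plus "NET") because its first 12 characters are copied from the prior key and its trailing blanks are dropped.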

To specify that you want a compressed index, you must rebuild the file using ISMUTL.LIT 1.4(129) or higher, which allows for a more detailed response to the "Allow Duplicate Keys" question:

Are duplicate keys allowed for this index? (Y/N{+LTD})

   For compression, add L(eading), T(railing), D(duplicates) to Y/N

   Ex: YL (dupes allowed, compress leading dup chars)

   Ex: NLT (no dupes allowed, compress leading dup chars, trailing spaces)

For example, answering the question "YLT" will allow duplicate keys and activate compression of leading duplicate characters and trailing spaces. "NT" will disallow duplicate keys and activate compression of trailing spaces only. "Y" or "N" by itself gives you a traditional non-compressed index. (OK, so it's a bit cryptic; the objective was to avoid breaking existing command files that execute ISMUTL.)
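The answer format is easier to see spelled out. The following Python sketch decodes an answer such as YL or NLT into the four settings it implies. The helper parse_dup_answer and the IndexOptions type are hypothetical and are not part of ISMUTL; the sketch simply follows the format given in the prompt's help text (Y or N, followed by any combination of L, T, and D).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexOptions:
    allow_dups: bool  # Y or N
    leading: bool     # L: compress leading duplicate characters
    trailing: bool    # T: compress trailing blanks
    dups: bool        # D: compress duplicate keys


def parse_dup_answer(answer: str) -> IndexOptions:
    """Decode an answer such as "YL" or "NLT" (hypothetical helper, not ISMUTL code)."""
    text = answer.strip().upper()
    if not text or text[0] not in ("Y", "N"):
        raise ValueError(f"answer must start with Y or N: {answer!r}")
    flags = text[1:]
    unknown = set(flags) - set("LTD")
    if unknown:
        raise ValueError(f"unknown compression letters in {answer!r}")
    # D only makes sense together with Y, but this sketch does not enforce that.
    return IndexOptions(
        allow_dups=text[0] == "Y",
        leading="L" in flags,
        trailing="T" in flags,
        dups="D" in flags,
    )


print(parse_dup_answer("NLT"))
# IndexOptions(allow_dups=False, leading=True, trailing=True, dups=False)
print(parse_dup_answer("Y"))
# IndexOptions(allow_dups=True, leading=False, trailing=False, dups=False)
```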

Compressed indexes should only be used with indexes containing string keys, and generally only make sense with keys longer than about 8 bytes. You pay for the disk savings with some CPU overhead, so use your judgment to decide when the payoff is in your favor. (In general, systems are much more disk-bound than CPU-bound, so if you can achieve a 25% or greater reduction in the IDX size, compression will probably improve performance.)
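The rule of thumb above boils down to simple arithmetic. Here is a minimal Python sketch, assuming you have measured the IDX size before and after rebuilding a test copy of the file with compression. The function name worth_compressing and the hard 8-byte cutoff are illustrative assumptions, since the text only says "about 8 bytes" and "probably".

```python
def worth_compressing(key_len: int, raw_idx_bytes: int, compressed_idx_bytes: int,
                      min_key_len: int = 8, min_reduction_pct: int = 25) -> bool:
    """Rule of thumb for a string key: longer than ~8 bytes and a >= 25% smaller IDX."""
    if raw_idx_bytes <= 0:
        raise ValueError("raw_idx_bytes must be positive")
    if key_len <= min_key_len:
        return False  # short keys rarely repay the CPU overhead
    saved = raw_idx_bytes - compressed_idx_bytes
    # Integer comparison avoids floating-point rounding right at the threshold.
    return saved * 100 >= min_reduction_pct * raw_idx_bytes


print(worth_compressing(30, 1_200_000, 840_000))    # True  (30% smaller)
print(worth_compressing(30, 1_200_000, 1_000_000))  # False (about 17% smaller)
```

The thresholds are parameters rather than constants so you can adjust them once you have timed your own workload.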