Compressed Indexes

ISMUTL.LIT now supports an option to create compressed indexes. There are three types of compression, which can be used individually or in combination:

  Leading duplicate characters. Here, a single byte is used to represent the number of leading characters that have been duplicated from the prior key. This is beneficial when the average number of leading duplicate characters is greater than one.

  Trailing blanks. A single byte is added to the key to represent the number of blanks on the end. This is beneficial when the average number of trailing blanks is greater than one.

  Duplicate keys. This only makes sense when duplicate keys are allowed. A two-byte duplicate-key flag replaces each duplicate key.

Combining all three gives you "maximum" compression.
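
To make the three schemes concrete, here is a minimal Python sketch that applies all of them to a sorted list of fixed-width, blank-padded keys. It is only an illustration of the ideas described above, not ISMUTL's actual on-disk format: the entry layout, the DUP_FLAG value, and the function names are all assumptions made for this example.

```python
# Illustrative only: the entry layout and DUP_FLAG value are assumptions,
# not the real ISMUTL index format. Assumes ASCII keys and width < 255.

DUP_FLAG = b"\xff\xff"  # hypothetical two-byte duplicate-key marker


def shared_prefix_len(a, b):
    """Count the leading bytes that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def compress_keys(keys, width):
    """Encode sorted keys; returns one bytes entry per key."""
    entries = []
    prev = None
    for key in keys:
        padded = key.ljust(width).encode("ascii")
        if padded == prev:
            # Duplicate key: the two-byte flag replaces the whole key.
            entries.append(DUP_FLAG)
            continue
        # Trailing blanks: keep a one-byte count, drop the blanks.
        stripped = padded.rstrip(b" ")
        trailing = width - len(stripped)
        # Leading duplicates: one-byte count of bytes shared with prior key.
        lead = shared_prefix_len(stripped, prev) if prev is not None else 0
        entries.append(bytes([lead, trailing]) + stripped[lead:])
        prev = padded
    return entries


def decompress_keys(entries, width):
    """Rebuild the full blank-padded keys from the encoded entries."""
    keys = []
    prev = b""
    for entry in entries:
        if entry != DUP_FLAG:
            lead, trailing = entry[0], entry[1]
            prev = prev[:lead] + entry[2:] + b" " * trailing
        keys.append(prev.decode("ascii"))
    return keys


sample = ["SMITH", "SMITH", "SMITHSON", "SMYTHE"]
packed = compress_keys(sample, 10)
print(sum(len(e) for e in packed), "bytes instead of", 10 * len(sample))
```

For the four sample keys at a width of 10, this sketch stores 20 bytes of key data instead of 40. Note that each non-duplicate key pays two count bytes up front, which is why each scheme only pays off when it removes more than one byte per key on average. The sketch ignores record pointers and any other per-entry overhead a real index carries.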

Compressed keys do add some processing overhead, but since file access is usually disk bound rather than CPU bound, the decrease in the size of the index generally more than makes up for any increase in CPU activity. Compression only makes sense, though, for string keys longer than about six bytes.

To change the compression on a secondary index, you delete the key and then add it back, using the DELETE and ADD2ND options in ISMUTL. To change the compression on the primary index, you need to dump the file and then recreate and reload it.

Note that the compression option has been appended (somewhat cryptically) to the "Duplicates allowed?" question in ISMUTL. This way it does not change the sequence of prompts and will not break any existing command files that execute ISMUTL. The ISMUTL STAT function will display a new line indicating whether the index is compressed.