IDX LEVELS (/L:#)

Increasing the IDX levels helps balance the tree, avoiding the top-heavy situation where we have to sequentially scan many blocks at the top (root) of the tree in order to find the right branch. The optimum number of levels is that which results in only one block at the root of the tree, and is related to the number and size of keys, and inversely related to the size of the IDX blocks. ISMBLD (and ISMDMP) will work out the math for you and display a list of the projected IDX blocks at each level, e.g.:

.ISMBLD TSTNEW

Key size: 50

Key position: 1

Size of data record: 64

Number of records to allocate: 500000

    62415 index blocks will be allocated

Empty index blocks to allocate: 62415

    Projected IDX blocks by level: 4000,20000,100000

    Warning: IDX is top heavy; consider adding levels or increasing block size.

 

The above example shows building a file with 500000 records each with a 50 byte key. Using the default values (512 byte IDX block and 3 levels) the estimated number of blocks at the top level is 4000 which is extremely top-heavy. (The top level must be scanned sequentially until we find our branch, sort of like turning the pages of the dictionary one by one to find the right page.) So in this case, each key lookup will take an average of 4000/2 or 2000 disk operations just for the top level, then typically one each for the remaining levels.

(The numbers work out nicely in this example because assuming an average loading efficiency of 60%, that gives us 5 keys per 512 byte index block, hence the ratio of 5X between each level of the IDX.)

Since 2000+ disk operations per key lookup is going to result in TERRIBLE performance, let's hit Control-C here (before the IDX is built) and try to optimize it.

To optimize this IDX, we can either increase the block size, the number of levels, or ideally both. Using our heuristic for the ideal block size of 20-50 times the key size, we could either go with 1024 or 2048; let's try 1024 and 6 levels:

.ISMBLD TSTNEW/B:1024/L:6

Key size: 50

Key position: 1

Size of data record: 64

Number of records to allocate: 500000

    29415 index blocks will be allocated

Empty index blocks to allocate: 29415

    Projected IDX blocks by level: 1,5,50,500,5000,50000

 

As you can see, now we have a projected IDX tree with only 1 block at the top or root, 5 below that, and then a 10X factor between each of the rest. This is vastly better than the original default configuration, reducing the average number of disk operations per key lookup from over 2000 to about 7.

If you have the time to experiment, you might want to try a block size of 2048 (/B:2048) and 5 levels, which produces an IDX level breakdown something like: 1,3,47,1034,22728. It is hard to predict whether the decrease of one level combined with the increase in block size will be a net gain or loss; it will probably depend on your hardware, but probably won't make a major difference.

Projected vs. Actual IDX structure

The projected number of IDX blocks by level resulting from a particular combination of records, key size, block size and levels is just an estimate based on an average distribution of keys and the number of records specified. In a production environment, you might build the file with a small initial number of records and then rely on the auto-expand feature to increase the file size as needed. Unfortunately, unlike the case with ISAM-A, the auto-expand does not adjust the IDX structure, it only adds blocks to it. Thus what might be an ideal configuration for an initial file size of, say, 10,000 records, might turn out to be badly top-heavy if the file grows to 500,000 records. Thus, you may want to experiment using ISMBLD to specify the number of expected records (i.e. the file size you'd like to optimize the IDX for), in order to figure out the best number of levels. (Just ^C before creating the actual file.) Then you can use the appropriate /B:### and /L:# values, and don't worry if the initial index appears to have more than the optimum number of levels (e.g. 1,1,3,30,3000). Keep in mind that the performance penalty for too many index levels is just one disk operation per level, while the penalty for too few levels is essentially unlimited.