Use Larger Record Sizes

This may sound like strange advice, especially for those of us who grew up squeezing our record sizes down to the absolute minimum possible. Surely, all else being equal, a larger record is no more efficient than a smaller one, but it may not be any less efficient either. As was hopefully made clear in the discussion about memory mapping and local copies, the biggest bottleneck in disk I/O is not the transferring of the bytes or the network bandwidth, but the various forms of overhead related to initiating the operation. Those forms of overhead are largely unrelated to the number of bytes transferred in an operation. In other words, the time required to read a 768-byte record is, for practical purposes, about the same as the time required to read a 16-byte record.

Consequently, it doesn't make sense to try to squeeze bytes out of your records, especially if it requires you to go to extra trouble to pack data into compact formats. For example, whereas it might once have made sense to store dates in a compressed 2-byte format, packing and unpacking them as needed, it probably makes more sense now to store them as 8 bytes, or even 10 (with slashes).
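In MAP terms, the difference might look something like this (a sketch only; the field names and the exact packed layout are hypothetical):

map1 PACKED'DATE,B,2               ! old style: year/month/day squeezed into 2 bytes, encoded/decoded on every use
map1 PLAIN'DATE,S,10               ! simpler: e.g. "12/31/2024", readable and usable as-is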

The only reason to minimize record sizes is if, in doing so, you make the file easier to back up (i.e. it fits on a convenient medium) or more practical to load into memory. But an increase of 25% in the size of the file is unlikely to be significant in either case.

Similarly, don't worry about 512-byte blocking or the 512-byte record limit. If you need to expand a record, just expand it to whatever size is convenient. A-Shell supports the span'blocks option on the OPEN statement (see previous topic) and invokes it automatically for record sizes larger than 512 bytes. You're much better off with a single record of 900 or 1600 bytes than with two or three smaller records. (There might be a slight advantage to record sizes that are even divisors or multiples of 1024, but this is a minor factor.)
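In practice this just means opening the file with whatever record size you end up with; for example (a sketch, with a hypothetical file name):

OPEN #1, "HISTORY.DAT", RANDOM'FORCED, 1600, RECNO    ! 1600 > 512, so span'blocks is invoked automatically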

As an example of where this logic might apply, let's say you are creating a file to store invoice history for each customer. Standard design principles would suggest creating a file with one record per invoice. That way, if a customer had 500 invoices, it would use 500 records, but if he had only one, only one record would be needed. Let's say that the record size is 64 bytes. An alternative approach suggested by the "use larger record sizes" strategy would be to store the invoice history records in "super-records", each holding several invoices. For example, we might go with a 1024-byte record that held up to 16 invoice items. It might take a bit more logic to access the file this way, and we would waste, on average, half a record (512 bytes) for each customer. On the other hand, we would cut the number of file operations needed to fetch the invoice history for a customer by a factor of up to 16, making this design much faster, even allowing for the fact that we would be, on average, transferring twice as many bytes (due to the half-empty records).
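A sketch of what such a super-record might look like (the file name and field layout are hypothetical; the one fixed point is that 16 items of 64 bytes each exactly fill the 1024-byte record):

map1 CUST'HISTORY                    ! one 1024-byte record per customer
  map2 INVOICE(16)                   ! up to 16 invoice items; unused items left blank
    map3 INV'NUM,S,10
    map3 INV'DATE,S,10
    map3 INV'AMT,F,6
    map3 INV'FILL,X,38               ! pad each item to 64 bytes (10 + 10 + 6 + 38)

OPEN #2, "CUSTHIST.DAT", RANDOM'FORCED, 1024, CUSTREC
CUSTREC = <record number for this customer>
READ #2, CUST'HISTORY                ! one read fetches the customer's entire history

A single READ now retrieves up to 16 invoices, where the one-record-per-invoice design would have needed up to 16 separate reads.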

You would have to decide for yourself whether the performance advantage is worth the negative baggage (wasted space, extra logic required, etc.). Quite likely it is not. The point of this exercise was not to suggest a goofy database design, but rather to make clear that the main disk bottleneck is in the number of records accessed, not in their sizes. If performance is important to you, then try to get more accomplished with fewer individual disk I/O operations.

One more example of this principle is worth noting. Many applications have one or more utility programs, such as file rebuilds or reports, that read every record in a file sequentially. Such programs will probably benefit from one of the methods described previously (e.g. memory mapping, local copy, read-only mode, etc.), but yet another approach is simply to read more than one record at a time. Let's assume again that we plan to read a sequential series of 64-byte records. Instead of reading them one at a time, we might do something like the following:

map1 XREC                            ! 1024-byte buffer holding 16 file records
  map2 REC(16)                       ! array of 16 file records
    map3 KEY,S,10
    map3 DESCR,S,50
    map3 CODE,B,4                    ! 10 + 50 + 4 = 64 bytes per record

OPEN #1, "MYFILE.DAT", RANDOM'FORCED, 1024, RECNO

LOOP:
    RECNO = RECNO + 1
    IF RECNO > MAXREC GOTO DONE      ! MAXREC = number of 1024-byte super-records
    READ #1, XREC                    ! one disk operation fetches 16 records
    ! now process the 16 records individually...
    FOR I = 1 TO 16
       <process REC(I)...>
    NEXT I
    GOTO LOOP
DONE:

The above technique needs some refinement. For example, it ought to deal with the case where the file doesn't contain an even multiple of 16 records (one way of handling that is sketched below). And it needs to be careful about whether reading multiple records at a time interferes with the 512-byte blocking logic. (This is another good reason why you should always use span'blocks mode: it eliminates these pesky 512-byte blocking concerns.) The point, again, is that although this technique does not reduce the amount of data transferred, it reduces the overhead by reducing the number of individual disk operations, and thus it will run faster and create less load on the server.
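To illustrate the first of those refinements (a file that isn't an even multiple of 16 records), the loop might be adjusted along these lines; this is only a sketch, and TOTAL'RECS, the count of 64-byte logical records in the file, is a hypothetical value assumed to be obtained elsewhere (e.g. from a header record):

LAST'RECNO = INT((TOTAL'RECS + 15) / 16)        ! number of 1024-byte super-records, rounded up
LEFTOVER = TOTAL'RECS - (LAST'RECNO - 1) * 16   ! valid items in the final super-record

LOOP:
    RECNO = RECNO + 1
    IF RECNO > LAST'RECNO GOTO DONE
    READ #1, XREC
    ITEMS = 16                                  ! a full super-record...
    IF RECNO = LAST'RECNO THEN ITEMS = LEFTOVER ! ...except possibly the last one
    FOR I = 1 TO ITEMS
       <process REC(I)...>
    NEXT I
    GOTO LOOP
DONE: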