Last month I wrote some basics about IBM's Virtual Storage Access Method (VSAM) datasets. This column will contain some specific recommendations for the few of you who're still using these things.
Use alternate indexes sparingly: An alternate index (AIX) may seem like a good idea at the time. What could be better than providing an alternate access path to data?
The problem is that the alternate access comes at a price. If the AIX is defined as part of the upgrade set, every update to the base cluster causes additional I/O to the AIX. What's more, because alternate indexes are organized around alternate keys, they are hard to tune and can quickly become poor performers due to control interval (CI) and control area (CA) splits.
If you must use alternate indexes, try to keep them away from online systems. If you must have them on the online system, consider not using the upgrade option to avoid the extra I/O to the AIX. If you must keep the AIX current, consider limiting the number of alternate indexes to one or two.
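To see how the upgrade option adds up, here is a toy I/O count in Python. It assumes, purely for illustration, that each AIX in the upgrade set costs one read plus one write of its alternate index record per base-cluster insert. It ignores index-component I/O and splits entirely. The function name `ios_per_insert` is hypothetical.

```python
def ios_per_insert(upgrade_aixes: int, ios_per_aix: int = 2) -> int:
    """Toy count of I/Os for one base-cluster insert.

    Assumption (illustrative only): the base data CI costs 1 write, and each
    AIX in the upgrade set costs `ios_per_aix` I/Os (read + write its record).
    AIXes outside the upgrade set cost nothing at insert time.
    """
    if upgrade_aixes < 0 or ios_per_aix < 0:
        raise ValueError("counts must be non-negative")
    return 1 + upgrade_aixes * ios_per_aix


for aixes in range(4):
    print(f"{aixes} upgrade AIX(es): {ios_per_insert(aixes)} I/Os per insert")
```

Even in this simplified model, three upgrade AIXes turn a one-I/O insert into a seven-I/O insert. That is why limiting the count to one or two matters.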
Keep string number low for ESDS: Many applications use an Entry Sequenced Data Set (ESDS) as a processing log. On very active systems this means dozens of transactions per second vying for access to the file.
Once you look at the statistics for such a file, your first impulse is to increase the file's string number to allow more concurrent usage and reduce string waits. However, this is counterproductive. Once there are more than two or three concurrent transactions, they start fighting each other for access to the last CI in the dataset. Specifically, CICS may have to re-drive the I/O multiple times before the task gets exclusive control of the last CI and completes the write. In the worst case I have seen, in an auxiliary trace, CICS re-drove the I/O dozens of times, increasing both CPU time and response time. In these situations string waits are more efficient.
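A crude model shows why racing for the last CI is worse than queueing for a string. The sketch below is a simplification. It assumes every writer hits the last CI at the same instant, and that in each round exactly one gets exclusive control while CICS re-drives the rest. Real timing is messier. Both function names are hypothetical.

```python
def attempts_if_racing(writers: int) -> int:
    """Toy model: all writers hit the last CI at once. Each round every
    remaining writer issues its I/O, one gets exclusive control and finishes,
    and the rest are re-driven. Total = writers + (writers - 1) + ... + 1."""
    return writers * (writers + 1) // 2


def attempts_if_queued(writers: int) -> int:
    """Toy model: a low string number makes writers wait for a string,
    so each writer issues its I/O exactly once."""
    return writers


for n in (1, 3, 10, 30):
    racing, queued = attempts_if_racing(n), attempts_if_queued(n)
    print(f"{n:>2} writers: {racing:>3} I/O attempts racing "
          f"({racing - queued} re-drives) vs {queued:>2} queued")
```

Under these assumptions, re-drives grow roughly with the square of the number of concurrent writers. A string wait costs each writer one I/O and some queue time.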
Use Local Shared Resource (LSR) wherever possible: LSR mode allows VSAM to maintain buffer pools shared across datasets that reduce I/O through lookaside processing. Lookaside is a buffering technique in which VSAM looks for a record in storage before reading it from disk. VSAM manages the buffers with a least recently used (LRU) algorithm, meaning the buffer that has gone the longest without being touched is the one reused.
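The lookaside-plus-LRU idea can be sketched in a few lines of Python. This is an illustration of the technique, not VSAM's actual buffer manager. The class `LookasidePool` and its methods are hypothetical, and "reading a CI" is simulated by a counter.

```python
from collections import OrderedDict


class LookasidePool:
    """Toy LSR-style buffer pool (illustrative, not VSAM's implementation).

    Before "reading" a CI from disk, look aside in the pool; when the pool
    is full, reuse the least recently used buffer.
    """

    def __init__(self, buffers: int) -> None:
        if buffers < 1:
            raise ValueError("need at least one buffer")
        self.buffers = buffers
        self.pool = OrderedDict()  # CI number -> contents, oldest first
        self.hits = 0
        self.reads = 0

    def get_ci(self, ci: int) -> str:
        if ci in self.pool:                 # lookaside hit: no I/O
            self.hits += 1
            self.pool.move_to_end(ci)       # now the most recently used
            return self.pool[ci]
        self.reads += 1                     # miss: simulated physical read
        if len(self.pool) >= self.buffers:
            self.pool.popitem(last=False)   # evict least recently used buffer
        self.pool[ci] = f"contents of CI {ci}"
        return self.pool[ci]

    def hit_ratio(self) -> float:
        total = self.hits + self.reads
        return self.hits / total if total else 0.0


pool = LookasidePool(buffers=3)
for ci in (1, 2, 3, 1, 4, 1, 2):
    pool.get_ci(ci)
print(pool.hits, pool.reads, list(pool.pool))  # 2 5 [4, 1, 2]
```

With only three buffers, CI 2 is evicted just before it is needed again. Adding buffers is what pushes the hit ratio up.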
CICS has a lot of options for taking advantage of LSR, and if you have enough virtual storage above the 16M line you can get your lookaside hit ratio up to 90%. Consider the following:
There are two reasons not to use LSR. First, sequential I/O (browsing) is more efficient with non-shared resources (NSR) processing. Second, a transaction can hang itself while processing an LSR dataset if it issues a series of READNEXT commands and then, with the browse still active, a read for update on one of the browsed records. As far as I know, this is still a restriction.
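The hang is easier to picture with a toy CI-level lock table. The model below is an assumption-laden illustration, not CICS or VSAM internals. It treats the browse as holding its current CI shared and the read for update as needing that CI exclusively. The class `CiLockTable` and its method names are hypothetical.

```python
class CiLockTable:
    """Toy CI-level lock table (illustrative, not CICS or VSAM internals).

    A browse holds its current CI in shared mode; a read for update needs
    the CI exclusively. A task that asks for exclusive control of a CI it
    is still holding for its own browse ends up waiting on itself.
    """

    def __init__(self) -> None:
        self.holders = {}  # CI number -> {task name: "shared" or "exclusive"}

    def hold_shared(self, task: str, ci: int) -> None:
        """Record that `task` is browsing (positioned on) CI `ci`."""
        self.holders.setdefault(ci, {})[task] = "shared"

    def release(self, task: str, ci: int) -> None:
        """Drop `task`'s hold on `ci`, e.g. when the browse ends."""
        held = self.holders.get(ci, {})
        held.pop(task, None)
        if not held:
            self.holders.pop(ci, None)

    def request_update(self, task: str, ci: int) -> str:
        """Ask for exclusive control of `ci` and report the outcome."""
        held = self.holders.get(ci, {})
        if task in held:
            return "self-deadlock"  # would wait for its own browse forever
        if held:
            return "wait"           # another task holds the CI
        self.holders[ci] = {task: "exclusive"}
        return "granted"


locks = CiLockTable()
locks.hold_shared("T1", 7)              # T1 browses onto CI 7 with READNEXT
print(locks.request_update("T1", 7))    # self-deadlock
locks.release("T1", 7)                  # end the browse first
print(locks.request_update("T1", 7))    # granted
```

In this model, the way out is to release the browse position before asking for the record again for update.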
Size is important: The CI size you pick for the data component of the cluster is a balancing act between space and performance. The simple approach is to divide the average record size into the available CI sizes until you get a reasonable number of records per CI. However, there are two other things to consider. First, each CI contains a four-byte CI Definition Field (CIDF) plus three-byte Record Definition Fields (RDFs). A variable-length record cluster generally needs one RDF per record, while a fixed-length record cluster needs just two per CI. Although these fields are small, you should take them into account.
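Here is a small Python sketch of that arithmetic. It uses the data-component CI size rule as I understand it: multiples of 512 through 8192, then multiples of 2048 up to 32768. Check the documentation for your DFSMS level. For variable-length records it assumes the worst case of one RDF per record. VSAM can share a pair of RDFs among adjacent records of equal length, so real capacity may be a little higher. All names are hypothetical.

```python
CIDF_BYTES = 4  # one CI Definition Field per CI
RDF_BYTES = 3   # each Record Definition Field


def data_ci_sizes() -> list[int]:
    """Data-component CI sizes: multiples of 512 up to 8192,
    then multiples of 2048 up to 32768."""
    return list(range(512, 8193, 512)) + list(range(10240, 32769, 2048))


def records_per_ci(ci_size: int, record_len: int, fixed: bool) -> int:
    """Whole records that fit in one CI after control-field overhead."""
    usable = ci_size - CIDF_BYTES
    if fixed:
        # Fixed-length records: two RDFs per CI, regardless of record count.
        return max(0, (usable - 2 * RDF_BYTES) // record_len)
    # Variable-length records: assume the worst case of one RDF per record.
    return max(0, usable // (record_len + RDF_BYTES))


for size in (2048, 4096, 8192):
    n = records_per_ci(size, record_len=250, fixed=False)
    unused = size - CIDF_BYTES - n * (250 + RDF_BYTES)
    print(f"CISIZE {size:>5}: {n:>2} records, {unused:>3} bytes unused")
```

Running the candidate sizes through a function like this makes the space trade-off visible before you commit to a DEFINE.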
The second consideration is free space. Free space lets you set aside room in each CI for inserted records and so avoid CI splits. You don't need to allocate any free space if the dataset is read-only or if new records are always added to the end of the file. Otherwise you should probably leave room for at least one record. Note that free space is honored only when you load the dataset. After that you're at the mercy of the application, because when VSAM performs a CI split it divides the records evenly between the two CIs and doesn't take free space into consideration.
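The following toy Python model follows the behavior described above. Free space is reserved only at load time, and a split simply halves the records. Capacity is counted in records rather than bytes for simplicity. VSAM's real split logic has more cases, and the function names are hypothetical.

```python
def loaded_count(capacity: int, free_pct: int) -> int:
    """Records placed in a CI at load time when CI free space is free_pct%."""
    return capacity - capacity * free_pct // 100


def insert(ci: list[int], key: int, capacity: int) -> list[list[int]]:
    """Insert `key` into a CI holding sorted keys; return the resulting CI(s).

    A full CI is split by moving the upper half of its records to a new CI.
    The free-space percentage plays no part in the split.
    """
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    if len(ci) < capacity:
        return [sorted(ci + [key])]
    half = len(ci) // 2
    low, high = ci[:half], ci[half:]
    if key < high[0]:
        low = sorted(low + [key])
    else:
        high = sorted(high + [key])
    return [low, high]


capacity, free_pct = 10, 20
ci = [10 * (i + 1) for i in range(loaded_count(capacity, free_pct))]  # 8 records
for key in (15, 25):               # the reserved free space absorbs two inserts
    ci = insert(ci, key, capacity)[0]
result = insert(ci, 35, capacity)  # the third insert forces a CI split
print(result)  # [[10, 15, 20, 25, 30, 35], [40, 50, 60, 70, 80]]
```

After the split, each CI is about half empty no matter what FREESPACE said at load time.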
You should avoid large CI sizes for online systems. Not only does it take more processor time to move large chunks of data around, but VSAM also locks at the CI level, so a large CI containing many records is more likely to cause waits.
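A rough probability sketch shows why more records per CI means more waits. It assumes updates are independent and spread uniformly over evenly filled CIs, which real workloads rarely are. Treat the numbers as directional only. The function names are hypothetical.

```python
def same_ci_chance(total_records: int, recs_per_ci: int) -> float:
    """Chance that two independent, uniformly random updates hit the same CI.

    Assumes records are spread evenly, so there are total_records / recs_per_ci
    CIs and each update picks one of them with equal probability.
    """
    if recs_per_ci < 1 or total_records < recs_per_ci:
        raise ValueError("need at least one full CI")
    return recs_per_ci / total_records


def any_conflict_chance(concurrent: int, total_records: int, recs_per_ci: int) -> float:
    """Chance that a given update shares its CI with at least one of the
    other (concurrent - 1) in-flight updates, under the same assumptions."""
    p = same_ci_chance(total_records, recs_per_ci)
    return 1 - (1 - p) ** (concurrent - 1)


for recs in (10, 40, 160):  # e.g. growing CI size, same record length
    print(f"{recs:>3} records/CI: {any_conflict_chance(20, 100_000, recs):.4f}")
```

Under these assumptions, the chance of a CI-level conflict grows roughly in proportion to the number of records packed into each CI.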
For what seems to be a relatively simple access method, VSAM can be very subtle. However, if you follow a few simple rules, VSAM performance and cost can be good enough for most applications. Remember, though, that VSAM is not a database. Even with innovations like record level sharing (RLS) and transactional VSAM (TVS), you should look to a real DBMS for a robust solution.