
Optimizing VSAM datasets on the mainframe: Specific recommendations

CICS expert Robert Crawford offers specific recommendations for VSAM users, including using alternate indexes sparingly, keeping string numbers low for ESDS files, using LSR wherever possible, and choosing CI sizes carefully.

Last month I wrote some basics about IBM's Virtual Storage Access Method (VSAM) datasets. This column contains some specific recommendations for the few of you who are still using these things.


Use alternate indexes sparingly: An alternate index (AIX) may seem like a good idea at the time. What could be better than providing an alternate access path to data?

The problem is that the alternate access comes at a price. If the AIX is set up to be an "upgrade" set, each update to the base cluster results in additional I/O to the AIX. What's more, because AIX records are structured around alternate keys, they are hard to tune and may quickly become poor performers due to control interval (CI) and control area (CA) splits.

If you must use alternate indexes, try to keep them away from online systems. If you must have them on the online system, consider skipping the upgrade option to prevent extra I/O to the AIX. If you must keep the AIX current, consider limiting the number of alternate indexes to one or two.

Keep string number low for ESDS: Many applications use an Entry Sequenced Data Set (ESDS) as a processing log. On very active systems this means dozens of transactions per second vying for access to the file.

Once you look at the statistics for such a file, your first impulse is to increase the file's string number to allow for more concurrent usage and reduce string waits. However, this is counterproductive. Once there are more than two or three concurrent transactions, they will start fighting each other for access to the last CI in the dataset. In detail, CICS may have to re-drive the I/O multiple times before the task gets exclusive control of the last CI and completes the write. In a worst-case scenario I once saw in an auxiliary trace, CICS re-drove the I/O dozens of times, increasing both CPU and response time. In these situations string waits are more efficient.

Use Local Shared Resource (LSR) wherever possible: LSR mode allows VSAM to maintain dataset buffer pools that reduce I/O through lookaside processing. Lookaside is a buffering technique in which VSAM looks for a record in storage before reading it from disk. VSAM manages the buffers with a least recently used (LRU) algorithm, meaning the buffer that has gone longest without being touched is reused first.
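The lookaside mechanism described above can be sketched as a small LRU cache. This is a hypothetical illustration of the concept, not VSAM's actual implementation; the class and method names are invented for the example.

```python
from collections import OrderedDict

# Hypothetical sketch of LSR lookaside buffering: before going to disk,
# the access method checks an in-storage buffer pool managed on a
# least-recently-used basis, so hot CIs never cost a physical read.
class BufferPool:
    def __init__(self, nbuffers):
        self.nbuffers = nbuffers
        self.buffers = OrderedDict()  # CI number -> CI contents (LRU first)
        self.hits = 0
        self.requests = 0

    def get_ci(self, ci_number, read_from_disk):
        self.requests += 1
        if ci_number in self.buffers:           # lookaside hit: no I/O
            self.buffers.move_to_end(ci_number)
            self.hits += 1
            return self.buffers[ci_number]
        ci = read_from_disk(ci_number)          # miss: physical read
        if len(self.buffers) >= self.nbuffers:
            self.buffers.popitem(last=False)    # reuse the LRU buffer
        self.buffers[ci_number] = ci
        return ci
```

With enough buffers, repeated requests for the same CIs become hits rather than reads, which is exactly why a pool large enough to hold all index CIs pays off so well.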

CICS has a lot of options for taking advantage of LSR, and if you have the virtual storage above the 16M line you can get your lookaside hit ratio up to 90%. Consider the following:

  • CICS allows for eight separate buffer pools. Normally one is sufficient, but you can move a very active dataset into its own pool to prevent its records from crowding out others.
  • You may specify buffers for index and data control intervals separately. This provides the opportunity to create an index-only buffer pool large enough to contain all of your index CIs, which makes keyed access much faster.

There are two reasons not to use LSR. First, sequential I/O (browsing) is more efficient with non-shared resources (NSR) processing. Second, a transaction can hang itself up while processing an LSR dataset if it does a series of READNEXT commands followed by a get for update on one of the browsed records. As far as I know, this is still a restriction.

Size is important: The CI size you pick for the data component of the cluster is a balancing act between space and performance. The simple approach is to divide the average record size into the available CI sizes until you get a reasonable number of records in one block. However, there are two other things to consider. First, each CI contains a four-byte CI definition field (CIDF), plus at least one three-byte record definition field (RDF) per record in a variable-record-length cluster; fixed-record-length clusters carry just two RDFs per CI. Although these are small fields, you should take them into account.
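The CIDF/RDF overhead above translates into simple arithmetic. The sketch below is illustrative only; the helper name and the sample CI and record sizes are assumptions, while the four-byte CIDF and three-byte RDF figures come from the text.

```python
# Illustrative records-per-CI arithmetic using the overheads noted above:
# a 4-byte CIDF per CI, plus 3-byte RDFs -- one per record for
# variable-length clusters, exactly two per CI for fixed-length ones.
CIDF = 4
RDF = 3

def records_per_ci(ci_size, record_size, variable_length):
    usable = ci_size - CIDF
    if variable_length:
        # each record costs its data plus its own RDF
        return usable // (record_size + RDF)
    # fixed-length records share two RDFs per CI
    return (usable - 2 * RDF) // record_size

print(records_per_ci(4096, 200, variable_length=False))  # 20
print(records_per_ci(4096, 200, variable_length=True))   # 20
```

The overhead rarely changes the answer for large CIs, but for small CIs or short records it can cost you a record per block, which is why it is worth checking.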

The second consideration is free space. Free space allows you to set aside some room in the CI for inserted records and thus avoid CI splits. You don't have to allocate any free space if the dataset is read-only or new records are always added to the end of the file. Otherwise you should probably leave room for at least one record. Note that free space is meaningful only while you are loading the dataset. After that you're at the mercy of the application, because when VSAM performs a CI split it divides the records evenly between the two CIs and doesn't take free space into consideration.
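"Leave room for at least one record" can be turned into a starting CI free-space percentage. This is a hypothetical helper for a variable-length cluster; the function name and sample sizes are assumptions, and the result is only a load-time starting point, as noted above.

```python
import math

# Hypothetical helper: the smallest CI free-space percentage that
# reserves room for at least one inserted record plus its 3-byte RDF
# in a variable-length cluster. Only meaningful at load time, since
# VSAM ignores free space once the application starts splitting CIs.
def min_ci_freespace_pct(ci_size, record_size, rdf=3):
    needed = record_size + rdf
    return math.ceil(100 * needed / ci_size)

print(min_ci_freespace_pct(4096, 200))  # 5
```

For a 4K CI and 200-byte records, that works out to roughly a 5% CI free-space figure as the floor for one insert per CI.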

You should avoid large CI sizes for online systems. Not only does it take more processor time to move large chunks of data around, but VSAM also locks at the CI level. Therefore, a large CI containing many records is more likely to cause waits.

For a seemingly simple access method, VSAM can be very subtle. By following a few simple rules, however, VSAM performance and cost can be good enough for most applications. Remember, though, that VSAM is not a database. Even with innovations like record-level sharing (RLS) and transactional VSAM (TVS), you should look to a real DBMS for a robust solution.
