Despite avalanches of new technology, some of the world's most important information still resides in VSAM. However, even after 30 years of service, there's still room for improvement. In z/OS 1.12, IBM made one improvement to VSAM with control area (CA) reclaim.
A VSAM key-sequenced data set (KSDS) consists of data and index components. The index is an inverted tree made of control intervals (CIs), with the number of levels depending on the size of the cluster. The higher-level CIs point to lower-level CIs until we get to the lowest level, called the sequence set. The sequence set contains one CI for every CA in the data component. Each record inside a sequence set CI points to one data CI within the CA. VSAM chains sequence set CIs together in key order like a train.
As an application executes, records may be inserted into or deleted from the cluster, and VSAM must update the index CIs accordingly. If a data CI empties, VSAM deletes the corresponding record in the index sequence set CI belonging to that data CA. If the entire CA becomes empty, all the records in its sequence set CI are likewise gone.
This is where the trouble comes in. The empty sequence set CI maintains its position in the sequence set chain. If the application re-inserts records within the key range owned by the empty CI, they will go there. However, if the application inserts new records in other key ranges, VSAM puts them into already-occupied data CAs or into new ones, leaving the orphaned CA empty.
That CA is never reclaimed. In fact, if the application inserts keys with continuously higher values, say the key is a timestamp, there will be a steady march of empty CAs all the way down to the end of the data set's allocation. As one systems programmer complained, it's entirely possible to end up with a full KSDS without a single record in it.
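The "steady march" can be illustrated with a toy model. The Python sketch below is not how VSAM is implemented. It assumes each CA owns a fixed slice of the key space, that keys only ever ascend, and that the application deletes each record once it is `keep_last` inserts old. The function and parameter names are made up for illustration.

```python
def simulate_without_reclaim(total_cas, records_per_ca, keep_last):
    """Toy model: ascending keys, old keys deleted, empty CAs never reused.

    Each CA is modeled as a set of live keys; list position is key order.
    This is an illustration only, not VSAM's real allocation logic.
    """
    cas = []
    key = 0
    while True:
        slot = key // records_per_ca      # CA that owns this key range
        if slot == len(cas):              # key falls past the last CA
            if len(cas) == total_cas:     # allocation exhausted
                break
            cas.append(set())             # take the next CA at the end
        cas[slot].add(key)

        expired = key - keep_last         # application deletes an old record
        if expired >= 0:
            cas[expired // records_per_ca].discard(expired)
        key += 1

    return {
        "allocated": len(cas),
        "empty": sum(1 for ca in cas if not ca),
        "live": sum(len(ca) for ca in cas),
    }


if __name__ == "__main__":
    print(simulate_without_reclaim(total_cas=10, records_per_ca=4, keep_last=4))
```

In this model, a 10-CA data set that keeps only its four newest records runs out of space with nine of its ten CAs empty. Setting `keep_last=0` reproduces the systems programmer's complaint: every CA is allocated and none holds a record.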
CA reclaim to the rescue
CA reclaim solves the problem by maintaining a chain of index CIs that own empty CAs. VSAM hangs the list off of index CI2, whose position may vary depending on the number of index levels and the size of the data set.
When VSAM detects an empty CA, it removes the index CI pointing to it from the sequence set chain and adds it to the free chain. Other index CIs will be added to the chain as CAs become empty. Note that higher-level index CIs may be added to the free chain as well if all the sequence set CIs they point to are free.
As an application adds new records, VSAM will get an empty CA from the free chain and re-insert the index CI into its place in the sequence set chain instead of going to the end of the data component. Not only does this save space, but it improves sequential performance and means fewer periodic re-orgs.
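The effect of the free chain can be sketched with a similar toy model in Python. It makes the same simplifying assumptions (fixed key slice per CA, ascending keys, records deleted after `keep_last` inserts) and represents the free chain as a simple queue of CA numbers. The real index structures, including CI2 and the higher-level index CIs, are not modeled, and all names are hypothetical.

```python
from collections import deque


def simulate_with_reclaim(total_cas, records_per_ca, keep_last, inserts):
    """Toy model of CA reclaim: emptied CAs go on a free chain and are reused."""
    active = {}            # key-range slot -> live keys (the sequence set chain)
    owner = {}             # key-range slot -> physical CA number
    free_chain = deque()   # physical CAs that have been emptied and reclaimed
    next_unused = 0        # high-water mark of never-used CAs

    for key in range(inserts):
        slot = key // records_per_ca
        if slot not in active:
            if free_chain:                    # prefer a reclaimed CA
                ca = free_chain.popleft()
            elif next_unused < total_cas:     # otherwise extend toward the end
                ca = next_unused
                next_unused += 1
            else:
                raise RuntimeError("data set full")
            owner[slot] = ca
            active[slot] = set()
        active[slot].add(key)

        expired = key - keep_last             # application deletes an old record
        if expired >= 0:
            old = expired // records_per_ca
            active[old].discard(expired)
            if not active[old]:               # CA is empty: move it to the free chain
                del active[old]
                free_chain.append(owner.pop(old))

    return {
        "high_water": next_unused,
        "free": len(free_chain),
        "live": sum(len(keys) for keys in active.values()),
    }


if __name__ == "__main__":
    print(simulate_with_reclaim(total_cas=10, records_per_ca=4,
                                keep_last=4, inserts=1000))
```

Under these assumptions, a thousand inserts never touch more than two physical CAs, whereas the same workload without a free chain would have needed 250.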
Of course, recovery is a big concern, as CA reclamation involves a lot of processing and I/Os during which any number of things can go wrong. According to IBM, if an ABEND occurs in the middle of a reclaim, VSAM will do its best to complete the operation. If the best effort fails, the reclamation algorithm ensures the KSDS will not be broken, although there may be some orphaned space along with some IDC* messages about the cluster's status.
Enabling CA reclaim is a little tricky because it can be activated at several different levels with different defaults. By default, reclaim is off at the system level. At the data class and data set level, the default is on. Therefore, installations will need to do some planning and coordination before activating this feature.
At the system level, changing the IGDSMS* CA_RECLAIM parameter to DATACLAS turns reclamation on. There are two ways to manage CA reclaim at the data set level. The data class CA Reclaim attribute applies to groups of data sets and does not take effect until a cluster is deleted and re-defined. The IDCAMS ALTER command (with RECLAIMCA or NORECLAIMCA) can toggle reclamation at the data set level and becomes effective upon open.
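One way to reason about how the levels interact is the small Python sketch below. It rests on an assumption the text implies but does not spell out: reclaim runs for a data set only when the system-level setting allows it and the data set's own attribute is on, with that attribute copied from the data class at define time and later changeable by ALTER. The class and function names are invented for illustration and are not an IBM interface.

```python
from dataclasses import dataclass


@dataclass
class DataSet:
    name: str
    reclaim_attr: bool   # attribute stored with the data set


def define_cluster(name, data_class_reclaim=True):
    """Define time: the data set inherits the data class attribute (default on)."""
    return DataSet(name, data_class_reclaim)


def alter_reclaim(data_set, enabled):
    """Stand-in for an IDCAMS ALTER that toggles the data set attribute."""
    data_set.reclaim_attr = enabled


def reclaim_active(system_enabled, data_set):
    """Assumed rule: both the system level and the data set must say yes."""
    return system_enabled and data_set.reclaim_attr


if __name__ == "__main__":
    ds = define_cluster("PROD.ORDERS.KSDS")    # hypothetical data set name
    print(reclaim_active(False, ds))           # system default is off
    print(reclaim_active(True, ds))            # system switched on
    alter_reclaim(ds, False)                   # opt this data set out
    print(reclaim_active(True, ds))
```

Read this way, the out-of-the-box defaults mean nothing is reclaimed until the system-level switch is flipped, at which point every eligible KSDS participates unless someone has explicitly opted it out.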
IBM updated other VSAM utilities as well. EXAMINE will report the number of empty CAs in a KSDS to help a user decide if CA reclaim would be useful. It also issues new messages if it finds an incomplete CA reclamation. LISTCAT will show you whether CA reclaim is active as well as how many times it's happened. After years of neglect, VERIFY now gets the RECOVER parameter, which completes an interrupted reclaim.
CA reclaim isn't the best choice for every data set. Data sets whose applications insert and delete records at random will not see any benefit. On the other hand, data sets where a range of keys is deleted but not subsequently re-inserted will see the biggest boost. Applications that delete records in narrow key ranges, only to re-insert them later, may actually see a performance hit. Also, note that CA reclamation is not retroactive. Even when it is switched on, data sets with existing orphan CAs will not lose them until someone re-orgs the data set.
Finally, be sure to have all the compatibility PTFs applied if you're going to mix z/OS releases in the same Sysplex. Failure to do so may result in data integrity problems.
ABOUT THE AUTHOR: For 24 years, Robert Crawford has worked off and on as a CICS systems programmer. He is experienced in debugging and tuning applications and has written in COBOL, Assembler and C++ using VSAM, DLI and DB2.
This was first published in September 2010