In the age of virtualization, where storage performance is critical, a mainframe operation can get new life thanks to an old technique. Data striping aims to reduce device contention and increase I/O parallelism by spreading sequences of adjacent blocks across several volumes, which allows more I/Os to run simultaneously across multiple devices. In this tip, we’ll discuss the IBM software implementation of
Data striping basics
To understand the benefits of data striping, it’s worth examining how this technique works. Consider Figure 1, which shows an ordinary cluster where VSAM puts control intervals (CIs), the basic unit of VSAM I/O, in sequence inside of a control area (CA). In this case, the CAs — such as CA1, CA2 and so on — consist of 15 tracks aligned on a direct access storage device (DASD) cylinder. If the cluster extends to another volume, VSAM follows the same scheme of laying the CIs — such as C46, C47, C48 and so on — in order in the CA.
Figure 1 – Conventional disk organization places adjacent blocks on the same disk volume.
There are a couple of potential bottlenecks inherent in this structure. For instance, to process the data sequentially, the system must queue I/O against one device, one track at a time. Direct access may also queue if two online processes request different CIs that reside on the same track.
Now consider the implications of Figure 2 showing a dataset striped across four volumes.
Figure 2 – Data striping reorganizes CIs to allow parallel disk I/O for better storage performance.
In this example of data striping, VSAM logically extends the CA across four volumes instead of restricting it to a cylinder on a single volume. Additionally, VSAM interleaves CIs round-robin style across the devices: C1 on volume 1, C2 on volume 2, and so on. If the dataset expands, VSAM will allocate four new extents and preserve the striping structure across the four volumes.
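The round-robin placement described above can be sketched in a few lines of Python. This is only an illustration of the interleaving idea, not VSAM's actual internal layout; the function name `locate_ci` and the notion of a per-volume "slot" are hypothetical.

```python
def locate_ci(ci_number: int, stripes: int) -> tuple[int, int]:
    """Map a 1-based CI number to (volume, slot) under round-robin striping.

    Both 'volume' and 'slot' are 1-based. 'slot' is a hypothetical name
    for the CI's position within its stripe on that volume.
    """
    if ci_number < 1 or stripes < 1:
        raise ValueError("ci_number and stripes must be positive")
    index = ci_number - 1
    volume = index % stripes + 1   # cycle through the volumes in order
    slot = index // stripes + 1    # advance one slot after each full cycle
    return volume, slot


if __name__ == "__main__":
    # Show where the first eight CIs land on a four-volume stripe set.
    for ci in range(1, 9):
        volume, slot = locate_ci(ci, stripes=4)
        print(f"C{ci}: volume {volume}, slot {slot}")
```

With four stripes, C1 through C4 land on volumes 1 through 4, and C5 wraps back to volume 1 in the next slot.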
The striped arrangement improves sequential processing because the system can initiate parallel I/Os against multiple volumes simultaneously. Striping also helps keyed access, because an online application can access data on one volume without waiting for I/O to complete on another.
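A toy model makes the sequential benefit concrete. It assumes each volume services one I/O unit at a time, that all volumes work fully in parallel, and that nothing else (channels, cache, control unit processors) gets in the way. Real systems will not be this tidy; the function name and the `ms_per_unit` parameter are hypothetical.

```python
import math


def sequential_read_time(units: int, stripes: int, ms_per_unit: float = 1.0) -> float:
    """Estimate elapsed time to read 'units' I/O units striped over 'stripes' volumes.

    Idealized assumption: in each round, every volume delivers one unit in
    parallel, so the number of rounds is ceil(units / stripes).
    """
    if units < 0 or stripes < 1:
        raise ValueError("units must be >= 0 and stripes must be >= 1")
    rounds = math.ceil(units / stripes)
    return rounds * ms_per_unit


if __name__ == "__main__":
    # Compare a single volume against two, four and sixteen stripes.
    for stripes in (1, 2, 4, 16):
        print(stripes, "stripe(s):", sequential_read_time(100, stripes), "ms")
```

Under these assumptions, elapsed time shrinks roughly in proportion to the stripe count, which is the best case the technique can offer.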
VSAM also supports the concept of “layers,” which IBM defines as the volumes that will participate in an “I/O packet.” In effect, a layer is a set of dataset extents that share the same key range of records. When a dataset expands, the new extents may end up on different volumes than the primary allocation. Knowing which volumes each layer occupies may allow VSAM to optimize its I/O further.
A customer can stripe nearly every kind of VSAM cluster except for alternate indexes (AIXs). IBM also restricts striping to the data component of a cluster.
VSAM supports up to 16 stripes. IBM warns that, as a result, a CA may stretch over 16 tracks instead of the traditional 15 and therefore contain more CIs. For a key sequenced dataset (KSDS), this means that the index CIs that point to the data blocks (the index sequence set) may not have enough room to reference all the CIs in a CA. This shows up as unused CIs at the end of the CA and some wasted space. To fully utilize the space, the storage administrator must override the default index CI size with something larger.
Defining a striped cluster
Striped clusters must be managed by the Storage Management Subsystem (SMS). A combination of data class and storage class attributes defines how a cluster is striped, as summarized in Table 1 below.
| Dataset Name Type | Guaranteed Space | SDR | Number of Stripes |
| EXTended | Yes | > 1 | # of volumes in allocation list, up to 16 |
Table 1 – How a cluster is striped depends on data and storage class attributes
First, a striped cluster must be defined in a data class with the dataset name type set to EXT for extended format. After setting the EXT attribute in the data class, the storage class controls the details about striping.
In the storage class, there are two ways to enable striping, depending on whether the “guaranteed space” attribute is set. If guaranteed space is off, one can choose striping by setting the sustained data rate (SDR) value greater than one. VSAM divides the SDR by four to get the number of stripes. Thus, a storage class SDR of 12 creates clusters with three stripes. Additionally, when VSAM allocates the dataset, it distributes the primary allocation evenly across all the volumes.
If guaranteed space is enabled, any SDR greater than zero will cause VSAM to allocate a number of stripes equal to the count of volumes in the storage class's allocation list, up to 16. In contrast to non-guaranteed space, VSAM creates the cluster with the full primary space on each volume.
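The two storage class cases can be summarized in a short Python sketch. It follows only the rules stated above; the rounding behavior (integer division of the SDR, a minimum of one stripe, rounding the per-volume share up) is an assumption, and the names `plan_stripes` and `primary_tracks` are hypothetical.

```python
import math

MAX_STRIPES = 16  # upper limit on stripes, as stated in this tip


def plan_stripes(guaranteed_space: bool, sdr: int,
                 volume_count: int, primary_tracks: int) -> tuple[int, int]:
    """Return (stripe_count, primary_tracks_per_volume) for a storage class.

    Assumptions not confirmed by the source: SDR is divided with integer
    division, at least one stripe always results, and the per-volume
    share of the primary allocation is rounded up.
    """
    if guaranteed_space:
        if sdr <= 0 or volume_count < 1:
            return 1, primary_tracks              # no striping requested
        stripes = min(volume_count, MAX_STRIPES)
        per_volume = primary_tracks               # full primary on each volume
    else:
        stripes = min(max(sdr // 4, 1), MAX_STRIPES)
        per_volume = math.ceil(primary_tracks / stripes)  # primary is split
    return stripes, per_volume


if __name__ == "__main__":
    print(plan_stripes(False, 12, 0, 300))  # the SDR-of-12 example from the text
    print(plan_stripes(True, 1, 4, 300))    # guaranteed space, four volumes
```

Note the practical difference in space: with guaranteed space on, a four-stripe cluster in this sketch reserves four times the primary quantity, whereas the non-guaranteed case divides a single primary quantity among the stripes.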
The usefulness of VSAM data striping
Today, RAID technology combined with the enormous caches found on modern DASD frames has all but eliminated the disk “hot spots” that used to bedevil mainframe performance when too many active datasets landed on the same device. IBM also introduced parallel access volumes (PAVs), which basically clone device control blocks to allow simultaneous access to DASD units. With dataset placement now less critical, one might wonder if access-method-level striping is still relevant.
I would argue it is, because bottlenecks still exist within DASD frames. As one example, processors inside the DASD frame that interface with the I/O channels may become too busy servicing the link to respond promptly. In addition, optimization algorithms in DASD frame microcode may favor large data blocks and sequential processing, typical of batch, over direct I/O and smaller blocks, thus impacting online performance. In both of these cases, a striped dataset on carefully placed volumes can keep any one interface processor from becoming too busy and can spread online data across volumes so it spends less time competing with batch.
Second, I/O performance is still sensitive to application data access patterns. For instance, an online application might update a relatively small cluster dozens of times a second and create bottlenecks both inside and outside the DASD frame. Striping might help in this situation because the interleaved CIs spread across several volumes put less strain on any one path through the I/O infrastructure.
Thus, striping at the access method level should still be considered one more tool for managing mainframe performance.
Robert Crawford has been a systems programmer for 29 years. While specializing in CICS technical support, he has also worked with VSAM, DB2, IMS and assorted other mainframe products. He has programmed in Assembler, Rexx, C, C++, PL/1 and COBOL. The latest phase in his career finds him an operations architect responsible for establishing mainframe strategy and direction for a large insurance company. He lives and works with his family in south Texas.
This was first published in January 2012