It's better than a heap but not as glamorous as a database. IBM created it in the 70's but didn't fix it for another...
ten years. Despite its age and simplicity it still holds mountains of the world's data. I'm talking about Virtual Storage Access Method (VSAM) datasets. Because VSAM is still important to this day, this column will cover some VSAM basics as groundwork for a later column about VSAM performance.
Basic VSAM datasets come in three flavors: Key Sequenced datasets (KSDS), Entry Sequenced datasets (ESDS) and Relative Record datasets (RRDS). In my experience RRDS's are the least used. An RRDS consists of preformatted, fixed-length slots that may or may not have records in them. RRDS's can be processed sequentially or directly through a relative record number (RRN). Note that when a record is erased the slot still holds its place in the dataset, unlike a KSDS, where VSAM reclaims the empty space.
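To make the slot idea concrete, here is a minimal Python sketch. It is a toy in-memory model, not a VSAM API: the ToyRRDS class, its method names and the 1-based RRN numbering are assumptions I made for illustration.

```python
class ToyRRDS:
    """Toy in-memory model of an RRDS (hypothetical, not a VSAM interface)."""

    def __init__(self, slot_count, record_length):
        self.record_length = record_length
        # Preformatted slots; None marks a slot with no record in it.
        self.slots = [None] * slot_count

    def _check(self, rrn):
        # Assumption: relative record numbers start at 1.
        if not 1 <= rrn <= len(self.slots):
            raise IndexError(f"RRN {rrn} is outside the dataset")

    def write(self, rrn, record):
        self._check(rrn)
        if len(record) != self.record_length:
            raise ValueError("records must be exactly record_length bytes")
        self.slots[rrn - 1] = record

    def read(self, rrn):
        self._check(rrn)
        return self.slots[rrn - 1]

    def erase(self, rrn):
        # The slot is emptied but keeps its place; later RRNs do not shift.
        self._check(rrn)
        self.slots[rrn - 1] = None


rrds = ToyRRDS(slot_count=4, record_length=4)
rrds.write(1, b"AAAA")
rrds.write(2, b"BBBB")
rrds.write(3, b"CCCC")
rrds.erase(2)
```

After the erase, slot 2 is empty, the record written at RRN 3 is still found at RRN 3, and the dataset still has four slots.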
An ESDS is essentially a sequential dataset where new records are always inserted at the end. You cannot access records directly with a key. However, you may access specific records with a relative byte address (RBA), if you have it or can calculate it. In fact, it's fairly common to see CICS applications use an ESDS as a log file, writing records in sequential order while retrieving each new record's RBA from the RIDFLD operand of the EXEC CICS WRITE command. If the application squirrels away the RBA it can later retrieve the log record directly. Also note that you cannot physically delete a record from an ESDS. Instead, most applications use a "logical delete" scheme where a field in the ESDS record is set to a value indicating the record is no longer valid.
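The append, squirrel-away-the-RBA and logical-delete pattern can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the ToyESDS class is hypothetical, the record layout (one status byte, a two-byte length, then the payload) is made up, and a real ESDS also carries control information that shifts the actual RBAs.

```python
class ToyESDS:
    """Toy append-only log addressed by byte offset (hypothetical model)."""

    ACTIVE = ord("A")
    DELETED = ord("D")

    def __init__(self):
        self.data = bytearray()

    def append(self, payload):
        # New records always go at the end; the offset plays the role of the RBA.
        rba = len(self.data)
        self.data.append(self.ACTIVE)
        self.data += len(payload).to_bytes(2, "big")
        self.data += payload
        return rba

    def read(self, rba):
        status = self.data[rba]
        length = int.from_bytes(self.data[rba + 1:rba + 3], "big")
        payload = bytes(self.data[rba + 3:rba + 3 + length])
        return status, payload

    def logical_delete(self, rba):
        # Nothing is removed; only the status field in the record changes.
        self.data[rba] = self.DELETED


log = ToyESDS()
first_rba = log.append(b"debit 10.00")
second_rba = log.append(b"credit 5.00")
size_before = len(log.data)

log.logical_delete(first_rba)
status, payload = log.read(first_rba)
```

The saved first_rba still finds the record after the logical delete. The payload is intact, the status byte now says "deleted", and the log is exactly as long as it was before.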
KSDS records can be retrieved directly with a key. The key is specified when you define the dataset and can be up to 255 bytes at any offset into the record. A KSDS is made up of two components, the data and an index. The data component contains, the, er, data. The index contains the keys and pointers into the data component. Note that a KSDS uses a sparse index, which assumes the records are in key order within the data blocks. That is why you can't load a KSDS with records that are out of order. The data and index components are linked together in a logical entity known as a cluster. Processes usually refer to the cluster name to use a KSDS. Unlike an ESDS or RRDS, a KSDS physically removes deleted records and reclaims the space.
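Here is a rough Python sketch of why a sparse index depends on key order. It is a toy model, not how VSAM lays out its index records: the ToyKSDS class, the three-records-per-block size and the choice to index each block by its highest key are assumptions for illustration.

```python
from bisect import bisect_left


class ToyKSDS:
    """Toy data component plus sparse index (hypothetical model)."""

    def __init__(self, records_per_block=3):
        self.records_per_block = records_per_block
        self.blocks = []  # data component: blocks of (key, record) pairs
        self.index = []   # sparse index: one entry per block, its highest key

    def load(self, records):
        last_key = None
        for key, record in records:
            # The index only works if keys arrive in ascending order.
            if last_key is not None and key <= last_key:
                raise ValueError(f"key {key!r} is out of sequence")
            if not self.blocks or len(self.blocks[-1]) == self.records_per_block:
                self.blocks.append([])
                self.index.append(key)
            self.blocks[-1].append((key, record))
            self.index[-1] = key  # keep the block's highest key in the index
            last_key = key

    def get(self, key):
        # Find the first block whose highest key is >= the search key...
        pos = bisect_left(self.index, key)
        if pos == len(self.index):
            return None
        # ...then scan only that block.
        for k, record in self.blocks[pos]:
            if k == key:
                return record
        return None


ksds = ToyKSDS()
ksds.load((f"A{n:03d}", f"rec{n}") for n in range(1, 8))
found = ksds.get("A005")
```

Seven records produce only three index entries, and a lookup touches one block. If the records were not in sequence, the "highest key per block" entries would be meaningless, which is why the load rejects them.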
In addition to this, VSAM also supports alternate indexes (AIX), which can provide a couple of things. First, an AIX supplies alternate keys into a KSDS. For example, you could have a KSDS whose primary key is social security number with an AIX supporting a secondary key of last name. Second, an AIX can also allow keyed direct access into an ESDS. In the case of a credit card application, the ESDS would be a sequential log of financial transactions with an alternate index over the account numbers. This enables any application to get at transactions by time or account number.
In physical structure the AIX is a lot like a KSDS where the key is the secondary key and the data is an RBA into the target dataset. As you can guess from the examples I gave above, an AIX must be a dense index, as the alternate keys will not be in sequence in the base cluster. There may also be duplicate keys in the target dataset, meaning the key value in the AIX may be followed by as many RBA's as will fit. The four-byte RBA means that you cannot define alternate indexes for extended addressability VSAM datasets, which can grow beyond the old 4 GB limit.
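A dense alternate index with duplicate keys can be pictured with a small Python sketch. It follows the credit card example above; the sample records, the RBA values and the build_aix helper are all hypothetical, and a plain dictionary stands in for the KSDS-like structure.

```python
from collections import defaultdict

# Hypothetical base ESDS in arrival order: (rba, account_number, transaction)
base = [
    (0, "4000-22", "debit 10.00"),
    (64, "4000-11", "debit 25.00"),
    (128, "4000-22", "credit 5.00"),
]


def build_aix(base_records):
    """Build a dense index: every base record gets a pointer under its key."""
    aix = defaultdict(list)
    for rba, account, _txn in base_records:
        # Duplicate alternate keys simply collect more RBAs.
        aix[account].append(rba)
    # Keep the alternate keys in sequence, as a keyed dataset would.
    return dict(sorted(aix.items()))


aix = build_aix(base)

# Follow the pointers for one account back into the base dataset.
by_rba = {rba: txn for rba, _account, txn in base}
txns = [by_rba[rba] for rba in aix["4000-22"]]
```

The base records stay in arrival order, the index keys are in account order, and account 4000-22 carries two RBAs because it appears twice in the log.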
As you might imagine, all this fabulous alternate access isn't cheap. AIX's can be defined as upgrade sets, which means any update to the base cluster will be reflected in the alternate index in the same unit of work, potentially causing one I/O to turn into many. If you have, for example, an ESDS with two alternate indexes defined for it, one write to the ESDS will cause one I/O to each alternate index, possibly more than that if the AIXs' indexes need to be updated as well.
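The arithmetic is simple enough to put in a back-of-the-envelope Python function. This is only a counting sketch under my own assumptions (one I/O per component touched, no buffering or caching effects), and the function name and parameters are hypothetical.

```python
def writes_per_base_update(upgrade_aix_count, aix_index_updates=0):
    """Estimate write I/Os for one update to a base cluster.

    Assumes one write to the base, one to each AIX in the upgrade set,
    plus any extra writes when an AIX's own index must change.
    """
    if upgrade_aix_count < 0 or aix_index_updates < 0:
        raise ValueError("counts cannot be negative")
    return 1 + upgrade_aix_count + aix_index_updates


# The example above: an ESDS with two alternate indexes.
best_case = writes_per_base_update(upgrade_aix_count=2)
with_index_splits = writes_per_base_update(upgrade_aix_count=2, aix_index_updates=2)
```

Under these assumptions the single application write becomes three I/Os at best, and five if both alternate indexes also have to update their own index components.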
Ultimately you define a logical construct called a path to bind the AIX together with its base cluster. Applications can use this path just as if it were a KSDS by referring to the path name. Together, the paths, AIX's and base cluster are referred to as a sphere. For a relatively simple access method, you can see that even these VSAM basics get a little complicated. Next month I'll go into more detail about VSAM structure and performance.