Direct access storage is cheap and getting cheaper. New switches and fiber networks extend the limits of I/O system throughput. Between RAID and enormous cache sizes, data positioning hardly matters anymore. So why did IBM recently design a hardware data compression feature?
IBM's hardware compression offering, zEnterprise Data Compression (zEDC), shows that data compression is still valuable to many mainframe shops. Disk storage, which is comparatively cheap for data centers, is still a limited resource in need of close management. Storage fabric throughput and cache have shrunk I/O wait times, but today's Internet world often requires shaving every extra millisecond off of e-commerce transactions.
IBM's zEDC is a hardware feature that fits into the Peripheral Component Interconnect Express (PCIe) drawers of the company's z13 and zEC12 GA2 mainframe processors. It reduces data set size and, by extension, lessens clutter on the direct access storage device (DASD) farm. Compressed records spend less time on the storage fabric, which increases overall throughput and reduces batch elapsed times. Because zEDC compresses the data offstage, deflating data sets doesn't significantly increase central processor (CP) use, and it lessens memory occupancy.
It requires a type of data set, extended format version 2, introduced with operating system z/OS 2.1. A qualifying Central Electronics Complex can sport up to eight zEDCs. Up to 15 logical partitions can share one of these hardware compression features.
zEDC uses an industry-standard data compression format built on the Lempel-Ziv and Huffman coding algorithms. The algorithm is considered dictionaryless because the compression dictionaries are included in the data stream. IBM implemented support for hardware compression with modules compatible with the zlib software library. Because the format is an industry standard, IBM z Systems can pass compressed files to other platforms, where they will be successfully inflated. The zlib implementation makes compression available to high-level programming languages as well as through the standard Java package java.util.zip.
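To make the zlib compatibility point concrete, here is a minimal sketch of a compress-and-inflate round trip using Python's standard zlib module. It runs entirely in software and does not invoke zEDC; the sample records are invented for illustration. It only shows that a zlib stream carries everything needed to inflate it, which is why a file deflated on one platform can be inflated on another.

```python
import zlib

# Invented sample data: repetitive records, loosely resembling a sequential master file.
records = b"".join(b"CUSTOMER%06d,ACTIVE,2015-01-01\n" % i for i in range(1000))

# Deflate the data (Lempel-Ziv matching plus Huffman coding) into a zlib stream.
compressed = zlib.compress(records, 6)

# Inflate it again. No separate dictionary is supplied: the stream is self-describing.
restored = zlib.decompress(compressed)

print("original bytes:  ", len(records))
print("compressed bytes:", len(compressed))
print("round trip ok:   ", restored == records)
```

Any zlib-compatible library, including java.util.zip, should be able to inflate the same stream.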
IBM doesn't replace its CP's hardware compression implementation entirely with zEDC; the mainframe divides the labor based on size. CPs handle short pieces of data (e.g., a DB2 database column), while zEDC digests the large ones. Routing large blocks to zEDC allows the mainframe to compress data in bulk without significantly increasing general CP usage and, by extension, Monthly License Charge software expenses. A processor without access to a zEDC feature can decompress a data set. The catch is that expensive CP cycles perform the inflation and the data set is rewritten uncompressed.
Both types of compression requests can use the same application programming interface, which decides where to forward the request.
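The size-based division of labor can be pictured with a short sketch. This is purely illustrative: the function name, the threshold value and the returned labels are all hypothetical and do not reflect IBM's actual interface or cutoff.

```python
# Hypothetical cutoff for illustration only; not an IBM-documented value.
HYPOTHETICAL_THRESHOLD_BYTES = 4096


def choose_engine(payload_size, zedc_available,
                  threshold=HYPOTHETICAL_THRESHOLD_BYTES):
    """Return which engine would handle a compression request in this sketch.

    Large payloads go to the zEDC feature when one is available;
    everything else stays on the central processor (CP).
    """
    if zedc_available and payload_size >= threshold:
        return "zEDC"
    return "CP"


print(choose_engine(200, zedc_available=True))        # short column value
print(choose_engine(5_000_000, zedc_available=True))  # large sequential block
print(choose_engine(5_000_000, zedc_available=False)) # no feature installed
```

The caller issues the same request in every case; only the routing decision behind the interface differs.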
Should you use zEDC hardware compression?
Purchasing zEDC data compression hardware requires some planning. One zEDC feature may be enough for some IT organizations, but IBM recommends two to maintain availability. More cautious companies will want multiple zEDCs for every processor as well as at least one for the disaster recovery site.
Currently, zEDC supports sequential (BSAM and QSAM) data sets as well as System Management Facilities (SMF) log streams. IBM also touts using zEDC for Hierarchical Storage Manager (HSM) migrated and backup data sets, as well as log archives from the IMS database and transaction manager and the DB2 relational database management system. IMS and DB2 archive processing will be of particular interest for enterprises moving to tape-on-DASD storage. The uses for HSM and database log archiving are particularly compelling because both types of data can be huge and, by definition, are rarely read after archiving. Compressing large sequential files also makes sense if your IT organization has a lot of long-running batch jobs that process large, sequential master files.
Data sets must be under System Managed Storage, IBM's policy-based storage management, to use zEDC. DASD administrators enable zEDC by creating or updating a data class with the compaction attribute set to zEDC Preferred (ZP) or zEDC Required (ZR). IBM made similar global options available in the Storage Management Subsystem parameter library member IGDSMSxx. ZR fails data set allocation requests if a zEDC feature isn't available. ZP is a little more forgiving. To qualify for compression, the data set must also meet a minimum space allocation: five megabytes, or eight megabytes if it has no secondary allocation.
Capacity and storage planners can use new fields in the SMF type 74 subtype 9 records written by Resource Measurement Facility to measure the benefits of the feature. SMF type 14 and 15 records contain bits indicating zEDC use and fields allowing the calculation of a data set's compression ratio.
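The arithmetic behind a compression ratio is simple. The sketch below assumes you have already extracted a data set's uncompressed and compressed byte counts from the SMF records; the function and parameter names are hypothetical and do not correspond to actual SMF field names.

```python
def compression_ratio(uncompressed_bytes, compressed_bytes):
    """Ratio of original size to compressed size (4.0 means 4:1)."""
    if compressed_bytes <= 0:
        raise ValueError("compressed_bytes must be positive")
    return uncompressed_bytes / compressed_bytes


def space_saved_percent(uncompressed_bytes, compressed_bytes):
    """Percentage of DASD space saved by compression."""
    if uncompressed_bytes <= 0:
        raise ValueError("uncompressed_bytes must be positive")
    return 100.0 * (1 - compressed_bytes / uncompressed_bytes)


# Invented figures: a 1,000 MB data set that occupies 250 MB after compression.
print(compression_ratio(1000, 250))    # 4.0
print(space_saved_percent(1000, 250))  # 75.0
```

Tracking both numbers per data set over time gives planners a rough view of which workloads benefit most.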
If you don't know whether your mainframe operations will benefit from zEDC, IBM supplies the z Systems Batch Network Analyzer (zBNA). The zBNA tool plows through the mainframe's SMF data looking for possible compression candidates.