Of all the utilities that the mainframe makes available, those used for backup are typically the most useful. To understand why, let's go back briefly to the basics of backup, on the mainframe and in general.
There are two ways to recover from a system crash: rollback/roll-forward (which uses a transaction log) and backup/recovery. In rollback/roll-forward, you keep track of a stream of transactions in a log; to recover after a crash, you "roll back," or undo, transactions until you reach a "stable" state (one that contains no invalid or inconsistent data; a backup, for example, holds a stable state), then "roll forward" committed transactions until you get as close as possible to the state of the system at the time of the crash. In backup/recovery, you periodically take a snapshot of the state of the system, including all data, and store it in backup storage (typically tape or disk). To recover, you copy the latest usable snapshot (backup version) back into the system.
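The roll-back/roll-forward idea can be sketched in a few lines. This is a toy illustration of the recovery principle, not any specific mainframe facility: state is a dictionary, the log records each transaction's writes, and the `recover` helper and its names are hypothetical.

```python
# Toy sketch of rollback/roll-forward recovery. Start from the last
# stable snapshot (the backup), then roll forward only transactions
# known to have committed; in-flight work is rolled back by omission.

def recover(stable_state, log, committed):
    """Rebuild system state after a crash from a snapshot plus a log."""
    state = dict(stable_state)          # begin at the stable backup state
    for txn_id, key, new_value in log:  # replay the log in order
        if txn_id in committed:         # roll forward committed changes
            state[key] = new_value
        # uncommitted changes are skipped, i.e. effectively rolled back
    return state

snapshot = {"balance": 100}
log = [(1, "balance", 150),   # txn 1 committed before the crash
       (2, "balance", 999)]   # txn 2 was in flight and must not survive
print(recover(snapshot, log, committed={1}))  # {'balance': 150}
```

The closer the snapshot and log reach to the moment of the crash, the less processing is lost -- which is exactly the trade-off the next paragraphs describe.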
The aim of the game is to simultaneously minimize the processing lost because of the crash -- by backing up, recovering and/or rolling forward as close as possible to the time of the crash -- and minimize the performance overhead resulting either from backup/logging while the system is online or from taking the system offline to do backup (the backup window).
Since the 1960s, the amount of storage -- and hence the amount of data that needs to be backed up -- has grown very rapidly. By some estimates, the total amount of computer storage in the world has now reached a zettabyte. Brute-force improvements in backup processing power and parallelism simply cannot keep up with this rate of increase. Moreover, over the same period, the requirement for systems to be online has moved from weekdays to 24/7; in many cases, backup windows are simply not permissible. And finally, shrinking IT budgets have meant that users cannot simply buy more disk, tape and processors to keep up with increased backup demands.
Today, in order to meet these needs, backup strategies must be both comprehensive and fine-grained. They must be comprehensive -- covering as much storage as possible -- so that tweaks to "win the game" are as successful and cost-effective as possible. They must be fine-grained -- dividing the data into as many "cases," such as structured, semi-structured and unstructured data, as possible -- so that each case can be optimized.
The mainframe does well in providing comprehensiveness. Its ability to handle hundreds of virtual machines and very large amounts of disk and tape storage means that it usually outpaces other platforms in its ability to apply one coordinated backup strategy to a large part of an enterprise's data. However, in order to achieve fine-grained control over backup, IT must to some extent adopt a "do-it-yourself" approach by tuning mainframe utilities to the particular mix of data and business transactions of a given organization. And that's why mainframe backup utilities are so important.
Mainframe backup utilities are often associated with "batch" jobs. In the mainframe's early days, users discovered that operations like backup could be deferred until the end of the day and then applied to a data store all at once in a batch, without interruption by other processes. This meant extremely fast processing of the task; it also meant that no other process could change the data while it was being backed up. As a result, every weeknight and weekend became a backup window, in which systems were taken offline (i.e., communication with users was severed in order to avoid interruption by other processes) to carry out backup, among other tasks. These backup utilities have since been modified for 24/7 processing so that they can run online, adjusting to interruptions by other processes. At the same time, storage and database vendors have implemented their own backup schemes that must be coordinated with the new online/offline mainframe backup utilities. A glance at a guide to mainframe utilities suggests that all a backup utility does is make a simple offline copy of storage. Beneath the surface, however, each backup utility performs a delicate balancing act to handle different cases, online processing and coordination with other backup processes.
With this in mind, let's take a brief look at IEBCOMPR and IEBCOPY. At first, it's not obvious why a utility like IEBCOMPR, which checks whether two data sets are identical, has anything to do with backup. However, a fine-grained, low-performance-overhead backup solution follows the principle of "if it ain't broke, don't fix it" -- in this case, if it hasn't changed, don't back it up. IEBCOMPR, therefore, can play a critical role in determining whether data on the online system is the same as the backup data in backup storage, and hence whether it needs to be backed up at all. Another, faster way of doing this is to check the system log to see whether any transactions have changed the data, but in many cases the system log does not have that information. So a smart administrator will use IEBCOMPR as a scalpel rather than an axe, carving out those files where file comparison typically yields major savings in backup copying and where system-log information isn't available.
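The "if it hasn't changed, don't back it up" decision can be sketched as follows. This is a hedged illustration of the idea, not IEBCOMPR's actual interface: the `needs_backup` and `fingerprint` helpers are hypothetical, and comparing content hashes stands in for the utility's record-by-record comparison.

```python
# Sketch of compare-before-backup: only data sets whose live copy
# differs from the existing backup copy (or that have never been
# backed up) get copied again.
import hashlib

def fingerprint(data):
    """Cheap stand-in for a record-by-record comparison."""
    return hashlib.sha256(data).hexdigest()

def needs_backup(live, backed_up):
    if backed_up is None:            # never backed up: must copy
        return True
    return fingerprint(live) != fingerprint(backed_up)

assert needs_backup(b"payroll v2", b"payroll v1") is True   # changed
assert needs_backup(b"payroll v1", b"payroll v1") is False  # unchanged: skip
assert needs_backup(b"new data set", None) is True          # no prior backup
```

The comparison itself costs I/O, which is why the article's "scalpel, not axe" advice matters: run it only where skipped copies typically repay the cost of comparing.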
IEBCOPY is likewise more than the simple file-copy program it appears to be on the surface. The key here is IEBCOPY's ability to "merge" partitioned data sets. While the actual copy from the online system to backup storage is taking place, IEBCOPY squeezes out redundant data between the input data sets -- for example, writing only one copy of a member that appears in both -- so that a smaller amount of data lands on the backup disk or tape. In other words, after IEBCOMPR or a look at the system log has eliminated the data that does not need to be backed up at a broad level (the file), IEBCOPY eliminates the data that does not need to be backed up at a fine-grained level (the member or block). Combined, the two can achieve major reductions in the amount of backup needed.
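The merge step can be sketched in miniature. This is an illustration of redundancy elimination during a copy, in the spirit of merging partitioned data sets -- the dict-of-members model and the `merge_copy` helper are assumptions for the sketch, not IEBCOPY's actual data format or behavior.

```python
# Sketch of a merging copy: two inputs (here, dicts mapping member
# name to content) become one backup image, with only one copy kept
# of any member that appears in both. The primary input wins on a
# name collision, mimicking "newest copy replaces older copy".

def merge_copy(primary, secondary):
    merged = dict(secondary)   # start with the secondary input's members
    merged.update(primary)     # primary's members replace same-named ones
    return merged

live   = {"PAYROLL": "v2", "REPORTS": "v1"}
backup = {"PAYROLL": "v1", "ARCHIVE": "v1"}
image = merge_copy(live, backup)
# three members stored instead of four raw copies
print(sorted(image))   # ['ARCHIVE', 'PAYROLL', 'REPORTS']
```

The saving compounds with the compare-before-copy step: one filter works at the level of whole files, the other within them.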
Of course, the administrator can use these reductions in many ways. In the short term, IT can increase the amount of storage covered by backup, shrink the backup window or move more backup processing online. In the long term, IT can defer processor or disk acquisition and use the money saved to improve system performance -- say, by buying higher-performance solid-state disk. Whatever IT does, mainframe backup utilities like IEBCOPY and IEBCOMPR have an important place in the administrator's repertoire.
ABOUT THE AUTHOR: Wayne Kernochan is president of Infostructure Associates, an affiliate of Valley View Ventures. Infostructure Associates aims to provide thought leadership and sound advice to vendors and users of information technology. This document is the result of Infostructure Associates-sponsored research. Infostructure Associates believes that its findings are objective and represent the best analysis available at the time of publication.
What did you think of this feature? Write to SearchDataCenter.com's Matt Stansberry about your data center concerns at [email protected].