Information lifecycle management (ILM) is now a widely accepted method of managing the storage hierarchy more effectively. It has its greatest effect on users of enterprise systems. ILM forces those with major expenditures on mainframes and mainframe storage to rethink and redesign their architectures and processes for greater effectiveness.
But ILM is more than this. By simply refocusing vendors and users alike on information rather than storage, it is causing a seismic shift in the computer industry -- one with potential benefits well beyond the storage area.
First, some background. ILM deals with data at a different level: as records and files stored on disk and tape. It aims to optimize the use of that data by classifying it (tiering) and grouping the data in each classification (pooling). The most common classification scheme is by the operations performed on the data: active changeable (users can change the data), then active archived (users can access but not change it), then deep archived (users will rarely if ever access it).
ILM metadata allows policy-driven management of the data at a business level rather than a storage level. Thus, users may query the ILM management software to find, for example, all data classified as active archived and needed for SOX compliance. Moreover, ILM metadata can have strong positive effects on policy-driven information management and on the performance and cost-effectiveness of storage. To understand why, note that there are really four levels in the storage hierarchy today, not two:
- Online disk, relatively costly and high-performance.
- Nearline disk, relatively inexpensive but lower-performance.
- Nearline or midline tape, used for slightly slower backup/recovery, less expensive than nearline disk.
- Offline tape, used when recovery is rarely if ever needed, less expensive than nearline tape.
Table 1: Today's storage hierarchy
|Storage level|Operation type|Data protection solutions used|
|Online|Active changeable|Continuous data protection (CDP) or snapshots|
|Nearline|Active archived and active changeable|Virtual or real tape library|
|Midline|Active changeable|Real tape library|
|Offline|Deep archived|Storage media, such as tape, outside the I/O path|
Source: Mesabi Group and Infostructure Associates, March 2006
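To make the idea of querying by business classification concrete, here is a minimal sketch in Python. It transcribes Table 1 as a lookup and runs a query like the SOX example against a tiny catalog. The `CatalogEntry` record, its field names, the sample entries, and the `find` and `levels_for` helpers are all hypothetical illustrations, not the interface of any actual ILM product.

```python
from dataclasses import dataclass, field

# Table 1 transcribed as a lookup: storage level -> operation types held
# there and the data protection solution used.
STORAGE_HIERARCHY = {
    "online": {"operations": {"active changeable"},
               "protection": "CDP or snapshots"},
    "nearline": {"operations": {"active archived", "active changeable"},
                 "protection": "virtual or real tape library"},
    "midline": {"operations": {"active changeable"},
                "protection": "real tape library"},
    "offline": {"operations": {"deep archived"},
                "protection": "storage media outside the I/O path"},
}


@dataclass
class CatalogEntry:
    """Hypothetical metadata record; the field names are illustrative only."""
    name: str
    classification: str
    tags: set = field(default_factory=set)


def levels_for(classification):
    """Return the storage levels that Table 1 lists for a classification."""
    return [level for level, row in STORAGE_HIERARCHY.items()
            if classification in row["operations"]]


def find(catalog, classification, tag):
    """Return names of entries matching a classification and a business tag."""
    return [entry.name for entry in catalog
            if entry.classification == classification and tag in entry.tags]


# Made-up sample catalog for illustration.
catalog = [
    CatalogEntry("ledger_2004.db", "active archived", {"SOX"}),
    CatalogEntry("orders_current.db", "active changeable", {"SOX"}),
    CatalogEntry("training_video.mpg", "deep archived"),
]

print(find(catalog, "active archived", "SOX"))
print(levels_for("active archived"))
```

The point of the sketch is that the query is phrased in business terms (classification plus compliance tag), and the storage level falls out of the lookup rather than being specified by the user.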
Studies by Infostructure Associates personnel suggest that by creating the right proportion of nearline to online disk, a user can achieve 90% of the performance of online disk at about half the cost. By creating the right proportion of all four levels, a user can more than double the speed of backup/recovery compared with using only online disk and offline tape, with a cost increase of less than 10%. In other words, effective ILM delivers far better storage price/performance, with no sacrifice in robustness or flexibility.
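As a rough check on what these figures imply for price/performance, the arithmetic can be sketched as follows. The helper name `price_performance_gain` is hypothetical, and the inputs are simply the approximate figures quoted above (90% of the performance at about half the cost; at least double the backup/recovery speed at up to 10% more cost), each taken relative to a baseline of 1.0.

```python
def price_performance_gain(relative_performance, relative_cost):
    """Performance per unit cost, relative to a baseline of 1.0."""
    return relative_performance / relative_cost


# Disk mix: about 90% of the performance at about half the cost.
disk_gain = price_performance_gain(0.90, 0.50)

# Four-level mix: at least 2x backup/recovery speed at no more than 1.10x cost.
backup_gain = price_performance_gain(2.0, 1.10)

print(f"disk mix: {disk_gain:.2f}x")
print(f"four-level mix: at least {backup_gain:.2f}x")
```

Under these assumptions, both cases work out to roughly 1.8 times the baseline price/performance.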
Archiving through a new lens
At the same time, ILM does require careful rethinking of how an enterprise architecture's storage (and especially an enterprise system's storage) is put together. In the past, users have tended to think of storage as having three functions: ongoing computing, recovery/disaster recovery, and archiving (long-term storage in case the data is needed again). ILM makes users think of storage as a repository of data that has a lifecycle, moving inexorably from birth to old age. In this model, ongoing computing and archiving deal with young and old data, respectively, with recovery/disaster recovery acting as insurance against loss of both. As a result, the key question in designing a storage architecture is not "how do I meet robustness goals cheaply?" but "how do I determine when data moves to the next stage of its lifecycle?"
ILM divides the storage infrastructure into two: active changeable storage and active archive storage. Online and Midline pools of storage are for production data (with data protection added in). Nearline retains its data protection origins in tape automation solutions, but also adds Nearline disk (which, while gaining acceptance, is still not typical today). Offline remains storage for data protection data.
In other words, in order to implement ILM, the user must classify existing data according to its stage in the lifecycle, reallocate each level of storage so that it can optimize price/performance (with recovery/disaster recovery) for the data in each stage, and provide management software specific to each stage. More specifically, the user should provide new management software for the Active Archive stage, with new data protection and data retention capabilities, and with much more metadata gathered and managed. All of this is a lot less onerous than it sounds, because the user can choose the pace at which ILM is implemented — but it is a fundamentally new way of looking at storage.
To determine which stage in the lifecycle the data falls in, an ILM solution can collect metadata about the type of data (unstructured, semi-structured, structured), the age of the data, or how recently it was used. The least-recently-used criterion keeps the most recently touched data on fast disk, on the theory that this data is most likely to be used again soon. The user may also want to collect metadata based on other criteria, such as how business-critical the data is (thus, we might store video on a different disk array than business-critical relational data for an ERP application). Much of this metadata is not readily available within existing storage management software. As a result, effective ILM should tap into data in other software: the operating system as it keeps track of files, databases (e.g., DB2) and data management systems (e.g., IMS and VSAM) as they keep track of business-critical data, and now content management systems and Web sites as they keep track of semi-structured and unstructured graphics and videos.
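The classification step described above can be sketched as a small policy function. This is only an illustration in Python: the `DataItem` record, its field names, the two day-count thresholds, and the pooling rule are all assumptions made for the example, and a real policy would be set by the business.

```python
from dataclasses import dataclass


@dataclass
class DataItem:
    """Hypothetical metadata gathered about one file or record set."""
    name: str
    kind: str                # "structured", "semi-structured", "unstructured"
    age_days: int            # days since the data was created
    days_since_access: int   # recency of last use
    business_critical: bool = False


# Hypothetical thresholds, chosen only for illustration.
ARCHIVE_AFTER_DAYS = 90
DEEP_ARCHIVE_IDLE_DAYS = 365


def lifecycle_stage(item):
    """Assign a lifecycle stage from age, recency of use, and criticality."""
    idle_long = item.days_since_access >= DEEP_ARCHIVE_IDLE_DAYS
    if idle_long and not item.business_critical:
        return "deep archived"
    if item.age_days >= ARCHIVE_AFTER_DAYS:
        return "active archived"
    return "active changeable"


def pool(item):
    """Group data within a stage; business-critical data gets its own pool."""
    return "critical" if item.business_critical else item.kind


# Made-up sample items.
erp = DataItem("erp_orders", "structured", 10, 0, business_critical=True)
report = DataItem("q1_report", "semi-structured", 120, 30)
video = DataItem("launch_video", "unstructured", 400, 380)

for item in (erp, report, video):
    print(item.name, "->", lifecycle_stage(item), "/", pool(item))
```

Keeping the stage decision separate from the pooling decision mirrors the distinction between tiering and pooling: the first says where in the lifecycle the data sits, the second says which data is grouped together within that stage.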
The role of the mainframe and System z9 in ILM
Because effective ILM needs data about data that storage cannot provide, and because the mainframe is the place where storage is largest and most important, the mainframe has a critical role to play in effective ILM. Mainframes can supply much of the metadata needed by ILM. Likewise, mainframes can support the management software necessary to run active archiving, and good mainframe-architecture design can ensure that the data management and file management systems on a mainframe distinguish between active changeable and active archived data — and optimize performance on each.
Right now, z9 and overall IBM capabilities supporting ILM include optimized storage environments with tiered storage platforms, policy-based retention management software, content and records management applications, and non-erasable, non-rewritable media. In other words, the storage, the metadata administration, and the integrated platform are there, but the active-archiving management software, the tailored metadata repository, and the mainframe fine-tuning for ILM are not fully fleshed out.
At the same time, it should be noted that no one (including EMC) is clearly further along the ILM path than IBM. Moreover, IBM has some notable potential strengths in enterprise-system ILM, including the ability to support a global metadata repository (e.g., for master data management), as pointed out in a previous commentary.
ILM must deal not with physical data stored on disk and tape, but rather with information — i.e., with the IT and business meaning of the data. So, in order to provide an effective ILM solution, storage vendors must concern themselves not only with managing data throughout its lifecycle (hence systems management), but also with supporting unstructured data (hence content management) and with understanding the importance of each datum to the enterprise (hence the need for a repository of "metadata").
Consider, for example, EMC. A few years ago, EMC got most if not all of its revenue from storage hardware. Today, as it moves towards ILM, EMC gets 37% of its revenue from software and 17% from services, and much of the service revenue is not for storage support, or not for storage alone. To put it another way:
- The majority of EMC's revenue now is not for storage hardware, but for software and for services related to software.
- The fastest growing parts of EMC's revenue are for software and related services, and of these, the fastest growing parts are for software and services not focused on storage.
- More than 10% of EMC's revenue is not storage-related (a conservative estimate).
So if EMC is no longer just a storage hardware company, or even just a storage company, what is it becoming? Answer: an information company. A similar picture emerges when we look at IBM or other vendors. Necessarily, ILM means treating data at the level of information, not just storage. And that means that vendors and users need to begin to plan to integrate storage, data management, and other information-related activities into an overall information architecture.
By the way, when we look at storage vendors' business compliance solutions, we see a similar pattern. In order to allow e-discovery of saved email and documents, these solutions must handle semi-structured email and document data (hence content management), provide new ways of combining archiving and disaster recovery (hence systems management), and make it easy for lawyers to access key data quickly (hence the need for a repository of metadata). We might also note that storage consolidation services need systems management capabilities, not merely storage management; and storage disaster recovery services need systems management and ILM capabilities, not merely replication.
ILM might seem to be a storage-only topic, divorced from mainframes and System z9 — but in the long run, it is not. ILM offers major improvements in price/performance and scalability, with more rapid backup/recovery for smaller downtime windows. However, effective ILM also requires rethinking the storage architecture (dividing it into active changeable, active archived, and deep archived), and amassing new metadata and software from outside the storage subsystem.
As a result, the lines between storage and information management, and between storage and systems vendors, are blurring. IBM is well positioned to take advantage of these new trends, and bring the benefits to its customers — but the game is only beginning. At its end, storage suppliers who have acquired and integrated the necessary software will triumph — but not as storage suppliers.
That, in turn, will blur the lines of responsibility in the enterprise between the storage expert and other IT personnel. The proactive IT department can begin by reassessing the storage architecture, but should then go on to consider how to focus its efforts and improve its expertise in such areas as a global metadata repository. There is upheaval ahead, but the game is worth the candle.
About the author: Kernochan is the president of Infostructure Associates, LLC.