Today, data handling in the average organization receives an unusual amount of attention. Data quality is receiving new scrutiny and master data management products seek to improve the consistency of multiple customer data records. Information integration is now considered necessary to extend querying to new data sources and more timely operational data. Easier-to-use data analysis tools in the form of "business intelligence for the masses" have begun to gain traction.
Most recently, the increased interest in event processing indicates that timely information delivery is seen as a competitive advantage. However, these aspects of data handling are rarely seen as steps in a process of converting a constant inflow of data into information useful to the recipient. Even rarer are comparative assessments of the importance of each of these aspects to data usefulness.
I define data usefulness as the ability to deliver accurate, consistent and appropriate data to the right user in a timely fashion. My recent assessment of the state of data usefulness in the average enterprise convinces me that there are significant and growing problems at every point in the process of converting data into useful information.
Table 1 shows my take on the typical steps in the data-delivery process, the metrics by which the effectiveness of each step should be judged and the problems that many are seeing today at each step. The key takeaway point is that fixes to one or two steps will not fix the overall data usefulness problem in the long run. Rather, organizations need to take a comprehensive, long-term approach to ensuring data usefulness.
As the premier data server, the mainframe is or should be the focus of efforts to improve data usefulness. So what should proactive IT strategists do to improve data usefulness on mainframes? And how will mainframe data usefulness improvements affect overall enterprise information delivery (keeping in mind that there is a lot of key data that the mainframe will never contain)?
Table 1: The data delivery cycle
|Step||Criterion||Metric||Problem seen today|
|Data entry||Accuracy||Percent of data items with errors||Majority of businesses report more than 15% of items with errors|
|Data consolidation||Consistency||Number of data items with multiple records and no master record||Majority of businesses report more than half their data inconsistent|
|Data aggregation||Scope||Percent of data sources on which a cross-data source query can be performed||Majority of businesses report they can't run cross-database queries on more than two-thirds of company data|
|Information targeting||Fit||Percent of time that delivered data is not appropriate to end user||Majority of businesses report that data delivered to executives is inappropriate more than 60% of the time|
|Information delivery||Timeliness||Time taken to deliver (entry to arrival on screen) to average user||Majority of businesses report a week or more average time to deliver|
|Information analysis||Analyzability||Percent of time user can't immediately do online analysis of data received||Majority of businesses report they can't do immediate online analysis more than half the time|
|Process adjustment||Agility||Percent of new outside data sources not available within half a year||Majority of businesses report more than three-quarters of relevant new Web information is not made available inside the company within half a year|
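To make the metrics in Table 1 concrete, here is a minimal Python sketch of how three of them (accuracy, consistency and timeliness) might be computed from a sample of records. The `DataItem` class, its field names and the sample values are all hypothetical illustrations, not taken from any product or survey; a real assessment would draw these flags from data quality and delivery logs.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class DataItem:
    """One stored record; every field name here is a hypothetical illustration."""
    entity_id: str          # the real-world thing (e.g. a customer) the record describes
    has_error: bool         # flagged by some data quality check
    is_master: bool         # True if this record is the designated master record
    days_to_deliver: float  # time from entry to arrival on a user's screen


def percent_with_errors(items):
    """Accuracy metric: percent of data items with errors."""
    return 100.0 * sum(item.has_error for item in items) / len(items)


def entities_without_master(items):
    """Consistency metric: entities with multiple records and no master record."""
    by_entity = defaultdict(list)
    for item in items:
        by_entity[item.entity_id].append(item)
    return sum(
        1
        for records in by_entity.values()
        if len(records) > 1 and not any(r.is_master for r in records)
    )


def average_days_to_deliver(items):
    """Timeliness metric: average time from entry to arrival on screen."""
    return sum(item.days_to_deliver for item in items) / len(items)


# Made-up sample data, for illustration only.
items = [
    DataItem("cust-1", has_error=False, is_master=False, days_to_deliver=9.0),
    DataItem("cust-1", has_error=True, is_master=False, days_to_deliver=7.0),
    DataItem("cust-2", has_error=False, is_master=True, days_to_deliver=2.0),
    DataItem("cust-2", has_error=False, is_master=False, days_to_deliver=6.0),
]

print(percent_with_errors(items))      # 25.0
print(entities_without_master(items))  # 1
print(average_days_to_deliver(items))  # 6.0
```

Measuring all the steps from the same sample, rather than one step in isolation, is what allows the comparison across steps that the table is meant to support.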
Mainframe data usefulness state of the art
If one looks at the mainframe in isolation, it has a relatively good record for data usefulness because it is primarily focused on structured data. This data -- the operational and accounting data that business-critical applications ingest -- is typically more centralized and less siloed, has more years of stress-testing data entry, and is a frequent depository of the results of data center, server and database consolidation. However, this relative effectiveness at information delivery is primarily the result of applying the mainframe to what it's good at; it does nothing to improve the overall data usefulness of the enterprise. For example, 80% (and increasing) of the important data in a typical organization is semi-structured (documents, emails) and unstructured (video, audio, graphics, pictures of documents) -- and much of this data is highly distributed, much of it presently outside the purview of the mainframe.
In the past, a reflexive answer to increasing data fragmentation has been to try to impose a standard database or data architecture over the enterprise's entire computing resources, preferably one that is centered around the mainframe or high-end servers. It is now generally recognized that these strategies will never entirely succeed. Database migration has proven difficult; the number of database vendors continues to increase, as does the amount of unstructured data that never flows to the data warehouse; and frequent mergers and acquisitions that "paper over" differences in data handling continue to create new "data archipelagoes." For example, a European grocer that grew by acquisition found a couple of years ago that it had several hundred distinct and conflicting customer records and that, practically speaking, these were being added to faster than any consolidation effort could migrate them to a central repository.
It's becoming clear that the answer involves the concept of an information hub: One central site that keeps track of all the data, coordinates information delivery and directs business-critical transactions, but that does not physically store or process most of that data centrally. Table 2 shows the long-term solutions to each data usefulness problem that are emerging, and how they relate to the idea of an information hub.
Table 2: Using the hub to help improve data usefulness
|Problem||Long-term solution||Data hub tie-in|
|Accuracy||Global standards||Hub enforces standards|
|Consistency||Master data records, global metadata repository||Hub stores key (customer, product, supplier) master records, repository|
|Scope||Cross-data source metadata and transactions||Hub stores global metadata repository, possibly data warehouse cache|
|Fit||Role/expert-based access rights and security||Hub drives roles into enterprise security scheme|
|Timeliness||Event-driven business processes and service-oriented architecture||Hub serves as message-flow director and enforces business rules in event handling|
|Analyzability||"Business intelligence for the masses"||Hub allows inclusion of data warehouse information in non-data miner querying|
|Agility||Business strategy that emphasizes reaching out for information||Hub supports automated inclusion of new data sources and data types in the enterprise data architecture|
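The central idea of the hub, that it keeps track of the data and directs requests without physically storing most of the data itself, can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not a description of any IBM product: the `InformationHub` class, its method names and the two in-memory "systems" are all hypothetical.

```python
class InformationHub:
    """Tracks where data lives and routes requests; it holds metadata, not the data."""

    def __init__(self):
        # Global metadata catalog: data set name -> callable that fetches
        # a record from the system that actually owns the data.
        self._catalog = {}

    def register_source(self, dataset, fetch):
        """Add a data source to the catalog (the 'agility' step in Table 2)."""
        self._catalog[dataset] = fetch

    def query(self, dataset, key):
        """Route the request to the owning source and return its answer."""
        if dataset not in self._catalog:
            raise KeyError(f"no source registered for {dataset!r}")
        return self._catalog[dataset](key)

    def cross_source_query(self, datasets, key):
        """Ask several sources about the same key and combine the answers."""
        return {name: self.query(name, key) for name in datasets}


# Two hypothetical remote systems, simulated here with in-memory dictionaries.
crm_system = {"cust-1": {"name": "A. Shopper"}}
billing_system = {"cust-1": {"balance": 42.50}}

hub = InformationHub()
hub.register_source("crm", crm_system.get)
hub.register_source("billing", billing_system.get)

result = hub.cross_source_query(["crm", "billing"], "cust-1")
print(result)
```

In this sketch the hub owns only the catalog, so adding a new data source is one `register_source` call rather than a migration, which is the design property the table's "Agility" row depends on.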
As today's mainframe takes on more of the role of a hub, it is surprisingly ready to support these long-term solutions as they arrive. The emphasis in recent years on support for computing standards makes it relatively straightforward to coordinate multiple operating systems' enforcement of international data format standards that go well beyond EDIFACT and XML. IBM master data management solutions and services are now available on the z10, as are Ascential-based global repository capabilities. Coordination of Resource Access Control Facility and Unix/Linux security together with support for all major databases and cross-data source access controls -- including IMS data management -- is well along, although role/expert-based information delivery is not yet fleshed out. The mainframe fully supports IBM's recent emphasis on event-driven architectures, although enterprise use of the mainframe as an Enterprise Service Bus hub is not yet typical. The addition of Cognos and its popular business-intelligence tools allows queries (by those tools alone or combined with the Ascential/Information Integrator technology) that include but are not bound by the data warehouse and structured data, which means that the mainframe's present use as a data warehouse can be extended more readily to the masses. Only in the area of including new Web content is the mainframe not clearly distinguishable from other platforms, and IBM efforts to deliver dynamic infrastructure and on-demand information will improve the mainframe's reputation in that area over the next year.
It's easy to paint a rosy picture of the potential of the mainframe to improve data usefulness. In fact, the picture is darker than one might think. Yearly growth of 50% to 60% in the amount of data stored means that incremental improvements in one or two aspects of data usefulness -- or even all aspects -- may fail to improve overall data usefulness, expressed as the percent of data that the end user could find useful, or that he or she does use effectively. Moreover, even supposing that the mainframe gets it right in the next year and offers a full spectrum of improvements in every aspect of its data handling functions, much of the data at many organizations is out there on PC client machines and server farms, not to mention at loosely linked partners and suppliers.
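A small worked example shows how growth can swamp an incremental improvement. The Python sketch below uses the midpoint of the growth range above; every other number (the starting usefulness, the size of the improvement and the lower usefulness of newly arrived data) is an assumption chosen purely for illustration, not a measured figure.

```python
def overall_useful_percent(existing_units, existing_useful_pct, new_units, new_useful_pct):
    """Percent of all stored data that is useful after new data arrives."""
    useful = existing_units * existing_useful_pct / 100 + new_units * new_useful_pct / 100
    return 100 * useful / (existing_units + new_units)


existing = 100          # units of data stored today (arbitrary scale)
growth_pct = 55         # midpoint of the 50% to 60% yearly growth cited above
new = existing * growth_pct / 100

before = 40             # assumed: percent of existing data that is useful today
after_fix = 45          # assumed: the same data after an incremental improvement
new_data_useful = 20    # assumed: newly arrived data starts out less useful

result = overall_useful_percent(existing, after_fix, new, new_data_useful)
print(round(result, 1))  # 36.1, down from 40 despite the improvement
```

Under these assumptions the improvement lifts existing data from 40% to 45% useful, yet the overall figure still falls to about 36%, because the new data arrives faster than it is made useful.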
The way to think about improving data usefulness in the short term is with a comprehensive, all-aspects effort aimed at cost effectiveness (by using existing solutions in an information hub) and "good enough" improvements that are better than those of one's competitors. If a competing organization takes six milliseconds to act on arbitrage opportunities that you react to in seven, that's a clear competitive disadvantage; but if you reduce your reaction time from five milliseconds to three, additional benefits are less clear cut. Therefore, a data usefulness strategy is more important for the competitive disasters it avoids than for the (significant) benefits it offers. Mainframers looking to implement such a strategy should accept that improvements will take time, and that there will continue to be data usefulness problems no matter how savvy the implementation.
That said, today's data usefulness problems are very important. Things can be improved and the mainframe can play a key role in those improvements. The keys to turning potential into reality are, first, an honest look at your organization using the metrics I cited above; second, a comprehensive, long-term plan for making things better; and third, a corporate willingness to accept that each quick-hit master customer record coordinated from a mainframe partition using zIIP does not solve the overall problem, and there is more still to do.
ABOUT THE AUTHOR: Wayne Kernochan is president of Infostructure Associates, an affiliate of Valley View Ventures. Infostructure Associates aims to provide thought leadership and sound advice to vendors and users of information technology. This document is the result of Infostructure Associates-sponsored research. Infostructure Associates believes that its findings are objective and represent the best analysis available at the time of publication.