During the last three years, enterprises have sought for the first time to make sense of the phrase "real-time enterprise." The idea is to create IT infrastructure that enables any member of the organization, at any time, to detect key new events (occurring inside or outside the enterprise) and react to them immediately with the appropriate decision and action.
It turns out that this is easier said than done. However, there is a new combination of technologies that can reverse the trend to data feudalization -- a combination that Infostructure Associates calls the virtual operational store (VOS). What is a VOS? How can it help? And how can DB2 play a role in implementing a VOS?
The Problem with Present Real-Time Enterprise Strategies
While definitions of real-time enterprise (RTE) abound, they have in common these characteristics:
- RTE supports on-demand access to key information.
- Effective use of this on-demand information means competitive advantage for companies such as Wal-Mart and Dell that seem to achieve it.
- To become an RTE, companies must speed up the mission-critical business processes (and decision support/business intelligence capabilities) that access this information.
In other words, the prescription for becoming an RTE is seductively simple:
- Identify the information and business processes that are mission- or business-critical and for which improvements in speed will give you a competitive advantage.
- Speed them up.
Just like that.
But this RTE strategy is surprisingly likely to result in long-term failure:
- No database or data-access mechanism can deliver instantaneous access to the terabytes and even petabytes of mission-critical and business-critical information in the average or large enterprise. As the failure of the data warehouse proves, the situation is getting worse, not better: while storage size at Fortune 2000 enterprises doubles every two years or less, major-database speed improves by only about 40% in roughly the same period.
- The proliferation of in-house and extra-enterprise data sources that can be used for competitive advantage means that the faster the IT executive runs to include new operational data sources in the RTE, the more new data sources there are to probe. This means the enterprise is falling farther and farther behind the ideal RTE.
Meanwhile, RFID has begun to loose a flood of new time-critical data on Wal-Mart and its competitors, not to mention increasing use of data from extra-enterprise sources such as information from suppliers or customers, or even Web services that provide real-time information such as commodity prices, interest rates, or even weather.
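The scaling mismatch in the first point above can be made concrete with a little arithmetic. The sketch below takes the article's figures at face value (storage doubling every two years, database speed improving by 40% over the same period) and assumes, purely for illustration, that the time needed to work through all of the data scales as data volume divided by database speed. The function name and parameters are illustrative, not taken from any product.

```python
def relative_scan_time(years: float,
                       storage_growth: float = 2.0,
                       speed_growth: float = 1.4,
                       period_years: float = 2.0) -> float:
    """Time to process all enterprise data, relative to today (today = 1.0).

    Assumes processing time is proportional to data volume / database speed.
    """
    periods = years / period_years
    return (storage_growth / speed_growth) ** periods


for years in (2, 4, 6, 10):
    print(f"after {years:2d} years: {relative_scan_time(years):.2f}x today's time")
```

Under these assumptions the gap widens by a factor of about 1.43 every two years, which compounds to nearly 6x over a decade.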
By creating a virtual operational store (VOS), IT avoids the major problems with a typical RTE strategy:
- By accessing a much smaller VOS database, IT ensures almost-real-time response to data-mining queries, customer-facing updates, and business-process-speeding complex transactions.
- By semi-automating the addition of new data sources and data types to the VOS, IT can keep pace with the fast-rising demands of rapidly growing data sources.
The Nature of a Virtual Operational Store
A VOS is a relatively small amount of fast-access operational data that allows the system to deliver, on average, 90% of the performance speed of an ideal on-demand cross-enterprise database.
VOS harks back to the idea of virtual memory. Virtual memory provides a cache of fast main memory to front a much larger (by 1-3 orders of magnitude) disk storage area. When the processor in a computer accesses main memory and finds that the data it needs is not there, it places some of the data in main memory on disk (swaps it out), accesses the disk to get the data it needs, loads that data into main memory (swaps it in), and then proceeds. By using sophisticated rules of thumb, the computer can ensure that the data it needs is usually in memory, so that it performs at 90% or more of the speed it would achieve if all of the disk storage were main memory instead. The old joke about virtual memory is to cup your empty hands and say "I have in my hands ten terabytes of virtual memory" because, effectively, the system acts as if all of storage were main memory.
In the same way, a virtual operational store aims to ensure that, most of the time, key data that is likely to be accessed on demand is available in the data store and can be retrieved at speeds comparable to those of a major popular database or faster, because an operational data store (ODS) is optimized for just such a situation. To achieve this, the VOS uses rules of thumb for swapping in and swapping out similar to the virtual-memory approach:
- Age (or Currency). The older the data, the less likely it is to be accessed for on-demand purposes; often, week-old data is unlikely to be accessed again except in rare circumstances. The typical enterprise without a VOS determines this ad hoc, intermittently, and occasionally; the VOS determines this semi-automatically, flexibly, and continually.
- Locality of Reference. The RTE typically uses data in some data sources far more than others. The typical enterprise without a VOS sets up a data warehouse, which uses these data sources exclusively, comprehensively, and permanently. The VOS keeps track of usage patterns and adjusts accordingly, thus using all data sources as necessary, using only the data needed, and adapting dynamically as the patterns change.
- Least Recently Used. Not all data is used once and then forgotten; some (such as financial data) may be used frequently for at least a year or two. The enterprise without a VOS typically does not archive data based on least recent use; the VOS tracks least recent use to avoid constantly swapping out data on the basis of its age alone.
- Data fixing. Sometimes, data must be rapidly available forever. For example, data to enable systems to get up and running quickly after a disaster, although rarely used and often quite old, needs to be rapidly available. The enterprise without a VOS typically solves this problem separately for each different data source; the VOS can data fix this information to remain in the data store until further notice, thus centralizing this type of key data and allowing readjustment over time.
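The way these rules of thumb interact can be sketched as a small swap-out policy. This is a minimal illustration under stated assumptions, not a description of any shipping product: the class and field names are hypothetical, times are plain numbers (days), and locality of reference is left out to keep the example short. Data is swapped out only if it is not data fixed; data that is both old and long unused goes first, so that old but still active data (the financial-data case) stays in the store.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class Entry:
    value: Any
    created_at: float      # when the data was produced (drives "age")
    last_used: float       # last access time (drives "least recently used")
    pinned: bool = False   # "data fixing": never swap this entry out


class VosCache:
    """Toy cache combining age, recency of use, and data fixing."""

    def __init__(self, capacity: int, max_age: float) -> None:
        self.capacity = capacity
        self.max_age = max_age
        self.entries: Dict[str, Entry] = {}

    def get(self, key: str, now: float) -> Optional[Any]:
        entry = self.entries.get(key)
        if entry is None:
            return None                # miss: caller would go to the source
        entry.last_used = now          # record the access
        return entry.value

    def put(self, key: str, value: Any, created_at: float, now: float,
            pinned: bool = False) -> None:
        if key not in self.entries and len(self.entries) >= self.capacity:
            self._swap_out(now)
        self.entries[key] = Entry(value, created_at, now, pinned)

    def _swap_out(self, now: float) -> None:
        candidates = {k: e for k, e in self.entries.items() if not e.pinned}
        if not candidates:
            raise RuntimeError("cache holds only data-fixed entries")
        # Old data is a swap-out candidate only if it is also long unused.
        stale = {k: e for k, e in candidates.items()
                 if now - e.created_at > self.max_age
                 and now - e.last_used > self.max_age}
        pool = stale or candidates
        victim = min(pool, key=lambda k: pool[k].last_used)
        del self.entries[victim]


cache = VosCache(capacity=3, max_age=7.0)
cache.put("recovery_config", "...", created_at=0.0, now=0.0, pinned=True)
cache.put("ledger_2005", "...", created_at=0.0, now=0.0)
cache.put("old_orders", "...", created_at=0.0, now=0.0)
cache.get("ledger_2005", now=9.0)                      # old but still in use
cache.put("todays_prices", "...", created_at=10.0, now=10.0)
print(sorted(cache.entries))
```

In this run the store is full when the fourth item arrives, and the item swapped out is the one that is both older than a week and unused for more than a week. The data-fixed entry and the old but recently used ledger both stay.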
In order to carry out its tasks, a VOS must therefore include:
- All of the key features of an ODS. These would include the ability to perform queries and updates rapidly, to scale, to import data from a wide variety of mission-critical and business-critical databases rapidly (with data transformation and cleansing), and to support access from key business applications such as data mining and reporting. They might also include the ability to synchronize with mission- and business-critical databases via two-phase commit, to export data to these databases, and to synchronize metadata with these databases.
- A global metadata repository that updates and expands semi-automatically. The repository would support a flexible data model that adapts dynamically to changes in the enterprise's data and is not constrained, for example, by relational schema definitions. This repository would be partially imported from, and refreshed automatically from, other databases' data dictionaries, and would contain information on the data's age, other locations, recent use, and whether it should be data fixed. The repository should allow data to be indexed in a way that allows business users to get the information they need. Ideally, a VOS will provide fully granular access to key information stored in business documents as well as relational databases. XML is a key enabling technology for reaching this goal, because every piece of data can be labeled for easy access (and indexing). Several EII suppliers offer most of these capabilities.
- Cross-database and composite-application development (and, if possible, administration) tools. Some ODSs and most Enterprise Information Integration (EII) suppliers now provide these development tools. These days, Web service development support is especially useful.
- Cross-data-source querying and updates. In some cases, it makes the most sense for the VOS not to load the data into the cache data store before carrying out a query. In these cases, most EII suppliers allow querying across a wide variety of data sources containing structured, semi-structured, and unstructured data types. Many EII suppliers also allow access to information outside the enterprise, such as data from trading partners, supply chain information, and data derived from Web services such as stock ticker data or even eBay prices.
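To make the division of labor among these components concrete, the following sketch pairs a metadata repository with a query path that answers from the cache store when it can and goes to the owning data source when it cannot. Everything here is hypothetical: the class names, the record fields (modeled on the age, location, recent-use, and data-fixing information described above), and the two stand-in sources, which are simple functions rather than real database or Web service connectors.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class MetadataRecord:
    """One repository entry describing a piece of data."""
    name: str                        # business-facing label for the data
    locations: List[str]             # data sources that hold the data
    age_days: float = 0.0
    last_used_days_ago: float = 0.0
    data_fixed: bool = False         # keep in the cache store until further notice


Fetcher = Callable[[str], str]       # hypothetical stand-in for a source connector


class VirtualOperationalStore:
    def __init__(self, sources: Dict[str, Fetcher]) -> None:
        self.sources = sources
        self.repository: Dict[str, MetadataRecord] = {}
        self.cache: Dict[str, str] = {}

    def register(self, record: MetadataRecord,
                 cached_value: Optional[str] = None) -> None:
        self.repository[record.name] = record
        if cached_value is not None:
            self.cache[record.name] = cached_value

    def query(self, names: List[str]) -> Dict[str, str]:
        """Answer from the cache where possible, else from the first listed source."""
        answer: Dict[str, str] = {}
        for name in names:
            record = self.repository[name]
            record.last_used_days_ago = 0.0      # note the access in the metadata
            if name in self.cache:
                answer[name] = self.cache[name]
            else:
                answer[name] = self.sources[record.locations[0]](name)
        return answer


# Hypothetical sources: an in-house CRM database and an external price feed.
vos = VirtualOperationalStore({
    "crm": lambda name: f"crm:{name}",
    "price_feed": lambda name: f"feed:{name}",
})
vos.register(MetadataRecord("customer_42", ["crm"]), cached_value="cached:customer_42")
vos.register(MetadataRecord("copper_price", ["price_feed"]))
result = vos.query(["customer_42", "copper_price"])
print(result)
```

The caller issues one query and does not need to know which items were already in the store and which had to be fetched from outside it. That is the point of putting the repository in front of both.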
State of the Art in VOSs
Where is the technology to support the VOS and the virtual RTE? It certainly is not in today's major data-warehouse solutions alone. These are often reaching their scaling limits and proliferating, with no clear solution in sight (despite the promise of grid computing). Moreover, these solutions are relational (optimized for structured data) and do not adapt easily to handling the increasing percentage of unstructured and semi-structured operational data, nor to storing XML's complex data types.
The technologies that offer real promise of solving upcoming RTE problems are EII tools and ODSs -- that is, an EII solution and an ODS specifically designed to handle a wide range of operational data, to scale, and to add new data sources and types semi-automatically. Separately, these two technologies have already proven themselves to be highly useful to the savvy enterprise. Together, they can deliver all the benefits of the separate products plus VOS capabilities that will enable long-term enterprise success in achieving the virtual real-time enterprise.
The Benefits of a VOS for the Real-Time Enterprise
The first benefit of a VOS is that it makes the real-time enterprise far more feasible. Therefore, a VOS should typically enable all the key benefits of an RTE:
- Rapid response to changes in the environment for competitive advantage and a better profit margin, e.g., tweaking prices locally and hourly, identifying new opportunities based on changing market conditions, adjusting the supply chain rapidly to new orders, and detecting and reacting quickly to new regulations, competitive challenges, and disasters. In particular, the enterprise should see faster and more flexible reporting and analysis, complementing traditional BI.
- Speed-up of business processes to cut costs and increase customer satisfaction, because information from each stage of a process is supplied to the next more quickly (i.e., information latency decreases).
- Increased ability to leverage proprietary information for competitive advantage. Today's IT strategist, seeking to increase competitive advantage, can focus primarily on the application resources or the data resources of the enterprise. Until recently, it made most sense to focus on applications. However, these are increasingly easy for competitors to duplicate, whereas proprietary information is not. Proprietary information is typically locked in widely differing and often aged data sources; a VOS makes that information more widely available.
Second, a VOS delivers significant side-effect benefits. For example:
- Today's storage efforts focus particularly on Information Lifecycle Management (ILM), attempting to move data from device type to device type (e.g., nearline disk to nearline tape) based on changing requirements for that data. Storage systems necessarily treat this data aging in a very crude and broad-brush manner, because they do not have fine-grained and up-to-date information about the likely uses of each piece of data in the near future. A VOS provides a centralized information base that ILM storage systems can mine for insights into how useful various data sources and files will be to the organization in the near future, and therefore when they should be aged.
- A VOS can fit seamlessly into an enterprise's existing IT infrastructure, and leave existing databases intact and unaffected.
Third, if a VOS includes EII and ODS capabilities, it can deliver all the benefits that these technologies have already provided in the real world.
VOS and MDM
The smart reader will have already noted a surprising architectural similarity between a VOS and an MDM implementation. The VOS stores information about key data in a central repository; so does MDM. MDM typically stores part of that data in the repository and leaves part of it in existing data stores; so does the VOS. The key difference is that MDM focuses on access to master data; the VOS focuses on real-time access to actionable data of whatever type.
Table 1: Ways to use a VOS in key IT tasks
|Aim||Ways to use a VOS|
|Integrate business processes||Replicate data from the data sources involved in the business processes to the ODS, allowing all related transactions in the business processes to be carried out at once within the ODS in a consistent fashion, after which the resulting data is resynchronized with the original data sources. Persist data views aggregated across applications within the ODS for improved query and update performance across the applications. Program using the EII interface rather than the component applications, in order to carry out querying across business processes. The resulting code should use the EII tool's ability to combine data from different sources and allow analysis of data relationships across data sources, not just view data from different sources side by side on the same screen.|
|Implement Web services interfaces to existing information||Write Web services consumer code for each key enterprise application to invoke the VOS rather than each data source|
|Improve RTE capabilities by speeding time to react to current customer or supply chain data||Replicate high-frequency-of-update customer or supply-chain data in the ODS. Use the VOS to offload processing from back-end enterprise data stores and systems of record, speeding notification of new data and changes caused by reacting to new data. Leverage the VOS's extensibility to implement changes to a broader array of key data.|
|Reduce time to develop new applications leveraging proprietary information, such as portals||Use an EII tool to front-end the data warehouse plus other proprietary information, allowing one set of code for all information access. Use data persistence in the ODS to reduce the need for mapping and reconstruction programs for unstructured content.|
|Audit all information in the enterprise for government requirements or Sarbanes-Oxley||Create a metadata repository in the VOS by importing or generating metadata from data warehouses/marts, ETL tools, and file systems and rich-media sources. Write audit queries to the resulting cross-data-source EII interface, or create a reporting application with modular report templates and reusable components.|
|Cut IT costs||Cut development costs via the VOS's ability to allow developers to write once to supply data access to many applications.|
|Attain better scalability of existing multi-tier applications||Implement a VOS to aggregate information (both structured and unstructured) from back-end systems of record and store it in the operational store. Synchronize the aggregated data views with back-end systems via the VOS. Thus, the user can offload query processing from back-end systems and satisfy the data requirements of multi-tier front-end applications with higher performance.|
|Merge companies||Using EII, create integrated reports about customers, suppliers, and the overall enterprise's financial position, e.g., over multiple data warehouses/marts.|
Infostructure Associates, December 2005
Nevertheless, the similarity of MDM and VOS means that users have an additional attractive option in implementing a VOS: first implement MDM, and then extend the scope of the data covered to all actionable data instead of just master data.
DB2's Uses in a VOS
The similarity between MDM and a VOS also means that DB2 can play the same roles as in MDM:
- Cache database.
- Metadata repository.
However, within these general roles, DB2's tasks will be slightly different.
IBM DB2 can be a highly useful cache database, for two reasons. First, it is able to deliver high performance for the relational format typical of common-format master data, while handling the more atypical semi-structured (text) and unstructured (graphics) data that may occur. Second, it is well integrated with strong ETL, EII (Information Integrator), EAI (Ascential), and replication products, allowing high-performance conversion and transmission of data.
A VOS will typically have a higher proportion of semi-structured and unstructured data than MDM, so the XML and XQuery capabilities of DB2 will be more important to a VOS. Moreover, performance and scalability are of greater importance to a VOS than to MDM, because users want as large a proportion of the enterprise's actionable data in the central/cache database as possible for faster decision-making.
Again, DB2 can be a highly effective way to implement a metadata repository. Its scalability, robustness, and long experience with data dictionaries (per-database metadata repositories) make it a logical choice. Also, it is well integrated with Information Integrator, so that it can use Integrator's ability to semi-automatically go out and search for master data no matter what the data type, initially populate the metadata repository, and update the repository as new customer record types arrive at local sites.
A VOS's metadata repository must be more flexible in handling wide varieties of data types. Thus, DB2's XML storage capabilities are of high importance to the user.
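The appeal of XML for this purpose is that each piece of data carries its own label, so a repository can index it without a fixed schema. The sketch below illustrates only that idea. It uses Python's standard library rather than DB2's XML or XQuery facilities, and the sample document and function name are invented for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List

# Made-up business document; element and attribute names are illustrative.
DOCUMENT = """
<order id="A-17">
  <customer>Acme Corp</customer>
  <item sku="X1"><qty>3</qty></item>
  <item sku="Y2"><qty>5</qty></item>
</order>
"""


def build_label_index(xml_text: str) -> Dict[str, List[str]]:
    """Map each element or attribute path to the values found under it."""
    index: Dict[str, List[str]] = defaultdict(list)

    def walk(element: ET.Element, path: str) -> None:
        here = f"{path}/{element.tag}"
        text = (element.text or "").strip()
        if text:
            index[here].append(text)
        for name, value in element.attrib.items():
            index[f"{here}/@{name}"].append(value)
        for child in element:
            walk(child, here)

    walk(ET.fromstring(xml_text), "")
    return dict(index)


index = build_label_index(DOCUMENT)
print(index["/order/customer"])
print(index["/order/item/qty"])
```

Because the index is built from whatever labels the document contains, a new document type with new element names needs no schema change. This is the kind of flexibility the metadata repository is meant to have.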
Today, a VOS is not as important to a CEO as MDM, because while every CEO sees the importance of leveraging key customer or supplier data as much as possible, few see the real-time enterprise as more than a very long-term goal. Nevertheless, a virtual real-time enterprise is achievable much sooner than that, and it can be just as valuable as MDM. To prove the point, ask the CEO why today's popular dashboards can't show all the data that the CEO or CFO needs to see, and then point out that a VOS would solve that problem. And, of course, a VOS is less difficult to implement than MDM, because there is no requirement that all central data be stored in a common format, and therefore no need for enforcement committees.
So whatever happens with MDM, a VOS is valuable and doable. In fact, on a smaller scale, suppliers such as Ipedo are already implementing VOSs. What DB2 brings to the table is IBM's clout, IBM's services, integration with Information Integrator (and replication), and the scalability of an enterprise database. In other words, DB2 and friends can be a natural fit for the VOS needs of many enterprises.
About the author: Wayne Kernochan is president of Infostructure Associates, LLC, a Lexington, Mass.-based analyst firm.