RFID (radio frequency identification) implementation continues apace. Standards for such tasks as handoff between suppliers are arriving, and the bulk of large enterprises are now poised between pilots and basic implementations. The set of leaders now includes not only WalMart and the Department of Defense but also Procter & Gamble/Gillette.
Basic implementations use an RFID printer to slap a tag on a product or component during production or on entry into a warehouse, then monitor its progress through a warehouse by periodic pinging using stationary RFID readers. Thus, the core data in RFID implementations -- unique identification of a product -- is always accompanied by semantics, or metadata, or data about data, describing where that product is in a business process, or its physical location. The core data typically never changes; the metadata changes very frequently. To put it another way, RFID is metadata in motion.
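The split between unchanging core data and frequently changing metadata can be sketched as a small data model. This is an illustrative sketch only: the class names, field names, and the tag and location values are assumptions made for the example, not part of any RFID standard or product.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class TagIdentity:
    """Core data: set once when the tag is printed, never updated."""
    tag_id: str          # hypothetical identifier format
    product_code: str    # hypothetical product reference


@dataclass
class TagState:
    """Metadata: overwritten each time a reader sees the tag."""
    identity: TagIdentity
    location: str
    process_step: str
    last_seen: datetime


def apply_read(state: TagState, location: str, process_step: str,
               seen_at: datetime) -> TagState:
    """Update only the metadata; the identity is left untouched."""
    state.location = location
    state.process_step = process_step
    state.last_seen = seen_at
    return state


widget = TagIdentity(tag_id="TAG-0001", product_code="WIDGET-A")
state = TagState(widget, "dock-1", "received",
                 datetime(2006, 3, 1, 8, 0, tzinfo=timezone.utc))
apply_read(state, "aisle-7", "stored",
           datetime(2006, 3, 1, 8, 30, tzinfo=timezone.utc))
```

Marking the identity as frozen makes the design point explicit: an update path exists for location and process step, but not for the identification itself.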
Three other characteristics also make RFID data management different:
- RFID reading is imperfect, and likely to stay so. Thus, when stacked on a pallet, a widget is readable from some angles and not from others. A sudden inability to detect the widget may mean that it has been lost, or that it cannot be read. Sophisticated filtering software can reduce the number of these false negatives, but data management at the local level must monitor, alert, and handle false negatives in order to aid the manager of the warehouse, while cleaning the data as far as possible so that it can be used by enterprise data miners.
- RFID data has the potential to be the flood to end all floods. Each widget is constantly generating a new ping as it moves through the production and distribution process. Even with sophisticated filtering software, there may be thousands or tens of thousands of updates of RFID data (i.e., changes to where tens of thousands of products are in a business process or where they are physically) per minute at peak load -- an exceptional OLTP situation. If we add the requirement that we track where each product has been, say, every half hour (a process trail), then we also generate an enormous amount of static data. One vendor touts its ability to handle 4 TB (terabytes) of data generated per day.
- More than any previous type of data, RFID data is metadata in motion across organizations. The intent is that each WalMart or Department of Defense supplier not only tag its products but also pass the RFID data for a product from supplier to customer, all the way down the supply chain. Thus, WalMart can query its supplier for the status of an item, or Dell can track a shipment from customer order to delivery -- and beyond. The standards organization epcGlobal is already defining standards for the handoff between organizations. Every RFID data store is potentially accessible from outside the organization; every RFID data management system potentially queries across organizational boundaries.
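To make the false-negative problem in the first point concrete, here is a minimal sketch of one simple approach: a grace period, under which a tag is reported missing only after it has gone unread for longer than a set interval. The `PresenceTracker` class, the 60-second grace period, and the tag IDs are all assumptions for illustration; commercial filtering software may work quite differently.

```python
class PresenceTracker:
    """Flags a tag as missing only after a grace period with no reads."""

    def __init__(self, grace_seconds: float):
        self.grace_seconds = grace_seconds
        self.last_seen = {}  # tag_id -> timestamp of most recent read

    def record_read(self, tag_id: str, timestamp: float) -> None:
        self.last_seen[tag_id] = timestamp

    def missing(self, now: float) -> list:
        """Return tags whose last read is older than the grace period."""
        return sorted(
            tag for tag, seen in self.last_seen.items()
            if now - seen > self.grace_seconds
        )


tracker = PresenceTracker(grace_seconds=60.0)
tracker.record_read("TAG-0001", 0.0)
tracker.record_read("TAG-0002", 0.0)
tracker.record_read("TAG-0001", 50.0)  # TAG-0002 was not read this cycle

# One skipped read is tolerated: nothing is reported yet.
missed_one_cycle = tracker.missing(now=55.0)
# After the grace period expires, TAG-0002 is raised for attention.
missed_long = tracker.missing(now=100.0)
```

The grace period trades alert latency for fewer false alarms. A short value alerts the warehouse manager sooner but reports more tags that are merely unreadable for the moment.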
Because RFID has arrived, and because RFID data management is different from anything that has come before, databases must be retuned or re-architected (by the vendor or the user) to handle the new needs of RFID transactions -- as has happened with data warehousing, content management, and Web data in the recent past. Specifically, enterprise information strategists need to answer two questions:
- What is the appropriate database architecture for RFID data?
- What type of database is best for handling RFID-type transaction streams and data operations?
Note that in the long term, there is a third question about RFID data: how can my organization leverage RFID data most effectively, by itself or in combination with existing data?
RFID architecture: The three-level solution
The appropriate database architecture for an RFID implementation reflects the appropriate overall architecture for that implementation. Real-world implementations are tending toward a three-level overall architecture:
- At the lowest level, a buffer receives the data from RFID readers and printers, filters it for duplicates, cleans it if possible to eliminate false negatives, and adds semantics to the data for use at the next level up. As of yet, no database is typically attached to this level -- but an embedded database such as Sybase iAnywhere could be.
- The middle level is the local level. This typically involves one physical location, such as a warehouse, and the aim here is to allow the manager at the location (such as a warehouse manager) some ability to monitor, fix problems, and do analysis. Often the local level allows the user to define workflow within the physical location. A local database supports these tasks. The database may also add semantics to the core data, both for local workflow and to support global analysis at the top level.
- The top level is the organization or enterprise level. At this level, the user may monitor across locations (and across organizations), define workflow across the entire business process, and perform business analysis on the combined data from multiple local databases, either separate from or combined with other enterprise data. Thus, the RFID database may be separate from other enterprise databases, or may be flowed to a data warehouse or OLAP database. Note that at the organization level, a user may reach across not only to access another organization's enterprise RFID database, but also to probe its local-level databases.
As a result, the database architecture is often double three-tier, in which three client/application/database-server tiers operate at both the local and enterprise level.
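The three levels can be sketched as three small functions that pass data upward. This is a toy illustration under stated assumptions: the reader-to-location mapping, the five-second duplicate window, and the function names are hypothetical, and in-memory dictionaries stand in for the local and enterprise databases.

```python
from typing import Dict, Iterable, List, Tuple

Read = Tuple[str, str, float]  # (tag_id, reader_id, timestamp in seconds)

# Hypothetical mapping used to add location semantics at the buffer level.
READER_LOCATIONS = {"reader-A": "dock-1", "reader-B": "aisle-7"}


def buffer_level(reads: Iterable[Read], dedupe_seconds: float) -> List[dict]:
    """Drop repeat reads of a tag at the same reader, then add semantics."""
    last_emitted: Dict[Tuple[str, str], float] = {}
    events = []
    for tag_id, reader_id, ts in reads:
        key = (tag_id, reader_id)
        previous = last_emitted.get(key)
        if previous is not None and ts - previous < dedupe_seconds:
            continue  # duplicate within the window
        last_emitted[key] = ts
        events.append({
            "tag_id": tag_id,
            "location": READER_LOCATIONS.get(reader_id, "unknown"),
            "timestamp": ts,
        })
    return events


def local_level(events: List[dict]) -> Dict[str, str]:
    """Keep the latest known location of each tag at one site."""
    current = {}
    for event in events:
        current[event["tag_id"]] = event["location"]
    return current


def enterprise_level(snapshots: Dict[str, Dict[str, str]]) -> Dict[str, int]:
    """Combine per-site snapshots into a count of tags per site."""
    return {site: len(tags) for site, tags in snapshots.items()}


raw_reads = [
    ("TAG-0001", "reader-A", 0.0),
    ("TAG-0001", "reader-A", 1.0),   # duplicate, filtered at the buffer
    ("TAG-0002", "reader-A", 2.0),
    ("TAG-0001", "reader-B", 30.0),  # same tag seen at a new location
]
events = buffer_level(raw_reads, dedupe_seconds=5.0)
warehouse = local_level(events)
summary = enterprise_level({"warehouse-1": warehouse})
```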
The right database for the RFID job
RFID implementers should use the following criteria when choosing databases for the local and enterprise levels:
- The local-level database should be highly scalable, because of the potential for a flood of data, and close to real-time in its performance, because of the need for occasional quick reaction to RFID problems.
- The local-level database should support network and systems management capabilities, allowing alerts, console-type monitoring, and coordination with enterprise systems management solutions.
- The local-level database should support workflow semantics, allowing the user to track where in the process a widget is.
- The local-level database should provide some ability for the warehouse manager to analyze RFID data, e.g., to optimize the local part of the business process.
- The enterprise-level database, separately or combined with other data, should provide semantics to allow a combined analysis of RFID and non-RFID data (e.g., POS data and customer records).
- The enterprise-level database should support powerful analysis tools (e.g., OLAP).
- The enterprise-level database (or an EII tool) should allow access to other RFID data stores, at least at the enterprise level and if possible at the local level, in other organizations.
- The enterprise-level database should be highly scalable, to handle a flood of RFID data. It should do especially well at update- or insert-type transactions. It may make sense to move historical data to a separate static database, to avoid problems with the query from hell.
- The enterprise-level database should have some capability of pushing data down to the local level, as when the local manager needs to be alerted that products are scheduled to arrive. The ability to pull data (e.g., trigger a local-level upload of the latest data about a particular product) may also be useful.
Existing databases within the enterprise have been optimized for different transaction patterns than RFID affords. Therefore, users should plan to tune the chosen database for RFID transaction patterns.
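One way to picture the update-heavy pattern and the separation of historical data is to keep a small current-state table that is overwritten on every read, alongside an append-only history table that holds the process trail. The sketch below uses SQLite only because it ships with Python. The table and column names are assumptions, and the two tables in one in-memory database stand in for what would be separate stores in practice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tag_current (
        tag_id   TEXT PRIMARY KEY,
        location TEXT NOT NULL,
        seen_at  INTEGER NOT NULL
    );
    CREATE TABLE tag_history (
        tag_id   TEXT NOT NULL,
        location TEXT NOT NULL,
        seen_at  INTEGER NOT NULL
    );
""")


def record_read(conn, tag_id, location, seen_at):
    """Overwrite the current state and append to the process trail."""
    with conn:  # commits both statements together, or rolls both back
        conn.execute(
            "INSERT OR REPLACE INTO tag_current (tag_id, location, seen_at) "
            "VALUES (?, ?, ?)",
            (tag_id, location, seen_at),
        )
        conn.execute(
            "INSERT INTO tag_history (tag_id, location, seen_at) "
            "VALUES (?, ?, ?)",
            (tag_id, location, seen_at),
        )


record_read(conn, "TAG-0001", "dock-1", 0)
record_read(conn, "TAG-0001", "aisle-7", 1800)
record_read(conn, "TAG-0002", "dock-1", 1800)

current_rows = conn.execute(
    "SELECT tag_id, location FROM tag_current ORDER BY tag_id"
).fetchall()
history_count = conn.execute("SELECT COUNT(*) FROM tag_history").fetchone()[0]
```

Queries about where a product is now touch only the small current-state table, while long-running analytical queries can be pointed at the history store.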
Leveraging RFID data
Clearly, the major benefits of an RFID database accrue from use of an organization-wide RFID database at the enterprise level -- all that the local warehouse manager can improve is turnover for one warehouse.
In the long term, RFID infrastructure software -- including an RFID database -- can deliver positive bottom-line impact. RFID infrastructure software allows greater control over the supply chain, and therefore greater optimization for bottom-line expense-cutting. RFID infrastructure software ensures that RFID delivers a large amount of new actionable information to the corporate decision-maker, potentially both at the retail level (how does product placement on the shelves relate to buying behavior?) and at the production and distribution levels (are we forcing product on a vendor further down the chain?). In fact, RFID data, when enhanced by RFID-infrastructure-software semantics, can provide a source of customer satisfaction, allowing buyers to monitor shipment more closely across the supply chain.
To achieve this, RFID users should emphasize the analytic capabilities of their RFID database, either via BI tools or OLAP. Moreover, RFID implementers need to create a global metadata repository that captures the relationships between the RFID product-process data and existing customer and product data. An EII tool is effective at doing this; or if higher performance is needed, a new data mart may be created. It is less likely that inserting RFID data in the organization's existing data warehouse will do the trick, as RFID data is only semi-structured, and the likely large size of the RFID data store may overburden an already-stretched data warehouse.
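As a rough illustration of what such a repository provides, the sketch below uses a plain mapping from tag IDs to product codes as a stand-in for the repository, so that RFID process events can be analyzed alongside existing product data. All names and sample records here are hypothetical, and a real repository would hold far richer relationships.

```python
# Existing enterprise product data (hypothetical sample).
PRODUCT_CATALOG = {
    "WIDGET-A": {"name": "Widget A", "category": "hardware"},
}

# RFID product-process data (hypothetical sample).
RFID_EVENTS = [
    {"tag_id": "TAG-0001", "location": "aisle-7", "process_step": "stored"},
    {"tag_id": "TAG-0002", "location": "dock-1", "process_step": "received"},
]

# Stand-in for the repository: which tag belongs to which product.
TAG_TO_PRODUCT = {"TAG-0001": "WIDGET-A", "TAG-0002": "WIDGET-A"}


def enrich(events, tag_to_product, catalog):
    """Attach product attributes to each RFID event via the mapping."""
    enriched = []
    for event in events:
        code = tag_to_product.get(event["tag_id"])
        product = catalog.get(code, {})
        enriched.append({
            **event,
            "product_code": code,
            "category": product.get("category"),
        })
    return enriched


def count_by_category_and_step(rows):
    """Simple combined analysis: items per (category, process step)."""
    counts = {}
    for row in rows:
        key = (row["category"], row["process_step"])
        counts[key] = counts.get(key, 0) + 1
    return counts


combined = enrich(RFID_EVENTS, TAG_TO_PRODUCT, PRODUCT_CATALOG)
by_category = count_by_category_and_step(combined)
```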
State of the art
How close are we to achieving an RFID architecture that can handle a flood of metadata in motion?
One easy way to tell is to examine the offerings of today's RFID infrastructure-software suppliers. These are the first line of defense against arriving RFID data -- they offer at least the buffer level of RFID-data filtering and cleansing. Therefore, they are the minimum software necessary to handle a slap-and-ship implementation of RFID that requires at least one read from an RFID reader for each widget shipped. Since slap-and-ship is as far as most RFID implementers have gotten, these RFID infrastructure suppliers' offerings give a good indication of how far we have gotten towards achieving a good RFID data architecture.
As Table 1 shows, these suppliers depend on third parties or the user to design an RFID architecture for handling a flood of data at the local or business level. All depend primarily on third-party data connectivity for EAI (Enterprise Application Integration) and on third-party databases of the user's choice. In effect, it is up to the implementer (with some assistance from service providers) to choose the right databases and software for an RFID architecture and integrate them.
Table 1: RFID infrastructure-software vendor data architectures
| Supplier | Buffer level | Local level | Business level | Connectivity |
|---|---|---|---|---|
| OATSystems | No database | In-memory database (handles moderate amounts of data) | Supports Oracle with Oracle OLAP Option or IBM, provides own database and analytics | Third-party partners, adapters |
| ConnecTerra | No database | User chooses database and implements transactional code | User chooses database and implements transactional and replication code | Third-party partners |
| GlobeRanger | No database | User chooses database and implements transactional code | User chooses database and implements transactional and replication code | Third-party partners |
How DB2 can help
DB2 holds promise as a global metadata repository, and as a well-integrated data mart. Also, DB2 Express is an attractive candidate for a local-level database.
As noted in a previous article, DB2 can be a highly effective way to implement a metadata repository. Its scalability, robustness, and long experience with data dictionaries (per-database metadata repositories) make it a logical choice. DB2 has long demonstrated scalability in TPC-D benchmarks (aimed at measuring database querying prowess). Its ability to store data in XML format and perform operations on that data, combined with this querying performance, suggests good performance in queries on process-flow metadata.
DB2 has also proven itself as a data mart, as evidenced again by its scalability in TPC-D benchmarks and in real-world data warehousing.
DB2 Express is attractive as a local-level database because of its ability to handle systems management, workflow-semantics data, and querying. Also, as part of the DB2 family it connects well to enterprise-level DB2, and its particular strengths are as an embedded database on top of which users can create warehouse-level applications.
We conclude that:
- An RFID architecture requires data management that is fundamentally different from previous data-handling tasks, because it must not only store a greater quantity of real-time data than ever before but also handle a new kind of data that we call "metadata in motion."
- Because RFID infrastructure suppliers typically defer to third-party suppliers in RFID data-handling, and because DB2 and DB2 Express can serve at the enterprise and local levels of an RFID data architecture, respectively, DB2 is in a strong position to act as the database for a wide range of RFID implementations, both slap-and-ship and long-term.
We would also urge that both suppliers and users turn their attention to data management as a key to long-term RFID success -- or failure. As Robert Reich notes in his book, "The Future of Success," this is an era in which the consumer has unprecedented power to demand satisfaction from his or her suppliers. RFID is the frontier of those demands: it can allow the consumer visibility into the status of his or her order across the supply chain, creates expectations of swifter delivery and less under-stocking, and connects POS data analysis to data on "physically close" inventory for possible "just in time" delivery. However, none of this will be possible without effective data management.
An old ad for Dunkin' Donuts has the store manager drag himself out of bed early in the morning, groaning "Time to make the doughnuts." In the same spirit, we advise readers that, yes, it is time to redesign the database and data architecture for RFID -- and to consider DB2 when they do so.
This was first published in March 2006