Good test data is invaluable to the data center. Developers need disciplined, repeatable and reliable mainframe testing to ensure quality production code; maintenance programmers use test data to recreate and fix problems; capacity planners, database administrators and performance analysts rely on data to mimic and predict production behavior.
Mainframe testing data isn't easy to procure. Rapid application development teams require tailored, consistent data that can be repeatedly refreshed. Code developers need isolated data sets to ensure testing cycles don't conflict. As a result, programmers or their database administrators spend a lot of time cloning, cleansing and customizing data for each development cycle.
Maintenance programmers have their own problems. Recreating a production problem in tests isn't as simple as pulling all the rows for a given customer from one database. Production databases typically consist of intricately interrelated tables -- any inconsistency throws off the application. Failure to copy a particular table, field or segment could nullify the problem recreation attempt.
Systems programmers always try to predict production performance from test metrics. It usually doesn't work. Mainframe test data stores tend to be much smaller and more consistent than the real data and processing in production. For example, a DB2 table scan on a 2,000-row mainframe test table performs much better than it would on a billion-row production table.
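A rough back-of-the-envelope calculation shows how wide that gap is. The sketch below counts the pages a full table scan would have to read at each table size. The figure of 50 rows per page is an illustrative assumption, not a measured value; real row density depends on row length, page size and compression.

```python
def pages_scanned(row_count, rows_per_page):
    """Pages a full table scan must read, rounding up for a partly filled last page."""
    return (row_count + rows_per_page - 1) // rows_per_page

ROWS_PER_PAGE = 50  # assumption for illustration only

test_pages = pages_scanned(2_000, ROWS_PER_PAGE)
prod_pages = pages_scanned(1_000_000_000, ROWS_PER_PAGE)
ratio = prod_pages // test_pages

print(f"test scan: {test_pages} pages")
print(f"production scan: {prod_pages} pages")
print(f"production reads {ratio} times as many pages")
```

Under that assumption, the test scan touches a few dozen pages that probably sit in the buffer pool, while the production scan touches millions. A response time measured in test says very little about the same statement in production.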
Test systems' security is more lax than what's in place for production. This works fine for programmers and administrators who must get things done quickly, but lower security can leave customers' personally identifiable information (PII) exposed to misuse by persons not otherwise authorized to view it in production.
Roll your own
There are several ways to attack mainframe testing problems with differing levels of sophistication and expense.
Maintenance programmers need two sets of utilities: One to extract data from production and another to load it into test. The challenge is ensuring the utility harvesting data from production gets all the content needed for internal consistency. The omission of one data item or row might invalidate the test, causing it to examine something other than the intended target.
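The extraction side can be sketched as a walk over the relationships between tables. The example below uses plain Python dictionaries as a stand-in for production tables. The table names, columns and relationship map are all hypothetical; a real utility would read the relationships from the database catalog or the application's documentation and issue queries instead of scanning lists.

```python
from collections import deque

# Hypothetical in-memory stand-in for production tables.
PRODUCTION = {
    "customer": [{"cust_id": 1, "name": "A. Smith"}, {"cust_id": 2, "name": "B. Jones"}],
    "policy": [{"policy_id": 10, "cust_id": 1}, {"policy_id": 11, "cust_id": 2}],
    "claim": [{"claim_id": 100, "policy_id": 10}, {"claim_id": 101, "policy_id": 11}],
}

# Hypothetical relationship map: (parent table, parent key, child table, child foreign key).
RELATIONSHIPS = [
    ("customer", "cust_id", "policy", "cust_id"),
    ("policy", "policy_id", "claim", "policy_id"),
]

def extract_related(tables, relationships, root_table, key, value):
    """Collect the matching root rows plus every dependent row reachable from them."""
    extracted = {name: [] for name in tables}
    roots = [row for row in tables[root_table] if row[key] == value]
    extracted[root_table].extend(roots)
    queue = deque((root_table, row) for row in roots)
    while queue:
        table, row = queue.popleft()
        for parent, parent_key, child, child_key in relationships:
            if parent != table:
                continue
            for child_row in tables[child]:
                matches = child_row[child_key] == row[parent_key]
                if matches and child_row not in extracted[child]:
                    extracted[child].append(child_row)
                    queue.append((child, child_row))
    return extracted

# Pull everything needed to recreate a problem for customer 1.
subset = extract_related(PRODUCTION, RELATIONSHIPS, "customer", "cust_id", 1)
```

Leaving one entry out of the relationship map silently drops the dependent rows, which is exactly the kind of omission that invalidates a problem recreation.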
There are database utilities that ease some of the maintenance that goes into roll-your-own (RYO) test solutions. Data unloaded with a database utility on the production side can be transferred and loaded into test databases. Clever programmers can mask this data prior to testing by reverse engineering the format of the unloaded rows.
Data masking involves overwriting PII -- names, addresses, Social Security numbers -- with randomly generated data. Work with the security department to identify fields containing PII and generate rules for anonymity. The hard part is ensuring all interrelated fields have the same random values. Internal data consistency is key.
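One way to keep interrelated fields in step is to remember the substitute chosen for each original value and reuse it wherever that value appears. The sketch below does this for Social Security numbers; the class, table and column names are hypothetical. It uses Python's `random` module, which is not cryptographically secure, so treat it as an illustration of the consistency rule only, not as a masking algorithm the security department would approve.

```python
import random

class ConsistentMasker:
    """Give each distinct original value one random substitute and reuse it everywhere."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)  # a seed makes refreshes repeatable
        self._mapping = {}               # original value -> masked value
        self._used = set()               # masked values already handed out

    def mask_ssn(self, ssn):
        if ssn not in self._mapping:
            while True:
                digits = f"{self._rng.randrange(10**9):09d}"
                candidate = f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"
                # Avoid reusing a substitute or accidentally reproducing the original.
                if candidate not in self._used and candidate != ssn:
                    break
            self._used.add(candidate)
            self._mapping[ssn] = candidate
        return self._mapping[ssn]

# Hypothetical rows from two related tables that share an SSN.
customers = [{"cust_id": 1, "ssn": "123-45-6789"}]
claims = [
    {"claim_id": 100, "claimant_ssn": "123-45-6789"},
    {"claim_id": 101, "claimant_ssn": "987-65-4321"},
]

masker = ConsistentMasker(seed=42)
masked_customers = [{**row, "ssn": masker.mask_ssn(row["ssn"])} for row in customers]
masked_claims = [
    {**row, "claimant_ssn": masker.mask_ssn(row["claimant_ssn"])} for row in claims
]
```

Because both tables go through the same masker, the customer row and its claim still join on the masked value. The mapping itself reveals the original data, so it must be protected or discarded once masking is complete.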
The cheapest option is a set of home-grown utilities. Developers create custom utilities to generate test data for new or modified fields. For cyclical mainframe testing, other utilities clone data from a common source and customize it for a given development stream.
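A clone-and-customize utility can be very small. The sketch below copies a shared baseline and fills in default values for fields that exist only in one development stream. The baseline rows, stream names and field names are hypothetical, and a real utility would read and write database tables or files instead of Python lists.

```python
import copy

# Hypothetical common source shared by every development stream.
BASELINE = [
    {"policy_id": 10, "status": "ACTIVE"},
    {"policy_id": 11, "status": "LAPSED"},
]

def clone_for_stream(baseline, new_field_defaults):
    """Return an isolated copy of the baseline with stream-specific fields added."""
    rows = copy.deepcopy(baseline)  # deep copy so streams never share rows
    for row in rows:
        for field, default in new_field_defaults.items():
            row.setdefault(field, default)  # keep any value already present
    return rows

# Each stream gets its own copy, customized for the fields it is adding.
billing_stream = clone_for_stream(BASELINE, {"autopay_flag": "N"})
claims_stream = clone_for_stream(BASELINE, {"fraud_score": 0})
```

Rerunning the clone restores a stream to a known state, which gives developers the repeatable refresh they need without touching anyone else's data.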
Mainframe testing tools
The people closest to the data should also be the best at manipulating it. However, while RYO saves money, it also means programmers who would otherwise be adding functionality for customers are devoting time to creating something that could be purchased. RYO utilities may also need a lot of maintenance depending on the application and underlying data's volatility. Mainframe testing tools from vendors could be the more economical and productive option.
Tools for test data management include IBM's Optim, DataVantage's masking product for z/OS, Grid Tools' Datamaker suite, Informatica TDM, Compuware's File-AID based offerings and more. Each of these tools delivers a mixture of abilities and opportunities for automation.
Just because you have an out-of-the-box testing tool doesn't mean your work is done. Programmers face the substantial task of telling the tool which databases to copy, their interrelationships and, if data masking is available, which fields to scramble.
With a commercial tool, the maintenance burden should be lighter and turnaround faster than with home-grown utilities. Maintenance entails updating the tool's rules to reflect application changes.
Performance testing in development is a different problem. Most mainframe shops are reluctant to spend money on two production-sized data farms for performance testing.
Some vendors offer tools that copy DB2 access path information from the production system to test. Once the information is in the test environment, the DB2 optimizer makes access path choices similar to those made in production. While this makes for more accurate tests, it does not solve the problem that test tables are far smaller than their production counterparts.
About the author:
Robert Crawford spent 29 years as a systems programmer, covering CICS technical support, Virtual Storage Access Method, IBM DB2, IBM IMS and other mainframe products. He programmed in Assembler, Rexx, C, C++, PL/1 and COBOL. Crawford is currently an operations architect based in south Texas, establishing mainframe strategy for a large insurance company.