
IBM Machine Learning for z/OS deepens mainframe data analysis

New IBM machine learning capabilities let data center teams pull insights from their z/OS mainframes. But concerns about data management and cost will likely arise.

IBM likes to tout statistics that show mainframes host a large portion of the world's corporate data. It has also noticed that users want to mine that data for insights into their customers' behavior. However, most organizations that want to pull insights from that information must first go through long and expensive extract, transform and load (ETL) processes to move the data onto an analytics platform.

To change that, IBM introduced Machine Learning for z/OS, which allows customers to perform analytics right where the data lives -- on the mainframe -- to avoid expensive and time-consuming ETL processes.

On the surface, this sounds like a no-brainer. Making deep analytics immediately available to operational mainframe systems is an excellent way to pursue common machine learning use cases such as fraud avoidance and customized customer experiences. This same model could help the financial and insurance industries make complicated, analytical decisions faster and with the added security and reliability of the mainframe.

But there are some potential hurdles for data center teams to note.

The technical makeup of IBM Machine Learning for z/OS

Machine Learning for z/OS consists of many components spread across z/Linux -- also called Linux on z Systems -- and z/OS. For the most part, the development, user interface and scheduling services run in z/Linux. The rest runs in z/OS, the bulk of which is Apache Spark. Nearly all the software is open source.

On the z/Linux side, IBM uses Jupyter as a robust environment for model development, maintenance, documentation and collaboration. IBM offers several options for software to schedule and deploy the models.

Inside z/OS, an Apache Spark cluster does the heavy lifting, and the model metadata lives in DB2. Along for the ride is a scoring service that runs on a WebSphere Liberty profile. Security depends on a Lightweight Directory Access Protocol (LDAP) server, which IBM's own Resource Access Control Facility (RACF) can supply.

The heart of Machine Learning for z/OS is the Mainframe Data Service for Apache Spark (MDSS). MDSS allows Apache Spark to access DB2, Information Management System (IMS), Virtual Storage Access Method (VSAM) and Systems Management Facility (SMF) data. This is the crucial access point that lets Spark's modeling and analytics engines read mainframe data directly.
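
MDSS's internals are product-specific, but the access pattern it enables -- a Spark job reading DB2 for z/OS data in place rather than after an ETL copy -- can be sketched with Spark's generic JDBC source. This is a minimal sketch, not IBM's implementation; the host, port, database and table names are hypothetical, and MDSS itself uses its own data service rather than a plain JDBC driver.

```python
def build_jdbc_options(host: str, port: int, database: str, table: str) -> dict:
    """Assemble Spark JDBC options for a DB2 for z/OS source (sketch)."""
    return {
        "url": f"jdbc:db2://{host}:{port}/{database}",
        "driver": "com.ibm.db2.jcc.DB2Driver",  # IBM's type-4 JDBC driver
        "dbtable": table,
    }

def read_table(options: dict):
    """Read the table into a Spark DataFrame (requires a Spark runtime)."""
    from pyspark.sql import SparkSession
    spark = SparkSession.builder.appName("mdss-sketch").getOrCreate()
    return spark.read.format("jdbc").options(**options).load()

# Hypothetical connection details for illustration only.
opts = build_jdbc_options("mainframe.example.com", 5021, "CUSTDB",
                          "CUST.TRANSACTIONS")
```

The point of the pattern is that the data never leaves the mainframe: the Spark cluster inside z/OS reads it in place instead of waiting for an overnight ETL copy.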

At first, SMF seems like an odd choice for data mining, as it rarely contains business information. However, it is valuable to systems programmers, capacity planners and performance analysts: Machine Learning for z/OS can provide insights into mainframe performance, which, in turn, can reduce software expenses. Organizations could also use the results for automation and problem determination.

Cost, data management challenges

One of the biggest impediments to Machine Learning for z/OS is the cost of running the software on z/OS. Modeling and analytics are CPU-intensive by nature, which ultimately adds to customers' Monthly License Charge (MLC) expense. Fortunately, IBM intends for Spark to run on at least four System z Integrated Information Processors (zIIPs) to reduce general central processor (CP) consumption and avoid the MLC increase. Data center teams will still have to be careful not to let processing spill over to CPs. A generosity factor, which determines how much of a subsystem's processor time is zIIP-eligible, might also apply. MLC charges don't apply, however, to the Integrated Facility for Linux engines on which the z/Linux components run.

Getting data to the models may also be a challenge. Some IBM products, such as DB2 Analytics Accelerator, ingest and store entire databases, then apply updates when the user chooses. Machine Learning for z/OS has no similar process. Instead, each model execution may have to read entire databases or representative samples, which can be time-consuming. Mass data ingestion during the online day could create problems for the transaction processing systems that serve external customers. Mitigations include copying the pertinent data to a nonoperational system or scheduling ingestion overnight.
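
One way to avoid re-reading a whole table for every model run is to score a representative sample instead. A standard technique for drawing a fixed-size uniform sample in a single sequential pass is reservoir sampling (Algorithm R); the sketch below is generic illustration, not part of IBM's product.

```python
import random

def reservoir_sample(rows, k, seed=None):
    """Draw a uniform random sample of up to k rows in one pass.

    One sequential read yields a fixed-size representative sample,
    so the full table never has to be materialized for a model run.
    """
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            sample.append(row)          # fill the reservoir first
        else:
            j = rng.randint(0, i)       # row i survives with probability k/(i+1)
            if j < k:
                sample[j] = row
    return sample

# Example: sample 5 "transactions" from a stream of 1,000 rows.
sample = reservoir_sample(range(1000), 5, seed=42)
```

Because the sample is built in one pass, the read could also be piggybacked on an existing nightly batch job rather than competing with online transactions.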

According to IBM documentation, operational systems won't be able to run the machine learning models themselves. However, an enterprise data scientist could externalize a model's results in the form of a table, rules or a COBOL program.
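
Externalizing results as a table might look like the following sketch: a model's offline conclusions are flattened into a small rules table that an operational program could load and consult without ever invoking the model. The segment names, thresholds and field names here are hypothetical.

```python
import csv
import io

# Hypothetical thresholds produced offline by a fraud model. An
# operational program could read this table instead of running the model.
rules = [
    {"segment": "retail", "max_amount": 500.00, "action": "approve"},
    {"segment": "retail", "max_amount": 5000.00, "action": "review"},
    {"segment": "wire",   "max_amount": 100.00, "action": "approve"},
]

def export_rules(rules):
    """Serialize the rules to CSV, a format easily loaded into a DB2 table."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["segment", "max_amount", "action"])
    writer.writeheader()
    writer.writerows(rules)
    return buf.getvalue()

def decide(segment, amount, rules):
    """Apply the first matching rule; default to manual review."""
    for rule in rules:
        if rule["segment"] == segment and amount <= rule["max_amount"]:
            return rule["action"]
    return "review"
```

The operational system stays simple and fast -- a table lookup -- while the data scientist refreshes the table whenever the model is retrained.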

While this isn't IBM's first cut at mainframe analytics, running the analysis itself on big iron is a bold idea. IBM bets that mining operational data and making analysis available to operational systems without ETL is worth the gamble. It's worth watching this space to see how IBM evolves Machine Learning for z/OS in the future.

Next Steps

Manage and optimize z/OS software costs

Track mainframe performance with these tools

z/OS Connect boosts communication with mainframe apps
