Evaluating mainframe system monitoring tools

There are many system monitoring tools available for mainframes. This tip can help administrators analyze and choose the tool that fits their needs.

Timely and comprehensive monitoring is an important part of mainframe management. Fortunately, the mainframe has a long, rich history of tools that provide deep views into the system. This tip discusses various types of mainframe system monitoring software and offers a sampling of real-life tools that you can put to work today. I break system monitoring tools into three broad categories, defined below: application performance analyzers, system monitors and reporting systems.

Application performance analyzers
Application performance tuning can be tough. Sometimes a problem can’t be reproduced in the test environment. System traces cause too much overhead and don’t contain enough of the right kind of information.

Tools specifically designed to gather detailed information while the application runs in production are coming to the rescue. A good application performance analyzer reports on the application’s activities, and some also recommend ways to reduce I/O and increase database efficiency.

A couple of these tools, such as Compuware Corp.’s Strobe and Trilog’s TriTune, use a low-overhead sampling technique that extracts information from the subject process every few microseconds. Then, after analyzing the samples and referencing a memory map, they identify heavily used routines, or “hot spots.” The application programmer can then use that information to figure out how to make the routines more efficient.
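The sample-then-map idea can be sketched in a few lines of Python. This is only an illustration of the general technique, not how Strobe or TriTune actually work: the memory map, routine names, offsets and sample values are all hypothetical, and a real analyzer captures its samples from the running address space rather than from a list.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical memory map: (start offset, routine name), sorted by start offset.
MEMORY_MAP = [
    (0x0000, "INIT"),
    (0x0400, "READREC"),
    (0x0A00, "CALCRATE"),
    (0x1200, "WRITERPT"),
]
MODULE_END = 0x1800  # hypothetical end of the load module


def routine_for(offset):
    """Return the routine containing offset, or None if it lies outside the module."""
    if offset < MEMORY_MAP[0][0] or offset >= MODULE_END:
        return None
    starts = [start for start, _ in MEMORY_MAP]
    # The owning routine is the last one whose start is at or below the offset.
    index = bisect_right(starts, offset) - 1
    return MEMORY_MAP[index][1]


def hot_spots(samples):
    """Count samples per routine; return (routine, percent) pairs, busiest first."""
    counts = Counter(routine_for(offset) for offset in samples)
    counts.pop(None, None)  # ignore samples that fell outside the module
    total = sum(counts.values())
    return [(name, 100.0 * hits / total) for name, hits in counts.most_common()]


# Hypothetical instruction offsets captured at each sampling interval.
SAMPLES = [0x0A10, 0x0A24, 0x0410, 0x0A80, 0x1210, 0x0A44, 0x0B00, 0x0420]

for name, percent in hot_spots(SAMPLES):
    print(f"{name:<10}{percent:6.1f}%")
```

In this made-up run, most samples land in CALCRATE, so that routine is the first place a programmer would look for savings.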

System monitors
System monitors apply probes (hooks) into system and subsystem code to collect data at a very low level. These tools can display a range of information, from overall system performance to address space details, and may have the ability to display any byte of memory in the logical partitions. Many system monitors also provide facilities for altering system settings and zapping storage, which must be used very carefully.

Some vendors offer system monitors in suites for different subsystems that operate under a common umbrella. A company with a z/OS monitor may also have a CICS monitor that can link into a DB2 monitor, providing “synergy” between the different products.

There are many varieties of system monitoring tools that provide various levels of depth and rigor, limited only by the amount of money a customer is willing to pay. Well-known examples include IBM’s Omegamon, BMC Software Inc.’s MainView and CA’s Sysview.

Also note that a lot of automation tools seem to be closely tied to monitors. This makes sense, as the monitors have a wide eye on the system and would be the first to report trouble that requires an automated response.

Reporting systems
Monitoring tools are good at showing point-in-time data displayed through dashboards on flashy webpages. However, other pursuits, like capacity planning and detailed post-mortem debugging, require tools that can aggregate and report on mountains of data. Most tools of this type digest information already provided by the mainframe subsystems. For example, MXG can readily assimilate nearly every kind of System Management Facility (SMF) record that z/OS creates. IBM’s Resource Measurement Facility (RMF) and its accompanying tools report and display detailed resource usage information down to 60-second intervals.

Of course, raw information is good, but aggregating the data for analysis is better. SAS Institute Inc.’s IT Resource Management (ITRM) summarizes MXG-collected data nicely. IBM also has several tools to summarize data using input from subsystems, such as the CICS Performance Analyzer and various DB2 performance tools.
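To make the difference between raw and aggregated data concrete, here is a minimal Python sketch that rolls per-minute records up into hourly totals per workload. It does not read real SMF or RMF data, and it is not how MXG or ITRM work internally: the record layout, workload names, dates and CPU figures are all hypothetical, and the records are assumed to be decoded already.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical, already-decoded interval records:
# (interval start, workload name, CPU seconds used in that interval).
RECORDS = [
    ("2024-01-15 09:00", "CICSPROD", 12.5),
    ("2024-01-15 09:01", "CICSPROD", 14.0),
    ("2024-01-15 09:01", "DB2PROD", 8.0),
    ("2024-01-15 10:00", "CICSPROD", 30.0),
]


def summarize_by_hour(records):
    """Sum CPU seconds for each (hour, workload) pair."""
    totals = defaultdict(float)
    for stamp, workload, cpu_seconds in records:
        # Truncate the interval start to the top of the hour.
        hour = datetime.strptime(stamp, "%Y-%m-%d %H:%M").replace(minute=0)
        totals[(hour, workload)] += cpu_seconds
    return dict(totals)


for (hour, workload), cpu_seconds in sorted(summarize_by_hour(RECORDS).items()):
    print(f"{hour:%Y-%m-%d %H}:00  {workload:<10}{cpu_seconds:8.1f}")
```

The same pattern extends to daily or weekly rollups for capacity planning. The trade-off is that each level of summarization discards detail that a post-mortem might later need.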

Attributes of a good monitor
In my experience, monitors are among the most expensive types of software, with prices well into six figures. Thus, it is important to look carefully before plunking down the money. Below is a list of aspects to look for, in my order of importance.

  • Information: An enterprise shops for a monitoring tool because it wants insight into how its systems and applications behave, such as processing performance under real working loads. Ensure the tool reports what you want to see. The software must also present the information in a clear, concise way without ambiguity. There is no single way to present information, but it should meet your specific requirements. Lastly, a good aggregator should enable you to collect, summarize and analyze data simply and elegantly.
  • Overhead: Live system and application monitoring does not come for free, as it will always cost some CPU or other computing resources. A good monitor will have a light touch, or at least an overhead commensurate with the volume of information collected. Some of the better ones allow users to choose a level of detail and, thus, the overhead. Imagine how embarrassing it would be to chase an elusive performance problem only to discover the monitor is keeping the system busy. Overhead is less of a concern for reporting systems, although performance can become an issue if the daily collection cycle runs for 36 hours. Also note that aggregators tend to keep a lot of information so the expense of data storage must also be considered.
  • System interfaces: To collect low-level information, monitors must apply software probes (or “hooks”) to the target system. There are many good and bad ways to set the probes. The best ways include utilizing IBM-provided exit points and documented interfaces. Software that requires relinking system modules or, worst of all, modifies system code in memory is less desirable. A bad interface implementation can cause a lot of problems (see below).
  • Reliability: System monitor or application performance tools operate on very sensitive applications on the mainframe, and this demands fast, nearly bug-free code. The powerful facilities that a monitor provides call for solid security and control. Consider that a bug in the monitor software could take out an important subsystem, such as JES2, whereas another monitoring tool without strong security could allow inexperienced or venal users to compromise the system. No one, least of all your management, would be happy to reload a system because a monitor went south.
  • Integration: As mentioned above, some monitors come in suites that work with various subsystems as well as the operating system. Along with a consistent user interface, a well-integrated interface could tip the balance in favor of one monitor. For example, a monitor that identifies a CICS region using too much CPU is good. A monitor that allows you to drill down from the system performance level to find the looping CICS transaction is great. Another form of integration allows aggregators to ingest output from monitors for seamless end-to-end reporting.
  • Ease of maintenance: Some tools may be difficult to maintain because of odd system interfaces (see above). However, good software will follow IBM's recommended installation and implementation procedures.
  • Presentation: Web user interfaces are pretty and easy to use and read. However, a pretty front end must be filled with relevant and extensive information. To me, it’s more important to get the necessary information, even if it means logging onto a text-based screen.

About the author: Robert Crawford has been a systems programmer for 29 years. While specializing in CICS technical support, he has also worked with VSAM, DB2, IMS and assorted other mainframe products. He has programmed in Assembler, Rexx, C, C++, PL/1 and COBOL. The latest phase in his career finds him an operations architect responsible for establishing mainframe strategy and direction for a large insurance company. He lives and works with his family in south Texas.
