In last month's column I wrote some details about CICS statistics records and what they contain. This month I'll concentrate on various ways they can be used to report and predict your system's behavior.
To get the most out of your statistics records you need software to gather, archive and report on them. IBM sells CICS Performance Analyzer, which reads statistics and stores them in DB2. It also provides canned reports for the more interesting aspects of CICS performance. Merrill Consultants offers MXG, a massively comprehensive SAS system capable of processing SMF, RMF, DB2 and CICS records. CICS itself comes with two freebies, DFHSTUP and DFH0STAT. DFHSTUP is somewhat limited and verbose, making it hard to find what you're looking for and even harder to summarize. I don't have much experience with DFH0STAT, although I understand its reports are friendlier. It can also run online if you want to see the statistics as they're gathered.
In a fit of hubris I decided to write my own SAS programs. Not only does this give me total control over what I gather and when, but I can also easily create derived fields from the original data. In addition, I can summarize the variables as I want and control how records are archived. We configure our CICS regions to cut statistics records just before the hour. My system summarizes the hourly records into daily observations. The daily summary records are gathered onto monthly tapes, while I keep the hourly statistics for seven days. With this setup, in five minutes I can tell how many times one of my regions went short on storage in September 2004. Just in case anyone is interested.
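The hourly-to-daily rollup can be sketched in a few lines. This is Python rather than the SAS I actually use, and the record layout (`region`, `timestamp`, `sos_count`, `peak_tasks`) is a hypothetical stand-in, not the real CICS statistics field names. The idea is simply that counters are summed across the day while peak values take the daily maximum.

```python
from collections import defaultdict
from datetime import datetime

def summarize_daily(hourly_records):
    """Roll hourly interval records up to one observation per region per day."""
    daily = defaultdict(lambda: {"sos_count": 0, "peak_tasks": 0, "intervals": 0})
    for rec in hourly_records:
        day = daily[(rec["region"], rec["timestamp"].date())]
        day["sos_count"] += rec["sos_count"]                           # counters are summed
        day["peak_tasks"] = max(day["peak_tasks"], rec["peak_tasks"])  # peaks keep the maximum
        day["intervals"] += 1                                          # how many hourly records fed this day
    return dict(daily)

# Made-up sample data; field names and values are assumptions for illustration only.
hourly = [
    {"region": "CICSPRD1", "timestamp": datetime(2004, 9, 14, 9, 59), "sos_count": 0, "peak_tasks": 61},
    {"region": "CICSPRD1", "timestamp": datetime(2004, 9, 14, 10, 59), "sos_count": 2, "peak_tasks": 88},
    {"region": "CICSPRD1", "timestamp": datetime(2004, 9, 15, 9, 59), "sos_count": 1, "peak_tasks": 70},
]

daily = summarize_daily(hourly)

# "How many times did this region go short on storage in September 2004?"
september_sos = sum(
    obs["sos_count"]
    for (region, day), obs in daily.items()
    if region == "CICSPRD1" and (day.year, day.month) == (2004, 9)
)
```

Keeping the daily observations small is what makes a question about a month from years ago cheap to answer: the query touches about thirty records per region instead of several hundred hourly ones.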
Once you have a strategy for gathering and archiving statistics you can begin using them to study your system. First, decide which resources matter most to the smooth operation of your system. Is it temporary storage? Is FEPI important to any of your critical applications? Second, pick some resource usage thresholds that, when crossed, can serve as an early warning. For instance, you may want to see a report for any region that has reached 80% of max task (MXT).
We have a daily job that reads statistics gathered from the previous day and sends the report via e-mail to me and my team's group account. The report includes the following exception conditions:
- Short-on-storage (SOS) conditions
- Storage violations
- Transaction class (TClass) limits reached
- MXT conditions
- File string waits
In addition we report on the following thresholds:
- Regions that reached more than 80% EDSA
- Regions with over 70% of temporary storage or intrapartition transient data queue CIs used
- Regions that have reached 80% of MXT
- LSR buffer pools with less than an 80% read hit ratio
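The two lists above boil down to a handful of simple tests per region. The sketch below shows one way to express them in Python. Every field name is hypothetical, the LSR hit ratio is assumed to be lookaside hits divided by lookaside hits plus buffer reads, and a single LSR pool per region is assumed for brevity.

```python
# Exception conditions: any non-zero count for the day is reported.
EXCEPTION_FIELDS = [
    ("sos_count", "Short-on-storage conditions"),
    ("storage_violations", "Storage violations"),
    ("tclass_limit_hits", "TClass limits reached"),
    ("mxt_hits", "MXT conditions"),
    ("string_waits", "File string waits"),
]

def pct(part, whole):
    """Percentage of whole, or 0.0 when whole is zero."""
    return 100.0 * part / whole if whole else 0.0

def findings_for(region):
    """Return the report lines for one region's daily observation."""
    out = [f"{label}: {region[field]}"
           for field, label in EXCEPTION_FIELDS if region.get(field, 0) > 0]

    edsa = pct(region["edsa_peak"], region["edsa_limit"])
    if edsa > 80:
        out.append(f"EDSA at {edsa:.0f}% of limit")

    for prefix, label in (("ts", "Temporary storage"), ("td", "Intrapartition TD")):
        used = pct(region[f"{prefix}_ci_used"], region[f"{prefix}_ci_total"])
        if used > 70:
            out.append(f"{label} CIs at {used:.0f}% used")

    tasks = pct(region["peak_tasks"], region["mxt"])
    if tasks >= 80:
        out.append(f"Peak tasks at {tasks:.0f}% of MXT")

    # Assumed definition: hits / (hits + reads that had to go to disk).
    hits, reads = region["lsr_lookaside_hits"], region["lsr_buffer_reads"]
    if hits + reads > 0 and pct(hits, hits + reads) < 80:
        out.append(f"LSR read hit ratio {pct(hits, hits + reads):.0f}%")
    return out

# Made-up daily observation for one region.
sample = {
    "sos_count": 0, "storage_violations": 0, "tclass_limit_hits": 3,
    "mxt_hits": 0, "string_waits": 0,
    "edsa_peak": 850, "edsa_limit": 1000,
    "ts_ci_used": 300, "ts_ci_total": 1000,
    "td_ci_used": 750, "td_ci_total": 1000,
    "peak_tasks": 80, "mxt": 100,
    "lsr_lookaside_hits": 700, "lsr_buffer_reads": 300,
}
findings = findings_for(sample)
```

A region with no findings produces an empty list, so the e-mail only needs to mention regions that have something to say.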
These reports are wonderful to get, but you must be careful to limit them to the thresholds and exception conditions you really care about. A report that is ignored because it's too long is almost as bad as no report at all. In addition, there may be some resources that you either know will give you trouble or have deliberately configured for limited access. For example, one of our systems used a VSAM entry-sequenced data set (ESDS) as an application log. According to IBM's manual, you're supposed to set the string number of a write-only ESDS to 1. We did just that, which meant I had to put logic in our reporting program to ignore string waits for that file so they wouldn't clutter up the report.
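The string-wait exclusion can be as simple as a list of known offenders that the report skips. This Python sketch shows the idea; the region name, file names and record layout are invented for illustration.

```python
# (region, file) pairs where string waits are expected by design,
# e.g. a write-only ESDS log deliberately defined with one string.
# These names are hypothetical.
EXPECTED_STRING_WAITS = {("CICSPRD1", "APPLOG")}

def reportable_string_waits(file_stats, expected=EXPECTED_STRING_WAITS):
    """Keep only the files whose string waits deserve a line in the report."""
    return [
        f for f in file_stats
        if f["string_waits"] > 0 and (f["region"], f["file"]) not in expected
    ]

# Made-up per-file statistics for one day.
file_stats = [
    {"region": "CICSPRD1", "file": "APPLOG", "string_waits": 412},   # expected, suppressed
    {"region": "CICSPRD1", "file": "CUSTMAST", "string_waits": 7},   # worth a look
    {"region": "CICSPRD1", "file": "ORDERS", "string_waits": 0},     # nothing to report
]
to_report = reportable_string_waits(file_stats)
```

Keeping the exclusions in one named list, rather than scattered through the report logic, makes it easy to see at a glance what the report is deliberately not telling you.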
Although they are less useful for real-time problems, CICS statistics can serve as a wonderful back-end diagnostic tool. They will tell you about significant events you may have missed and show you where tomorrow's problems may come from. Just pick what you're interested in (I suggest remembering the last performance problem for which you got yelled at) and figure out a way to get the statistics that cover it.