Tip

Troubleshooting mainframe application performance variables

Application tuning is a frustrating business. Changes that perform well in test have no effect when moved to production. A small modification to a program that isn't called very often suddenly inflates a batch job's elapsed time. Then there are the hours spent staring at CPU graphs whose values jump up and down for no discernable reason. All this brings up the question of how machines built to perform the same operations the same way in the same order can be so random?

Mainframe application performance can vary based on processor clock speed, the type of instruction being executed, and the potential for out of sequence instructions in the pipeline.

Processor clock speed
A principal -- and often misleading -- measure of processor speed is its clock. The assumption is that a faster processor's clock will have a higher frequency. This is true until you think about what the clock actually does.

The processor clock controls and synchronizes processor operations. It reminds me of when we ran wind sprints in P.E. We would all line up at one end of the field, and when the coach whistled we would run until he blew his whistle again. At that point we would freeze until the next signal. The operations inside the processor work on the same logic. At the start of a clock signal, data moves around, bits are tested or signals raised. At the end of the cycle, everything should end up where it it's going to be, awaiting the next interval.

    Requires Free Membership to View

The instructions
Knowing this, it becomes clear that some of the randomness we see in performance measurement has to do with the fact that what we call CPU is not measured by the number of instructions executed, but rather the number of clock cycles taken to complete an operation -- and the number of cycles may vary for many reasons.

For instance, some instructions take more cycles than others. A load address (LA) instruction is fairly straightforward, as it just has to add two registers together with an offset and put the answer in another register. A move character long (MVCL) instruction, on the other hand, is a miniature program in its own right, and the following steps are paraphrased:

  • Determine if the program has access to the source and destination addresses.
  • If the source destination number is zero, move the pad character to the destination address. Otherwise move the next source byte to the destination area.
  • Decrement the destination length register. If it is zero, the instruction is done.
  • Decrement the source length register if it is non-zero.
  • Increment the source and destination address registers.
  • Go back to step one.

Not only is the instruction more complicated, but a MVCL's execution time is proportional to the amount of data it moves. By extension, the CPU time reported in System Management Facility (SMF) for a program also depends on the amount of data it slings around.

The pipeline
Z architecture machines have an additional twist, as the processors use pipelines in an attempt to parallelize as much work as possible. This means an instruction or pieces of multiple instructions may be processing at the same time and out of order. This works very well until the processor reaches the point where one instruction depends on the results of another.

Say, for instance, a program has an LA instruction followed by a move character (MVC) that uses the value computed in the first instruction as one of its bases. Obviously the processor cannot execute the MVC until the LA completes, which brings things to a temporary stop. But even though there's a small pause in the processor, the clock still ticks, so the cycles spent waiting are counted as another quantum of CPU.

Thus, we understand instruction order impacts CPU time. Some of the smarter compilers and Assembler programmers know this. In addition to the instruction order, there is also processor cache. Z10 machines have levels 1, 1.5 and 2 cache, which take progressively longer to access from the processor's point of view. Then there's main memory, which, at CPU speeds, is nearly the equivalent of an I/O. Obviously it's nice if a program's favorite bytes are in level 1 cache. But, given limited space on a busy machine that spends a lot of time switching context (as mainframes are wont to do), this is not always possible. There will be times when the CPU pipeline must wait to fetch data from cache or memory, and each pause will be accounted for in clock cycles. Therefore, locality of reference also impacts CPU time because the same instruction may use varying amounts of processor, depending on where its data is.

Finally, the dispatcher can interrupt a program at any time. When the interruption occurs, the system spends lots of cycles handling the interrupt and redirecting the instruction flow to the proper handlers. Some of this may get charged to the victim depending on the type of interrupt and what needs to be done.

Conclusion
There are many reasons why the deterministic operation of a computer can look awfully random, and I can tell you from personal experience it will drive you nuts. However, tuning is not impossible, it's just a matter of eliminating the noise.

For anyone interested in knowing more about Z processor architecture, I highly recommend Robert Rogers' Share presentation, "How Do You Do What You Do When You're a CPU." Robert manages to clearly explain Z's pipeline architecture and its impact on performance. Anyone with access to www.share.org can find the presentation in the proceedings section. I have not been able to find it on the open Web.

ABOUT THE AUTHOR: For 24 years, Robert Crawford has worked off and on as a CICS systems programmer. He is experienced in debugging and tuning applications and has written in COBOL, Assembler and C++ using VSAM, DLI and DB2.

What did you think of this feature? Write to SearchDataCenter.com's Matt Stansberry about your data center concerns at  mstansberry@techtarget.com.

This was first published in June 2009

There are Comments. Add yours.

 
TIP: Want to include a code block in your comment? Use <pre> or <code> tags around the desired text. Ex: <code>insert code</code>

REGISTER or login:

Forgot Password?
By submitting you agree to receive email from TechTarget and its partners. If you reside outside of the United States, you consent to having your personal data transferred to and processed in the United States. Privacy
Sort by: OldestNewest

Forgot Password?

No problem! Submit your e-mail address below. We'll send you an email containing your password.

Your password has been sent to:

Disclaimer: Our Tips Exchange is a forum for you to share technical advice and expertise with your peers and to learn from other enterprise IT professionals. TechTarget provides the infrastructure to facilitate this sharing of information. However, we cannot guarantee the accuracy or validity of the material submitted. You agree that your use of the Ask The Expert services and your reliance on any questions, answers, information or other materials received through this Web site is at your own risk.