Dumb debugging on the mainframe

When debugging on the mainframe, programmers often think a problem is more difficult than it really is. Rule out the simple possibilities for error first, counsels Robert Crawford.

The title for this column comes from a help-wanted ad I saw in the 1980s seeking systems programmers skilled in the art of "dumb" rather than "dump" debugging. First, this example demonstrates why you should always spell things out when you talk to HR. Second, it's one of those mistakes that reflects real-life situations when we make things harder than they have to be. What follows are some rules of thumb for debugging tasks that have made my life easier.

Rule out the simple things first
Some people respond to a problem by trying to find the most technically challenging root cause. However, an S0C7 in an application program is usually not indicative of a microcode bug in the coupling facility. Most of the time, the answer is something much simpler. Take a deep breath. Try to figure out whether any files are closed, what changed since yesterday or whether there's an abnormal condition that might drive a program into little-used recovery code. Most importantly, get a dump, find the actual error and ask yourself if this should happen given the circumstances at the time.
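
As an illustration of checking the simple thing first: an S0C7 is a data exception, which usually means a decimal instruction was handed a field that isn't valid packed decimal. The sketch below is in Python rather than anything you'd run on the host, and the function name and sample values are hypothetical. It checks a field copied out of a dump the way you would by eye: every nibble but the last must be a digit, and the last must be a sign code.

```python
def packed_decimal_problems(field: bytes) -> list:
    """Return the reasons a field is not valid packed decimal (empty list if it is valid)."""
    if not field:
        return ["field is empty"]

    # Split each byte into its high and low nibbles.
    nibbles = []
    for byte in field:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)

    # All nibbles except the last are digits; the last is the sign.
    *digits, sign = nibbles
    problems = []
    for position, nibble in enumerate(digits):
        if nibble > 9:
            problems.append(f"nibble {position} is X'{nibble:X}', not a digit")
    if sign < 0xA:
        problems.append(f"sign nibble is X'{sign:X}', expected A-F")
    return problems


# Hypothetical fields as they might appear in a dump.
print(packed_decimal_problems(bytes.fromhex("123C")))    # valid +123
print(packed_decimal_problems(bytes.fromhex("404040")))  # EBCDIC spaces
print(packed_decimal_problems(bytes.fromhex("F1F2F3")))  # zoned digits, never packed
```

A field full of spaces or zoned digits usually points to an uninitialized work area or an unconverted input record, which is a far more likely culprit than the hardware.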

A CICS debugging anecdote
When a patient complains to his doctor, "It hurts when I do this," the simple response is "Then don't do that." As another example, consider a corrupt CICS file that wouldn't open. The problem came at a bad time, and there was also a bug in CICS that interpreted the bad open return code as a severe management module failure, causing CICS to abend repeatedly. Fortunately, with the help of IBMLink, we identified the problem and the bug fairly quickly, but the possibility of a fix opened a can of worms. Do we cold-start CICS? Should we apply the fix from an authorized program analysis report (APAR) as an emergency change?

The answer was easy: Disable the file so that the program would fail with a disabled file condition and CICS wouldn't try to open it. Once CICS stayed up, we were able to deal with the corrupt file at our leisure.

The point is, we're sometimes victims of self-inflicted wounds, often from design decisions whose full import isn't felt until the thing hits production. In this case, the answer to the question "Should we take out this code that does a sequential search of a 500,000-row table?" is "Hmm, yes."

Remember mainframe programming 101
If you don't have the source code for the failing program, debugging suddenly becomes much harder. While you can't read the mind of the programmer who wrote it, you might be able to figure out the code's intent and the reason it's not working.

Years ago, with procedural languages, we used things like loops, function calls, arrays, linked lists, queues and stacks. Now, in the 21st century and with the advent of object-oriented programming, we've evolved to using loops, function (sorry, method) calls, arrays, linked lists, queues and stacks. A simple hand disassembly of the object code may tell you what's going on. If there's a loop, you might be able to figure out the kind of data structure on which the program is operating and check the structure's integrity. If it's a long list of assignments, you might be able to figure out whether something corrupted the base registers. Save areas, just like Java stack traces, are handy ways to figure out what was called and with which parameters.
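
To make the save-area point concrete, here is a minimal sketch of walking a save-area chain by following the back pointers. It assumes the standard 72-byte save area used by traditional linkage (back chain at +4, R14 at +12, R15 at +16, R1 at +24) and 31-bit addresses. The dump is simulated as a byte string, and the function names, addresses and register values are made up for illustration.

```python
import struct

SAVE_AREA_LENGTH = 72  # 18 fullwords in the standard save area


def read_word(memory: bytes, base: int, address: int) -> int:
    """Read a big-endian fullword from a simulated dump that starts at address `base`."""
    return struct.unpack_from(">I", memory, address - base)[0]


def walk_save_areas(memory: bytes, base: int, r13: int, limit: int = 50) -> list:
    """Follow back-chain pointers starting from the save area R13 pointed to at the failure."""
    frames = []
    seen = set()
    address = r13 & 0x7FFFFFFF
    last_valid = base + len(memory) - SAVE_AREA_LENGTH
    while address and address not in seen and len(frames) < limit:
        if not base <= address <= last_valid:
            break  # the chain points outside the storage we have, so stop
        seen.add(address)
        frames.append({
            "save_area": address,
            # Registers stored here by the routine this save area's owner called.
            "return_address": read_word(memory, base, address + 12) & 0x7FFFFFFF,
            "entry_point": read_word(memory, base, address + 16) & 0x7FFFFFFF,
            "parameter_list": read_word(memory, base, address + 24) & 0x7FFFFFFF,
        })
        address = read_word(memory, base, address + 4) & 0x7FFFFFFF
    return frames


# Build a two-level chain at a made-up address: the caller's save area at
# X'1000' and the failing routine's save area at X'1048'.
base = 0x1000
memory = bytearray(2 * SAVE_AREA_LENGTH)
struct.pack_into(">I", memory, 8, 0x1048)       # caller: forward chain
struct.pack_into(">I", memory, 12, 0x80002000)  # caller: R14, return address
struct.pack_into(">I", memory, 16, 0x00003000)  # caller: R15, entry point called
struct.pack_into(">I", memory, 24, 0x00004000)  # caller: R1, parameter list
struct.pack_into(">I", memory, 72 + 4, 0x1000)  # failing routine: back chain

for frame in walk_save_areas(bytes(memory), base, r13=0x1048):
    print({name: hex(value) for name, value in frame.items()})
```

The `seen` set and the bounds check matter as much as the walk itself: a chain that loops or points outside the dump is a strong hint that something overlaid a save area.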

This is kind of a serenity prayer for programmers. Everything on a computer happens for a reason, and the processor does exactly what it's told. Therefore, no matter how bizarre the error, there is a reason it happened. Your job is to figure out the conditions leading up to that point that are consistent, logical and make sense in a given context. So unless you're the victim of the somewhat rare cosmic ray-induced bit flip, there's a bug in a program somewhere.

I can imagine two reactions to this column. The first might be "Duh! Tell me something I don't know." The second might be "This is all fine and good, but your glib advice won't help me fix this thing." I accept the criticism. The point of this column is not to give how-tos or specific guidelines for debugging. Instead, these are things to remember when you're on the phone at 2:00 a.m., the Web site is down and the VP in charge of everything is loudly demanding to know when it will be fixed.

Debugging is one of the most challenging and rewarding parts of my job. But I find the task easier when I don't panic first.

ABOUT THE AUTHOR: For 24 years, Robert Crawford has worked off and on as a CICS systems programmer. He is experienced in debugging and tuning applications and has written in COBOL, Assembler and C++ using VSAM, DLI and DB2.