Dumb debugging on the mainframe

Robert Crawford, Contributor
The title for this column comes from a help-wanted ad I saw in the 1980s seeking systems programmers skilled in the art of "dumb" rather than "dump" debugging. First, the ad shows why you should always spell things out when you talk to HR. Second, the typo accidentally describes a real-life problem: we often make things harder than they have to be. What follows are some rules of thumb that have made debugging easier for me.

Rule out the simple things first
Some people respond to a problem by hunting for the most technically challenging root cause. However, an S0C7 (a data exception) in an application program is rarely a sign of a microcode bug in the coupling facility. Most of the time, the answer is much simpler. Take a deep breath. Check whether any files are closed, what changed since yesterday and whether an abnormal condition might have driven a program into little-used recovery code. Most importantly, get a dump, find the actual error and ask yourself whether it should happen given the circumstances at the time.
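An S0C7 usually means a decimal instruction found a numeric field that isn't valid packed decimal, such as one that was never initialized (X'00') or that holds EBCDIC spaces (X'40'). Looking at the suspect field's bytes in the dump is often all it takes. The Python sketch below assumes you have already copied those bytes out of the dump; `nibbles`, `packed_decimal_is_valid` and `unpack` are hypothetical helpers for illustration, and the check covers only the digit and sign rules, not every possible cause of a data exception.

```python
def nibbles(field: bytes) -> list[int]:
    """Split each byte into its high and low 4-bit halves."""
    return [n for b in field for n in (b >> 4, b & 0x0F)]


def packed_decimal_is_valid(field: bytes) -> bool:
    """True if `field` is well-formed packed decimal.

    Every nibble except the last must be a digit 0-9. The last nibble is
    the sign and must be A-F (A, C, E, F mean plus; B, D mean minus).
    Data that breaks these rules is what a decimal instruction rejects
    with a data exception (S0C7).
    """
    if not field:
        return False
    *digits, sign = nibbles(field)
    return all(d <= 9 for d in digits) and sign >= 0xA


def unpack(field: bytes) -> int:
    """Convert a valid packed-decimal field to a Python int."""
    if not packed_decimal_is_valid(field):
        raise ValueError(f"invalid packed decimal: X'{field.hex().upper()}'")
    *digits, sign = nibbles(field)
    value = int("".join(map(str, digits)))
    return -value if sign in (0xB, 0xD) else value
```

For example, `unpack(bytes.fromhex("12345C"))` returns 12345, while `packed_decimal_is_valid(bytes.fromhex("404040"))` returns False: a field full of spaces is exactly the kind of simple cause that produces an S0C7.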

A CICS debugging anecdote
When a patient tells his doctor, "It hurts when I do this," the simple response is "Then don't do that." As an example, consider a corrupt CICS file that wouldn't open. The problem came at a bad time: a bug in CICS interpreted the bad open return code as a severe management module failure, causing CICS to abend repeatedly.

Fortunately, with the help of IBMLink, we identified the problem and the bug fairly quickly, but the prospect of a fix opened a can of worms. Should we cold-start CICS? Should we apply the fix for the authorized program analysis report (APAR) as an emergency change?

The answer was easy: Disable the file so that the program would fail with a disabled file condition and CICS wouldn't try to open it. Once CICS stayed up, we were able to deal with the corrupt file at our leisure.

The point is, we're sometimes victims of self-inflicted wounds, often from design decisions whose full impact isn't felt until the code hits production. When that happens, the fix can be just as simple. For example, the answer to the question "Should we take out this code that does a sequential search of a 500,000-row table?" is "Hmm, yes."

Remember mainframe programming 101
If you don't have the source code for the failing program, debugging suddenly becomes much harder. While you can't read the mind of the programmer who wrote it, you might be able to figure out the code's intent and the reason it's not working.

Years ago, with procedural languages, we used things like loops, function calls, arrays, linked lists, queues and stacks. Now, in the 21st century and with the advent of object-oriented programming, we've evolved to using loops, function (sorry, method) calls, arrays, linked lists, queues and stacks. A simple hand disassembly of the object code may tell you what's going on. If there's a loop, you might be able to figure out the kind of data structure on which the program is operating and check the structure's integrity. If it's a long list of assignments, you might be able to figure out whether something corrupted the base registers. Save areas, just like Java stack traces, are handy ways to figure out what was called and with which parameters.
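Walking the save area chain in a dump is the mainframe counterpart of reading a stack trace, and it doubles as an integrity check on one of the few data structures every program shares. A minimal Python sketch follows, assuming the standard 72-byte (18-fullword) save area, in which offset 4 holds the back pointer to the caller's save area, offset 8 the forward pointer, offset 12 the saved R14 and offset 16 the saved R15. The `Dump` and `Frame` classes and the `walk_back_chain` function are hypothetical helpers, not part of any IBM tool; the dump is modeled as one flat block of memory, and the larger save area formats used by 64-bit programs are ignored. Because the forward pointer isn't always filled in, the sketch only records whether it agrees instead of treating a mismatch as fatal.

```python
from dataclasses import dataclass

ADDR_MASK = 0x7FFFFFFF  # drop the AMODE bit that BALR/BASR set in R14 in 31-bit mode


@dataclass
class Dump:
    """Hypothetical flat memory image: `data` holds the bytes starting at `base`."""
    base: int
    data: bytes

    def word(self, addr: int) -> int:
        """Read a big-endian fullword at `addr`."""
        off = addr - self.base
        if off < 0 or off + 4 > len(self.data):
            raise ValueError(f"address {addr:08X} is not in the dump")
        return int.from_bytes(self.data[off:off + 4], "big")


@dataclass
class Frame:
    save_area: int  # address of this save area
    r14: int        # return address into the routine that owns this save area
    r15: int        # entry point of the routine it called
    linked: bool    # does the caller's forward pointer (offset 8) agree?


def walk_back_chain(dump: Dump, r13: int, limit: int = 64) -> list[Frame]:
    """Follow back pointers (offset 4) from the save area addressed by R13.

    Returns the most recent save area first, like a stack trace. A chain
    that revisits an address is corrupt, so that raises ValueError.
    """
    frames: list[Frame] = []
    seen: set[int] = set()
    addr = r13
    while addr and len(frames) < limit:
        if addr in seen:
            raise ValueError(f"save area chain loops at {addr:08X}")
        seen.add(addr)
        prev = dump.word(addr + 4)  # caller's save area, or 0 at the top
        linked = prev == 0 or dump.word(prev + 8) == addr
        frames.append(Frame(addr,
                            dump.word(addr + 12) & ADDR_MASK,
                            dump.word(addr + 16) & ADDR_MASK,
                            linked))
        addr = prev
    return frames
```

Starting from the R13 value at the time of the error lists the most recent routine first. A loop, a pointer outside the dump or a forward pointer that disagrees is itself a clue that something overwrote the chain.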

Think of this as a serenity prayer for programmers: everything on a computer happens for a reason, and the processor does exactly what it's told. Therefore, no matter how bizarre the error, there is a reason it happened. Your job is to work out the chain of conditions leading up to that point, one that is consistent, logical and makes sense in context. Unless you're the victim of a rare cosmic-ray-induced bit flip, there's a bug in a program somewhere.

I can imagine two reactions to this column. The first might be "Duh! Tell me something I don't know." The second might be "This is all well and good, but your glib advice won't help me fix this thing." I accept the criticism. The point of this column is not to give how-tos or specific debugging guidelines. Instead, these are things to remember when you're on the phone at 2:00 a.m., the Web site is down and the VP in charge of everything is loudly demanding to know when it will be fixed.

Debugging is one of the most challenging and rewarding parts of my job, but I find it easier when I don't panic first.

ABOUT THE AUTHOR: For 24 years, Robert Crawford has worked off and on as a CICS systems programmer. He is experienced in debugging and tuning applications and has written in COBOL, Assembler and C++ using VSAM, DL/I and DB2.

This was first published in December 2007
