
Linux debugging with SystemTap dynamic instrumentation

In this tip, learn how SystemTap’s highly scriptable dynamic instrumentation has an edge over traditional Linux server debugging and performance monitoring.

When analyzing Linux server performance, traditional tools work fine if you simply want an overview of what is happening, but they don't let you dig into what is really going on inside your system. SystemTap offers low-level options that help you get to the core of the problem.

The essence of SystemTap is that it puts a tap on your system, and these taps help you find out what really happens. To do so, SystemTap uses dynamic instrumentation that works with tracepoints. But what is dynamic instrumentation? To answer that question, you need to know a little about how programs run on an operating system.

Today’s systems are complicated, and performance problems can have many causes. In many cases, looking only at what a program is supposed to do is too simplistic to find the cause. For instance, a program might call functions that you aren't aware of. That is where dynamic instrumentation helps: it hooks into the running Linux kernel and traces exactly what a program is doing. This approach has an additional benefit: it works without noticeably slowing down programs or interrupting the availability of machines. SystemTap is available for most recent Linux distributions. The information in this article is based on Fedora 12. If you haven’t installed SystemTap yet, you can install it with the following commands:

yum install systemtap kernel-devel yum-utils

debuginfo-install kernel

Differences with traditional debugging methods
To find out exactly what a program is doing, you need a debugger. Traditional debugging techniques have some problems. For instance, an often-used debugger like gdb interrupts normal operation. Sometimes you even need to re-compile or re-install software. The most notable disadvantage of these traditional debuggers is that they only look at one executable or aspect at a time, not at the entire system. And by looking at one aspect only, you might miss important information. These disadvantages don't apply to dynamic instrumentation.

SystemTap isn't just dynamic instrumentation -- it is highly scriptable as well. That allows you to use conditional constructs, associative arrays and much more. Even better, you don't necessarily have to write the scripts yourself; you can use the example scripts that are provided. After installation, you can find some useful scripts in /usr/share/doc/systemtap/examples. Have a look at the index.html file there to get an overview.

All SystemTap scripts are built from events and probe handlers. The idea is simple: when an event occurs, its handler performs some action. Below is an easy-to-understand example of a SystemTap script. First, the probe keyword defines the event, which in this case is a read action that occurs on the file system. When this event occurs, printf displays a message and the script exits.

$ cat simple.stp

probe vfs.read {
            printf("read performed\n")
            exit()
}

$ stap simple.stp

read performed
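
The same event-and-handler pattern extends to the conditional constructs and associative arrays mentioned earlier. The following is a minimal sketch, not one of the bundled examples: the file name reads_by_process.stp, the five-second window and the stapio filter are illustrative choices. It counts file system reads per process name and prints the busiest readers.

```shell
# Write a small SystemTap script (hypothetical name: reads_by_process.stp)
cat > reads_by_process.stp <<'EOF'
global reads    # associative array: process name -> number of reads

probe vfs.read {
    # conditional construct: ignore reads issued by SystemTap's own stapio process
    if (execname() != "stapio")
        reads[execname()]++
}

probe timer.s(5) {
    # after five seconds, print up to ten processes, highest count first
    foreach (name in reads- limit 10)
        printf("%-20s %d\n", name, reads[name])
    exit()
}
EOF
```

Run it as root with stap reads_by_process.stp. The trailing minus sign in reads- asks foreach to walk the array sorted by value in descending order.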


To do its work, SystemTap uses tracepoints, which are callbacks placed at strategic points in the kernel. In kernel version 2.6.34, there are 282 of these tracepoints. (In 2.6.28, there were just 12 of them!)
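
The tracepoint count depends on the kernel you are running, so it can be worth checking on your own system. The following is a minimal sketch, assuming stap is installed along with the matching kernel-devel package; the variable names are illustrative.

```shell
# Probe pattern that matches every kernel tracepoint SystemTap knows about
pattern='kernel.trace("*")'

if command -v stap >/dev/null 2>&1; then
    # stap -l lists the probe points matching a pattern, one per line;
    # a count of 0 usually means stap could not resolve the pattern
    count=$(stap -l "$pattern" 2>/dev/null | wc -l)
    echo "kernel $(uname -r): $count tracepoints"
else
    echo "stap not found; install systemtap first"
fi
```

To see only the scheduler tracepoints together with the variables they expose, stap -L 'kernel.trace("sched_*")' can be used instead.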

Some scripts, such as schedtimes.stp, use these tracepoints. This script monitors the scheduler, an essential kernel component that determines which task runs where and when. The interesting thing about tracepoints is that SystemTap talks directly to functionality in the kernel: in this case, the sched_switch tracepoint, which fires when the scheduler switches to another process that was waiting to be served. Another important scheduler tracepoint is sched_wakeup, which fires when a waiting process is moved into the queue of runnable processes. Together, these show how long a process sat in the queue before it was served, which is important performance data. Run the script with stap process/schedtimes.stp to get an overview of where each process spends its time with regard to the scheduler. Use -c [command] to monitor a specific command: the script starts, the command runs, and when the command finishes the script prints its output, showing how the program in question interacts with the scheduler.
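
The queue-time measurement described above can be sketched in a few lines. This is a simplified illustration, not the schedtimes.stp source: it assumes the scheduler tapset in your SystemTap version provides the scheduler.wakeup and scheduler.ctxswitch probes with the task_tid and next_tid variables, and the file name queue_delay.stp is made up for this example.

```shell
# Write the sketch to a file (hypothetical name: queue_delay.stp)
cat > queue_delay.stp <<'EOF'
global woken     # thread ID -> time of its most recent wakeup (microseconds)
global delay     # statistics aggregate of queue delays (microseconds)

probe scheduler.wakeup {
    # a waiting thread was moved to the queue of runnable threads
    woken[task_tid] = gettimeofday_us()
}

probe scheduler.ctxswitch {
    # the scheduler hands a CPU to next_tid; if its wakeup was seen, record the wait
    if (next_tid in woken) {
        delay <<< gettimeofday_us() - woken[next_tid]
        delete woken[next_tid]
    }
}

probe end {
    # runs when the script is interrupted or the -c command finishes
    if (@count(delay))
        printf("wakeups: %d  avg queued: %d us  max queued: %d us\n",
               @count(delay), @avg(delay), @max(delay))
}
EOF
```

Run it as root, for example with stap queue_delay.stp -c "sleep 10" to sample for ten seconds. Unlike schedtimes.stp, this sketch reports a single system-wide summary rather than per-process times.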

SystemTap can provide detailed information on system activity
[root@fedora examples]# stap process/schedtimes.stp
all mode
^C
        execname:    pid    run(us)  sleep(us) io_wait(us) queued(us)  total(us)

        events/0:      6        845    5366178           0       2743    5369766
     sync_supers:     12         15    3882049           0         20    3882084
     bdi-default:     13         20    3331703           0         70    3331793
       kblockd/0:     15         12    2881021           0        108    2881141
           ata/0:     19       5085    4162266           0       1210    4168561
       scsi_eh_1:     37       2783    4165170           0        451    4168404
      mpt_poll_0:    247         94    4758588           0        347    4759029
        kdmflush:    284        110    2880259         280        213    2880582
     jbd2/dm-0-8:    302        755    2880878           0         75    2881708
        vmmemctl:   1072        241    4758661           0        133    4759035
        vmtoolsd:   1253       4498    5470363           0       1565    5476426
        rsyslogd:   1397       1259    5566454           0        792    5568505
 hald-addon-inpu:   1557       1034    4107784           0        575    4109393
 hald-addon-stor:   1571       2078    4165740           0        932    4168750
 hald-addon-stor:   1574        120    4168190           0        449    4168759
            Xorg:   1700      84825    3993184           0      31388    4109397
        sendmail:   1757        199    4046908           0          9    4047116
    rtkit-daemon:   1829        202    5079235           0         98    5079535

Another example: Big Kernel Lock
The Big Kernel Lock (BKL) was introduced in Linux 2.0, when multiprocessor systems became common. The idea was that on a multiprocessor system, only one processor could execute kernel code at a time. This caused scaling problems, so the kernel developers have gradually replaced it with fine-grained locking. For performance analysis, you might want to know whether certain kernel subsystems still use the Big Kernel Lock and, if so, how badly the rest of the system suffers from that.

Currently, some kernel subsystems still use the BKL, such as NFS (fixed in RHEL 6), SMB and TTY. SystemTap provides an example script named bkl.stp, which shows the number of threads waiting on the lock. If that number exceeds a given threshold, the script can print the thread holding the lock, including the process name, the PID and how long it held the lock. This information is useful when fixing performance problems.

Linux provides some excellent tools to monitor your system, but most of them cannot give a generic overview of what is happening on it. It is particularly difficult to find out how certain programs interact with vital kernel parts, such as the scheduler. To find out exactly what is going on, SystemTap is a very useful tool. Because it talks directly to tracepoints that are set in the kernel, SystemTap gives you useful, up-to-date information that may help you trace why certain programs are causing a performance problem.

ABOUT THE AUTHOR: Sander van Vugt is an author and independent technical trainer, specializing in Linux since 1994. Vugt is also a technical consultant for high-availability (HA) clustering and performance optimization, as well as an expert on SLED 10 administration.
