Most IT managers put Linux on a server and watch the server run for years and years without crashing. A few are not so lucky. For the latter group, and those who'd rather be safe than sorry, Steve Best lists his favorite Linux troubleshooting and fix-it tools in this interview.
You discuss five network debugging tools in the "System Tools" chapter of your book. How did you narrow it down to these five?
Best: The five (ifconfig, arp, tcpdump, ethereal, netstat) that I covered in the book fit into the category where you don't need to change the application and just want to view data on the system.
Can you describe situations in which an admin would use some of these tools?
Best: First off, tcpdump and Ethereal can be used to find messages that have errors in them without modifying the source code (that is, without adding debugging statements); to monitor and analyze network traffic for activity, load and latency measurements; to save captured packets for later analysis; to detect and monitor passive intrusion; and to check the network setup by determining whether routing is occurring properly.
Ethereal has a GUI, which some people prefer over tcpdump's command line, and it can do everything that tcpdump does and more. Ethereal is easy for the beginner to figure out and also has the power an experienced user needs. Both provide packet monitoring and analysis.
As for ifconfig, it can be used in two ways: to display the status of your network devices and/or to configure those devices.
Finally, netstat can show you a list of TCP/IP connections to and from a machine and display networking statistics. Combined with the uptime command, netstat can be used to get an overview of how much traffic the machine is handling on a daily basis. It's the program most commonly used by system administrators to quickly diagnose a problem with TCP/IP.
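As a minimal sketch of where a connection list like netstat's comes from on Linux, the Python below decodes entries from the kernel's /proc/net/tcp table. It assumes an IPv4, little-endian host; the function names and the sample line are illustrative and not taken from the book.

```python
import socket
import struct

# TCP state codes as they appear in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def decode_address(hex_addr):
    """Turn a hex 'ADDRESS:PORT' pair into (dotted IPv4 string, port number)."""
    ip_hex, port_hex = hex_addr.split(":")
    # Assumption: little-endian host, so the 32-bit address is byte-swapped.
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)


def parse_line(line):
    """Extract local address, remote address and state from one table row."""
    fields = line.split()
    return {
        "local": decode_address(fields[1]),
        "remote": decode_address(fields[2]),
        "state": TCP_STATES.get(fields[3], fields[3]),
    }


# Hypothetical row, used when /proc/net/tcp is not available.
SAMPLE = "   0: 0100007F:0016 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 12345"

if __name__ == "__main__":
    try:
        with open("/proc/net/tcp") as table:
            next(table, None)  # skip the header row
            for row in table:
                print(parse_line(row))
    except FileNotFoundError:
        print(parse_line(SAMPLE))
```

In day-to-day work netstat itself is the quicker route; the sketch only shows that the data is ordinary text the kernel exposes.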
What are some other network debugging tools that you use often, and what are their capabilities?
Best: There are surely other network debugging tools. You could break them down into several categories. I'll list some network debugging tools that people might find helpful. Some of the new ones I've listed are helpful when you are building and developing a network application and not just debugging.
The first area is application-level network call profiling. These tools profile the network-related calls that applications make. One of these tools is netlog, which is compiled with the application; it is available on the Distributed Applications Support Team site. Alternatively, you can use strace, which provides similar information without requiring you to recompile the application.
The second area of network tools is packet monitoring and analysis. These types of tools capture packets coming into the network interface. They allow dissection of packets and display information to the end user. The two most common are tcpdump and Ethereal, which I've already covered.
The third area where network tools are useful is network interaction testing or simulation. These tools allow interaction with an application to simulate one side of the network traffic. There's netcat, available at the GNU NetCat site. Netcat allows connections (UDP or TCP) on ports and provides the capability to pass data back and forth.
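To make "simulating one side of the network traffic" concrete, here is a rough Python sketch of the idea using only the standard socket module. It is not netcat; it stands in a throwaway TCP peer on the loopback interface and passes data back and forth once. The function names and the "ACK " reply prefix are made up for illustration.

```python
import socket
import threading


def serve_once(listener, reply_prefix=b"ACK "):
    """Play the server side: accept one connection and answer one message."""
    conn, _ = listener.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(reply_prefix + data)


def exchange(payload):
    """Send payload to a simulated peer on loopback and return its reply."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    peer = threading.Thread(target=serve_once, args=(listener,))
    peer.start()
    try:
        with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
            client.sendall(payload)
            reply = client.recv(1024)
    finally:
        peer.join()
        listener.close()
    return reply


if __name__ == "__main__":
    print(exchange(b"hello"))
```

When testing a real application, only one of the two sides would be simulated; the other would be the application under test.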
The fourth area is network data capture through a library. A couple of useful libraries are libpcap and libnet. Available at tcpdump.org, libpcap allows access to the underlying packet capture facility provided by the operating system. Available on The Million Packet March site, libnet is a high-level toolkit which allows an application to construct and inject network packets.
You also cover the ps tool in your book. Why and when would an IT manager use the ps tool?
Best: The ps tool is very helpful. It can be used to check whether a process is running or what state a process is in at that point in time.
Depending on what problem you are looking to solve with ps, a handy option is wchan. The wchan option shows what a process is waiting on.
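A small sketch of how the state and wchan columns might be pulled out of ps output with Python, assuming the procps version of ps found on most Linux distributions. The sample output, including the process names and wait channels, is hypothetical.

```python
import subprocess

# Hypothetical output of: ps -eo pid,stat,wchan:32,comm
SAMPLE = "\n".join([
    "  PID STAT WCHAN                            COMMAND",
    "    1 Ss   ep_poll                          systemd",
    "  812 D    io_schedule                      backup.sh",
    " 1201 R+   -                                ps",
])


def waiting_processes(ps_output):
    """Return (pid, state, wchan, command) for processes that are waiting.

    ps prints "-" in the WCHAN column for a process that is not sleeping
    in the kernel, so those rows are skipped.
    """
    rows = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 3)
        if len(parts) < 4:
            continue
        pid, stat, wchan, comm = parts
        if wchan != "-":
            rows.append((int(pid), stat, wchan, comm.strip()))
    return rows


if __name__ == "__main__":
    try:
        output = subprocess.run(
            ["ps", "-eo", "pid,stat,wchan:32,comm"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        output = SAMPLE  # fall back to the canned example
    for row in waiting_processes(output):
        print(row)
```

A process stuck in state D with the same wait channel across several runs is usually the one worth investigating first.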
Why did you choose strace as an important system tool?
Best: strace intercepts and records the system calls made by a process and the signals received by a process. So, strace is important when you know or suspect that a system call is failing and need to find out which one.
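As a rough illustration of hunting for the failing call, the Python sketch below scans saved strace output for system calls that returned -1 and reports the error name strace printed next to them. The trace lines are a hypothetical sample, and the regular expression is a simplification that assumes one complete call per line.

```python
import re

# Matches lines such as:
#   openat(AT_FDCWD, "/etc/missing.conf", O_RDONLY) = -1 ENOENT (No such file or directory)
FAILED_CALL = re.compile(r"^(\w+)\((.*)\)\s+=\s+-1\s+([A-Z0-9]+)\s+\((.*)\)$")

# Hypothetical trace, e.g. captured with: strace -o trace.log ./app
SAMPLE = "\n".join([
    'openat(AT_FDCWD, "/etc/app.conf", O_RDONLY) = 3',
    'read(3, "port=8080\\n", 4096)            = 10',
    "close(3)                                = 0",
    'openat(AT_FDCWD, "/etc/missing.conf", O_RDONLY) = -1 ENOENT (No such file or directory)',
    'connect(4, {sa_family=AF_INET, sin_port=htons(8080), '
    'sin_addr=inet_addr("127.0.0.1")}, 16) = -1 ECONNREFUSED (Connection refused)',
])


def failed_calls(trace_text):
    """Return (syscall name, error name) for every call that returned -1."""
    failures = []
    for line in trace_text.splitlines():
        match = FAILED_CALL.match(line.strip())
        if match:
            failures.append((match.group(1), match.group(3)))
    return failures


if __name__ == "__main__":
    for name, error in failed_calls(SAMPLE):
        print(f"{name} failed with {error}")
```

Not every -1 is a bug; programs routinely probe for optional files, so the list is a starting point rather than a verdict.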
Your book spotlights the Magic Key Sequence. Why?
Best: The Magic Key Sequence is a key combination that is intercepted directly by the kernel and can be used, among other things, to perform an emergency shutdown.
Before you can use the Magic Key Sequences, the kernel that you are running must be built with CONFIG_MAGIC_SYSRQ enabled. Another function this feature provides is capturing a back trace of all the processes in the system. This can be helpful during a debugging session to see where each process is in the system and which calls it made to get there.
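On a running system, which Magic Key functions are currently allowed can be read from /proc/sys/kernel/sysrq. The Python sketch below decodes that value using the bit meanings given in the kernel's SysRq documentation; the function name and the short labels are illustrative, and the file may be absent on kernels built without the feature.

```python
# Bit values for /proc/sys/kernel/sysrq, per the kernel's SysRq documentation.
SYSRQ_BITS = [
    (2, "console log level control"),
    (4, "keyboard control (SAK, unraw)"),
    (8, "process debugging dumps"),
    (16, "sync"),
    (32, "remount read-only"),
    (64, "process signalling (term, kill, oom-kill)"),
    (128, "reboot/poweroff"),
    (256, "nicing of real-time tasks"),
]


def decode_sysrq(value):
    """Translate the sysrq setting into a list of enabled function groups."""
    if value == 0:
        return []                 # feature disabled entirely
    if value == 1:
        return ["all functions"]  # everything enabled
    return [name for bit, name in SYSRQ_BITS if value & bit]


if __name__ == "__main__":
    try:
        with open("/proc/sys/kernel/sysrq") as setting:
            current = int(setting.read().strip())
        print(current, decode_sysrq(current))
    except OSError:
        print("sysrq setting not available on this system")
```

The back trace of all processes falls under the "process debugging dumps" group, so that bit (or the value 1) has to be set before the trace can be requested.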