Put these troubleshooting tools in your toolbox

In this tip, Peter Harrison describes how to use Linux commands to get a quick fix on IT troublemakers

There's a problem in your IT systems, and you're tearing your hair out trying to find its origins. Don't worry! There are handy tools that can help you, says Peter Harrison, author of The Linux Quick Fix Notebook, a new book from Prentice Hall PTR. In this tip, Harrison describes how to use Linux commands to get a quick fix on IT troublemakers. -- Editor

In many cases, Linux servers are managed by groups of people. Configuration files are constantly edited; processes and daemons are frequently restarted. The likelihood of a configuration error causing a problem is often greater than that of a hardware failure. So, with this in mind, and at the risk of seeming cavalier, I'll say that one of the most important troubleshooting tools is the less command.

The less command allows you to scroll back and forth through configuration and error log files. It also lets you search for text within those files, such as timestamps.

Once you have identified a key piece of error information, you can use the grep command to do highly specialized searches for the error pertaining to your application or sub-system.

The less and grep commands go hand in hand. You should probably use the less command to search for an error message timestamp at the approximate time of the event in the files of your system's syslog directory. Once something interesting is found, the grep command can be used to specifically search for the occurrence of similar messages or timestamps in multiple log files for the purposes of event correlation.
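A minimal sketch of that workflow follows. It uses two made-up log files in a temporary directory rather than a real syslog directory (often /var/log); the file names, host name, timestamps and messages are all hypothetical and only serve to illustrate the commands.

```shell
# Build two small sample logs (hypothetical content) in a scratch directory.
logdir=$(mktemp -d)
cat > "$logdir/messages" <<'EOF'
Mar  3 14:01:07 web1 kernel: eth0: link up
Mar  3 14:02:15 web1 named[812]: zone example.com/IN: loading master file: file not found
Mar  3 14:05:40 web1 sshd[901]: Accepted password for admin
EOF
cat > "$logdir/daemon.log" <<'EOF'
Mar  3 14:02:15 web1 named[812]: zone example.com/IN: not loaded due to errors
Mar  3 14:09:12 web1 crond[455]: (root) CMD (run-parts /etc/cron.hourly)
EOF

# Interactively, you could open a log at the first match of a timestamp with:
#   less +/"Mar  3 14:02" "$logdir/messages"
# and then press n to jump to the next match.

# Correlate: list every line carrying that timestamp across all the logs.
# When given several files, grep prefixes each match with its file name.
matches=$(grep "Mar  3 14:02" "$logdir"/*)
printf '%s\n' "$matches"

# Narrow to one subsystem, ignoring case, and count matching lines per file.
counts=$(grep -ic "named" "$logdir"/*)
printf '%s\n' "$counts"

# Remove the scratch directory.
rm -rf "$logdir"
```

On a real server the same grep commands would be pointed at the files in the syslog directory instead of the scratch directory.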

I would say that next in importance is the man command, as it provides help on the commands you'll need to fix the problem. Books are often sufficient, but when you are under pressure, the man command will provide detailed information on known commands much faster. For me, these would be the two most important command sets.

There are other commands that are obvious but frequently forgotten when IT managers are under pressure. The ls command will help determine when the configuration files were last edited. The vmstat, top, ps and free commands give a good idea of the general CPU, memory and swap partition loads, and can help uncover rogue processes that may be affecting performance. Telnet, traceroute and ping are also helpful in eliminating sources of network-related problems.
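As a rough illustration of the ls check, the sketch below creates three empty files with different modification times in a temporary directory and lists them newest first. The directory stands in for a real configuration directory such as /etc, and the file names and dates are hypothetical.

```shell
# Scratch directory standing in for /etc (file names and dates are hypothetical).
confdir=$(mktemp -d)
touch -t 202401010900 "$confdir/named.conf"
touch -t 202401051730 "$confdir/httpd.conf"
touch -t 202401031200 "$confdir/sshd_config"

# -l shows modification times; -t sorts by modification time, newest first.
ls -lt "$confdir"

# Capture only the name of the most recently modified file.
newest=$(ls -t "$confdir" | head -n 1)
echo "Most recently edited: $newest"

# Remove the scratch directory.
rm -rf "$confdir"
```

Against a live system, something like ls -lt /etc | head would show the most recently changed configuration files at the top of the listing.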

Once you have the error message, a possible configuration file change and some performance figures in hand, the best tool to use is a Web browser to check search engine results for clues as to what the problem could be. Remember to check both Web pages and user group results for better coverage of the problem, and to search the Web sites of your hardware and software vendors as well. Books are a good resource too, but are usually not as quickly searchable.

Armed with this information, you can use the system and network performance commands mentioned above to determine whether the condition that triggered the event still exists. This will help to isolate the source of the problem with the aim of fixing it.

Familiarity with troubleshooting tools will obviously help you rectify problems more quickly; equally important, it will make you aware of potential issues that may need to be fixed proactively.
