Put these troubleshooting tools in your toolbox
In this tip, Peter Harrison describes how to use Linux commands to get a quick fix on IT troublemakers
In many cases, Linux servers are managed by groups of people. Configuration files are constantly edited, and processes and daemons are frequently restarted, so a configuration error is often a more likely cause of a problem than a hardware failure. With this in mind, and at the risk of seeming cavalier, I'll say that one of the most important troubleshooting tools is the less command.
The less command allows you to scroll back and forth through configuration files and error logs. It also lets you search within a file for text such as timestamps (press / to search forward and ? to search backward).
Once you have identified a key piece of error information, you can use the grep command to run targeted searches for errors related to your application or subsystem.
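If you want to script this kind of search, for example to run it from a monitoring job, the Python sketch below approximates grep -n: it returns each matching line together with its line number. It is an illustration under assumptions, not a replacement for grep; the function name grep_lines is hypothetical, and Python's regular expression syntax differs in places from grep's default syntax.

```python
import re
import sys
from typing import Iterable, List, Tuple


def grep_lines(lines: Iterable[str], pattern: str,
               ignore_case: bool = False) -> List[Tuple[int, str]]:
    """Return (line_number, text) for each line matching the regex `pattern`.

    Roughly what `grep -n PATTERN` prints (`grep -in` when ignore_case is
    True); line numbers start at 1, as grep's do.
    """
    regex = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    return [(number, line.rstrip("\n"))
            for number, line in enumerate(lines, start=1)
            if regex.search(line)]


# Usage (hypothetical script name): python grep_lines.py PATTERN LOGFILE
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[2], errors="replace") as log:
        for number, text in grep_lines(log, sys.argv[1]):
            print(f"{number}:{text}")
```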
The less and grep commands go hand in hand. Start with less, searching the log files in your system's syslog directory (typically /var/log) for error messages timestamped near the approximate time of the event. Once you find something interesting, use grep to search multiple log files for similar messages or matching timestamps so that you can correlate related events.
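The correlation step can be sketched in Python as well. This minimal example assumes classic syslog lines that begin with a timestamp such as "Mar 14 10:22:31" (no year, English month names); it collects every line, from any number of logs, that falls within a chosen window around the event and sorts them by time. The function names, the two-minute window and the log paths in the usage comment are assumptions: log file names vary by distribution, and logs written with ISO 8601 timestamps would need a different parser.

```python
import sys
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Sequence, Tuple


def parse_syslog_time(line: str, year: int) -> Optional[datetime]:
    """Parse the leading 'Mmm dd hh:mm:ss' stamp of a classic syslog line.

    The classic format has no year, so the caller supplies one.
    Returns None for lines that do not start with such a stamp.
    """
    parts = line.split(None, 3)
    if len(parts) < 3:
        return None
    try:
        return datetime.strptime(f"{year} {parts[0]} {parts[1]} {parts[2]}",
                                 "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def correlate(logs: Dict[str, Sequence[str]], event: datetime,
              window: timedelta) -> List[Tuple[datetime, str, str]]:
    """Return (time, log_name, line) for every line within +/- window of
    event, across all logs, sorted by time."""
    hits = []
    for name, lines in logs.items():
        for line in lines:
            stamp = parse_syslog_time(line, event.year)
            if stamp is not None and abs(stamp - event) <= window:
                hits.append((stamp, name, line.rstrip("\n")))
    return sorted(hits)


def read_lines(path: str) -> List[str]:
    with open(path, errors="replace") as fh:
        return fh.readlines()


# Usage (paths are examples): python correlate.py "Mar 14 10:22:31" /var/log/messages /var/log/secure
if __name__ == "__main__" and len(sys.argv) >= 3:
    event = parse_syslog_time(sys.argv[1], datetime.now().year)
    if event is None:
        sys.exit("timestamp must look like 'Mar 14 10:22:31'")
    logs = {path: read_lines(path) for path in sys.argv[2:]}
    for stamp, name, line in correlate(logs, event, timedelta(minutes=2)):
        print(f"{name}: {line}")
```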
I would say that next in importance is the man command, which provides help on the commands you'll need to fix the problem. Books are often sufficient, but when you are under pressure, man will give you detailed information on a command you already know much faster. For me, these are the two most important command sets: less with grep, and man.
There are other commands that are obvious but frequently forgotten when IT managers are under pressure. The ls command (for example, ls -lt, which sorts a listing by modification time) will help you determine when the configuration files were last edited. The vmstat, top, ps, and free commands give a good picture of overall CPU, memory and swap usage, and can help you discover rogue processes that may be affecting performance. Telnet, traceroute and ping are also helpful in ruling out sources of network-related problems.
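The ls -lt command sorts a listing by modification time, which answers "what changed recently?" for a single directory. To ask the same question of a whole tree such as /etc, a small Python sketch can walk the directory and keep the regular files modified within the last N hours, newest first (GNU find's -mmin test does a similar job from the shell). The function name, the /etc path and the 24-hour window in the usage comment are illustrative assumptions.

```python
import os
import stat
import sys
import time
from typing import List, Optional, Tuple


def recently_modified(root: str, within_hours: float,
                      now: Optional[float] = None) -> List[Tuple[float, str]]:
    """Return (mtime, path) for regular files under root modified in the
    last `within_hours`, newest first: a recursive cousin of `ls -lt`."""
    now = time.time() if now is None else now
    cutoff = now - within_hours * 3600
    found = []
    for dirpath, _subdirs, filenames in os.walk(root):  # unreadable dirs are skipped
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)  # do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable
            if stat.S_ISREG(info.st_mode) and info.st_mtime >= cutoff:
                found.append((info.st_mtime, path))
    found.sort(reverse=True)
    return found


# Usage (hypothetical script name): python recent_changes.py /etc 24
if __name__ == "__main__" and len(sys.argv) == 3:
    for mtime, path in recently_modified(sys.argv[1], float(sys.argv[2])):
        print(time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(mtime)), path)
```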
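On Linux, free takes its memory and swap figures from /proc/meminfo. The sketch below reads the same file and reduces it to two numbers worth watching during an incident: the share of RAM still available and the share of swap in use. The function names are hypothetical, and falling back to MemFree when the MemAvailable field is missing (older kernels) is an assumption that understates available memory.

```python
import os
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}; most values are kB."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[key.strip()] = int(fields[0])
    return info


def memory_pressure(info: Dict[str, int]) -> Dict[str, float]:
    """Percent of RAM still available and percent of swap in use."""
    # Assumption: use MemFree if MemAvailable is absent (it understates).
    available = info.get("MemAvailable", info.get("MemFree", 0))
    swap_total = info.get("SwapTotal", 0)
    swap_used = swap_total - info.get("SwapFree", 0)
    return {
        "mem_available_pct": 100.0 * available / info["MemTotal"],
        "swap_used_pct": 100.0 * swap_used / swap_total if swap_total else 0.0,
    }


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as fh:
        print(memory_pressure(parse_meminfo(fh.read())))
```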
Once you have the error message, any likely configuration file change and some performance figures in hand, the best tool to use is a Web browser: search for clues as to what the problem could be. Check both Web pages and user group results for better coverage, and search the Web sites of your hardware and software vendors as well. Books are also a good resource, but they are usually not as quickly searchable.
Armed with this information, you can use some of the system and network performance commands mentioned earlier to determine whether the condition that triggered the event still exists. This will help you isolate the source of the problem so that you can fix it.
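For a network-related problem, one way to check whether the condition still exists is to repeat the classic telnet test, which simply tries to open a TCP connection to a service's port. The minimal Python sketch below does the same from a script; the function name port_open, the three-second default timeout and the host in the usage comment are assumptions. A successful connection shows only that something is accepting connections on that port, not that the service behind it is healthy.

```python
import socket
import sys


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    This is the check often done by hand with `telnet host port`.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, name lookup failure
        return False


# Usage (example host): python port_check.py mail.example.com 25
if __name__ == "__main__" and len(sys.argv) == 3:
    host, port = sys.argv[1], int(sys.argv[2])
    state = "accepting connections" if port_open(host, port) else "NOT reachable"
    print(f"{host}:{port} is {state}")
```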
Familiarity with troubleshooting tools will obviously help you fix problems more quickly, but, equally important, it will make you aware of potential issues that may need to be fixed proactively.