Welcome to the world of Nagios, an open source network monitoring tool. Besides being free, powerful and flexible, it can save IT managers a lot of time by automating network monitoring.
In this section of my introduction to Nagios, we'll look at an example of a Nagios configuration. In part one, I discussed the usefulness and architecture of Nagios.
As the previous paragraph implies, configuration plays a large role in successful Nagios operation. The configuration mechanics are conceptually straightforward but require attention to detail. Essentially, you define a hierarchy of hosts and services, and for each one you specify which check should be run and what should happen after a failed check.
Here is an example of a host configuration file entry:
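A minimal sketch of such an entry follows. The host name, alias, address, and the values of max_check_attempts and notification_options are hypothetical placeholders chosen for illustration. The sketch also assumes that the check-host-alive command, the 24x7 time period, and the linux-admins contact group are defined elsewhere in the configuration.

```cfg
# Hypothetical host entry; name, alias, and address are placeholders.
define host{
        host_name               linux-server1
        alias                   Linux Server 1
        address                 192.0.2.10
        check_command           check-host-alive   ; how to test that the host is up
        max_check_attempts      5                   ; failed checks before a problem is declared
        check_period            24x7
        contact_groups          linux-admins        ; who is notified
        notification_interval   30                  ; minutes between repeat notifications
        notification_period     24x7                ; when notifications may be sent
        notification_options    d,u,r               ; down, unreachable, recovery
        }
```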
Most of the entries are self-explanatory. The machine has a name, an address, a check that should be run (check-host-alive), and a maximum number of checks that should be performed before concluding that a problem exists. If there is a problem, the group linux-admins is notified according to the listed notification options, and the notification is repeated every 30 minutes at all hours of the day or night (24x7). For this resource, then, the check simply determines whether the machine itself is up and running.
Here is an example of a service configuration file entry:
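A minimal sketch of a service entry follows, assuming a hypothetical HTTP service on the placeholder host linux-server1. The service description, check command, and all numeric values are illustrative assumptions, and the check_http command must be defined elsewhere. The interval directives are named check_interval and retry_interval in Nagios 3; older releases use normal_check_interval and retry_check_interval.

```cfg
# Hypothetical service entry; it must refer to a host that has its own definition.
define service{
        host_name               linux-server1
        service_description     HTTP
        check_command           check_http     ; how to test that the service is up
        max_check_attempts      3               ; failed checks before a problem is declared
        check_interval          5               ; time units (minutes by default) between regular checks
        retry_interval          1               ; time units between rechecks after a failure
        check_period            24x7
        contact_groups          linux-admins
        notification_interval   30
        notification_period     24x7
        notification_options    w,u,c,r         ; warning, unknown, critical, recovery
        }
```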
Again, most of the entries are easy to understand. This service runs on the host defined in the previous example. (Services must have an entry for the host they reside on.) The entry gives a service description, the command that checks whether the service is up and running, a maximum number of checks, and so on.
An obvious question is, "Now that I'm monitoring all of my hardware and software, how do I find out what's going on?" In addition to the problem notification mechanism listed in each configuration entry ("notification_options"), Nagios provides a number of prewritten CGI scripts that provide monitoring information; in essence a system status dashboard. These scripts provide listings of overall system status, network problems, trends, and so on. Between the dashboard information and the notifications, Nagios enables you to take a more proactive approach to managing your IT infrastructure.
Like all network management tools, Nagios is fairly complex to set up and requires ongoing tuning to ensure that the level of information provided is right -- neither too much detail nor too little information. Here are some recommendations about how to get the best use of your Nagios implementation:
- Begin by planning what you need to keep track of, ranking resources from most to least important.
- Work incrementally, first getting those most-important resources under management before moving on to less-important resources. For example, in most organizations e-mail is more important than FTP availability, so begin by putting e-mail under Nagios management. Working incrementally can reduce the burden of implementing a network management system.
- Plan on regular reviews of the type and level of information you're getting, especially in the first few months. The purpose of these reviews is to get the system configured properly so that you can then run it on an ongoing basis with relatively little effort.
- Take advantage of the Nagios community. A large number of sample configurations, dashboard extensions, and custom plugins are available, which can make it easier to get your Nagios implementation up and running.
- Document your configuration. Comment your configuration files so it's clear what resources you're managing and what plugins you have running. Also, write up some external documentation on your Nagios implementation so that someone can pick it up later and get a good overview of how your management scheme works.
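As a sketch of the documentation advice in the last item, the fragment below comments a contact group definition. Lines beginning with # are comments, and a semicolon starts a comment at the end of a line. The alias and member names are hypothetical.

```cfg
# Contact group for the Linux servers.
# Purpose: receives host and service notifications for all Linux machines.
# Review:  check membership whenever the on-call rotation changes.
define contactgroup{
        contactgroup_name       linux-admins
        alias                   Linux Administrators
        members                 alice,bob       ; hypothetical contacts, defined elsewhere
        }
```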
Nagios is very powerful and can make your life much easier after it's up and running. It's significantly less expensive than the commercial alternatives. And, best of all, because it's open source, it offers the ability to take advantage of the entire community's work with Nagios by sharing plugins and experience.
Bernard Golden is CEO of Navica Inc., a systems integration firm specializing in open source software. He writes a column for SearchEnterpriseLinux.com called Golden's Rules and answers your questions about open source software issues.