In many data center environments, Nagios has become the de facto standard for companies in need of an open source, fault-tolerant solution to monitor single points of failure, service-level agreement (SLA) shortcomings, servers, redundant communication
To better answer this question, it helps to look at Nagios' "simple" build philosophy (where "simple" refers to a "no frills" system design rather than "ease of use").
Editing the objects in Nagios' config files enables administrators to monitor, alert and perform event handling on any network service, host resource or environmental factor. At the same time, admins can also account for network complexities such as host hierarchies, flapping services and distributed monitoring. These config files do not hide details from administrators, giving admins (whether they like it or not) a transparent, lightweight tool that can hook into other tools that share its build philosophy.
Critics might consider Nagios' approach time-consuming to implement, difficult to learn and not worth the effort in mission-critical environments. However, Nagios advocates such as Chris Penn, a sysadmin at the NOAA Southern Regional Climate Center (SRCC), would disagree. Penn says that Nagios was the most affordable and flexible tool for monitoring the weather simulation supercluster he administers.
Leat Boafo, Principal Consultant of IT Duties, also thinks that Nagios' learning curve is worth the effort. Like Penn, Boafo uses Nagios to perform thousands of service checks in his data center.
Helpful Nagios plug-ins
To extend Nagios' native capabilities, Boafo integrates Nagios with third-party plug-ins such as check_bl and check_AD. Other tools that integrate with Nagios include:
- NagiosQL (administration tool)
- Fruity (GUI front end)
- Cacti (trending)
- Splunk (root cause analysis)
- DNX (distributed checks)
Even Windows admins, like Horizon Technology's Ryan Villa, have been able to use Nagios to check the health of Exchange Servers and Active Directory with Nagios-friendly open source tools such as NSClient++ and NagiosPluginsNT.
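For readers who have not written a plug-in, it may help to see how small one can be. By convention, a Nagios plug-in is a standalone program that prints one line of status text and reports its result through its exit code (0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN). The sketch below follows that convention; the script name check_example.py, the "EXAMPLE" label and the thresholds are hypothetical and do not come from any of the plug-ins named above.

```python
import sys

# Exit codes defined by the Nagios plug-in convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value, warn, crit):
    """Map a measured value to an (exit_code, status_line) pair.

    Higher values are treated as worse. The warn and crit thresholds are
    hypothetical numbers an administrator would choose.
    """
    if value is None:
        code = UNKNOWN
    elif value >= crit:
        code = CRITICAL
    elif value >= warn:
        code = WARNING
    else:
        code = OK
    return code, f"EXAMPLE {LABELS[code]} - value={value}"


def main(argv):
    """Hypothetical command line: check_example.py <value> <warn> <crit>."""
    try:
        value, warn, crit = (float(arg) for arg in argv[:3])
    except ValueError:
        # Missing or non-numeric arguments: report UNKNOWN rather than guess.
        print("EXAMPLE UNKNOWN - usage: check_example.py <value> <warn> <crit>")
        return UNKNOWN
    code, line = evaluate(value, warn, crit)
    print(line)  # Nagios shows this line as the check's status text.
    return code
```

A real script would end with sys.exit(main(sys.argv[1:])) so that Nagios receives the exit code, and it would take its own measurement instead of reading the value from the command line.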
At IT service provider Frontline IS, president and founder Robert Ford uses Nagios to monitor the uptime of the company's Internet partners. "Nagios is what allows us to provide third-party verification of SLAs that telcos promise our customers. Our Nagios servers create a trouble ticket when network services are interrupted, and when they are restored, they close the ticket," Ford says.
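One plausible way to get this open-and-close behavior is a Nagios event handler, a command that Nagios runs when a service changes state and that can receive macros such as $SERVICESTATE$ and $SERVICESTATETYPE$ as arguments. The sketch below shows only the decision logic. It is not Frontline IS's implementation, and the TicketLog class is an in-memory stand-in for whatever ticketing system is actually used.

```python
def ticket_action(service_state, state_type, ticket_is_open):
    """Decide what a hypothetical ticketing hook should do for one state change.

    service_state and state_type mirror the values of Nagios'
    $SERVICESTATE$ and $SERVICESTATETYPE$ macros.
    """
    # SOFT states are still being retried, so wait for a confirmed HARD state.
    if state_type != "HARD":
        return None
    # This sketch treats only CRITICAL as an interruption worth a ticket.
    if service_state == "CRITICAL" and not ticket_is_open:
        return "open"
    if service_state == "OK" and ticket_is_open:
        return "close"
    return None


class TicketLog:
    """In-memory stand-in for a real ticketing system (an assumption)."""

    def __init__(self):
        self.open_tickets = set()
        self.history = []

    def handle(self, service, service_state, state_type):
        """Apply one state change and record any ticket action taken."""
        action = ticket_action(service_state, state_type, service in self.open_tickets)
        if action == "open":
            self.open_tickets.add(service)
        elif action == "close":
            self.open_tickets.discard(service)
        if action:
            self.history.append((action, service))
        return action
```

Acting only on HARD states keeps a single failed check that recovers on retry from opening a ticket, which matters when the tickets are used as SLA evidence.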
Nagios support options
When asked where they went for support, Penn, Boafo and Villa all answered "the community," and listed several online resources, including:
- Nagios documentation
- The Nagios mailing list
- Fellow Nagios members and wikis, such as NagiosWiki.com and NagiosExchange's wiki.
"Community support may not be adequate for larger enterprise environments," cautions Ford, whose company offers commercial Nagios support. "Quality control on Nagios integration projects on sites like SourceForge vary greatly, and enterprise customers often need the expertise of consultants who can help guide them in knowing when to use a stable tool with few features or beta test a more cutting-edge tool with more features."
Short-staffed data centers that still need Nagios might consider managed Nagios services (about half a dozen at the time of this writing, according to a Google search).
Data center veterans like Dino Khoe caution that Nagios servers (like all servers) can turn into a sort of "Frankenstein" if not carefully planned and documented. Khoe, owner of NagiosWiki.com and a veteran in the content delivery network (CDN) space, recommends that those new to Nagios integration consider alternative projects, such as Oreon, or even commercial solutions, such as GroundWork, Zabbix, or Zenoss.
"To really trend your network traffic, however, you have to integrate Nagios with other tools, such as Cacti or Smokeping, and then use Nagios to monitor and alert on the running averages of those databases," Khoe says.
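Khoe's suggestion can be illustrated with a small sketch: keep a fixed window of recent samples, compute their mean, and alert on that mean instead of on any single reading. Everything here is an assumption made for illustration. The window size and thresholds are made up, and pulling the samples out of the trending tool's database is left out entirely.

```python
from collections import deque


class RunningAverage:
    """Mean of the most recent `window` samples."""

    def __init__(self, window):
        # deque(maxlen=...) silently drops the oldest sample when full.
        self.samples = deque(maxlen=window)

    def add(self, value):
        """Record one sample and return the current running average."""
        self.samples.append(value)
        return sum(self.samples) / len(self.samples)


def alert_level(average, warn, crit):
    """Classify a running average against hypothetical thresholds."""
    if average >= crit:
        return "CRITICAL"
    if average >= warn:
        return "WARNING"
    return "OK"
```

With a window of three samples, a single spike to 100 after two readings of 10 yields an average of 40, so a warning threshold of 50 is not crossed until the high readings persist.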
Like any tool, Nagios can be a double-edged sword. "Nagios is so flexible," says Khoe, "people often use it to monitor their mess rather than cleaning it up and doing it right in the first place."
Nagios does not hide or abstract certain things, and those who know their way around the config files can easily do things that might adversely affect their network. As an example, Khoe cites an admin who redefined a minute to equal 15 seconds and then wondered why alert intervals were wrong and queued checks spiraled out of control.
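The arithmetic behind that anecdote is easy to sketch. Nagios expresses check and notification intervals in "time units", and the interval_length directive in the main config file sets how many seconds one unit lasts (60 by default). The article does not say which setting the admin changed, so treating it as interval_length is an assumption, as are the service count and check interval below.

```python
def effective_seconds(interval_units, interval_length=60):
    """Convert an interval given in Nagios time units to seconds."""
    return interval_units * interval_length


def checks_per_hour(service_count, check_interval, interval_length=60):
    """Estimate how many service checks are scheduled per hour."""
    return service_count * 3600 / effective_seconds(check_interval, interval_length)


# Hypothetical site: 1,000 services, each checked every 5 time units.
default_load = checks_per_hour(1000, 5)          # one unit = 60 seconds
shortened_load = checks_per_hour(1000, 5, 15)    # one unit = 15 seconds
print(default_load, shortened_load)
```

In this example, shrinking the unit from 60 to 15 seconds turns a 5-unit interval from 300 seconds into 75 seconds and quadruples the number of scheduled checks, which is consistent with intervals looking wrong and the check queue backing up.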
Perhaps the biggest determinant in selecting Nagios is your data center's personnel, adds Khoe. How familiar is your IT department with Nagios' underlying platform? Will your core business be dependent on one person to support it? And what type of network changes are you expecting your personnel to implement? In the end, these issues reign supreme in enterprise environments, regardless of how simple or elegant a tool might be.
ABOUT THE AUTHOR: Roger E. Rustad, Jr. is a systems engineer at BelAir Networks and a longtime open source advocate. His blog is hackmyidea.com.
This was first published in March 2008