There was nothing technically wrong with the HP ProLiant servers at Mynewplace.com, an online rental services agency based in San Francisco, but the IT staff kept on getting beeped at 4 a.m. with alerts that eventually proved to be false alarms.
So while the servers were fine, the IT staff wasn't. Entire days were being wasted each month diagnosing the company's clutch of 50 HP ProLiant DL145s and DL385s running Red Hat Enterprise Linux 4 AS and ES, said John Shin, Mynewplace.com's director of systems. Shin decided he needed to make some changes.
Struggling with network monitoring
"We were struggling with monitoring," Shin said, but that may have been an understatement. Things were so bad, in fact, that at one point last year he contemplated disabling the monitoring application altogether because it was doing more harm than good.
The application was Nagios, a popular open source systems and network monitoring application that provides alerts for user-defined hosts and services. In Shin's network, however, it was triggering false alarms because of Simple Network Management Protocol (SNMP) incompatibilities with Mynewplace.com's open source application server, Resin 3.0. Resin is a Java application server that includes a Java implementation of the PHP scripting language, and it is maintained and supported by San Diego-based Caucho Technology Inc.
Nagios, JVM and Resin 3.0 woes
Because Resin and Nagios were not directly compatible, Shin exposed the application stack's Java virtual machines (JVMs) through SNMP and monitored the environment that way. Unfortunately, response times under those conditions were sluggish, he said.
"Nagios was not really the problem," Shin said. "It was the JVM stack not being able to respond to it correctly. It was recording events in SNMP that were then watched by Nagios and that made things crawl. There were a lot of man hours wasted, and it would trigger the 4 a.m. pages."
In spite of its popularity on open source repositories like SourceForge.net, Nagios has its detractors. In a recent interview about Nagios with SearchEnterpriseLinux.com, Zenoss Inc. CEO Bill Karpovich criticized Nagios for its lack of enterprise-level support. "The maintainers never thought of it as a project that an IT manager would use to monitor an entire enterprise environment," he said. Zenoss is an open source startup vendor in the systems management space.
HP OpenView alternatives
Like a lot of users with network monitoring needs, Mynewplace.com first looked to OpenView, Hewlett-Packard Co.'s systems management suite. However, Shin said that as a medium-sized business, Mynewplace.com had budgetary constraints and OpenView was too high a price to pay.
The feature-rich, expensive offerings from HP and the other members of the "big four" – IBM, CA and BMC – have spawned the "little four" (a phrase coined by analyst firm RedMonk), composed of Hyperic, Zenoss, Qlusters and GroundWork. Executives from those companies have bet their chips on winning valuable midmarket customers like Mynewplace.com.
Compared with OpenView, offerings from the "little four" were priced approximately two-and-a-half times less on average, Shin found, although he would not cite specific dollar amounts. OpenView had another strike against it: "It did not have the framework in place to monitor some of our key applications," namely Resin and Postgres, Shin said.
Desperately seeking Resin support
Shin's sole requirement for a network monitoring application was that it be compatible with Resin. Nothing he had found to date -- proprietary or open source -- was out-of-the-box compatible with Resin.
But in March, a simple Google search revealed an exception: Hyperic HQ 3.0, the flagship product of San Francisco-based Hyperic Inc. Like many commercial open source companies, Hyperic provides its software as a free download under the GNU General Public License (GPL), and then sells support to customers that require it. Shin downloaded a free copy for testing.
Shin said Mynewplace.com signed a contract with Hyperic on March 31, at a "substantial discount" and for an undisclosed price, to support 25% of the network nodes on four of his ProLiant machines. "The out-of-box support and compatibility [with Resin] were worth the price," he said, and the false alarms ceased for those four machines.
Moving the Nagios-monitored machines over to Hyperic was also a breeze thanks to HQ's auto-discovery tool, Shin said. Upon installation, HQ automatically discovered and integrated Shin's Nagios environments. HQ was also able to collect response codes from Nagios invocations as well as text-based output that Shin's existing Nagios plug-ins would report. These plug-ins perform host and service checks and return the host or service status to Nagios.
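For readers unfamiliar with the plug-in convention, the sketch below shows the general shape of such a check. It is an illustration only, not one of Shin's plug-ins: by Nagios convention a plug-in prints a line of status text and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN), and that return code plus text output is what a tool consuming the plug-in reads. The function name, the response-time metric and the thresholds here are hypothetical.

```python
# Conventional Nagios plug-in return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate_response_time(seconds, warn=1.0, crit=5.0):
    """Map a measured response time to (return_code, status_line).

    The warn/crit thresholds are hypothetical values chosen for illustration.
    """
    if seconds is None or seconds < 0:
        # No usable measurement: report UNKNOWN rather than guess.
        return UNKNOWN, "RESPONSE UNKNOWN - no valid measurement"
    if seconds >= crit:
        code = CRITICAL
    elif seconds >= warn:
        code = WARNING
    else:
        code = OK
    # One line of text output; the part after "|" is optional performance data.
    line = f"RESPONSE {STATUS_NAMES[code]} - {seconds:.3f}s | time={seconds:.3f}s"
    return code, line


def report(seconds):
    """Print the status line and return the code a real plug-in would exit with."""
    code, line = evaluate_response_time(seconds)
    print(line)
    return code
```

A real plug-in script would take its measurement, call something like `report(...)`, and pass the returned value to `sys.exit()` so the monitoring server sees it as the process exit status.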
The machines run Red Hat Enterprise Linux AS and ES, Postgres and a secure mail server called StrongMail. Eventually, if all goes according to plan, Shin expects to replace the remaining 75% still monitored by the Nagios SNMP workaround with Hyperic HQ later in the year.
Meanwhile, Shin has no regrets about passing over HP OpenView. "If we had gone with OpenView, you'd have to assume that with all of the custom work and plug-ins to make it work with our core needs would have required as much time and effort as is required today with Nagios," Shin said. "Compare what we're doing now to when we were regularly spending five man-days per month just to maintain Nagios and this was a no brainer. Now it's a couple of hours [a month]."
Have a question or comment about the article? Email Jack Loftus, News Writer.