When Tom Cignarella came on board as vice president of technical operations at PlanetOut Inc., a majority of his data center was in the dark.
The lights at the media and entertainment company's San Jose data center worked perfectly, but its network monitoring did not -- only 20% of Cignarella's servers had any kind of reporting or monitoring capabilities in place.
"When I first got there in 2006, there were only a few servers being monitored. There was only a limited amount of historical graphing -- we were flying blind," Cignarella said. A custom-built application, such as it was, had been handling PlanetOut's monitoring and systems management needs for roughly 40 servers running a mix of Solaris and Red Hat Enterprise Linux 4, he said.
And he didn't know it at the time, but several of his Sun Microsystems Inc. T1000 servers were running at 100% of disk capacity. Others were failing without notice. Network slowdowns were beginning to affect the business. But with only 20% of systems under management, Cignarella and his IT staff had no way of knowing why. "We were being told of problems by our customers instead of knowing about server issues ourselves," he said.
Before he took on his IT role at PlanetOut, Cignarella worked at Symantec Corp. in Cupertino, Calif., where he managed a network monitoring team.
It was at Symantec that Cignarella became familiar with an open source monitoring project called Nagios. Released in 2002 under the GPL, Nagios watches hosts and services that are specified by the user and issues alerts when things go wrong, and again when they get better.
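The alert-on-failure, alert-again-on-recovery behavior described above can be illustrated with a minimal Python sketch. This is not Nagios code: the function name `transitions` and the plain dictionaries of service states are assumptions made for illustration, and a real Nagios installation adds retries, check scheduling and notification rules on top of this basic idea.

```python
from typing import Dict, List, Tuple


def transitions(previous: Dict[str, str], current: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (service, notification) pairs for services whose state changed.

    Hypothetical helper: a service not seen before is assumed to have been "OK".
    """
    notes = []
    for service, state in current.items():
        before = previous.get(service, "OK")
        if before == "OK" and state != "OK":
            # Something went wrong since the last check.
            notes.append((service, "PROBLEM"))
        elif before != "OK" and state == "OK":
            # The service got better since the last check.
            notes.append((service, "RECOVERY"))
    return notes
```

Comparing each check against the previous one means a service that stays down produces one alert rather than a new one on every polling cycle, which is one plausible reason to track state changes rather than raw states.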
"[At Symantec] we did extensive work with Nagios and were able to extend it throughout the enterprise," Cignarella said. However, while Nagios was a boon to Symantec, it required a system administrator who was an expert at using it to reach its full potential. "Any systems administrator can have [Nagios] up and running on their environment, but if you want to extend it further than that you need an expert," he said.
PlanetOut had no such experts. Luckily, Cignarella knew of another monitoring company called GroundWork Open Source, which had a product called Monitor built from several open source projects, including Nagios.
"At PlanetOut the server situation required that we have something up and running very quickly," Cignarella said. An offering from the "Big Four" -- IBM, CA Inc., Hewlett-Packard Co. and BMC Software Inc. -- was also out of the question, Cignarella said. "With the time frame we had, we needed something up and running immediately," he said.
So PlanetOut proceeded with a deployment of GroundWork Monitor 4.5. Cignarella was comfortable with the technology and was familiar with how it was installed and maintained. Cost was also a consideration, with the Big Four's proprietary offerings costing hundreds of thousands of dollars more than Monitor. Cignarella did not specify how much his deployment cost but could say he bought the standard support license from San Francisco-based GroundWork.
Onward to full-time server monitoring
The GroundWork Monitor installation started in August with version 4.5, and progressed to version 5.0 when it went live in October. PlanetOut purchased a single license from GroundWork, which allowed for installation on one production server and a highly available (HA) server for backup.
Cignarella said the application initially monitored one Web server running Tomcat, one firewall and an application server running custom applications. The configuration served as a test bed from which to build out the deployment to the rest of PlanetOut's Sun T1000s.
"[GroundWork staff] came in and deployed that and then we had them watching the alerts for us," Cignarella said. "They made sure things were running fine [for those three servers] up to the point when we turned on the application for real. Then we started getting barraged every five minutes with alerts." For example, Monitor immediately identified the aforementioned disks at 100% capacity.
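A disk-capacity check of the kind that would flag a full disk can be sketched in a few lines of Python. This is an illustrative sketch, not PlanetOut's or GroundWork's actual check: the function names and the 80%/90% thresholds are assumptions. It does follow the standard Nagios plugin convention of returning 0 for OK, 1 for WARNING and 2 for CRITICAL along with a one-line status message.

```python
import shutil
from typing import Tuple

# Nagios plugin return codes.
OK, WARNING, CRITICAL = 0, 1, 2


def classify(percent_used: float, warn: float = 80.0, crit: float = 90.0) -> Tuple[int, str]:
    """Map a disk-usage percentage to a (return code, message) pair.

    The warn/crit defaults are hypothetical thresholds chosen for this example.
    """
    if percent_used >= crit:
        return CRITICAL, f"DISK CRITICAL - {percent_used:.0f}% used"
    if percent_used >= warn:
        return WARNING, f"DISK WARNING - {percent_used:.0f}% used"
    return OK, f"DISK OK - {percent_used:.0f}% used"


def check_disk(path: str = "/") -> Tuple[int, str]:
    """Measure usage of the filesystem containing `path` and classify it."""
    usage = shutil.disk_usage(path)
    return classify(100.0 * usage.used / usage.total)
```

A wrapper script would print the message from `check_disk("/")` and exit with the returned code so the monitoring server can interpret the result.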
A 64-bit upgrade hiccup
While the August 2006 deployment of Monitor 4.5 went smoothly, the upgrade to 5.0 in late 2006 was a "rocky road," Cignarella said. This was because PlanetOut was the first GroundWork customer to upgrade while running 64-bit servers. Many of the challenges, he said, were application-specific to PlanetOut's infrastructure.
"We'd have to take our HA box and basically start over with it; we'd have to export out the database and re-import it. Thankfully, we had that HA server to begin with, whereas having just a single server would have made things harder to do," he said.
In the end, the upgrade turned into a multiday process with a 50-point list of applications to reconfigure and redeploy. GroundWork support staff came on site to assist, Cignarella said. "We are definitely looking forward to [GroundWork's] ultimate goal of upgrading and installing a new RPM and that being it," he said.
Today, Monitor is monitoring 100% of PlanetOut's infrastructure, plus a little bit more: Cignarella said he's even monitoring things outside the data center, like networked devices in remote offices. "We've gone through the natural evolution of having network monitoring [in a mixed Solaris and Linux environment]. We've flipped from being reactive to being proactive," Cignarella said. "This is no longer just about the machines being up, it is about the applications that run on them actually working."
Have a question or comment on the article? Email Jack Loftus, News Writer.