A few years ago, the IT operations staff at BT Infonet's data center in El Segundo, Calif., had trouble keeping up with the numerous alerts and volume of data generated by multiple monitoring tools. "We have different monitoring tools that generate a lot of data," said Armand Shirikian, BT Infonet's senior IT manager of operations. "With multiple tools and multiple owners, we had to hold a meeting with eight engineers whenever we generated a report and wanted to provide a single voice from IT."
A communications services provider, BT Infonet has a range of offerings, from broadband to Voice over Internet Protocol. The data center in El Segundo serves as the nerve center for the company. Shirikian wanted to streamline troubleshooting processes that had become increasingly burdensome for IT operations staff.
About three years ago, Shirikian opted to deploy Alive, an analytical software package from Irvine, Calif.-based Integrien Corp. The premise behind Alive is unusual among systems management tools, according to Gartner Inc. analyst David Williams. By looking at the historical performance of IT infrastructure components such as applications, servers, databases, networks and firewalls, and combining that with behavioral analytics (through what Integrien calls "dynamic threshold algorithms"), Alive can predict problems well before static thresholds set off alerts. "Integrien is trying to take data and apply intelligence around it," Williams said.
Shirikian said that within weeks of deploying Alive, he could see a vast improvement in how IT operations staff responded to problems. With the existing siloed monitoring tools, engineers relied on static thresholds; by the time the various servers, databases, applications and processes were close to hitting those thresholds, engineers had to scramble to avert problems.
"When we noticed our levels were approaching thresholds, engineers were constantly changing the numbers so they would have time to fix the underlying problems that were making a system approach its thresholds to begin with," he said. It was a classic case of IT operations going into reactive mode to put out a fire.
Predicting problems before they happen
With Alive, engineers are tipped off by seemingly normal activities that indicate an impending problem based on historical behavior. The Alive system delivers alerts based on behavior it predicts and can pinpoint the problematic components such as the database server or application server based on whether the activities caused problems on those components in the past. Alive also indicates to engineers the probability that a behavior will occur and predicts when it will occur. "The system alerts us to activities that are within our thresholds, but they aren't normal and they're indicative of a problem," Shirikian said.
This predictive capability has eliminated the false positive alerts that BT Infonet experienced in the past. "When a system hit 90% of the threshold, an alarm would go off, even if the system was just spiking for a split second," Shirikian explained. "That was always happening, and we would have to take the time to figure out that there really wasn't a problem."
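To make the contrast concrete, here is a minimal sketch in Python of the two alerting styles. It is an illustration only: the article does not describe how Integrien's dynamic threshold algorithms work, and the function names, the 90% limit, the three-standard-deviation tolerance and the three-sample "sustain" window are all assumptions chosen for the example.

```python
from statistics import mean, stdev

STATIC_LIMIT = 90.0  # percent; assumed fixed alarm level for the example


def static_alerts(samples, limit=STATIC_LIMIT):
    """Return the indexes of samples that reach a fixed limit."""
    return [i for i, value in enumerate(samples) if value >= limit]


def baseline_alerts(history, samples, tolerance=3.0, sustain=3):
    """Return the indexes at which a reading completes `sustain` consecutive
    samples lying more than `tolerance` standard deviations away from the
    mean of the historical readings (a hypothetical stand-in for a
    behavior-based threshold)."""
    center = mean(history)
    spread = stdev(history) or 1e-9  # avoid a zero-width band on flat history
    alerts, run = [], 0
    for i, value in enumerate(samples):
        if abs(value - center) > tolerance * spread:
            run += 1
        else:
            run = 0
        if run == sustain:  # alert once, when the deviation becomes sustained
            alerts.append(i)
    return alerts


# Hypothetical CPU-utilization readings, in percent.
history = [20, 22, 21, 19, 20, 23, 21, 20, 22, 21]
spike = [21, 20, 95, 22, 21]   # split-second spike past the static limit
drift = [21, 30, 34, 38, 41]   # well under 90%, but unlike the history

print(static_alerts(spike), baseline_alerts(history, spike))
print(static_alerts(drift), baseline_alerts(history, drift))
```

In this sketch the momentary spike trips only the static check, while the slow drift, which never comes near 90%, trips only the baseline check. That mirrors the two cases Shirikian describes: alarms that turned out not to be problems, and activity "within our thresholds" that is nonetheless not normal.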
This past December, BT Infonet upgraded to Alive 6.0, a version that includes enhanced analytics, a role-based graphical user interface and adapters that support integration with third-party monitoring tools. The adapters allow Alive 6.0 to complement existing monitoring tools from the likes of Compuware Corp., Hewlett-Packard Co. and Symantec Corp. by collecting the data and presenting it in a single dashboard. "It really reduces a lot of manual processes because we don't have to go into each tool separately to get at the data," Shirikian said.
The Alive system is installed on the 175 servers that make up the company's customer relationship management platform, including Siebel on the front end and an Oracle database on the back end. "We monitor every element in our end-to-end environment," Shirikian said. That includes servers, applications, databases and the network.
A proactive approach to IT operations
Rather than putting out fires, Shirikian likens troubleshooting activities today to noticing the smoke before a blaze erupts. An engineer can look at the data in Alive and know that a problem lies with the database server, not the application server or network, thereby eliminating those confabs with the entire engineering staff. "We fix problems before the business even knows that anything has happened," Shirikian said.
Gartner's Williams said that what makes Integrien's approach unique is also what makes it challenging to market. The use of behavioral analytics "is a tough sell to data centers because it's different from the way IT operations typically get alerted to problems," he said.
Let us know what you think about the story; email Megan Santosus, Features Writer.