
Purging zombie servers saves AOL money, energy

AOL shares the lessons it learned from a project to rid its data centers of 14,805 zombie servers that saved the company about $10 million.

Nearly one in three servers in the U.S. is a zombie server that contributes nothing to the data center's compute power.

AOL, a $2.5 billion global company that provides email, messaging, digital advertising and content, has seized the opportunity to increase its data center efficiency and save money by eliminating those comatose "zombie" servers.

The New York-based company took on the challenge of purging zombie servers at its five facilities as part of a five-year data center strategy to begin charging back costs to each of the company's business units, according to James LaPlaine, AOL's senior vice president for technology operations. In the end, AOL achieved higher server utilization rates and made a measurable dent in its carbon footprint -- a 35% reduction in one year.

The goal was 20%, but the reduction ended up higher as a result of tough conversations with users, LaPlaine said.

In all, the reduction totaled 36,737 metric tons of carbon emissions after 14,805 servers were decommissioned and physically removed in 2014. AOL started the year with 33,103 servers in production.

The reduction was part of the company's work to close two data centers by the end of 2014: a 5 MW facility in Dulles, Va., and a 1 MW facility in Mountain View, Calif.

AOL's efforts to reduce its zombie servers earned it recognition in the Uptime Institute's annual Server Roundup competition.

"We wanted to demonstrate to others in the industry that utilization is very low," LaPlaine said. "Left unchecked, the attention on underutilization hasn't been there in the industry."

Millions of zombie servers plague data centers worldwide

The latest data on comatose servers comes from a study by consulting firm Anthesis Group and Jonathan Koomey, a research fellow at Stanford University, which used anonymized data from customers of TSO Logic, a global company that develops IT efficiency software for data centers.

The researchers define a comatose server as one that has not delivered information or computing services in six months or more. The study estimates there are about 3.6 million comatose servers in the U.S. and 10 million worldwide. Based on the worldwide estimate, reducing the IT and infrastructure load these servers impose could save more than 4 gigawatts of power.
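The study's definition of a comatose server lends itself to a simple check. The sketch below is a minimal illustration, not the researchers' method: it assumes each server record carries a hypothetical `last_service_date` field (the last day the server delivered information or computing services) and approximates "six months" as 182 days.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# ASSUMPTION: "six months" is approximated as 182 days.
COMATOSE_THRESHOLD = timedelta(days=182)


@dataclass
class Server:
    name: str
    # Hypothetical field: last day the server delivered any information or service.
    last_service_date: date


def is_comatose(server: Server, today: date) -> bool:
    """True if the server has delivered no service for six months or more."""
    return today - server.last_service_date >= COMATOSE_THRESHOLD


def find_comatose(servers: list[Server], today: date) -> list[str]:
    """Return the names of all servers that meet the comatose definition."""
    return [s.name for s in servers if is_comatose(s, today)]


if __name__ == "__main__":
    # Hypothetical inventory for illustration only.
    fleet = [
        Server("web-01", date(2014, 12, 1)),
        Server("batch-07", date(2014, 3, 15)),
        Server("legacy-db", date(2013, 11, 2)),
    ]
    print(find_comatose(fleet, today=date(2015, 1, 1)))
```

In practice the hard part is populating the last-activity field reliably; the classification itself is a single date comparison.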

TSO Logic calculates that an environment with 1,000 servers could save $300,000 by removing its comatose servers. The estimate, produced with the company's online calculator, assumes that 300 (30%) of those servers are comatose, a power usage effectiveness (PUE) of 2 and an energy cost of $0.11 per kilowatt-hour. The Uptime Institute also offers a Comatose Server Calculator.
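The arithmetic behind an estimate like TSO Logic's can be sketched for the energy portion alone. This is a hypothetical illustration, not TSO Logic's calculator: it multiplies the comatose servers' IT load by the PUE to account for cooling and power-delivery overhead, then by the hours in a year and the $0.11 per kilowatt-hour rate. The 200 W average draw per server is an assumption; the article does not supply one.

```python
# Hypothetical sketch of the energy portion of a comatose-server savings estimate.
HOURS_PER_YEAR = 8760  # 24 hours x 365 days


def annual_energy_savings(total_servers: int, comatose_share: float,
                          avg_server_kw: float, pue: float,
                          price_per_kwh: float) -> float:
    """Estimate the yearly energy cost avoided by removing comatose servers.

    PUE is total facility power divided by IT power, so each kilowatt of IT
    load removed also removes (PUE - 1) kW of cooling and power-delivery overhead.
    """
    comatose_count = round(total_servers * comatose_share)
    it_load_kw = comatose_count * avg_server_kw  # power drawn by the idle servers
    facility_load_kw = it_load_kw * pue          # add facility overhead via PUE
    return facility_load_kw * HOURS_PER_YEAR * price_per_kwh


if __name__ == "__main__":
    savings = annual_energy_savings(
        total_servers=1_000,
        comatose_share=0.30,  # 300 comatose servers, as in the TSO Logic scenario
        avg_server_kw=0.2,    # ASSUMPTION: hypothetical 200 W average draw per server
        pue=2.0,
        price_per_kwh=0.11,
    )
    print(f"Estimated energy savings: ${savings:,.0f} per year")
```

Under those assumptions the sketch yields about $115,632 a year. The result scales directly with the assumed per-server draw, and the sketch is not meant to reproduce the $300,000 figure, which may also reflect costs other than electricity.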

For AOL, the savings were much larger. The company saved $4.3 million by reducing utility and maintenance costs and eliminating licensing fees. That's on top of the $6 million in revenue from selling equipment to third-party vendors that refurbish it for resale, donate it or recycle it through scrap metal brokers.

AOL server decommissioning project successes, challenges

AOL's project changed the company's data center footprint and led to "mammoth" savings, said Matt Stansberry, the Uptime Institute's director of content and publications.

"There was a good incentive to do it," Stansberry said.

AOL's IT pros asked each business unit whether specific applications were needed or whether those applications could be consolidated. IT leaders' default position was that an app would be decommissioned unless it was shown to be necessary.

"If you don't have that forcing function, people just tell you to leave it alone," LaPlaine said.

AOL aims to reach 20% to 25% CPU utilization; it is currently at 12%, he said.

And that's typical. McKinsey & Company has reported that typical servers in business and enterprise data centers deliver between 5% and 15% of their maximum computing output over the course of a year.

LaPlaine recommends that IT pros base decisions on measured server utilization rather than on what employees claim they need.

"Now we make data driven decisions and not emotional ones," he said.
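LaPlaine's advice to rely on measured utilization rather than stated need can be illustrated with a short script. This is a hypothetical sketch, not AOL's tooling: it assumes per-server CPU utilization samples (in percent) have already been exported from a monitoring system, flags servers whose average falls below an assumed 5% cutoff for review, and compares the fleet-wide average with AOL's stated 20% to 25% goal.

```python
from statistics import mean

# ASSUMPTION: hypothetical cutoff; servers averaging below 5% CPU get flagged.
REVIEW_THRESHOLD_PCT = 5.0
# AOL's stated fleet-wide goal is 20% to 25% CPU utilization.
TARGET_LOW_PCT, TARGET_HIGH_PCT = 20.0, 25.0


def review_candidates(samples: dict[str, list[float]],
                      threshold: float = REVIEW_THRESHOLD_PCT) -> list[str]:
    """Return servers whose measured average CPU utilization is below threshold."""
    return sorted(name for name, cpu in samples.items() if mean(cpu) < threshold)


def fleet_utilization(samples: dict[str, list[float]]) -> float:
    """Average CPU utilization across servers, weighting each server equally."""
    return mean(mean(cpu) for cpu in samples.values())


if __name__ == "__main__":
    # Hypothetical CPU utilization samples (percent) for illustration only.
    samples = {
        "app-01": [35.0, 40.0, 30.0],
        "app-02": [2.0, 1.5, 3.0],
        "db-01": [10.0, 14.0, 12.0],
        "old-ftp": [0.0, 0.5, 0.0],
    }
    fleet = fleet_utilization(samples)
    print("Review candidates:", review_candidates(samples))
    print(f"Fleet utilization: {fleet:.1f}% "
          f"(target {TARGET_LOW_PCT:.0f}-{TARGET_HIGH_PCT:.0f}%)")
```

Weighting every server equally keeps the sketch simple; a production tool would likely weight by capacity and look at peak as well as average load before recommending that a machine be decommissioned.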

The data center closures and server purging were part of a three-year project for AOL to shift from "black box, central IT" to "technology business management," or TBM, which makes IT costs more transparent to all business units.

PUE has long been a focus of data center operators, Stansberry said, but Uptime began pushing for increased server efficiency about a decade ago. Servers can be abandoned for all sorts of reasons, but Uptime urges enterprises to decommission those zombie servers.


There are other, lower-profile examples of the impact of server roundups. For example, Stansberry said one large healthcare provider avoided the construction of an $80 million data center after undertaking a project to rid its data center of comatose servers.

Thanks to creative awards, such as the oversized rodeo belt buckles presented to Fortune 500 CEOs as part of Uptime's Server Roundup, and online calculators that estimate cost savings, the benefits of ridding data centers of zombie servers continue to generate interest.

The battle continues after the zombie servers are purged, LaPlaine said. Future purchases must be justified, which helps eliminate unnecessary capital expenses and makes better use of the company's human capital.

The project at AOL also led to the creation of a "utilization czar" charged with measuring and benchmarking server utilization rates. To keep the purge going, the czar continually questions whether servers are still needed.

AOL also uses a configuration management database (CMDB) that tracks server utilization. LaPlaine plans to keep using data from the CMDB to guide decisions.

Along the way, the biggest challenge was to continually communicate with each business unit.

"The hurdle was [explaining] that this would really happen," LaPlaine said, adding that IT pros had to keep making it clear that the project was not just talk and had a firm deadline of the end of 2014.

Robert Gates covers data centers, data center strategies, server technologies, converged and hyperconverged infrastructure and open source operating systems for SearchDataCenter. Follow him @RBGatesTT
