Ongoing data center operations trigger drift: Controls and load distribution gradually end up outside the bounds of a well-founded original design. If you're not continually managing IT load, power and cooling, you're probably wasting money.
"The design was for Day 0, Day 1," said Keith Klesner, vice president of engineering at The Uptime Institute, a New York-based enterprise data center industry consortium. "Where you end up today is often quite different [from Day 1]."
Stay on the initial design's course and you'll get "opexed out of the business," Klesner said. Environmental responsibility is also playing a bigger role in growth decisions, he added.
He recommends investing in data center assessment and optimization efforts. IT load usually falls below design levels once the facility moves into actual operation, or by about a year and a half after opening day. The causes are initial overbuilding, or consolidation and migration of workloads to off-premises hosting.
Uptime's 29 recommendations, published in 2014, provide a path to power and cooling optimization that relies on tactical-level communication between IT and facilities.
Uptime's power and cooling recommendations
The first eight of Uptime's recommendations establish a baseline of operations: Determine the IT load, measure intake temperature in normal and hot-spot-prone racks, and estimate the cooling units' power draw, as well as airflow volume and supply and return air temperatures. These measurements establish the data center's sensible cooling load and help you assess whether the installed cooling is appropriate for the IT infrastructure.
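As a rough illustration of the baseline math, the sketch below estimates each cooling unit's sensible heat removal from its airflow and its return and supply air temperatures, then sums the units. It assumes the standard-air formula Q (BTU/hr) = 1.08 × CFM × ΔT (°F) and a conversion of 3,412.14 BTU/hr per kW; Uptime's recommendations don't prescribe this formula, and the function names and readings are hypothetical.

```python
# Sensible heat carried by an air stream, imperial units, standard air (assumption):
#   Q [BTU/hr] = 1.08 * airflow [CFM] * (return temp - supply temp) [degF]
# The 1.08 factor assumes standard sea-level air density; adjust for altitude.
SENSIBLE_HEAT_FACTOR = 1.08
BTU_PER_HR_PER_KW = 3412.14


def unit_sensible_load_kw(airflow_cfm: float, return_f: float, supply_f: float) -> float:
    """Estimate the sensible heat (kW) one cooling unit is removing."""
    delta_t = return_f - supply_f
    if delta_t < 0:
        raise ValueError("return air should be warmer than supply air")
    return SENSIBLE_HEAT_FACTOR * airflow_cfm * delta_t / BTU_PER_HR_PER_KW


def room_sensible_load_kw(readings: list[tuple[float, float, float]]) -> float:
    """Sum per-unit estimates; each reading is (CFM, return degF, supply degF)."""
    return sum(unit_sensible_load_kw(cfm, ret, sup) for cfm, ret, sup in readings)


# Hypothetical readings for three cooling units.
readings = [(12000, 85.0, 65.0), (10000, 80.0, 64.0), (8000, 78.0, 66.0)]
print(f"Room sensible cooling load: {room_sensible_load_kw(readings):.1f} kW")
```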
New data center? No problem
Uptime's 29 steps to power and cooling improvement apply to existing data centers and therefore omit specifics on cold aisle width, ceiling height and other floor plan decisions. If you're setting up a new data center space, go back to Uptime's original 27 tips for sound data center design, published in early 2008, or to design guidance from ASHRAE; the basic principles remain the same.
Uptime recommends dividing the total sensible capacity of your cooling units (as rated by the manufacturer) by the sensible cooling load. The result is a simple ratio of operating cooling capacity to cooling load. This ratio can be as low as 1.15, and it serves as a benchmark for measuring future improvements.
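A minimal sketch of the ratio, assuming you already have each unit's manufacturer-rated sensible capacity and the room's measured sensible cooling load, both in kW. The helper that reports capacity above the 1.15 figure is a hypothetical addition for illustration, and it ignores any redundancy you must keep.

```python
def cooling_capacity_ratio(rated_sensible_kw: list[float], sensible_load_kw: float) -> float:
    """Total manufacturer-rated sensible capacity divided by the sensible cooling load."""
    if sensible_load_kw <= 0:
        raise ValueError("sensible cooling load must be positive")
    return sum(rated_sensible_kw) / sensible_load_kw


def capacity_above_benchmark_kw(
    rated_sensible_kw: list[float], sensible_load_kw: float, benchmark: float = 1.15
) -> float:
    """Hypothetical helper: rated capacity (kW) beyond what the benchmark ratio needs."""
    return sum(rated_sensible_kw) - benchmark * sensible_load_kw


# Hypothetical room: six units rated at 100 kW sensible each, 200 kW measured load.
units = [100.0] * 6
print(cooling_capacity_ratio(units, 200.0))  # 3.0
print(capacity_above_benchmark_kw(units, 200.0))
```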
Uptime's hot aisle and cold aisle containment recommendations, steps nine through 17, haven't changed much since 2008. Place all perforated floor tiles in the cold aisle; perforated tiles belong directly in front of the rack they cool, with powered-down racks using a solid tile instead. Uptime notes that networking racks may need special attention. Ceiling grilles in return-air plenums should align with hot aisles.
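The tile rules above lend themselves to a simple audit. This sketch assumes a hypothetical floor inventory in which each tile records its aisle type, whether it is perforated, and the power state of the rack directly in front of it (if any); it flags only perforated tiles that break the placement rules. It does not model the special handling Uptime notes for networking racks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FloorTile:
    tile_id: str
    aisle: str                     # "cold" or "hot"
    perforated: bool
    rack_powered: Optional[bool]   # None when no rack sits directly in front of the tile


def tile_placement_issues(tiles: list[FloorTile]) -> list[str]:
    """Return a message for each perforated tile that breaks the placement rules."""
    issues = []
    for tile in tiles:
        if not tile.perforated:
            continue  # this sketch audits perforated tiles only
        if tile.aisle != "cold":
            issues.append(f"{tile.tile_id}: perforated tile outside the cold aisle")
        elif tile.rack_powered is None:
            issues.append(f"{tile.tile_id}: perforated tile not in front of a rack")
        elif not tile.rack_powered:
            issues.append(f"{tile.tile_id}: powered-down rack; replace with a solid tile")
    return issues


# Hypothetical inventory for a short row.
inventory = [
    FloorTile("C-01", "cold", True, True),
    FloorTile("C-02", "cold", True, False),
    FloorTile("H-01", "hot", True, None),
    FloorTile("C-03", "cold", False, False),
]
for issue in tile_placement_issues(inventory):
    print(issue)
```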
However, one design focus has changed. Raised floors are no longer "de rigueur," and that's OK, according to Klesner. "You can deliver flooded cold aisle and have it all be overhead," he said.
Raised-floor cooling designs are still common in the data center industry today, but more chimney racks are entering the picture.
To prevent air mixing, seal cable holes, the vertical fronts of racks and wall borders. Conduits and cable and pipe paths are common air-leakage culprits in the data center, as are in-room cooling units. Backflow dampers or covers prevent air loss through unused cooling units. Power distribution units, panels and other electrical equipment are also places where air can escape.
Once the facilities team has applied these best practices to minimize leakage, go back and measure temperatures as you did for the baseline. Have intake temperatures increased? If so, additional perforated tiles (or high-airflow tiles) can mitigate the problem. While in-row and in-rack cooling units deliver more targeted cooling, iterative containment optimization provides more flexibility as IT loads change, Klesner said.
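One way to structure the post-containment comparison is sketched below. It assumes per-rack intake readings in °C keyed by hypothetical rack IDs, flags racks that warmed by more than a chosen tolerance relative to the baseline, and flags any reading above an upper limit. The 27 °C default is ASHRAE's recommended upper intake temperature for most IT equipment classes; the 1 °C tolerance is an arbitrary assumption, and you should substitute the limits your own equipment calls for.

```python
def flag_intake_changes(
    baseline_c: dict[str, float],
    current_c: dict[str, float],
    limit_c: float = 27.0,
    tolerance_c: float = 1.0,
) -> dict[str, list[str]]:
    """Flag racks whose intake temperature rose past tolerance or exceeds the limit."""
    flagged: dict[str, list[str]] = {}
    for rack, temp in current_c.items():
        reasons = []
        base = baseline_c.get(rack)
        if base is not None and temp - base > tolerance_c:
            reasons.append(f"up {temp - base:.1f} C from baseline")
        if temp > limit_c:
            reasons.append(f"above {limit_c:.1f} C limit")
        if reasons:
            flagged[rack] = reasons
    return flagged


# Hypothetical readings (degrees C) before and after containment work.
baseline = {"R01": 22.0, "R02": 24.0, "R03": 25.0}
after_containment = {"R01": 22.5, "R02": 26.0, "R03": 27.5}
for rack, reasons in flag_intake_changes(baseline, after_containment).items():
    print(rack, "; ".join(reasons))
```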
"Once you get your basics in place, you can start looking at raising the air and chilled water temperatures, controlling cooling by the supply air temperature, and other finesse points," Klesner said.
Has cooling demand decreased enough to power off units with the lowest sensible load, while maintaining appropriate redundancy for the space? See Uptime's data center optimization recommendations 18 through 21 for how to determine the minimum number of cooling units that will support the IT load. The reduction effort should proceed at the pace of one cooling unit per week.
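The sketch below shows one way to size the reduction, assuming identical cooling units with a known rated sensible capacity and N+1 redundancy by default. Uptime's steps 18 through 21 may define the minimum differently; the function names, unit IDs and loads are hypothetical. It keeps enough units to carry the measured load plus the redundant units, then schedules the surplus for shutdown one per week, lowest sensible load first.

```python
import math


def minimum_units(load_kw: float, unit_capacity_kw: float, redundant_units: int = 1) -> int:
    """Units needed to carry the load, plus redundant units (N+1 by default)."""
    return math.ceil(load_kw / unit_capacity_kw) + redundant_units


def shutdown_schedule(
    unit_loads_kw: dict[str, float],
    unit_capacity_kw: float,
    redundant_units: int = 1,
) -> list[tuple[int, str]]:
    """Plan one shutdown per week, lowest measured sensible load first."""
    total_load = sum(unit_loads_kw.values())
    keep = minimum_units(total_load, unit_capacity_kw, redundant_units)
    surplus = max(len(unit_loads_kw) - keep, 0)
    lowest_first = sorted(unit_loads_kw, key=unit_loads_kw.get)
    return list(enumerate(lowest_first[:surplus], start=1))


# Hypothetical per-unit sensible loads (kW) for six 100 kW units.
loads = {"CRAH-1": 40, "CRAH-2": 55, "CRAH-3": 20, "CRAH-4": 35, "CRAH-5": 10, "CRAH-6": 30}
for week, unit in shutdown_schedule(loads, unit_capacity_kw=100):
    print(f"Week {week}: power off {unit}")
```

Because the remaining units pick up the load of each unit you power off, re-measure per-unit loads each week and recompute the schedule rather than treating it as fixed.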
Each week, retake temperature readings, compare them with historical data and adjust tiles accordingly; as airflow changes, you may be able to reduce the number of perforated tiles in the cold aisles. Once you reach the target number of running cooling units, measure their power draw. Steps 22 through 25 cover the specifics.
With the help of the IT team, facilities can increase the data center temperature, following steps 26 and 27. The goal temperature should accommodate your current infrastructure equipment's recommended operating temperature range, as well as the power tradeoffs involved in warmer data centers. Have a plan for contingencies, and a path to revert if necessary. Raise the set point on each cooling unit at a pace of one or two degrees per week unless the integrated data center team agrees on a more aggressive approach. As with the cooling demand realignment, measure temperatures and compare them against the established baseline, adjusting cold air as necessary to protect servers.
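A set point ramp at a fixed weekly step can be planned in advance. This minimal sketch assumes a single target set point and a step size in the same degree units as the set point (one or two degrees per week, per the guidance above); it returns the planned set point for each week and stops at the target. The function name and example values are hypothetical; keep the previous week's value on hand as the revert point.

```python
def setpoint_ramp(
    current: float, target: float, step_per_week: float = 1.0
) -> list[tuple[int, float]]:
    """Planned (week, set point) pairs, rising by a fixed step until the target."""
    if step_per_week <= 0:
        raise ValueError("step must be positive")
    plan = []
    week, setpoint = 0, current
    while setpoint < target:
        week += 1
        setpoint = min(setpoint + step_per_week, target)  # never overshoot the target
        plan.append((week, setpoint))
    return plan


# Hypothetical example: raise a unit from 18 to 22 degrees at 1.5 degrees per week.
for week, setpoint in setpoint_ramp(18.0, 22.0, step_per_week=1.5):
    print(f"Week {week}: set point {setpoint:.1f}")
```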
The final steps, 28 and 29, take place outside the data center: Review and report on improvements. Return to the initial measurements of cooling units' power draw, cooling capacity relative to load, and IT equipment operating temperatures. Compare those starting values with the current state, then calculate the annual energy savings in kilowatt-hours of shed demand and the resulting cost savings. Share this information across IT operations and with business partners.
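The savings arithmetic is simple enough to sketch. This version assumes the baseline and current cooling power draws are representative averages in kW, so the shed demand persists for all 8,760 hours of a non-leap year, and it uses a hypothetical flat electricity price; tariffs with demand charges or time-of-use rates need a more detailed model.

```python
HOURS_PER_YEAR = 8760  # non-leap year; assumes the shed demand persists around the clock


def annual_savings(
    baseline_kw: float, current_kw: float, price_per_kwh: float
) -> tuple[float, float]:
    """Return (kWh saved per year, cost saved per year) for a steady reduction in draw."""
    shed_kw = baseline_kw - current_kw
    kwh_saved = shed_kw * HOURS_PER_YEAR
    return kwh_saved, kwh_saved * price_per_kwh


# Hypothetical figures: cooling draw fell from 120 kW to 95 kW at $0.10 per kWh.
kwh, dollars = annual_savings(120.0, 95.0, 0.10)
print(f"{kwh:,.0f} kWh/year, about ${dollars:,.0f}/year")
```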
Finally, since your IT load is constantly changing and drift is inevitable, start assessing and optimizing all over again.