Rising utility rates and escalating computing requirements are creating new power and thermal challenges for data center managers. Today's high-density rack and blade servers bring these issues into especially sharp focus. Since these architectures are inherently more scalable, adaptable and manageable than traditional platforms, they deliver much-needed relief in complex, crowded data centers. Yet they also introduce power and thermal loads that are substantially higher than those of the systems they replace. In some cases, they may even push the cooling infrastructure of older facilities beyond its limits.
A long-term solution to these challenges will require broad industry innovation and collaboration. The most important enhancement is a shift toward multi-core architectures that incorporate two or more processing units on a single chip. This will deliver major performance improvements while helping to keep power usage in check.
Moving toward integrated data center management -- performance planning
The goal of data center integration is to enable IT facilities managers to automate and optimize performance, power and cooling management across the data center, and to monitor and control all relevant variables at the component, system, rack and data center level. In the short term, companies can move toward this environment through an integrated data center approach. This approach ensures that IT planning weighs the performance and capacity needed to sustain appropriate business service levels against the implications of underlying IT resource decisions, looking beyond simple "price/performance" ratios. Today, ISVs provide performance and prediction software that enables companies to plan how workloads, response times and system resource utilization will balance as business demands and IT hardware resources change. Those plans can then be factored into power and thermal planning activities, which follow from resource technology decisions and timelines. The components of such a performance/capacity plan include:
- Build an asset inventory: It's impossible to plan data center requirements without understanding your starting point. You must be able to answer the "What" question about your resources. Know and document the physical aspects of your asset inventory, including software licenses, asset purchase dates, cost, owner, server/desktop identity, etc. in order to better understand the performance utilization of underlying assets.
- Baseline asset performance: After the physical assets have been catalogued, it is important to answer the "How" question in terms of asset performance utilization. It is not uncommon to find server CPU utilization on some applications averaging well below 15 percent across the enterprise. Performance software tools enable businesses to collect, analyze and present performance data in a way that directly associates performance measurements with business-oriented metrics. This step also typically helps identify "low-hanging fruit" opportunities, such as idle resources that can be rapidly re-provisioned in lieu of incremental resource acquisition.
- Report/View: Once performance utilization information is available, it is time to understand the "Why" aspect. Should IT care about any given resource? Reporting capabilities enable IT professionals to examine the correlation between performance/utilization and asset ownership and purpose. They help companies understand not only how well a given asset is being utilized but also the importance of its role within the organization. By correlating utilization, business purpose and stakeholder information, IT can make rapid, informed decisions about which assets warrant more comprehensive performance analysis because their performance or cost factors necessitate more detailed planning.
- Asset analysis: It is critical to understand the impact of the business cycle on the utilization of underlying IT resources. No business plan is complete without factoring in what it takes to sustain business service levels across the business cycle. Optimizing asset utilization in conjunction with reductions in power and cooling costs is worthless if the resulting configurations under-provision (or over-provision) for the needs of the business. Factors such as business workload performance over time (e.g., "trade settlement application requires X CPU and Y I/O resources for a given transaction rate") and business cycle variance (e.g., "trade settlement transactions peak at 5 p.m. daily, with monthly variance of Y, and business peaks quarterly") help establish the all-important common view between IT and the business that is needed to build an overall data center resource plan.
- Asset modeling: Data center planning requires the analysis of various technology choices in terms of their impact on business throughput, response time and utilization. The previous steps provide the foundation upon which business change scenarios (growth, consolidation, etc.) can be factored into the overall performance analysis, giving high confidence that the results will not only resolve today's performance problems but also scale cost-effectively with a company's business. This results in a series of technology choices, which can then be evaluated as part of the Power and Thermal Planning effort.
- Optimize ongoing operations: Maintaining "Actual versus Planned" measurements of ongoing performance and utilization is a necessary step to ensure ongoing cost and performance optimization. Business risks associated with change are minimized, mis-provisioning is reduced or eliminated, and costs are lowered not only by increasing average utilizations, but also by being able to maintain optimal server-, power- and thermal-related expenditures over the business lifecycle of the server assets.
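The baselining and "Actual versus Planned" steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the server names, the sample values and the function names are hypothetical, and the 15 percent cutoff simply reuses the utilization figure quoted above as a threshold. Commercial performance tools do far more than this.

```python
from statistics import mean

# Assumption: reuse the article's "well below 15 percent" figure as an idle cutoff.
IDLE_THRESHOLD_PCT = 15.0


def baseline(samples_by_server):
    """Return {server: (average, peak)} CPU utilization, in percent."""
    return {
        name: (mean(samples), max(samples))
        for name, samples in samples_by_server.items()
    }


def idle_candidates(baselines, threshold=IDLE_THRESHOLD_PCT):
    """Servers whose peak utilization stays under the threshold."""
    return sorted(
        name for name, (_, peak) in baselines.items() if peak < threshold
    )


def plan_variance(baselines, planned_avg):
    """Actual minus planned average utilization, in percentage points."""
    return {
        name: baselines[name][0] - planned
        for name, planned in planned_avg.items()
        if name in baselines
    }


# Hypothetical CPU utilization samples (percent) collected over one period.
samples = {
    "app-01": [4.0, 6.0, 5.0, 9.0],
    "db-01": [35.0, 60.0, 85.0, 40.0],
    "web-01": [10.0, 12.0, 30.0, 8.0],
}

baselines = baseline(samples)
print("Re-provisioning candidates:", idle_candidates(baselines))
print("Actual vs. planned:", plan_variance(baselines, {"db-01": 50.0}))
```

The sketch flags a server as idle only when its peak, not its average, stays under the cutoff, so that a machine with a low average but a sharp business-cycle peak (such as web-01 here) is not re-provisioned by mistake.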
Moving toward integrated data center management -- power and thermal planning
Once you have the right mix of servers that can meet your business service performance, response-time and throughput requirements, then you must consider their physical placement in the data center. Companies should consider both rack-level optimization and data center-level optimization scenarios.
Most system-, rack- and room-level cooling issues are created by insufficient airflow or inadvertent mixing of hot and cold air. The need for sufficient airflow is obvious, but is often overlooked by IT personnel who are focused on other concerns. The mixing of hot and cold air is a more subtle issue, but equally problematic, since it can dramatically reduce the efficiency of a cooling system and may also impact airflow.
- Understand airflow requirements for specific equipment: There are four basic airflow scenarios: front-to-back, side-to-side, bottom-to-top, and top-to-bottom. Understanding the requirements for specific equipment will enable an efficient rack-level design and cooling strategy. Map the technologies under consideration as part of the Performance Planning activities against the thermal characteristics specified by the vendor(s).
- Standardize on racks designed for high-density environments: Standardize on appropriate power and thermal policies to make effective rack decisions. For example, avoid shallow racks so that in-rack cabling will not obstruct airflow. Consider racks that support retrofit fan or cooling units (but verify the benefits of these add-on units) to minimize future risks associated with miscalculations. Plan for localized supplemental cooling for individual racks so that future high-density systems can be accommodated without compromising room-wide efficiency.
- Arrange racks in rows to establish hot and cold aisles: Racks should be aligned front-to-front along cold aisles, and back-to-back along hot aisles. Within each row, racks should be tightly abutted. For this strategy to be effective, cold air must be delivered to cold aisles and hot air extracted from hot aisles. Work to eliminate hot air mixing, which will cause short cycling of the cooling system.
- Use blanking panels: Blanking panels improve airflow through the rack, minimize air loss and help prevent exhaust air recirculation.
- Ensure adequate airflow to individual racks and systems: Clearly define power and cooling requirements at the room, row, and cabinet level. Ensure sufficient airflow to racks based on system-level inlet air temperature and airflow requirements, and use thermal and aerodynamic analysis tools to model and design your cooling solutions. Insufficient airflow will often result in hotter systems and turbulence that decrease cooling efficiency. For example, if a rack requires more cold air than the room provides, its fans will pull a mix of hot and cold air. This will result in reheating of the room, hotter systems, unhealthy airflows, and a substantial reduction in the efficiency of the cooling system.
- Explore the benefits of blade servers: Blade architectures can reduce total power consumption (per unit of compute power) and deliver substantial TCO benefits through reduced cabling, easier provisioning, improved modularity and lower management costs. However, they will likely increase power and cooling density. It is therefore important to look at total costs, risks and benefits within your particular physical and operational environment. For more information on Intel blade servers, visit http://www.intel.com/it/digital-enterprise/blade-server.htm
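To make the airflow guidance above concrete, the sketch below estimates how much air a rack needs and whether the room supplies enough. It is a rough planning aid, not a substitute for the thermal analysis tools mentioned above. It relies on the standard sensible-heat approximation for air near sea level (BTU/hr = 1.08 x CFM x temperature rise in degrees Fahrenheit) and on the conversion 1 W = 3.412 BTU/hr. The function names and the example rack figures are hypothetical.

```python
BTU_PER_HR_PER_WATT = 3.412  # 1 W of load dissipates about 3.412 BTU/hr
SENSIBLE_HEAT_FACTOR = 1.08  # BTU/hr per CFM per degree F, air near sea level


def required_cfm(load_watts, delta_t_f):
    """Airflow (CFM) needed to remove load_watts at a given air temperature rise."""
    if delta_t_f <= 0:
        raise ValueError("delta_t_f must be positive")
    return BTU_PER_HR_PER_WATT * load_watts / (SENSIBLE_HEAT_FACTOR * delta_t_f)


def airflow_shortfall(load_watts, delta_t_f, supplied_cfm):
    """CFM the rack would have to draw from recirculated (hot) air, or 0.0."""
    return max(0.0, required_cfm(load_watts, delta_t_f) - supplied_cfm)


# Hypothetical rack: 5 kW load, 20 degree F rise across the servers,
# 600 CFM of cold air delivered to the front of the rack.
need = required_cfm(5000, 20)
gap = airflow_shortfall(5000, 20, 600)
print(f"Required: {need:.0f} CFM, shortfall: {gap:.0f} CFM")
```

With these example numbers the rack needs roughly 790 CFM, so about 190 CFM would be pulled from somewhere other than the cold aisle, which is exactly the hot and cold air mixing described above. The same formula shows why density matters: doubling the load at the same temperature rise doubles the required airflow.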
Data center-level optimizations
- Understand data center airflow: The locations of cooling systems and ductwork are obviously critical, but so are the locations of the racks, cable trays, firewalls and other infrastructure elements. Blank off any floor opening that allows air to escape the plenum. Software tools are now available that greatly simplify airflow and thermal analysis. Consider consulting with facilities cooling specialists for complex implementations.
- Optimize room temperature settings: Consider increasing the Delta T of your cooling system to more closely match IT equipment specifications. This may allow you to reduce total airflow while delivering the same cooling capacity and reducing operational costs. (For example, Intel IT has found it beneficial to lower supply air temperatures to between 55 and 65 degrees Fahrenheit while increasing Delta T values to 2 degrees Fahrenheit.)
- Pay attention to infrastructure efficiency: It is generally worthwhile to spend more on infrastructure components that run efficiently at anticipated loads. Power losses in uninterruptible power supplies, power distribution units, cooling systems and the like just add to the thermal load.
- Perform regular power and thermal audits: New systems, upgrades, and room changes can have unintended consequences, so it is important to monitor airflow, temperature and other environmental factors on a regular basis.
- Avoid over-design: Right-sizing power and cooling infrastructure is one of the most effective ways to reduce capital and operational costs in the data center. Work to understand lifecycle requirements and size infrastructure accordingly. Track vendor innovations, and, whenever possible, move toward more modular, flexible, and standardized solutions that improve agility and scalability. Right-sizing has obvious ties with the benefits of having a long-term performance capacity plan and ongoing performance optimization methodology.
- Establish policies and educate personnel: Best practices for power and thermal management must become an integral component of data center operations. Everything from temperature and humidity settings to new system deployments should follow well-understood guidelines that optimize cooling efficiency and minimize airflow obstructions and hot/cold air mixing.
- For new data centers, establish a master plan based on usage: Different usage models require different layout and capacities to enable optimized cooling.
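The point above about infrastructure efficiency can be illustrated with a small calculation. The sketch below estimates the extra heat produced by losses in the power path feeding the IT equipment. It assumes the listed devices sit in the conditioned space, so their losses end up as heat the cooling system must remove. It covers only the power path, not losses within the cooling system itself. The function name and the efficiency figures are hypothetical examples, not vendor data.

```python
def infrastructure_heat_kw(it_load_kw, efficiencies):
    """Heat (kW) added by losses in the power path that feeds the IT load.

    efficiencies: fractional efficiency of each device in series,
    e.g. [0.90, 0.98] for a UPS followed by a power distribution unit.
    """
    input_kw = it_load_kw
    for eff in efficiencies:
        if not 0 < eff <= 1:
            raise ValueError("each efficiency must be in the range (0, 1]")
        # Power drawn upstream of this device to deliver input_kw downstream.
        input_kw /= eff
    return input_kw - it_load_kw


# Hypothetical example: 100 kW of IT load behind a 90% efficient UPS
# and a 98% efficient PDU, compared with a 95% efficient UPS.
baseline_loss = infrastructure_heat_kw(100.0, [0.90, 0.98])
improved_loss = infrastructure_heat_kw(100.0, [0.95, 0.98])
print(f"Loss heat: {baseline_loss:.1f} kW vs. {improved_loss:.1f} kW")
```

Because every kilowatt lost in the power path is paid for twice, once as electricity and again as cooling, a comparison like this can help justify the higher purchase price of components that run efficiently at the anticipated load.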
According to a recent reader survey conducted by Intelligent Enterprise, the majority of companies want to make the most of what they have by simply maintaining and improving their existing technologies. With these steps you can create a next generation data center and begin to reap significant ROI from your investment.
Charles Rego, Senior Practice Principal, is from Intel Solution Services, Intel Corporation's worldwide professional services organization which helps enterprise companies capitalize on the full value of Intel architecture through consulting focused on architecture transitions. Intel Corporation is headquartered in Santa Clara, Calif. For more information about Intel Solution Services, visit www.intel.com/go/intelsolutionservices/
David Wagner is director of product management, Enterprise Performance Assurance (EPA) solutions for BMC Software, Inc. BMC Software, a leading provider of enterprise management solutions that empower companies to manage their IT infrastructure from a business perspective, is headquartered in Houston, Texas. For more information about BMC Software's expertise in the EPA area, please visit www.bmc.com/capacityplanning.