How to identify and remediate data center hot spots

Data center hot spots can cause equipment failure and system outages. Find out how to identify the source of data center hot spots and how to fix the problem in this tip.

Data center hot spots refer to server input air conditions that are either too hot or too dry, according to the American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE) TC 9.9 guidelines.

Reduced reliability and system outages have been attributed to hot spots, and hot spots have led computer hardware manufacturers to threaten to void warranties or maintenance agreements.


The recently revised ASHRAE environmental guidelines recommend an input air temperature between 18 degrees Celsius (64 degrees Fahrenheit) and 27 degrees Celsius (81 degrees Fahrenheit), with a moisture content between a dew point of 5.5 degrees Celsius (42 degrees Fahrenheit) and a dew point of 15 degrees Celsius (59 degrees Fahrenheit), capped at a maximum of 60% relative humidity (RH). The major equipment manufacturers that helped to create these new guidelines have agreed that they're acceptable for the long-term reliability and performance of their equipment. See Figure 1 for a graphic illustration of the recommended temperature and moisture content limits.

Figure 1: Recommended environmental limits


As a side note, it's the user's responsibility to maintain the proper environment at the intake of the computer equipment. From that point on, it is the hardware manufacturer's responsibility to ensure the equipment is properly cooled.

Hot spots occur when the environment at the air input to the server, storage device, communications router or other computer equipment is higher in temperature or lower in moisture content than recommended. That is, above 27 degrees Celsius or below a dew point of 5.5 degrees Celsius. (Hot spots do not occur at the rear of the cabinet or in the hot aisle!)
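The definition above can be turned into a simple check. The following is a minimal sketch, assuming the intake readings are already available in degrees Celsius and percent RH; the function and constant names are illustrative and not part of any ASHRAE tool.

```python
from typing import List

# Recommended limits as quoted in this tip.
TEMP_MIN_C, TEMP_MAX_C = 18.0, 27.0
DEW_POINT_MIN_C, DEW_POINT_MAX_C = 5.5, 15.0
RH_MAX_PERCENT = 60.0


def intake_violations(temp_c: float, dew_point_c: float, rh_percent: float) -> List[str]:
    """Return the recommended limits that one intake reading violates."""
    problems = []
    if temp_c > TEMP_MAX_C:
        problems.append("too hot")
    elif temp_c < TEMP_MIN_C:
        problems.append("too cold")
    if dew_point_c < DEW_POINT_MIN_C:
        problems.append("too dry")
    if dew_point_c > DEW_POINT_MAX_C or rh_percent > RH_MAX_PERCENT:
        problems.append("too moist")
    return problems


def is_hot_spot(temp_c: float, dew_point_c: float, rh_percent: float) -> bool:
    """A hot spot, as defined in this tip, is an intake that is too hot or too dry."""
    problems = intake_violations(temp_c, dew_point_c, rh_percent)
    return "too hot" in problems or "too dry" in problems
```

Readings should be taken at the equipment air intake, not at the rear of the cabinet or in the hot aisle.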

What causes hot spots?
What causes out-of-spec intake air? The primary cause is not enough cold air being delivered to the cold aisle. The cooling fans of the servers located in the bottom of the cabinet consume all the cold air, leaving the cooling fans in the servers at the top of the cabinet to suck in room air. At that level, the room air is mixed with hot exhaust air from the hot aisle that has recirculated over the top of the cabinet or around the ends of the aisles. This hot exhaust air can also recirculate within the cabinet through openings in the cabinet mounting surface.

The first source, hot exhaust air recirculating over the top or end of cabinets, is caused by bypass airflow (see Figure 2). There are so many openings in the perimeter wall under the raised floor of the computer room, or in the raised floor itself, that the under-floor static pressure isn't sufficient to push enough cold air into the cold aisle to satisfy the demands of all the server cooling fans. Under the raised floor, these openings can be cable pass-throughs, conduit openings, gaps at the bottom of building columns or just holes in the wall. In the raised floor, they are cable cutouts, holes under cabinets, extra perforated tiles (usually not located in the cold aisle), and openings in the floor around the perimeter of the room, behind air handlers, under PDUs, etc.

Figure 2: Illustration of bypass airflow


Interestingly, the source of hot spots in a computer room is seldom too little cooling capacity. In fact, too much cooling capacity can be a significant contributor to the existence of hot spots. How does this happen? If the heat load in the room requires eight cooling units and there are 10 or 12 installed, each of the 10 or 12 units is doing less work than if there were only eight. This decreases the temperature drop across the units, and the under-floor temperature is higher, which can contribute to additional hot spots when the static pressure is also too low.
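The arithmetic behind the eight-versus-12 example can be sketched as follows. This is a rough illustration only, assuming the heat load is shared evenly across all running units and that each unit moves a constant volume of air, so that the temperature drop across a unit scales with the share of the load it carries. The function name is hypothetical.

```python
def per_unit_load_fraction(required_units: int, installed_units: int) -> float:
    """Fraction of its full load each running unit carries.

    Assumes an even split of the heat load and constant airflow per unit,
    so the temperature drop across each unit scales by the same fraction.
    """
    if required_units <= 0 or installed_units <= 0:
        raise ValueError("unit counts must be positive")
    return required_units / installed_units


# Eight units are needed for the heat load; compare 8, 10 and 12 running.
for installed in (8, 10, 12):
    fraction = per_unit_load_fraction(8, installed)
    print(f"{installed} units running: each carries {fraction:.0%} of full load")
```

Under these assumptions, running 12 units where eight would do leaves each unit at about two-thirds of its full load.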

How are hot spots eliminated?
The first step is to identify where the holes are: under the raised floor, in the raised floor or above the dropped ceiling (if that area is used as a return air path). All these holes need to be sealed to ensure the only place cold air is entering the room is through perforated tiles or grates in the cold aisle. Unused holes should be permanently sealed. Cable cutouts and pass-throughs should be sealed so that cables can be added and removed without requiring the sealing system to be removed and replaced, because most of the time the replacement will not happen. For that reason, the best device to seal cable cutouts and pass-throughs is a brush grommet system.

The next step is to ensure only the required number of perforated tiles are installed in the cold aisles -- no perforated tiles in the hot aisle or in the middle of an unused area of the computer room. The proper number of perforated tiles can be determined by checking the tile manufacturer's specification sheet and matching the airflow volume to the under-floor static pressure. For example, suppose the spec sheet lists a tile flow volume of 750 CFM at 0.5 inches of water static pressure. Divide the total airflow volume (CFM) from all the operating cooling units by 750, and the result is the total number of perforated tiles that should be installed in the cold aisles.
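The division described above is easy to script. Here is a minimal sketch; the eight units at 12,000 CFM each are hypothetical numbers chosen for illustration, and rounding down to a whole tile is an assumption made so that the tiles do not demand more air than the cooling units supply.

```python
import math


def perforated_tile_count(total_cooling_cfm: float, cfm_per_tile: float) -> int:
    """Total cooling-unit airflow divided by the per-tile flow from the spec sheet.

    Rounds down to a whole tile (an assumption, not a rule from the tip).
    """
    if total_cooling_cfm < 0 or cfm_per_tile <= 0:
        raise ValueError("airflow values must be positive")
    return math.floor(total_cooling_cfm / cfm_per_tile)


# Hypothetical room: eight operating cooling units at 12,000 CFM each,
# and tiles rated at 750 CFM at the measured under-floor static pressure.
total_cfm = 8 * 12_000
print(perforated_tile_count(total_cfm, 750))
```

The per-tile figure must be read from the spec sheet at the static pressure actually measured under the floor, since tile flow changes with pressure.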

The placement of these perforated tiles needs to be matched to the heat load in each cabinet. A general distribution suggestion is outlined in the chart below.


Cabinet load | Perforated tile placement
Less than 2 kW | Perforated tiles can be checkerboarded down the aisle with one tile cooling two or more cabinets
2 kW to 5 kW | Dedicate one perforated tile to each cabinet
5 kW to 10 kW | 40% to 60% open grates are required
More than 10 kW | Supplemental cooling is usually required; isolation of hot and cold aisles is recommended
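The chart can be expressed as a small lookup. This is a sketch only: the chart's bands share their boundary values (5 kW and 10 kW), so assigning a boundary load to the lower band is an assumption, and the function name is illustrative.

```python
def tile_placement(cabinet_load_kw: float) -> str:
    """Map a cabinet heat load to the chart's placement suggestion.

    Loads of exactly 5 kW or 10 kW are placed in the lower band (an assumption).
    """
    if cabinet_load_kw < 0:
        raise ValueError("cabinet load cannot be negative")
    if cabinet_load_kw < 2:
        return "checkerboard tiles; one tile cools two or more cabinets"
    if cabinet_load_kw <= 5:
        return "one perforated tile per cabinet"
    if cabinet_load_kw <= 10:
        return "40% to 60% open grate"
    return "supplemental cooling; isolate hot and cold aisles"


# Hypothetical row of cabinets with measured loads in kW.
for name, load_kw in {"A1": 1.5, "A2": 4.0, "A3": 8.0, "A4": 12.0}.items():
    print(name, "->", tile_placement(load_kw))
```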

Figure 3: Elimination of bypass airflow


The final steps include sealing the cabinets themselves. Blanking plates should be installed to seal all areas of the equipment mounting surfaces that are not occupied by computer equipment. In addition, any gaps between the mounting rails and the sides of the cabinet need to be sealed. Any gaps at the top or bottom of the cabinet need to be sealed. No hot exhaust air should be allowed to circulate from the back through the cabinet to the front. Also ensure that the area below the raised floor is cleaned out. This includes unused cables, bundles and spools of excessively long cables, general debris, etc. The same should be done overhead in the return air path.

To see if you've done the job correctly, measure the intake air temperature of the servers, especially the top server in each cabinet. This can be done with an infrared (IR) temperature gun, liquid crystal temperature strips or wireless RF temperature monitors. If the input air temperatures are too low, perforated tiles can be removed or relocated; if they are too high, tiles can be added or rearranged. This monitoring should be done on a regular basis, especially when new equipment is installed in an area or equipment is removed.
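A regular survey of top-of-cabinet readings can be triaged with a few lines of code. The sketch below assumes the readings have already been collected into a dictionary keyed by cabinet name; the cabinet names, temperatures and function names are hypothetical, and the limits are the recommended 18 to 27 degrees Celsius range quoted earlier.

```python
from typing import Dict

RECOMMENDED_MIN_C = 18.0
RECOMMENDED_MAX_C = 27.0


def tile_adjustment(top_intake_temp_c: float) -> str:
    """Suggest a tile change from the intake temperature of the top server."""
    if top_intake_temp_c > RECOMMENDED_MAX_C:
        return "add or rearrange perforated tiles"
    if top_intake_temp_c < RECOMMENDED_MIN_C:
        return "remove or relocate a perforated tile"
    return "no change"


def cabinets_needing_attention(readings_c: Dict[str, float]) -> Dict[str, str]:
    """Return only the cabinets whose top intake is outside the recommended range."""
    actions = {}
    for cabinet, temp_c in readings_c.items():
        action = tile_adjustment(temp_c)
        if action != "no change":
            actions[cabinet] = action
    return actions


# Hypothetical survey of top-server intake temperatures in degrees Celsius.
survey = {"A1": 22.5, "A2": 29.0, "A3": 16.0}
print(cabinets_needing_attention(survey))
```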

Added advantages of providing the correct environment
More than likely there is excess cooling capacity operating in the computer room. I've already discussed this point in relation to cooling capacity, but I'll now discuss its relation to airflow capacity. There are numerous benefits now that the airflow has been directed exclusively to the cold aisle. First, some of the excess capacity can be turned off, thereby reducing energy costs. If cooling units cannot be turned off, at least redundant capacity has been restored so additional cooling units are not required. Finally, additional applications and/or equipment can be installed in the room without adding more cooling units.

For any of this to be done properly, a detailed study of heat load, cooling capacity and airflow balance must be conducted in the computer room. This will require training of the facilities maintenance staff or employing a trained engineer.

When the cost-saving steps are added to the airflow remediation work, the payback period for the entire project is usually less than a year. A 10% reduction in energy consumption is easy to achieve. When the work is done properly and completely, users have reduced energy consumption by 25% to 36%.
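The under-a-year claim can be checked for a given site with simple payback arithmetic. The sketch below is illustrative only: the project cost and annual energy cost are hypothetical placeholders, and only the reduction percentages come from this tip.

```python
def payback_months(project_cost: float, annual_energy_cost: float,
                   reduction_fraction: float) -> float:
    """Simple payback: project cost divided by the yearly energy savings, in months."""
    if not 0 < reduction_fraction <= 1:
        raise ValueError("reduction_fraction must be between 0 and 1")
    if annual_energy_cost <= 0:
        raise ValueError("annual_energy_cost must be positive")
    annual_savings = annual_energy_cost * reduction_fraction
    return 12 * project_cost / annual_savings


# Hypothetical site: 40,000 project cost, 200,000 annual energy cost.
for reduction in (0.10, 0.25, 0.36):
    months = payback_months(40_000, 200_000, reduction)
    print(f"{reduction:.0%} reduction: payback in about {months:.1f} months")
```

Simple payback ignores financing and maintenance costs, so treat the result as a first estimate.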

ABOUT THE AUTHOR: Dr. Robert F. Sullivan, or "Dr Bob" as he is commonly known in the data center industry, originated the concept now known as hot-aisle/cold-aisle cooling. He is a consultant to the Uptime Institute and one of the most highly regarded experts on data center cooling.
