
Warning: Failure to comply with data center maintenance is reckless

Ensuring that your data center runs safely can be hard work, but upholding maintenance best practices is imperative. Adopt and build upon these rules.

Too often, operational and safety practices are ignored or unrecognized in data centers.

Complying with modern data center best practices for design and operations is challenging enough, but facilities must be properly maintained to keep up a reliable level of service. A good program for operational practices and data center maintenance brings out the full value of investments, especially if the data center is certified by organizations such as the Uptime Institute or TIA.

The data center is a potentially dangerous place for people and equipment. Good maintenance, written operating practices, regular training and rule enforcement will help prevent injuries and outages, extend equipment service life and improve reliability. For example, training should cover the locations and proper use of fire extinguishers, and the maintenance program should include verifying that extinguishers are properly charged.

We asked SearchDataCenter readers about the unsafe practices they’ve seen in data centers over the years. You’ll see their responses throughout the article, and you can share yours at the end.

The data center is not a cafeteria

Food and drinks should never be allowed inside. This prohibition should be strictly enforced and violators dismissed.

The data center is not an obstacle course

The Occupational Safety and Health Administration dictates that holes in tiles be covered and openings protected with safety cones or temporary guard rails. No more than four contiguous floor tiles should be removed at any time. These best practices will prevent injury from falling into open holes, minimize the amount of air lost through the openings and keep the floor structure stable.
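The four-contiguous-tiles rule is easy to check mechanically if you keep a simple record of which tile positions are lifted during a job. The sketch below is a hypothetical helper, not part of any OSHA requirement or real tool: it assumes lifted tiles are tracked as (row, column) grid positions and treats "contiguous" as sharing an edge. It finds the largest connected group of open tiles with a breadth-first search.

```python
from collections import deque

# ASSUMPTIONS (hypothetical, for illustration): lifted tiles are recorded as
# (row, col) grid positions, and "contiguous" means sharing an edge.
MAX_CONTIGUOUS_OPEN = 4  # limit taken from the guidance above


def largest_open_cluster(open_tiles: set[tuple[int, int]]) -> int:
    """Return the size of the largest edge-connected group of lifted tiles."""
    seen: set[tuple[int, int]] = set()
    largest = 0
    for start in open_tiles:
        if start in seen:
            continue
        # Breadth-first search over edge-adjacent lifted tiles.
        size = 0
        queue = deque([start])
        seen.add(start)
        while queue:
            r, c = queue.popleft()
            size += 1
            for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if nb in open_tiles and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        largest = max(largest, size)
    return largest


def opening_is_within_limit(open_tiles: set[tuple[int, int]]) -> bool:
    """True if no group of contiguous lifted tiles exceeds the limit."""
    return largest_open_cluster(open_tiles) <= MAX_CONTIGUOUS_OPEN


if __name__ == "__main__":
    # Five tiles lifted in an L-shaped run, plus one isolated tile.
    lifted = {(0, 0), (0, 1), (0, 2), (1, 2), (1, 3), (5, 5)}
    print(largest_open_cluster(lifted))     # 5
    print(opening_is_within_limit(lifted))  # False
```

An isolated lifted tile elsewhere on the floor does not count toward the limit; only tiles that form one continuous opening do.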

Tiles should always be lifted with a reliable tile puller and set aside where they won't create a tripping hazard. Using the right tools for any job and properly handling materials are always good safety practices; they also get the jobs done efficiently.

The data center is not a closet

Storing equipment inside the compute area, particularly in boxes, brings in particulate contamination. Opening boxes or uncrating equipment creates serious contamination that can clog filters and heat sinks, raising the operating temperatures of computing hardware and contributing to early failure. Paper and cardboard are also fuels for a fire.

"Biggest pet peeve in the data center? Untidy cables inside a rack. Wait until one gets pinched in a door and causes problems that are difficult to track down."

     --Stuart Woodward, IT professional since 1988

Cables strung across the floor during installation are another tripping hazard. They can also shed dirt or create static buildup as they're pulled across floors. Have someone hold a damp cloth around the cable bundle to remove some surface dirt as it leaves the box. The aisle(s) in which cables are strung should be blocked off. Never leave cables on the floor any longer than absolutely necessary, certainly not over breaks or overnight.

Routinely damp-mop the data center floor. Foot wipe mats at entrances should be changed regularly. Good maintenance practice should include an annual treatment by a professional data center cleaning organization. If you have a raised access floor, this is a good time to have it re-leveled; uneven floors leak expensive cool air and create a tripping hazard.

When moving or installing racks or cabinets, managers should implement safety checks to ensure they are properly secured and stabilized. This is especially critical for any data center in a seismic zone. Put a couple of layers of 0.25"-thick Masonite on the raised floor to prevent tile damage, particularly if the cabinets are loaded with equipment. A cabinet should never be loaded beyond its stated capacity. Remember that cabinets with mounted equipment can be top-heavy and fall when being moved, especially going up or down ramps. Perforated airflow tiles are particularly vulnerable to rolling loads, and most don't even have rolling load ratings.

A data center is neither a gym nor a sauna

Consider using a server lifting device. It reduces the risk of dropping an expensive server or network switch and can prevent employee injuries. Mechanical lifters also make equipment installation faster and more efficient.

"A woman with long hair once got it caught in a server fan that was left open, causing the server to shut down and putting her in physical pain. Simple things like replacing covers on equipment after service can avoid surprise problems."

     --Robert McFarlane

Good cooling is essential to keeping equipment reliable, and a maintenance contract with a qualified service company is just the start. During data center maintenance checks, managers should do unscheduled walk-throughs to confirm that blanking plates are installed in unused rack and cabinet spaces. Your maintenance firm should check the filters in the air conditioners as well as the filters and heat sinks in computing equipment. Make cleaning or replacing these filters routine. Temperature and humidity sensor readings should be verified at least annually. In facilities using cold aisle containment, be sure to calibrate the differential pressure sensors. And, of course, all air conditioner monitoring systems should be tested regularly to ensure that alarms work.
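Verifying temperature and humidity readings amounts to comparing what each monitoring sensor reports against a calibrated reference instrument taken along on the walk-through. The following is a minimal sketch under stated assumptions: `SensorCheck` and `findings` are hypothetical names, the drift tolerances are placeholders you would replace with your sensor vendor's or commissioning agent's figures, and the 18 to 27 °C band is the ASHRAE recommended inlet range.

```python
from dataclasses import dataclass

# ASSUMED thresholds, for illustration only. 18-27 °C is the ASHRAE
# recommended inlet range; the drift tolerances are hypothetical placeholders.
TEMP_RANGE_C = (18.0, 27.0)
TEMP_DRIFT_TOLERANCE_C = 0.5
RH_DRIFT_TOLERANCE_PCT = 3.0


@dataclass
class SensorCheck:
    sensor_id: str
    sensor_temp_c: float      # value reported by the monitoring system
    reference_temp_c: float   # value from a calibrated handheld instrument
    sensor_rh_pct: float      # relative humidity reported by the sensor
    reference_rh_pct: float   # relative humidity from the reference instrument


def findings(check: SensorCheck) -> list[str]:
    """Return the problems found for one sensor during a walk-through."""
    problems = []
    if abs(check.sensor_temp_c - check.reference_temp_c) > TEMP_DRIFT_TOLERANCE_C:
        problems.append(f"{check.sensor_id}: temperature drift, recalibrate")
    if abs(check.sensor_rh_pct - check.reference_rh_pct) > RH_DRIFT_TOLERANCE_PCT:
        problems.append(f"{check.sensor_id}: humidity drift, recalibrate")
    low, high = TEMP_RANGE_C
    # Judge the room against the trusted reference, not the drifting sensor.
    if not low <= check.reference_temp_c <= high:
        problems.append(f"{check.sensor_id}: inlet temperature outside {low}-{high} °C")
    return problems


if __name__ == "__main__":
    walk_through = [
        SensorCheck("CA-01", 22.1, 22.3, 45.0, 44.0),
        SensorCheck("CA-07", 24.0, 28.2, 40.0, 47.5),
    ]
    for check in walk_through:
        for problem in findings(check):
            print(problem)
```

In this hypothetical walk-through, sensor CA-07 is under-reporting temperature by several degrees, which is exactly the kind of error that lets a hot aisle go unnoticed while the dashboard looks normal.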

In cold climates, it's important to check on sump heaters in cooling towers and heat trace on piping. Video monitoring of cooling towers can alert facility staff to freezing before serious problems occur.

The data center is not a party center

Don't forget to protect staff from noise-induced hearing damage. Cooling equipment and server fans can create a lot of noise. Advise or even require staff to use hearing protection, make it readily available to everyone, and provide instruction on where and how to use it.

Electrically, the most critical best practice for data center maintenance is load balance. Unbalanced loads are not only energy inefficient; they can also lead to unnecessary replacement of an uninterruptible power supply (UPS) on the mistaken notion that it is running near its maximum capacity. Large UPS systems deliver three-phase power, and many racks and cabinets today are circuited with either two or all three of those phases. Because each phase can carry only its share of the UPS rating, one heavily loaded phase can reach its limit while the UPS as a whole still has unused capacity. Power draws should be checked regularly at each point in the power chain: racks and cabinets, power distribution units and finally the UPS. Maintaining load balance will get maximum power from your UPS at the highest efficiency.
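A quick calculation shows why balance matters. The sketch below assumes per-phase current readings in amps from one point in the power chain and defines imbalance as the largest deviation from the average phase current, as a percentage of that average. That convention is borrowed from the NEMA voltage-unbalance definition and is an assumption here, not a requirement stated in this article. The per-phase limit in the example is hypothetical.

```python
# Minimal sketch: quantify phase imbalance from per-phase current readings.
# ASSUMPTIONS: readings are amps at one point in the power chain (UPS output
# or PDU); imbalance = max deviation from mean / mean, in percent.


def imbalance_pct(phase_amps: list[float]) -> float:
    """Largest deviation from the mean phase current, as a percent of the mean."""
    avg = sum(phase_amps) / len(phase_amps)
    if avg == 0:
        return 0.0
    worst = max(abs(a - avg) for a in phase_amps)
    return 100 * worst / avg


def worst_phase_headroom(phase_amps: list[float], per_phase_limit_amps: float) -> float:
    """Amps left before the most heavily loaded phase reaches its limit."""
    return per_phase_limit_amps - max(phase_amps)


if __name__ == "__main__":
    # Hypothetical readings: the same 300 A total, balanced vs. unbalanced.
    balanced = [100.0, 100.0, 100.0]
    unbalanced = [120.0, 80.0, 100.0]
    limit = 150.0  # hypothetical per-phase rating
    for label, amps in (("balanced", balanced), ("unbalanced", unbalanced)):
        print(label, imbalance_pct(amps), worst_phase_headroom(amps, limit))
```

Both cases draw the same total current, but the unbalanced one shows 20% imbalance and leaves only 30 A of headroom on its worst phase instead of 50 A, so the UPS appears closer to full than its total load suggests.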

"A lot of companies hate to find out that they have to run tubing to keep the data center cool, but they don't realize how much they're cutting down the lifespan of a $20,000 switch, along with everything else in the room. I've been to a school where its data center was in the closet of a math classroom. Not only was it not air conditioned in there, the rack was hardly set up correctly, leaving random pieces of hardware sitting on spare chairs."

     --TJ Hatem, IT consultant

The best way to monitor and balance power is with metered power strips in each cabinet, preferably with remote readout capability, along with a data center infrastructure management (DCIM) tool to track usage. Clearly identifying every electrical panel and circuit will ward off mistaken shutdowns. Use large-character, color-coded labels.
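Once metered strips report per-circuit readings, rolling them up is simple bookkeeping. This is a hypothetical sketch, not the data model of any particular DCIM product: it assumes readings are exported as (cabinet, PDU, phase, breaker rating, measured amps) tuples and sums them per PDU output phase. It also flags circuits drawing more than 80% of their breaker rating, a common planning rule of thumb that you should confirm with your electrical engineer and local code. Summing further up to the UPS would need a conversion if the PDU contains a transformer, which this sketch does not attempt.

```python
from collections import defaultdict

# ASSUMED export format (hypothetical): one tuple per strip circuit,
# (cabinet, pdu, phase, breaker_rating_amps, measured_amps).
READINGS = [
    ("A01", "PDU-1", "A", 30, 14.0),
    ("A01", "PDU-1", "B", 30, 9.5),
    ("A02", "PDU-1", "A", 30, 25.5),
    ("A02", "PDU-1", "C", 30, 6.0),
    ("B01", "PDU-2", "B", 30, 12.0),
    ("B01", "PDU-2", "C", 30, 11.0),
]

CIRCUIT_ALERT_FRACTION = 0.8  # assumed rule-of-thumb alert threshold


def pdu_phase_totals(readings):
    """Sum measured amps per (PDU, phase) at the PDU output."""
    totals = defaultdict(float)
    for _cabinet, pdu, phase, _rating, amps in readings:
        totals[(pdu, phase)] += amps
    return dict(totals)


def overloaded_circuits(readings, fraction=CIRCUIT_ALERT_FRACTION):
    """List circuits drawing more than `fraction` of their breaker rating."""
    return [
        (cabinet, pdu, phase, amps)
        for cabinet, pdu, phase, rating, amps in readings
        if amps > fraction * rating
    ]


if __name__ == "__main__":
    for (pdu, phase), amps in sorted(pdu_phase_totals(READINGS).items()):
        print(f"{pdu} phase {phase}: {amps:.1f} A")
    for circuit in overloaded_circuits(READINGS):
        print("near capacity:", circuit)
```

With these sample numbers, PDU-1 phase A carries far more than phases B and C, which points to moving a cabinet circuit rather than buying more capacity.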

Weak batteries are the most common cause of UPS failures, so consider investing in a good battery monitoring system. Battery failure usually happens at the worst time: when power goes out and load is suddenly put onto the batteries. The problem is most common with valve-regulated lead acid (VRLA) batteries, which are preferred because they don't require the special rooms and precautions of flooded lead acid cells. But VRLA batteries with 10-year warranties may last only three years, or even less if power fluctuations cause frequent discharges. A good monitor can alert you to failing cells before it's too late. It can also extend the life of the full battery string by identifying cells for replacement before they degrade the rest of the string.
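Battery monitors typically track per-cell (or per-jar) float voltage and internal resistance and flag outliers. The sketch below illustrates that idea under stated assumptions: `cells_to_replace` is a hypothetical function, the 0.10 V deviation and 25% resistance-rise thresholds are placeholders rather than vendor or standards values, and each cell's baseline resistance is assumed to have been recorded when the string was installed.

```python
from statistics import mean

# ASSUMED thresholds (hypothetical placeholders; use your battery or monitor
# vendor's recommendations in practice).
VOLT_TOLERANCE = 0.10   # volts of deviation from the string average
RESISTANCE_RISE = 0.25  # fractional rise over the cell's installation baseline


def cells_to_replace(float_volts, resistance_mohm, baseline_mohm,
                     volt_tolerance=VOLT_TOLERANCE,
                     resistance_rise=RESISTANCE_RISE):
    """Return (cell_number, reasons) for each cell that looks weak."""
    avg_v = mean(float_volts)
    flagged = []
    for i, (v, r, base) in enumerate(
            zip(float_volts, resistance_mohm, baseline_mohm), start=1):
        reasons = []
        if abs(v - avg_v) > volt_tolerance:
            reasons.append(f"float voltage {v:.2f} V vs string average {avg_v:.2f} V")
        if r > base * (1 + resistance_rise):
            reasons.append(f"internal resistance {r} mOhm vs baseline {base} mOhm")
        if reasons:
            flagged.append((i, reasons))
    return flagged


if __name__ == "__main__":
    # Hypothetical readings for a short string of 12 V blocks.
    volts = [13.62, 13.60, 13.41, 13.63]
    resistance = [4.1, 4.0, 5.6, 4.2]
    baseline = [4.0, 4.0, 4.0, 4.0]
    for cell, reasons in cells_to_replace(volts, resistance, baseline):
        print(f"cell {cell}: " + "; ".join(reasons))
```

Here only cell 3 is flagged, on both voltage and resistance, which is the point of cell-level monitoring: replace the one weak block before it drags down the rest of the string.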

If you use flooded lead acid batteries, check all safety equipment regularly, including hydrogen detectors, eye wash, deluge shower and alarms. Activation of any of the safety systems should automatically alert a security station.

The data center is not a craft project

Any electrical work must be performed by licensed electricians, and it is imperative that anyone working in a data center, electricians included, understand the sensitivity of computer operations and the risks of working in a live operating environment.

The electrical system in a data center should be infrared (IR) scanned yearly to identify loose connections that could cause overheating and failure. Specify new electrical equipment with IR scanning windows so it can be checked without opening the panels. Also identify any electrical systems that warrant arc flash hazard warning placards. Opening high-current enclosures is dangerous without specialized protection, so use someone familiar with this work in data centers who knows the precautions. The small risk of the inspection is far outweighed by the cost of a preventable fire or unplanned shutdown caused by bad electrical connections.

If you have the dreaded Emergency Power Off (EPO) switch, ensure it is well labeled and equipped with a lift-cover protector, preferably with an audible alarm. Those alarms are usually battery operated, so their batteries must be replaced periodically too. Do it carefully, lest you activate the EPO!

Good generator maintenance is critical. The two most common causes of generator failure are dead start batteries and fuel contamination. In cold climates, also check block heater operation.

Anything involving water should also be checked regularly, and that means more than sprinkler pipes. Floor drains can dry out and become clogged, and system drains can back up during a storm and flood the data center. Examine air conditioner condensate drains, and check the liquid detectors. Look for leaks in the roof and any other water sources above your data center. And if you have a pre-action or gas-based fire suppression system, it should be on a regular maintenance schedule with a fully qualified vendor, one that won't set it off while checking it.

This is a sampling of what should be included in a thorough data center facilities operation and maintenance plan -- check it against your current plan and update as needed. If you don't have a data-center-specific maintenance program for your operation, use this list to develop one, and get everyone on board.

About the author:
Robert McFarlane is a principal in charge of data center design at Shen Milsom and Wilke LLC, with more than 35 years of experience. An expert in data center power and cooling, he helped pioneer building cable design and is a corresponding member of ASHRAE TC9.9. McFarlane also teaches at Marist College's Institute for Data Center Professionals. 


What safety or cleanliness violations do you see in the data center? What's your biggest pet peeve regarding data center maintenance?