UPS management checklist

Business continuity efforts will end in a fizzle if your power fails. Here are ten ways to make sure your data center is always plugged in.

Regulatory compliance has become a legal and necessary extension of business continuity and disaster recovery planning. IT departments need to take steps to ensure their companies do not run afoul of an increasingly complex set of laws and regulations relating to data integrity and availability.

Therefore, managers must have a keen understanding of the regulatory compliance issues that affect business continuity planning and, just as important, of how mission-critical power, cooling, and monitoring strategies support business continuity.

Here are some considerations for determining the adequacy of your power and cooling protection.

1. If you rely on backup generators, is your generator sized adequately to power the cooling system and critical computer systems?

It may seem basic, but ensure that backup generators support your cooling systems as well as your critical computer systems. With the densities of today's computer systems, computer rooms can heat up fast if computers continue to operate on backup power without precision cooling. Also, be sure your automatic transfer switch is configured so that the lag when switching from generator back to utility is short enough not to disrupt UPS power to your computer systems.
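A rough version of this sizing check can be sketched in Python. The load figures, the 25% headroom, and the function name are illustrative assumptions rather than values from the white paper, and a real sizing study would also account for motor starting currents and power factor.

```python
def generator_is_adequate(generator_kw, it_load_kw, cooling_load_kw, headroom=0.25):
    """Return True if the generator can carry IT plus cooling load with headroom.

    headroom is a hypothetical safety margin (25% by default), not a
    figure taken from the article.
    """
    required_kw = (it_load_kw + cooling_load_kw) * (1 + headroom)
    return generator_kw >= required_kw


# Hypothetical site: 120 kW of IT load and 80 kW of precision cooling.
print(generator_is_adequate(300, 120, 80))  # True: 300 kW covers the 250 kW required
print(generator_is_adequate(200, 120, 80))  # False: sized with no headroom
```

The point of the sketch is only that the cooling load appears in the sum; a generator sized against the IT load alone would pass a naive check and still leave the room to overheat.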

2. Is mission-critical network equipment, whether in the data center or at remote locations, being protected by line-interactive UPS systems?

If so, your network may be at risk. Liebert recommends against using line-interactive UPS systems for mission-critical applications. Line-interactive UPS units are only 85% effective against power anomalies in general and have little protection against surges, harmonics and frequency variations. In high-speed networks, some Liebert customers have found that in going to battery, line-interactive UPS systems drop loads, resulting in data losses. Use of double-conversion on-line UPS systems ensures continual power conditioning and maximum protection against even daily power anomalies.

3. Are you relying on UPS batteries to ride through daily power fluctuations?

If you are using line-interactive UPS units, these units go to battery every time a power anomaly occurs, which wears down the batteries more quickly than the batteries of double-conversion UPS systems. Perform a cost-benefit analysis, including battery replacement costs, for switching to double-conversion online UPS systems, which do not go to battery as often.

4. Are your UPS batteries fully charged and do you have a testing and replenishment plan in place?

It sounds elementary, but many IT managers do not know the status of their UPS batteries. Be sure to implement a policy of regular battery inspections, and have a replacement program that makes replacement manageable.

5. Have you performed an analysis of your power protection strategy to ensure that you are not sacrificing reliability as you build out your network?

Building redundancy into the power system is a proven strategy for increasing power system reliability and, consequently, network availability. Redundancy enables maintenance of a UPS module without affecting power to connected equipment and also increases fault tolerance.

Redundancy is typically achieved through either an N + 1 or a 1 + 1 design. In an N + 1 design, also known as a parallel-redundant system, multiple UPS modules are sized so that there are enough modules to power connected equipment (N), plus one additional module for redundancy (+ 1). During normal operation, the load is shared equally across all modules. If a single module fails or must be taken offline for service, the system can continue to power connected systems.
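The N + 1 arithmetic can be sketched as follows. The 250 kVA load and 100 kVA module rating are hypothetical values chosen for illustration, and the function names are not from any vendor tool.

```python
import math


def modules_for_n_plus_1(load_kva, module_kva):
    """Return (n, total_modules) for an N + 1 parallel-redundant design."""
    if load_kva <= 0 or module_kva <= 0:
        raise ValueError("load and module rating must be positive")
    n = math.ceil(load_kva / module_kva)  # modules needed to carry the load
    return n, n + 1                       # plus one redundant module


def share_per_module(load_kva, modules_online):
    """Load carried by each module when the load is shared equally."""
    return load_kva / modules_online


# Hypothetical example: 250 kVA load on 100 kVA modules.
n, total = modules_for_n_plus_1(load_kva=250, module_kva=100)
print(n, total)                          # 3 4
print(share_per_module(250, total))      # 62.5 kVA each in normal operation
print(share_per_module(250, total - 1))  # about 83.3 kVA each with one module offline
```

With one module offline, each remaining module still runs below its 100 kVA rating, which is what lets the system ride through a single failure or a service event.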

In a 1 + 1 design, two UPS modules are sized so that either module is capable of carrying the entire load. While 1 + 1 systems deliver a significant improvement in availability over N + 1 systems and are regularly specified for the most critical applications, N + 1 remains a viable and popular option for applications seeking to balance cost, reliability and scalability. However, a statistical analysis of the projected reliability for multi-module systems reveals the point at which the risks of power system scalability to network availability clearly outweigh the benefits.

Downtime attributed to additional modules remains fairly flat — and acceptable for many applications — up to the 3 + 1 level. At 4 + 1 and beyond, power system availability begins to drop dramatically and downtime increases substantially for each module added.

For instance, if UPS reliability is 0.9995, a 13 + 1 architecture will be down about 90 times as often as a 1 + 1 system. This is particularly problematic because modules are added to an N + 1 system as the load increases. Typically, a load increase correlates with an increase in network criticality. So, a "scalable" N + 1 architecture is actually responding to an increase in network criticality by reducing system availability.
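The "about 90 times" figure can be reproduced under a simple model, sketched below. The model is an assumption, since the white paper's statistical method is not described here: each module is taken to be unavailable independently with probability 1 - 0.9995, and an N + 1 system is counted as down whenever two or more of its modules are unavailable at once.

```python
from math import comb


def prob_system_down(n, module_reliability):
    """Probability that an N + 1 system has two or more modules down.

    Assumes modules fail independently, which is a simplification
    made for this sketch.
    """
    total = n + 1
    p = 1.0 - module_reliability  # per-module unavailability
    q = module_reliability
    # Sum the binomial terms for 2, 3, ..., total modules down.
    return sum(comb(total, k) * p**k * q**(total - k) for k in range(2, total + 1))


r = 0.9995
baseline = prob_system_down(1, r)  # 1 + 1 design
for n in (1, 3, 4, 13):
    ratio = prob_system_down(n, r) / baseline
    print(f"{n} + 1: about {ratio:.1f}x the downtime probability of 1 + 1")
```

Under these assumptions the ratio is dominated by the number of ways to pick two failed modules out of N + 1, so for 13 + 1 it lands close to 91, consistent with the figure quoted above.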

6. Are you monitoring for heat and humidity in critical computer areas, including small rooms and server closets?

Temperature increases of 10 degrees above 75 degrees F reduce the lifespan of network equipment by 50%, and heat is the biggest threat to UPS battery life. Yet, heat density is becoming a bigger issue, as more and denser equipment is packed into small spaces. Without temperature and humidity monitoring, these spaces are at risk of overheating and the downtime that may result. Small-space precision cooling solutions can help you overcome these risks.
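As a rough illustration of what a monitoring script might report, the sketch below turns the rule of thumb above into a lifespan factor. Treating the rule as compounding (a further halving for every additional 10 degrees F) and as continuous between those points is an assumption made here, and the room names and readings are hypothetical.

```python
def lifespan_factor(temp_f, baseline_f=75.0, step_f=10.0):
    """Estimated fraction of rated equipment life at a sustained temperature.

    Applies the rule of thumb that a 10 degree F rise above 75 degrees F
    halves lifespan. Extending it as a compounding, continuous curve is
    an assumption of this sketch.
    """
    if temp_f <= baseline_f:
        return 1.0
    return 0.5 ** ((temp_f - baseline_f) / step_f)


# Hypothetical readings from monitored spaces, in degrees F.
readings = {"server closet A": 72.0, "server closet B": 85.0, "UPS room": 95.0}
for room, temp in readings.items():
    print(f"{room}: {temp:.0f} F, lifespan factor {lifespan_factor(temp):.2f}")
```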

7. Is there sufficient circulation of air to cool the computer systems area and the UPS system room, especially if you are using blade servers or other high-density equipment?

The use of blade servers has increased heat densities in data centers and computer rooms. When heat densities approach 100 watts per square foot, it's time to consider spot cooling systems to resolve hot spots. Also, ensure that you have proper air circulation and configurations to help resolve these increased densities. The rule of thumb for cooling computer equipment is one air change per minute. Do not allow hot exhaust air to recirculate back into the computer systems, or the equipment can ultimately be damaged. Racks should be configured in a hot aisle / cold aisle arrangement to ensure that cold supply air is directed to the cold aisle through perforated tile or an overhead system.
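Both rules of thumb reduce to simple arithmetic, sketched below. The room dimensions and the 33 kW heat load are hypothetical, and treating exactly 100 watts per square foot as the trigger for spot cooling is a simplification of "approach 100".

```python
def heat_density_w_per_sqft(total_watts, floor_area_sqft):
    """Average heat density of the room in watts per square foot."""
    return total_watts / floor_area_sqft


def needs_spot_cooling(density_w_per_sqft, threshold=100.0):
    """True once the density reaches the rule-of-thumb threshold."""
    return density_w_per_sqft >= threshold


def airflow_for_one_change_per_minute(length_ft, width_ft, height_ft):
    """Airflow (cubic feet per minute) that replaces the room's air once a minute."""
    return length_ft * width_ft * height_ft


# Hypothetical room: 20 ft x 15 ft x 10 ft with 33 kW of equipment.
density = heat_density_w_per_sqft(33_000, 20 * 15)
print(density)                                        # 110.0
print(needs_spot_cooling(density))                    # True
print(airflow_for_one_change_per_minute(20, 15, 10))  # 3000
```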

8. Do you have dual UPS units for all network equipment that has dual power supplies?

Network equipment manufacturers specify that their dual power supplies are for dual UPS units to ensure redundancy. Some equipment requires that both power supplies be used for the equipment to operate, and some equipment has three power supplies and requires two for operating.

There are several ways to protect this equipment. You may want to use a different UPS for each power supply and provide separate circuits for each UPS. You may also want to run communications software between dual UPS units supporting a single load, so that if one UPS goes down, the other continues to support the load and does not trigger an unwanted graceful shutdown.
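One way to audit this is to check, device by device, whether enough power supplies stay live when any single UPS fails. The sketch below is a minimal version of that check; the supply and UPS names are hypothetical.

```python
def survives_any_single_ups_failure(supply_to_ups, supplies_required):
    """True if the device keeps enough live supplies when any one UPS fails.

    supply_to_ups maps each power supply name to the UPS feeding it.
    supplies_required is how many supplies the device needs to operate.
    """
    if len(supply_to_ups) < supplies_required:
        return False
    for failed in set(supply_to_ups.values()):
        live = sum(1 for ups in supply_to_ups.values() if ups != failed)
        if live < supplies_required:
            return False
    return True


# Dual-supply switch that runs on one supply, fed from two UPS units.
print(survives_any_single_ups_failure({"PS1": "UPS-A", "PS2": "UPS-B"}, 1))  # True

# Same switch with both supplies on one UPS: no redundancy.
print(survives_any_single_ups_failure({"PS1": "UPS-A", "PS2": "UPS-A"}, 1))  # False

# Three supplies, two required, split across only two UPS units.
print(survives_any_single_ups_failure(
    {"PS1": "UPS-A", "PS2": "UPS-A", "PS3": "UPS-B"}, 2))  # False
```

The last case shows why equipment that needs two of three supplies is easy to get wrong: losing the UPS that feeds two supplies leaves only one live.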

9. Can you keep mission-critical loads online while doing UPS maintenance?

Even in applications where UPS redundancy is not feasible, solutions are available that allow mission-critical systems to continue to operate during UPS maintenance. A maintenance bypass switch provides the capability to switch mission-critical loads to utility power during UPS maintenance.

10. Is your UPS monitoring application set to notify you when the load has exceeded 80% of the UPS system's maximum capacity?

Many IT professionals prefer to set alarms to trigger before loads reach 80% of capacity. Above this threshold, the UPS system may be forced to go to unplanned bypass. To avoid this, reallocate power or add power equipment as necessary instead of resorting to an unplanned transfer to bypass.
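The alarm logic itself is a one-line comparison, sketched below. The function name and the sample loads are hypothetical; in practice the readings would come from whatever monitoring interface your UPS exposes.

```python
def load_alarm(load_kva, capacity_kva, threshold=0.80):
    """Return (load fraction, alarm flag) for a UPS.

    threshold defaults to the 80% figure discussed above; pass a lower
    value to be warned earlier.
    """
    if capacity_kva <= 0:
        raise ValueError("capacity must be positive")
    fraction = load_kva / capacity_kva
    return fraction, fraction >= threshold


# Hypothetical 50 kVA UPS.
print(load_alarm(42, 50))                  # (0.84, True): over the 80% threshold
print(load_alarm(35, 50))                  # (0.7, False)
print(load_alarm(35, 50, threshold=0.70))  # (0.7, True): earlier warning
```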

From the white paper: Regulatory Compliance and Critical System Protection: The Role of Mission-Critical Power and Cooling in Ensuring Data Integrity and Availability, developed by Liebert Corp. with the assistance of Network Frontiers.
