Probably the hottest buzzword in the data center industry today is data center infrastructure management. Every product seems to offer some sort of DCIM capability, and there are several stand-alone DCIM systems that claim to cover everything. But what is DCIM really? What should it be able to do?
The emergence of data center infrastructure management (DCIM) monitoring tools gives rise to a number of questions. What makes DCIM valuable to your enterprise? Will it help improve your bottom line? Will it do everything you need initially and will it grow with you into the future? How much staff time will be needed to implement it and keep it updated? These are the questions to ask both your vendors and yourself before jumping into any DCIM system.
The early days of DCIM were about managing floor space and keeping track of assets. The power usage effectiveness (PUE) metric has spurred it to become an all-encompassing tool for monitoring the entire data center infrastructure. If you're really concerned about energy efficiency, improving PUE and energy cost savings, you need a whole range of power and cooling information as well as asset management data, all of which can be stored in a DCIM system. The phrase, "You can't manage what you can't measure," has never been more true than with the challenge of reducing energy use and cost.
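As a point of reference, PUE is the ratio of total facility energy to the energy delivered to IT equipment, so a value of 1.0 would mean every kilowatt-hour goes to computing. The sketch below only illustrates that arithmetic; the function name and the sample meter readings are assumptions made for the example, not figures from any real facility.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Return power usage effectiveness: total facility energy / IT energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    if total_facility_kwh < it_equipment_kwh:
        raise ValueError("total facility energy cannot be less than IT energy")
    return total_facility_kwh / it_equipment_kwh


# Hypothetical monthly readings (illustrative values only).
total_kwh = 180_000.0  # utility meter: IT load plus cooling, UPS losses, lighting
it_kwh = 100_000.0     # energy delivered to the IT equipment

ratio = pue(total_kwh, it_kwh)
overhead_kwh = total_kwh - it_kwh  # energy spent on everything except computing
print(f"PUE = {ratio:.2f}, infrastructure overhead = {overhead_kwh:.0f} kWh")
```

Both inputs have to be measured over the same period and at consistent points in the power chain, which is why the power and cooling data mentioned above matters as much as the formula itself.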
For many data centers, having data on room temperature, rack power draws and alarms for uninterruptible power supply (UPS) and air conditioner failures is sufficient. DCIM solutions are available in a variety of forms from several hardware and software vendors – from those who make intelligent plug strips, CRACs and humidity sensors to asset tracking and cabinet security access – and many will integrate those fundamental parameters into a cost-effective package that may do even more. But for large facilities, especially those that want to track PUE and maximize the efficiency of both energy and computing usage, more information will be necessary.
You could implement discrete DCIM solutions from each vendor you use – separate units for UPS and air conditioners, rack power, the central cooling plant, the generators and asset management. But creating a major DCIM system in this manner would result in a confusing array of disparate displays, reports and data listings that would likely overlap and become unwieldy. More likely, most of the systems would fall into disuse, resulting in a lot of money spent on technology with no management benefit.
Everything in the data center is now inter-related and co-dependent. Servers draw more power as processor utilization increases, then ramp down with reduced computing load. That affects cooling requirements, which in newer facilities should be delivered from air conditioners with variable speed controls. Those, in turn, should cause changes in pump speeds, chiller capacities and cooling tower operation. In a well-designed infrastructure all this should be automatically balanced, but it still needs to be monitored to make sure it's working right.
Information from your DCIM system should also help determine how best to deploy and utilize computing hardware within your data center so everything will run as efficiently as possible. Let's say your cooling can handle a concentration of blade servers, but only if the computer room air conditioners (CRACs) run at maximum speed. A different deployment might use less energy, but you won't know that unless you can see the power draw and operating points of everything in the chain. And you won't really know what would improve operation unless you can model the alternatives and see what they would mean.
In short, the significantly expanded role of DCIM also brings with it a considerable increase in complexity which requires a well-integrated solution.
Lofty requirements for integrated DCIM systems
There are two major things to consider when examining modern, robust DCIM approaches: universality and data handling.
A truly universal DCIM product must meet two huge requirements. First, the system must be able to connect to air conditioners, UPS systems, power strips, PDUs, servers, chillers, pumps, sensors for temperature, humidity and pressure, flow meters, cooling towers, generators, battery monitors, lighting controls, fire protection and security systems, computing hardware, and anything else involved with the operation of the data center. Second, it must do all this while being vendor agnostic. It must interface seamlessly with every manufacturer's hardware and pass all available data to the DCIM system with complete transparency. That can be difficult considering all the different equipment in the complex infrastructure of a data center and the variety of data and alarm protocols used. And, of course, a full DCIM solution should also include the fundamental capability of tracking assets.
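One common way to approach vendor agnosticism in software is an adapter layer: each device type gets a small translator that converts its native payload into one shared record format. The sketch below is a minimal illustration of that idea, not how any particular DCIM product works. The class names, the payload field names (`SupplyTempF_x10`, `watts`) and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Reading:
    """One normalized data point, independent of the vendor that produced it."""
    device_id: str
    metric: str
    value: float
    unit: str


class DeviceAdapter(Protocol):
    """Anything that can translate a device's native data into Readings."""
    def poll(self) -> list[Reading]: ...


class HypotheticalCracAdapter:
    """Made-up vendor format: supply air temperature in tenths of a degree F."""
    def __init__(self, device_id: str, raw: dict):
        self._id, self._raw = device_id, raw

    def poll(self) -> list[Reading]:
        temp_c = (self._raw["SupplyTempF_x10"] / 10 - 32) * 5 / 9
        return [Reading(self._id, "supply_air_temp", round(temp_c, 1), "C")]


class HypotheticalPduAdapter:
    """Made-up vendor format: power reported in watts."""
    def __init__(self, device_id: str, raw: dict):
        self._id, self._raw = device_id, raw

    def poll(self) -> list[Reading]:
        return [Reading(self._id, "power", self._raw["watts"] / 1000, "kW")]


def collect(adapters: list[DeviceAdapter]) -> list[Reading]:
    """Poll every adapter and return one flat list of normalized readings."""
    readings: list[Reading] = []
    for adapter in adapters:
        readings.extend(adapter.poll())
    return readings


readings = collect([
    HypotheticalCracAdapter("crac-01", {"SupplyTempF_x10": 680}),
    HypotheticalPduAdapter("pdu-07", {"watts": 4200}),
])
for r in readings:
    print(f"{r.device_id}: {r.metric} = {r.value} {r.unit}")
```

The rest of the system sees only `Reading` records, so supporting a new manufacturer means writing one more adapter rather than changing the displays and reports.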
The second major consideration is data handling. The broad expansion of DCIM brings with it a data explosion. If you're really measuring and tracking all aspects of a data center, there's just too much for anyone to absorb. Most data center equipment is now network-attached and IP-addressable. Air conditioners and UPS systems can deliver as many as 256 data points. Newer computing hardware can spit out mountains of measurements on internal temperatures, air flow, fan speed and processor utilization. That volume of data is more than you should ever need – or care about – unless you're a manufacturer gathering a history on wear, performance and energy efficiency over time. For the average user much of it will be meaningless. However, your DCIM system will need to capture it all in order to avoid missing the parameters that do matter to you.
How do you deal with such large volumes of data gathered every hour of every day? Turning all that data into information is what really distinguishes an OK DCIM solution from a good one. To be useful as a management tool, the system needs to integrate all of the physical, electrical, mechanical and operational aspects and do the following:
- Alert you to potential problems before they occur by graphically highlighting indicative anomalies and changes in operating parameters.
- Let you quickly and easily "drill down" to get more detail about any condition, presented in clear graphic format all the way down to the raw data.
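To make the first point concrete, the sketch below shows one simple way to flag a reading that drifts away from its recent baseline before it reaches a hard alarm limit. It is an illustration only: the rolling window, the three-standard-deviation threshold, the `min_spread` floor and the sample inlet temperatures are all assumptions chosen for the example, and a real system would tune them per sensor.

```python
from collections import deque
from statistics import mean, stdev


class DriftDetector:
    """Flags values that deviate sharply from a rolling baseline."""

    def __init__(self, window: int = 12, threshold: float = 3.0,
                 min_spread: float = 0.1):
        self._history = deque(maxlen=window)  # most recent readings only
        self._threshold = threshold           # allowed deviation, in std devs
        self._min_spread = min_spread         # floor so a flat history is not hypersensitive

    def check(self, value: float) -> bool:
        """Return True if value is anomalous relative to recent history."""
        anomalous = False
        if len(self._history) >= 3:  # need a few points before judging
            baseline = mean(self._history)
            spread = max(stdev(self._history), self._min_spread)
            anomalous = abs(value - baseline) > self._threshold * spread
        # Every value joins the history, so a sustained shift eventually
        # becomes the new baseline (a deliberate simplification).
        self._history.append(value)
        return anomalous


# Hypothetical rack inlet temperatures in degrees C, one per polling interval.
inlet_temps = [22.0, 22.1, 21.9, 22.0, 22.2, 24.5]
detector = DriftDetector()
flags = [detector.check(t) for t in inlet_temps]
print(flags)  # [False, False, False, False, False, True]
```

The last reading is well inside a typical high-temperature alarm limit, yet it stands out against the preceding values, which is the kind of early warning a fixed alarm threshold would miss.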
That's a challenging programming task to be sure. With the variety of equipment types and manufacturers that make up a modern data center, just showing pretty 3D pictures and bringing up alarms after problems have already occurred is not enough. What's important is whether the information necessary to manage operations is all there in the first place. Only then can consideration be given to whether displaying it in 3D adds understanding and makes it quicker for users to grasp the meaning and take action.
Three more perks of robust DCIM systems
Once you determine that a DCIM system covers the basics, there are three other aspects that make up a robust DCIM system. One is whether you can run "what-if" scenarios to see the effects of adding equipment or see what happens if something in the infrastructure fails. This can be a great help in planning where to locate new hardware. Some systems even have computational fluid dynamics (CFD) integration. A properly constructed CFD model interpreted by someone with solid knowledge of air flow and the CFD system can be a valuable addition. However, CFD is a prime example of "garbage in, garbage out" (GIGO). A CFD model can be constructed to show just about anything as being good or bad, so when it's part of a DCIM implementation, it must be validated against actual field conditions on a regular basis.
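At its simplest, a "what-if" scenario is capacity arithmetic: add the proposed load, remove the failed equipment and see whether anything goes negative. The sketch below illustrates that for room-level cooling only. It assumes all IT power becomes heat the cooling units must remove and ignores air flow distribution entirely, which is exactly the gap a validated CFD model is meant to fill. The class, function and capacity figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Room:
    cooling_units_kw: list[float]  # cooling capacity of each unit
    it_load_kw: float              # current IT load, treated as heat to remove

    def cooling_headroom_kw(self, failed_units: int = 0) -> float:
        """Spare cooling capacity, assuming the largest units fail first."""
        keep = max(0, len(self.cooling_units_kw) - failed_units)
        surviving = sorted(self.cooling_units_kw)[:keep]
        return sum(surviving) - self.it_load_kw


def what_if_add(room: Room, added_kw: float, failed_units: int = 0) -> bool:
    """Return True if the room can still cool the load after the change."""
    trial = Room(room.cooling_units_kw, room.it_load_kw + added_kw)
    return trial.cooling_headroom_kw(failed_units) >= 0


# Hypothetical room: three 100 kW cooling units carrying a 170 kW IT load.
room = Room(cooling_units_kw=[100.0, 100.0, 100.0], it_load_kw=170.0)

for added in (25.0, 40.0):
    normal = what_if_add(room, added)
    one_down = what_if_add(room, added, failed_units=1)
    print(f"add {added:.0f} kW: all units up = {normal}, one unit failed = {one_down}")
```

In this example the 40 kW addition looks fine while everything is running but leaves no margin if a single unit fails, which is the sort of result that only shows up when the failure case is modeled alongside the deployment.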
The system must also be able to grow and adapt to future needs. There is always new hardware and a continuous flow of new methods and techniques for modernizing data center infrastructure, so a robust DCIM solution should be able to accommodate your future needs. Ideally, this can be done with a modular solution that lets you buy only the pieces you need at the outset and grow in a coordinated manner as your requirements change.
Finally, there's the effort required to create the initial information, database and graphics, and the support needed to maintain them going forward. Many DCIM systems fall into disuse because personnel were not available to keep them up to date.
It is becoming impractical to run even a small data center without some form of DCIM. Simply relying on alarms and sporadic readings from equipment display panels is insufficient. Ask yourself how much you need to monitor and what resources are going to be necessary to implement and maintain the level of control you want. If basic monitoring is necessary, it might still be good to look at something that can do more than you initially require; your needs will probably increase over time. But if you know you need a full, robust DCIM tool for use in managing a major operation, what may be most important – after ensuring all necessary interfaces are available – is the way data is translated into information and displayed.
About the author:
Robert McFarlane is a principal in charge of data center design for the international consulting firm Shen Milsom and Wilke LLC. McFarlane has spent more than 35 years in communications consulting, has experience in every segment of the data center industry and was a pioneer in developing the field of building cable design. McFarlane also teaches the data center facilities course in the Marist College Institute for Data Center Professional program, is a data center power and cooling expert, is widely published, speaks at many industry seminars and is a corresponding member of ASHRAE TC9.9 which publishes a wide range of industry guidelines.