The concepts of power efficiency and management are easy to talk about, but there’s not as much attention or conversation when it comes to actually delivering accurate and meaningful power data. Without the right data, collected from the right points in your enterprise, all of the efficiency assessments and energy management tools in the world won’t do you very much good. You might wind up overestimating or underestimating your energy efficiency, resulting in wasted money and lost opportunities to optimize your data center.
In this podcast, Steve Bigelow, senior technology editor, sits down with Pete Sclafani, CIO and co-founder of 6connect, an Internet infrastructure management company located in Palo Alto, Calif., to talk about measuring data center power.
Steve Bigelow: What's the best technique to measure power that’s delivered to the data center and equipment?
Pete Sclafani: It depends on the frequency requirements of energy audits. For some companies, doing an audit once a month is sufficient for their needs, but in other cases, you may be using power data for billing purposes. In that case, a once-a-month snapshot of your data center power use is not going to be as accurate as you'd like it to be. You want to have power delivery data over a range of time.
You can do manual measurement of the circuit at the panel. That is easy to do with a calibrated clamp meter. You want a qualified electrician, because the electrical panels have to be open to do the measurement. You want to be very safe and follow all of the safety precautions. Depending on the location of the panel, this may be somewhat invasive. If the panel is near a customer's rack of equipment, you might have to schedule downtime to make sure there isn't a risk of electrocution or of people touching things that shouldn't be touched.
Another method that's not manual is using current transformers (CTs), where you have a CT wired to each circuit. Those go back to a gateway, and that gateway starts collecting data for you and putting it into a database. This, of course, would require calibration of the CTs. Depending on the brand of CTs, the calibration interval could be six months or it could be a year. But, either way, you have to set some regular interval of time and calibrate those devices.
The ideal technique we found is to use some combination of those two methods. Then, you have some way of calibrating data on a regular basis, and you're also able to get historical, or even real-time, information on power consumption using CTs spread throughout the data center. It's tough to get real-time power consumption data if you have to do manual measurements every time you want to do an audit.
Finally, when doing an audit, you want to use similar times of the day and days of the week. In much the same way as you have Internet traffic fluctuating throughout the days of the week, power consumption can fluctuate as well. Depending on the efficiency of your equipment, you may see different power consumption figures for off-peak measurements. One example we have is a client of ours that did a lot of audio and video streaming. They had a consistent flow of power usage throughout the day, but when people went home and started streaming more content to their home environments, they would see a spike in both traffic and data center power usage. So again, the more data points you have, the better.
Make sure your batteries are fresh when you're using these clamp meters. It's always helpful to make sure your equipment is up to snuff and calibrated, but batteries tend to be overlooked. We tend to assume they will be there and they will always work.
Bigelow: Is it OK to rely on data from power distribution units (PDUs) as a basis for efficiency evaluations? Is there a preference as to how or where that PDU data is collected?
Sclafani: One of the issues that we've seen with measuring power in any environment, especially in a data center, is that depending on the termination point, you're going to find different values on the same circuit. So, the power consumption value that you get at the rack level is going to be different from what you get at the breaker box, and that's going to be different from what you get at the PDU on the floor.
One of the things that most electricians take into account is power loss. As you go through the step-down locations where power is changing form–changing voltages or passing through a junction or breaker–that amperage, or that power, is going to change in some way. It's very easy to compare the measurement from the PDU on a data center floor to the value you get at the breaker itself. Then, compare that down to your rack-level PDU, and that number will change. That's helpful for you to know because that ties directly into the efficiency of your equipment and infrastructure.
If you're doing any sort of audit, the secret to gathering these meaningful data points is calibration and consistency. You want to make sure when you are comparing data across different power sources that you do it consistently. For example, if you're trying to get efficiency numbers for rack-level equipment, compare the rack-level PDUs by themselves to start, and then look at the circuits at the breakers. That will tell you both the delta between those two measurements and the delta between the racks. That will hopefully give you some more meaningful data. As long as you're consistent with the measurement and the location of where you're getting those measurements, it's definitely worthwhile for evaluating efficiency. I don't think there's a wrong data set that you can pull from.
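The breaker-to-rack comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the circuit labels and kilowatt readings are made up, and both readings for a circuit are assumed to have been taken at the same time of day at consistent measurement points.

```python
# Hypothetical readings in kW; labels and values are illustrative only.
# Each entry maps a circuit label to (breaker_kw, rack_pdu_kw).
readings = {
    "A1": (4.20, 4.05),
    "A2": (3.80, 3.66),
    "B1": (5.10, 4.30),
}


def distribution_loss(breaker_kw, rack_kw):
    """Return (loss in kW, loss as a fraction of the breaker reading)."""
    if breaker_kw <= 0:
        raise ValueError("breaker reading must be positive")
    loss = breaker_kw - rack_kw
    return loss, loss / breaker_kw


def summarize(readings):
    """Map each circuit label to its fractional loss between breaker and rack."""
    return {
        label: distribution_loss(breaker, rack)[1]
        for label, (breaker, rack) in readings.items()
    }


losses = summarize(readings)
for label, fraction in sorted(losses.items()):
    print(f"{label}: {fraction:.1%} lost between breaker and rack PDU")
```

Comparing the per-circuit fractions against each other gives the delta between racks; a circuit whose loss is far larger than its neighbors', like the invented "B1" here, is one worth a closer look.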
Bigelow: Is it possible to generate alerts or notifications when data center power demands exceed preset limits? Are there other related warnings or alarms that IT admins need to think about?
Sclafani: That's a great question. I think there are two parts to it. One is going to be just collecting the data. Where does that information live? We've seen it stored in everything from spreadsheets to integration into enterprise resource planning billing systems. It really just depends on how you get that data into your repository–whatever that may look like. It may be a data warehouse, or it may just be a text file that you update once a month.
Now that it has been collected, you want to make sure you have the ability to set those systems to create alerts or notifications. Most PDUs you purchase for the data center, whether rack level or floor level, have some sort of built-in functionality that allows you to do Simple Network Management Protocol (SNMP) reporting or interaction. You can hook this up to Nagios, Cacti or another customized software system, and tie it into whatever you are using for your alerts or monitoring system.
As far as preset limits on a circuit, the National Electrical Manufacturers Association standards set a circuit utilization limit of 80%. For example, if you have a circuit at 30 amps, you obviously don't want to go above 24 amps. If you do go above that percentage, there is a risk of damage from the excessive heat of that circuit. You need to be very careful. One of the ways that we recommend dealing with this is setting up alerts to provide some leeway for spikes. For example, a circuit may have servers attached to it, and it may go above 80% for a day. Do you want to get an alert every time that happens? Is there a rule for how long that is an alert state before you are notified? That's one of the challenges that IT and operations need to talk about. They may realize the excessive draw on that circuit is just the way it's going to be, so adding another circuit and balancing the load [may be the answer].
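One way to give alerts some leeway for spikes is to notify only when a circuit stays above its limit for several readings in a row. The sketch below assumes amperage samples arrive at a fixed polling interval; the sample values, the function names and the choice of three consecutive readings are illustrative assumptions, not a recommendation from the interview.

```python
def circuit_limit(breaker_amps, utilization=0.80):
    """Return the alert threshold in amps, e.g. 80% of a 30-amp circuit is 24 amps."""
    return breaker_amps * utilization


def sustained_overloads(samples, limit_amps, min_consecutive):
    """Return the start index of each run of at least `min_consecutive`
    consecutive samples above `limit_amps`. Shorter spikes are ignored."""
    alerts = []
    run_start = None
    run_length = 0
    for index, amps in enumerate(samples):
        if amps > limit_amps:
            if run_length == 0:
                run_start = index
            run_length += 1
            # Record the run once, at the moment it becomes long enough.
            if run_length == min_consecutive:
                alerts.append(run_start)
        else:
            run_length = 0
    return alerts


limit = circuit_limit(30)
# Hypothetical readings in amps, one per polling interval.
samples = [22.0, 25.1, 23.0, 24.5, 24.8, 25.2, 23.9, 26.0]
alerts = sustained_overloads(samples, limit, min_consecutive=3)
print(f"limit={limit:.1f} A, sustained overloads start at samples {alerts}")
```

With these made-up numbers, the single spike at the second reading is ignored, while the three consecutive readings above 24 amps trigger one alert. How long a run should be before someone is notified is exactly the question IT and operations would need to settle together.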
Bigelow: What other mistakes do organizations make when measuring or calculating PUE, or doing any other kinds of efficiency evaluation for that matter?
Sclafani: There are two big mistakes. One is data verification and the other is selective amnesia. Verifying data is key, because when you agree to take a measurement at a certain point, it helps to do some sanity checks with that power consumption data, either upstream or downstream, to be sure it makes sense. Typically, you're going to do a power audit at the breaker. That is a fairly reliable set of information, especially if you do it manually. You can compare that to either CT data or to rack-level data, just to see if the numbers make sense. If there are any outliers, that may be a sign of either weird wiring or a circuit that is mislabeled.
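A sanity check of this kind can be sketched as a comparison of manual breaker readings against CT readings for the same circuits. The circuit labels, amperage values and the 10% tolerance below are all assumptions chosen for illustration; a real tolerance would depend on the accuracy of the meters involved.

```python
def flag_outliers(manual_amps, ct_amps, tolerance=0.10):
    """Return labels of circuits whose CT reading differs from the manual
    reading by more than `tolerance`, expressed as a fraction of the manual value.
    Only circuits present in both data sets are compared."""
    flagged = []
    for label in sorted(manual_amps.keys() & ct_amps.keys()):
        manual = manual_amps[label]
        ct = ct_amps[label]
        if manual == 0:
            # Avoid dividing by zero: any nonzero CT reading is suspicious.
            if ct != 0:
                flagged.append(label)
        elif abs(ct - manual) / manual > tolerance:
            flagged.append(label)
    return flagged


# Hypothetical readings in amps for three circuits.
manual = {"A1": 12.0, "A2": 8.5, "A3": 15.0}
ct = {"A1": 12.3, "A2": 8.4, "A3": 9.1}

suspects = flag_outliers(manual, ct)
print("Circuits to re-check:", suspects)
```

A flagged circuit is not necessarily wrong; it is a prompt to check for a mislabeled circuit, odd wiring or a meter that needs calibration.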
The other mistake is selective amnesia. When you're calculating power usage effectiveness (PUE), you're looking at IT load versus the operational loads needed to cool that infrastructure. There are different interpretations of it. Some people may say that they don't factor in the lights that are used in the data center or don't factor in the power used to cool the office space. The trick is being consistent on what you're measuring, how you're measuring it and what factors go into that PUE.
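PUE is commonly defined as total facility power divided by IT equipment power, so the result depends on which overhead loads are counted. The short Python sketch below shows how leaving one item out shifts the number; the load names and kilowatt figures are invented for illustration.

```python
def pue(it_kw, overhead_kw):
    """PUE = total facility power / IT power, where total = IT + all overhead loads.
    `overhead_kw` maps an overhead category name to its load in kW."""
    if it_kw <= 0:
        raise ValueError("IT load must be positive")
    total_kw = it_kw + sum(overhead_kw.values())
    return total_kw / it_kw


# Hypothetical loads in kW.
it_load_kw = 500.0
overhead = {
    "cooling": 200.0,
    "power_distribution_losses": 40.0,
    "lighting": 10.0,
}

pue_all = pue(it_load_kw, overhead)

# Same facility, but "forgetting" the lighting load.
without_lighting = {name: kw for name, kw in overhead.items() if name != "lighting"}
pue_no_lighting = pue(it_load_kw, without_lighting)

print(f"PUE with lighting:    {pue_all:.2f}")
print(f"PUE without lighting: {pue_no_lighting:.2f}")
```

Neither figure is wrong on its own, but the two cannot be compared with each other, which is why the set of included loads has to stay the same from one audit to the next.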
In one audit we participated in, the data center tenant had given us a list of circuits to identify and audit against. The list was originally provided by the data center operator, and it was what was billed against, so it was considered accurate. We did a two-phase audit, one at the breaker level and again at the rack level. This produced some surprises when we realized there were several circuits in use that were not being billed. There were other circuits being billed that no longer existed. Needless to say, the data center tenant and the operator had not reconciled that level of detail. That's something that a third party can help with by acting as a mediator. Of course, tracking this data historically can be a big help. But taking on another IT project requires getting facilities and IT in the same room and agreeing on requirements and implementation of the budget. Sometimes, that can be a challenge.
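The reconciliation step in that audit amounts to comparing two lists of circuits. A minimal sketch, assuming each circuit has a unique label that both the billing list and the audit use; the labels here are hypothetical.

```python
def reconcile(billed, observed):
    """Compare the set of billed circuit labels with the set found in use
    during the audit, and return the three groups as sorted lists."""
    return {
        "in_use_not_billed": sorted(observed - billed),
        "billed_not_found": sorted(billed - observed),
        "matched": sorted(billed & observed),
    }


# Hypothetical circuit labels.
billed = {"P1-03", "P1-07", "P2-11", "P2-14"}    # from the operator's billing list
observed = {"P1-03", "P1-07", "P2-14", "P3-02"}  # found in use during the audit

report = reconcile(billed, observed)
for group, circuits in report.items():
    print(f"{group}: {circuits}")
```

Keeping the output of each audit makes it possible to track these discrepancies historically instead of rediscovering them.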