
SPECpower benchmark has flaws, says analyst

The SPECpower benchmark for measuring a server's power-performance doesn't accurately reflect real-world configurations, says a Burton Group analyst.

A Burton Group analyst says a recently released benchmark for measuring a server's power efficiency has enough faults to make its relevance to end users questionable.

Nik Simpson says the guidelines for Standard Performance Evaluation Corp.'s SPECpower benchmark don't reflect real-world circumstances, and users need to know that before looking at the power-performance numbers.


In particular, Simpson noted that with rackmount servers, vendors need to test their products with only one power supply, while most enterprise data centers install servers with two power supplies for redundancy. Earlier this year, SPEC announced support for blade servers in its SPECpower benchmark, but Simpson said that there is no standard power supply guideline that carries over from rackmount to blade servers.

"If you understand the benchmark and don't try to compare blade results with rackmount server results, it gives you a reasonable inkling," he said. "But vendors are often optimizing their results, and that ends in configurations that customers wouldn't run."

The reliability of power-efficiency benchmarks
A common complaint about benchmarks is that they don't reflect real-world results. End users often say they therefore don't factor benchmarks into their purchasing decisions.

"We don't benchmark in the way of running some third-party benchmarking application," said Matthew Leeds, the vice president of IT operations at the Emeryville, Calif.-based digital media company Gracenote, regarding his decision to upgrade to six-core AMD processors. "We put our application on the box and run it and see how it behaves, and that's our benchmark. It's interesting to look at third-party benchmarks but they're not relevant. What matters to us is how our set of applications behaves."


Power-performance benchmarks have gotten even less attention from end users -- at least right now. Data center managers have been mostly indifferent to the federal Environmental Protection Agency's Energy Star specification for servers, which was released in May. Either power consumption isn't a priority, it's too early in the game, or there are questions about whether it's truly more efficient to buy a bunch of small Energy Star servers rather than a bigger box that consolidates them all.

The SPEC organization has worked on changes to the power-performance benchmark, including expanding the range of workload types. Currently it measures only the performance of server-side Java workloads relative to power consumption.
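For readers unfamiliar with how a power-performance score is derived, the sketch below shows the general style of aggregation SPECpower uses: throughput is measured at a series of target load levels, average power is measured at each level (including active idle, which draws power but does no work), and the headline number is total throughput divided by total power. The load levels and figures here are made up for illustration, and this is a simplified sketch of the published approach, not the official run rules.

```python
# Hypothetical sketch of a SPECpower-style ops-per-watt aggregation.
# All numbers below are invented for illustration only.

# (target load %, throughput in ssj_ops, average power in watts)
measurements = [
    (100, 300_000, 250.0),
    (50,  150_000, 180.0),
    (0,         0, 120.0),  # active idle: no throughput, but power still counts
]

total_ops = sum(ops for _, ops, _ in measurements)
total_watts = sum(watts for _, _, watts in measurements)

# Overall score: total work done per total watt consumed across all levels.
overall_ops_per_watt = total_ops / total_watts
print(round(overall_ops_per_watt, 1))
```

Note how the active-idle row drags the score down without adding throughput; that is also why configuration choices such as removing a redundant power supply or disconnecting network interfaces, as Simpson describes, can flatter the final number.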

"We're talking about what we can do inside of SPEC to create more benchmarks with a power element included," said Klaus Lange, a Hewlett-Packard Co. senior design engineer and SPECpower committee member.

Greg Darnell, another committee member who is an engineer at Dell, added that comparing the power-performance of blades to rack servers is hard. It is not an "apples to apples" comparison because of different form factors, he said. While rackmount servers have their own power supplies, blade servers within a chassis share power supplies. Unfortunately, that leaves end users on their own when they're trying to determine whether blades or rack servers are more power-efficient.

Another criticism, said Simpson, is that the SPECpower benchmark allows the vendor to disconnect all network interfaces before testing.

"How many servers do you see in the enterprise that aren't attached to a network?" he said.

Darnell said the intent of the benchmark is not to put systems with more networking features at a disadvantage. If two otherwise identical systems differ only in that one has a single onboard port and the other has eight, the committee decided, they should post matching results on this particular workload.

Meanwhile, SPEC president Walter Bays defended benchmarks with an auto-racing analogy. If drivers' mothers told them to drive safely and slow down around the turns, the race might be more reflective of real-world driving, but it wouldn't be a good comparison of auto technology or driver skill, he said.

"Similarly, tuned benchmark results give us the most fair comparison of the ultimate capability of the hardware and software systems under test," he said.

But Simpson said SPECpower could better represent the real world in its work.

"Maybe the configuration tested must be the most popular configuration bought for that model," he said. "It would at least be something to stop vendors from creating artificially power-efficient systems just for chest-beating on SPECpower."

Let us know what you think about the story; email Mark Fontecchio, News Writer. Also, check out our blogs: Data Center Facilities Pro, Mainframe Propeller Head, and Server Farming.
