The hidden costs of scale-out supercomputing

Since the mid-1990s, supercomputer users have moved from scale-up big-box servers to scale-out clusters. But some say beware the hidden costs.

PORTLAND, Ore. -- For the past two decades, scale-out server cluster architecture has been the dominant trend in supercomputing, as users and vendors move away from big iron high-performance computing (HPC) -- but at what cost?

At the Supercomputing 09 conference this week, HPC users and experts weighed in and said that scale-out supercomputing with commodity hardware has been cheaper, but the drawbacks are worth considering.

Price versus reliability
Isaac Traxler, a manager of HPC operations for the supercomputing lab at Louisiana State University, has seen both sides of the coin. The first HPC platform at LSU used IBM's big Unix p575 technology.

But when LSU bought a new system two years ago, it went the Intel cluster route: Dell servers with InfiniBand connections, running a modified version of Red Hat Linux.

The big difference, according to Traxler, is price versus reliability. "The p575s were total redundant systems. Anything that could fail on the server would not cause a disruption in service," Traxler said. "You pay for that redundancy: that magic console that calls IBM before you even know anything is wrong and IBM shows up at your door with parts in hand."

With the Intel cluster, on the other hand, the longest job the supercomputing users could run was three days. Odds were that one of the components would fail in that time frame. On the p575, the job queue limit was two weeks, but Traxler said you could have gone months.
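The link Traxler draws between job length and component failures can be sketched numerically. The Python snippet below is a rough illustration, not LSU's method: it assumes node failures are independent and exponentially distributed, and the node count and per-node mean time between failures (MTBF) are hypothetical placeholders rather than figures from LSU's cluster.

```python
import math


def job_survival_probability(nodes: int, job_hours: float, node_mtbf_hours: float) -> float:
    """Chance that no node fails while the job runs.

    Assumes independent, exponentially distributed node failures,
    which is a simplification of real hardware behavior.
    """
    if nodes < 1 or job_hours < 0 or node_mtbf_hours <= 0:
        raise ValueError("nodes >= 1, job_hours >= 0 and node_mtbf_hours > 0 are required")
    # With N independent nodes, the combined failure rate is N / MTBF.
    return math.exp(-nodes * job_hours / node_mtbf_hours)


def max_job_hours(nodes: int, node_mtbf_hours: float, target_survival: float) -> float:
    """Longest job that still finishes with the target probability."""
    if not 0 < target_survival <= 1:
        raise ValueError("target_survival must be in (0, 1]")
    return -node_mtbf_hours * math.log(target_survival) / nodes


if __name__ == "__main__":
    NODES = 500            # hypothetical cluster size
    NODE_MTBF = 50_000.0   # hypothetical per-node MTBF, in hours

    for days in (3, 14):
        p = job_survival_probability(NODES, days * 24, NODE_MTBF)
        print(f"{days}-day job across {NODES} nodes: {p:.1%} chance of no node failure")

    limit = max_job_hours(NODES, NODE_MTBF, 0.5)
    print(f"Job length with even odds of finishing: {limit / 24:.1f} days")
```

Under these assumptions the survival probability falls exponentially with both node count and job length, which is one way to see why a large commodity cluster might cap jobs at a few days while a single redundant server could run for weeks.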

"If you're buying HPC equipment, you take that choice: top of the line or inexpensive as possible. Both have advantages," Traxler said. "What I would love to do is have Power7s for the price of Dells."

Another problem with scale-out computing, according to Traxler, is the interconnect: he spent half the project money on InfiniBand equipment to connect the servers. With scale-up computing, that cost largely goes away. "When my machine is 256-way, I may only need an interconnect to get from the server to disk," Traxler said.

The days of massive cash outlays for InfiniBand may be drawing to a close. Traxler said there were already 256-node servers at the IBM booth, and he expects that over the next couple of years Intel-based machines will scale up to this level.

"It's easy to buy 64-way Intel machines today. And that's going to affect companies like QLogic and the Mellanox, all these folks making their living off InfiniBand," Traxler said.

The rise of the purpose-built supercomputer?
Michael Bernhardt, the communications chair for the Supercomputing conference, has been involved with HPC for more than 25 years, and he expects there will always be a need for high-end big iron. But commodity cluster-based HPC has pushed supercomputing beyond the scientific academic community.

That shift has been taking place since the mid-1990s, and Bernhardt sees the paradigm evolving again.

"People are moving toward purpose-built projects," Bernhardt said. "There is so much to be gained from the user standpoint when you're given a system that is close to being specifically built for what you need. For example, a scientist involved with advanced drug discovery is asked by the IT group to work on a particular cluster that has 20 open source [projects] running on it. He needs to be a computer scientist, not just a drug researcher, and that's not the best way to do science. Imagine the productivity gains if the system has the applications -- ported and tuned -- plug and play."

HP bets on scale-out, bigger memory on x86
HP launched a batch of HPC-focused ProLiant servers at the event this week: high-density, two-socket machines for scale-out environments, using the latest Intel and AMD processors. While these machines are the bread and butter of HP's supercomputing portfolio, the company still sees demand for scale-up machines.

"When we look at the various high-performance apps, there is a subset that requires larger amounts of memory or an app that doesn't scale in the clustered environment," Ed Turkel, manager of business development for HP's Scalable Infrastructure, said.

"Typically the issue is large memory. It used to be that the only way to get large memory footprints was to get a Superdome. What we're seeing now is that we're able to put a memory footprint in a smaller x86 SMP system that satisfies that requirement without an enterprise-class server."

Turkel said the HP DL785 G6 is an eight-socket machine that uses AMD Opteron processors and 8 GB DIMMs, giving it a memory capacity of half a terabyte.

The new products announced at the show include:

  • HP ProLiant BL2x220c -- two blade servers in the physical space of one, boasting 33% higher memory capacity than the previous generation.
  • HP ProLiant SL165z -- a "skinless" AMD Opteron-based server built on a lightweight rail and tray design that HP claims will reduce capital, facilities and shipping costs while using a fraction of the space normally required within a data center.

What did you think of this feature? Write to Matt Stansberry about your data center concerns.
