If your organization has IT money to burn and doesn't mind being locked into a proprietary system, then buy a supercomputer for your high-performance computing (HPC) applications. If you're not living in that kind of IT ivory tower, you should join the crowd that's checking out the new capabilities of HPC Linux clusters, says Len Rosenthal, co-founder and marketing vice president for PathScale Inc., a provider of hardware and software clustering systems based in Sunnyvale, Calif. In this interview, he points out the strengths and weaknesses of Linux clusters for HPC and compares the top two processors.
Why are some HPC users moving to Linux, and why aren't others?
Rosenthal: Instead of large SGI Origin, 64-way Sun SPARC or big IBM systems, they can deploy Linux clustering solutions at a fraction of the cost.
An HPC Linux cluster gives better performance than the old alternatives, like Cray supercomputers, at a fifth of the cost. Sure, Cray machines had and have good performance and a good architecture, but they are prohibitively expensive compared to commodity clusters, and they're proprietary.
However, only 30 percent of the HPC market is running Linux clusters today. The other 70 percent wants to know: How do I get applications off of my huge system and onto clusters? That's why giving them technologies that enable that shift is critical to broader adoption of HPC on Linux clusters.
Could you cite an example of an application that has been moved successfully from supercomputers to Linux clusters?
Rosenthal: There are a bunch of new applications or, let's say, new problems that people are trying to solve with more cost-effective architectures. Weather analysis, weather modeling and climate analysis have traditionally been done on SMP systems, but some organizations have migrated to Linux clusters because improved cluster architectures enable better resolutions. Instead of weather grids of 15 miles by 15 miles, they can get a three-mile by three-mile grid. So, with a Linux cluster and a better interconnect, their grid cells are dramatically smaller. That's a more cost-effective approach.
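To put the resolution jump in perspective, the arithmetic can be sketched in a few lines of Python. The 15-mile and three-mile figures come from the interview; the function name is hypothetical, and the sketch counts only horizontal grid cells, ignoring vertical levels and any change in time step, so it likely understates the real increase in work.

```python
def cell_count_ratio(coarse_miles: float, fine_miles: float) -> float:
    """Return how many fine square cells cover the area of one coarse square cell.

    Assumption: a flat, square horizontal grid; vertical levels and
    time-step changes are deliberately left out of this sketch.
    """
    return (coarse_miles / fine_miles) ** 2


if __name__ == "__main__":
    ratio = cell_count_ratio(15.0, 3.0)
    print(f"Cells needed relative to the coarse grid: {ratio:.0f}x")
```

Under these assumptions, the same region needs 25 times as many cells, which is the kind of growth that makes a scalable, lower-cost cluster attractive.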
How does an HPC-on-Linux cluster compare in cost to a typical low-end cluster, like the kind used for storage or Oracle Real Application Clusters (RAC)?
Rosenthal: The hardware cost of a Linux cluster is going to be the same. The only difference is the interconnect. On Oracle RAC today, for example, gigabit Ethernet is used as the interconnect for Linux hardware. Gigabit Ethernet is dramatically less expensive, probably free on most motherboards, although you still have to pay for the switching. But its performance is just unacceptable for any kind of scaling, which is why you don't see any Oracle clusters scaling beyond eight or 10 nodes. In clustering, InfiniBand is considered by many to be the next evolution of interconnect. It is attractive because the latency gets dramatically reduced and the bandwidth gets dramatically increased.
As for pricing, I can only speak for PathScale, which hasn't released pricing yet for (the InfiniBand-compliant interconnect technology) InfiniPath. I can say that it will be below $1,000 a node.
Is there a preferred processor for HPC on Linux clusters?
Rosenthal: There is a difference between the AMD approach and the Intel approach. Intel does not have HyperTransport technology on its chips, so Intel-based systems have to rely on a PCI-X or PCI Express adapter card to get lower latency. Any PCI implementation adds a chip crossing, which means that you need another bridge chip in there. That is inherently slower, by up to about 400 nanoseconds. HyperTransport is the fastest way to connect to the CPU.
PathScale runs on AMD. Typically, clusters built with AMD Opteron are going to be more scalable and efficient than any clusters built on Intel. You just can't get the best memory latencies on the Intel system. In my opinion, AMD today is already the best 64-bit building block for HPC clusters. I think Intel came out with a 64-bit-compatible x86 chip because Intel's Itanium wasn't getting any market acceptance.
In clusters, is the importance of bandwidth diminishing as the importance of latency increases?
Rosenthal: Yes. Picture, if you will, a water pipe. Bandwidth is the size of the pipe, and latency is how quickly the water gets through it. Bandwidth is important, but we've reached a point where the pipe is getting pretty large, 10 gigabits per second. Bandwidth is not the bottleneck and is not stopping scalability. Latency, how fast you can get information in and out of that pipe, is the bottleneck.
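The point about latency can be illustrated with a common first-order model in which the time to deliver a message is a fixed latency plus the message size divided by the bandwidth. The sketch below is only an illustration: the 10 gigabits per second figure comes from the interview, while the 5-microsecond latency, the message sizes and the function name are assumptions chosen for the example, not measurements of any product.

```python
def transfer_time_us(message_bytes: int, latency_us: float, bandwidth_gbps: float) -> float:
    """Estimate one-way message time in microseconds.

    Simple model (an assumption): time = fixed latency + size / bandwidth.
    1 Gb/s equals 1,000 bits per microsecond.
    """
    bits = message_bytes * 8
    serialization_us = bits / (bandwidth_gbps * 1_000)
    return latency_us + serialization_us


if __name__ == "__main__":
    # Hypothetical baseline: 5 us latency on a 10 Gb/s link.
    for size in (64, 1_000_000):
        base = transfer_time_us(size, latency_us=5.0, bandwidth_gbps=10.0)
        more_bw = transfer_time_us(size, latency_us=5.0, bandwidth_gbps=20.0)
        less_lat = transfer_time_us(size, latency_us=2.5, bandwidth_gbps=10.0)
        print(f"{size:>9} bytes: base={base:.3f} us, "
              f"2x bandwidth={more_bw:.3f} us, half latency={less_lat:.3f} us")
```

In this model, doubling the bandwidth barely changes the time for a 64-byte message, while halving the latency nearly halves it. For a megabyte-sized message the reverse holds. Cluster applications that exchange many small messages would therefore be limited by latency rather than by the size of the pipe.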
Who are the buyers for HPC Linux clusters today?
Rosenthal: In the business world, manufacturing companies like Ford, General Motors and Boeing use HPC clusters to do large simulations. Oil and gas companies, like Chevron-Texaco, do reservoir simulations. In the past they primarily used SMP systems, but they are starting to use Linux clusters.
Then, there are traditional HPC buyers: scientists, engineers and researchers at bioscience or drug companies, universities, government labs, NASA, and military agencies. Also, pharmaceutical and biotech are hot markets now for HPC on Linux clusters.