PRINCETON, N.J. – The Princeton Plasma Physics Laboratory recently installed a cluster of Sun Fire x2100 servers to study fusion energy, which mimics the Sun by fusing hydrogen atoms to release energy.
"It's sort of the holy grail of energy," said Paul Henderson, the lab's head of systems and network engineering. "You're using hydrogen, which is plentiful and available on the earth, to produce energy without any pollution and without any radioactivity."
The lab claims that the top two inches of water in Lake Erie contain more potential fusion fuel than the entire world's known oil reserves.
To mimic the Sun, the lab wrapped huge magnetic coils around a vacuum chamber. The magnetic fields created by the coils allow the energy-containing plasma to float inside the chamber. Physicists control the magnetic fields to shape the plasma so they can study the energy in different states, recording the data in microsecond phases.
"One of the more interesting things is [the servers] look at the magnetic field data in real time and give you a simulation of what the plasma did inside the vacuum chamber," said Raymond Camp, a senior staff engineer at the lab.
Datel servers out, Sun servers in
But the old Datel AMD Athlon-based server cluster that used to crunch algorithms for the floating balls of plasma wasn't cutting it.
It would often take hours for the lab to analyze the numbers and publish the results -- long enough that the physicists would go on coffee breaks while they waited.
Last spring, Henderson installed 180 Sun Fire x2100 servers running a Red Hat-based OS kernel with all the daemons stripped out to avoid surprise interrupts that could cause the number-crunching to hiccup. At first, he ran the new cluster side by side with the 200-server Athlon cluster so physicists could get accustomed to the change. But when researchers discovered that the Sun cluster could crunch algorithms in less than five minutes, the Datel cluster was left behind.
"We were getting between 300 and 400 percent faster performance on our code, much more reliable hardware, and basically everybody just abandoned the old cluster and we shut it off quicker than we thought we would have to," Henderson said.
Power savings to boot
In addition, Henderson said the lab -- which is run by Princeton University but funded by the U.S. Department of Energy -- is now saving $80,000 a year in power and cooling costs, and getting much more reliable servers. He attributes that partly to the change in hardware, but also to a redesigned data center.
Henderson's story goes further back than just acquiring the Sun Fire servers. When he first arrived at the lab, half the jobs the cluster ran would fail, a rate he found unacceptable. Three factors contributed to this: two related to the data center facility, and one to the design of the Datel servers themselves.
In the data center room, all the systems faced the same way, so the hot exhaust from one row of servers blew directly into the intake of the next row. Moving to a hot-aisle, cold-aisle configuration fixed that.
Next, there wasn't enough humidity in the room, creating so much static electricity that when an operator came in the front door, servers would just shut down. Turning up the humidity levels on the computer room air conditioning (CRAC) units solved that problem.
But one of the lab's problems could only be resolved by purchasing new hardware. Henderson said the airflow design within the Datel servers was not up to par. Cooling fans within the box that were meant to help dissipate CPU heat were blocked from blowing air to the processors by the servers' RAM cards. And the fans on the processors themselves pointed up, rather than toward the back of the server, so hot air would get trapped in the quarter-inch space between the fans and the chassis. This heated up the inside of the box, often shutting it down. The new Sun servers, Henderson said, have much better airflow within the box.
With an improved data center design and the new servers, Henderson said the failure rate is now less than 1 percent.
"The Sun hardware all works as planned and we really didn't have any surprises," Henderson said. "It was just a very smooth transition."