- Alex Barrett, Modern Infrastructure Editor-in-Chief
The frequency with which components of data center systems break up could put some Hollywood couples to shame.
While the "in" thing in system design is convergence, some forward-looking technologists are aiming toward disaggregated servers. It's conscious uncoupling, data-center style.
Networking and storage are frequently purchased and configured separately from servers. Disaggregating systems takes things a step further and targets the processing, main memory, and the input/output (I/O) subsystem -- "the three piece suit" that makes up every system, said Dr. Tom Bradicich, Hewlett-Packard vice president of engineering for servers.
Disaggregation is particularly attractive among hyperscale cloud service providers, which see disaggregation as a way to achieve more flexible systems and fewer underutilized resources.
"In the public cloud, you're playing a multi-billion dollar Tetris game," said Mike Neil, Microsoft general manager for Enterprise Cloud. "You have all these resources manifested as physical systems, and the challenge is to be as efficient as possible with those resources."
CPU: Memory: I/O
In today's traditional servers, the ratios of CPU to memory to I/O are mostly unchangeable. With disaggregated servers, those subsystems are separated into discrete pools of resources that are mixed and matched to create differently sized and shaped systems. Data center architects can then use an orchestration interface to "compose" systems that are CPU-, memory- or I/O-intensive, depending on workload demands, and later tear them down to create another system with a new profile.
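The compose-and-tear-down cycle can be pictured with a small sketch. Everything in it is hypothetical: the `ResourcePool` class, its field names and the capacities are illustrative assumptions, not any vendor's orchestration API.

```python
from dataclasses import dataclass


@dataclass
class ResourcePool:
    """Hypothetical rack-level pool of disaggregated resources."""
    cpu_cores: int
    memory_gb: int
    io_gbps: int

    def compose(self, cpu_cores: int, memory_gb: int, io_gbps: int) -> dict:
        """Carve a logical system out of the pool; fail before changing anything."""
        request = {"cpu_cores": cpu_cores, "memory_gb": memory_gb, "io_gbps": io_gbps}
        for name, amount in request.items():
            if amount > getattr(self, name):
                raise RuntimeError(f"not enough {name} left in the pool")
        for name, amount in request.items():
            setattr(self, name, getattr(self, name) - amount)
        return request

    def release(self, system: dict) -> None:
        """Tear a composed system down and return its resources to the pool."""
        for name, amount in system.items():
            setattr(self, name, getattr(self, name) + amount)


# Illustrative capacities only.
pool = ResourcePool(cpu_cores=64, memory_gb=512, io_gbps=80)

# Compose a memory-intensive system, then tear it down...
memory_heavy = pool.compose(cpu_cores=8, memory_gb=384, io_gbps=10)
pool.release(memory_heavy)

# ...and reuse the same hardware for a CPU-intensive profile.
cpu_heavy = pool.compose(cpu_cores=48, memory_gb=64, io_gbps=10)
```

In a fixed-ratio server, the memory-heavy and CPU-heavy profiles would each need their own box; here both come out of the same pool at different times.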
The push for disaggregated servers could trickle down to enterprises and high performance computing (HPC) environments, if the economics are right, said Kuba Stolarski, IDC research manager for server, virtualization and workloads.
"The standard justification for disaggregated systems is the refresh or maintenance story," Stolarski said. "Today, if the processor is due for a refresh, then the whole box is going to go," even if the other components are still viable.
Everyone who spends a lot on servers cares about saving money, not just hyperscale folk, said Todd Brannon, Cisco director for product marketing for its Unified Computing product line. The CPU represents about two-thirds of the total system cost, and the rest (memory, I/O subsystem) typically doesn't need replacing as often as the processor. In a disaggregated system, "just replace the processing cartridge and maintain the investment in everything else," Brannon said.
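A back-of-the-envelope sketch of the refresh argument follows. Only the two-thirds CPU share comes from Brannon's estimate; the system price and the number of refreshes are made-up inputs. The sketch also assumes a replacement processing cartridge costs exactly the CPU share of the original system, and that disaggregation adds no interconnect premium.

```python
def total_spend(system_cost: float, cpu_share: float, refreshes: int,
                disaggregated: bool) -> float:
    """Initial purchase plus the cost of each later processor refresh."""
    cpu_cost = system_cost * cpu_share
    # Traditional server: a CPU refresh replaces the whole box.
    # Disaggregated (assumption): only the processing cartridge is replaced.
    per_refresh = cpu_cost if disaggregated else system_cost
    return system_cost + refreshes * per_refresh


SYSTEM_COST = 9000.0   # hypothetical price per system
CPU_SHARE = 2 / 3      # "about two-thirds", per Brannon
REFRESHES = 2          # hypothetical number of refresh cycles

traditional = total_spend(SYSTEM_COST, CPU_SHARE, REFRESHES, disaggregated=False)
disagg = total_spend(SYSTEM_COST, CPU_SHARE, REFRESHES, disaggregated=True)
print(f"traditional: {traditional:.0f}, disaggregated: {disagg:.0f}")
```

Under these assumptions, each refresh saves the third of the system cost tied up in memory and I/O. Any premium paid for the disaggregated interconnect would cut into that saving.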
But plenty of things could prevent disaggregated systems from excelling: fundamental laws of physics and engineering, a failure to get the required technology costs down, and management complexity.
Wanted: Superfast fabrics
Is the first disaggregated system already here?
Cisco announced the UCS M-Series Modular Servers, the first example of a commercially available disaggregated system, said Brannon. The front of the 2U M-Series chassis holds eight processing cartridges that contain CPU and memory, connected to disaggregated storage and connectivity resources via Cisco's Virtual Interface Card.
The holy grail of disaggregation is separating CPU from main memory, so systems vendors need to deliver a faster, higher-bandwidth connection between those components.
"The biggest challenge of disaggregation is the interconnect," said HP's Dr. Bradicich.
Today, the distance between processors and main memory is measured in inches, "but for disaggregation to work, it needs to be measured in feet," he said. That is easier said than done. "It all has to do with physics: the farther it is, the slower it is."
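The physics can be put in rough numbers. The sketch below estimates signal propagation delay over a link, assuming the signal travels at about two-thirds the speed of light, a common rule of thumb for optical fiber and copper. The `VELOCITY_FACTOR` value and the sample distances are assumptions for illustration, and the estimate ignores serialization, switching and protocol overhead.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458
VELOCITY_FACTOR = 0.67  # assumed: signal speed as a fraction of c


def one_way_delay_ns(distance_m: float) -> float:
    """Propagation delay in nanoseconds for a signal crossing distance_m."""
    return distance_m / (SPEED_OF_LIGHT_M_PER_S * VELOCITY_FACTOR) * 1e9


def round_trip_delay_ns(distance_m: float) -> float:
    """A memory read needs a request and a response, so the distance counts twice."""
    return 2 * one_way_delay_ns(distance_m)


# Illustrative distances: a few inches, a few feet, a long optical run.
for label, meters in [("4 inches", 0.1016), ("10 feet", 3.048), ("300 meters", 300.0)]:
    print(f"{label:>10}: {round_trip_delay_ns(meters):8.1f} ns round trip")
```

At roughly 5 nanoseconds per meter each way, a few inches costs about a nanosecond per round trip, while a few feet costs tens of nanoseconds before any other overhead is counted.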
One emerging interconnect technology commonly associated with disaggregation is silicon photonics, which has three major advantages over other interconnect technologies: performance, weight and distance, said Jay Kyathsandra, Intel senior product marketing manager in its Cloud Platforms Group, which is developing the technology. Silicon photonics supports data transmissions of up to 1.3Tb per second, weighs about a third as much as copper cables and can extend to 300 meters, according to the specifications.
But silicon photonics isn't necessarily synonymous with disaggregation, said Kyathsandra. "The actual implementation will be a function of what the original equipment manufacturer/original design manufacturer wants to do," he said. If your system does not require the kinds of speeds it provides, "silicon photonics may not be a part of [the ultimate implementation]."
Systems designers must weigh the advantages of silicon photonics against its cost, said Microsoft's Neil. Consider the tens of dollars it costs to connect a hard drive to a system locally versus the hundreds or even thousands of dollars it costs to implement network-attached storage or a storage area network. In a disaggregated system, it might be physically possible to separate memory and CPU at long distances, but an expensive interconnect technology will eat into the potential utilization benefits and could derail the whole plan.
Systems designers might find that emerging Ethernet standards provide sufficient performance and low enough latency to support disaggregation. Several systems designed for hyperscale environments rely on 10GbE. When grouped with Quad Small Form-factor Pluggable transceivers, Ethernet can go to 40Gb, said Kevin Deierling, vice president of marketing at Mellanox, a supplier of high-performance Ethernet and InfiniBand interconnect technology.
Meanwhile, work is underway on 25GbE, he said. Grouped four lanes at a time, that gets you to 100Gb -- the speed of many InfiniBand fabrics required in HPC environments.
Whatever happens, systems vendors aren't waiting for interconnect technologies to be fully baked. HP's Moonshot chassis, for example, is currently equipped with several fabrics that will connect future disaggregated system components, said Dr. Bradicich. One Ethernet fabric is in use today. A second, "proximal array" fabric was revealed as part of the 64-bit ARM-based m400 servers announcement. The third, an as-yet-unused fabric called the 2D Torus Mesh, allows any one Moonshot component to communicate directly with any of its neighbors, Bradicich explained.
"The highways are laid down and paved. There are just no cars on it yet."
Running the show
To enable disaggregation, the software required to provision and deprovision system resources must be modified.
Intel's overall disaggregation initiative is known as Rack Scale Architecture. On the hardware side, that includes its work in the silicon photonics optical interconnect, as well as a programmable network switch due out in 2015. A pod management framework communicates with the system components via hardware application programming interfaces, and provides a policy-based orchestration framework to assemble and disassemble systems from the resource pool, said Intel's Kyathsandra.
Intel demonstrated the system within a single rack, and will share the results with the Open Compute Project for hardware designs and OpenStack for cloud orchestration, to encourage openness and adoption. "You should be able to have different racks from different vendors, and get the same hardware-level information from each rack," Kyathsandra said. The goal is for different orchestration layers to work together seamlessly. "It shouldn't matter whether you are using VMware or OpenStack," he said.
Disaggregate in the face of danger
Disaggregation presents challenges for software development, management and operating systems vendors too. "How do you reason across these pools of resources?" said Microsoft's Neil.
Microsoft's strategy is to innovate in Azure, and then push that work back out via the Open Compute Project for hardware, and Windows Server and System Center for management. Microsoft also made a version of its Azure system, the Microsoft Cloud Platform System, based on commercially available Dell hardware.
"Our general goal is to take innovations that we've done in Azure, and drive [them] out toward broader industry adoption," Neil said. Disaggregated hardware and software designs will likely find their way to Open Compute Project and the market, he said.
Disaggregated system designs may enter data centers much faster than may seem possible, said Mellanox's Deierling.
"[It] will trickle down to the enterprise faster than people realize," he said. Hyperscale vendors have a lot of engineers who are already doing this work, and it's not a big leap for them to productize the work that they have been doing internally for public consumption. "They're saying 'hey, we've already done the heavy lifting. Let's bring this to the enterprise.'"
Bin and box packing
With north of a million servers, Microsoft spends more than $2 billion per year on cloud infrastructure. At that kind of scale, disaggregation has the potential to save Microsoft big money by solving the bin-and-box packing problem: how to build systems ("bins") that can take the maximum number of workloads ("boxes"), Neil said.
"Let's say I have a bin that is four feet long," Neil said. "I put two two-foot boxes in it, and the bin is filled. But what if I want to put in a three foot box, and a 1.5 foot box? It doesn't work."
The bin packing problem is an optimization problem: pack objects of different volumes into the smallest number of bins possible. It has a range of packing and shipping applications, as well as less obvious ones, such as planning file backups. Some variants also allow items to share space -- and therefore resources -- when packed into a bin, like virtual machines in a server.
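Neil's example can be reproduced with a short sketch of first-fit decreasing, a well-known greedy heuristic for bin packing. It is an illustration, not a description of how Microsoft places workloads, and the function name is a hypothetical choice. The heuristic does not always find the optimal packing.

```python
def first_fit_decreasing(sizes, capacity):
    """Pack items into fixed-capacity bins, largest item first.

    Each item goes into the first bin with enough room; a new bin is
    opened only when no existing bin can take it.
    """
    bins = []  # each bin is a list of the item sizes placed in it
    for size in sorted(sizes, reverse=True):
        if size > capacity:
            raise ValueError(f"item of size {size} exceeds bin capacity {capacity}")
        for contents in bins:
            if sum(contents) + size <= capacity:
                contents.append(size)
                break
        else:
            # No existing bin had room, so open a new one.
            bins.append([size])
    return bins


# Neil's four-foot bin: two two-foot boxes fill it exactly...
print(first_fit_decreasing([2, 2], capacity=4))    # [[2, 2]]

# ...but a three-foot box and a 1.5-foot box need two bins, with space wasted in each.
print(first_fit_decreasing([3, 1.5], capacity=4))  # [[3], [1.5]]
```

In the second case, 3.5 feet of capacity across the two bins goes unused. That stranded space is the kind of underutilized resource disaggregation aims to reclaim.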