Buying a chassis server once meant you were buying a server that slotted into a chassis. That's no longer the case.
A chassis server commits its user to the physical, plug-in architecture of a particular server vendor. In recent years, vendors have broadened this scope to include network communication architecture.
Chassis computing lets network designers and architects think about how servers communicate with one another and with other partners on the network. Today's chassis servers from Cisco Systems and IBM/Lenovo come bundled with each vendor's server networking architecture: buy vendor X's server chassis and server blades, and you get vendor X's networking architecture -- like it or not.
A server blade has two communication requirements: intra-chassis communication -- with another system in the same chassis -- and extra-chassis communication -- with a system located in another chassis.
Servers conduct north/south traffic out of and into the chassis through a top-of-rack (ToR) switch, which essentially works like a traditional local-area network switch, but services a rack of servers. Conversely, east/west traffic goes between server blades in the same chassis.
The differences in network connections across vendors are hardly trivial. Cisco and IBM's choices in their architectural blueprints lead to dramatic differences in theoretical network performance over the architecture's lifetime, as well as practical differences in network performance of current-generation chassis servers.
All together now
Cisco UCS chassis servers handle both types of traffic in the same way, while IBM treats each traffic pattern differently.
Cisco's UCS network architecture is switch-centric, which is unsurprising for a company that grew up in the switch industry. Architecture, like most things, is as political as it is technical, and one can assume Cisco's switch group holds great sway over its designs.
Any server that needs to communicate with any partner system over the network sends packets through the ToR switch. Whether a blade server needs to reach a client across the globe, a server in another chassis, or a server in the adjacent slot one inch away (a UCS chassis holds up to eight blades), the traffic travels north/south.
With IBM's Flex System design, network traffic between a server and a partner outside the chassis (north/south) travels through a ToR switch, similar to Cisco's architecture. However, in Flex Systems, switching capacity is built into the chassis for traffic that originates and terminates in a single chassis (east/west, across up to 14 blade servers). For intra-chassis communication, no traffic leaves the chassis to transit the ToR switch.
Since the study of the IBM and Cisco server architectures on which this article is based,* Lenovo has acquired the IBM Flex System chassis server line.
Scalable server architecture
How a chassis server handles network communications has significant ramifications for the bandwidth available to applications and, ultimately, for system performance.
IBM commissioned Tolly to benchmark VMware vMotion (the migration of a running virtual machine from one physical host server to another) scenarios on the Cisco UCS servers and IBM's Flex System chassis.
To benchmark total network capacity to blades within the same chassis, the tests used Cisco's highest-capacity switch available -- the 6296UP ToR switch -- with Cisco's 2208XP fabric extender (FEX) modules connecting the chassis to the switch.
A single Cisco 2208XP FEX module provides 80 Gbps of bandwidth between the chassis and the ToR switch. A single chassis can run at most two FEX modules, enabling 160 Gbps between a UCS chassis and the ToR switch.
To benchmark the network capacity of the IBM chassis, Tolly engineers loaded the servers with traffic-generation software that ran across servers in the same chassis. Tolly measured 438 Gbps of network traffic -- with zero load on the ToR switch, thanks to the chassis architecture. In theory, then, the IBM chassis server offers 2.7 times the maximum capacity.
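The capacity comparison reduces to simple arithmetic on the figures above; this quick sketch reproduces the 160 Gbps Cisco ceiling and the 2.7x ratio cited in the text (all numbers are from the article, not new measurements).

```python
# Capacity figures as cited in the text.
fex_gbps = 80                 # one Cisco 2208XP FEX module: 80 Gbps to the ToR switch
fex_modules = 2               # maximum FEX modules per UCS chassis
cisco_gbps = fex_gbps * fex_modules   # uplink ceiling for all traffic leaving a UCS chassis

ibm_gbps = 438                # Tolly-measured east/west traffic inside one Flex chassis

print(cisco_gbps)                       # 160
print(round(ibm_gbps / cisco_gbps, 1))  # 2.7 -- the "2.7 times" figure above
```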
To test the real-world application impact of chassis server architecture, the engineers increased the background traffic and measured the time required to complete a VMware vMotion server migration task. The Flex System completed the migration faster in every scenario, including the baseline with no competing traffic.
When background traffic increased to the maximum, modeling what an application might encounter in a busy data center, the differences were even more significant.
With Cisco, the maximum background traffic achieved to and from the Cisco UCS chassis was 90.3 Gbps; the IBM Flex System handled 438 Gbps. When run at these levels, IBM completed the VMware migration of nine virtual machines (one vApp tile) in 39 seconds, while Cisco required 99 seconds -- about 2.5 times as long.
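Those results can be put side by side in a short sketch; the figures are the ones reported above, and the ratios are simply derived from them.

```python
# Benchmark figures from the Tolly study cited in the text.
ibm_bg_gbps, cisco_bg_gbps = 438, 90.3   # maximum background traffic sustained
ibm_secs, cisco_secs = 39, 99            # time to vMotion nine VMs (one vApp tile)

print(round(cisco_secs / ibm_secs, 1))        # 2.5 -- Cisco took ~2.5x as long
print(round(ibm_bg_gbps / cisco_bg_gbps, 1))  # 4.9 -- while IBM carried ~4.9x the load
```

The second ratio underscores the point: the Flex System finished the migration faster even while moving nearly five times as much competing traffic.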
Architecture matters in the real world, so understand the benefits and limitations of whatever systems you choose for your organization.
From a networking perspective, IBM's chassis design provides dramatically more bandwidth -- and demonstrable performance benefits -- for traffic that moves between colocated blades. If your applications demand intra-chassis networking performance, the choice is clear.
IBM's off-chassis networking is the same as Cisco's, in that traffic also passes through the ToR switch.
Of course, there are other reasons to choose one vendor over another that are unrelated to network scalability. If you are running a data center where the chassis servers' bandwidth limits are not stressed, then the potential benefits may be of less importance than vendor streamlining or other factors. For example, e-commerce transactions typically have low bandwidth demands. In a 100% Cisco shop running this type of application, bandwidth limitations are trivial compared to training or licensing costs.
About the author:
Kevin Tolly is founder of The Tolly Group, which provides third-party validation/testing services. Tolly is also the founder and CEO of Tolly Research, which provides research services to IT vendors and end-user companies.
* This article is based on a study, IBM Flex System Network Architecture: VM Migration and Aggregate Network Performance versus Cisco UCS, that IBM commissioned from The Tolly Group. See Tolly document 214104 for details.