Data center networking architecture draws on machine learning, SDN, AI
In the not-too-distant past, traffic forwarding within the data center was simple. One IP address would talk to another IP address. The addresses belonged to endpoints -- bare-metal hosts or virtual machines talking to other bare-metal hosts or virtual machines. The path between those IP addresses was known to the data center switches as entries in the routing and bridging tables.
If an engineer needed to troubleshoot poor performance or odd behavior between two IP endpoints, a good starting point was constructing the path between the two by looking at those tables. Equal-cost multipath and multichassis link aggregation added complexity to this process, but on the whole, operators could find out exactly which path any given data center conversation traversed.
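The table-driven path construction described above can be sketched in a few lines. This is a minimal illustration, not any vendor's tooling: the switch names, prefixes and the "local" marker are all hypothetical, and each lookup uses longest-prefix matching the way a routing table would.

```python
import ipaddress

# Hypothetical routing tables, one per switch: prefix -> next hop.
# "local" means the destination is directly attached to that switch.
ROUTING_TABLES = {
    "leaf1": {"10.0.1.0/24": "local", "10.0.0.0/16": "spine1"},
    "spine1": {"10.0.1.0/24": "leaf1", "10.0.2.0/24": "leaf2"},
    "leaf2": {"10.0.2.0/24": "local", "10.0.0.0/16": "spine1"},
}


def lookup(table, dst):
    """Return the next hop for dst using longest-prefix match, or None."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, next_hop in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return best[1] if best else None


def trace_path(start, dst, max_hops=16):
    """Follow next hops from start until a switch reports dst as local."""
    path = [start]
    device = start
    for _ in range(max_hops):
        next_hop = lookup(ROUTING_TABLES[device], dst)
        if next_hop is None:
            raise LookupError(f"{device} has no route to {dst}")
        if next_hop == "local":
            return path
        path.append(next_hop)
        device = next_hop
    raise RuntimeError("hop limit reached; possible routing loop")


print(trace_path("leaf1", "10.0.2.15"))
```

With a single next hop per prefix the answer is unambiguous. Equal-cost multipath would turn each table entry into a set of candidate next hops, which is exactly the added complexity mentioned above.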
There was little to complicate traffic flows between endpoints. Network-address translation, encryption or tunneling were rarely present. Those sorts of functions tended to be located at the data center edge, communicating with devices outside the trusted perimeter.
Times were simple because the needs were simple.
The modern data center
A modern data center networking architecture looks different as business needs have morphed. The once relatively simple data center is now a unified infrastructure platform on which applications run. The data center runs as a whole; it's an engine for application delivery.
Increasingly, infrastructure is transparent to developers and their applications. A thoroughly modern infrastructure is an abstraction upon which developers lay their applications. Pools of resources are allocated on demand, and the developer doesn't have to worry about the infrastructure. Rather, the infrastructure just works.
The modern data center also handles security in a distributed way that coordinates with the dynamic standing up and tearing down of workloads. No longer does traffic have to be pushed through a central, physical firewall to enforce a security policy. Rather, a central security policy is constructed, and a security manager installs the relevant parts of that policy onto the affected hosts, VMs or containers. There is no infrastructure chokepoint and no arcane routing requirements to enforce such a policy.
At a high level, we've been describing private cloud architecture. Abstracting physical infrastructure in this way allows for a simpler collaboration with the public cloud. Thus, hybrid cloud architectures are growing in popularity, with the expectation that public cloud workloads have the same security and connectivity as private cloud workloads.
With hybrid cloud architectures becoming the new normal, it's important to note the impact these trends have on networking. No longer is the data center as simple as one IP address talking to another, with routing and bridging tables a consultation away when there's trouble.
The infrastructure mechanisms that deliver modern data center flexibility rely on complex networking. Driving this complexity is the need for workload segregation, service-policy enforcement and security. Thus, rather than a sea of IP addresses, the modern data center looks more like a layer cake.
At the bottom of our layer cake is the underlay network. This network is the basis on which all other network services will ride. This is also the network that looks the most familiar to the average network engineer. When they peer into their routing and bridging tables, they are seeing the underlay network -- the data center foundation.
The underlay by itself, however, can't provide everything that the hybrid cloud needs. One growing requirement is segregation, referred to as multi-tenancy. A tenant could be an application, a business unit or a customer.
A tenant's traffic is segregated from other traffic through Virtual Extensible LAN (VXLAN) encapsulation technology. Traffic from one segment is encapsulated in a VXLAN packet, delivered in this wrapper across the network and decapsulated on the other side. VXLAN is a second layer -- an overlay -- on top of our base underlay.
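To make the wrapper concrete, here is a rough sketch of the 8-byte VXLAN header defined in RFC 7348: a flags byte with the VNI-valid bit set, a 24-bit VXLAN network identifier (VNI) and reserved bits. The function names are illustrative, and the outer Ethernet, IP and UDP headers that a real tunnel endpoint would add are deliberately left out.

```python
import struct

VXLAN_UDP_PORT = 4789        # IANA-assigned destination port for VXLAN
VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag: the VNI field is valid


def vxlan_encapsulate(vni, inner_frame):
    """Prepend a VXLAN header carrying the 24-bit VNI to an inner frame."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 0: flags (8 bits) + reserved (24 bits)
    # Word 1: VNI (24 bits) + reserved (8 bits)
    header = struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)
    return header + inner_frame


def vxlan_decapsulate(packet):
    """Strip the VXLAN header and return (vni, inner_frame)."""
    if len(packet) < 8:
        raise ValueError("packet shorter than a VXLAN header")
    word0, word1 = struct.unpack("!II", packet[:8])
    if not (word0 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag is not set")
    return word1 >> 8, packet[8:]


packet = vxlan_encapsulate(5001, b"inner ethernet frame")
print(vxlan_decapsulate(packet))
```

The VNI is what keeps tenants apart: two frames with identical inner addresses but different VNIs belong to different segments.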
Not only does it provide segregation of traffic, but VXLAN can also be used to route traffic via a specific path across the network. Let's say the data center needs to forward traffic through a specific firewall and load balancer. In a modern network, firewalls and load balancers are likely to exist as virtualized network functions, residing potentially anywhere in the data center. To route traffic exactly where it needs to go, VXLAN encapsulation can be used to tunnel traffic flows from device to device until they have traversed all required devices.
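One way to picture this device-to-device tunneling is as an ordered list of segments, one tunnel per pair of adjacent stops. The sketch below assumes hypothetical device names and says nothing about how a particular platform programs its tunnels.

```python
def tunnel_segments(source, service_chain, destination):
    """Return the (from, to) tunnel segments needed to traverse the chain in order."""
    hops = [source, *service_chain, destination]
    return list(zip(hops, hops[1:]))


# Hypothetical flow: vm-a must pass a firewall and a load balancer to reach vm-b.
for start, end in tunnel_segments("vm-a", ["fw-1", "lb-1"], "vm-b"):
    print(f"tunnel {start} -> {end}")
```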
Firewall rules form another layer in our overlay and underlay cake. A central policy manager inserts firewall rules host by host. Each host ends up with its own set of rules that govern forwarding into and out of the device. Known as microsegmentation, this is a practical way to ensure security in a scalable data center.
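As a rough sketch of the host-by-host idea, the snippet below compiles one central policy into per-host rule sets. The rule format, group names and host names are assumptions made for illustration and do not reflect any specific policy manager.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    src_group: str
    dst_group: str
    port: int
    action: str


# Hypothetical central policy, written once in terms of workload groups.
CENTRAL_POLICY = [
    Rule("web", "app", 8080, "allow"),
    Rule("app", "db", 5432, "allow"),
]

# Hypothetical inventory: which group each host belongs to.
HOST_GROUPS = {"web-01": "web", "app-01": "app", "db-01": "db"}


def rules_for_host(host):
    """Select only the rules that mention the host's group as source or destination."""
    group = HOST_GROUPS[host]
    return [r for r in CENTRAL_POLICY if group in (r.src_group, r.dst_group)]


def compile_policy():
    """Build the per-host rule sets a policy manager would push out."""
    return {host: rules_for_host(host) for host in HOST_GROUPS}


for host, rules in compile_policy().items():
    print(host, rules)
```

Each host carries only the slice of policy that concerns it, which is why no central chokepoint is needed.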
A wildcard that adds yet more networking complexity is the container. Container networking is a nascent technology that relies on namespaces, proxy servers and network-address translation to enable containers to communicate with each other as well as the outside world -- yet another layer.
Trouble for operators
The complexity that comes with a modern data center networking architecture is a potential issue for operators. Most networking issues are tied to connectivity or performance. Two endpoints that should be able to connect but cannot present one sort of problem. Two endpoints that connect but aren't communicating as quickly as expected present a different one.
Troubleshoot a connectivity problem with the packet walk method. From one network device to another, follow the path that a packet would take to arrive at its destination. When the actual IP endpoints are known, this is straightforward.
In the modern data center, the underlay carries VXLAN or other overlay packets. Add firewall rules, and then perhaps network-address translation or proxy services, and a packet walk becomes more difficult and fraught with nuance. To diagnose a connectivity issue, an operator needs to know the source and destination of the packet -- whether container, virtual machine or bare-metal host -- along with the firewall policies governing that packet, the packet encapsulation and the service chain to be followed.
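A small sketch shows why the walk gets harder: before touching a routing table, the operator has to translate overlay endpoints into the underlay addresses that actually appear there. The inventory, the field names and the addresses below are hypothetical, and firewall policy and service chains are left out to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    kind: str        # container, vm or bare-metal
    overlay_ip: str  # address the application sees
    vni: int         # overlay segment the workload belongs to
    vtep_ip: str     # underlay address of the hosting tunnel endpoint


# Hypothetical inventory, the kind of data a controller or CMDB might hold.
INVENTORY = {
    "web-1": Workload("container", "192.168.10.5", 5001, "10.0.1.11"),
    "app-1": Workload("vm", "192.168.10.9", 5001, "10.0.2.12"),
    "db-1": Workload("bare-metal", "192.168.20.4", 5002, "10.0.3.13"),
}


def walk_plan(src_name, dst_name):
    """Summarize what must be known before an underlay packet walk can start."""
    src, dst = INVENTORY[src_name], INVENTORY[dst_name]
    return {
        "overlay": (src.overlay_ip, dst.overlay_ip),
        "underlay": (src.vtep_ip, dst.vtep_ip),
        "same_segment": src.vni == dst.vni,
    }


print(walk_plan("web-1", "app-1"))
print(walk_plan("web-1", "db-1"))
```

Only the "underlay" pair shows up in the switches' routing and bridging tables. When "same_segment" is false, the walk must also account for whatever routes traffic between the two segments.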
Assuming the operator understands the application flow, and works in a flat, silo-free IT organization, this isn't so bad. Still, it is not easy. Looking up media access control and IP addresses in bridging and routing tables is only one small part of a more elaborate troubleshooting process. Add the fact that modern infrastructure is often ephemeral, and operators can be troubleshooting issues that happened in the past and can't be reconstructed.
Performance challenges are even harder to diagnose because of the sheer number of network devices touching a given conversation. The path likely involves a virtual machine's operating system, a hypervisor soft switch, a virtual firewall, a top-of-rack switch and a spine switch, and then the same devices in reverse all the way to the other endpoint.
When some workloads are in the public cloud, matters become more complex. Putting infrastructure or platform as a service in the mix means adding higher latency and additional tunneling to our troubleshooting equation.
We're stuck with IP. And since we're stuck with IP while at the same time needing additional functionality, overlays are here to stay. Overlays give us the ability to steer and segregate traffic, and that functionality is important. With it, we can treat our infrastructure as pools of resources, adding and subtracting capacity at will. The issue then becomes one of managing the network complexity we've added to our environments.
The networking industry has taken on this challenge in a couple of ways. The first is acceptance. If we agree that the complexity is here to stay, then we'll provide tools that allow us to discover or visualize what's happening on the network. For example, Cisco provides enhanced tools for operators to troubleshoot end-to-end connectivity issues on its Application Centric Infrastructure platform. VMware recently bought Arkin, a visualization tool that correlates workloads with firewall policy and VXLAN segmentation in a GUI paired with a natural language search engine.
Effective troubleshooting and visualization tools are, increasingly, strong points in modern data center platforms. However, some people have reacted against the complexity by creating forwarding schemes that eschew overlays if at all possible.
For instance, the Romana.io open source project relies on a hierarchical IP addressing scheme combined with host-based firewall rules to create segmentation and a central security policy. The open source Project Calico is similar. Romana.io and Project Calico are both interesting in that they offer forwarding schemes that scale to large data centers while still handling security and segmentation requirements -- and they do it without an overlay.
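To give a feel for how a hierarchical addressing scheme can stand in for an overlay, here is a generic sketch that packs host, tenant, segment and endpoint numbers into the bits of a 10.0.0.0/8 address. The field order and bit widths are assumptions chosen for illustration; they are not the actual layout used by Romana.io or Project Calico.

```python
import ipaddress

BASE = int(ipaddress.ip_address("10.0.0.0"))
# Assumed field widths in bits: host, tenant, segment, endpoint (24 in total).
WIDTHS = (8, 4, 4, 8)


def encode(host, tenant, segment, endpoint):
    """Pack the four identifiers into one address under 10.0.0.0/8."""
    value = 0
    for number, width in zip((host, tenant, segment, endpoint), WIDTHS):
        if not 0 <= number < 2**width:
            raise ValueError(f"{number} does not fit in {width} bits")
        value = (value << width) | number
    return ipaddress.ip_address(BASE | value)


def decode(address):
    """Recover (host, tenant, segment, endpoint) from an encoded address."""
    value = int(ipaddress.ip_address(address)) - BASE
    fields = []
    for width in reversed(WIDTHS):
        fields.append(value & (2**width - 1))
        value >>= width
    return tuple(reversed(fields))


def host_prefix(host):
    """The single route the underlay needs in order to reach every endpoint on a host."""
    return ipaddress.ip_network(f"{ipaddress.ip_address(BASE | (host << 16))}/16")


addr = encode(host=3, tenant=2, segment=1, endpoint=7)
print(addr, decode(addr), host_prefix(3))
```

Because the tenant and segment are readable from the address itself, ordinary routing delivers the packet and host-based firewall rules can enforce segmentation without encapsulation.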
Perhaps the biggest question isn't about how to handle network complexity, but is about the humans supporting the solution. There's a thought out there that automation will allow IT staff to be thinned. As a 20-year IT infrastructure veteran, I don't see it that way. With great complexity comes a great support requirement. Organizations won't want to be on hold with their vendors when the magic goes sideways. They'll want to have pros who know the system at the ready to fix what's broken.