

MLAG installs flatten network without a forklift upgrade

Hierarchical networks cause undue latency and waste both bandwidth and ports. MLAG removes potential STP problems and gives you a flatter pre-fabric network.

The world is moving toward a flatter software-defined fabric network -- at least that's what vendors, media and analysts claim. In reality, the majority of data centers still use older-style hierarchical networks.

There's a reason why software-defined networks are the future: the port blocking that spanning tree protocol (STP) imposes on hierarchical networks causes undue latency in the data center, takes active ports out of service and strangles bandwidth.

The move to a fabric network may be too much too soon for your data center now, but there are intermediate steps like reorganization and multichassis link aggregation (MLAG) deployment that flatten the network as much as possible.

Latency climbing trees

A three-tier network in the data center has switches in each rack and at the top of each row, with core switches for the network backbone.

If a workload is contained in one rack, then the data traverses the in-rack switch to achieve minimal latency. If the workload operates across the whole row, then it must go up to the rack switch, to the row switch, down to another rack switch, to the device and back again. If the workload is virtualized across rows of servers, then data must journey up from rack to row to core, back down again, and repeat ad nauseam. This is not an effective way to deal with latency-critical workload traffic.
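The hop counts described above can be put into a quick back-of-the-envelope sketch. The tier model and position tuples here are illustrative, and only the switches a frame traverses are counted, not cable or NIC latency:

```python
# Switches traversed between two servers in a rack/row/core hierarchy.
# src and dst are illustrative (row, rack) position tuples.
def switch_hops(src, dst):
    if src == dst:
        return 1  # same rack: the in-rack switch only
    if src[0] == dst[0]:
        return 3  # same row: rack -> row -> rack
    return 5      # across rows: rack -> row -> core -> row -> rack

print(switch_hops((0, 0), (0, 0)))  # same rack
print(switch_hops((0, 0), (0, 3)))  # same row, different rack
print(switch_hops((0, 0), (2, 1)))  # different rows, via the core
```

The 1-3-5 progression is why placing chatty workloads in the same rack, discussed below, pays off so directly.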

Complicating this latency is STP's port blocking. STP automatically disables redundant ports to prevent loops. For example, suppose Switch A is connected to Switch B with active data paths between their ports. Bring Switch C into the mix, connected to both, and the three switches form a physical loop. If all three links stayed active simultaneously, broadcast frames would circulate endlessly and cause a meltdown in network performance, so STP blocks one of the links.

If an active port fails, STP can automatically bring a blocked port back up. In practice, however, STP in a large network leaves a large number of ports sitting unused.

STP is also inept at dealing with change: whenever a new port is introduced or an existing one fails, the entire tree must be recalculated before traffic flows normally again.
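The end result of STP's loop avoidance can be modeled in a few lines. This is a toy breadth-first spanning tree over the A-B-C triangle described above, not the real BPDU-based protocol; it only shows which redundant link ends up blocked:

```python
from collections import deque

def spanning_tree_blocked_links(links, root):
    """Return the links a simple spanning tree would leave blocked.

    links: iterable of (switch, switch) pairs; root: the root bridge.
    A toy BFS model of STP's outcome, not the actual protocol.
    """
    # Build an adjacency map of the physical topology.
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    tree = set()          # links kept forwarding
    visited = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for peer in sorted(adj[node]):
            if peer not in visited:
                visited.add(peer)
                tree.add(frozenset((node, peer)))
                queue.append(peer)

    # Any redundant link outside the tree ends up blocked.
    return {tuple(sorted(l))
            for l in (frozenset(x) for x in links) if l not in tree}

# The A-B-C triangle from the text: one of the three links is blocked.
print(spanning_tree_blocked_links([("A", "B"), ("B", "C"), ("A", "C")],
                                  root="A"))  # -> {('B', 'C')}
```

One working port and its bandwidth sit idle purely to keep the topology loop-free, which is exactly the cost MLAG avoids.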

A newer version of STP -- Rapid STP (RSTP), standardized as IEEE 802.1w -- aims to minimize the time delay in rebuilding the tree. RSTP is a stopgap measure until companies can move totally to a fabric network and avoid STP completely. While generally accepted as an improvement on STP, it is still the same underlying technology.

Keep the equipment, lose the layout

Many organizations still use hierarchical networks with a version of STP. Is the only option to replace network equipment with new software-defined, fabric-based boxes?

Although a true fabric network is the best way of gaining optimized performance on a highly virtualized or cloud IT platform, there are a few things that data center administrators predominantly working with a physical or clustered model can do. Even those with a good degree of virtualization can improve performance by taking the network into account.

To start, review the location of your compute workloads. Those that regularly interact with each other and are latency-sensitive should be in the same rack when possible, or in the same row to minimize the hops required for data to move from one server to another.

Next, review your switch setup. Ensure that networking connectivity needs are met with a minimum number of switches, and also that the scheme provides high availability through adequate failover should a physical port fail.

Plot out every port and its connections. This facilitates a more structured cabling approach. Keep data and power cables separated to avoid crosstalk; label every RJ45 plug and socket so that engineers know exactly what is connected to what.
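A port map doesn't need special tooling to get started; even one structured record per port, exported to CSV for the cabling team, goes a long way. The field names below are illustrative, not any standard schema:

```python
import csv
import io

# A minimal structured port map: every switch port, what it connects
# to, and the physical cable label, so engineers can trace any link.
PORT_MAP = [
    {"switch": "row1-rack3-sw1", "port": "ge-0/0/1",
     "connected_to": "server-r3-05:eth0", "cable_label": "R3-0051"},
    {"switch": "row1-rack3-sw1", "port": "ge-0/0/48",
     "connected_to": "row1-sw1:ge-1/0/3", "cable_label": "R3-UPL1"},
]

def to_csv(rows):
    """Render the port map as CSV text for hand-off or import."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

print(to_csv(PORT_MAP))
```

Keeping the cable label in the same record as the logical connection is what makes the "label every RJ45 plug and socket" advice above auditable later.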

If anything changes, note and record the modifications. The best way to do this is via automated tools that sense when a cable is disconnected -- look to network management tools from Cisco, Juniper, IBM and CA Technologies, or preferably data center infrastructure management tools from Nlyte, Future Facilities, CA Technologies or Emerson Network Power.

MLAG to the rescue

Once you have a better-optimized switch platform, optimize how it is used. This is where you minimize any possible problems from STP. Even with a high-availability network switch architecture -- each switch connected to two other switches higher in the hierarchy -- up to 50% of the available bandwidth is lost because STP blocks the redundant links.

MLAG -- multichassis link aggregation, also written MC-LAG -- has been around for a long time, but is hardly used. It builds on standard link aggregation, variously called port bonding, link bonding or multi-link trunking, with Cisco's pre-standard EtherChannel as a proprietary forerunner; a link aggregation group (LAG) bonds parallel paths across two or more physical ports into a single virtual port. Both LAGs and MLAGs follow the Link Aggregation Control Protocol (LACP), formally IEEE 802.3ad. What makes an MLAG "multichassis" is that the member links terminate on two different physical switches, which coordinate to present a single control plane. Because STP sees the bonded links as one logical port, MLAG cannot create an STP loop, so it does not block any physical port or waste any bandwidth.
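How a LAG keeps every member port busy without reordering packets can be sketched as a toy flow-hashing model. Real switches hash MAC/IP/port header fields in hardware to pin each flow to one member link; the field choice and hash function below are purely illustrative:

```python
import hashlib

# Two physical ports bonded into one logical aggregated link.
MEMBER_LINKS = ["port1", "port2"]

def pick_member(src_ip, dst_ip, dst_port):
    """Pin a flow to one member link by hashing its headers.

    Every packet of a given flow hashes the same way, so it always
    takes the same link (no reordering), while different flows can
    spread across both links.
    """
    key = f"{src_ip}-{dst_ip}-{dst_port}".encode()
    digest = int(hashlib.sha256(key).hexdigest(), 16)
    return MEMBER_LINKS[digest % len(MEMBER_LINKS)]

# The same flow always lands on the same member link.
first = pick_member("10.0.0.1", "10.0.0.2", 443)
again = pick_member("10.0.0.1", "10.0.0.2", 443)
assert first == again
print(first)
```

Contrast this with STP, where one of the two links would simply sit blocked; here both carry traffic, which is where the reclaimed bandwidth comes from.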

The multichassis extension, unlike LACP itself, is not a formal standard, and so depends on the control plane within a specific vendor's box. It can be difficult to get MLAG to work in a heterogeneous environment, but as the typical data center gets most of its network switches from one vendor, this won't be a major problem.

