Resolve a network bottleneck with these techniques

Data center network performance problems can disappear with fixes that range from a simple reorganization to a strategic overhaul.

As IT adds I/O- and communication-intensive apps to servers, the strain on data center networking increases drastically.

Installations built around 1 Gigabit Ethernet (GbE) are no longer sufficient, and in many instances the replacement -- 10 GbE -- is also a network bottleneck.

There's more than one way to break a logjam of network traffic, from inexpensive, quick fixes to strategic capital investments and restructuring. New technology is set to boost networking capacity, and network fabric adoption has solved backbone performance bottlenecks with multi-lane Ethernet. In some cases, a simple operational reorganization relieves congestion.

Storage with servers

Changes in data flow are a low-cost, quick fix for network bottlenecks. One example is the flow from networked storage to servers.

Google colocates networked storage nodes with multiple servers in the same rack, then arranges for apps to use data from the nearby storage. This approach allows you to add inexpensive ports to the in-rack switches, or even use two switches, which allows dual Ethernet ports on the nodes. It also makes it easy to put four or more ports on the storage nodes, removing a network bottleneck for data moving in and out. Almost all of the data flow in the rack goes through top-of-rack switching -- at low latency -- and backbone traffic is dramatically reduced.
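
As a rough illustration of that colocation idea, the Python sketch below picks a replica in the requesting server's own rack before falling back to one across the backbone. The rack map, node names and replica list are hypothetical examples, not Google's actual scheme.

# Minimal sketch of rack-aware data placement (hypothetical topology).
RACK_OF = {                      # node -> rack it lives in
    "server-a1": "rack-a", "server-b1": "rack-b",
    "storage-a1": "rack-a", "storage-a2": "rack-a",
    "storage-b1": "rack-b",
}

REPLICAS = {                     # object -> storage nodes holding a copy
    "customer-table": ["storage-a1", "storage-b1"],
    "clickstream-log": ["storage-a2", "storage-b1"],
}

def pick_replica(obj: str, client: str) -> str:
    """Prefer a replica in the client's rack (top-of-rack traffic only);
    fall back to any replica, which crosses the backbone."""
    rack = RACK_OF[client]
    local = [n for n in REPLICAS[obj] if RACK_OF[n] == rack]
    return (local or REPLICAS[obj])[0]

print(pick_replica("customer-table", "server-a1"))   # storage-a1, stays in rack-a
print(pick_replica("clickstream-log", "server-a1"))  # storage-a2, stays in rack-a
print(pick_replica("customer-table", "server-b1"))   # storage-b1, stays in rack-b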

Databases

Databases are different. The most efficient models today use a large pool of dynamic random access memory (DRAM) dual in-line memory modules to create an in-memory database. Ideally, the IT organization buys a tailored new server fleet -- some of which can hold as much as 6 TB of DRAM -- although older servers work too.
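
For a small, runnable taste of the in-memory principle -- a sketch only, not the fleet-scale architecture described here -- the snippet below uses Python's built-in sqlite3 module with a ":memory:" database, so every query is served from DRAM rather than from networked storage.

# Tiny in-memory database example; production systems do this at TB scale.
import sqlite3

db = sqlite3.connect(":memory:")          # database held entirely in RAM
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
db.executemany("INSERT INTO orders (total) VALUES (?)",
               [(19.99,), (5.25,), (102.40,)])
(total,) = db.execute("SELECT SUM(total) FROM orders").fetchone()
print(f"revenue: {total:.2f}")            # served from memory, no network I/O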

An adjunct to the in-memory architecture is to include fast solid-state drive (SSD) storage in the servers. The SSDs can serve as staging buffers for the DRAM or as a networked storage resource. Both approaches reduce network load, but 10 GbE networks are probably too slow to keep up with systems less than one year old, even with two ports per server.
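
A minimal sketch of the staging-buffer idea follows: a block is copied once from networked storage to a local SSD path, then served from the SSD (and a small DRAM cache) on later reads, so repeat traffic stays off the network. The mount points and block names are assumptions for illustration only.

# Sketch: local SSD as a staging buffer in front of networked storage.
import os
import shutil

NETWORK_PATH = "/mnt/netstore"       # networked storage mount (assumed)
SSD_PATH = "/mnt/local-ssd/cache"    # local SSD staging area (assumed)
_dram_cache: dict[str, bytes] = {}

def read_block(name: str) -> bytes:
    if name in _dram_cache:                      # hottest data: DRAM
        return _dram_cache[name]
    staged = os.path.join(SSD_PATH, name)
    if not os.path.exists(staged):               # first touch: one network copy
        os.makedirs(SSD_PATH, exist_ok=True)
        shutil.copyfile(os.path.join(NETWORK_PATH, name), staged)
    with open(staged, "rb") as f:                # later reads stay on the SSD
        data = f.read()
    _dram_cache[name] = data
    return data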

Virtualization

Virtualization is common in x86 server clusters, bringing its own network bottleneck problems. Boot storms are infamous for saturating networks; even in steady state operation, creating instances adds a load, as gigabytes of data shift from networked storage to the server.

In this situation, transfer allegiance from traditional virtualization to the container model. This means giving up the flexibility to create an instance with any OS, but that's usually not an issue.

The container approach, which reduces network traffic, requires every instance on a server to use the same (container-supporting) OS. The DRAM saved by sharing a single OS and app stack allows roughly double the instance count, and startup is quick. However, if the apps in the instances are network- or I/O-intensive, the same stresses can occur.
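
The back-of-the-envelope arithmetic below shows where a "double the instance count" figure can come from when one OS is shared across instances. The per-instance memory footprints are illustrative assumptions, not measurements.

# Rough VM-vs-container capacity estimate; all figures are assumed.
HOST_DRAM_GB = 256
GUEST_OS_GB  = 1.5    # assumed per-VM guest OS + hypervisor overhead
APP_GB       = 1.5    # assumed app + runtime footprint per instance
SHARED_OS_GB = 2.0    # assumed single host OS serving all containers

vms        = int(HOST_DRAM_GB // (GUEST_OS_GB + APP_GB))
containers = int((HOST_DRAM_GB - SHARED_OS_GB) // APP_GB)
print(f"VMs:        {vms}")          # ~85
print(f"Containers: {containers}")   # ~169, roughly double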

Technical fixes of the future

Inter-switch connection using 40 GbE (quad-lane) links is already common, and 100 GbE is emerging as an alternative to 10 GbE. This trend is getting a boost from efforts to deploy 25 GbE links, which will allow four-lane, relatively inexpensive 100 GbE links for storage devices and inter-switch connections.

With 25 GbE, data centers can use existing cabling in the rack and between switches. Unfortunately, you cannot retrofit existing adapters; upgrading a server means a new PCIe card or a new node. Even so, replacing top-of-rack switches to create a 10/100 GbE environment as soon as it is economically possible will substantially boost overall cluster performance.

This new technology is rapidly moving into the market, reflecting the needs of cloud service providers. The projects to deliver 25 GbE links are generally less than 12 months old, and IEEE ratification is set for a record schedule. Production network interface cards and switches are expected in the second half of 2015.

There is also a 50 GbE dual-lane option in the pipeline. The faster speed could suit larger servers running in-memory big data analytics, probably with at least two links per server. With the trend for these servers and high-performance computing systems to have massive core-count CPUs or GPUs, expect data starvation to be an issue, along with the time it takes to load terabytes of memory with data.
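
To put rough numbers on that loading problem, the short calculation below estimates how long it takes to fill a 6 TB in-memory server over dual links at various Ethernet speeds, assuming about 80% of line rate is usable for bulk transfer (an assumption, not a spec).

# Worked example: memory-load time vs. link speed (illustrative figures).
TB = 1e12  # bytes

def load_time_minutes(capacity_tb: float, link_gbps: float, links: int = 2,
                      efficiency: float = 0.8) -> float:
    usable_bytes_per_sec = link_gbps * 1e9 / 8 * links * efficiency
    return capacity_tb * TB / usable_bytes_per_sec / 60

for gbps in (10, 25, 50, 100):
    print(f"{gbps:>3} GbE x2: {load_time_minutes(6, gbps):6.1f} min to load 6 TB")
# roughly 50 min at 10 GbE, down to about 5 min at 100 GbE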

Software-based fixes also address bottlenecks. Software-defined networking, for example, can spread the load that tasks place on backbone lines across servers.
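
One simple technique an SDN controller could program into switches is spreading traffic across several backbone uplinks by hashing flow identifiers, ECMP-style. The sketch below illustrates the idea; the uplink names and addresses are made up for the example.

# Sketch of hash-based flow spreading across backbone uplinks.
import hashlib

UPLINKS = ["uplink-1", "uplink-2", "uplink-3", "uplink-4"]

def pick_uplink(src: str, dst: str, dst_port: int) -> str:
    """Keep each flow on one path (avoiding packet reordering) while
    spreading different flows across all available uplinks."""
    key = f"{src}:{dst}:{dst_port}".encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:4], "big")
    return UPLINKS[digest % len(UPLINKS)]

print(pick_uplink("10.0.1.15", "10.0.9.40", 443))
print(pick_uplink("10.0.1.16", "10.0.9.40", 443))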

With storage and architecture performance growing rapidly, networking is going to be at the forefront of innovation for the next decade, so evolution should be rapid.

About the author:
Jim O'Reilly is a consultant focused on storage and cloud computing. He was vice president of engineering at Germane Systems, where he created ruggedized servers and storage for the U.S. submarine fleet. He has also held senior management positions at SGI/Rackable and Verari; was CEO at startups Scalant and CDS; headed operations at PC Brand and Metalithic; and led major divisions of Memorex-Telex and NCR, where his team developed the first SCSI ASIC, now in the Smithsonian.

Next Steps

Identify the storage bottlenecks that are strangling app performance

This was last published in March 2015

Join the conversation

What causes bottlenecks on your data center network?

For us, the biggest issue is the parallelization of testing runs and the occasional latency that causes runs to be marked as failed, only to have them pass without issue when we rerun the failed test cases. Since our CI server relies on the ability to complete these runs with a high level of accuracy, the delays and latency can indeed be frustrating.

Due to the very large size of the files exchanged daily within my enterprise, we see bottlenecks at the endpoint storage locales. We have found these to be caused by the slow nature of the hard drives. We have switched to SSD endpoints and have noticed considerably less bottlenecking within our data center network. Other causes seem to be overuse by mobile users accessing the network.

If loading data or OSes over Ethernet is an issue, maybe another look at your data center is needed, with better routes to the data.
