Server architectures have stagnated over the last few years, due to the lack of competition on the CPU side and a complacency about designs that has settled over the IT industry. The next five years will be radically different.
We are about to see some changes in the enterprise server sector that will dramatically improve performance, possibly lower cost and certainly make for some rethinking of how we deploy commercial off-the-shelf gear.
Change is already underway for server technology. Solid-state drives (SSDs) and flash memory architectures overcame the major roadblocks to architectural innovation. With faster access, the server engine has to keep up. Peripheral Component Interconnect Express (PCIe) bandwidth is taxed, and operating system interrupt handlers no longer service the drives fast enough. Nothing is sacred today. Dynamic RAM (DRAM) is under pressure. Networking must see a major uptick in performance. Data centers are automatically orchestrating everything. The data center of 2020 will not resemble today's deployments.
Enterprise servers are about to evolve significantly. In the near term, handling faster SSDs means that PCIe 3.0 -- with 8 gigatransfers per second (GT/s) data transfer rate -- is now mainstream, and PCIe 4.0 -- transferring data at 16 GT/s -- is on the horizon. Even with these boosts, the DRAM volatile storage interface to persistent, nonvolatile storage is a limiting factor in server performance, due to the latencies involved in retrieving data.
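To put the transfer rates above in more familiar terms, here is a rough sketch that converts GT/s into approximate usable bandwidth per direction. It assumes the 128b/130b line encoding used by PCIe 3.0 and 4.0 and ignores packet and protocol overhead, so real-world throughput will be somewhat lower. The function name and the lane counts chosen are illustrative only.

```python
# Approximate one-direction PCIe bandwidth derived from the transfer rate.
# Assumption: 128b/130b line encoding (PCIe 3.0 and 4.0); packet headers and
# flow-control overhead are ignored, so these are upper bounds.

ENCODING_EFFICIENCY = 128 / 130


def pcie_gbytes_per_sec(gt_per_sec: float, lanes: int) -> float:
    """Return approximate one-direction bandwidth in gigabytes per second."""
    bits_per_sec = gt_per_sec * 1e9 * ENCODING_EFFICIENCY * lanes
    return bits_per_sec / 8 / 1e9


if __name__ == "__main__":
    for name, rate in (("PCIe 3.0", 8.0), ("PCIe 4.0", 16.0)):
        for lanes in (4, 16):  # illustrative link widths
            bandwidth = pcie_gbytes_per_sec(rate, lanes)
            print(f"{name} x{lanes}: about {bandwidth:.2f} GB/s")
```

Under these assumptions, a single lane carries a little under 1 GB/s at 8 GT/s, and doubling the transfer rate doubles the figure.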
There's a broad spectrum of answers to the DRAM problem. Next-gen servers will require faster and bigger DRAM. Server architectures will change to mount DRAM in a module with the CPU, allowing for tight electrical coupling and serial interfaces. This approach, championed by the Hybrid Memory Cube (HMC) Consortium, will emerge first in smartphones and then become a major new direction in servers in the second half of 2016, perhaps with proprietary variations.
HMC makes the DRAM interface serial, somewhat like the internal structure in flash products. This permits many more independent channels to talk to the memory chips, significantly boosting bandwidth. Initially, we should see around 384 gigabytes per second with perhaps even 1 terabyte per second within five years. Tight coupling saves power, which enables more memory in the average next-gen server, so expect terabytes to become the norm.
Beyond the x86 CPU-based server architecture
The Intel versus AMD battle for the server architecture roadmap is difficult to predict. One fork in the road comes from graphics processing units (GPUs) and accelerator cores. Parallel computing makes sense for a number of applications and AMD has access to GPU technology, so we will surely see accelerated processing unit-based server architectures in the future. ARM's partnership with GPU and system-on-chip maker NVIDIA Corporation could yield servers that address big data processing with a lot of GPU power. Intel lacks serious GPU capability, but will respond with more Xeon Phi parallel co-processing product designs and extended single instruction, multiple data (SIMD) instructions.
Put these options together and throw in specialized accelerator functions, such as encryption and compression, and the result is purpose-built data center IT infrastructure products that target the local area network, wide area network or storage as well as big data applications.
All that DRAM capacity and bandwidth puts a huge strain on the link from DRAM to persistent memory within server architectures. Tightly coupled flash within the HMC architecture yields a much higher bandwidth to persistent storage, but it isn't clear this will be a flash-based system. Intel and other companies are working on resistive RAM (RRAM) and phase change memory designs that would supersede flash in this role, with access latencies in the nanosecond range. It's a good bet that these will be mainstream in five years.
With more DRAM and truly fast persistent storage, the CPU core count of server chips is bound to increase by a big factor. Future servers will reach much greater performance than today's architectures can achieve. Combine higher core counts with the migration to Docker containers and the equivalent of tenfold growth in instances per server is possible. Next-gen servers will be bigger and the CPU makers will influence a larger portion of the server's cost. The data center roadmap should change to require fewer servers as a result.
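The consolidation argument can be sketched as simple arithmetic. The workload size and the current density below are hypothetical numbers chosen for illustration; only the tenfold growth factor comes from the discussion above.

```python
import math


def servers_needed(total_instances: int, instances_per_server: int) -> int:
    """Return the number of servers required, rounding up partial servers."""
    return math.ceil(total_instances / instances_per_server)


if __name__ == "__main__":
    total_instances = 10_000   # hypothetical workload size
    density_today = 20         # hypothetical instances per server today
    growth_factor = 10         # the tenfold density growth discussed above

    today = servers_needed(total_instances, density_today)
    future = servers_needed(total_instances, density_today * growth_factor)
    print(f"Servers today: {today}, servers after consolidation: {future}")
```

The sketch ignores headroom for failover and peak load, which would raise both counts in practice.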
And say goodbye to the keyboard, mouse and video method of interfacing with servers. This might be radical for some admins who aren't accustomed to server automation and orchestration.
From next-gen servers to a next-gen data center
This intensified server performance will modify how enterprise data centers approach storage and networking. Persistent RRAM in the CPU module changes the traffic patterns to storage in radical ways. The pressure for super performance to local SSDs will lessen, but RRAM becomes a shareable resource for the server cluster.
It's time for faster and lower-latency networking to connect these revved-up resources on the data center roadmap. We can expect 25 Gigabit Ethernet (GbE) to replace 10GbE within a year. By 2017, 40GbE will be mainstream, with prices dropping for 100GbE to replace it. Software-defined networking will change switch architectures, reducing latency and automating the networking process. RDMA over Converged Ethernet is likely to beat out InfiniBand by 2020 as well.
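To see why networking has to speed up, the following sketch compares the Ethernet speeds mentioned here with the 384 gigabytes per second figure cited earlier for HMC memory. It counts how many links of each speed would be needed to match that memory bandwidth at raw line rate. Protocol overhead is ignored, and the function name is illustrative.

```python
import math


def links_needed(memory_gbytes_per_sec: float, link_gbits_per_sec: float) -> int:
    """Return how many network links match a given memory bandwidth.

    Assumption: links run at raw line rate with no protocol overhead.
    """
    memory_gbits_per_sec = memory_gbytes_per_sec * 8
    return math.ceil(memory_gbits_per_sec / link_gbits_per_sec)


if __name__ == "__main__":
    hmc_bandwidth = 384  # GB/s, the initial HMC figure cited earlier
    for speed in (10, 25, 40, 100):  # Ethernet speeds in Gbit/s
        count = links_needed(hmc_bandwidth, speed)
        print(f"{speed}GbE links to match {hmc_bandwidth} GB/s: {count}")
```

Even at 100GbE, dozens of links would be needed to match local memory bandwidth, which suggests shared memory across a cluster will remain far slower than local access.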
If one believes the marketers, hyper-converged infrastructure will be a mainstream server architecture, encompassing storage and compute within the same box, and handling network data services. This construction fits well with data sharing in both RRAM and fast SSD formats. Bulk storage will still exist, with slower SSD or hard drives in most cases, though even this type of appliance may resemble a hyper-converged server. Most likely, Fibre Channel will fade out, with data centers favoring Ethernet storage. This simplifies and shrinks the connection profile of the server.
With fast RRAM storage, a class of servers with no disk slots or PCIe card slots will surface. These could be HMC-module-sized cards plugging into a backplane, for example, allowing very high density. This could give rise to storage-focused HMC modules with many terabytes of storage onboard. The interconnect performance could be extremely high, using photonic switching, for example.