With the advent of virtualization, server memory -- and server memory performance -- emerged as a critical computing resource.
Use, features and configuration options are evolving, so data center hardware managers should get to know today's server memory types as well as they know compute.
Protect server reliability
Memory holds the image and data for every virtual machine, so memory reliability is essential for enterprise servers. A memory failure can corrupt the VM occupying that memory space, resulting in data loss or even a more dramatic server fault. Several server memory types help contend with memory defects.
Error correction code (ECC) is an established technique to locate and correct errors in memory contents. ECC applies a mathematical algorithm to some amount of data in memory, such as a single 64-bit memory word, calculates a code for that data and stores the code in a reserved memory space. When the server reads memory contents, it recalculates the ECC and compares it to the ECC stored in memory. If the two match, the data is assumed valid. If not, the ECC algorithm can determine which single bit is wrong and correct it. ECC also detects -- though it cannot fix -- two-bit errors.
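The single-bit-correct, double-bit-detect behavior described above can be illustrated with a scaled-down sketch. Server ECC typically protects a 64-bit word with eight check bits; the toy below assumes only four data bits and four check bits (an extended Hamming code) so the mechanics stay readable. The function names `encode` and `decode` are illustrative, not part of any vendor's implementation.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit codeword (list of 0/1 values).

    Index 0 holds an overall parity bit; indexes 1, 2 and 4 hold Hamming
    check bits; indexes 3, 5, 6 and 7 hold the data bits.
    """
    code = [0] * 8
    code[3] = nibble & 1
    code[5] = (nibble >> 1) & 1
    code[6] = (nibble >> 2) & 1
    code[7] = (nibble >> 3) & 1
    # Each check bit covers every position whose index has that bit set.
    for p in (1, 2, 4):
        for i in range(1, 8):
            if i != p and (i & p):
                code[p] ^= code[i]
    # Overall parity makes the XOR of all eight bits zero.
    for i in range(1, 8):
        code[0] ^= code[i]
    return code


def decode(code):
    """Return (nibble, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0   # XOR of the indexes of all set bits; 0 for a valid word
    overall = 0    # XOR of all eight bits; 0 for a valid word
    for i, bit in enumerate(code):
        overall ^= bit
        if bit:
            syndrome ^= i
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: treat as a single-bit error at `syndrome`
        # (syndrome 0 means the overall parity bit itself flipped).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even parity: two bits flipped.
        return None, "uncorrectable"
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return nibble, status


word = encode(0b1011)
word[6] ^= 1                  # simulate a single-bit memory fault
print(decode(word))           # (11, 'corrected')
word[2] ^= 1                  # a second fault in the same word
print(decode(word))           # (None, 'uncorrectable')
```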
Advanced ECC expands the ECC mechanism by distributing memory reads across multiple independent ECC devices, rather than a single one. Advanced ECC detects and corrects single- and double-bit errors and detects memory device failures.
Single device data correction (SDDC) uses a mix of ECC architectures to detect and correct multi-bit errors up to four bits; the technology also identifies and shuts down failed memory chips on a dual in-line memory module (DIMM). SDDC can remove a failed chip or the entire memory module from the server's memory map, allowing the server to recover its memory contents to a spare module. High-end server manufacturers use trademarked variations of this to identify, shut down and recover from memory faults, such as IBM's Chipkill, Hewlett Packard Enterprise's Advanced ECC and Chipspare, or Intel-based Lockstep memory.
Some server memory types preserve memory integrity at the expense of performance. A server configured to implement a high-reliability profile attempts to mitigate faults associated with bus frequencies (speed), temperatures, voltage levels and memory refresh rates. The server lowers frequencies and voltages, which reduces stress on the memory components, waste heat and failure rates.
If you're refreshing data center servers, you'll find more server memory options available now that use the memory module's serial presence detect (SPD) space to record the number and location of correctable memory errors on each module. SPD tracks error rates and looks for modules that might experience a dramatic increase in correctable errors. Technicians can implement preemptive action, such as memory sparing or workload migration to other servers, and then replace the problematic DIMM. A similar technique, memory page retire, tracks recoverable memory faults to memory pages, or regions. Once correctable errors become excessive, the system retires the problematic page and blocks it until the afflicted module is replaced.
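The page-retire logic described above amounts to counting correctable errors per page and blocking a page once a limit is crossed. The following is a minimal sketch under stated assumptions: the class name `PageRetireTracker`, the method names and the threshold of three errors are all hypothetical, and real firmware and operating systems apply their own policies.

```python
from collections import Counter


class PageRetireTracker:
    """Toy model of memory page retirement driven by correctable errors."""

    def __init__(self, threshold=3):
        self.threshold = threshold   # assumed limit; real systems differ
        self.counts = Counter()      # correctable errors seen per page
        self.retired = set()         # pages blocked from further use

    def record_correctable_error(self, page):
        """Log one correctable error; return True if the page is now retired."""
        if page in self.retired:
            return True
        self.counts[page] += 1
        if self.counts[page] >= self.threshold:
            self.retired.add(page)
        return page in self.retired


tracker = PageRetireTracker(threshold=3)
for _ in range(3):
    retired = tracker.record_correctable_error(page=0x1A2B)
print(retired)                   # True
print(sorted(tracker.retired))   # [6699]
```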
Server memory's role in performance
Data center hardware buyers should pursue DIMMs with identical rank, capacity and speed in every channel to achieve best memory and system performance. With DIMMs of differing capacity, ensure all DIMMs accommodate the same rank and speed, and that all channels use the same mix of sizes. Every populated channel should have a logically identical DIMM installed in the same locations.
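As a rough illustration of that population rule, the sketch below checks whether every populated channel carries the same DIMMs in the same slot positions. The data layout (a list of channels, each a list of `(capacity_gb, ranks, speed_mts)` tuples, with `None` for an empty slot) and the function name `channels_balanced` are assumptions made for this example.

```python
def channels_balanced(channels):
    """Return True if all populated channels hold identical DIMMs slot by slot."""
    # Ignore channels with no DIMMs installed at all.
    populated = [ch for ch in channels if any(slot is not None for slot in ch)]
    if not populated:
        return True
    reference = populated[0]
    return all(ch == reference for ch in populated)


# Two mixed-capacity DIMMs per channel, same mix and order in every channel.
good = [
    [(32, 2, 2133), (16, 2, 2133)],
    [(32, 2, 2133), (16, 2, 2133)],
]
# Second channel swaps the slot order, so the layout is no longer identical.
bad = [
    [(32, 2, 2133), (16, 2, 2133)],
    [(16, 2, 2133), (32, 2, 2133)],
]
print(channels_balanced(good))   # True
print(channels_balanced(bad))    # False
```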
To appreciate server-grade memory configuration, it's best to understand memory geometry and characteristics. Rank refers to the organization of the DIMM's constituent memory chips and how they are accessed at the hardware level. For example, a DIMM with eight 8-bit chips has one rank, and a DIMM with eight such chips on each side has two ranks.
Memory module capacity is directly related to the constituent memory chips on the module. Capacity is usually noted as chip depth by chip width by ranks. For example, a DIMM with four ranks of chips that are 128M locations deep and 16 bits wide has a total memory capacity of 128 x 16 x 4 = 8,192 Mbits, or 1 GB. DIMMs are organized into channels managed by the server's memory controllers.
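The arithmetic in that example can be reproduced with a few lines of code. This sketch simply follows the depth-by-width-by-ranks notation used above; the function names are illustrative, and depth is expressed in millions of locations so the product comes out in megabits.

```python
def dimm_capacity_mbits(depth_m, width_bits, ranks):
    """Capacity in megabits using the depth x width x ranks notation."""
    return depth_m * width_bits * ranks


def mbits_to_gbytes(mbits):
    """Convert megabits to gigabytes (8 bits per byte, 1,024 MB per GB)."""
    return mbits / 8 / 1024


capacity = dimm_capacity_mbits(depth_m=128, width_bits=16, ranks=4)
print(capacity)                    # 8192
print(mbits_to_gbytes(capacity))   # 1.0
```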
DIMMs are also classified by speed: the data rate of the bus connecting the memory and processors. The latest enterprise-class servers with DDR4 DIMMs hit memory speeds of 1,866 to 2,133 million transfers per second (MT/s), while older systems with DDR3 DIMMs might reach only 1,600 MT/s or 1,333 MT/s.
The server's processor must support the desired memory frequency. Older or less-expensive server memory types might limit memory operation to a slower frequency, hurting performance.
More isn't always better
Adding server memory capacity can be as simple as adding more DIMMs, but too many DIMMs can impair memory performance by reducing frequency. For example, a server operating with two DIMMs per channel hits 2,133 MT/s, but drops to 1,866 MT/s when a third DIMM is added to each channel. Where possible, use fewer DIMMs with larger capacities. Load-reduced DIMMs (LRDIMMs) offer maximum capacity and performance.
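The trade-off can be sketched as a small planning helper. The speed table below encodes only the example figures from the paragraph above, plus the assumption that a single DIMM per channel also runs at 2,133 MT/s; actual limits depend on the processor and platform, so check the server's documentation. The names `SPEED_BY_DPC`, `dimms_per_channel` and `speed_for` are hypothetical.

```python
import math

# DIMMs per channel -> memory speed in MT/s.
# Values for 2 and 3 come from the example above; the value for 1 is assumed.
SPEED_BY_DPC = {1: 2133, 2: 2133, 3: 1866}


def dimms_per_channel(target_gb, channels, dimm_gb):
    """DIMMs needed in each channel to reach at least target_gb in total."""
    return math.ceil(target_gb / (channels * dimm_gb))


def speed_for(target_gb, channels, dimm_gb):
    """Expected speed for the layout, or None if the table has no entry."""
    return SPEED_BY_DPC.get(dimms_per_channel(target_gb, channels, dimm_gb))


# Reaching 384 GB on four channels with 32 GB versus 64 GB DIMMs.
print(dimms_per_channel(384, 4, 32), speed_for(384, 4, 32))   # 3 1866
print(dimms_per_channel(384, 4, 64), speed_for(384, 4, 64))   # 2 2133
```

In this example the larger DIMMs keep each channel at two modules and therefore at the higher speed, which is the point of choosing fewer, larger DIMMs.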
Select a resilient memory alternative to advanced ECC if the server supports it. Advanced ECC ties multiple memory channel host controllers together to support SDDC for large data width (x8) memory chips. In some cases, it leaves some channels unavailable, and they cannot be populated. The interaction of multiple memory controllers can also hamper memory performance. Servers such as the Dell PowerEdge R710 offer an alternative optimized mode to run all memory channels and memory controllers independently, but this may prohibit DIMMs with memory geometries larger than x4.
To advance data center technology and support more simultaneous virtual machines, server buyers need to understand these ways to boost the performance of memory devices.