Must-know x86 processor features of 2015

Processor design philosophies are changing, and new instruction sets add speed and security, provided the software running on top knows how to take advantage of them.

The modern x86 server processor has shifted from ubiquitous brute strength to specialized efficiency for tomorrow's workloads.

Servers scale out via massively parallel computing, or scale up by integrating complete systems on the same chip. Improved chip fabrication techniques and new instruction sets mean data centers benefit from new server features and functionality.

Matching systems with workloads complicates server processor selection for IT planners. These x86 processors of 2015 will inform your next technology refresh.

The MIC processor architecture

Processor design philosophies have changed. Moore's law pushed general-purpose processors to run faster, relying on rising clock speeds. Smartphone processors pushed efficiency instead, tailoring the chip to specific tasks and stripping out unneeded transistors. Server processors like Intel's Atom family now adopt the smartphone approach, addressing task-specific roles. Reduced instruction set computing (RISC) processors like ARM's Cortex-A15 reference architecture also target efficiency for data centers.

Because each core is individually less powerful, performance relies on scaled-out, parallel processors, as in the HP Moonshot microserver. A single HP ProLiant m800 server cartridge used in the Moonshot chassis offers eight ARM cores per processor, and the chassis holds up to 45 cartridges. This suits Web servers and other scalable uses.

Intel's take on processors for massively parallel computing tasks is the many integrated core (MIC) architecture. Intel's MIC architecture matches the processor to the computing task for better performance at lower energy use. The 2015 MIC design promises over 50 cores on a single chip in the 14-nm Knights Landing (code name) Xeon Phi processor.

Intel's Phi coprocessor modules integrate MIC components onto existing servers via a PCI-e connection. This enables parallel computing with 61 cores and 16 GB of local memory.

The scope of applications capable of using this massively parallel computing architecture is still small. Consider adopting the Phi coprocessor card for scientific computing, modeling, financial analysis, simulation and other compute-intensive tasks.
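The payoff of a many-core part like Phi comes from workloads that split into independent tasks. A minimal sketch of that pattern, using Python's standard `multiprocessing` module (the `simulate` kernel here is a hypothetical stand-in, not Phi-specific code):

```python
# Illustrative sketch: an embarrassingly parallel workload fanned out
# across worker processes -- the pattern a many-core chip rewards.
from multiprocessing import Pool

def simulate(seed):
    """Stand-in for a compute-intensive kernel (e.g. one Monte Carlo path)."""
    x = seed
    for _ in range(10_000):
        x = (x * 1103515245 + 12345) % 2**31  # simple deterministic LCG
    return x

def run_parallel(n_tasks, n_workers=4):
    # Each task is independent, so throughput scales with the worker
    # count until the chip runs out of cores.
    with Pool(processes=n_workers) as pool:
        return pool.map(simulate, range(n_tasks))

if __name__ == "__main__":
    print(len(run_parallel(8)))
```

Because the tasks share no state, the same code scales from a handful of workers on a laptop to dozens of cores on a coprocessor; workloads with heavy inter-task communication scale far less cleanly.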

SoC architectures

System-on-chip (SoC) processors integrate CPU, memory, timing, external interfaces and other components onto a single low-power device. SoCs provide a computer without a full, conventional motherboard.

Originally used in embedded systems, SoCs now appear as task-specific processors for storage or communication devices. One example is Intel's Quark X1000 SoC family: at only 2.2 watts, the 32-bit Quark X1021D supports up to 2 GB of memory and provides PCI-e 2.0 and USB 2.0 interfaces. As a bare-bones system for specific tasks, Quark omits almost all special processor features, such as SpeedStep Technology and other enhancements. Intel's Atom processors slated for networking uses take a similar task-specific approach: the 64-bit Atom C2308 offers two cores with two threads each, supports up to 16 GB of memory along with USB and PCI-e, and includes the AES-NI instruction set for native AES data encryption, described below.

SoCs will become more powerful in 2015. Intel is releasing its Xeon D family for enterprise microservers and embedded systems. Xeon D processors mix Xeon capabilities with Atom energy efficiency: up to eight cores, dual-channel DDR3 and DDR4 memory, virtualization support, and reliability, availability and serviceability features, along with instruction set extensions for floating point, multi-threading and more. The family draws 20 to 45 watts per processor, depending on the number of cores.

Instruction sets

The finer the chip fabrication process, the lower the latencies and the faster the processor can operate. Current chips use a 14 nm fabrication process; a 5 nm process is expected by 2020.

This influences instruction sets. Each new instruction requires up to tens of thousands of new transistors, and each transistor consumes energy and gives off heat. As chip generations move to smaller fabrication processes, new server processor capabilities become practical at lower energy use.

We've seen instruction set extensions like streaming single-instruction, multiple-data (SIMD) extensions 4 (known as SSE4) improve floating-point math and logic performance, while Intel's VT and AMD's AMD-V extensions support virtualization functions. As a new generation of 14 nm processors moves into production and deployment, noteworthy instruction set extensions will run on them.
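Software can only exploit these extensions on processors that actually implement them. On Linux, the kernel exposes each core's supported extensions as a "flags" line in /proc/cpuinfo; a quick, hedged sketch of checking for them (the sample text below is illustrative, not from a real machine):

```python
# Sketch: detect instruction-set extensions from /proc/cpuinfo-style text.
# Linux-specific convention; flag names (sse4_2, avx2, aes, vmx) follow
# the kernel's naming for SSE4.2, AVX2, AES-NI and Intel VT-x.
def parse_cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()

# Illustrative sample; on a live system, read open("/proc/cpuinfo").read()
SAMPLE = "model name : Example Xeon\nflags : fpu sse sse4_2 avx avx2 aes vmx\n"

flags = parse_cpu_flags(SAMPLE)
print("aes" in flags, "avx2" in flags)  # True True for this sample
```

Checking flags up front avoids deploying binaries compiled for AVX2 or AES-NI onto hosts that would fault on those instructions.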

Parallel processing techniques will crunch more mathematical operations with every clock cycle. Intel's Initial Many Core Instructions (IMCI) set executes 16 single-precision (or eight double-precision) operations in every clock cycle. The Fused Multiply-Add (FMA) instructions, which combine a multiply and an add into one operation, double that to 32 single-precision (or 16 double-precision) floating-point operations per clock cycle.
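Those per-cycle figures translate directly into theoretical peak throughput. A back-of-envelope sketch (the core count and clock speed below are illustrative placeholders, not vendor-published specifications):

```python
# Back-of-envelope peak throughput from per-cycle FLOP figures.
# 16 single-precision lanes x 2 ops (multiply + add, fused by FMA)
# = 32 FLOPs per cycle per core.
def peak_gflops(cores, clock_ghz, flops_per_cycle):
    """Theoretical peak in GFLOPS: cores x GHz x FLOPs-per-cycle."""
    return cores * clock_ghz * flops_per_cycle

# Hypothetical many-core part: 60 cores at 1.0 GHz with FMA.
print(peak_gflops(cores=60, clock_ghz=1.0, flops_per_cycle=32))  # 1920.0
```

Real applications rarely approach this peak; memory bandwidth and non-vectorizable code usually dominate, which is why the re-coding caveat below matters.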

Advanced Vector Extensions (AVX) increase the SIMD register size to 256 bits while remaining compatible with older SSE instructions and the software that uses them. AVX increases parallelism and supports floating-point math in graphics, science and finance applications. AVX2 expands most commands to 256 bits and adds FMA support. On Intel's Phi, AVX-512 extensions will expand most commands to 512 bits with a new prefix scheme, yielding significant math and logic improvements -- once software is re-coded and re-compiled for the new processors.
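The register widths determine how many values one instruction touches: a 256-bit AVX register holds eight 32-bit floats, a 512-bit AVX-512 register holds sixteen. A conceptual Python sketch of that lane model (real SIMD is a single hardware instruction, not a loop; this only emulates the grouping):

```python
# Conceptual model of SIMD lanes: one "instruction" applies the same
# operation to every element packed in a register.
def lanes(register_bits, element_bits):
    """How many elements fit in one SIMD register."""
    return register_bits // element_bits

def simd_add(a, b, width):
    """Emulate lane-wise addition, processing one 'register' per step."""
    out = []
    for i in range(0, len(a), width):           # each pass = one vector op
        out.extend(x + y for x, y in zip(a[i:i + width], b[i:i + width]))
    return out

print(lanes(256, 32), lanes(512, 32))  # 8 16
print(simd_add([1, 2, 3, 4], [10, 20, 30, 40], width=lanes(256, 32)))
```

Doubling the register width halves the number of vector instructions a loop needs, which is the source of AVX-512's gains once compilers emit the wider operations.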

Encryption imposes computing overhead, particularly when performed on the fly. Advanced Encryption Standard-New Instructions (AES-NI) adds instructions to some Xeon processors that handle encryption in hardware. Expect AES-NI adoption to grow in 2015.

Trusted Execution Technology (TXT) instructions tie workloads to the underlying hardware platform to boost security and prevent malicious activity. Given the increasing attention to hacking and computer security, expect more aggressive implementation of TXT hardware functions in future servers.

Transactional synchronization extensions (TSX), which add granular hardware transactional memory support to load and store instructions, should see rapid adoption in 2015. TSX can speed up multi-threaded software operations.
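The idea behind transactional memory is optimistic concurrency: run a critical section without locks, then commit only if no other thread touched the data, retrying on conflict. A conceptual Python sketch of that abort-and-retry model (this emulates the semantics in software; actual TSX is issued as CPU instructions by compilers and runtimes, not called from Python):

```python
# Conceptual sketch of optimistic, retry-on-conflict updates -- the
# model behind hardware transactional memory, emulated in software.
import threading

class TransactionalCell:
    def __init__(self, value):
        self.value = value
        self.version = 0                  # bumped on every successful commit
        self._lock = threading.Lock()     # guards only the commit check

    def transact(self, fn):
        while True:                       # abort-and-retry loop, like TSX
            snap_version, snap_value = self.version, self.value
            new_value = fn(snap_value)    # speculative work, no lock held
            with self._lock:
                if self.version == snap_version:   # no conflicting writer
                    self.value = new_value
                    self.version += 1
                    return new_value
                # else: conflict detected -- abort and retry

cell = TransactionalCell(10)
print(cell.transact(lambda v: v + 5))  # 15
```

The win over a conventional lock is that the expensive work in `fn` runs concurrently across threads, and the lock is held only for the brief commit check; contention costs a retry instead of serializing every transaction.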

Stephen J. Bigelow is a senior technology editor at TechTarget, covering data center and virtualization technologies. He acquired many CompTIA certifications in his more than two decades writing about the IT industry.
