Intel's Haswell-based E7 version 3 processor competes with IBM's POWER8 systems for large, scale-up data center workloads, which used to be the sole domain of mainframes.
The E7 processors target traditional mainframe workloads: online transaction processing (OLTP), big data business intelligence (BI) and scientific simulations. These business-critical applications crunch a lot of data, require high I/O throughput and are difficult to segment across separate machines. E7-based servers also consolidate virtualized x86 infrastructures onto a smaller footprint.
The E7 v3 uses the same core and internal dual ring interconnect as the E5 v3, but handles different data center workloads. The E5 series appears primarily in two-socket, scale-out, cloud-native servers. Intel E7 v3's feature set supports scale-up, high-memory, mission-critical workloads. Servers based on the E7 v3 offer concentrated compute in primarily four- to eight-socket systems (the processor can go into systems with up to 32 sockets) with 6 to 12 TB of memory across 96 or 192 memory sockets, respectively.
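As a quick sanity check on those memory figures, the sketch below works out the DIMM size each configuration implies. It assumes that "respectively" pairs 6 TB with 96 memory sockets and 12 TB with 192, and that 1 TB equals 1,024 GB; the function name is illustrative only.

```python
# Capacity and slot counts are taken from the article; the pairing of
# 6 TB with 96 slots and 12 TB with 192 slots is an assumption.
GB_PER_TB = 1024  # assumption: binary terabytes


def dimm_size_gb(total_tb: int, slots: int) -> float:
    """Return the per-DIMM capacity needed to reach total_tb across slots."""
    return total_tb * GB_PER_TB / slots


configs = [(6, 96), (12, 192)]
for total_tb, slots in configs:
    size = dimm_size_gb(total_tb, slots)
    print(f"{total_tb} TB across {slots} slots -> {size:.0f} GB per DIMM")
```

Under those assumptions, both configurations come out to the same DIMM size, so the larger system doubles capacity by doubling the slot count rather than by using denser modules.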
Big workload options
Data center managers typically refresh four- to eight-socket business-critical systems every five years. The E7 v3 could replace Intel Xeon 7400-based servers and IBM Power servers for these workloads. Intel estimates, for example, that one rack of E7 v3 servers will replace 10 racks of circa-2010 Xeon 7400-series servers for OLTP workloads.
Enterprises also compare Intel E7 v3 with IBM POWER8 servers. According to as-yet-unpublished Intel benchmarking, the high-end E7 v3 provides roughly equivalent performance to the POWER8 system at about one-tenth the total cost of ownership, factoring in initial CapEx, power and cooling expenses, and software plus support licenses.
IBM optimized two POWER8 configurations for SAP HANA, which should compete well with a loaded E7 8000 series server. IBM has not yet published benchmarks for the two systems: one with 24 cores and 1 TB of memory, the other with 40 cores and 2 TB.
The E7 v3 processor core
Intel E7 v3 models sport up to 18 cores sharing 45 MB of last-level cache. The E7 v3 release completes Intel's migration to the Haswell architecture. The processor adds features to improve power management and I/O throughput, along with memory performance on DDR4. It also includes new transaction-related and crypto-acceleration features, and improves system resiliency through Run Sure and the Machine Check Architecture (MCA).
Software licenses, such as those for OLTP software with high per-core fees, sometimes cost much more than the underlying hardware. For those cases, the E7 line includes segment-optimized processors, which trade off core count for CPU frequency and/or power budget, so buyers who pay per core can choose fewer, faster cores.
Big, consolidated systems running big data applications will rely on a set of reliability, availability and serviceability (RAS) features. These include memory mirroring and sparing, recovery from parity errors with DDR4 memory, and circuitry that allows firmware to intercept and handle corrected and uncorrected error events.
The E7 v3 enhances transaction throughput via additions and updates to Intel's Transactional Synchronization Extensions (TSX), which speed multithreaded database applications using hardware lock elision. A base set of TSX functions was included with the E5 v3 processors, but later disabled due to unspecified bugs. The Haswell E7 fixes and improves the TSX feature set, delivering fine-grained locking performance from coarse-grained locking code. This provides up to six times greater OLTP throughput on workloads such as SAP HANA.
Flash in the pan
Creative new flash memory system designs and processor interfaces may dampen demand for Intel E7 v3 high-memory, scale-up systems.
In-memory databases offer the requisite performance for analytics workloads operating on large data sets -- at a hefty cost. A 1 TB system, using 16 GB DDR4 server DIMMs that run about $200 each, carries roughly $13,000 of RAM. In contrast, a 960 GB enterprise solid-state drive costs less than $700. This roughly 17:1 difference in price per gigabyte is the driving force behind innovative new flash storage designs and interfaces.
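The arithmetic behind those figures can be reproduced in a few lines. This is a minimal sketch that uses only the prices quoted above, treats the "less than $700" SSD price as exactly $700 (an upper bound) and assumes 1 TB equals 1,024 GB; the variable names are illustrative.

```python
# Prices and capacities come from the article. The SSD price is an
# upper bound, so the computed ratio is a lower bound on the real gap.
GB_PER_TB = 1024          # assumption: binary terabytes

dimm_gb = 16              # capacity of one DDR4 server DIMM
dimm_price = 200          # approximate price per DIMM, in dollars
ssd_gb = 960              # capacity of the enterprise SSD
ssd_price = 700           # SSD costs "less than $700"

dimm_count = GB_PER_TB // dimm_gb        # DIMMs needed for 1 TB
ram_cost = dimm_count * dimm_price       # total RAM cost for 1 TB

ram_per_gb = dimm_price / dimm_gb        # dollars per GB of RAM
ssd_per_gb = ssd_price / ssd_gb          # dollars per GB of flash
ratio = ram_per_gb / ssd_per_gb          # RAM-to-flash price ratio per GB

print(f"{dimm_count} DIMMs, ${ram_cost:,} of RAM for 1 TB")
print(f"RAM ${ram_per_gb:.2f}/GB vs. flash ${ssd_per_gb:.2f}/GB")
print(f"Price ratio per GB: about {ratio:.0f}:1")
```

The 17:1 figure holds on a per-gigabyte basis. Comparing the raw totals instead (about $12,800 against $700) gives a slightly higher ratio because the SSD holds a little less than 1 TB.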
IBM connects flash to its POWER processors through the high-speed, low-latency Coherent Accelerator Processor Interface (CAPI), which makes flash look and perform like internal RAM. At the OpenPOWER Summit 2015, Redis Labs showed comparative results from a large NoSQL app: A system with 90% CAPI flash provided virtually identical performance (200,000 IOPS, sub-millisecond latency) to a 100% in-memory database. The flash-based system was also more than 70% cheaper.
At EMC World 2015, EMC demonstrated a DSSD rack-scale PCIe flash product executing Hadoop queries for a typical analytical app. On synthetic performance benchmarks, the flash array nearly matched native RAM speeds.
About the author:
Kurt is an engineer and technologist whose experience is both broad and deep, designing and building digital systems ranging from sub-micron transistors to Web-scale infrastructure. He now applies his knowledge and skills in R&D and IT architecture to analysis, consulting and communications.