In my tip Using virtualization to reinvent high-performance computing on Linux, I called out a range of applications and benefits that virtualization can bring to high-performance computing (HPC). But the question is, despite these compelling cases, why don't we see more pervasive use of virtualization in HPC?
Well, you may have heard this: There ain't no such thing as a free lunch. In some cases, virtualization technology may not (yet) meet legacy HPC requirements; in others, HPC systems providers and deployers are not comfortable departing from familiar (and expensive) technology acquisition paths and roadmaps.
This tip outlines a number of perceived roadblocks to leveraging virtualization and explains how virtualization can actually be a very good fit for HPC.
Virtualization overhead and increased latency
HPC architects have focused the greater part of their efforts on optimizing hardware to achieve maximum computing throughput. Investments have included heavy parallelization of processing units and hierarchical memory designs with high-speed interconnects to keep those CPUs fully utilized. As such, defenders of traditional HPC attack virtualization for inserting a layer of abstraction between software loads and carefully tuned hardware. Virtualization, they claim, not only induces execution latency within and among parallel processors, but the resulting delays can also be highly variable or "jittery."
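To make the "jitter" complaint concrete, here is a minimal Python sketch of one way to quantify it: request a short sleep many times and look at how late each wake-up is and how much that lateness varies. The function names, the 1 ms interval and the sample count are illustrative assumptions, not part of any HPC benchmark, and a real evaluation would use purpose-built latency tools on both bare metal and a guest.

```python
import statistics
import time


def measure_wakeup_delays(interval_s=0.001, samples=200):
    """Sleep repeatedly and record how much longer than requested each sleep took."""
    delays = []
    for _ in range(samples):
        start = time.perf_counter()
        time.sleep(interval_s)
        elapsed = time.perf_counter() - start
        delays.append(elapsed - interval_s)
    return delays


def summarize_jitter(delays):
    """Return mean delay, its standard deviation (the jitter), and the worst case."""
    return {
        "mean": statistics.mean(delays),
        "jitter": statistics.pstdev(delays),
        "worst": max(delays),
    }


if __name__ == "__main__":
    stats = summarize_jitter(measure_wakeup_delays())
    for name, value in stats.items():
        print(f"{name}: {value * 1e6:.1f} microseconds")
```

Running the same script natively and inside a VM on the same host gives a rough feel for whether the added layer changes the mean delay, the spread, or both.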
Advocates of modern virtualization technology respond that virtualization not only avoids many of these dreaded latency issues but can actually enhance HPC performance. For example, virtualization facilitates the use of specialized OSes optimized for classes of HPC applications (e.g., Red Hat CHAOS or legacy mainframe OSes). A hypervisor can guarantee resource allocations to a VM with an HPC guest, dedicating a partition of physical memory or a percentage of CPU cycles, or guaranteeing a maximum latency to time-sensitive code (e.g., interrupt processing). Nodes of a virtual cluster can run concurrently, on multiple physical nodes or on different processor cores of one or several physical nodes. Gang scheduling can allow a cluster-based HPC application, while running, to communicate between nodes in real time, as it would without virtualization on legacy HPC hardware.
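The idea behind gang scheduling is that all the virtual nodes of one parallel job get physical cores in the same time slot, so no node stalls waiting on a peer that is not currently running. The following is a toy Python sketch of that placement decision using a simple first-fit packing. The job names and core counts are hypothetical, and real hypervisor schedulers are considerably more involved.

```python
def gang_schedule(gangs, cores):
    """Assign each gang to a time slot so all of its tasks run concurrently.

    gangs: dict mapping a job name to the number of cores it needs at once.
    cores: number of physical cores available in every time slot.
    Returns a list of slots; each slot is a list of job names.
    """
    slots = []  # job names placed in each time slot
    free = []   # cores still unused in each time slot
    # Place larger gangs first; smaller ones then fill the leftover cores.
    for name, size in sorted(gangs.items(), key=lambda kv: -kv[1]):
        if size > cores:
            raise ValueError(f"{name} needs {size} cores, only {cores} exist")
        for i, available in enumerate(free):
            if size <= available:
                slots[i].append(name)
                free[i] -= size
                break
        else:
            # No existing slot can hold the whole gang, so open a new one.
            slots.append([name])
            free.append(cores - size)
    return slots


if __name__ == "__main__":
    # Hypothetical jobs and sizes, for illustration only.
    jobs = {"cfd": 6, "weather": 4, "genome": 2, "render": 4}
    for number, slot in enumerate(gang_schedule(jobs, cores=8)):
        print(f"slot {number}: {slot}")
```

The key property is that a gang is never split across slots: a job is either fully running or fully descheduled, which is what lets its nodes exchange messages without waiting for a peer to be scheduled.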
The pain and pleasure of paravirtualization
With the advent of chip-level support technologies like Intel VT and AMD-V, virtualization is becoming increasingly transparent to enterprise applications and users. However, legacy HPC hardware does not benefit from these particular advances, nor does much in-place clustered commodity hardware running HPC loads. Without hardware support, hypervisors rely on paravirtualization, which is source-level or compile-time recoding of privileged and other instructions to implement virtual machine architectures.
Concerns about the efficiency of paravirtualization and the "expansion" of native instructions into hypervisor library calls foster the perception that virtualization negatively impacts HPC application performance. Moreover, worries about paravirtualization compound the general notion that virtualization introduces an additional software layer with accompanying overhead.
This overhead of virtualization in general and paravirtualization in particular has been the focus of industry-wide optimization efforts. In particular, hypervisors like Xen and VMware actually leverage paravirtualization to reduce performance impact, and can enhance performance via streamlining of inter-process communication on individual VMs and even more so for communication among guest HPC OSes and loads.
HPC acquisition model
Enterprise virtualization and adoption of commercial off-the-shelf (COTS) commodity hardware go hand-in-hand towards lower total costs. HPC, with its "need for speed," has traditionally fostered a boutique business, with single suppliers indulging deployers' capability and capacity needs through massive investments in specialized hardware. Such lock-in models obviate software-based scaling and limit migration to new generations of single-vendor legacy systems.
Moreover, HPC deployers seldom considered COTS-based acquisition paths and technologies like virtualization because the space and time multiplexing that hardware virtualization offers gave HPC users no short-term benefit. However, virtualization can scale available compute resources to meet capacity computing needs and apply available computing power to match capability computing demands, and this is making it sufficiently attractive to challenge the hardware-centric acquisition model.
De facto adoption: Grid computing and "the cloud"
In the current economic climate, traditional HPC increasingly outstrips the financial realities of even well-heeled deployers with government and university ties. In place of legacy "heavy iron" HPC implementations, a menu of distributed options that go beyond local clustering of commodity hardware is emerging: grid computing and the "cloud." Grid computing entails broadly distributed, networked parallel processing, where compute loads are subdivided and dispatched to hundreds or even thousands of compute nodes. An example of a successful open grid project is the Berkeley Open Infrastructure for Network Computing (BOINC), whose infrastructure supports scientific loads ranging from the Search for Extra-Terrestrial Intelligence (SETI) to simulations of global warming and immunology. The grid supporting SETI, for example, boasts over 580,000 nodes presenting the application with 1,394 teraflops.
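The subdivide-and-dispatch pattern described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how BOINC works: a local thread pool stands in for remote compute nodes, the sum-of-squares workload is a placeholder, and all function names are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_work_units(start, stop, unit_size):
    """Divide the half-open range [start, stop) into consecutive sub-ranges."""
    return [(lo, min(lo + unit_size, stop)) for lo in range(start, stop, unit_size)]


def process_work_unit(bounds):
    """Stand-in for the computation a remote node would perform on one unit."""
    lo, hi = bounds
    return sum(n * n for n in range(lo, hi))


def run_on_grid(start, stop, unit_size, nodes):
    """Dispatch work units to a pool of workers and combine their partial results."""
    units = split_into_work_units(start, stop, unit_size)
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        partials = list(pool.map(process_work_unit, units))
    return sum(partials)


if __name__ == "__main__":
    print(run_on_grid(0, 1_000_000, unit_size=50_000, nodes=4))
```

Because each work unit is independent, adding nodes shortens the run without changing the result. For a sense of scale, the SETI figures quoted above work out to roughly 2.4 gigaflops per node (1,394 teraflops divided by 580,000 nodes), which is why aggregate grid throughput depends on node count rather than on any single machine.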
Grid computing can be implemented with or without recourse to virtualization, but the huge surge in remotely-hosted cloud computing is highly dependent upon underlying virtualization platforms. Supporting cloud computing with virtualization can involve aggregation of commodity servers and/or partitioning of mainframe hardware, in some cases even of "super computers" (as IBM likes to term its cloud computing platforms). For HPC, both grid and cloud represent the end of climate-controlled server rooms eating into institutional capital and operational expenses, principally by outsourcing capability and capacity to virtualized massive commodity clusters and remote high-performance compute servers.
ABOUT THE AUTHOR: Bill Weinberg brings two decades of open source, embedded and open systems, telecoms, and other technology experience to his role as Independent Analyst and Consultant at Linuxpundit.com. He also serves as General Manager of the Linux Phone Standards Forum (LiPS), and as the Mobile Linux Weatherman for the Linux Foundation.