Virtualization has been the hot technology and certainly the killer app-enabler of the last several years, building on readily available open source technology (e.g., Linux and Xen), with adoption marching in step with the commercialization of commodity server and blade server hardware. Virtualization technologies for servers, desktops, embedded and mobile devices have, for the most part, moved from exotic to mainstream. The broad range of use cases includes hardware consolidation, legacy migration, trusted computing and highly available systems.
An area that has yet to embrace virtualization is high-performance computing (HPC). Today's plummeting hardware costs bring "supercomputer" capabilities into SMB server rooms and onto the engineering desktop. But HPC system deployers remain leery of hypervisor technology. HPC systems and HPC-capable commodity hardware today account for nearly a fifth of the $55 billion worldwide server market (IDC), but represent only a blip in the growing virtualization software market.
In the first part of this article, I describe the benefits and capabilities of HPC using virtualization. The second part covers some critiques and challenges with virtualization in HPC.
Parallel computing and clustering platforms
Legacy HPC systems boasted arrays of proprietary parallel vector processors that hosted specialized operating systems or versions of UNIX customized for HPC compute loads. Today, parallel computing systems and the processors that power them are increasingly based on "off the shelf" server-class microprocessors, such as IBM's POWER, AMD's Opteron or Intel's Xeon, and run stock or customized versions of open source Linux.
Indeed, most modern supercomputers comprise highly tuned clusters of commodity hardware combined with specialized custom interconnects. Loosely coupled systems rely on gigabit (or faster) Ethernet and shared backplane hardware (e.g., PCI-X, Fibre Channel, etc.) overlaid with software abstractions like PVM and MPI. Single-chip multi-core implementations and tightly coupled multiprocessing communicate over local bus and high-speed shared-memory interfaces for both SMP (symmetric multiprocessing) and "massively parallel" architectures that distribute loads with OpenMP and other paradigms. Even less specialized commodity-based "supercomputers" build on clustered white box PCs and open source software like Beowulf, Warewulf and openMosix/CHAOS.
Virtualization applications for HPC
Despite concerns by HPC traditionalists about the impact of virtualization on performance, virtualization offers HPC users and administrators a range of concrete capabilities and benefits. These include:
Simplifying administration and provisioning: Booting, rebooting, provisioning and load-balancing at the systems and application level on traditional HPC platforms constitute some of the super-human headaches that accompany supercomputing. The need to supply system images to hundreds or thousands of nodes, to support booting and rebooting without exhausting interconnect and memory bandwidth, and to monitor and maintain the health of those myriad nodes was and is no small feat. Virtualization and tools available for more mundane enterprise computing promise to ease the burden of HPC deployers by abstracting the particulars of supercomputer configurations and by aggregating multiple processing nodes into more manageable collections.
Supporting mixed HPC loads and migration: HPC systems gain their edge from hosting specialized and highly tuned system software and loads. That same edge can cut both ways – adding new functionality or running multiple mixed loads can degrade existing system performance and present a formidable obstacle when migrating HPC applications to lower-cost commodity hardware. By partitioning existing HPC platforms and abstracting underlying legacy dependencies, virtualization can help HPC deployers better utilize existing systems and ease migration (incremental or whole-hog).
Soft upgrades (sans forklift): Traditional vector processing platforms deployed heady mixes of exotic hardware and proprietary software to support it. Such specialized hardware was and is costly to acquire and maintain, and offered few (if any) upgrade paths. Moving to newer systems, for either hardware or software, usually entailed rolling trucks and forklifts to swap out heavy iron. Virtualization both eases the migration from specialized hardware and software to commodity substitutes and provides a vehicle for incremental "soft" upgrades of both hardware and software as HPC loads shift to new virtual instances of HPC platforms.
Resource scaling: Virtualization can provide a boon to both capability and capacity HPC. For capability applications, virtualization allows resource-hungry applications to access arbitrarily large amounts of memory and other resources by abstracting and aggregating those resources among cluster members. For capacity computing, virtualization allows more transparent allocation of appropriate resources to each HPC subunit assigned a problem set.
Limiting complexity of programming: Developers programming today's dual- and quad-core microprocessors are finding out what HPC application developers have known for generations: parallelized, multithreaded programming is not easy. While virtualization does not ease this particular programming burden, it does abstract away the additional layer of complexity imposed by clustered environments and the distributed programming models that are particular to them. Through aggregation, virtualization can make large and complex clusters of disparate machines appear as a single compute resource. Aggregated virtual systems are then ready to benefit from emerging multithreaded programming paradigms, with transparency comparable to multithreading on dual- and quad-core machines.
Debugging HPC applications: Virtualization can ease complex and time-consuming tasks in development and testing of HPC applications and systems. A hypervisor (virtual machine manager) can be configured to allow one virtual machine (VM) to monitor the state, interrupts and communications of another VM, for debugging or performance analysis. Since this introspection code runs outside the VM being monitored, it can look at any path in the OS or application, without the limitations that may occur when a debugger is inside, and shares hardware state with, the OS or application being debugged. This may be especially beneficial when developing or using a specialized minimal OS for a specific HPC application.
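To make the "limiting complexity of programming" point above more concrete, here is a minimal Python sketch of the shared-memory style that an aggregated virtual system would, in principle, let a developer keep using. It assumes the aggregated cluster is presented to the program as a single machine with many cores; the function names `partial_sum` and `parallel_sum_of_squares` are hypothetical and chosen only for illustration. Note that the workers read the same list directly, with no explicit message passing of the kind MPI or PVM would require. The sketch illustrates the programming model only: standard CPython threads will not speed up this CPU-bound loop.

```python
from concurrent.futures import ThreadPoolExecutor
import os


def partial_sum(values, start, stop):
    """Sum the squares of values[start:stop]; every worker reads the shared list directly."""
    return sum(v * v for v in values[start:stop])


def parallel_sum_of_squares(values, workers=None):
    """Split the index range into contiguous chunks and add up the partial results."""
    # Assumption: the (virtual) machine reports all aggregated cores via os.cpu_count().
    workers = workers or os.cpu_count() or 1
    chunk = max(1, -(-len(values) // workers))  # ceiling division, at least 1
    ranges = [(lo, min(lo + chunk, len(values)))
              for lo in range(0, len(values), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(partial_sum, values, lo, hi) for lo, hi in ranges]
        return sum(f.result() for f in futures)


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(1000))))
```

On a conventional cluster, the same computation would typically need explicit scatter and gather steps between nodes; the appeal of aggregation is that this extra layer could be hidden beneath the virtual machine.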
ABOUT THE AUTHOR: Bill Weinberg brings two decades of open source, embedded and open systems, telecoms, and other technology experience to his role as Independent Analyst and Consultant at Linuxpundit.com. He also serves as General Manager of the Linux Phone Standards Forum (LiPS), and as the Mobile Linux Weatherman for the Linux Foundation.