Virtualization is creating a buzz in the IT industry today, but that buzz fizzles when virtualization on Linux clusters is mentioned. That's a shame, says Beowulf Project co-founder Donald Becker, because clusters offer a no-clutter virtualization option.
Linux clusters deserve more respect in virtualization and high availability, says Becker, and he explains why in this interview. He also offers some migration advice to SMP users switching to Linux clusters.
In addition to his Beowulf role, Becker is Scyld Software's founder and chief scientist. He has just joined SearchEnterpriseLinux.com's Ask the Expert team as an advisor and can answer your questions about Linux cluster and server issues.
What are the latest developments in Linux cluster technology?
Donald Becker: The hot development area for clusters right now is virtualization.
Two well-known traditional virtualization systems are VMware and Xen. VMware emulates a PC down to the hardware devices. This allows it to run almost any operating system as a process in the native OS.
However, one downside of virtualization is that it often implies, and is implemented with, significant overhead. VMware, for example, runs with emulation overhead and no opportunity for optimization. Xen is a para-hypervisor, allowing multiple operating systems to run at once. It reduces overhead by not doing a full emulation, but it requires some modifications to the host kernel.
How does virtualization on a Linux cluster differ from these approaches?
Becker: Both VMware and Xen implicitly assume that they will be running multiple kernels and full installations. A cluster offers an opportunity that this type of virtualization does not address: creating lightweight environments that contain only the elements the application requires. When the master node provides the system services typically expected from a machine (e.g., a login shell and housekeeping utilities such as 'cron'), compute nodes only need to run the application. The goal is to scale performance up, which is the opposite of what traditional virtualization systems do.
Some IT directors have told me that they're using SMP systems for compute-intensive calculations. What are some of the challenges one of them might face in migrating to a Linux cluster?
Becker: The answer depends largely on the applications you are running.
The easiest type of application to put on a cluster is one that uses 'parametric execution.' With this model, multiple independent copies of an application are run on different data sets.
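A minimal Python sketch of parametric execution on a single machine follows. It is only an illustration: the function names `simulate` and `run_parametric` are hypothetical, the "application" is a stand-in calculation, and a real cluster would hand each copy to a compute node through its own job launcher rather than a local process pool.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def simulate(data_set):
    """Hypothetical stand-in for one independent run of the application."""
    return sum(x * x for x in data_set)


def run_parametric(data_sets, workers=4, pool_cls=ProcessPoolExecutor):
    """Run one independent copy of `simulate` per data set.

    The copies never communicate with each other, which is what makes
    this model easy to spread across cluster nodes.
    """
    with pool_cls(max_workers=workers) as pool:
        # map() keeps results in the same order as the input data sets.
        return list(pool.map(simulate, data_sets))


if __name__ == "__main__":
    data_sets = [range(10), range(100), range(1000)]
    print(run_parametric(data_sets))
```

Because no copy depends on another, the same pattern works whether the workers are local processes or separate nodes. Passing `pool_cls=ThreadPoolExecutor` is a convenient way to try the sketch in an interactive session.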
If the SMP system is used because it offers high throughput on single-threaded jobs, a Beowulf cluster will be an almost drop-in replacement. However, if the SMP system is being used with threaded programs that rely on shared memory for fine-grained communication, complicated restructuring of the application may be required.
One traditional case where SMP systems have been used is transactional databases. This is a special case: the SMP system relies on fine-grained communication and locking, while a cluster achieves the same result by other means, so an analysis is needed to determine whether moving to a cluster is feasible.
Most applications fall in between these two extremes.
Can Linux clusters achieve high availability (HA), or are they inherently too complex?
Becker: Scalability does not by itself preclude high availability, so Linux clusters can achieve high availability. The term HA typically refers to virtually constant access to business-critical applications and data. Levels of uptime are often benchmarked at "five nines" availability, or 99.999% of the time, which is equivalent to less than one second of downtime per day.
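The five-nines figure can be checked with a few lines of arithmetic. The sketch below is illustrative only; the helper name `allowed_downtime_seconds` and the 365-day year are assumptions made for the example.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365  # assumption: ignore leap years


def allowed_downtime_seconds(availability, period_seconds):
    """Downtime budget for a given availability fraction over a period."""
    return (1.0 - availability) * period_seconds


per_day = allowed_downtime_seconds(0.99999, SECONDS_PER_DAY)
per_year = allowed_downtime_seconds(0.99999, SECONDS_PER_DAY * DAYS_PER_YEAR)

# Roughly 0.864 seconds per day, or about 5.26 minutes per year.
print(f"{per_day:.3f} s/day, {per_year / 60:.2f} min/year")
```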
The classic high availability cluster deployment includes two servers that are aware of one another's state and share a storage subsystem. If either server fails, the other takes over all of the services for both. Highly available clusters achieve that level of reliability by using failover, load balancing, redundancy, shared storage devices and other features to join two or more servers together, protecting against both planned (i.e., administrative) and unplanned (i.e., fault-related) system outages.
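The two-server takeover described above can be sketched as a toy model. Everything here is hypothetical: the `Server` class, the `failover` function and the three-second heartbeat timeout are invented for illustration, and the sketch leaves out shared storage, fencing and everything else a production HA package has to handle.

```python
class Server:
    """One member of a two-server HA pair (toy model)."""

    def __init__(self, name, services):
        self.name = name
        self.services = set(services)
        self.alive = True
        self.last_heartbeat = 0.0  # time of the last heartbeat seen


def failover(a, b, now, timeout=3.0):
    """Move services off a server whose heartbeat is stale.

    Returns the surviving server if a takeover happened, else None.
    """
    for failed, survivor in ((a, b), (b, a)):
        failed_is_stale = now - failed.last_heartbeat > timeout
        survivor_is_fresh = now - survivor.last_heartbeat <= timeout
        if failed.alive and failed_is_stale and survivor_is_fresh:
            failed.alive = False
            # The survivor now runs the services of both servers.
            survivor.services |= failed.services
            failed.services = set()
            return survivor
    return None


node_a = Server("node-a", {"web"})
node_b = Server("node-b", {"db"})
node_a.last_heartbeat = 10.0
node_b.last_heartbeat = 5.0  # node-b has gone quiet

survivor = failover(node_a, node_b, now=10.5)
print(survivor.name, sorted(survivor.services))
```

In this run, node-b's last heartbeat is older than the timeout, so node-a ends up running both services.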
By their sheer size, large-scale clusters are likely to encounter frequent failures and so require a design that handles failures. However, this design does not happen automatically, and many tools on the market today do not address this situation at all. Carefully designed subsystems, such as some commercial-grade clusters with strong management capabilities, can handle incremental scalability. This means they can scale both up and down and avoid cascading failures, which occur when one node crashes while holding essential memory-page or file-system data, eventually causing other nodes to become inoperable.
What are some ways to achieve high availability in a Linux cluster?
Becker: In a cluster configuration, multiple servers in a common location are managed through a central point. This presents distinct advantages for high availability situations, as there are more 'backup' servers to go to. It allows for 24 x 7 availability, failover protection, centralized management of distributed applications, the ability to handle extremely large data sets, dynamic web publishing and disaster recovery.
It is critical to choose a method that ensures data integrity during downtime and restart. Unfortunately, not all alternatives both improve reliability and preserve data integrity, so do your homework and choose your implementation wisely.
Some non-commercial cluster systems create a partial single-system illusion by requiring network virtual memory or a consistent global file system, or by implementing transparent process migration. However, these designs handle failure poorly: if any node fails, the system must go through a time-consuming lock recovery process or even kill all processes related to the failed machine.
A better approach is to ensure the cluster's master node is running. For larger installations, ensure that a master is always running by using a conventional cold, warm or hot spare with failover, or an innovative simultaneous multiple-master approach. Compute nodes, on the other hand, may join and leave the cluster without impacting the underlying system, although the application running on them is lost.