Yes, you can build highly reliable and enterprise-ready Linux-based server clusters using free open source software components. Karl Kopper, author of the new book "The Linux Enterprise Cluster," published by No Starch Press, did. SearchEnterpriseLinux.com recently talked to Kopper to get some Linux cluster pointers.
Kopper has worked for nearly 20 years with distributed computing environments on many platforms, including Linux, Windows, Macintosh, and Unix. He's also contributed to the development of numerous Linux and open source projects, including Heartbeat, Ganglia, Linux Virtual Server (LVS), Kernel and Zope.
The inspiration for your book came from your experience of building an enterprise-ready Linux cluster. Can you tell me a little bit about what that process was like? What were some of the challenges you faced and how did you overcome them?
I had a very favorable experience with Linux on a single box. Then I was actually assigned to try to build a highly available Web environment using Linux. This led to my discovery of several open source packages, including the LVS and the Heartbeat package. I was really enthused about how useful these packages were. But at the same time it was a bit frustrating trying to thread all of the pieces together for different open source packages and different sources of documentation - in other words, the Web sites for the different projects - in order to build one system. My enthusiasm for the projects and lack of one document that combined them inspired me to write a book about it.
Can you describe those two open source packages you just mentioned and what they do?
Those are the two primary ones. Heartbeat is from the LinuxHA.org Web site and it's a high availability failover package that allows for computational resources to be made highly available by giving you the ability to fail them over from one box to another box. So, you eliminate single points of failure. That's a free software download.
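The failover idea described above can be sketched in a few lines of Python. This is a simplified illustration only, not Heartbeat's code or configuration: the `Node` class, the `choose_owner` function, and the `deadtime` parameter are hypothetical names chosen for this example, and the sketch assumes just two nodes with the backup always available.

```python
# Illustrative simulation only; this is not Heartbeat's code or configuration.
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    last_heartbeat: float  # time (in seconds) the last heartbeat was received


def choose_owner(primary, backup, now, deadtime=10.0):
    """Return the name of the node that should hold the resource.

    The primary keeps the resource while its heartbeats are fresh. Once it
    has been silent for longer than `deadtime` seconds, the backup takes over.
    """
    if now - primary.last_heartbeat <= deadtime:
        return primary.name
    return backup.name


primary = Node("node1", last_heartbeat=100.0)
backup = Node("node2", last_heartbeat=100.0)

print(choose_owner(primary, backup, now=105.0))  # primary heard from recently
print(choose_owner(primary, backup, now=120.0))  # primary silent too long
```

In this sketch the resource stays on node1 while its heartbeats keep arriving and moves to node2 once node1 has been silent for longer than the assumed timeout, which is the sense in which the single point of failure is removed.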
There is a separate package called Linux Virtual Server, now in the mainstream kernel, that allows one Linux server to load balance requests across several cluster nodes. It load balances at the connection level.
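To make "load balancing at the connection level" concrete, here is a minimal Python sketch. It is not LVS code: the `ConnectionBalancer` class and its `route` method are hypothetical, and round robin is assumed here simply because it is the easiest scheduling policy to show. The point is that the node is chosen once per connection, and later traffic on the same connection goes to the same node.

```python
# Illustrative sketch only; this is not how LVS is implemented in the kernel.
from itertools import cycle


class ConnectionBalancer:
    """Assign each new connection to the next node in round-robin order."""

    def __init__(self, nodes):
        self._next_node = cycle(nodes)  # endless round-robin iterator
        self._connections = {}          # connection id -> assigned node

    def route(self, conn_id):
        # A node is picked only when a connection is first seen;
        # every later packet on that connection reuses the same node.
        if conn_id not in self._connections:
            self._connections[conn_id] = next(self._next_node)
        return self._connections[conn_id]


balancer = ConnectionBalancer(["node1", "node2", "node3"])
for conn in ["a", "b", "a", "c", "d"]:
    print(conn, "->", balancer.route(conn))
```

Connection "a" lands on the same node both times it appears, while new connections are spread across the remaining nodes in turn.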
What were some of the other open source packages or projects that helped you in your endeavor?
There are several other packages that I'm pretty enthusiastic about. Each one fills a niche. And of course there are a variety of projects out there that fill a variety of different niches when you talk about clustering. There are a lot of different roads you can go down.
Ganglia is another interesting software package out of the University of California at Berkeley for monitoring systems and it provides a graphical Web-based interface that allows you to look at real time information about cluster nodes and identify hot spots. It actually was designed for monitoring clusters of clusters, or computational grids, so it can definitely do the job of monitoring just one cluster.
Of course, in the stock version of the kernel, there are tables that allow for sophisticated packet handling rules such as marking the packets while they're inside the kernel and filtering out what can and can't be allowed into the system.
Between Heartbeat and Linux Virtual Server there was kind of a gap. Heartbeat addresses high availability by failing over resources, and LVS addresses clustered load balancing, but what was missing was the ability to treat the load balancing capability itself as a resource. A package called ldirectord is available that fills that niche very nicely.
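One way to picture the job of a tool in this niche is a monitor that keeps the load balancer's pool limited to nodes that currently pass a health check. The Python sketch below illustrates only that general idea and is not ldirectord's code or configuration format: `refresh_pool`, the `status` table, and the node names are all hypothetical.

```python
# Illustrative sketch only; not ldirectord's code or configuration format.
def refresh_pool(all_nodes, check):
    """Return the nodes that currently pass the health check, in order."""
    return [node for node in all_nodes if check(node)]


# Hypothetical health results: node2 is failing its check.
status = {"node1": True, "node2": False, "node3": True}

pool = refresh_pool(["node1", "node2", "node3"], lambda node: status[node])
print(pool)
```

Running the check periodically and handing the resulting pool to the load balancer would keep new connections away from a failed node until it recovers.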
Your book also talks about the best ways to deploy printers. Which open source package helped you with that?
The LPRng printing package is a very reliable printing package for Linux and I've had an extremely positive experience with it. It's a free package and lets you build a printing system using a centralized print spooler and the cluster nodes are simply print clients for that centralized print spooler.
All of the packages you've mentioned are free open source downloads. How can folks go about supporting these deployments?
Support is definitely one of the questions that comes up. The good thing about the Linux enterprise cluster, as I call it, is the ability to support it with what we might call your legacy Unix skill set, and then add on the specific knowledge about the specific packages that I've just described. So, if your organization has in-house Unix expertise, it's not throwing away your skill set and learning an entirely new paradigm. There is definitely a learning curve if there is no in-house Linux or Unix expertise.
As far as outside support goes, a phrase I've heard is 'proxying support through internal resources,' meaning that people with in-house expertise will guarantee and support the package using open source packages, Web sites, mailing lists, documentation and source code. For actual outside support, a lot of times you can pay for support. For instance, Dr. Patrick Powell, who wrote LPRng, offers support for LPRng through a company that he works for called AStArt Technologies.