Administering large Linux scale-out clusters like Web farms can be intimidating and time consuming. But it doesn't have to be.
Egan Ford, Linux cluster architect for advanced technical support with Armonk, N.Y.-based IBM Corp., said a systems administrator with a proactive, inquisitive attitude can run big Linux clusters with confidence in the results, provided the system is well designed and tested and the administrator takes advantage of the excellent open source tools available.
"Work smarter, not harder," is the main theme in "Lazy Linux: 11 secrets for lazy cluster admins," which Ford co-authored with IBM colleague Vallard Benincosa based on a decade of building Web and HPC clusters for IBM. "We're all engineers and we know how to solve problems. A lot of it is just common sense," Ford said.
The key to success is an upfront investment of time and learning that ultimately will result in a reliable, automated Linux cluster that will require less maintenance and manual intervention later on, the report advises.
"The point [of the Lazy Linux paper] is to reach out to someone with small-scale cluster experience now working on a large system and help them take the right steps to be sure that it actually will work," Ford said. "The probability of failure is much larger in a big system. Everything from design to testing has to be planned," and the operation must be tested against benchmarks, he said.
Administrator role is key
According to the Lazy Linux whitepaper, a successful Linux cluster administrator must have not only a combination of networking, operating systems and architecture skills but also a passion for open source, using it wherever possible. The best admins constantly research Linux sites for information to avoid wasting time on a problem that has already been solved. And when they do develop a cool shortcut or remedy, they contribute their results back to the open source community.
Top admins also automate everything possible with command line scripts, which reduces typing and provides documentation for future reconstruction. For example, an operating system image, whether stateless or stateful, is simple to recreate if scripts have been written to document each step in the process. Another key step in automation is creating parallel commands and procedures so that all instructions to nodes are implemented with a single keystroke.
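As a rough illustration of the parallel-command idea, the Python sketch below sends one command to every node at once over ssh. It is not taken from the Lazy Linux paper: the function names, the default worker count and the 60-second timeout are assumptions made for this example, and it presumes key-based ssh access to each node.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ssh_run(node, command):
    """Run `command` on `node` over ssh; return (exit code, stdout)."""
    try:
        result = subprocess.run(
            ["ssh", "-o", "BatchMode=yes", node, command],
            capture_output=True, text=True, timeout=60,
        )
    except subprocess.TimeoutExpired:
        # Report a hung node instead of aborting the whole run.
        return -1, "timed out"
    return result.returncode, result.stdout.strip()


def run_everywhere(nodes, command, runner=ssh_run, max_workers=32):
    """Run one command on every node concurrently; return {node: (rc, output)}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {node: pool.submit(runner, node, command) for node in nodes}
        return {node: future.result() for node, future in futures.items()}
```

Calling run_everywhere(["node001", "node002"], "uptime") with hypothetical node names would return one result per node. The runner argument is there so the ssh call can be swapped for a stub when testing the script itself.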
To optimize communication and minimize work for the cluster staff, administrators should create a cluster wiki that is up-to-date, easy to edit and centrally accessible rather than store data in static Word or Excel documents. The wiki should include data such as a cluster inventory, a log of changes, links to open support tickets and documentation for common tasks.
In addition, admins should prevent direct user access to the system because performance problems often are caused by competing user demands for the same CPU rather than deficiencies in the machine. All job requests instead should be routed through a queuing scheduler, which assigns tasks among the nodes.
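To make the scheduler idea concrete, here is a toy first-in, first-out queue in Python. It only illustrates the principle that jobs wait for a free node instead of competing for the same CPU; the class and method names are invented for this sketch, and a production scheduler does far more (priorities, resource limits, accounting).

```python
from collections import deque


class QueueScheduler:
    """Toy FIFO scheduler: jobs wait in a queue until a node is free."""

    def __init__(self, nodes):
        self.free_nodes = deque(nodes)
        self.pending = deque()
        self.running = {}  # job name -> node it is running on

    def submit(self, job):
        """Queue a job, then start whatever the free nodes allow."""
        self.pending.append(job)
        self._dispatch()

    def finish(self, job):
        """Release the node a job was using and start the next queued job."""
        self.free_nodes.append(self.running.pop(job))
        self._dispatch()

    def _dispatch(self):
        # Pair waiting jobs with free nodes, oldest job first.
        while self.pending and self.free_nodes:
            self.running[self.pending.popleft()] = self.free_nodes.popleft()
```

With two nodes and three submitted jobs, the third job stays queued until one of the first two finishes, so no node ever runs two jobs at once.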
Design Linux clusters for scale
The white paper advises that beyond good administrative practices, a well-run Linux cluster must be designed for scale from the get-go. This means maximizing automation and minimizing management.
"Management is the No. 1 problem," Ford said. "If you can't maintain the system, you will waste efficiency and money and you won't meet your business goals."
A cluster must be structured with the flexibility to deploy, provision or alter thousands or tens of thousands of nodes quickly to meet changing business requirements, he said.
Bottlenecks are a huge problem for large, scale-out clusters, but they can be minimized with proper planning. For example, reliability will be improved by isolating the TFTP and DHCP provisioning servers on a separate network interface controller. Performance also will be boosted by dividing clusters into sub-clusters or scalable units, each with its own service nodes and broadcast domain. To avoid service degradation, designers should avoid overloading the network with more nodes than it can handle.
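The scalable-unit layout can be sketched in a few lines of Python. The example below is only an assumption about how such a split might be scripted: the function name is hypothetical, and treating the first node of each group as its service node is a convention chosen for illustration, not something the paper prescribes.

```python
def split_into_units(nodes, unit_size):
    """Group nodes into scalable units; the first node of each serves the rest."""
    if unit_size < 2:
        raise ValueError("a unit needs a service node plus at least one compute node")
    units = []
    for start in range(0, len(nodes), unit_size):
        chunk = nodes[start:start + unit_size]
        # Assumed convention: first node in the chunk is the service node.
        units.append({"service_node": chunk[0], "compute_nodes": chunk[1:]})
    return units
```

Each resulting unit could then be given its own provisioning services and broadcast domain, so a boot storm in one unit does not swamp the others.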
In addition, cluster admins should strongly advocate the addition of remote power controls and a remote viewing console for the detection and resolution of boot up problems from other locations.
Use Linux cluster management and monitoring tools
Finally, Linux cluster admins should take advantage of the many excellent open source management tools that have been developed over the last decade to automate tasks, improve control and save time. Before the tools existed, it took six IBM engineers two nights to assemble and provision a 34-node HPC cluster for the first LinuxWorld show. A decade later, a single person can do the same task in 10 minutes, Ford said.
The most popular open source cluster management tools include the OSCAR system imager, Rocks, Perceus and xCAT, a toolkit that performs imaging, kickstarts and stateless installs for nearly any Linux distro, Ford said.
Open source monitoring tools also are excellent, but there is no single best solution for all cluster systems, and sometimes a combination of downloadable and customized tools is needed to get the job done.
In addition, monitoring options sometimes require admins to choose between uptime and performance. For example, data collection monitoring agents can help keep nodes from being overloaded but, on the other hand, they sap memory, CPU and application performance. Typically, large cluster shops use Nagios for alerts and Ganglia for monitoring. Cacti, Zenoss and CluMon also are popular monitoring tools.
Final advice, trends
A final point: vendors often rush to get a system up and running and hand it over to the customer, but each system should be tested against industry benchmarks for memory, CPU, disk and network components to ensure the cluster will yield consistent, accurate results.
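One simple way to act on benchmark results is to look for nodes that disagree with the rest of the cluster. The Python sketch below is an assumption about how that check could be scripted, not a method from the paper: the function name and the 5% default tolerance are illustrative, and the scores would come from whichever benchmark the site actually runs.

```python
from statistics import median


def find_outliers(results, tolerance=0.05):
    """Return nodes whose score deviates from the median by more than `tolerance`."""
    mid = median(results.values())
    return sorted(
        node for node, score in results.items()
        # Deviation is measured relative to the cluster-wide median score.
        if abs(score - mid) > tolerance * mid
    )
```

Given a mapping of node names to scores for the same benchmark, the function returns the nodes worth a closer look before the system is accepted.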
Of all the administrative suggestions, Ford said the most important step is to design the system right in the first place, making it easy to manage. His favorite tool is xCAT, which he said is a best-of-breed tool for large-scale, automated hardware control.
"I can be sitting in a chair, discover new machines and replace others, check energy efficiency, and much more, all remotely, all in an automated, scalable fashion," Ford said.
Looking to the future, Ford foresees three cluster computing shifts: greater adoption of liquid cooling due to energy savings; increasing use of adaptive management, with the scheduler provisioning nodes on demand; and an accelerated move to stateless cluster computing.
"Just turn the system on and make one change to the central image," he said. "People will be moving more and more to stateless systems."
Let us know what you think about the story; email Pam Derringer, News Contributor.