The flexibility of Linux gives IT managers a hand in managing servers, as do the many specifications and tools freely available for that operating system (OS). In this tip, server management experts Kamini Rupani and Steve Rokov offer advice on how to avoid mistakes in server management, how to manage Linux servers remotely, why Linux-based servers can be easier to manage than Windows ones and how to take advantage of server management specifications like IPMI (Intelligent Platform Management Interface) and SMASH (Systems Management Architecture for Server Hardware).
Rupani and Rokov discussed Linux server management with SearchEnterpriseLinux.com during a preview interview for the LinuxWorld 2007 Conference and Expo. Rupani and Rokov are product management directors for serial console and embedded technology management products, respectively, for vendor Avocent of Huntsville, Ala., a LinuxWorld 2007 exhibitor.
What server management capabilities are unique to Linux, if any?
Kamini Rupani: Linux presents a number of options on the management front. You can manage a Linux server with an in-band management agent. You can use SSH (Secure Shell), telnet or a serial console, or connect with PPP (Point-to-Point Protocol) through a console server. You can use KVM, remote X and VNC (Virtual Network Computing). Like Unix, Linux also provides a powerful scripting environment for automating common IT tasks. Many of these options are extremely difficult or impossible to accomplish with a Windows server.
One of the biggest advantages of using Linux is the freedom to customize it to the needs of the enterprise data center. IT staff can easily write scripts to automate their processes, and those scripts can be tailored completely to an individual data center's requirements. By moving outside the scope of their traditional Windows-based environment and trying out something new, IT managers have a great deal to gain in efficiency.
What are the most common mistakes IT managers make when managing Linux-based servers?
Steve Rokov: One of the biggest mistakes IT managers can make is not using the pre-integrated agentless capabilities that standards like IPMI and SMASH provide. In fact, the Aberdeen Group, an industry research firm, stated that IPMI has become a checklist requirement for IT when evaluating infrastructure needs.
IPMI was created by the IPMI Forum back in 1998. It's an industry-wide management initiative in which more than 180 vendors now work together to continually update and implement this open hardware management standard for servers and other systems such as storage, network and telecommunications equipment. An important characteristic of IPMI is that it is an open and flexible standard that can be supported across tower, pedestal, rack and blade servers, irrespective of the hardware vendor or Linux build used. And because it is pre-integrated within the server, it does not demand any extra management agent purchases, an approach frequently described as agentless.
In emergency or ad hoc situations, administrators often need to manage various servers interactively using specific commands. The challenge, however, is that different vendors use different commands to do the same thing. Enter SMASH. SMASH is defined by the DMTF (Distributed Management Task Force), whose SMASH working group released its first specification last year: the SMASH Command Line Protocol (CLP). SMASH CLP addresses IT administrators' need for a universal command line, enabling systems offered by different vendors to be managed with the same commands and scripts.
It's helpful to think of IPMI and SMASH as collaborating with each other. Even though IPMI offers a standardized message interface for developers and vendors inside the server, the server's command line interface often varies from vendor to vendor. Think of SMASH then as the scriptable command line interface (CLI) available out of the box, accessing the IPMI hardware management features in the box.
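To make the idea of a universal command line concrete, here is a minimal Python sketch that composes the same SMASH CLP command for servers from different vendors. It only builds command strings and does not open a telnet or SSH session. The verb list follows the DMTF CLP specification as commonly documented; the target address /system1, the host names and the function names are illustrative assumptions, and real target addresses depend on the implementation.

```python
# Sketch only: composes SMASH CLP command strings; it does not connect anywhere.

# Verbs defined by the SMASH CLP specification.
CLP_VERBS = frozenset({
    "cd", "create", "delete", "dump", "exit", "help", "load",
    "reset", "set", "show", "start", "stop", "version",
})


def clp_command(verb: str, target: str = "") -> str:
    """Return a CLP command line of the form '<verb> [<target>]'."""
    verb = verb.strip().lower()
    if verb not in CLP_VERBS:
        raise ValueError(f"not a SMASH CLP verb: {verb!r}")
    return f"{verb} {target}".strip()


def plan(hosts, verb, target=""):
    """Pair every host with the identical command, regardless of vendor."""
    command = clp_command(verb, target)
    return [(host, command) for host in hosts]


# Hypothetical host names and target address, for illustration only.
hosts = ["rack1-vendor-a", "rack1-vendor-b"]
for host, command in plan(hosts, "reset", "/system1"):
    print(f"{host}: {command}")
```

The point of the sketch is that the script never branches on the vendor: one command string is sent to every server that implements the CLP.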
Could you give examples of how IPMI and SMASH help in the real IT world?
Rokov: There are three likely scenarios: a large cluster, a branch office, and a mixed rack of 1U servers with blades.
In clusters, how do you diagnose and power cycle the various servers within the cluster when the operating system has hung? By configuring IPMI thresholds within the servers, potential heat and power issues can be recorded and reported to management consoles ahead of a meltdown, providing time to fix the problem. Alternatively, power cycling can be achieved by executing the "Power Cycle" SMASH script against the IPMI firmware in the servers using a telnet or SSH2 session.
In a branch office, without local expertise or even personnel to keep an eye on systems, how do you fill the gap? By placing an appliance out at the branch, you can aggregate alerts and secure access at a single point. If the appliance supports SMASH, scripts can be run centrally, irrespective of the model or vendor of those branch servers. Opening a telnet or SSH2 session to the SMASH appliance enables a health check to be run by pulling IPMI information. Additionally, an "Inventory Scan" can be run using a SMASH script to identify whether changes have been made out in the field.
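As a rough illustration of how thresholds give early warning, the Python sketch below classifies a sensor reading against the three upper threshold levels that IPMI defines for threshold-based sensors (non-critical, critical and non-recoverable). The class name, function name and temperature values are assumptions made for the example; in practice the thresholds live in the server's management firmware, not in a script.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpperThresholds:
    """Upper threshold levels for one sensor, in the sensor's own units."""
    non_critical: float
    critical: float
    non_recoverable: float


def classify(reading: float, limits: UpperThresholds) -> str:
    """Return the most severe upper threshold the reading has reached."""
    if reading >= limits.non_recoverable:
        return "non-recoverable"
    if reading >= limits.critical:
        return "critical"
    if reading >= limits.non_critical:
        return "non-critical"
    return "ok"


# Hypothetical CPU temperature limits in degrees Celsius.
cpu_temp = UpperThresholds(non_critical=70.0, critical=80.0, non_recoverable=90.0)

for reading in (65.0, 72.5, 85.0, 95.0):
    print(reading, classify(reading, cpu_temp))
```

A reading that crosses the non-critical level is the early alert the answer describes: the server is still running, but the management console has time to react before the critical level is reached.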
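The inventory comparison behind such a scan can be sketched in a few lines of Python. This is not the SMASH script itself: it assumes the inventory has already been collected into a simple mapping of component name to description, and the component names and values shown are made up for illustration.

```python
def inventory_changes(baseline: dict, current: dict) -> dict:
    """Compare two inventory snapshots and report what differs."""
    added = {k: current[k] for k in current.keys() - baseline.keys()}
    removed = {k: baseline[k] for k in baseline.keys() - current.keys()}
    changed = {
        k: (baseline[k], current[k])
        for k in baseline.keys() & current.keys()
        if baseline[k] != current[k]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical snapshots of one branch server, taken at two different times.
baseline = {"dimm1": "2 GB", "dimm2": "2 GB", "nic1": "1 GbE"}
current = {"dimm1": "2 GB", "dimm2": "4 GB", "hba1": "FC 4 Gb"}

report = inventory_changes(baseline, current)
for kind in ("added", "removed", "changed"):
    print(kind, report[kind])
```

Storing the baseline centrally and running the comparison after each scan is one way to spot field changes without sending anyone to the branch.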
In mixed racks, IPMI and SMASH don't care whether they are being applied to a blade, a motherboard, a plug-in card, or the blade's chassis manager. To the administrator and his scripts, it all looks the same. So what happens when the rack experiences an event? A Stream Console over LAN SMASH script opens multiple operating system console sessions and records what the operating system consoles were doing right before the failure.
What are the most common challenges in managing Linux servers remotely?
Rupani: The most common challenge is the same for managing any server remotely, whether Linux, Unix or Windows: having a secure way to connect to and control the server even when it isn't operational. That's why it is important to build out-of-band management into your server management strategy. You can't assume that the OS will always perform as expected, so unless you build a system that allows you to remotely reload an OS, reboot or power cycle in a secure fashion, you are not spending time or money wisely.
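One way to picture an out-of-band power cycle is the Python sketch below, which builds (but does not run) a command line for ipmitool, an open-source IPMI client. The interview does not name any particular tool, so the choice of ipmitool is an assumption, as are the address, the user name and the function name. The -E option tells ipmitool to read the password from the IPMI_PASSWORD environment variable rather than from the command line.

```python
import shlex

# Power actions accepted by this sketch (a subset of ipmitool's chassis power commands).
POWER_ACTIONS = frozenset({"status", "on", "off", "cycle", "reset", "soft"})


def ipmi_power_command(host: str, user: str, action: str = "cycle") -> list:
    """Build an ipmitool argument list for a chassis power action over the LAN."""
    if action not in POWER_ACTIONS:
        raise ValueError(f"unsupported power action: {action!r}")
    return [
        "ipmitool",
        "-I", "lanplus",   # IPMI v2.0 LAN interface
        "-H", host,        # address of the server's management controller
        "-U", user,
        "-E",              # password comes from the IPMI_PASSWORD environment variable
        "chassis", "power", action,
    ]


# Hypothetical management address and account, for illustration only.
argv = ipmi_power_command("192.0.2.10", "admin")
print(shlex.join(argv))
```

Because the request goes to the management controller rather than to the OS, the same command works whether the operating system is healthy, hung or not installed at all.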
What server management best practices have you seen succeed most often in your work with IT organizations?
Rupani: During your data center's design phase, design an out-of-band management strategy, not just an in-band strategy.
Free up valuable time by getting rid of mundane tasks that can be automated.
Consider virtualization: If you haven't yet, you should, because power and cooling issues and rising costs are not going away, and virtualization is a good way to help address them. Virtualization isn't going away either, so more tools for leveraging it will be developed over time. Now is a good time to begin exploring virtualization in a test environment to determine if and how it can be rolled into your data center strategy.