Though today's servers consume more power than those from several years ago, they also offer a wider variety of performance options, says SearchOpenSource's server and cluster expert Don Becker. Figuring out how much power you'll need and weighing alternatives like blade platforms are important steps when upgrading your server system.
In this tip, Becker describes basic rules for making network boot servers more reliable -- like avoiding multicasts -- and explains the Mosix approach to cluster architecture. Becker, CEO of Scyld Software, also suggests ways to avoid data loss when consolidating servers.
How much of a factor is power usage when updating the hardware of an IT shop?
Don Becker: Today's servers do consume more power than servers from several years ago. But the good news is that they provide greater performance options. First, you should confirm exactly how much power you have available in your server location. Is it a single 20-amp, 120-volt circuit, a single 15-amp circuit or multiple circuits? The amperage can be found by looking at the circuit breaker in the breaker panel for that outlet. You should also find out what other outlets are on the same circuit. If your new servers do draw more power than the servers they replace, you should also check whether the cooling or air conditioning for your server location can handle the extra heat.
Depending on the load on your current servers, one or two Altus 1U or 2U servers might be able to handle the load. You might also want to consider a blade platform like BladeRunner, which would provide similar CPU performance, but with lower power usage per system. You also gain some integrated management features, redundant power and networking options with BladeRunner.
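The circuit check Becker describes comes down to simple arithmetic, which can be sketched as follows. This is a rough illustration, not an electrical design: the 80% derating for continuous loads is a common rule of thumb that should be checked against your local electrical code, and the 450-watt per-server draw is a made-up figure standing in for a measured or nameplate value.

```python
def circuit_capacity_watts(amps, volts=120.0, continuous_derate=0.8):
    """Usable continuous power on one branch circuit, in watts.

    continuous_derate is an assumed rule of thumb, not a value from the article.
    """
    return amps * volts * continuous_derate


def servers_per_circuit(amps, watts_per_server, volts=120.0,
                        continuous_derate=0.8, other_load_watts=0.0):
    """How many servers fit after subtracting other loads on the same circuit."""
    usable = circuit_capacity_watts(amps, volts, continuous_derate) - other_load_watts
    if usable <= 0:
        return 0
    return int(usable // watts_per_server)


if __name__ == "__main__":
    # Hypothetical 450 W servers on the two circuit sizes mentioned above.
    for breaker_amps in (15, 20):
        print(breaker_amps, "A circuit:",
              servers_per_circuit(breaker_amps, watts_per_server=450), "servers")
```

The `other_load_watts` parameter is there because other outlets on the same breaker share its capacity.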
What are the criteria for server consolidation in an IT shop?
Becker: When considering what services to consolidate, recognize that to maintain performance the new server will need as much memory as all of the servers it replaces combined. If your database has grown over time, you may also want to increase the amount of memory provided to that service.
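As a small worked example of that sizing rule, the sketch below sums the memory of the servers being replaced and adds optional headroom for services that have grown. The server names, sizes and the 50% headroom are all hypothetical.

```python
def required_memory_gb(memory_by_server, headroom=None):
    """Sum the memory of the replaced servers, with optional per-server headroom.

    headroom maps a server name to a fractional increase (0.5 means +50%).
    """
    headroom = headroom or {}
    return sum(gb * (1.0 + headroom.get(name, 0.0))
               for name, gb in memory_by_server.items())


if __name__ == "__main__":
    old_servers = {"web": 4, "mail": 2, "db": 8}   # GB, made-up values
    print(required_memory_gb(old_servers))                 # plain sum
    print(required_memory_gb(old_servers, {"db": 0.5}))    # extra room for the database
```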
Whenever you deploy a new server to replace an old one, be sure to thoroughly test the new server and configuration before putting it into production. Make backups of your production servers and services. Restore those backups onto your new servers and build a test network to verify that the new servers work correctly. Testing of the new configuration is the most important thing you can do to make the migration go smoothly.
After you complete your testing, you'll need to repeat the process to get a copy of the most recent data from your production servers during scheduled downtime for your changeover. Be sure to keep the old servers available for a couple of weeks or so in case you have to roll back the change due to some undiscovered fatal issue.
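One way to check that a restored backup matches the production data is to compare file checksums between the two directory trees. The sketch below is one possible approach, not a procedure Becker prescribes; the function names are illustrative, and it assumes both trees are reachable as local paths (for example, over a mounted share).

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of one file, read in chunks so large files do not fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root):
    """Map each file's path (relative to root) to its digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_trees(original_root, restored_root):
    """Report files that are missing, unexpected, or different after a restore."""
    original = tree_digests(original_root)
    restored = tree_digests(restored_root)
    return {
        "missing": sorted(set(original) - set(restored)),
        "extra": sorted(set(restored) - set(original)),
        "changed": sorted(k for k in original.keys() & restored.keys()
                          if original[k] != restored[k]),
    }
```

An empty result in all three lists means the restored files match byte for byte. It does not prove the services work, so the functional testing on a test network is still needed.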
What is the Mosix approach to cluster architecture? What are the advantages and disadvantages to using it?
Becker: The Mosix system is based around "transparent process migration." Under Mosix, the operating system kernel identifies candidate processes to move to other cooperating machines. Likely candidates are CPU-intensive processes that make few system calls and do little I/O. The kernel migrates the process by copying the process address space to the remote machine and continuing execution there. When the process makes a system call, the parameters are passed back to the original machine's kernel.
While it has the advantage of creating a unified process space -- the user doesn't need to know if a process is local or remote to monitor and control it -- it can have considerable execution overhead, with normally fast system calls now requiring network transactions. The Mosix approach enables migration of unmodified applications at any point during execution. However, because the remote process intimately depends on the originating machine, failure of either machine is fatal.
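The trade-off described above can be illustrated with a back-of-the-envelope cost model. This is not how Mosix itself decides what to migrate; it is a simplified sketch that assumes the remote machine is idle and as fast as the local one, and every parameter name and number is hypothetical.

```python
def local_runtime_s(cpu_seconds, syscalls, syscall_s, local_cpu_share):
    """Runtime if the process stays on a busy local machine.

    local_cpu_share is the fraction of a CPU the process actually gets (0 to 1].
    """
    return cpu_seconds / local_cpu_share + syscalls * syscall_s


def migrated_runtime_s(cpu_seconds, syscalls, syscall_s, round_trip_s,
                       address_space_bytes, link_bytes_per_s):
    """Runtime if the process is migrated to an idle remote machine.

    Cost = one-time copy of the address space + CPU time
           + every system call paying a network round trip back to the origin.
    """
    copy_s = address_space_bytes / link_bytes_per_s
    forwarded_syscalls_s = syscalls * (syscall_s + round_trip_s)
    return copy_s + cpu_seconds + forwarded_syscalls_s


if __name__ == "__main__":
    common = dict(cpu_seconds=100, syscall_s=1e-6)
    for label, syscalls in (("CPU-bound", 1_000), ("syscall-heavy", 5_000_000)):
        stay = local_runtime_s(syscalls=syscalls, local_cpu_share=0.5, **common)
        move = migrated_runtime_s(syscalls=syscalls, round_trip_s=1e-4,
                                  address_space_bytes=100e6,
                                  link_bytes_per_s=100e6, **common)
        print(label, "stay:", round(stay, 1), "s  migrate:", round(move, 1), "s")
```

With these made-up numbers the CPU-bound job finishes sooner when migrated, while the syscall-heavy job is slower, which matches the point that good candidates make few system calls.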
How can IT shops make their network boot servers more reliable?
Becker: Here are some basic tips for reducing your network boot server issues:
- Verify that the network isn't a source of problems. "Smart" Ethernet switches are more likely to cause problems than non-configurable switches. Likely problems are:
  - Spanning Tree Protocol, which blocks broadcast traffic for 60 seconds after a link is enabled to check for network loops.
  - Broadcast packet rate limits, where the switch prevents broadcast packet "storms" from impacting other traffic. Some switches default to rates as low as 16 packets per second.
- Watch out for duplex mismatches. Set network switches to autonegotiate, or leave at CSMA/CD "half duplex".
- Minimize the size of the image you serve over TFTP. This will reduce both the TFTP network traffic and the window of vulnerability.
- Avoid using multicast, except for service discovery.
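To see why the size of the TFTP image matters, the sketch below estimates how many data packets a transfer needs and how long it takes. It assumes classic TFTP as defined in RFC 1350: 512-byte blocks sent in lock step, with each block acknowledged before the next is sent, so the transfer costs roughly one network round trip per block. The round-trip time and image sizes are hypothetical, and the estimate ignores retransmissions and larger negotiated block sizes.

```python
def tftp_data_packets(image_bytes, block_size=512):
    """Number of DATA packets in a classic TFTP transfer.

    The transfer ends with a packet shorter than block_size, so an image that
    is an exact multiple of the block size needs one extra, empty packet.
    """
    return image_bytes // block_size + 1


def tftp_transfer_seconds(image_bytes, round_trip_s, block_size=512):
    """Rough lower bound: one round trip per lock-step block, no losses."""
    return tftp_data_packets(image_bytes, block_size) * round_trip_s


if __name__ == "__main__":
    rtt = 0.001  # assumed 1 ms round trip on the boot network
    for size_mib in (4, 16, 64):
        size = size_mib * 1024 * 1024
        print(size_mib, "MiB:", tftp_data_packets(size), "packets,",
              round(tftp_transfer_seconds(size, rtt), 1), "s")
```

Every packet is a chance for loss and a timeout, so a smaller image shortens both the transfer and the period during which a boot can fail.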