Being creative isn't always a good thing. Author Mark Sobell has seen Red Hat Linux administrators create some sticky situations for themselves and their companies. In this installment of his recent tips interview with SearchEnterpriseLinux.com, he describes common mistakes admins make and important tasks they neglect. Sobell is author of the new Prentice Hall PTR book, A Practical Guide to Red Hat Linux (Second Edition): Fedora Core and Red Hat Enterprise Linux. In part two of this tips series, he explained how to corral Trojan horses and described some handy tools for backups. In the first installment, he covered user administration.
What's one of the most neglected tasks of system administration, and why?
Sobell: Backups. Backups are most neglected by admins of small systems because they are expensive to do right. DVDs are too small to back up even a workstation hard disk, and decent tape drives cost more than a small enterprise's server. Removable hard disks are cheap but fragile. The easiest solution is not to bother. However, I don't think this is a mistake that many people make twice.
By the way, while I'm working on a book, I use a cron script to back up my writing hourly to an offsite location using rsync over a network. I don't have to pay attention to it, except to check the output of the commands (which is mailed to me), and it has saved me many hours of work a few times.
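As a rough illustration of the kind of hourly rsync job described above, here is a minimal Python sketch. It is not Sobell's actual script: the source directory, remote host, and script path are hypothetical placeholders, and it assumes rsync and ssh are installed and that key-based ssh login to the remote host is already set up.

```python
import subprocess
import sys

# Hypothetical locations; substitute your own directory and offsite host.
SOURCE_DIR = "/home/author/book/"
REMOTE_DEST = "backup@offsite.example.com:/backups/book/"

# Example crontab entry that runs the script hourly, on the hour.
# The script path is an assumption for illustration.
CRON_LINE = "0 * * * * /usr/bin/python3 /home/author/bin/backup_writing.py --run"


def build_rsync_command(source, dest):
    """Return the rsync argument list for a compressed archive copy over ssh."""
    return ["rsync", "-az", "-e", "ssh", source, dest]


def run_backup(source=SOURCE_DIR, dest=REMOTE_DEST):
    """Run rsync and print what it reports so cron can mail it to the owner."""
    cmd = build_rsync_command(source, dest)
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(result.stdout, end="")
    if result.returncode != 0:
        print(f"rsync failed with exit code {result.returncode}:")
        print(result.stderr, end="")
    return result.returncode


if __name__ == "__main__":
    if "--run" in sys.argv[1:]:
        # Real transfer: exit with rsync's status so failures are visible.
        sys.exit(run_backup())
    # Without --run, only show the command that would be executed.
    print(" ".join(build_rsync_command(SOURCE_DIR, REMOTE_DEST)))
```

Cron mails anything a job prints to the crontab's owner by default, so printing rsync's output and any error text is what makes the "check the mail" step possible. Running the script without `--run` only prints the command, which is a cheap way to check the paths before scheduling it.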
In the creating problems section of your new Linux book, you talk about mistakes sys admins make. What are they, and how can those mistakes be avoided?
Sobell: Here are a few:
- Believing that uptime is important. Linux is very stable, but trying to prove it by keeping a server up for as long as possible does no one any good. This is a very common problem for new sys admins, particularly Linux sys admins. It is better to reboot a server every once in a while when no one is using it than to have it down for a day while people need it. Applying a kernel or libc security fix may cost you your uptime, but not applying it could cost a lot more.
- Not testing upgrades. This is particularly common with security upgrades. For any production server, there should be another machine with exactly the same software configuration (and ideally the same hardware configuration) that patches are tested on before deployment.
- Not having a fall-back plan for an upgrade. This was recently highlighted by The Chicago Tribune's upgrade, which killed their ability to print newspapers. The admin had no contingency plan in case of failure. When doing any upgrade, run the new and old systems in parallel until you are sure that the new one works.
- Not testing that backups actually work. Periodically make sure that you can restore files from backups. You can be virtuous about doing backups, but if you cannot restore files when you need to, the backups are of little use. See Landmark.org for more about common backup mistakes.
- Believing in RAID. RAID is a useful tool that can greatly reduce the risk of data loss. But it is very easy to forget that RAID cannot restore deleted files. If you or a user deletes a file, RAID is of no use. A backup is the only way to restore a deleted file.
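The advice above about periodically testing restores can be made routine with a small check. The following is a minimal Python sketch, assuming you have already restored a backup into a scratch directory by whatever means your backup tool provides. The function names are hypothetical, and the comparison is only meaningful for files that have not changed since the backup was taken.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(original_dir, restored_dir):
    """Return a list of problems found when comparing a restore to the original.

    Each entry names a file that is missing from restored_dir or whose
    contents differ from the copy in original_dir.
    """
    original_dir = Path(original_dir)
    restored_dir = Path(restored_dir)
    problems = []
    for source in sorted(original_dir.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(original_dir)
        restored = restored_dir / relative
        if not restored.is_file():
            problems.append(f"missing: {relative}")
        elif file_digest(source) != file_digest(restored):
            problems.append(f"differs: {relative}")
    return problems
```

An empty list means every file in the original tree was found in the restored tree with identical contents. Anything else is a sign the backup could not be relied on when it was needed.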