The third in a series of fast references, "Disaster Recovery Basics" quickly reviews what you should have covered in case of a -- you guessed it -- disaster.
|Documents||Have documentation on everything in your data center, both in print and attached to the machine. Include what's running, who the users are, who's responsible for it, and contact info.|
|"Load shedding" plan||Possibly color-code the machines for low, medium, and high priority to show what gets shut off first, and what gets brought back up first.|
|Batteries||Make sure you have good battery monitoring capabilities on your UPS system. Batteries are most likely to fail under stress, and you need to know if one is going bad.|
|Air conditioning||Make sure it can be maintained or brought back on-line if the power's out. High-density blade servers may shut down quickly if cooling is lost. If your system uses chilled water or glycol, consider a storage tank so cooling can be resumed with pumps and fans before the chillers are back in operation, which can take ten minutes or more.|
|Generator||Make sure you have a scheme for transferring loads to the generator, so it doesn't become unstable. Let computer hardware run on the UPS until A/C is back on, then transfer the UPS load.|
|Concurrent maintainability||Make sure every part of your data center is designed for concurrent maintainability, so that any air conditioner, pump, pipe loop, valve, chiller, UPS, power circuit, automatic transfer switch, etc. can be taken off-line for maintenance without disrupting operations. This enables preventative maintenance, and lets you run most of your data center if one piece has failed or is damaged.|
|Testing||Test your equipment, your disaster recovery plan, and your procedures, and do it under conditions as close to real as possible without risking your operation. If you're afraid to randomly pull a plug, then you do not have a workable recovery plan.|
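
The "load shedding" plan above boils down to two orderings of the same inventory, which can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the machine names are made up, the `Machine` class and both function names are hypothetical, and it assumes that low-priority machines are shut off first and high-priority machines are brought back up first.

```python
from dataclasses import dataclass

# Assumed ranking of the three priority levels: a lower number is less critical.
PRIORITY = {"low": 0, "medium": 1, "high": 2}


@dataclass(frozen=True)
class Machine:
    name: str      # hypothetical machine name
    priority: str  # "low", "medium", or "high"


def shutdown_order(machines):
    """Return machines in shed order: least critical first (assumption)."""
    return sorted(machines, key=lambda m: PRIORITY[m.priority])


def restore_order(machines):
    """Return machines in restore order: most critical first (assumption)."""
    return sorted(machines, key=lambda m: PRIORITY[m.priority], reverse=True)


# Made-up inventory, for illustration only.
inventory = [
    Machine("db-primary", "high"),
    Machine("build-agent", "low"),
    Machine("intranet-wiki", "medium"),
]

print("Shut off:", [m.name for m in shutdown_order(inventory)])
print("Bring up:", [m.name for m in restore_order(inventory)])
```

Keeping the priority on each inventory record means the printed runbook and the color codes on the machines can both be generated from one list, so they are less likely to drift apart.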