Businesses tend to either ramp up or slow way down over the holiday season. This depends, of course, on the industry they're in. We're not in retail or a heavy-transaction type of business, so we use the holiday season as a time to schedule significant upgrades or modifications, because any downtime these changes incur will have an effect on fewer people. It's also a great time to review data center disaster recovery (DR) plans and look at staffing issues for those times when you have only a skeleton crew.
Managing disaster recovery personnel
Managing personnel is probably the biggest challenge in the data center, whether during the holiday season or in the event of a disaster. One thing that is often ignored is that if you have people off and a disaster occurs, getting the DR site up and running poses a serious problem. People usually want to support the company, but it's not always going to be at the forefront of their thinking.
The way we deal with this issue is by telling people up front that, in the event of a disaster, we're counting on them to come in to the office. It's really a matter of getting everyone on the same page and making sure they understand that if there is a disaster, they're expected to come in. When people know that up front, they can make sure to have a plan ready if the need arises. Everyone in IT has to understand that it's their job to keep things running in the event of a disaster, even if that disaster happens during the holidays.
Determining who's going to be on the DR response team poses another challenge. I can think of one example when I thought that I had made the right move by designating a senior member of the team, but it turned out that I was better off with a more junior person who was cross-trained in a couple of functions.
Disaster scenario games
Developing a disaster recovery plan for your data center requires the data center or IT manager to really think things through. It's wise to develop different disaster scenarios with a variety of on-hand staff levels because you have to account for those situations when people might not show up, such as during the holidays.
One of the things we do is make a game of it. Appoint someone to be the referee, who sits down with the IT manager and drills him or her with different situations. Start out with a usual situation: a disaster just occurred; what do you do? The data center manager will likely respond with a canned reaction: Activate the DR site, X number of designated people show up and we keep things running.
Then have the referee start getting creative with scenarios: OK, of those 12 designees, six don't show up. Now what do you do? The first time data center managers play this game, they usually get this deer-in-the-headlights look, but it's a good way to get people to stop planning only for the usual and start planning for the unusual.
Data center managers might argue that the probability that half of their DR staff won't show up is so low that it shouldn't be considered. That's the wrong answer if your goal is to develop an effective data center disaster recovery or holiday emergency plan.
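The referee's "what if six don't show up?" question can also be run mechanically against a roster. The following is only a minimal sketch under stated assumptions: the names, the functions each person is trained in and the list of required functions are all hypothetical placeholders, and a real drill would cover far more than skills coverage.

```python
from itertools import combinations

# Hypothetical roster: each responder and the functions they are trained in.
ROSTER = {
    "alice": {"network", "storage"},
    "bob": {"network"},
    "carol": {"database", "storage"},
    "dave": {"database"},
}

# Hypothetical list of functions the DR site cannot run without.
REQUIRED = {"network", "storage", "database"}


def uncovered_functions(present, roster=ROSTER, required=REQUIRED):
    """Return the required functions that nobody present can perform."""
    covered = set()
    for person in present:
        covered |= roster[person]
    return required - covered


def failing_scenarios(no_shows, roster=ROSTER, required=REQUIRED):
    """List every combination of `no_shows` absentees that leaves a gap."""
    failures = []
    for absent in combinations(sorted(roster), no_shows):
        present = set(roster) - set(absent)
        gaps = uncovered_functions(present, roster, required)
        if gaps:
            failures.append((absent, gaps))
    return failures


if __name__ == "__main__":
    for absent, gaps in failing_scenarios(no_shows=2):
        print(f"If {', '.join(absent)} are out, nobody covers: {sorted(gaps)}")
```

With this made-up roster, any single absence is survivable, but some pairs of absences leave a function uncovered. That is the same point the cross-training example makes: coverage depends on who knows what, not on seniority or headcount.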
Disaster risk analysis
One of the key things to do is a risk analysis to determine your vulnerabilities and to weigh the cost of protecting against each one versus the cost of the loss if you don't. Every risk has an associated cost. It might cost you more to designate sufficient people to head up the disaster recovery, but if you can't operate because you haven't planned for a situation where people aren't able to make it, chances are it would be more costly in the long run.
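One common, simplified way to put numbers on that trade-off is to compare the expected yearly loss from a risk with the yearly cost of mitigating it. The sketch below assumes you can estimate an annual probability and a loss figure for each risk; the function names and every number in the usage example are hypothetical and are not taken from any real plan.

```python
def expected_annual_loss(annual_probability, loss_if_it_happens):
    """Expected yearly cost of a risk: probability times impact."""
    return annual_probability * loss_if_it_happens


def mitigation_is_worth_it(annual_probability, loss_if_it_happens,
                           annual_mitigation_cost, residual_probability=0.0):
    """True if the expected loss avoided exceeds what the mitigation costs.

    residual_probability is the (assumed) chance the loss still occurs
    even with the mitigation in place.
    """
    before = expected_annual_loss(annual_probability, loss_if_it_happens)
    after = expected_annual_loss(residual_probability, loss_if_it_happens)
    return (before - after) > annual_mitigation_cost


if __name__ == "__main__":
    # Hypothetical figures: a 5% yearly chance of being unable to staff the
    # DR site, a $2M loss if that happens, and $40k a year for extra on-call
    # designees that cuts the chance to 1%.
    print(mitigation_is_worth_it(0.05, 2_000_000, 40_000, 0.01))
```

The estimates will be rough, but writing them down forces the comparison the paragraph above describes instead of leaving it to intuition.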
Often, the disaster recovery planning process is a matter of short- and long-term perspectives. For example, you have to look at your applications and ask yourself how long you can afford to have each of them down. Doing this does a few things: It helps you size your DR site, and it helps you figure out which people you need to manage these applications.
Furthermore, your DR plan is part of running the business. In other words, if you can't justify the cost of developing and testing your DR plan, then you have to consider whether you can justify remaining in business.
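The question of how long each application can stay down lends itself to a simple table. As an illustrative sketch only, the code below takes a hypothetical application list, each entry with an assumed maximum tolerable downtime and the skills needed to run it, and returns which applications have to come up at the DR site for a given outage length and which skills that implies. All application names, hours and skills are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class App:
    name: str
    max_downtime_hours: float  # how long the business can tolerate it being down
    functions: frozenset       # skills needed to bring it up and run it


# Hypothetical inventory for illustration.
APPS = [
    App("email", 4, frozenset({"messaging"})),
    App("payroll", 72, frozenset({"database"})),
    App("grants_db", 24, frozenset({"database", "storage"})),
]


def dr_scope(apps, expected_outage_hours):
    """Return (app names, skills) for apps that cannot wait out the outage.

    Apps are listed most urgent first, so the order doubles as a rough
    recovery sequence.
    """
    needed = sorted(
        (a for a in apps if a.max_downtime_hours < expected_outage_hours),
        key=lambda a: a.max_downtime_hours,
    )
    skills = set()
    for app in needed:
        skills |= app.functions
    return [app.name for app in needed], skills


if __name__ == "__main__":
    names, skills = dr_scope(APPS, expected_outage_hours=48)
    print("Recover at DR site:", names)
    print("Skills required:", sorted(skills))
```

The list of applications gives a starting point for sizing the DR site, and the set of skills gives a starting point for deciding who has to be on the response team.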
Disaster recovery planning is a continuous process
A common failure is that people work really hard to develop a plan, then put it on the shelf and don't look at it for three years. By then it's out of date, and if something happens, you may not have the appropriate responses documented.
The military does a good job of this. If you look at their DR plans, there are very specific steps, and you can literally turn to the correct page for step-by-step instructions on how to handle each disaster, because they thought through these things ahead of time, without the pressure of an actual emergency.
In an ideal situation, you have all these scenarios mapped and you have a "disaster book." If something happens you can say, "This is like disaster 27 in my book," turn to that page and execute your plan. By the way, have more than one copy of the book. I know a place that did a stellar job with documentation and procedures and had everything so well planned that a trained monkey could do it. But when a disaster came in the form of a fire, it hit the area where they stored their documentation.
This and every holiday season, remind people that if disaster strikes, they need to be reachable. But sometimes people can't come in, so data centers need to be prepared for that, too. Lastly, plan, test and review. It's the only way to make sure you've covered your bases.
ABOUT THE AUTHOR: Robert Rosen is the immediate past president of Share Inc. Currently, he serves as the CIO at the National Institute of Arthritis and Musculoskeletal and Skin Diseases of the National Institutes of Health, U.S. Department of Health and Human Services.