How do you know when your disaster recovery (DR) and business continuity plans are complete? That's the question the Hanover Insurance Group Inc. asked Kelley Okolita when it interviewed her for the position of business continuity and disaster recovery program manager at the Worcester, Mass.-based firm.
"I said, 'It's simple,'" replied Okolita. "'I'm done when every Hanover employee can answer a question: Does everybody know what to do next if they get kicked out of the primary site?' They should know what they're supposed to do next," Okolita said.
In a presentation at the Gartner Data Center Conference in Las Vegas, Okolita spelled out how she built a plan for Hanover, which hired her in mid-2005 from Fidelity Investments, where she had done similar work. Okolita stressed two major themes: documentation and testing.
Documentation central to disaster recovery
The purpose of a disaster recovery and business continuity plan is to safeguard human life and keep the business running. The best way to do that, Okolita said, is to take as many decisions as possible out of people's hands. Where should employees go in an emergency? Tell them in advance so there's no guesswork involved.
"People in crisis make bad decisions," she said. "The more you can decide for people, particularly in the first four to five hours, the more they just know what to do and the better your recovery is going to be."
Most important is making sure everyone knows how to reach other members of the organization. Even with no other plan in place, Okolita said, if you can reach your people, you may still be able to act. Rather than maintaining a long contact list, Hanover set up a conference bridge: in an emergency, anyone in the organization can call in to get further instructions or talk to colleagues. It's not complex, but it works.
Next, Okolita builds teams according to event type and location. If a backup facility loses power, for example, make sure facility employees are on that team. If a primary data center suffers a security breach, staff the team with IT security employees. Organizing teams this way lets you document, for each kind of event, who plays key roles and who holds which responsibilities.
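For organizations that keep their plan in a system rather than a binder, the team structure Okolita describes can be recorded as a lookup table keyed by event type and location. The Python sketch below is a hypothetical illustration, not Hanover's actual tooling; the event types, locations and roles are invented. A missing combination raises an error, which surfaces a gap in the plan before an emergency does.

```python
from dataclasses import dataclass, field


@dataclass
class Team:
    """A recovery team documented for one event type at one location."""
    event_type: str
    location: str
    # Maps each role on the team to that role's responsibility.
    roles: dict[str, str] = field(default_factory=dict)


# Hypothetical roster: event types, locations and roles are invented examples.
ROSTER: dict[tuple[str, str], Team] = {
    ("power_outage", "backup_facility"): Team(
        "power_outage",
        "backup_facility",
        {"facilities lead": "restore power", "site manager": "account for staff"},
    ),
    ("security_breach", "primary_data_center"): Team(
        "security_breach",
        "primary_data_center",
        {"IT security lead": "contain the breach",
         "network engineer": "isolate affected systems"},
    ),
}


def team_for(event_type: str, location: str) -> Team:
    """Return the pre-assigned team, or fail loudly if the plan has a gap."""
    try:
        return ROSTER[(event_type, location)]
    except KeyError:
        raise KeyError(
            f"no team documented for {event_type!r} at {location!r}"
        ) from None


if __name__ == "__main__":
    team = team_for("power_outage", "backup_facility")
    for role, duty in team.roles.items():
        print(f"{role}: {duty}")
```

Keying on the (event, location) pair mirrors the idea that nobody should have to decide who responds in the first hours; the answer is looked up, not debated.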
Training and testing
Training starts with teaching people what the documentation says, because, according to Okolita, "it really doesn't matter if you have a plan if no one knows what's in it."
Training also involves testing, or as Okolita called it, "exercising."
"There is no way to fail a business continuity test, because if we knew it all worked, we wouldn't test it," she said. "So we started calling them exercising. The problem with calling it a test is you wonder if you're going to fail."
But whatever you call it, do it often. Test for a loss of systems. Test for a loss of facility power. Test for IT and physical security breaches. Test for natural disaster scenarios. Test, test, test. And keep track of who is performing and who isn't.
"A lot of people will do what they should do, but other people won't unless you keep a status report on how they did," Okolita said. "For each task a person is responsible for, they have to come back and show me that they did that task, and they get a red or a green for that."
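Okolita's red/green tracking can be sketched as a small status report. The Python example below is a hypothetical illustration, not Hanover's system: the `TaskResult` record, the names and the tasks are invented, and a task turns green only when its owner has shown it was done.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    owner: str
    task: str
    # True only after the owner has shown that the task was actually done.
    demonstrated: bool


def status(result: TaskResult) -> str:
    """Red/green rating for a single assigned task."""
    return "green" if result.demonstrated else "red"


def status_report(results: list[TaskResult]) -> dict[str, list[str]]:
    """Group every task from an exercise under its red or green rating."""
    report: dict[str, list[str]] = {"green": [], "red": []}
    for r in results:
        report[status(r)].append(f"{r.owner}: {r.task}")
    return report


if __name__ == "__main__":
    # Hypothetical exercise results; names and tasks are invented.
    exercise = [
        TaskResult("Alice", "dial into the conference bridge", True),
        TaskResult("Bob", "report to the alternate site", False),
    ]
    for color, items in status_report(exercise).items():
        print(color, items)
```

Defaulting to red until proof arrives matches the point of the quote: a task counts only when the person responsible comes back and shows it was done.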
Disaster recovery scare tactics
Okolita was blunt about certain aspects of disaster recovery strategy: to get executives to agree that a DR and business continuity plan is worth developing and testing, you may have to sound the alarm. When Okolita worked at Fidelity, the company's data center sat on a pier jutting into Boston Harbor, surrounded by water on three sides, with the airport across from the pier and the Big Dig federal highway tunnel project less than 100 yards away. Still, Okolita could not convince executives that any of these factors threatened the facility's uptime.
Instead, what convinced them of the need for stronger DR measures was a nearby drawbridge that carried the cables running to the facility. Executives could picture the bridge going up, the cables disconnecting and the facility shutting down.
"We … forget that the only reason we have data centers is that the business needs them," Okolita emphasized. "The business needs to own the program. This isn't a technology issue. It's a business issue."