The recent power outage at colocation provider 365 Main Inc. in San Francisco has put the question of disaster recovery (DR) first and foremost in data center managers' minds.
Even before the 365 Main outage on July 24, which occurred after backup diesel generators failed to start, data center managers were only moderately optimistic about their disaster recovery prospects.
Attendees at Gartner Inc.'s "Best Practices in Disaster Recovery Testing" session, presented at the IT Infrastructure, Operations & Management Summit in Orlando, Fla., in June, were asked whether they felt confident about their data center disaster recovery plans. Of the 100 or so attendees, about 40% said they felt just "OK" about their plans.
"That [statistic] doesn't give me a warm and fuzzy feeling that people are very confident with their DR plans," said Roberta J. Witty, the session's presenter and research vice president at Gartner. "The level of confidence in DR plans are not where they should be in most cases."
Many companies take uptime for granted and don't invest in bulletproof backup systems. In fact, most disaster recovery plans are full of holes due to inadequate funding, low awareness of what could happen and not enough management support, Witty said.
"In a lot of cases, the tech group hasn't budgeted for DR, especially new companies; it costs money to create this backup infrastructure. IT knows it can recover only a portion of what the business expects, and the business side doesn't understand what is involved with backup," Witty said.
On average and depending on industry, about 7% of corporate IT budgets are allocated to disaster recovery. Some businesses may spend more, Witty said. Most enterprises don't employ disaster recovery testing at all business levels (global, regional and local), but they should, she said.
"Either pay now, or you will pay later," Witty said.
But even the most stable data centers can run into trouble, as evidenced by 365 Main, which invests in redundant power sources and runs its business on the promise of constant uptime.
Jason Needham, the director of product management at F5 Networks Inc., which provides disaster recovery services for Web sites, said companies can proactively avoid this kind of problem by spreading their backup sources among more than one service provider.
"Some businesses do have redundant data centers, and they make the investments and are struggling with the costs of doing that. But by having more than one backup source available, you are spreading your risk and not putting all of your eggs in one basket," Needham said.
Businesses with data centers in places like San Francisco that carry a risk of earthquakes and other disasters, for instance, should have a backup provider located far from that risk and on a different power grid, Needham said.
"A multi-tiered strategy is best. Even if you don't have a full-capacity data center, don't rely on any one service provider," Needham said. "The 365 Main outage was just an hour, but it took craigslist [one of the companies hosted at the facility] 11 hours to get back online. Avoiding brand damage is absolutely essential for businesses."Backup methods
The first step in creating a backup system is to protect all systems with an uninterruptible power supply (UPS). The measure ensures that, when utility power goes down, equipment is powered via a separate source and that enough generators keep critical equipment running, said Gartner's Witty.
"Make sure you have tested these systems during normal operations, and make sure you have enough fuel for the generators to take you through the first eight hours or so of the power outage," said Witty.
Witty also suggested testing all backup components at least once each year, and F5 Networks' Needham agreed.
"Perform tests of the backup systems you have in place to make sure these things are configured correctly so that they actually work when needed," Needham said.
In addition, various business continuity management (BCM) programs should be implemented and tested periodically. Some standard BCM measures include the following:
- Emergency notification: Data center employees should know whom to notify in case of an emergency and, during an event, how to direct customers, trading partners, suppliers, the press and others.
- Testing: It's imperative for companies to test their recovery plans regularly to ensure the smoothest possible recovery.
- Technology and service provider resources: Organizations need to ask whether their technology and service providers are equipped to handle rapid recovery after an event.
- Personnel training: Companies often operate on the assumption that employees will be around to restart the business immediately, but during a disaster, the first thing employees do is protect their homes and families. To ensure continuity after a disaster, companies need to form recovery teams.
Let us know what you think about the story; e-mail Bridget Botelho, News Writer.
Also, check out our news blog at serverspecs.blogs.techtarget.com.