Disaster recovery RTOs are increasingly aggressive because businesses rely so heavily on IT, yet IT pros' main concern remains disaster recovery costs. What gives?
Most IT pros want to make their disaster recovery (DR) plans more efficient and improve recovery time (see the infographic). Yet nearly every IT and business professional surveyed -- 92% of respondents to the TechTarget Disaster Recovery, Business Continuity Survey run in the first quarter of 2015 -- said price was one of the most important factors when evaluating DR products.
Faster recovery and greater efficiency both drive up the IT department's disaster recovery costs, through investments in hot or warm recovery sites and more frequent DR tests. We asked DR expert Michael Herrera, CEO of MHA Consulting, an international business continuity consulting firm in Glendale, Ariz., to explain the disconnect between the continuity that businesses crave and how much they're willing to spend on it, and how to reconcile cost with fast RTOs.
Who makes the DR decisions?
Michael Herrera: I wish businesses would be more involved, since the systems are key to their existence, but it's almost always IT. In many cases, IT is trying to get the whole BC plan in place as well. BC is still an area where many companies are just starting out -- often they see DR as BC.
What are BC, RTO and RPO?
A business continuity (BC) plan documents how a business will keep operating during and after a disruption.
Recovery time objective (RTO) is the maximum acceptable downtime for computers, systems or applications after a disaster.
Recovery point objective (RPO) is the maximum age of the backup files that must be restored for normal operations to resume after a failure. In effect, it is how much data loss, measured in time, the business can tolerate.
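The two objectives can be checked against a proposed backup and recovery design. The minimal Python sketch below makes two assumptions. First, backups complete instantly, so the worst-case data loss equals the interval between backups. Second, recovery time can be estimated in advance. `RecoveryTargets` and `meets_targets` are hypothetical names for illustration, not part of any DR product. The 24-hour RPO and 36-hour recovery estimate are illustrative values only.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryTargets:
    """Hypothetical container for a system's recovery objectives."""
    rto: timedelta  # maximum acceptable downtime after a disaster
    rpo: timedelta  # maximum acceptable age of the data restored from backup


def meets_targets(targets: RecoveryTargets,
                  backup_interval: timedelta,
                  estimated_recovery: timedelta) -> tuple[bool, bool]:
    """Return (rpo_ok, rto_ok) for a proposed backup and recovery design."""
    # Assumption: backups complete instantly, so in the worst case a failure
    # strikes just before the next backup and the newest restorable copy is
    # one full backup interval old.
    rpo_ok = backup_interval <= targets.rpo
    # The design meets the RTO only if the estimated recovery fits inside it.
    rto_ok = estimated_recovery <= targets.rto
    return rpo_ok, rto_ok


# Illustrative values: nightly backups and a 36-hour estimated recovery.
relaxed = RecoveryTargets(rto=timedelta(hours=48), rpo=timedelta(hours=24))
print(meets_targets(relaxed, timedelta(hours=24), timedelta(hours=36)))  # (True, True)

# Tightening the RTO to 24 hours exposes a gap without any change to the design.
tight = RecoveryTargets(rto=timedelta(hours=24), rpo=timedelta(hours=24))
print(meets_targets(tight, timedelta(hours=24), timedelta(hours=36)))  # (True, False)
```

In this sketch, the only way to close the tighter RTO gap is to shorten recovery itself, for example with a warm site or an active/passive setup. That is where the added cost comes from.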
What do people ask most about in DR?
Herrera: People ask most about the alignment of RTO and RPO and what that means for their DR strategy, which leads directly to price. This year, for example, we had companies that completed their business impact analysis [BIA] with four- and 12-hour RTOs, and management said no way, that's too aggressive -- we're sticking with 48 hours. These are very knowledgeable people, but they still think that people can work manually. There are no manual workarounds anymore when the IT systems go down.
Typically, you need active/active or active/passive, at a minimum, to meet a 24-hour or [faster disaster recovery] RTO. You need systems ready to go immediately. Even with a 48-hour RTO, you need a warm site because of how much data and how many systems are involved. You used to have just a few core [IT] systems. But in today's world, you have core and feeder systems, and complex data and processes running with this business data. The only way to know whether you need a system is to run a DR test and find out that your core system doesn't work when that piece isn't there.
Volatility of change is so high now in data centers [that] something [that] used to be critical ... becomes less important, and then another system becomes critical.
Why is price a top concern?
Herrera: Management continues to see DR as insurance: 'It's an expense we can't quantify -- what are the odds that a disaster will happen?' Often, the results of the business impact analysis aren't believed, or they're changed to match what management thinks the recovery time objectives should be.
Larger companies often do the BIA in-house, but many outsource it. Either way, the same thing happens -- people argue that BC/DR isn't that important. Where are we going to find the money for it? Typically, IT pays for all of it rather than it being a shared expense.
What common mistakes happen?
Herrera: Businesses commonly make mistakes in judging what they really need. DR is not just about servers; it's really about the business processes. IT itself is about the business. The business impact analysis has to be rigorous enough. What's the strategy that works for us? For many companies, it's a hybrid approach.
Hybrid means some internal DR and some external DR in the cloud. What resources do they have already? For example, fail over to a branch-site server room and to the cloud. What does [disaster recovery] cost today and tomorrow, including regular testing?
Is 'recovery time actual' [RTA] often longer than planned RTO?
Herrera: You don't have to do it all at once: Start with a 48-hour recovery and work toward a 24-hour RTO over the next two years. You might not get there right away.
So often, we can't get IT to actually tell us the RTA from a DR test or an actual failure. In many cases, IT doesn't know or won't readily share that information. They don't want the business to know that they couldn't recover fully or took a day longer than promised. In the same vein, they might recover faster than their RTO but don't want to set expectations at that faster recovery time.
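Once IT records the actual recovery time from each DR test or failure, the RTA-versus-RTO gap is easy to compute. The minimal Python sketch below assumes those measured RTAs are available. `compare_rta_to_rto` is a hypothetical helper. The 48-hour RTO and the test durations in the example are illustrative values, not survey data.

```python
from datetime import timedelta


def compare_rta_to_rto(rto: timedelta, measured_rtas: list[timedelta]) -> dict:
    """Summarize measured recovery times (RTAs) against an RTO.

    Hypothetical helper: assumes each entry is the recovery time actually
    observed in one DR test or real failure.
    """
    if not measured_rtas:
        raise ValueError("at least one measured RTA is required")
    worst = max(measured_rtas)
    return {
        "worst_rta": worst,
        # Positive gap: the RTO was missed; negative gap: there is headroom.
        "gap": worst - rto,
        "tests_missed": sum(1 for rta in measured_rtas if rta > rto),
        "tests_run": len(measured_rtas),
    }


# Illustrative: two DR tests against a 48-hour RTO, one a day over the RTO.
summary = compare_rta_to_rto(timedelta(hours=48),
                             [timedelta(hours=40), timedelta(hours=72)])
print(summary["tests_missed"], "of", summary["tests_run"], "tests missed the RTO")
```

Reporting both the misses and the headroom exposes the two cases described above: recoveries slower than promised, and recoveries faster than the RTO that IT prefers not to advertise.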
Are companies using cloud-based DR more?
Herrera: They are -- we're seeing a shift of companies adopting cloud-based DR. You definitely have to have your own private cloud for data protection purposes.
What's involved in the disaster recovery cost?
Herrera: There's the backup site and hardware, or cloud resources, but there are a lot of soft costs. Resource time: IT is strapped for resources and time. Testing [is done] not just once a year but multiple times, and [there is] ongoing maintenance. These [additional DR] costs add up.
[Ask yourself:] What is it I really need on the backup site side? What's the level of availability we need there? It doesn't [necessarily] need to mirror the production site exactly.
Meredith Courtemanche is the senior site editor for SearchDataCenter. Follow @DataCenterTT for news and tips on data center IT and facilities.