Most businesses recognize the importance of ensuring high availability and quick disaster recovery of their mission-critical applications and data to avoid financial losses after a system failure or disaster. But as I've talked to a number of IT organizations, I've found only a few that have properly balanced the technology costs with the needs of the organization.
Most IT organizations don't know if their critical business applications and data are properly protected. They have applied what they feel are the proper levels of technology, but often they have over-invested, possibly meeting the needs of one or two critical applications (but they don't know that for sure), with overkill for all other applications.
Most IT organizations understand that performing a business impact analysis (BIA) will help them apply the proper technology to prevent business loss. However, I have witnessed a mistake made by nearly all companies that embark on a BIA for the first time.
Is avoiding employee inconvenience mission critical?
The most common mistake businesses make when determining service-level requirements, disaster recovery time objectives (RTO) and recovery point objectives (RPO) is trying to keep the business running as if nothing happened.
Companies attempt to avoid inconvenience to their employees and the overall business process, but inconvenience usually doesn't translate into business-threatening losses or costs. Yes, there is a cost associated with inconvenience, but most often it's a short-term lost opportunity cost because employees must resort to manual business processes.
Here's an example: I spoke with a company that was organized so that the IT department provides services to business units in the company. One of the business units came to IT and asked for no more than 15 minutes RTO and 5 minutes RPO for disaster recovery (DR), plus five-nines of uptime for its enterprise resource planning (ERP) system. This was the first I had heard of someone requesting that level of resiliency for an ERP system -- more common minimums are RTO of 24 hours with RPO of 4 hours.
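To put the five-nines request in perspective, it helps to convert an availability percentage into a yearly downtime budget. The short Python sketch below does that arithmetic. The function name and the use of a 365-day year are my own assumptions for illustration, not figures from the company in the example.

```python
# Minutes in a non-leap year (an assumption for this back-of-the-envelope math).
MINUTES_PER_YEAR = 365 * 24 * 60


def allowed_downtime_minutes(availability_pct: float) -> float:
    """Return the yearly downtime budget, in minutes, for an availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


# Compare common availability targets.
for label, pct in [("three nines", 99.9), ("four nines", 99.99), ("five nines", 99.999)]:
    print(f"{label} ({pct}%): {allowed_downtime_minutes(pct):.1f} minutes per year")
```

By this math, five-nines leaves a budget of only a little more than five minutes of downtime per year, which is why it is such an unusual request for an ERP system.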
The IT organization determined the cost to meet those objectives. Rough estimates were about 2.5 times what is normally applied to an ERP system. The recoverability was even more costly -- about 9 times what is normally applied. The IT service organization then decided to go back to the business unit with the costs of what is normally done and what the business unit requested, and ask the business unit a simple question: "Are you trying to avoid employee inconvenience with your requested service levels?"
Manual workaround procedures do result in employee inconvenience during an outage event, but most of the time they result in lower overall costs for business continuity. In a recent column on top disaster recovery budget wasters, I cited the example of a distributor that chose to implement manual workaround procedures in lieu of spending nearly $2 million on redundant Internet connectivity to its data center. When the backhoe struck, employees had to scramble: for the 36 hours that data center connectivity was down, they were inconvenienced by having to fulfill orders manually using telephone, paper and pencil. But the distributor was able to get all its day's orders shipped by 5 p.m., and in retrospect still saved money, even in light of the inconvenience during the outage.
Exceptions to the rule
There are always cases where employee inconvenience cannot be tolerated. Where life or property is at stake, systems must be resilient enough that employees are never forced into workarounds. Examples include 911 emergency dispatch systems, where time can mean the difference between life and death.
Beyond employee inconvenience, there is the level of inconvenience that customers may be willing to withstand. A number of factors must be considered when determining customer inconvenience. An e-commerce website outage in a competitive market can spell financial disaster for a business. If customers cannot obtain your products and services due to an outage, they may go to your competition. You can quickly lose customers forever.
Balancing resiliency against cost
Tight budgets and an increased emphasis on cost reduction are driving IT organizations to more closely inspect and balance the risk of inadequate resiliency against the cost of robust resiliency. A thorough BIA with a focus on evaluating the real business costs of an outage, as opposed to inconvenience, is the first step to achieving balance. At one extreme, a company can spend more than is required for resiliency. At the other extreme, a company can leave itself vulnerable to significant losses, should it apply inadequate resiliency. The cost of various levels of system resiliency can be charted against the business financial impact. The sweet spot for your IT business system is where the two cross. The following figure illustrates this point, showing the cost of different resiliency levels against the business impact loss over time for an e-commerce business site that makes $800 million per year.
The pink area to the left represents overspending on resiliency, and the area to the right represents underspending, which puts the business at unnecessary risk. In this example, the sweet spot is around one-hour recovery times with no more than about one hour of outage per year.
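The crossing-point idea can be sketched in a few lines of Python. Only the $800 million annual revenue comes from the example above. The resiliency cost for each recovery time is an invented placeholder, not data from the figure. The loss model is also a simplifying assumption: one outage per year, with revenue lost linearly for as long as the site is down. Treat this as an illustration of the method rather than a real BIA.

```python
HOURS_PER_YEAR = 365 * 24
ANNUAL_REVENUE = 800_000_000  # e-commerce revenue from the example above
REVENUE_PER_HOUR = ANNUAL_REVENUE / HOURS_PER_YEAR

# HYPOTHETICAL placeholder numbers: yearly cost (dollars) of achieving each
# recovery time (hours). Replace these with figures from your own BIA.
RESILIENCY_COST = {0.25: 2_000_000, 1: 100_000, 4: 40_000, 24: 10_000}


def outage_loss(hours_down: float) -> float:
    """Assumed loss model: revenue lost during one outage per year of this length."""
    return REVENUE_PER_HOUR * hours_down


def sweet_spot(cost_by_recovery_hours: dict) -> float:
    """Return the recovery time where resiliency cost and outage loss are closest."""
    return min(
        cost_by_recovery_hours,
        key=lambda hours: abs(cost_by_recovery_hours[hours] - outage_loss(hours)),
    )


# Show both curves side by side, then the recovery time where they nearly cross.
for hours, cost in RESILIENCY_COST.items():
    print(f"recover in {hours:>5} h: resiliency ${cost:>12,.0f}  outage loss ${outage_loss(hours):>12,.0f}")
print("closest to the crossing point:", sweet_spot(RESILIENCY_COST), "hour(s)")
```

Recovery times to the left of the crossing cost more than the loss they prevent, and those to the right save less than the loss they expose the business to. A real analysis would also need to account for how often outages are expected and for losses beyond direct revenue, such as customers who never return.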
ABOUT THE AUTHOR: Richard Jones is vice president and service director for Data Center Strategies at Midvale, Utah-based Burton Group. He can be reached at email@example.com.
What did you think of this feature? Write to SearchDataCenter.com's Matt Stansberry about your data center concerns at firstname.lastname@example.org.