Optimum uptime reliability for an enterprise begins with the correct philosophy. While disaster recovery remains a top priority for senior executives and IT directors alike, the term itself implies that the business has already failed and that recovery is the only option. Business continuity, by contrast, reflects a more useful philosophy, in which the business prepares for all possible scenarios and develops a plan that allows its core operations to continue free of interruption.
Here are some key strategies for ensuring that your business operations are fully reliable and adequately prepared for any potential threats to uptime:
Maintain a thoughtful approach to communications service provider selection and redundancy: It is important to have multiple service providers for both internal and external connectivity to avoid over-reliance on a particular provider. If an outage occurs at the provider level, the backup site may be affected along with the main site if that provider supplies too much of the bandwidth to both.
Replicate your telecommunications infrastructure: The telecommunications infrastructure at a backup site should be built and administered (via labeling and documentation) in the same manner as at the main sites. This allows relocated staff to enter the facility and operate it quickly and efficiently.
Virtual redundancy: An enterprise no longer needs to put all of its eggs into one critical facility basket. Virtual redundancy via "mirrored" processes in physically separate facilities is now more feasible due to significant improvement in data transmission rates.
Eliminate single points of failure: By instituting dual points of entry into a disaster recovery facility for fiber and cabling, telecommunications and data transmission remain operational in the event of a natural disaster. If one of the telecommunications routes is damaged, the building will have an alternate route for bringing in telephone and data service. Microwave communications are less susceptible to cable cuts and other regional or off-site failures and, where line-of-sight exists between buildings, can provide a wireless backup for critical communication systems.
Diminish reliance on outside utilities: Critical facilities must be capable of continuous operation without reliance on outside utilities for an extended period of time in the event of a disaster. This speaks to the importance of on-site power generation, make-up water reserves or wells, and on-site food and water reserves.
Site selection: Perhaps the most effective means of avoiding a disaster is to understand the potential natural disasters (such as earthquakes, floods, droughts, forest fires and blizzards) and weigh their potential impact on the overall reliability of the enterprise. Additional site selection considerations include availability of multiple network carriers, ability to secure the site, underground vs. overhead cables, reliability of utility services and proximity to HAZMAT routes or potential terrorist targets.
Server virtualization: Server virtualization can provide automatic prioritization and hardware failure "work-arounds" that can mitigate business application impacts regardless of the physical cause.
Training and development: The majority of business "impacts" result from human activity. Investing in the training and retention of qualified staff helps to ensure the site infrastructure provides maximum reliability. It is also important to "inspect what you expect," meaning that realistic emergency drills, testing of failure scenarios, and a vigilant preventive and/or predictive maintenance program are vital to ensuring emergency plans are executed correctly when the need arises.
Establish Criticality Levels: The overall reliability of a site is only as strong as its weakest link. It is therefore important to classify the criticality of each facility and to understand that an organization's vulnerability is directly attributable to the weakest link in its chain. This classification must incorporate a wide range of relevant factors, including HVAC and electrical systems, facility security, IT infrastructure, maintenance and operations, and disaster preparedness, all of which ultimately affect the reliability of the facility and its data center.
Syska Hennessy Group designed the Criticality Levels concept, which assesses the availability and reliability of a business's critical facility. Performing this exercise is useful for existing facilities and in the design process in order to select the proper components that complement the critical mission of the facility and safeguard its data center against downtime.
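The weakest-link principle described above can be illustrated with a short Python sketch. It is not the Criticality Levels methodology itself, whose details are not given here. It assumes a hypothetical 1-to-5 rating per category (5 being the most robust) and made-up scores for an example site, and simply reports the lowest-rated category as the limit on the facility's overall rating.

```python
# Illustrative sketch only: the 1-5 scale and the scores below are
# hypothetical, not part of any published assessment method.

def weakest_link(ratings):
    """Return (category, rating) for the lowest-rated category."""
    if not ratings:
        raise ValueError("at least one category rating is required")
    category = min(ratings, key=ratings.get)
    return category, ratings[category]


def facility_rating(ratings):
    """A facility is rated no higher than its weakest category."""
    return weakest_link(ratings)[1]


# Hypothetical assessment covering the factors named in the article.
site_a = {
    "HVAC": 4,
    "electrical": 5,
    "security": 4,
    "IT infrastructure": 3,
    "maintenance and operations": 4,
    "disaster preparedness": 2,
}

category, rating = weakest_link(site_a)
print(f"Overall rating: {rating} (limited by {category})")
```

In this example, strong electrical and HVAC scores do not raise the overall rating; the facility is held to the level of its disaster preparedness, which is where investment would have the most effect.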
ABOUT THE AUTHORS: Terry Rodgers, CPE, is Green CF committee chair and a senior associate at Syska Hennessy Group. Jeffery Kirchner, RCDD, is with Syska Hennessy Group's Technology Design Group.