Editor's Note: In part two of our series on rebuilding after a disaster, Higbie discusses remote offsite services...
and how to bring business operations back online at an alternative site.
The success of your remote site operations will depend largely on your recovery team and resources at the remote site.
There are two types of remote sites: those that are company owned and those that are outsourced to a provider. Companies that arranged for redundant sites within the same affected city may find that the redundant site was leveled along with the primary site. In these circumstances, business continuity plans will be less than effective, or even wasted, while the company tries to rebuild the critical services it needs to operate. Larger companies that have remote sites in other cities may fare better, depending on the location of those sites. Companies that have outsourced their redundant site to a provider may be in the best operating condition, as these sites are designed around such disasters and their cutover plans may be more in depth.
In either scenario, some things may have been overlooked at the redundant site. First, the site will need supplies that may not already be on the premises. These include invoices, stamps, software license numbers and proof of purchase, checks and the appropriate signature graphics, updated security clearances for access to the information, and data entry personnel. Data entry personnel are particularly critical if the backups or information are not a mirror image of the failed data center. The recovery/continuity team will need to begin with an assessment of what is on site and what must be ordered and delivered to the site. Do not forget that at some point you will have to upgrade your redundant site to match new hardware and software, so maintain a budget log for planning purposes.
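The on-site assessment described above amounts to comparing what the plan requires with what is actually on the premises. The following is a minimal sketch of that comparison; the item names and quantities are hypothetical and would come from your own continuity plan.

```python
def items_to_order(required, on_hand):
    """Return {item: shortfall} for every item whose on-site count is below the required count."""
    shortfalls = {}
    for item, needed in required.items():
        missing = needed - on_hand.get(item, 0)
        if missing > 0:
            shortfalls[item] = missing
    return shortfalls


# Hypothetical quantities, for illustration only.
required = {"invoice forms": 500, "stamps": 200, "check stock": 100}
on_hand = {"invoice forms": 120, "stamps": 200}

print(items_to_order(required, on_hand))
```

Items that are missing from the on-site count entirely are treated as zero on hand, so they appear in the order list at their full required quantity.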
The first step will be to evaluate services at the redundant site. This step is particularly critical where carrier circuits and bandwidth are concerned, as well as for overall processing and storage capabilities. Many redundant sites are configured with minimal services for cost savings throughout the year. These services may not be sufficient, however, for daily operations. For instance, if the redundant site was outfitted with one T-1 circuit that was used predominantly for backups, and now the site must operate all services, this circuit may need to be increased to match services from the downed site for efficiency in operations and to handle additional loads placed on the site from internet services, email traffic, and the like. Some services that were optional in a contract may now be required because of the outage. These should be discovered and addressed very early in the cutover process, so that operations have a healthy environment as the remaining processes continue.
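To make the T-1 example concrete, here is a rough sizing sketch. A T-1 carries 1.544 Mbps; the per-service load figures and the 70 percent utilization ceiling are assumptions for illustration, not measurements, and should be replaced with numbers from the downed site.

```python
import math

T1_MBPS = 1.544  # line rate of a single T-1 circuit


def circuits_needed(peak_load_mbps, circuit_mbps=T1_MBPS, max_utilization=0.7):
    """Number of circuits needed to carry the peak load without exceeding max_utilization."""
    usable_per_circuit = circuit_mbps * max_utilization
    return math.ceil(peak_load_mbps / usable_per_circuit)


# Hypothetical peak loads in Mbps once the site runs all services.
expected_load = {"backups": 0.8, "email": 0.6, "internet services": 2.6}
total_mbps = sum(expected_load.values())

print(f"Peak load {total_mbps:.1f} Mbps needs {circuits_needed(total_mbps)} T-1 circuits")
```

A site provisioned with a single T-1 for backups would fall well short under these assumed loads, which is the kind of gap worth finding early in the cutover.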
An audit team will need to be in place to determine how current the information and data stores are in relation to the time of failure. They will also be needed for ongoing operations to be sure that data entered is accurate. In many instances, the data entry teams may not be fully trained on the software so auditing is a critical step. The audit team should consist of payroll personnel, accounting AP/AR personnel, fulfillment personnel and someone versed in any specialty application needs. The audit team will first, however, need to determine the accuracy of data that exists at the site.
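As a simple illustration of measuring how current each data store is, the sketch below computes the gap between a store's last update and the time of failure. The store names and timestamps are hypothetical; in practice they would come from backup or replication records.

```python
from datetime import datetime


def data_gaps(last_updates, failure_time):
    """Map each data store to the time elapsed between its last update and the failure."""
    return {store: failure_time - updated for store, updated in last_updates.items()}


# Hypothetical timestamps, for illustration only.
failure_time = datetime(2024, 1, 10, 14, 30)
last_updates = {
    "payroll": datetime(2024, 1, 10, 2, 0),
    "accounts receivable": datetime(2024, 1, 9, 2, 0),
}

gaps = data_gaps(last_updates, failure_time)
# List the stalest stores first, since they need the most re-entry and checking.
for store, gap in sorted(gaps.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{store}: {gap} of activity to re-enter or verify")
```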
Unless the two sites were mirrored in real time, there will almost certainly be some discrepancy between the data lost and what is available at the remote site. Things to check at the redundant site include software revisions, updated data dictionaries, updated device driver patches and operating system patches, updated security information and financials. Device drivers and patch revisions are critical. Some companies have not kept their redundant sites up to date, counting instead on backups and manual restorations. Without the appropriate revisions on both levels, the data could be restored but the software might not function.
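One way to approach the revision check is to compare the last documented revisions from the primary site against what is installed at the redundant site. This is a minimal sketch; the component names and version strings are hypothetical, and it assumes the primary site's revisions were recorded somewhere that survived the outage.

```python
def revision_mismatches(primary, redundant):
    """Return {component: (expected, found)} where the redundant site differs from the primary.

    A found value of None means the component is missing at the redundant site.
    """
    mismatches = {}
    for component, expected in primary.items():
        found = redundant.get(component)
        if found != expected:
            mismatches[component] = (expected, found)
    return mismatches


# Hypothetical inventories, for illustration only.
primary = {"accounting app": "7.2", "database server": "9.1", "tape driver": "3.4"}
redundant = {"accounting app": "7.0", "database server": "9.1"}

for component, (expected, found) in revision_mismatches(primary, redundant).items():
    print(f"{component}: expected {expected}, found {found or 'not installed'}")
```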
The importance of logs
During all phases of restoration, the single most important step is documentation in the form of logs. While people are working feverishly to restore services, one wrong move can have an adverse effect on many other steps. Troubleshooting cannot be done effectively without proper logs. One or two people should be in charge of making sure that everything is logged and of comparing all steps taken against the disaster recovery/business continuity plan. The plan will probably change after a disaster. Most plans are reviewed only once or twice a year, and with the changes that occur in operations between reviews, there will likely be missing information.
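A restoration log does not need to be elaborate to be useful. The sketch below records who did what and when, then compares the log against the plan. The field names, plan steps and entries are hypothetical, and a paper or spreadsheet log following the same structure would serve equally well.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class LogEntry:
    who: str
    action: str
    plan_step: Optional[str] = None  # None means the action is not in the written plan
    when: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def compare_to_plan(plan_steps, log):
    """Return (plan steps with no log entry yet, actions taken outside the plan)."""
    logged = {entry.plan_step for entry in log if entry.plan_step is not None}
    not_yet_logged = [step for step in plan_steps if step not in logged]
    unplanned = [entry.action for entry in log if entry.plan_step is None]
    return not_yet_logged, unplanned


# Hypothetical plan steps and log entries, for illustration only.
plan_steps = ["notify carrier", "restore payroll database", "verify license keys"]
log = [
    LogEntry("J. Smith", "Called carrier to reroute circuits", plan_step="notify carrier"),
    LogEntry("A. Lee", "Ordered replacement check stock"),
]

not_yet_logged, unplanned = compare_to_plan(plan_steps, log)
print("Plan steps with no log entry:", not_yet_logged)
print("Actions to add to the plan:", unplanned)
```

The second list is the useful by-product: actions that were needed but never planned for are candidates for the next revision of the plan.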
It is also important to have financial and accounting abilities at the redundant site. You will need to pay for replacement gear, personnel, shipping charges, security personnel, software replacements, consulting services, printing for customer notifications, rerouting of critical circuits and a myriad of other expenses that may or may not be in your continuity plan.
Another critical step is the notification process. Notifications will include coordination with provider services to reroute critical links and contact phone numbers, as well as notices to customers, suppliers, employees, insurance representatives, support personnel and of course the disaster recovery team. Critical service providers will likely need to change routes and circuits to the new operating facility and the command center locations. Notifications should be part of your plan and should include updated contact information including phone numbers, email contacts and addresses. The documentation team should update any contacts that are not current. Depending on the location of your carriers and other critical personnel, this step may take several days.
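Keeping the contact list current is easier if each entry carries the date it was last verified. The following sketch flags entries that have not been checked recently; the contact names, dates and the 180-day threshold are assumptions chosen for illustration.

```python
from datetime import date


def stale_contacts(contacts, today, max_age_days=180):
    """Return names of contacts whose details were last verified more than max_age_days ago."""
    return [c["name"] for c in contacts if (today - c["last_verified"]).days > max_age_days]


# Hypothetical contact records, for illustration only.
contacts = [
    {"name": "carrier account rep", "last_verified": date(2023, 9, 15)},
    {"name": "insurance representative", "last_verified": date(2024, 4, 20)},
]

print(stale_contacts(contacts, today=date(2024, 6, 1)))
```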
The next critical step is to assess all process documentation and run schedules. Any good continuity plan will contain all processes for each critical company function. The best way to develop this documentation is to have each employee document his or her daily job and then train another person using the documentation. This will ensure that any workarounds are caught and contained in the documentation. In the absence of this documentation, processes and run schedules will need to be documented for those working at the remote site during the cutover period. The process team will need to work closely with the logging and audit teams to ensure that no critical steps are omitted. Without documentation, there will be some trial and error involved, but if everything is logged, it will be far easier to resolve errors that may occur.
Once the redundant site is functional, you will need to evaluate other factors at the original site. Part three of this series will look at active components including servers, networking gear, and software. The most critical steps for redundant site operations are logging, notification, process documentation and making sure that forms and supplies exist at the redundant site. If a disaster recovery/business continuity plan does not exist, this is a good time to develop one. If you have not tested your existing plans, the logging process will help fine-tune the plan for future use and for cutting back over to your primary site when it becomes functional.
Carrie Higbie has been involved in the computing and networking industries for nearly 20 years and has taught classes for Novell, Microsoft, and Cisco certifications as well as CAD/CAE, networking and programming on a collegiate level. Carrie currently works as the Network Applications Market Manager with The Siemon Company, where she provides liaison services to assure harmony between active electronics and networking infrastructures. She participates with the IEEE, TIA and other consortiums and works to further educate the end-user community on the importance of a quality infrastructure.