
Should I automate critical application failover on nodes?

Is it a best practice to automate the startup process after critical application failover from one node to another?

When nodes fail in the data center, applications need to restart as quickly as possible.

IT organizations implement a system to fail over from one node to another to allow rapid recovery of service. Manual intervention to bring an application back up slows this process down -- particularly if the node fails in the middle of the night or on a holiday.

Most critical apps are implemented as daemons or services -- they start automatically when the computer boots up. In this case, failover only has to start the virtual machine (VM) where the application is installed, and the application comes up with the operating system. Virtualization allows this failover methodology for any application that runs inside a VM.
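A failover routine can confirm that the service really did come back with the VM instead of assuming it. The following is a minimal Python sketch, assuming the application listens on a TCP port; the function name `wait_for_service` and its parameters are illustrative and not part of any failover product.

```python
import socket
import time


def wait_for_service(host, port, timeout=60.0, interval=2.0):
    """Return True once a TCP connection to host:port succeeds.

    Returns False if no connection succeeds before `timeout` seconds pass.
    Assumption: an open port is a good enough sign that the service started.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Each attempt is itself limited to `interval` seconds.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused, unreachable or timed out: the service is not up yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Called after the VM restarts on the surviving node, for example as `wait_for_service("app-vm.example.internal", 8080, timeout=300)` with a hypothetical host name and port, a False result is the cue to alert someone rather than wait for users to notice.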

Sometimes applications need more than just an OS restart. Applications that weren't written as services may need a user to log on to the VM and get the app back up. This is usually only a problem on Windows servers. The logon and launch steps are fairly easy to set up with auto-logon and startup applications, but some applications also need the user to click buttons or open menus before the app can run again.

Automated application failover is also possible in this scenario. I use AutoIt scripts to automate application launch after failover. Scripts work, but this type of automation is fragile: Each version upgrade of the application might break the script.
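Because a launch script can break silently after an upgrade, it helps to check that the application is still running shortly after it starts. This Python sketch is not an AutoIt script and does not click through any GUI; it only shows the guard around the launch step. The function name `launch_and_verify` and the grace period are assumptions made for illustration.

```python
import subprocess
import time


def launch_and_verify(command, grace_seconds=5.0):
    """Start `command` and confirm it survives a short grace period.

    Returns the running Popen object, or None if the process exited
    before `grace_seconds` elapsed (a hint that the launch is broken).
    """
    proc = subprocess.Popen(command)
    deadline = time.monotonic() + grace_seconds
    while time.monotonic() < deadline:
        if proc.poll() is not None:
            # The process has already exited; treat that as a failed start.
            return None
        time.sleep(0.1)
    return proc
```

A None result after an application upgrade suggests the launch automation needs attention before the next real failover depends on it.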

The biggest problem is with applications that don't like to fail. Applications that require a shutdown process, and cannot recover from an unplanned shutdown, are hard to fail over. Generally, these apps require further manual intervention, like listing and removing each database lock. It can be simpler to automate the alerts that get someone to fix these applications than it is to automate the fix itself.
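Alert-only automation of this kind can be small. The sketch below assumes, purely for illustration, that an unclean shutdown leaves `*.lock` files in a known directory; real applications may hold their locks inside the database instead, in which case the detection step would differ. The function names and the lock-file convention are hypothetical.

```python
from pathlib import Path


def find_stale_locks(lock_dir, pattern="*.lock"):
    """List lock files found in `lock_dir` (assumed leftover after a crash)."""
    return sorted(p for p in Path(lock_dir).glob(pattern) if p.is_file())


def build_alert(app_name, locks):
    """Return an alert message listing the locks, or None if there are none."""
    if not locks:
        return None
    lines = [f"{app_name}: {len(locks)} lock file(s) left after unplanned shutdown"]
    lines += [f"  {p}" for p in locks]
    return "\n".join(lines)
```

The script only reports what it finds. The message can go to whatever paging or email system is already in place, and removing the locks stays a manual step.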

About the author:
Alastair Cooke is a freelance trainer, consultant and blogger specializing in server and desktop virtualization. Known in Australia and New Zealand for the APAC virtualization podcast and regional community events, Cooke was awarded VMware's vExpert status for his 2010 efforts.
