Virtual machine backup strategies

Virtualization admins can mix traditional data backup technologies and newer tactics for safer, more efficient virtual machine backups.

Virtualization has brought a new focus on management efficiency and the effective use of valuable computing resources. But as fewer physical servers host increasing numbers of virtual machines (VMs), IT administrators face a new series of data protection and backup challenges. Today, backup means more than just making a copy of important data files. Each VM state must also be protected and kept readily accessible.

Each organization must re-evaluate its backup needs within the context of its virtual infrastructure and then select the technologies that provide the best data protection. In some cases, that means adapting traditional backup approaches to a virtual environment; in others, it warrants a shift to other backup paradigms.

The need for server backups is fundamentally the same for virtualized servers as it is for traditional nonvirtualized servers. Backups guard against data loss caused by factors such as hardware failures, application or operating system failures and human error. But there are some important differences in a virtual environment that will influence an organization’s backup choices.

Traditional vs. virtualization backup
First, virtual infrastructures often have to contend with more data. More applications are certainly producing larger and more plentiful files, but each VM is itself stored as one or more large files -- such as VMware .VMDK virtual disk files -- that must also be protected. Because those files reflect the complete state of a VM at some point in time, and that state changes constantly, VM backups usually occur frequently throughout the course of each day.

Backup demands also affect server computing performance. On a traditional, nonvirtualized server, a backup process can consume up to 100% of the server’s processing and I/O resources, because nothing else competes for that server. With virtual servers, there may be 10, 20 or even more VMs all sharing a portion of the server’s resources. A backup process will need to operate effectively within the resources allocated to its respective VM.

If the backup of any single VM makes excessive demands on the physical server, the performance of other running VMs can be adversely affected. A similar consideration applies when VM backups are restored to a server. Each VM that comes back online will use a portion of the server’s computing resources, leaving fewer free resources to restore subsequent VMs. Thus, each restoration can take longer as more VMs are restored. This delicate balancing act of time and performance is often overlooked when backing up in a virtual setting.

Server virtualization also places more demands on storage and networks. Unlike traditional servers that use local storage for operating systems, drivers and applications, most virtual data centers store virtual server data centrally in a shared storage resource, such as a SAN. This includes application data along with the latest image of each VM and certain types of backups. As a result, the SAN must perform well enough to exchange data with dozens -- even hundreds -- of VMs.

The network that connects storage to the physical servers must also support the simultaneous data traffic produced by myriad independent VMs. This is one of the reasons that a separate high-performance Fibre Channel SAN is often deployed for a virtual data center, though IP-based SANs like iSCSI (or even NAS) will work in many situations.

Increasing volumes of data, heavier demands on server computing resources and the changing implications of storage and network architectures all conspire to complicate long-established backup processes. Organizations will now need to adapt their established backups or shift to alternative backup methods.

Updating established backup techniques for virtual servers
Virtualization certainly does not exclude tape, and organizations that currently use tape-based backups can continue to use those products. For example, each VM exists as an independent server, so it’s a simple matter to add a tape backup agent to each VM and back up to a tape target.

Tape is relatively inexpensive, well understood and supported by many different backup tools. In fact, organizations that are new to server virtualization often use their existing tape backup approach while researching methods better suited for virtual environments.

Tape works fine with small or noncritical virtualization deployments, but it’s easy to see that server computing resources and network performance can quickly become overwhelmed with backup traffic, especially if multiple VMs attempt a backup simultaneously. The main problem here is time. 

With this approach, you typically must quiesce a VM before a backup starts, so it’s unavailable to users during a backup cycle. A backup cycle can take a great deal longer for a VM because of contention for computing and network resources. Consequently, backing up a VM to traditional tape may result in unacceptably long backup windows.

“Those processes take a lot of network bandwidth and storage bandwidth and throughput,” said Ray Lucchesi, president and founder of Silverton Consulting Inc. in Broomfield, Colo. “So when you start loading virtual machines, all doing backups within an 8 p.m. to 5 a.m. window, there can potentially be a problem.”
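
The backup window concern can be checked with simple arithmetic. The sketch below is only an illustration: the function names, the VM sizes and the throughput figures are assumptions, and it treats the shared network or storage path as a single pipe with a fixed sustained rate. The nine-hour default corresponds to the 8 p.m. to 5 a.m. window mentioned above.

```python
from typing import Sequence


def backup_hours(vm_sizes_gb: Sequence[float], link_mb_per_s: float) -> float:
    """Estimate hours needed to move every VM over one shared link.

    Assumes the link is the bottleneck, so running backups in parallel
    does not shorten the total time; the data still shares one pipe.
    """
    if link_mb_per_s <= 0:
        raise ValueError("link_mb_per_s must be positive")
    total_mb = sum(vm_sizes_gb) * 1024  # GB -> MB
    return total_mb / link_mb_per_s / 3600  # seconds -> hours


def fits_window(vm_sizes_gb: Sequence[float], link_mb_per_s: float,
                window_hours: float = 9.0) -> bool:
    """Return True if the estimated backup time fits the nightly window."""
    return backup_hours(vm_sizes_gb, link_mb_per_s) <= window_hours


if __name__ == "__main__":
    sizes = [200.0] * 20  # hypothetical: twenty VMs of 200 GB each
    for rate in (100.0, 150.0):  # hypothetical sustained MB/s
        hours = backup_hours(sizes, rate)
        print(f"{rate:.0f} MB/s: {hours:.1f} h, fits: {fits_window(sizes, rate)}")
```

Under these assumed numbers, 4,000 GB at a sustained 100 MB/s needs a little more than 11 hours and misses the window, while 150 MB/s fits. Real throughput varies with contention, so treat the result as a rough planning figure.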

Unfortunately, there is no practical way to alleviate these drawbacks with direct tape backup architectures. The most common adaptations used to improve performance all require the inclusion of disk storage. Virtual tape library (VTL) technology is one avenue. It passes backup data to disk storage that is presented to the backup software as a tape system. The disk storage target can be a SAN or a NAS storage subsystem, and a tape drive will typically create a secondary copy of the VTL backup. Most organizations that adopt virtualization today are embracing a variety of newer disk-based data protection options, relegating tape to long-term or archival storage roles.

Considering new virtual server backup techniques
The main challenge with virtual server backups is to minimize the backup window while keeping network traffic at manageable levels. Virtualization technology can be combined with a SAN to provide a variety of powerful backup options.

The most versatile disk-based data protection technique to appear for virtual servers is the snapshot. A snapshot is just what the name implies -- a fast point-in-time (PIT) copy of a VM file to high-performance storage such as a Fibre Channel SAN. Because a snapshot can be accomplished in minutes or less, there is very little application disruption.

In many cases, users don’t even realize a snapshot has taken place. Snapshots can be full and capture the entire VM, or they can be incremental, where only changes are collected. Once captured to storage, the snapshot can be used in several important ways. It can be replicated -- or mirrored -- to a remote facility for disaster recovery, it can be cloned to other servers, and it can even be copied to a dedicated backup server that moves the snapshot to tape, an approach used by tools such as VMware Consolidated Backup.

Snapshots do require some management. Each snapshot requires storage space, so storage monitoring and capacity planning can play a greater role when using snapshots. It’s likely that you will only choose to store a limited number of snapshots, so be sure to configure the snapshot tool to delete snapshots that “age out” and reuse that disk space. It’s also important to flush any data buffers so that the exact machine state is captured in its entirety. Otherwise, the machine state may be left incoherent or corrupted, which can make recovery impossible.
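
The “age out” rule can be sketched in a few lines. This is an illustration only, not the behavior of any particular snapshot tool: the Snapshot record, the split_by_age function and the keep_min safeguard (always retain the newest copies, however old) are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical record describing one stored snapshot of a single VM."""
    vm: str
    taken_at: datetime
    size_gb: float


def split_by_age(snapshots: Iterable[Snapshot], now: datetime,
                 max_age: timedelta,
                 keep_min: int = 1) -> Tuple[List[Snapshot], List[Snapshot]]:
    """Split one VM's snapshots into (keep, expire) lists.

    A snapshot expires when it is older than max_age, except that the
    newest keep_min snapshots are always kept so a restore point remains.
    """
    ordered = sorted(snapshots, key=lambda s: s.taken_at, reverse=True)
    keep: List[Snapshot] = []
    expire: List[Snapshot] = []
    for index, snap in enumerate(ordered):
        if index < keep_min or now - snap.taken_at <= max_age:
            keep.append(snap)
        else:
            expire.append(snap)
    return keep, expire


def reclaimable_gb(expired: Iterable[Snapshot]) -> float:
    """Disk space that deleting the expired snapshots would free."""
    return sum(s.size_gb for s in expired)
```

Calling split_by_age with a seven-day max_age, for instance, returns the snapshots to delete along with those to retain, and reclaimable_gb reports how much disk space the deletions would return to the pool.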

A close cousin of the snapshot is continuous data protection (CDP), which closely tracks and journals any changes to a data set, such as a virtual machine state. Although you can take snapshots frequently, CDP is generally regarded as up-to-the-moment data protection that is best suited to VMs that are constantly changing. The continuous nature of CDP alleviates the need to flush buffers, but it’s still important to manage storage consumption.
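
The journaling idea behind CDP can be shown with a toy model. The sketch below is an assumption-laden illustration, not how any CDP product is implemented: the Journal class and its methods are hypothetical, and it journals writes to an in-memory byte string rather than to a real virtual disk. Every write is appended to a log, and any earlier state is rebuilt by replaying the log up to a chosen point.

```python
from typing import List, Tuple


class Journal:
    """Toy change journal over a byte image (hypothetical, in-memory only)."""

    def __init__(self, base: bytes) -> None:
        self._base = bytes(base)                   # image before any writes
        self._entries: List[Tuple[int, bytes]] = []  # (offset, data) in order

    def record(self, offset: int, data: bytes) -> int:
        """Log one write and return its sequence number (1-based)."""
        if offset < 0:
            raise ValueError("offset must be non-negative")
        self._entries.append((offset, bytes(data)))
        return len(self._entries)

    def state_at(self, seq: int) -> bytes:
        """Rebuild the image as it was after the first seq writes."""
        image = bytearray(self._base)
        for offset, data in self._entries[:seq]:
            image[offset:offset + len(data)] = data
        return bytes(image)

    def journal_bytes(self) -> int:
        """Payload bytes held in the journal, a proxy for storage consumed."""
        return sum(len(data) for _, data in self._entries)
```

Because the journal grows with every write, the journal_bytes figure is the kind of number an administrator would watch when managing CDP storage consumption.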

The live migration of VMs between physical servers is certainly not a backup option, but it can affect data protection behavior, and administrators must account for that. Even though VMs can be migrated on demand, they must still be able to access data, and applications must still know where to find VMs.

In many cases, data files must also be relocated to accommodate VM migration. Administrators must consider how migration interacts with snapshot, replication, backup and other data movement tools.

Here is another wrinkle to consider. VMs typically sit on top of a virtualization platform -- the hypervisor. This means an additional backup is needed to protect the underlying “system platform,” which normally includes an operating system and hypervisor such as VMware ESX, Citrix Xen or Microsoft Hyper-V.

“Testing for virtual server backups needs to take into consideration both restoring the guest [and] restoring the entire system,” said Greg Schulz, founder and senior analyst at The Server and StorageIO Group in Stillwater, Minn. “That means an extra layer of testing may be required to make sure that a guest can be restored into a VM, as well as the entire system being able to be restored.”

Any backup strategy -- or change in strategy -- should include a consideration of backup testing. Testing disk-based backups in a virtual environment is generally much easier than testing traditional tape backups. Snapshots and CDP files can be restored quickly to idle servers and tested without any disruption to the production environment. This makes it much easier to train and drill IT staff on effective restoration, which vastly raises their confidence and provides a more responsive vehicle for evaluating, refining and improving backup processes.

Tactics for more efficient backups
Today’s data protection technologies offer far more flexibility than traditional tape systems. As you saw earlier, snapshots of each virtual machine can be taken in a matter of minutes -- even seconds -- and then backed up from the storage system without impacting the production environment. But other tactics can enhance the backup process even further.

Consider the use of data deduplication technologies for storage subsystems. VM files contain a great deal of empty space and redundant data. For example, if you run 50 virtual machines across 10 servers, most of those host and guest machines will probably run the same operating system, and that redundant data can be deduplicated to significantly reduce the amount of storage needed for snapshots. The smaller data set also means speedier backups to other media and faster replication off-site.
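
A rough sense of why shared operating system data deduplicates well can be had from a minimal fixed-block sketch. It is an illustration under stated assumptions, not a description of any storage product: the dedup_stats function is hypothetical, it uses fixed 4 KB blocks and SHA-256 fingerprints, and real systems often use variable-size chunking and add compression.

```python
import hashlib
from typing import Iterable, Tuple


def dedup_stats(images: Iterable[bytes], block_size: int = 4096) -> Tuple[int, int]:
    """Return (total_blocks, unique_blocks) across all images.

    Each image is cut into fixed-size blocks, and a block is counted as
    unique only the first time its SHA-256 fingerprint is seen.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    seen = set()
    total = 0
    for image in images:
        for start in range(0, len(image), block_size):
            block = image[start:start + block_size]
            total += 1
            seen.add(hashlib.sha256(block).digest())
    return total, len(seen)


if __name__ == "__main__":
    # Hypothetical images: eight shared "operating system" blocks,
    # followed by one block of VM-specific data each.
    shared_os = b"".join(bytes([i]) * 4096 for i in range(8))
    vm_a = shared_os + b"A" * 4096
    vm_b = shared_os + b"B" * 4096
    total, unique = dedup_stats([vm_a, vm_b])
    print(f"{total} blocks stored as {unique} unique blocks")
```

In this toy case, two nine-block images need only 10 unique blocks instead of 18, because the shared operating system blocks are stored once.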

Data protection will always require some amount of storage and network bandwidth, so try staggering the scheduling of VM snapshots. For example, if a physical server hosts 10 VMs, it’s probably a bad idea to take snapshots of all 10 VMs simultaneously. Stagger the snapshots so that only one or two VMs are affected at a time. This limits the spike in network and storage I/O.
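
Staggering can be as simple as assigning start times in small batches. The following sketch assumes a hypothetical stagger function, a batch size of two and a 10-minute gap; in practice, the schedule would be configured in whatever snapshot or backup tool is in use.

```python
from datetime import datetime, timedelta
from typing import Dict, Sequence


def stagger(vms: Sequence[str], start: datetime, batch_size: int = 2,
            gap: timedelta = timedelta(minutes=10)) -> Dict[str, datetime]:
    """Assign snapshot start times so at most batch_size VMs begin together.

    VMs are taken in the order given; each successive batch starts one
    gap later than the previous batch.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return {vm: start + (index // batch_size) * gap
            for index, vm in enumerate(vms)}


if __name__ == "__main__":
    names = [f"vm{i}" for i in range(10)]  # hypothetical VM names
    plan = stagger(names, datetime(2024, 1, 1, 20, 0))
    for vm, when in plan.items():
        print(vm, when.strftime("%H:%M"))
```

With 10 VMs, two per batch and a 10-minute gap, the last pair starts 40 minutes after the first. The gap should be at least as long as a typical snapshot takes, or the batches will overlap anyway.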

Finally, it’s common for virtualization users to adopt multiple data protection schemes and apply them differently to each VM depending on its relative importance to the organization. For example, CDP may be used to protect a mission-critical VM, while standard business applications may receive snapshots once an hour. Noncritical VMs may receive snapshots only a few times per day. And all of that PIT data can be periodically replicated to a DR site or offloaded to another backup medium such as VTL or tape. Administrators should feel free to mix and match their data protection in ways that best fit the respective VM being protected. 
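
One way to keep such a mix manageable is to write the tiers down as data. The sketch below is hypothetical throughout: the tier names, the Policy record and the eight-hour interval chosen to represent “a few times per day” are assumptions for illustration, not recommendations.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Dict, Optional


@dataclass(frozen=True)
class Policy:
    """Protection method for one tier; interval None means continuous."""
    method: str
    interval: Optional[timedelta]


# Hypothetical tiers mirroring the example in the text.
POLICIES: Dict[str, Policy] = {
    "mission-critical": Policy("cdp", None),
    "standard": Policy("snapshot", timedelta(hours=1)),
    "noncritical": Policy("snapshot", timedelta(hours=8)),
}


def policy_for(tier: str) -> Policy:
    """Look up the policy for a tier, failing loudly on unknown names."""
    try:
        return POLICIES[tier]
    except KeyError:
        raise ValueError(f"unknown tier: {tier!r}") from None


def copies_per_day(policy: Policy) -> Optional[int]:
    """Point-in-time copies taken per day, or None for continuous methods."""
    if policy.interval is None:
        return None
    return int(timedelta(days=1) / policy.interval)
```

Keeping the tiers in one table makes it easier to estimate how many point-in-time copies each class of VM generates per day, which feeds directly into storage planning and replication schedules.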

Stephen J. Bigelow, a senior technology writer in the Data Center and Virtualization Media Group at TechTarget Inc., has more than 15 years of technical writing experience in the PC/technology industry. He holds a bachelor of science in electrical engineering, along with CompTIA A+, Network+, Security+ and Server+ certifications, and has written hundreds of articles and more than 15 feature books on computer troubleshooting, including Bigelow’s PC Hardware Desk Reference and Bigelow’s PC Hardware Annoyances.
