Storage-based snapshots have almost no performance impact on application servers, but the backup method is neither foolproof nor intuitive.
Storage-area network (SAN) array snapshots provide fast and effective data protection, but only when they are configured with care.
Choosing the wrong SAN snapshot configuration, page size or other settings can unduly tax the storage array. Consider the effect of snapshots on array performance, the total storage burden they impose on the array, snapshot integrity and how you'll restore from them in a disaster recovery scenario.
Snapshot configuration and performance
Snapshots are accomplished almost instantaneously, unlike other backups that require systems to be quiesced for a considerable time.
Some arrays create a full copy of each snapshot. The server suffers very little performance overhead, but it takes time for the storage array to create the copy, and each copy iteration demands additional storage capacity. Forgo this method if the data requires frequent snapshots held for long retention periods -- it can tax your storage resources.
Differential snapshots copy only changed data, updating a metadata file with each change. This approach is faster and uses less storage space, but frequent snapshots and long retention periods produce enormous metadata files that take significant time to process within the array. When many snapshots are in play, they can bog down the array.
Snapshots are just part of an array's tasks, but snapshot frequency and retention often affect array performance. Determine your data center's target snapshot frequency (e.g., per hour or per day) and retention period (e.g., 90 days) and validate the storage array's performance under these conditions. Other mission-critical workloads also rely on access to the SAN array.
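The retention math above is worth running before committing to a policy. As a minimal sketch with purely illustrative numbers, the concurrent snapshot count the array must track is simply frequency multiplied by retention:

```python
# Rough sizing check: how many snapshots must the array hold for a
# given frequency and retention policy? (Illustrative numbers only.)

def snapshots_retained(per_day: int, retention_days: int) -> int:
    """Number of concurrent snapshots the array must track."""
    return per_day * retention_days

# Hourly snapshots kept for 90 days:
print(snapshots_retained(per_day=24, retention_days=90))  # 2160 snapshots
# Daily snapshots kept for 90 days:
print(snapshots_retained(per_day=1, retention_days=90))   # 90 snapshots
```

Tracking 2,160 snapshots per volume is a very different load on the array than tracking 90, which is why the frequency and retention targets should be validated against the vendor's published per-volume and per-array snapshot limits.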
Snapshot configuration and storage capacity
Full-copy snapshots demand far more storage capacity than differential snapshots, but full-copy snapshots are easier to recover and generate relatively little metadata. Make full snapshots for infrequently copied, non-critical workloads.
Differential snapshots take less time and space. However, metadata files are absolutely critical for proper snapshot restoration, and frequent changes grow that file excessively. Differential snapshots are best for rapid, short-term data protection.
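The capacity gap between the two approaches can be estimated up front. This sketch assumes a hypothetical 500 GB volume with a fixed 5% daily change rate; real change rates vary widely by workload:

```python
# Compare cumulative capacity for full-copy vs. differential snapshots.
# Volume size and change rate are hypothetical; measure your own workload.

def full_copy_capacity(volume_gb: float, snapshots: int) -> float:
    """Each full-copy snapshot consumes the entire volume size."""
    return volume_gb * snapshots

def differential_capacity(volume_gb: float, snapshots: int,
                          change_rate: float) -> float:
    """Differential snapshots store only the changed blocks per interval."""
    return volume_gb * change_rate * snapshots

vol = 500.0    # GB protected volume
snaps = 30     # one snapshot per day for a month
print(full_copy_capacity(vol, snaps))            # 15000.0 GB
print(differential_capacity(vol, snaps, 0.05))   # 750.0 GB at 5% daily change
```

The twenty-fold difference here is why differential snapshots dominate short-term protection, even though the metadata they depend on makes restoration more fragile.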
SAN snapshot page sizing affects storage capacity. When a snapshot page is larger than the block size of a workload it protects, the snapshot process wastes significant storage space. Different applications and versions can have different block sizes, exacerbating the wastage. A SAN array that allows customizable block sizes lets you protect every workload using a matching page size, minimizing slack space.
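The slack-space penalty is easy to quantify. A minimal sketch, assuming each changed block must be copied into whole snapshot pages:

```python
import math

# Estimate slack space when the snapshot page size exceeds the
# workload's block size: each changed block still consumes full pages.

def slack_per_change(page_kb: int, block_kb: int) -> int:
    """Wasted KB when one changed block is copied into snapshot pages."""
    pages = math.ceil(block_kb / page_kb)
    return pages * page_kb - block_kb

# A 4 KB database write copied into 64 KB snapshot pages wastes 60 KB:
print(slack_per_change(page_kb=64, block_kb=4))   # 60
# Matching the page size to the workload's block size eliminates the slack:
print(slack_per_change(page_kb=4, block_kb=4))    # 0
```

At a 64 KB page size, roughly 94% of the space consumed by each small write is slack, which is the case for tuning page size per workload when the array allows it.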
Consider mixing disk types within the storage array. For example, serial ATA (SATA) disks typically provide more capacity at less cost than high-performance serial-attached Small Computer System Interface (SAS) disks. Enterprises can cost-effectively store a larger number of snapshots over a longer period of time by redirecting them to lower-performance SATA disk space. The tradeoff is that writing to SATA disks rather than higher-performance SAS or other media means slower array performance.
Many SAN arrays offer data reduction technologies as well. Data deduplication removes redundant files, blocks and bytes and can reduce storage demands by 50%, depending on the content being deduplicated.
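Tiering and deduplication can be weighed together in a back-of-the-envelope cost model. The per-gigabyte prices and the 50% dedupe figure below are illustrative only; actual savings depend on the media and the content being deduplicated:

```python
# Back-of-the-envelope cost comparison for snapshot tiering and
# deduplication. Prices and savings ratios are hypothetical.

def tier_cost(capacity_gb: float, price_per_gb: float,
              dedupe_savings: float = 0.0) -> float:
    """Cost of storing snapshot data on a tier after dedupe savings (0.0-1.0)."""
    return capacity_gb * (1.0 - dedupe_savings) * price_per_gb

snap_gb = 10_000.0  # total snapshot footprint
print(tier_cost(snap_gb, price_per_gb=0.50))                       # SAS, no dedupe
print(tier_cost(snap_gb, price_per_gb=0.10, dedupe_savings=0.5))   # SATA with dedupe
```

The combined effect of cheaper media and a 50% reduction ratio is multiplicative, which is why redirecting long-retention snapshots to deduplicated SATA capacity can be an order of magnitude cheaper despite the performance tradeoff.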
Capturing stable snapshots
A virtual machine actively runs within the server's memory space. Although a VM represents the current state of an application, there is no guarantee that the machine state is stable -- data may still be in flight within buffers or awaiting updates to local memory.
Often, VMs must be quiesced before a snapshot to flush buffers and achieve a stable, restorable machine state. Any snapshot software or array function must be compatible with the server's hypervisor, such as VMware vSphere or Microsoft Hyper-V, for automatic quiescing.
All file data must be written and stable before a snapshot occurs. Snapshots that don't capture a stable application state risk restoring as corrupt or unusable data. Mission-critical workloads typically handle an enormous amount of file data, as with Microsoft Exchange and the data created as mail flows in and out of the server.
SAN snapshot use should include routine testing and validation. Snapshot backups won't help much after a failure if the data center admins don't know how to restore and restart workloads, or if the workloads lack integrity and cannot be restored.
In virtualized environments, where workloads are abstracted from the underlying hardware, it's a simple matter to restore a workload's snapshot to a lab's test server for validation.
Validation matters when replicating snapshots to remote locations, such as a secondary SAN array in a backup data center or colocation facility. In a disaster recovery scenario, the IT shop will have to retrieve snapshots from the remote storage array across the network, or launch snapshots on remote systems. Remote restoration is a good test of wide-area network bandwidth and performance.
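Restore-time expectations for that remote scenario can be sanity-checked before a disaster forces the issue. A rough sketch with hypothetical numbers, discounting the link for protocol overhead and contention:

```python
# Sanity-check remote restore time: pulling a snapshot back across the
# WAN is often the disaster recovery bottleneck. Numbers are hypothetical.

def restore_hours(snapshot_gb: float, wan_mbps: float,
                  efficiency: float = 0.7) -> float:
    """Hours to transfer a snapshot over a WAN link.

    efficiency discounts protocol overhead and link contention (0.0-1.0).
    """
    megabits = snapshot_gb * 8 * 1000
    seconds = megabits / (wan_mbps * efficiency)
    return seconds / 3600

# A 500 GB snapshot over a 100 Mbps link at 70% efficiency:
print(round(restore_hours(500.0, 100.0), 1))   # 15.9 hours
```

If nearly 16 hours exceeds the recovery time objective, the options are a fatter pipe, restoring to compute at the remote site, or both; routine restore drills will surface this long before an actual outage does.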
This was first published in February 2014