The public cloud is an attractive option for hyper-converged infrastructure data backup and protection.
Public cloud backup uses low-cost storage capacity. With less backup data stored on the hyper-converged infrastructure (HCI) itself, there is less pressure to buy more on-premises nodes for capacity. Also, the public cloud is inherently an off-site location, so it provides great protection from on-premises disasters. Restoring data from hyper-converged infrastructure cloud backup can pose some challenges, however.
How hyper-converged infrastructure cloud backup works
Most HCI products have some sort of integrated data protection mechanism -- usually backup policies that store additional copies of VM data alongside the production VMs on the HCI cluster.
Typically, rather than making a full copy of the VM, the HCI storage redirects the VM's new data writes to fresh locations so they do not overwrite the blocks that older backups still reference. These metadata-based backups are more space-efficient than full-copy backups, because the infrastructure only needs to store the blocks that changed between backups. They are also very fast and have little effect on VM performance. As a result, HCI products often back up VMs multiple times per day -- not only overnight, as many legacy infrastructure administrators prefer.
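The redirect-on-write idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not any vendor's implementation: the class name RedirectOnWriteVolume and its methods are hypothetical, and real HCI storage adds deduplication, garbage collection and distribution across nodes.

```python
class RedirectOnWriteVolume:
    """Toy model of metadata-based backups (hypothetical, for illustration only)."""

    def __init__(self):
        self._blocks = {}     # physical block id -> data; entries are never overwritten
        self._live = {}       # logical block number -> physical block id (current VM state)
        self._snapshots = {}  # backup name -> frozen copy of the logical-to-physical map
        self._next_id = 0

    def write(self, logical_block, data):
        # Redirect the write to a fresh physical block instead of overwriting in place.
        physical_id = self._next_id
        self._next_id += 1
        self._blocks[physical_id] = data
        self._live[logical_block] = physical_id

    def snapshot(self, name):
        # A backup copies only the metadata map, not the data blocks themselves.
        self._snapshots[name] = dict(self._live)

    def read(self, logical_block, snapshot=None):
        mapping = self._live if snapshot is None else self._snapshots[snapshot]
        return self._blocks[mapping[logical_block]]

    def physical_block_count(self):
        return len(self._blocks)


vol = RedirectOnWriteVolume()
for block in range(4):
    vol.write(block, b"v1")
vol.snapshot("morning")
vol.write(2, b"v2")  # only this one changed block consumes new space

print(vol.read(2), vol.read(2, snapshot="morning"), vol.physical_block_count())
```

After the backup, a single changed block adds one physical block rather than a second full copy of the VM, which is why frequent backups are cheap individually but still accumulate capacity over time.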
Although HCI data backup is efficient in a vacuum, dozens of backups per day lead to more storage capacity being used over time. Often, HCI storage does not fill up with VMs, but with backups. Sending older backup data off the HCI storage reduces the need to buy more nodes for capacity. And sending backups to the public cloud avoids the need to buy any type of on-premises backup storage.
The options for performing public cloud backup vary among HCI vendors, however.
Nutanix supports hyper-converged infrastructure cloud backup
The first public cloud option for Nutanix was the ability to send backups to Amazon Web Services (AWS). The company now supports Microsoft Azure as well. Nutanix public cloud backup relies on object storage, which has the lowest cost per GB of data stored. The configuration of these backups takes place from within Nutanix's Prism interface.
In the public cloud, a single VM runs the Nutanix software and forms a single-node cluster. Administrators can replicate VMs to this cluster just as they do to any other Nutanix cluster.
Third-party vendors fill hyper-converged infrastructure cloud backup holes
VMware vSAN, a component of some HCI products, and HPE SimpliVity both offer replication that sends backups to remote clusters, but neither allows replication to the public cloud. (The imminent VMware on AWS service will put a VMware vSphere cluster inside AWS, but it isn't really cost-effective just for backups.)
To fill the hole, a large number of third-party vendors support hyper-converged infrastructure cloud backup. Examples include Druva, Vembu, Veeam and Zerto. These products all have user interfaces that integrate with vSphere, so they are fairly simple to operate. They are also not limited to HCI, so organizations that have not moved entirely to HCI may find it easier to use the same backup software for all VMs.
Hyper-converged backup: An emerging option
Public cloud support is a standard feature of hyper-converged backup (HCB), a relatively new category of product that applies HCI-like scale-out storage clustering to backup data. Rubrik and Cohesity, two HCB vendors, allow organizations to use public cloud either as a tier of storage or as a repository for disaster recovery (DR) copies of protected VMs. Like other third-party backup products, HCB can protect any VM and even some applications on physical servers, such as Microsoft SQL Server.
Hyper-converged infrastructure cloud data recovery challenges
Although hyper-converged infrastructure cloud backup is usually cost-effective, it can be expensive and time-consuming to get data back out of the cloud when restoring VMs or replicating VMs back after a DR failover.
AWS and Azure both allow organizations to upload data into their data centers for free, but they charge for every gigabyte of data downloaded from their clouds. Administrators should account for these costs as part of their backup and DR planning.
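A rough way to budget for these download charges is to multiply the amount of data restored by the provider's per-gigabyte egress rate. The sketch below assumes a flat rate; the function name, the VM sizes and the rate of $0.09 per GB are illustrative placeholders, not quoted prices, and real pricing is tiered and changes over time.

```python
def estimate_egress_cost(restore_gb, rate_per_gb):
    """Return the download charge for pulling restore_gb out of the cloud."""
    if restore_gb < 0 or rate_per_gb < 0:
        raise ValueError("size and rate must be non-negative")
    return restore_gb * rate_per_gb


# Placeholder rate for illustration only; check the provider's current price list.
HYPOTHETICAL_RATE_PER_GB = 0.09

# Hypothetical sizes of the VMs to be restored, in GB.
vm_sizes_gb = [2000, 500, 250]

total = sum(estimate_egress_cost(size, HYPOTHETICAL_RATE_PER_GB) for size in vm_sizes_gb)
print(f"Estimated egress charge: ${total:.2f}")
```

Running the same estimate for a full DR failback, where every protected VM comes back at once, gives a more realistic upper bound than pricing a single restore.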
Another issue is the time it takes to transfer and restore VMs. Backups only transfer changed data, but restores transfer the whole VM. A 2 TB VM takes about 22 hours, nearly a whole day, to transfer over a 200 Mbps network link even if the link is fully used, so it may not be practical to restore multiple VMs from public cloud backup to an on-premises data center.
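The transfer-time figure is simple arithmetic that administrators can repeat for their own VM sizes and links. The helper below is a minimal sketch: it assumes decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits per second), and the efficiency parameter is a hypothetical knob for protocol overhead and link sharing, not a measured value.

```python
def transfer_hours(size_tb, link_mbps, efficiency=1.0):
    """Hours needed to move size_tb terabytes over a link_mbps link.

    efficiency is the assumed fraction of the link actually available (0 < e <= 1).
    """
    if size_tb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("invalid size, link speed or efficiency")
    bits = size_tb * 1e12 * 8                      # terabytes -> bits
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / 3600


print(f"2 TB over 200 Mbps: {transfer_hours(2, 200):.1f} hours")
print(f"Same link at 80% efficiency: {transfer_hours(2, 200, efficiency=0.8):.1f} hours")
```

At full line rate the 2 TB example works out to about 22 hours, and any overhead or competing traffic pushes the restore past a full day.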