One of the hottest items on the IT agenda is the implementation of a hybrid cloud model: joining on-premises private cloud operations to public clouds. Most data center teams find that this blending of technologies has its challenges, including cloud data transfer. These issues stem both from the immaturity of the tools and from the rapid evolution of the cloud itself.
Several motivations drive IT teams to split cloud efforts over private and public clouds. Foremost is the idea of cloud bursting, which involves the use of the public cloud as an on-demand resource when the workload gets heavy. Other considerations are economic. The public cloud lowers capital expenses and may even cost less than private operations.
A burst in activity
The idea of starting additional instances on a public cloud provider such as Amazon Web Services is attractive, since it allows the data center to size capacity for average workloads rather than overprovisioning for peak load -- plus some margin. With peaks often running 20% to 50% or more above average capacity usage, the public cloud lets an organization avoid significant capital expense, especially if the peaks occur relatively infrequently.
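A little arithmetic makes this trade-off concrete. The Python sketch below is a minimal, hypothetical model, not a sizing method from any vendor: it assumes demand can be expressed in uniform load units, that every server carries the same amount, and that `size_fleet`, `peak_pct`, `margin_pct` and the example numbers are illustrative. It compares owning enough servers for the peak with owning enough for the average and renting public instances for the shortfall.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division (avoids floating-point rounding surprises)."""
    return -(-a // b)


def size_fleet(avg_load: int, peak_pct: int, margin_pct: int, per_server: int) -> dict:
    """Compare sizing on premises for peak with sizing for average plus bursting.

    All parameters are hypothetical:
    avg_load   -- average demand in arbitrary load units
    peak_pct   -- how far the peak exceeds the average, in percent
    margin_pct -- safety margin added on top of the sized load, in percent
    per_server -- load units one server can carry (assumed uniform)
    """
    peak_load = ceil_div(avg_load * (100 + peak_pct), 100)
    # Option A: own enough servers for the peak plus margin.
    peak_sized = ceil_div(peak_load * (100 + margin_pct), 100 * per_server)
    # Option B: own enough servers for the average plus margin ...
    avg_sized = ceil_div(avg_load * (100 + margin_pct), 100 * per_server)
    # ... and rent public instances only for the shortfall at peak.
    shortfall = max(0, peak_load - avg_sized * per_server)
    burst_at_peak = ceil_div(shortfall, per_server)
    return {
        "peak_sized": peak_sized,
        "avg_sized": avg_sized,
        "burst_at_peak": burst_at_peak,
        "servers_avoided": peak_sized - avg_sized,
    }


if __name__ == "__main__":
    # Hypothetical workload: peaks 50% above average, 10% safety margin.
    print(size_fleet(avg_load=1000, peak_pct=50, margin_pct=10, per_server=50))
```

With these made-up numbers, sizing for the average needs 22 owned servers instead of 33, and the 8 burst instances are rented only while the peak lasts. Integer percentages are used deliberately so that rounding up never happens by accident.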
Organizations welcome these savings, as well as the flexibility of having a vast pool of cheap computing only seconds away. Still, the details matter.
As data sets grow, they develop enormous inertia. Moving terabytes takes time, and if we have to move data in bulk from the private cloud to feed those instantly available public instances, we are already in trouble.
The main reason cloud data transfer is slow is that telcos in the U.S. and, to a lesser extent, Europe have been slow to roll out fiber. We simply don't have the bandwidth needed for fast cloud data transfer. This all but rules out the model of loading a copy of the data into the cloud each time you want to burst to it.
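A back-of-the-envelope calculation shows why bulk copies cannot keep up with bursting. This is a minimal sketch: it uses decimal terabytes, and the `efficiency` factor is an assumed fraction of nominal link speed actually achieved, not a measured value.

```python
def transfer_hours(terabytes: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours to move `terabytes` (decimal TB) over a link of `link_gbps`.

    `efficiency` is an assumed fraction of the nominal bandwidth actually
    achieved after protocol overhead and contention (1.0 = ideal).
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    bits = terabytes * 1e12 * 8  # bytes -> bits
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    for gbps in (0.1, 1, 10):
        hours = transfer_hours(10, gbps)
        print(f"10 TB over {gbps} Gbps: {hours:.1f} hours")
```

Even on an ideal, fully dedicated 1 Gbps link, a single 10 TB copy takes 80,000 seconds, or about 22 hours, which is far too slow to feed instances meant to absorb a short-lived burst.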
One of the most interesting issues for the hybrid cloud model is data volatility. Here, we have to decide whether the data sets can be duplicated in the two clouds and, if so, how to keep the copies coherent. This is a difficult challenge, since any traditional locking process carried out remotely is bound to be slow.
The key is how volatile the data is. If most of the data in the database is nonvolatile -- historical data, for instance -- and much of the rest is slow-changing (e.g., price lists, descriptions), the IT team may create a multisite database operation where the only slow operations are those that require coherence of the copies, such as inventory availability. The IT organization could keep other data elements asynchronously coherent, or simply store and retrieve them directly from the in-house copy. Mapping where each of these elements is best stored is a major chore, but it is clearly the best answer while wide-area network (WAN) connections run only slightly faster than carrier pigeons.
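The mapping chore described above can be sketched as a simple classification of data elements by volatility. This is a hypothetical illustration: the `Placement` categories paraphrase the options in the text, while the element names, update rates and the `slow_change_limit` threshold are invented for the example and would have to come from profiling a real database.

```python
from dataclasses import dataclass
from enum import Enum


class Placement(Enum):
    STATIC_COPY = "copy once to both clouds"           # nonvolatile, e.g. history
    ASYNC_REPLICA = "asynchronously coherent copies"   # slow-changing, e.g. price lists
    SYNC_COHERENT = "synchronously coherent (locked)"  # copies must agree, e.g. inventory
    IN_HOUSE_ONLY = "read/write the in-house copy"     # volatile, no strict coherence need


@dataclass(frozen=True)
class DataElement:
    name: str
    updates_per_day: int
    needs_strict_coherence: bool


def place(element: DataElement, slow_change_limit: int = 24) -> Placement:
    """Map a data element to a placement; the threshold is a hypothetical knob."""
    if element.needs_strict_coherence:
        return Placement.SYNC_COHERENT  # the only slow, lock-based path
    if element.updates_per_day == 0:
        return Placement.STATIC_COPY
    if element.updates_per_day <= slow_change_limit:
        return Placement.ASYNC_REPLICA
    return Placement.IN_HOUSE_ONLY


if __name__ == "__main__":
    # Hypothetical catalog of data elements.
    catalog = [
        DataElement("order_history", 0, False),
        DataElement("price_list", 2, False),
        DataElement("product_descriptions", 5, False),
        DataElement("inventory_availability", 50_000, True),
        DataElement("session_state", 100_000, False),
    ]
    for e in catalog:
        print(f"{e.name:24} -> {place(e).value}")
```

Checking strict coherence first keeps the expensive locked path limited to the elements that truly need it, which is the point of the multisite design.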
There are other hybrid models worth considering. The first is to colocate the data in a telco or other facility with fast links to public clouds. This approach puts the data much closer to the public cloud, but the private cloud side then suffers from a slow WAN connection compared with the 10 Gigabit Ethernet (10 GbE) or 40 GbE local area networks inside public clouds.
The security question
Going deeper, shifting part or all of a private cloud to a well-connected colo largely solves the performance issue of the hybrid cloud model. However, you'll need to be confident that you can maintain data security there. The generally weak state of encryption is an issue, since all data should preferably be encrypted both in transit and at rest.
Without hardware acceleration, encryption is expensive in performance terms, so it is not used as often as it should be. The exception is encryption at rest achieved through disk-based encryption, mainly because it scales and occurs at the storage endpoint. Although it incurs no latency penalty on writes, this approach has been shown to be unsafe and isn't recommended. Server-side encryption is a necessity in a colo environment.
You'll still face a bottleneck between the public and private clouds in the colo model, especially for data updates. When an update requires both a lock and a cloud data transfer, two latencies measured in tenths of a second add up fast. One solution is to colocate part of the private cloud inside the public cloud, either as an explicit colo managed by the service provider or as a long-term, quasi-static contract for servers.
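How fast those latencies add up can be estimated with a simple serial model. This sketch assumes each coherent update needs two sequential WAN exchanges (one to take the lock, one to ship the data) and that updates to the same record cannot overlap; the 0.1 s and 1 ms latencies are hypothetical values, not measurements.

```python
def coherent_update_s(wan_latency_s: float, exchanges: int = 2) -> float:
    """Wall-clock time for one coherent update that needs `exchanges`
    sequential WAN exchanges (assumed: lock acquisition, then data transfer)."""
    return wan_latency_s * exchanges


def serial_updates_per_s(wan_latency_s: float, exchanges: int = 2) -> float:
    """Upper bound on updates per second to one locked record,
    since each update must wait for the previous one to finish."""
    return 1.0 / coherent_update_s(wan_latency_s, exchanges)


def batch_time_s(n_updates: int, wan_latency_s: float, exchanges: int = 2) -> float:
    """Time for n updates to the same record applied one after another."""
    return n_updates * coherent_update_s(wan_latency_s, exchanges)


if __name__ == "__main__":
    # Hypothetical latencies: WAN to a colo vs. servers inside the provider's site.
    for label, latency in (("WAN, 0.1 s", 0.1), ("in-provider, 1 ms", 0.001)):
        print(f"{label}: {serial_updates_per_s(latency):.0f} updates/s, "
              f"1,000 updates take {batch_time_s(1000, latency):.1f} s")
```

Under these assumptions, two 0.1 s exchanges cap a contended record at five updates per second, so 1,000 updates take 200 seconds; cutting each exchange to about a millisecond, as colocating inside the provider might, raises the cap to about 500 per second.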
Given the heavy investment that major data center providers make in security, this is a safe approach. And with the cloud price wars continuing, it could also be the most economical answer.