The evolution of data center storage architecture
A new generation of hyperconverged infrastructure is challenging the data center model of standalone servers networked to a scale-up storage array.
Virtual storage area networks (virtual SAN, server SAN or SAN-free storage) use locally attached flash and hard disk drive storage.
This hyperconverged trend dovetails with virtualization, where traditional storage such as Fibre Channel or iSCSI is problematic, said David Friedlander, senior director of product marketing at Panzura, a cloud storage company. "Traditional SANs were bad for virtualization," he said.
SANs were designed for environments where you know what to expect from I/O patterns, but virtual environments are characterized by random I/O, and arrays are quickly overwhelmed. Flash has alleviated the problem, but increasingly, Friedlander said, "the storage industry is moving away from monolithic storage arrays."
One of the defining features of hyperconverged systems is their reliance on commodity parts, both CPUs and disk drives. CPUs in particular are so powerful that it makes sense to leverage them for both compute and storage processing, said Arun Taneja, founder of the storage-centric analyst firm Taneja Group.
"There's so much surplus CPU today, and you don't need more than 20% of it to run the storage layer," he said. With that, you can use the remaining CPU to run VMs.
Using locally attached disk drives, meanwhile, eliminates the cost and latency overhead associated with accessing storage over a network. "Direct connections over SCSI or PCI are always going to be the fastest," Taneja said. "If you think about it, the only reason we went to SAN in the first place was because DAS was too limiting."
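Taneja's 20% estimate lends itself to rough arithmetic. The sketch below is illustrative only: the function name `cpu_budget` and the 32-core node are assumptions, not figures from any vendor, and the 20% share is taken as an upper bound on what the storage layer consumes.

```python
def cpu_budget(total_cores: int, storage_fraction: float = 0.20) -> tuple[float, float]:
    """Split a node's cores between the storage layer and VMs.

    storage_fraction defaults to the 20% ceiling quoted in the article;
    real overhead depends on the product and the workload.
    """
    if not 0.0 <= storage_fraction <= 1.0:
        raise ValueError("storage_fraction must be between 0 and 1")
    storage_cores = total_cores * storage_fraction
    vm_cores = total_cores - storage_cores
    return storage_cores, vm_cores


# Hypothetical 32-core node
storage, vms = cpu_budget(32)
print(f"storage layer: {storage:.1f} cores, VMs: {vms:.1f} cores")
```

On this assumed node, roughly a fifth of the cores go to storage and the rest stay available for virtual machines.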
Hyperconverged storage could put a serious dent in external disk sales, Taneja said. If the market continues to grow as it has, back-of-the-envelope calculations suggest that hyperconvergence could consume 30% of the external storage array market within three years, he said.
But low-cost commodity disk drives alone don't provide the I/O performance needed for most virtualized workloads. That's where super-fast flash and solid-state drives come in.
In recent years, the price of flash has dropped dramatically. Plus, it's no longer limited to the most demanding workloads. "It's gotten to the point where [flash] is viable for more use cases," said Yoram Novick, CEO at Maxta.
Today, virtually all hyperconverged players make heavy use of flash technology. In fact, one hyperconverged player, GridStore, forgoes spinning disk drives altogether and relies exclusively on flash for its storage capacity. Data-efficiency techniques such as deduplication and compression help it achieve more usable capacity.
Without scale-out clustering technology, all the benefits of low-cost storage, high-performance flash and direct connections would be lost; the capacity couldn't be shared. Clustering multiple nodes together into a virtual SAN means that capacity can be shared by multiple servers, while replication across nodes ensures data availability.
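The trade-off between pooled capacity and replication can be sketched with simple arithmetic. This is a simplified model under stated assumptions: every block is stored as `replicas` full copies on distinct nodes, and metadata, spare space, deduplication and compression are ignored. The node counts and sizes are hypothetical.

```python
def usable_capacity_tb(nodes: int, raw_tb_per_node: float, replicas: int = 2) -> float:
    """Usable capacity of a pooled cluster that keeps `replicas` copies of each block."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    if nodes < replicas:
        raise ValueError("need at least as many nodes as replicas")
    return nodes * raw_tb_per_node / replicas


def node_failures_tolerated(replicas: int) -> int:
    """With full copies on distinct nodes, data survives the loss of replicas - 1 nodes."""
    return replicas - 1


# Hypothetical cluster: 4 nodes with 10 TB of raw capacity each, 2 copies of every block
print(usable_capacity_tb(4, 10.0))    # 20.0
print(node_failures_tolerated(2))     # 1
```

In this model, availability is paid for in capacity: doubling the copies halves what the cluster can store.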
Indeed, the scale-out model has been having a moment, mainly in hyperscale compute environments. In the storage space, scale-out architectures figure prominently in several network attached storage (NAS) designs, but the use of scale-out for block storage is a relatively new phenomenon.
"As an industry, we haven't enjoyed scale-out on the SAN side as we have on the NAS side," Taneja said. But with the exception of the bottom end of the market, where capacity needs can be met by a couple of disk drives, "eventually, everything in compute and storage will be scale-out," he predicted.
Scale-out has tremendous appeal to IT shops, allowing them to start small, and then grow big without much initial capital or trouble.
"The thing about scale-out is that it separates out the hardware maintenance from the system presented above it," said Andrew Warfield, CTO at Coho Data, a scale-out storage startup.
If there is a catch to hyperconvergence, it might be scalability. While many hyperconverged vendors advertise eye-popping scalability, those claims are hard to believe. For example, EMC's VCE division announced its VxRack System in May 2015, based on technology from its ScaleIO acquisition. It says the system can scale to "many thousands of nodes." Not only is such a claim difficult to verify, but there may be practical limitations on how big a cluster you want to build. "For argument's sake, let's say a system scales to 120 nodes. A customer isn't going to get near that," Taneja said. "For business reasons, they're going to build a 30- or 40-node cluster."
Nor is hyperconvergence necessarily the best fit if what you're looking for is extreme capacity. "If you need to store petabytes of data, you're not going to use a hyperconverged tool -- or at least, you shouldn't," said Jason Collier, CTO at Scale Computing. He believes that there will always be a need for capacity-centric storage, although it might not look like the NAS platforms that dominate the space today. "It could be Hadoop, or shared object storage," he suggested.
But hyperconverged players have started to address another knock on their systems: the need to scale storage and compute together symmetrically. That was true in the early days of hyperconvergence: If you needed more capacity, you had to buy a full node, even if you hadn't maxed out your CPU. Hyperconverged vendors have gotten the message, and now offer differently sized nodes.
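A rough sizing comparison shows why differently sized nodes matter. The node types and their specifications below are hypothetical, chosen only to illustrate the arithmetic; they do not describe any vendor's products.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeType:
    name: str
    cores: int
    capacity_tb: float


def nodes_needed(extra_tb: float, node: NodeType) -> int:
    """Number of whole nodes required to add `extra_tb` of raw capacity."""
    return math.ceil(extra_tb / node.capacity_tb)


# Hypothetical node types
balanced = NodeType("balanced", cores=24, capacity_tb=10.0)
storage_heavy = NodeType("storage-heavy", cores=8, capacity_tb=40.0)

# Adding 80 TB of raw capacity with each node type
for node in (balanced, storage_heavy):
    count = nodes_needed(80.0, node)
    print(f"{node.name}: {count} nodes, {count * node.cores} extra cores")
```

With these assumed figures, the balanced nodes add 192 cores the buyer may not need, while the storage-heavy nodes add the same capacity with only 16.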
San Mateo County in California, which uses a Nutanix hyperconverged infrastructure cluster, recently added storage-heavy NX-6000 units. The county's IT organization has retired various legacy servers and storage systems. "We didn't need any more compute," said Erik Larson, the storage and virtualization architect for the county.