All-flash HCI storage requires new management tools, techniques

Although they bring performance benefits to the data center, all-flash HCI systems require IT teams to shake up their storage management and analytic tool sets.

Since hyper-converged infrastructure is relatively new, vendor-driven constraints typically limit drive configurations to a single model. However, this is changing as a result of the Lego-like nature of commercial off-the-shelf designs.

For example, hyper-converged infrastructure (HCI) nodes with two storage tiers continue to hit the market, either as hybrid configurations that pair fast, primary solid-state drives (SSDs) with slow, secondary hard disk drives (HDDs) or as all-flash configurations with both fast and slow SSDs. But before deployment, it's important for admins to understand the potential management challenges associated with these emerging HCI storage models.

SSDs vs. HDDs

To put all-flash HCI storage in perspective, it's important to first understand the differences between SSDs, which use flash memory, and HDDs. HDDs are slower than SSDs, usually by large factors at the drive level, so a server with SSDs can handle far more I/O than one with HDDs. This means that, even though flash is currently around two times the price of HDDs for equal capacities, the reduction in the number of servers needed to handle a workload is more than enough to offset the additional cost. This makes SSDs, in many cases, the preferred option for enterprise server storage.
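
The cost argument above can be checked with a little arithmetic. The following is a minimal sketch, not a sizing tool: the workload, per-node IOPS and dollar figures are hypothetical assumptions chosen for illustration, and only the two-times storage price ratio comes from the text.

```python
import math


def servers_needed(workload_iops: float, iops_per_server: float) -> int:
    """Smallest whole number of servers that covers the workload."""
    return math.ceil(workload_iops / iops_per_server)


def cluster_cost(servers: int, server_cost: float, storage_cost: float) -> float:
    """Total cost when every server carries the same storage."""
    return servers * (server_cost + storage_cost)


# Hypothetical figures, for illustration only.
WORKLOAD_IOPS = 1_000_000      # assumed workload
HDD_NODE_IOPS = 50_000         # assumed I/O capability of an HDD-based node
SSD_NODE_IOPS = 500_000        # assumed I/O capability of an SSD-based node
SERVER_COST = 8_000.0          # assumed cost of a server without storage
HDD_STORAGE_COST = 2_000.0     # assumed cost of HDD storage per server
SSD_STORAGE_COST = 2 * HDD_STORAGE_COST  # "around two times the price"

hdd_servers = servers_needed(WORKLOAD_IOPS, HDD_NODE_IOPS)
ssd_servers = servers_needed(WORKLOAD_IOPS, SSD_NODE_IOPS)
hdd_total = cluster_cost(hdd_servers, SERVER_COST, HDD_STORAGE_COST)
ssd_total = cluster_cost(ssd_servers, SERVER_COST, SSD_STORAGE_COST)

print(f"HDD cluster: {hdd_servers} servers, total cost {hdd_total:,.0f}")
print(f"SSD cluster: {ssd_servers} servers, total cost {ssd_total:,.0f}")
```

With these assumed inputs, the SSD cluster needs far fewer servers, so its total cost is lower even though storage per server costs twice as much. The conclusion depends entirely on how I/O-bound the workload is, so the inputs should be replaced with measured values.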

There are currently two major classes of SSDs: ultrafast nonvolatile memory express (NVMe) SSDs, which can achieve as much as 10 GBps and 4 million IOPS, and Serial Advanced Technology Attachment (SATA) SSDs, which can achieve 400K IOPS and 500 MBps of bandwidth.

The price premium for NVMe drives is smaller than it was in the past, at least in the miniature M.2 form factor. Expect premiums to continue to drop across the board. In the future, drives will divide into tiers of fast and slow SSDs, although drives in both tiers will beat HDDs.

Challenges with all-flash HCI storage

On the surface, there is no real difference between hybrid HCI nodes with fast, primary SSDs and slow, secondary HDDs and all-flash configurations with both fast and slow SSDs. In both cases, one would think that the fast drives serve the compute workloads and the slow tier stores cold data. The fast drives would also handle most of the I/O, so the slow, secondary drives wouldn't get in the way.

The problem, however, is that this isn't always true. HCI creates a pool of storage that any server in the cluster can access through Remote Direct Memory Access (RDMA). Whenever an app calls for cold data held in secondary HDD storage, the slow access creates a long delay. SSD cold storage, by contrast, would reduce latency per I/O to between 50 and 100 microseconds, vs. the 10 milliseconds of a SATA HDD. In other words, SSD makes sense for secondary HCI storage, as well.
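
To see how much occasional cold-data accesses matter, consider a weighted-average latency. This is a rough sketch under stated assumptions: the 90% hot-data fraction and the 20-microsecond fast-tier latency are hypothetical, while the 100-microsecond and 10-millisecond cold-tier figures are taken from the text.

```python
def mean_latency_us(hot_fraction: float,
                    hot_latency_us: float,
                    cold_latency_us: float) -> float:
    """Average latency per I/O when hot_fraction of requests hit the fast tier."""
    if not 0.0 <= hot_fraction <= 1.0:
        raise ValueError("hot_fraction must be between 0 and 1")
    return hot_fraction * hot_latency_us + (1.0 - hot_fraction) * cold_latency_us


HOT_FRACTION = 0.9       # assumption: share of I/Os served by the fast tier
FAST_SSD_US = 20.0       # assumption: latency of the fast tier, in microseconds
SLOW_SSD_US = 100.0      # from the text: upper end of the 50-100 microsecond range
SATA_HDD_US = 10_000.0   # from the text: 10 milliseconds

hdd_tier = mean_latency_us(HOT_FRACTION, FAST_SSD_US, SATA_HDD_US)
ssd_tier = mean_latency_us(HOT_FRACTION, FAST_SSD_US, SLOW_SSD_US)

print(f"HDD secondary tier: {hdd_tier:.0f} microseconds per I/O on average")
print(f"SSD secondary tier: {ssd_tier:.0f} microseconds per I/O on average")
```

Under these assumptions, the HDD-backed configuration averages about 1 millisecond per I/O, while the all-flash configuration stays under 30 microseconds. Even a small share of cold accesses dominates the average when the cold tier is a hard drive.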

However, all of these SSD I/Os create management problems. First, everything happens faster, so the collection of metrics has to speed up. NVMe helps here by batching completion statuses through its queue system, which significantly reduces system interrupts and context switching.
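
The effect of batching completions can be illustrated with a toy model. The sketch below is a simplification for intuition only and does not reproduce the mechanism defined in the NVMe specification; the function name, the parameters `max_batch` and `max_wait_us`, and the timestamps are all hypothetical.

```python
from typing import Iterable, List


def coalesce(completion_times_us: Iterable[float],
             max_batch: int,
             max_wait_us: float) -> List[List[float]]:
    """Group I/O completions into batches, one interrupt per batch.

    A batch is closed when it already holds max_batch completions, or when
    the next completion arrives more than max_wait_us after the first
    completion in the batch.
    """
    batches: List[List[float]] = []
    current: List[float] = []
    for t in completion_times_us:
        batch_full = len(current) >= max_batch
        waited_too_long = bool(current) and (t - current[0] > max_wait_us)
        if batch_full or waited_too_long:
            batches.append(current)
            current = []
        current.append(t)
    if current:
        batches.append(current)
    return batches


# Hypothetical completion timestamps, in microseconds.
times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 100.0, 101.0]
batches = coalesce(times, max_batch=4, max_wait_us=10.0)

print(f"{len(times)} completions without batching: {len(times)} interrupts")
print(f"{len(times)} completions with batching: {len(batches)} interrupts")
```

In this model, 10 completions that would otherwise raise 10 interrupts are reported in three batches. The tradeoff is that a completion may wait briefly before the host hears about it, which is why the batch size and wait time are both bounded.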

Still, IT needs new analytics approaches to handle the rate of operations. The age of manual event log analysis is gone, and storage analytics tools, such as those from Tegile or Enmotus, are essential. This is particularly true in HCI systems, since I/Os to a given drive can originate from any server in the cluster.

It is also necessary to have a tool that can discern bottlenecks or failures, including virtual LAN chokepoints; VM issues, such as an instance that holds up other instances; failed software threads; and any app instance issues, such as problems with a part of the data set. It's important to monitor drive failures but also to look for warning signs of workload overload to keep the cluster in top-notch condition.
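
As a rough illustration of looking for overload warning signs, the sketch below adds up the I/O that every server sends to each drive and flags drives approaching their rated capability. It is a minimal example under assumptions, not a description of any vendor's tool: the sample format, node and drive names, rated IOPS values and the 80% warning threshold are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Each sample is (server name, drive name, observed IOPS from that server).
Sample = Tuple[str, str, int]


def drive_totals(samples: Iterable[Sample]) -> Dict[str, int]:
    """Sum the IOPS each drive receives from all servers in the cluster."""
    totals: Dict[str, int] = defaultdict(int)
    for _server, drive, iops in samples:
        totals[drive] += iops
    return dict(totals)


def overloaded(totals: Dict[str, int],
               rated_iops: Dict[str, int],
               warn_fraction: float = 0.8) -> List[str]:
    """Return drives whose combined load reaches warn_fraction of their rating."""
    return sorted(
        drive for drive, total in totals.items()
        if total >= warn_fraction * rated_iops[drive]
    )


# Hypothetical measurements from three nodes sharing two pooled drives.
samples = [
    ("node-a", "ssd-1", 150_000),
    ("node-b", "ssd-1", 200_000),
    ("node-c", "ssd-2", 50_000),
]
rated = {"ssd-1": 400_000, "ssd-2": 400_000}  # assumed drive ratings

totals = drive_totals(samples)
flagged = overloaded(totals, rated)
print("Drives near their rated IOPS:", flagged)
```

No single node in this example pushes `ssd-1` anywhere near its limit, but the combined load from two nodes does. That is why per-server counters alone are not enough in a pooled HCI cluster.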

This is also where HCI storage analytics, while still in its early days, will play a role. For example, analytics programs can provide minable big data for overall HCI workload optimization. In the future, storage analytics could lead to a toolkit that interacts with cloud stacks or orchestration modules to self-tune systems and anticipate and bypass failures.

There are other, longer-term implications of all-flash HCI storage. Nodes will physically shrink in size, since even the large 100 TB SSDs will come in 2.5-inch form factors. Power footprints will drop, too -- SSDs could use 10% of the power of an HDD -- which will lead to quieter HCI systems.
