Essential Guide

The evolution of data center storage architecture


Data aware storage yields insights into business info

Storage isn't just a bunch of dumb disks anymore. In fact, storage infrastructure is smarter than ever.


Many people consider IT infrastructure critical, but not something that provides unique differentiation or competitive value. That's about to change, as IT starts implementing more "data-aware" storage in the data center.

Ask business staffers what IT should and could do for them, and they will list confused, contradictory and naïve desires that have little to do with infrastructure (assuming minimum service levels are met). As IT shops grow into service providers to their businesses, they pay more attention to what is actually valuable to the people and systems they serve. The best IT shops are finding that a closer look at what infrastructure can do "autonomically" yields opportunities to add great value.

Advanced data infrastructure

Today, IT storage infrastructure is smarter about the data it holds. Big data processing capabilities provide the motivation to investigate formerly disregarded data sets. Technology resources are getting denser and more powerful -- converged is the new buzzword across infrastructure layers -- and core storage is not only getting much faster with flash and in-memory approaches, but can also take advantage of a glut of CPU power to perform additional tasks locally.

Storage-side processing isn't just for accelerating latency-sensitive financial applications anymore. Thanks to new kinds of metadata analysis, it can help IT create valuable new data services.

In the past, metadata (i.e., data about data) primarily helped ensure ownership of and secure access to important files. In object-based archives, it helped enforce longer-term data retention policies (keep for at least X years, delete after Y years). To learn anything else about masses of data, we often had to process it all directly. This became one of the motivations for the scale-out Hadoop/HDFS architecture.
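
To make that retention example concrete, here is a minimal Python sketch of how a store might evaluate per-object retention metadata. The field names and logic are hypothetical illustrations, not any particular product's policy engine:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical per-object metadata record; a real archive attaches
# similar fields to every object or file it stores.
obj_meta = {
    "name": "q2-financials.pdf",
    "created": datetime(2015, 5, 1, tzinfo=timezone.utc),
    "retain_years": 7,    # keep for at least X years
    "expire_years": 10,   # delete after Y years
}

def retention_state(meta, now=None):
    """Classify an object as 'locked', 'deletable' or 'expired'."""
    now = now or datetime.now(timezone.utc)
    age = now - meta["created"]
    if age < timedelta(days=365 * meta["retain_years"]):
        return "locked"       # minimum retention still in force
    if age > timedelta(days=365 * meta["expire_years"]):
        return "expired"      # policy says this object should be deleted
    return "deletable"        # past minimum retention; deletion allowed

print(retention_state(obj_meta))
```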

Now our data is growing bigger and bigger -- more objects, files and versions, larger data sets, increased variety in structure and format, and new sources of data arrive daily. Instead of powering through the growing data pile every time we want to know something, IT shops can produce and keep more metadata about the stored data, a pre-distilled package to work from. 
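
As a rough illustration of that "pre-distilled package" idea -- a sketch under assumed names, not any vendor's design -- the snippet below computes a small metadata record once at ingest, so later questions can be answered from the metadata store without rescanning the data itself:

```python
import hashlib
import mimetypes
from datetime import datetime, timezone

metadata_store = []  # stands in for a real metadata database

def ingest(path, owner):
    """Distill a metadata record once, at write time."""
    with open(path, "rb") as f:
        data = f.read()
    metadata_store.append({
        "path": path,
        "owner": owner,
        "size": len(data),
        "type": mimetypes.guess_type(path)[0] or "unknown",
        "sha256": hashlib.sha256(data).hexdigest(),
        "ingested": datetime.now(timezone.utc).isoformat(),
    })

# Later, a question like "how much PDF data does marketing hold?" is
# answered from metadata alone -- no pass over the stored data needed.
pdf_bytes = sum(m["size"] for m in metadata_store
                if m["type"] == "application/pdf" and m["owner"] == "marketing")
```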

Metadata aware

New forms of intelligent storage can automatically create more kinds of metadata, and then use that information to directly provide intelligent, fast and highly efficient data services. Some currently available storage options provide:

  • Fine-grained quality of service. A host system can provide metadata at the object/file level that directs the IT storage infrastructure (e.g., array, storage network and so on) to independently ensure delivery at a preferred level of performance. Oracle's FS1 array, for example, has a Dynamic QoS Tiering capability that tracks which data should receive priority service and flash acceleration on a file-by-file basis. With this detailed information, important files in Oracle databases and applications automatically and efficiently receive the best-aligned storage services for optimal database performance. (A simplified sketch of this kind of metadata-driven tiering follows this list.)
  • Fine-grained data protection. Metadata also enables fine-grained data protection. For example, an evolving application-aware paradigm in virtualization environments is to provision storage at the VM level. If the storage array supports the hypervisor's APIs (e.g., Tintri, VMware VVols), the storage receives metadata that it uses to provide and enforce per-VM storage policies, such as a minimum RAID type or number of required copies.
  • Content indexing and search. For large stores of unstructured "text-full" data, indexing all the content into a metadata database to power search lets you derive value from data that might otherwise just occupy space. In the past, active archives performed searches on aging static data, but today's data can be indexed as it is ingested into primary storage. Examples include Tarmin GridBank, or the increasing use of search engines like Lucene/Solr (e.g., LucidWorks) across large data stores outside of any specific application development effort. (A toy indexing sketch follows this list.)
  • Social media analysis. One can create metadata that tracks which user has accessed and/or edited each piece of data over time. Users can find out who in the organization took an interest in any given piece of content, look for group collaboration patterns or generate recommendations based on content that other users have also accessed. For example, DataGravity's storage puts its high-availability passive secondary controller to work maintaining and serving this analysis of user- and usage-based metadata. (A minimal co-access sketch follows this list.)
  • Active capacity and utilization management. IT admins can see deep into dynamic storage infrastructure behavior when metadata statistics include resource utilization, client file operations, IOPS and other system management metrics. Qumulo, for example, lets admins see what is actively going on in the storage system down to the file level, making it easy to see which files and directories are or have been hot at different times, and which clients are hitting which sections of the file structure across billions of files. Data-aware storage actually helps analyze its own behavior and its alignment to workloads.
  • Analytics and machine learning. With the growing compute power found in modern storage arrays, data processing and analytics tasks can be hosted directly in the array, running extremely close to the data. As mentioned above, the idea driving Hadoop and HDFS is to localize processing (through massive parallelization) over big data sets. But emerging technologies bake statistical analytics and even machine learning into a wider swath of common storage infrastructure. Much like search, advanced metadata extraction and transformation will let users automatically categorize, transform, classify, score, visualize and report on data simply by storing it in the right place (as a research example, see IBM Storage storlets). In the future, data may even tell us what we need to do with it.
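
To ground the QoS bullet above, here is a deliberately simplified Python sketch of metadata-driven tiering. The priority labels and tier map are invented for illustration; this is not Oracle's FS1 API:

```python
# Hypothetical priority-to-tier map -- invented for illustration.
TIER_FOR_PRIORITY = {"premium": "flash", "standard": "hybrid", "archive": "disk"}

def place(file_meta):
    """Pick a storage tier from the QoS hint the host attached as metadata."""
    priority = file_meta.get("qos_priority", "standard")
    return TIER_FOR_PRIORITY.get(priority, "hybrid")

# A hot database file tagged by the host gets flash service automatically.
redo_log = {"name": "redo01.log", "qos_priority": "premium"}
print(place(redo_log))  # -> 'flash'
```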
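
The content-indexing bullet can be sketched just as simply. This toy inverted index -- a stand-in for a real engine such as Lucene/Solr -- is updated as documents are ingested, so search never has to reread the stored data:

```python
from collections import defaultdict

inverted_index = defaultdict(set)  # term -> set of document ids

def index_on_ingest(doc_id, text):
    """Update the index at write time, not as an after-the-fact scan."""
    for term in text.lower().split():
        inverted_index[term].add(doc_id)

def search(term):
    return inverted_index.get(term.lower(), set())

index_on_ingest("memo-17", "Q3 revenue forecast for storage division")
index_on_ingest("memo-42", "Storage array refresh budget")
print(search("storage"))  # -> {'memo-17', 'memo-42'}
```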
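
Finally, the usage-analysis bullet: given an access log of who touched which file, even a simple co-access count yields "users who read this also read..." recommendations. The sketch below is a minimal illustration, not DataGravity's implementation:

```python
from collections import Counter
from itertools import combinations

# (user, file) access events such as a data-aware array might record.
access_log = [
    ("alice", "roadmap.doc"), ("alice", "budget.xls"),
    ("bob",   "roadmap.doc"), ("bob",   "budget.xls"),
    ("carol", "roadmap.doc"),
]

# Group each user's accesses, then count how often file pairs co-occur.
files_by_user = {}
for user, f in access_log:
    files_by_user.setdefault(user, set()).add(f)

co_access = Counter()
for files in files_by_user.values():
    for pair in combinations(sorted(files), 2):
        co_access[pair] += 1

print(co_access.most_common(1))  # -> [(('budget.xls', 'roadmap.doc'), 2)]
```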

Metadata can help make our infrastructure act intelligently about the data it's holding. Infrastructure that becomes both more data-aware and, in a sense, self-aware could help us stay on top of these challenges. With data growing this fast, blindly storing bits is a fool's game; the real value of this new data lies in extracting as much information from it as possible.

Mike Matchett is a senior analyst and consultant at Taneja Group. Contact him via email at mike.matchett@tanejagroup.com.

Next Steps

Storage evolves from dumb data to data aware

Moving to all-flash? Think about your data storage infrastructure

Are all software-defined storage vendors the same?

This was last published in May 2015

(From the author)
It's clear that data-aware storage is one of The Next Big Thing(s) coming to the modern data center. No doubt what I've outlined above will need to be updated and expanded often.
I should clarify a minor point about the bullet item I labeled "Social media analysis." The description is accurate, but the title was a bit cheeky -- I twisted the term to cover the analysis and insight one could get from studying how end users use the system (e.g., insight on collaboration, getting recommendations, spotting behavior patterns). I did not mean to imply that any storage systems today apply "sentiment analysis," which is how the term is usually used (on user data and interaction history from sources like Twitter, Facebook, etc.). On the other hand, I wouldn't bet against this becoming a future analytical option, as NLP and other computational linguistics are a natural step after straight indexing of content.
Be sure to get in touch if you have or know of developments in this space!
