Small World Big Data
Published: 11 May 2015
Many people think that IT infrastructure is critical, but not something that provides unique differentiation and competitive value. But that's about to change, as IT starts implementing more "data-aware" storage in the data center.
When business staffers are asked what IT should and could do for them, they can list out confused, contrary and naïve desires that have little to do with infrastructure (assuming minimum service levels are met). As IT shops grow to become service providers to their businesses, they pay more attention to what is actually valuable to the systems they serve. The best IT shops are finding that a closer look at what infrastructure can do "autonomically" yields opportunities to add great value.
Advanced data infrastructure
Today, IT storage infrastructure is smarter about the data it holds. Big data processing capabilities provide the motivation to investigate formerly disregarded data sets. Technological resources are getting denser and more powerful -- converged is the new buzzword across infrastructure layers -- and core storage is not only getting much faster with flash and in-memory approaches, but can take advantage of a glut of CPU power to locally perform additional tasks.
Storage-side processing isn't just for accelerating latency-sensitive financial applications anymore. Thanks to new kinds of metadata analysis, it can help IT create valuable new data services.
In the past, metadata (i.e., data about data) primarily helped establish ownership and secure access to important files. In more object-based archives, it helped enforce longer-term data retention policies (keep for at least X years, delete after Y years). To learn anything else about masses of data, we often had to process it all directly. This became one of the motivations for the scale-out Hadoop/HDFS architecture.
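To make the retention example concrete, here is a minimal Python sketch of a "keep for at least X years, delete after Y years" rule driven only by an object's creation date. The names `RetentionPolicy`, `add_years` and `retention_state` are hypothetical and do not come from any particular archive product.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RetentionPolicy:
    """Hypothetical policy record stored as metadata next to an object."""
    keep_years: int    # X: the object may not be deleted before this age
    delete_years: int  # Y: the object should be deleted once it reaches this age


def add_years(d: date, years: int) -> date:
    """Return the same calendar day `years` later (Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def retention_state(created: date, policy: RetentionPolicy, today: date) -> str:
    """Classify an object using only its metadata, never its contents."""
    if today < add_years(created, policy.keep_years):
        return "locked"      # still inside the mandatory retention window
    if today < add_years(created, policy.delete_years):
        return "deletable"   # may be removed, but deletion is not yet required
    return "expired"         # past the delete-after date


policy = RetentionPolicy(keep_years=7, delete_years=10)
created = date(2015, 5, 11)
state_2020 = retention_state(created, policy, date(2020, 1, 1))
state_2023 = retention_state(created, policy, date(2023, 1, 1))
state_2026 = retention_state(created, policy, date(2026, 1, 1))
```

The decision needs only two small metadata fields per object, which is why this kind of policy has long been practical to enforce inside the archive itself.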
Now our data is growing bigger and bigger -- more objects, files and versions; larger data sets; increased variety in structure and format; and new sources of data arriving daily. Instead of powering through the growing data pile every time we want to know something, IT shops can produce and keep more metadata about the stored data, creating a pre-distilled package to work from.
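As a rough illustration of that "pre-distilled package," the Python sketch below records a small metadata entry for each object as it is ingested and then answers questions from those entries alone. The `Catalog` class and its fields are assumptions made for this example, not a description of any vendor's implementation.

```python
import hashlib
from collections import Counter


def describe(name: str, payload: bytes) -> dict:
    """Distill one object into a small metadata record at ingest time."""
    return {
        "name": name,
        "size": len(payload),
        "ext": name.rsplit(".", 1)[-1].lower() if "." in name else "",
        "sha256": hashlib.sha256(payload).hexdigest(),
    }


class Catalog:
    """Hypothetical metadata catalog kept alongside the stored data."""

    def __init__(self):
        self.records = []

    def ingest(self, name: str, payload: bytes) -> dict:
        # The payload is read exactly once, here; later queries use only records.
        record = describe(name, payload)
        self.records.append(record)
        return record

    def bytes_by_type(self) -> dict:
        """Capacity per file extension, answered without touching the data."""
        totals = Counter()
        for record in self.records:
            totals[record["ext"]] += record["size"]
        return dict(totals)

    def duplicates(self) -> list:
        """Groups of object names whose contents hash to the same value."""
        by_hash = {}
        for record in self.records:
            by_hash.setdefault(record["sha256"], []).append(record["name"])
        return [names for names in by_hash.values() if len(names) > 1]


catalog = Catalog()
catalog.ingest("a.txt", b"hello")
catalog.ingest("b.TXT", b"hello")
catalog.ingest("c.log", b"xyz")
usage = catalog.bytes_by_type()
dupes = catalog.duplicates()
```

Each later question costs a pass over compact records rather than a pass over the data itself, which is the trade the paragraph above describes.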
New forms of intelligent storage can automatically create more kinds of metadata, and then use that information to directly provide intelligent, fast and highly efficient data services. Some currently available storage options provide:
- Fine-grained quality of service. A host system can provide certain metadata at the object/file level that then directs the IT storage infrastructure (e.g., array, storage network, etc.) to independently ensure delivery at a preferred level of performance. Oracle's FS1 array, for example, has a Dynamic QoS Tiering capability that tracks which data should receive priority service and flash acceleration on a file-by-file basis. With this detailed information, important files in Oracle databases and applications efficiently and automatically receive the best-aligned storage services for optimal database performance.
- Fine-grained data protection. Metadata also enables fine-grained data protection. For example, an evolving application-aware paradigm in virtualization environments is to provision storage at the VM level. If the storage array supports the hypervisor's APIs (e.g., Tintri, VMware VVols), storage receives metadata that it uses to provide and enforce per-VM storage policies, such as minimum RAID type or number of required copies.
- Content indexing and search. For large data stores of unstructured "text-full" data, indexing all the content into a metadata database to power search capabilities lets you derive value from data that might otherwise just occupy space. In the past, active archives performed searches on aging static data, but today's data can be indexed as it is ingested into primary storage. Examples include Tarmin GridBank and the growing use of search engines like Lucene/Solr (e.g., LucidWorks) across large data stores outside of any specific application development effort.
- Social media analysis. One can create metadata that tracks which user has accessed and/or edited each piece of data over time. Users can find out who in the organization took an interest in any given identified content, look for group collaboration patterns or generate recommendations based on content that other users have also accessed. For example, DataGravity's storage puts its high-availability passive secondary controller to work maintaining and serving this analysis of user- and usage-based metadata.
- Active capacity and utilization management. IT admins can see deep into dynamic storage infrastructure behavior when metadata statistics include resource utilization metrics, client file operations, IOPS, and other system management metrics. Qumulo, for example, lets admins see what is actively going on in the storage system down to the file level. This makes it easy to see which files and directories are or have been hot at different times and which clients are hitting which sections of the file structure across billions of files. Data-aware storage actually helps analyze its own behavior and alignment to workloads.
- Analytics and machine learning. With the growing compute power found in modern storage arrays, data processing and analytics tasks can be hosted directly in the array to run extremely close to the data. As mentioned above, the idea driving Hadoop and HDFS is to localize processing (through massive parallelization) over big data sets. But emerging technologies bake intelligence, in the form of statistical analytics and even machine learning, into a wider swath of common storage infrastructure. Much like search, advanced metadata extraction and transformation will allow users to automatically categorize, transform, classify, score, visualize and report on data simply by storing it in the right place (as a research example, see IBM Storage storlets). In the future, data may even tell us what we need to do with it.
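Two of the services above, content indexing at ingest and usage tracking, can be sketched together in a few lines of Python. This is a toy in-memory model under stated assumptions: the `DataAwareStore` class and its methods are hypothetical, objects are treated as write-once text, and nothing here reflects how Tarmin, DataGravity, Qumulo or any other product actually works.

```python
import re
from collections import Counter, defaultdict


class DataAwareStore:
    """Toy store that builds metadata as a side effect of normal reads and writes."""

    def __init__(self):
        self.objects = {}                    # name -> text content
        self.index = defaultdict(set)        # term -> names containing that term
        self.access = defaultdict(Counter)   # name -> per-user read counts

    def put(self, name: str, text: str) -> None:
        # Index at ingest. Assumes write-once names: overwriting an object
        # would leave stale terms in the index in this simplified sketch.
        self.objects[name] = text
        for term in set(re.findall(r"[a-z0-9]+", text.lower())):
            self.index[term].add(name)

    def get(self, name: str, user: str) -> str:
        self.access[name][user] += 1         # record who read what
        return self.objects[name]

    def search(self, *terms: str) -> list:
        """Names of objects containing every term, answered from the index."""
        sets = [self.index.get(term.lower(), set()) for term in terms]
        return sorted(set.intersection(*sets)) if sets else []

    def hottest(self, n: int = 3) -> list:
        """Most-read object names, most active first."""
        totals = Counter({name: sum(users.values())
                          for name, users in self.access.items()})
        return [name for name, _ in totals.most_common(n)]

    def readers(self, name: str) -> list:
        """Users who have read the given object."""
        return sorted(self.access.get(name, {}))


store = DataAwareStore()
store.put("q1.txt", "Flash storage budget")
store.put("q2.txt", "Budget review for archive storage")
store.get("q1.txt", "ana")
store.get("q1.txt", "raj")
store.get("q2.txt", "ana")
both = store.search("storage", "budget")
flash_only = store.search("flash")
hot = store.hottest(1)
who = store.readers("q1.txt")
```

The point of the sketch is that search, hot-file reporting and "who else read this" all come from metadata the store already maintains, so none of them requires rescanning the stored content.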
Metadata can help make our infrastructure act intelligently about the data it's holding. Infrastructure that becomes both more data-aware and, in a sense, self-aware could help us stay on top of challenges. With data growing this fast, having IT blindly store bits is a fool's game, when the real value of this new data lies in extracting as much information from it as possible.
Mike Matchett is a senior analyst and consultant at Taneja Group. Contact him via email at email@example.com.