Big data means big problems: companies are rapidly increasing the volume of data stored and processed in the data center and scattered about on user devices.
Internet of Things initiatives also collect mass quantities of information from sensors and connected devices. The global big data storage and server market shows a compound annual growth rate of 31.87% from 2012 to 2016, according to market research firm TechNavio.
This data sprawl requires a concerted effort by the data center team to catalog, categorize and contain business information. All enterprises need to take a disciplined approach to data management.
The data you have
Getting organized is the first step toward better enterprise data management. IT must first determine what information the business generates or collects -- often a difficult process. The data center team must conduct an enterprise-wide inventory and identify data residing on central servers, desktops and mobile devices. Make sure top executives back your data management initiative. Users know more about departmental data than IT does, but pay little to no attention to technology issues. Expect only superficial employee input unless executives mandate more effort.
You will likely find data duplicates as you inventory the business's information, such as a single customer's record stored in six different applications. Usually, IT discovers the data is formatted differently in each instance, making it tedious or impossible to move one copy between systems. Data consolidation and standardization is a time-intensive and expensive process. As soon as the IT team gets standard connections in place between applications, new uses for its data emerge, and the process begins anew. Create connections only in places where information flow is vital, and leave the duplicates alone.
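As a rough illustration of how an inventory might surface those duplicates, the sketch below groups customer records from several applications by a normalized email address. The application names, the record fields and the choice of email as the matching key are all assumptions made for the example; a real inventory would need matching rules agreed on with each department.

```python
from collections import defaultdict

# Hypothetical inventory rows: (application, customer record as stored there).
INVENTORY = [
    ("crm", {"name": "Ann Lee", "email": "Ann.Lee@Example.com "}),
    ("billing", {"name": "LEE, ANN", "email": "ann.lee@example.com"}),
    ("support", {"name": "Bob Ray", "email": "bob.ray@example.com"}),
]


def match_key(record):
    """Normalize the email so differently formatted copies compare equal."""
    return record["email"].strip().lower()


def find_duplicates(inventory):
    """Return {match key: sorted applications} for keys held by more than one application."""
    apps_by_key = defaultdict(set)
    for app, record in inventory:
        apps_by_key[match_key(record)].add(app)
    return {key: sorted(apps) for key, apps in apps_by_key.items() if len(apps) > 1}


duplicates = find_duplicates(INVENTORY)
print(duplicates)
```

A report like this only shows where copies live. Whether to connect or consolidate them remains the cost decision described above.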
Next, businesses need to categorize their data by function and application. This process gives the data center's capacity planners a big-picture view that lets them correlate data usage and project the firm's actual needs. There are no absolutes when assessing data value. Try ranking the potential impact to the business if the data becomes compromised or lost. Convene a committee of technical, management and business unit executives to develop the rankings.
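One simple way to turn committee input into a ranking is to average each member's impact score per data category and sort the results. The sketch below assumes a 1-to-5 scale, three committee members and made-up category names; none of these come from a standard, and a committee may prefer weighted or worst-case scoring instead.

```python
from statistics import mean

# Hypothetical committee scores per data category
# (1 = minor impact if lost or compromised, 5 = severe impact).
SCORES = {
    "customer orders": [5, 5, 4],
    "marketing assets": [2, 3, 2],
    "hr records": [4, 5, 4],
}


def rank_by_impact(scores):
    """Return category names ordered from highest to lowest average impact score."""
    return sorted(scores, key=lambda category: mean(scores[category]), reverse=True)


ranking = rank_by_impact(SCORES)
print(ranking)
```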
Where data goes
Once information is categorized, set up data tiers. Every firm has a set amount of money allocated to data storage, so not everything will receive top-level storage features.
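The budget constraint can be pictured as a placement exercise: walk the data sets in priority order and put each one on the fastest tier that still has room. The sketch below is one minimal way to do that, assuming hypothetical data set sizes and tier capacities in terabytes and an unlimited archive tier at the bottom. It is a greedy illustration, not a capacity planning tool.

```python
def assign_tiers(datasets, capacities_tb):
    """Place each data set on the fastest tier with enough free capacity.

    datasets: list of (name, size_tb) pairs, highest priority first.
    capacities_tb: free capacity per tier, fastest tier first.
    Data sets that fit nowhere go to an extra archive tier, whose index is
    len(capacities_tb) and whose capacity is not modeled (an assumption).
    """
    remaining = list(capacities_tb)
    placement = {}
    for name, size_tb in datasets:
        for tier, free in enumerate(remaining):
            if size_tb <= free:
                remaining[tier] -= size_tb
                placement[name] = tier
                break
        else:
            placement[name] = len(remaining)
    return placement


# Hypothetical ranked data sets and tier capacities.
RANKED = [("customer orders", 8), ("hr records", 5), ("marketing assets", 20)]
placement = assign_tiers(RANKED, capacities_tb=[10, 15])
print(placement)
```

Here tier 0 stands for the most expensive storage and higher numbers for cheaper options, so lower-priority data naturally drifts toward the cheaper tiers as the premium capacity fills up.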
The number of possible choices in high-performance storage arrays is growing. Flash storage is faster than disks in terms of data access, but it comes at a premium: 20% to 100% higher cost than other storage options. Flash proves to be economically unviable in some cases and architecturally unfounded in certain scenarios. Loading all company data into flash can reduce performance and extend response times for mission-critical applications.
In addition to flash, servers are another high-performance storage option. In a server storage area network (SAN) solution or a hyper-converged virtual SAN, the server performs processing functions as well as storage manipulation. Dell PowerEdge and Fujitsu PRIMERGY high-end servers rely on software-defined storage to create systems where the whole storage stack runs on the server.
Another option is special-purpose storage systems. Here, the storage system autonomously runs deduplication and backup processes. Theoretically, these systems help system administrators by reducing the amount of work to configure, retain and back up data. Better performance is another potential benefit: The bandwidth needed to dedupe and replicate data decreases because it is not sent from the storage system to the server and back again; it remains local.
How to manage it
Integration is an important issue in the data management process. To set up tiers, a company must have data storage management software capable of moving information among different hardware systems. Modern IT organizations are rarely willing or able to standardize on one application platform. A data solution, therefore, needs to support multiple platforms, such as Linux and Windows, as well as VMware and Microsoft Hyper-V virtualization, with data protection. Standards allow information to flow among the various storage and processing systems. IT is able to store, relate, classify and search for data across the enterprise only when those pieces are in place.
Take the time to identify and manage storage system interconnections to prevent data sprawl. Mission-critical applications frequently connect to numerous feeder systems. For example, in tiered storage, a tape system may feed infrequently accessed customer information to a solid-state drive system, which in turn moves the data to the main system for processing. The application can only complete its work if the SSD system and its tape-based feeder system integrate with each other without creating a bottleneck or error-prone workaround step. This chain of storage and processing connections could complicate troubleshooting app problems, which is a common downside of today's increasingly virtualized data center.
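Keeping even a simple map of which system feeds which can shorten troubleshooting. The sketch below records the interconnections as a dictionary and lists every upstream feeder of an application, nearest first. The system names mirror the tape-to-SSD example above but are otherwise hypothetical, and a real environment would likely pull this map from its storage management software rather than maintain it by hand.

```python
# Hypothetical map: each system -> the systems that feed data into it.
FEEDS_FROM = {
    "order-app": ["main-storage"],
    "main-storage": ["ssd-tier"],
    "ssd-tier": ["tape-archive"],
    "tape-archive": [],
}


def upstream(system, feeds_from):
    """Return every system that directly or indirectly feeds `system`, nearest first."""
    seen = []
    queue = list(feeds_from.get(system, []))
    while queue:
        current = queue.pop(0)
        if current not in seen:  # also guards against loops in the map
            seen.append(current)
            queue.extend(feeds_from.get(current, []))
    return seen


chain = upstream("order-app", FEEDS_FROM)
print(chain)
```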
High-priority information should reside on high-availability (usually more expensive) storage systems. The options range from inexpensive, easy-to-deploy tape systems to continuous backup systems that copy mission-critical data in near real time. While there has been talk about tape becoming obsolete, vendors continue to improve the medium, and it provides a cheap option for regulated industries with archiving demands.
Enterprises typically store backup copies of data off site. While the traditional method is auxiliary data centers, often located far from the primary facility, cloud-based off-site storage is emerging as a backup option.
About the author:
Paul Korzeniowski is a freelance writer who specializes in data center issues. He has been writing about technology for two decades, is based in Sudbury, MA and can be reached at email@example.com.