Big data challenges IT professionals with new choices on hardware, storage and other aspects of data center infrastructure.
Big data is unstructured information of unprecedented size and form. It consists of videos and graphics, as well as semi-structured data like email and texts, often on the Web. As the increasingly sensor-driven Web monitors mobile device output, available data will continue to grow exponentially.
Pressure to implement a big data strategy often comes down from upper management, based on the concern that businesses that use data effectively will outperform those that don't. Big data strategies require five major data center infrastructure changes.
1. Hardware that supports big data
Big data causes storage requirements to rise 60% to 80% each year. Given this rapid growth and today's cost constraints, IT buyers should choose hardware with the most cost-effective scalability and storage speed. Scale-up architectures like the mainframe are seeing a resurgence because of their ability to scale cost-effectively and reduce total cost of ownership. Likewise, solid-state disks (SSDs) are better than magnetic disk at improving speed.
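To make the 60% to 80% growth figure concrete, here is a minimal sketch of how capacity compounds over a few years. The 100 TB starting point and the three-year horizon are illustrative assumptions, not figures from any particular data center.

```python
def project_capacity(start_tb: float, annual_growth: float, years: int) -> list[float]:
    """Return the capacity needed at the end of each year under compound growth."""
    capacities = []
    current = start_tb
    for _ in range(years):
        current *= 1 + annual_growth  # growth compounds on the previous year's total
        capacities.append(round(current, 1))
    return capacities


# Hypothetical starting point: 100 TB today, projected three years out.
low = project_capacity(100.0, 0.60, 3)   # 60% annual growth
high = project_capacity(100.0, 0.80, 3)  # 80% annual growth
print("60% growth:", low)
print("80% growth:", high)
```

Under these assumptions, capacity needs roughly quadruple to sextuple in three years, which is why scalability per dollar matters more than the initial purchase price.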
The hardware appliance, such as IBM's Netezza and Oracle's Exadata, is proving a successful combination of scalability and speed for specific business tasks. Consider appliances for business-critical big data tasks, but verify the appliance's architecture can deliver rapid performance increases in the future.
2. Storage decisions around big data
With a successful big data strategy, businesses combine high-quality internal data with lower-quality data available via Hadoop from multiple cloud providers. This upgrades the quality of line-of-business data and makes big data available to decentralized locations in a consistent, timely fashion.
Big data is changing the basis for the decision between central warehousing and loosely coupled data marts, which are much smaller storage repositories that can either replace or feed a central data warehouse. Weigh the importance of high-quality data for central execs against the need to empower decentralized lines of business, like local offices or international subsidiaries.
New software technologies do much of the heavy lifting related to storage. Data virtualization software from Composite Software (newly acquired by Cisco) and Denodo gives the look and feel of a common database across all data sources inside and outside the organization, provides auto-discovery of data resources, and seeds a global metadata repository. Master data management software improves data quality by creating a common master record without the querying lag time of a data warehouse.
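As a rough illustration of what "creating a common master record" can mean, the sketch below merges records for one customer from several sources, letting more trusted sources win field by field. The function name, the source names (`crm`, `web`) and the trust ordering are all hypothetical; commercial master data management products use far richer matching and survivorship rules.

```python
def build_master_record(records_by_source: dict, source_priority: list) -> dict:
    """Merge per-source records into one master record.

    source_priority lists source names from most to least trusted.
    """
    master = {}
    # Walk from least to most trusted so that trusted values overwrite others.
    for source in reversed(source_priority):
        for field, value in records_by_source.get(source, {}).items():
            if value not in (None, ""):  # skip missing or empty values
                master[field] = value
    return master


# Hypothetical records for the same customer in two systems.
crm = {"name": "Acme Corp", "email": "", "phone": "555-0100"}
web = {"name": "ACME", "email": "ops@acme.example", "phone": None}

master = build_master_record({"crm": crm, "web": web}, ["crm", "web"])
print(master)
```

The internal CRM supplies the name and phone number, while the lower-quality Web source fills in only the field the CRM lacks.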
The need for out-of-enterprise connections to the Web increases reliance on public and hybrid clouds. Many larger businesses find that they need big data from multiple cloud providers but cannot depend on the providers to combine this data. Enterprises turn to management tools from data virtualization vendors to combine big data across multiple clouds.
3. A storage tiering strategy with SSDs
Storage is costly, and the faster it is, the pricier it gets. On top of that, big data demands big storage capacity and big performance. Tiered storage mitigates total cost by providing several cost/performance options in the pool, ranging from high-price, high-performance solid-state storage devices to conventional magnetic disk storage based on Serial-Attached SCSI. Adding a solid-state tier between main memory and disk helps keep performance high for big data tasks without letting storage costs get unmanageable.
SSDs obey the "90-10" rule of storage tiers: For the best combination of cost and speed, use about 10% SSDs and 90% disk. This strategy gives IT shops 10% higher storage costs than pure disk, but 90% more speed. Follow the same rule for main memory to SSD ratio.
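The cost side of the 90-10 rule is simple arithmetic, sketched below. The per-gigabyte price ratios are assumptions chosen for illustration; a 10% cost premium at a 10% SSD share works out only when SSD costs about twice as much per gigabyte as disk, so plug in current prices before relying on the figure.

```python
def blended_cost_ratio(ssd_fraction: float, ssd_to_disk_price_ratio: float) -> float:
    """Cost of a tiered pool relative to an all-disk pool of the same capacity."""
    if not 0.0 <= ssd_fraction <= 1.0:
        raise ValueError("ssd_fraction must be between 0 and 1")
    disk_fraction = 1.0 - ssd_fraction
    # Disk is priced at 1.0 per unit; SSD at the given multiple of that.
    return disk_fraction + ssd_fraction * ssd_to_disk_price_ratio


def cost_premium_pct(ssd_fraction: float, ssd_to_disk_price_ratio: float) -> float:
    """Percentage by which the tiered pool costs more than pure disk."""
    return (blended_cost_ratio(ssd_fraction, ssd_to_disk_price_ratio) - 1.0) * 100.0


# Hypothetical price ratios: SSD at 2x and 5x the per-gigabyte price of disk.
for ratio in (2.0, 5.0):
    premium = cost_premium_pct(0.10, ratio)
    print(f"10% SSD at {ratio:.0f}x disk price: {premium:.0f}% cost premium")
```

The same function can be reused for the main memory to SSD ratio by treating memory as the fast tier and SSD as the baseline.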
SSDs may be on a faster price-performance improvement path than disks, which foretells an 80-20 rule in the near future.
The newest columnar and in-memory databases, like IBM BLU Acceleration, will gain better performance from SSDs than from disk because they are designed to take advantage of SSDs as "flat disk."
4. What supports big data analytics and reporting
Embedded analytics have improved business processes via reporting and automated tuning, but big data changes the analytics rules again. For example, rather than delivering one main insight into a customer, big data strategies create a process of iterative insights that track and establish better long-term relationships with the organization's customers.
The typical practitioner of big data analytics is called a data scientist and is more likely to be found with the CMO than the IT director. However, IT pros must be aware of how their company's big data strategy affects the job of the data scientist.
This means adding a third consideration beyond reporting and embedded analytics: ad hoc and loosely coupled analytics. The software prerequisites for this are analytics and statistical tools that support ad hoc querying. Many traditional IT vendors, as well as cloud ones -- such as IBM Cognos and Birst -- are adding these capabilities.
5. Hadoop in the enterprise
Hadoop pairs a distributed file system, HDFS, with the MapReduce parallel processing framework for data-intensive applications. It allows parallel scaling of processing against rich-text data, such as social media data.
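For readers new to the MapReduce model, the following is a minimal single-machine sketch of its three stages (map, shuffle, reduce) applied to a word count over short text posts. It is plain Python, not the Hadoop API; the function names and sample posts are made up for illustration, and a real cluster would run the map and reduce steps in parallel across many nodes.

```python
from collections import defaultdict


def map_phase(record: str):
    """Emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list):
    """Collapse all values for one key into a single total."""
    return key, sum(values)


def word_count(records):
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical social media posts.
posts = ["big data is big", "data grows"]
counts = word_count(posts)
print(counts)
```

Because each record is mapped independently and each key is reduced independently, both stages can be spread across machines, which is what lets this style of processing scale out.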
Many IT shops solve the problem of inhaling Hadoop-accessed data from the Web by creating their own versions of Hadoop within the enterprise. However, lack of expertise is a challenge: Few IT administrators are versed in the art and science of this constantly evolving Web data management framework.
Organizations developing their own data management tools should be aware that major vendors, such as IBM, Oracle and EMC, often offer either proprietary products for accessing Hadoop data or a customizable approach that lets IT shops implement access without needing a dedicated organization. If you decide to create your own setup, the vendors also offer veneers that make Hadoop work better with existing IT resources.
Decisions surrounding big data will be different for each organization. Keep in mind that the strategy can change as the technologies surrounding big data evolve.
About the author:
Wayne Kernochan is president of Infostructure Associates, an affiliate of Valley View Ventures. He has been an IT industry analyst for 22 years.