"Big data" opportunities require equally big computing power to store, organize and process data and to report on the findings. This emerging field is changing the way that data center servers and other infrastructure are selected and deployed.
To compete and succeed in today's business climate, a company must base its decisions on an analysis of the rich assortment of data available to it. Analyzing this growing wealth of big data can expose important trends and potential opportunities.
The SearchDataCenter Advisory Board explains how the influx of big data for business purposes is changing the makeup of the enterprise data center and offers a unique perspective on new big data opportunities.
Just add SAN
Sander van Vugt, independent trainer and consultant
Big data is not really an issue. What I mean is that data centers have not suddenly changed their approach to handling large amounts of data with the advent of big data.
The approach that I see is rather simple: Just add another storage area network (SAN) that is more scalable than the previous SAN setup. This means that enterprises can start dealing with data on two different storage networks: one for the key data they're using and one for the not-so-important data that still needs to be stored.
Appliances bring big, big data opportunities
Clive Longbottom, co-founder and service director of IT research and analysis firm Quocirca
We are still at the thin end of the wedge on true big data in the enterprise.
Currently, data centers use storage virtualization to federate database sources. More modern approaches to handling big data for business intelligence (BI) come from vendors such as Pentaho, Logi, QlikTech and Birst. More leading-edge enterprises use Hadoop, the Java-based programming framework, as a non-persistent filter to handle multiple data types. NoSQL databases, such as MongoDB and Couchbase, then provide a third leg to the stool: persistent storage of less-structured data. Management tools such as Splunk help deal with machine-to-machine and log file data.
Each of these tools requires its own supporting infrastructure and careful design to get the desired results. Analytics as a Service providers are emerging with cloud-based BI capabilities, and many organizations will end up moving in this direction to avoid the complexities of a hybrid environment. IBM, Teradata, EMC and other vendors offer hybrid appliances for businesses that want to keep all data on-site while pulling in extra information from outside sources. Hybrid appliances handle structured and less-structured data feeds in a more engineered manner than current big data infrastructures, but they carry a considerable price tag.
Selecting servers, storage and structure
Stephen J. Bigelow, senior technology editor
The tools used for big data analysis, such as Hadoop and its MapReduce programming model, distribute tasks across thousands of compute nodes (servers) and then gather the results.
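To make the "distribute tasks, gather results" idea concrete, here is a minimal single-machine sketch of the map/reduce pattern for a word count. It is not Hadoop's API: threads stand in for cluster nodes, there is no shuffle or sort step, and the function names (`map_task`, `reduce_task`, `word_count`) are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_task(chunk):
    """Map phase: count the words in one chunk of input lines."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def reduce_task(left, right):
    """Reduce phase: merge two partial word counts."""
    return left + right


def word_count(lines, workers=4):
    """Split the input, map each chunk concurrently, then reduce the results."""
    # Ceiling division gives roughly equal chunks, one per worker ("node").
    size = max(1, -(-len(lines) // workers))
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]
    # Distribute: each thread stands in for a node running the map task.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_task, chunks))
    # Gather: fold the partial counts into a single result.
    return reduce(reduce_task, partials, Counter())


docs = ["big data big servers", "data center", "big racks"]
print(word_count(docs, workers=2).most_common(2))  # [('big', 3), ('data', 2)]
```

In a real cluster, each map task runs on the node that holds its slice of the data, which is why the framework can scale to thousands of nodes.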
The highly scalable task distribution scheme used by this software differs radically from traditional single-threaded execution, which means that big data servers are typically the largest and most powerful systems available. Expect individual big data servers to include the largest practical number of processor cores, as found in Intel's Xeon E7-8800 v2 family, which offers up to 15 cores per socket along with Hyper-Threading. Data centers will group these powerhouse servers into dedicated racks for big data processing.
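The core counts multiply quickly. As a rough sketch, assuming Hyper-Threading exposes two hardware threads per core, the number of logical processors a server offers is sockets x cores per socket x threads per core. The socket counts below are illustrative, and `logical_processors` is a hypothetical helper.

```python
def logical_processors(sockets, cores_per_socket=15, threads_per_core=2):
    """Logical processors visible to the OS: sockets x cores x threads."""
    return sockets * cores_per_socket * threads_per_core


# Hypothetical socket counts, chosen only for illustration.
for sockets in (1, 2, 4):
    print(sockets, "socket(s):", logical_processors(sockets), "logical processors")
```

A four-socket server built this way would present 120 logical processors, each of which a framework such as Hadoop can keep busy with its own task.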
Reduced instruction set computer (RISC) processors are another option for dedicated big data servers. They offer a huge number of processor cores while using relatively little energy and producing far less heat than conventional x86 processors. For example, Dell developed its Zinc server, based on Calxeda ARM chips, for enterprise applications.
Although more processor cores need additional memory to handle calculations and store results, big data work is computationally focused, so the total memory in a big data server will rarely exceed several hundred gigabytes. For example, each server in Hewlett-Packard's ConvergedSystem 300 appliance for the Vertica Analytics Platform has 128 GB of memory, and IBM's System x reference architecture for Hadoop calls for servers with up to 384 GB of memory.
Big data servers may also integrate a graphics processing unit (GPU), such as NVIDIA Corporation's Tesla K40. GPUs are designed specifically to handle complex mathematical work such as double-precision floating point calculations, and the Tesla K40 is rated at up to 1.4 teraflops (Tflops) of double-precision performance. A significant amount of math can be offloaded from a server's processors to a GPU, which works from its own onboard memory rather than imposing on system memory.
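A quick back-of-envelope calculation shows what a 1.4 Tflops double-precision rating means in practice. This sketch assumes a classical n x n matrix multiply, which needs about 2n³ floating point operations, and treats the rated figure as sustained throughput; real workloads run below peak, so the result is a best case. The helper name `matmul_seconds` is hypothetical.

```python
def matmul_seconds(n, peak_flops=1.4e12):
    """Best-case time for an n x n double-precision matrix multiply.

    A classical matrix multiply needs about 2 * n**3 floating point
    operations (n**3 multiplies and n**3 adds). Real kernels run below
    peak throughput, so this is a lower bound, not a prediction.
    """
    return 2 * n ** 3 / peak_flops


print(f"{matmul_seconds(10_000):.2f} s")  # 1.43 s
```

Multiplying two 10,000 x 10,000 matrices therefore takes at least about 1.4 seconds at the K40's rated peak; the same arithmetic can be rerun with a host CPU's rated throughput to judge whether offloading is worthwhile.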
Any evaluation of big data computing platforms must also consider other infrastructure elements, such as networking and storage. Multiport network interface cards help servers distribute the workload, and shifting from Gigabit Ethernet to 10 Gigabit Ethernet is often justified to take advantage of big data opportunities. There must be enough switch ports (1 GigE or 10 GigE) to accommodate every port on the big data servers. In addition, IT architects may choose to spread each server's ports across multiple independent switches for a more robust, highly available environment. Data centers should budget for more, and newer, network switches.
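Switch planning starts with simple port arithmetic. The sketch below assumes every server network port needs its own switch port, that each server's ports can be split freely across at least two independent switches for availability, and that uplink and inter-switch ports are budgeted separately. The rack size, port counts and the `switches_needed` helper are all hypothetical.

```python
import math


def switches_needed(servers, ports_per_server, ports_per_switch, min_switches=2):
    """Estimate access switches needed so every server NIC port is cabled.

    min_switches models spreading each server's ports across independent
    switches for availability. Uplink and inter-switch ports are ignored.
    """
    total_ports = servers * ports_per_server
    return max(min_switches, math.ceil(total_ports / ports_per_switch))


# Hypothetical rack: 20 servers, four 10 GigE ports each, 48-port switches.
print(switches_needed(20, 4, 48))  # 2
# Doubling the server count needs 160 ports, so four 48-port switches.
print(switches_needed(40, 4, 48))  # 4
```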
Hadoop and other big data applications commonly boost performance by using local storage close to individual processors rather than shared storage. Spreading storage tasks across many disks keeps many individual spindles working in parallel, minimizing storage latency. Also consider replacing conventional magnetic disks with solid-state drives (SSDs) or even faster PCI Express-based solid-state accelerator cards.
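One simple way to spread storage work across a server's local disks is round-robin placement, so that consecutive data blocks land on different spindles and can be read in parallel. This is an illustrative sketch of the idea, not Hadoop's placement code; the mount points and the `place_blocks` helper are hypothetical.

```python
from itertools import cycle


def place_blocks(num_blocks, disks):
    """Assign block IDs to local disks round-robin, one disk per block."""
    placement = {disk: [] for disk in disks}
    for block_id, disk in zip(range(num_blocks), cycle(disks)):
        placement[disk].append(block_id)
    return placement


# Hypothetical mount points for four local disks in one server.
layout = place_blocks(10, ["/data/1", "/data/2", "/data/3", "/data/4"])
for disk, blocks in layout.items():
    print(disk, blocks)
```

With 10 blocks on four disks, no disk holds more than three blocks, so a scan of the whole data set keeps all four spindles busy instead of queuing on one.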