Big data and Internet of Things projects place different stresses on IT infrastructure than the workloads that came before them.
Internet of Things (IoT) and big data applications stress network and storage infrastructure, not to mention the IT experts who must use different skills and tools to manage these deployments.
They're a challenge to execute, but there are some guiding principles for IT teams that take on IoT and big data hosting. It starts with scrutinizing the infrastructure requirements of the large-scale, data-intensive project.
More processing power
Once the scope of the project comes into focus, it will be possible for an IT organization -- in conjunction with its hardware, software and services suppliers -- to identify the appropriate system architecture and operating system, the number of processors per system, and the number of systems -- physical, virtual and cloud-based -- the initiative will need.
Big data projects are often based on Windows or Linux, running on industry-standard x86-based servers. In other situations, there are useful tools based on mainframes or single-vendor system architectures and operating systems. For the most part, the IT team will cluster industry-standard servers using a scale-out architecture, as a way to support workloads that make heavy use of processing, memory, network and storage.
IoT-based projects tend to also include back-end systems based on single-vendor midrange systems as well as mainframes.
To maximize the available processing power while minimizing the overall investment in hardware, properly configure the systems, clusters and other components. This requires a clear picture of what the organization wishes to do and an in-depth understanding of the big data tools and NoSQL databases selected for the project. A similar understanding will inform the selection of tools to communicate with a constellation of smartphones, tablets, automobiles and an ever-enlarging set of other intelligent devices.
An improperly configured server cluster, or other infrastructure blunder, can hinder the operation of these projects and cause a failure -- even when the proper tools are selected.
Some back-end data analysis and reporting tools operate on one large cluster of systems. Others are supported by several smaller clusters: one to support the data store containing the raw data for analysis, another to support the tools to process the raw data into useful information. Another cluster may be needed to support the reporting tools that can transform the useful information into the proper form -- tabular, graphical or other format -- for analysts or data scientists.
IoT-based projects also add the element of reaching back to the client device to offer requested information, guidance or support. An organization will need expertise in each of these tools as well as a complete understanding of how it plans to use the tools.
Invest time with trusted advisors and suppliers during this process to learn what is required to properly support the tools and approaches chosen.
Memory, storage and networking concerns
Just adding more systems, memory and storage does not always ensure better overall performance for IoT and big data deployments. As with processing power, different approaches and tools require different amounts of system memory.
Each approach, and its related set of tools, has limitations. IT planners putting together the IoT and/or big data platform must research the minimum resources each tool under consideration requires and how much more each tool can actually use if additional resources are available.
If a company installs more memory than the selected tools can use, it may serve only to increase power consumption and heat production without any improvement to overall performance, putting additional and unneeded stress on the data center power and cooling systems.
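The arithmetic behind this point can be sketched in a few lines. The example below is illustrative only: the class name, the per-node tool limit and the OS reserve are assumptions made for the sketch, not figures from any particular product.

```python
from dataclasses import dataclass


@dataclass
class NodeMemoryPlan:
    """Hypothetical per-node memory plan; all figures are assumptions."""
    installed_gb: float          # memory physically installed in the node
    tool_limit_gb: float         # most memory the selected tool can use per node
    os_reserve_gb: float = 8.0   # headroom assumed for the OS and agents

    def available_gb(self) -> float:
        # Memory left over for the tool after the OS reserve.
        return max(self.installed_gb - self.os_reserve_gb, 0.0)

    def usable_gb(self) -> float:
        # The tool cannot use more than its own limit, however much is installed.
        return min(self.available_gb(), self.tool_limit_gb)

    def idle_gb(self) -> float:
        # Memory that draws power and produces heat without helping performance.
        return self.available_gb() - self.usable_gb()


plan = NodeMemoryPlan(installed_gb=512, tool_limit_gb=256)
print(f"usable: {plan.usable_gb()} GB, idle: {plan.idle_gb()} GB")
```

Running the same calculation for each tool under consideration, with limits taken from that tool's documentation, gives a rough ceiling on how much memory is worth buying per node.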
Another element in the IoT and big data formula is storage performance and capacity. As with processing power and memory capacity, the selection of storage devices, the dedicated capacity and how storage is networked all contribute to the optimal performance of a big data product. In the case of IoT technologies, responsiveness, or the lack of it, can drive customers to either love or loathe a company.
As with memory and compute components, the storage configuration must match the requirements of the approach and tool set selected. Don't expect simple payoffs by adding more storage, selecting faster devices or upgrading the storage area network. Even if storage performance increases, the upgrade may offset the gain by causing network bottlenecks.
Some big data tools use excess memory capacity as part of the data store, creating an in-memory database. This approach can accelerate analysis and reporting. There is a tradeoff here: If the systems are not protected by a reliable power source, data held only in memory can be lost if the power fails.
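The tradeoff can be illustrated with a toy example. This is a minimal sketch, not how any specific in-memory database works: the `InMemoryStore` class, its snapshot file and the sensor keys are all hypothetical. It shows that anything written after the last snapshot to disk disappears when the process (or the power) goes away.

```python
import json
import os
import tempfile


class InMemoryStore:
    """Hypothetical key-value store that lives in RAM and snapshots to disk."""

    def __init__(self, snapshot_path):
        self.snapshot_path = snapshot_path
        self.data = {}

    def put(self, key, value):
        # Fast: only touches memory.
        self.data[key] = value

    def snapshot(self):
        # Write to a temporary file, then swap it in, so a crash mid-write
        # does not corrupt the previous snapshot.
        tmp_path = self.snapshot_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(self.data, f)
        os.replace(tmp_path, self.snapshot_path)

    @classmethod
    def recover(cls, snapshot_path):
        # Rebuild the store from whatever last reached disk.
        store = cls(snapshot_path)
        if os.path.exists(snapshot_path):
            with open(snapshot_path) as f:
                store.data = json.load(f)
        return store


path = os.path.join(tempfile.mkdtemp(), "store.json")
store = InMemoryStore(path)
store.put("sensor-1", 21.5)
store.snapshot()
store.put("sensor-2", 19.0)   # written after the last snapshot
del store                      # stand-in for a power failure

recovered = InMemoryStore.recover(path)
print(sorted(recovered.data))  # only keys that reached disk survive
```

Shortening the interval between snapshots narrows the window of possible loss, at the cost of more disk and network traffic.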
Don't get caught up in the hype surrounding any single type of storage or storage area network. Analysts will point out that memory-resident databases or flash storage won't answer every question.
A few storage virtualization software suppliers, such as DataCore Software, would note that the underlying OS might process only a single I/O request at a time. This vendor's approach is to add software that makes it possible for the system to execute multiple requests in parallel.
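The general idea of serial versus parallel I/O requests can be shown at the application level. The sketch below is not DataCore's implementation, which works lower in the stack; `read_block` is a hypothetical stand-in that sleeps to simulate device latency, and the worker count is an arbitrary assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def read_block(block_id: int) -> bytes:
    """Hypothetical storage read; the sleep simulates device latency."""
    time.sleep(0.01)
    return block_id.to_bytes(4, "big")


def read_serial(block_ids):
    # One request at a time: total time grows with the number of requests.
    return [read_block(b) for b in block_ids]


def read_parallel(block_ids, workers=8):
    # Several requests in flight at once; map() returns results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(read_block, block_ids))


blocks = list(range(16))

start = time.perf_counter()
serial_result = read_serial(blocks)
serial_seconds = time.perf_counter() - start

start = time.perf_counter()
parallel_result = read_parallel(blocks)
parallel_seconds = time.perf_counter() - start

print(f"serial: {serial_seconds:.3f}s, parallel: {parallel_seconds:.3f}s")
```

Both functions return the same data; only the number of requests in flight changes. Whether real hardware benefits depends on the device and the rest of the I/O path.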
What is clear is that an underprovisioned or badly designed storage subsystem will degrade the effectiveness of a big data or IoT system.
Network infrastructure is critical to any distributed or clustered computing tool. Its capacity, latency and performance can facilitate or hinder this type of technology. As with the processing, memory and storage subsystems, network infrastructure must be selected with care.
If the network doesn't have enough capacity, responds slowly or is biased toward small or large I/O requests when the big data tool needs something else, performance will suffer. The same can be said of networks that aren't set up to handle the small, bursty data requests made by intelligent devices in an IoT system. Balancing both types of requests can be challenging.
I saw one project that attempted to capture and then analyze millions of small mobile device messages -- an early IoT project. That company learned its network wasn't fast enough to handle the load, as it had been set up to manage bulk data transfers rather than millions of tiny data requests.
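A back-of-the-envelope model helps explain why a network tuned for bulk transfers can struggle with tiny messages. The sketch below assumes a single connection sending messages one after another, a fixed per-message overhead and a 1 Gbps link; all of those figures are illustrative assumptions, not measurements from the project described above.

```python
def link_utilization(payload_bytes: int,
                     per_message_overhead_s: float,
                     bandwidth_bytes_per_s: float) -> float:
    """Fraction of link bandwidth spent on payload for back-to-back messages."""
    transfer_s = payload_bytes / bandwidth_bytes_per_s
    time_per_message_s = per_message_overhead_s + transfer_s
    return transfer_s / time_per_message_s


BANDWIDTH = 125_000_000   # 1 Gbps expressed in bytes per second (assumed)
OVERHEAD = 0.0005         # 0.5 ms of setup and round-trip cost per message (assumed)

small = link_utilization(200, OVERHEAD, BANDWIDTH)           # tiny device message
bulk = link_utilization(100_000_000, OVERHEAD, BANDWIDTH)    # 100 MB bulk transfer

print(f"small messages: {small:.2%} of the link carries payload")
print(f"bulk transfer:  {bulk:.2%} of the link carries payload")
```

In this model the fixed overhead dominates small messages, so raw bandwidth matters far less than latency and the number of requests the network can handle concurrently.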