Although the concept of artificial intelligence has been around since the '50s, its mainstream application in IT is just beginning to surface. By 2021, workloads such as deep learning and AI will be important factors for data center designs and architectures, according to Gartner research.
AI applications will affect every single vertical and industry, so it's important to take proactive steps to plan, architect and nurture deep learning and artificial intelligence practices in your data center.
Most organizations aren't implementing AI just yet. For the most part, hyperscale public cloud providers such as Google and Amazon Web Services are among the early adopters, while the vast majority of end users struggle to get started.
"Because it's such a moving target, it's hard to start developing practices toward enabling AI and deep learning environments," said Chirag Dekate, research director at Gartner. "The idea is phenomenal, but when you start developing and designing solutions, you start running into problems. And that's where a lot of the end users are."
Deep learning and AI applications require large volumes of data to train, test and validate neural network algorithms, which can present storage challenges for data center administrators.
"If your machine learning algorithms are more regression-based, you can use a limited data set. But for more advanced, high-value neural network ecosystems, you start running into problem of scale," Dekate said. Traditional network-attached storage architectures deliver immediate results with easy deployment and out-of-the-box efficiency, but they also present scaling issues with I/O and latency.
Some startup companies are exploring high-bandwidth parallel file systems to both increase throughput and enable scale, but those are the outliers, Dekate said.
Parallel file systems involve many moving parts, from metadata servers to storage targets, all of which must be tweaked, tuned and debugged to run at peak efficiency. "[Parallel file systems] are extremely complex and are not for the faint of heart," he said.
However, big data analytics -- another initiative that requires large volumes of data -- has already provided a platform for many IT organizations to readjust storage strategies.
"By the time that AI becomes a deployable reality in enterprises, the capacity from a storage perspective will already be solved … because of big data and analytics," said Christian Perry, research manager at 451 Research. "The internet of things is also expected to drive massive capacity at certain organizations. I think the infrastructure will already be at a place to handle large storage requirements [for AI]."
Deep learning frameworks are limited in how well they scale, and even on networking architectures that can scale, performance declines dramatically beyond a single compute node. To deliver higher efficiency at scale, admins must upgrade and improve their networks, but most haven't made this move a top priority yet.
"If you look at the deep learning algorithms, they're extremely communication-intensive," Dekate said. "Trying to architect solutions for such a chatty application stack is going to be very hard for organizations to get their heads around."
As data center networking architects prep their infrastructures for AI, they must prioritize scalability, which will require high-bandwidth, low-latency networks and innovative architectures, such as InfiniBand or Omni-Path.
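A rough back-of-the-envelope model helps show why a "chatty" stack penalizes scaling. The sketch below assumes synchronous data-parallel training in which nodes exchange gradients with a ring all-reduce on every step; that is one common approach, not something the analysts specify. The model size, per-step compute time and link speeds are all hypothetical values chosen for illustration.

```python
def ring_allreduce_seconds(grad_bytes, nodes, bytes_per_s, latency_s):
    """Approximate time for one ring all-reduce across `nodes` workers."""
    if nodes == 1:
        return 0.0  # a single node has nothing to exchange
    steps = 2 * (nodes - 1)      # reduce-scatter phase plus all-gather phase
    chunk = grad_bytes / nodes   # each step moves one chunk per node
    return steps * (latency_s + chunk / bytes_per_s)


def scaling_efficiency(nodes, compute_s, grad_bytes, bytes_per_s, latency_s):
    """Share of each training step spent computing, assuming communication
    is not overlapped with computation (a pessimistic simplification)."""
    comm_s = ring_allreduce_seconds(grad_bytes, nodes, bytes_per_s, latency_s)
    return compute_s / (compute_s + comm_s)


if __name__ == "__main__":
    GRAD_BYTES = 400e6  # hypothetical: 100M parameters x 4 bytes each
    COMPUTE_S = 0.25    # hypothetical per-step compute time on one node
    LINKS = {           # hypothetical (bytes per second, latency in seconds)
        "10 Gb/s, 50 us": (1.25e9, 50e-6),
        "100 Gb/s, 2 us": (12.5e9, 2e-6),
    }
    for name, (bandwidth, latency) in LINKS.items():
        for n in (1, 2, 8, 32):
            eff = scaling_efficiency(n, COMPUTE_S, GRAD_BYTES, bandwidth, latency)
            print(f"{name:>15} | {n:>2} nodes | efficiency {eff:.0%}")
```

In this toy model, efficiency falls as soon as a second node joins, and the faster, lower-latency link recovers much of the loss. Real frameworks overlap communication with computation, so actual numbers will differ.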
The key is to keep all options open for automation, Perry said. The market is quickly maturing with automated data center infrastructure management technologies, a sign that automation is becoming more widely accepted in data centers.
"Once more automation features are in place … this will help set the stage for the introduction of AI," Perry said.
The compute side of the data center presents very different challenges for implementing AI applications. A CPU-based environment can handle the vast majority of machine learning and AI workloads, from random forest regression to clustering. But once IT dives into deep learning capabilities, which require traversing multiple large data sets and deploying scalable neural network algorithms, a CPU-based ecosystem may not be enough. To deliver the compute capabilities, IT must integrate technologies such as Nvidia GPUs, Advanced Micro Devices' GPUs and Intel's Xeon Phi.
"You need hybrid or heterogeneous architectures where your core processors are complemented by special-purpose accelerators to deliver greater compute densities and throughput for your applications," Dekate said.
Implementing GPUs also means admins must optimize the data center infrastructure for power efficiency. GPU-based ecosystems can become more power hungry, particularly when admins scale them beyond a single node.
Hyperscale providers such as Google recognize this need; the company has used AI from its subsidiary DeepMind to reduce the energy required to cool its data centers by 40%. But nearly all enterprise data centers in the broader market lack Google's resources and won't be able to replicate this model to solve efficiency issues.
For most enterprises with traditional ecosystems, implementing these innovative technologies is not only complicated -- it's expensive. The latest Xeon Phi chip, for example, comes with a staggering price tag of $6,294 -- Intel's most expensive chip to date. And IT teams that want to integrate deep learning capabilities won't just need one chip; they'll need high densities of accelerator cards. These high-density compute configurations are used in hyperscale environments, healthcare organizations, financial services and beyond.
"We have seen high densities -- almost 2 CPUs to 8 GPU ratio densities," Dekate said. "That means that one server unit in this environment could cost as much as $150,000 for one server node."
There are ways to mitigate the high price tag of these technologies. Many organizations use the public cloud and, in some cases, IBM Watson, to test the reliability of AI applications before making any deep on-premises commitment.
Additionally, server refresh cycles have stretched far beyond the traditional three years, Perry said. Many organizations now refresh servers every five to seven years. As a result, their IT budgets stretch further and can be applied to a pricier infrastructure that accommodates what exists on-premises.
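The cost figures above, and the effect of a longer refresh cycle, can be sketched with simple arithmetic. The example below uses only the prices quoted in this article ($6,294 per Xeon Phi chip and as much as $150,000 per server node). Pricing eight accelerators at the Xeon Phi figure and spreading the purchase evenly over the refresh cycle are assumptions made for illustration, not vendor quotes or an accounting method the analysts describe.

```python
XEON_PHI_PRICE_USD = 6_294        # per-chip price cited in the article
NODE_PRICE_CEILING_USD = 150_000  # "as much as" figure quoted per server node
ACCELERATORS_PER_NODE = 8         # from the quoted 2 CPU to 8 GPU density


def accelerator_cost(unit_price_usd, units=ACCELERATORS_PER_NODE):
    """Total accelerator spend for one node at a given unit price."""
    return unit_price_usd * units


def annualized_cost(purchase_usd, refresh_years):
    """Straight-line cost per year over a refresh cycle.

    Ignores power, support contracts and resale value.
    """
    if refresh_years <= 0:
        raise ValueError("refresh_years must be positive")
    return purchase_usd / refresh_years


if __name__ == "__main__":
    # Assumption: eight accelerators priced like the Xeon Phi chip.
    print(f"Accelerators per node: ${accelerator_cost(XEON_PHI_PRICE_USD):,}")
    for years in (3, 5, 7):
        per_year = annualized_cost(NODE_PRICE_CEILING_USD, years)
        print(f"{years}-year refresh: ${per_year:,.0f} per node per year")
```

At the quoted ceiling, a node refreshed every three years works out to $50,000 a year, versus $30,000 on a five-year cycle and roughly $21,400 on a seven-year cycle.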
"We've already seen this happen with converged infrastructure, and it's happening with hyper-converged infrastructure," Perry said. "Yes, these are very expensive entry points, but the transformation is well worth the cost."