There are three main considerations for adding high-end graphics processing units to servers: application suitability, installation requirements and server facilities.
First, consider why you're installing GPU hardware and where you'll use it. Even with unified computing architectures such as CUDA or OpenCL, an application only benefits from server GPU cards when it is designed to utilize the GPU and its parallel processing capabilities. Such use cases include virtualization, machine learning and big data processing. For an efficient data center configuration, you might want to migrate workloads that do not utilize the GPU to non-GPU servers.
The underlying operating system must support the GPU and its drivers. Verify that the application and its OS are fully GPU-compatible before any installation. If the software you're using is not programmed for GPU use, you won't get any benefit from installing a GPU-based server.
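As a rough illustration of that compatibility check, the sketch below asks the operating system whether a GPU and a working driver are visible before any workload is scheduled. It assumes NVIDIA hardware and relies on the `nvidia-smi` command-line tool that ships with the NVIDIA driver; the function names `parse_gpu_rows` and `detect_gpus` are hypothetical, and other vendors would need a different query tool.

```python
import shutil
import subprocess

# Assumption: NVIDIA hardware, queried through the nvidia-smi CLI.
QUERY = ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"]


def parse_gpu_rows(text):
    """Turn 'name, driver_version' CSV lines into a list of dicts."""
    gpus = []
    for line in text.splitlines():
        line = line.strip()
        if not line or "," not in line:
            continue  # skip blank or malformed lines
        # Split on the last comma so a comma in the name does not break parsing.
        name, _, driver = line.rpartition(",")
        gpus.append({"name": name.strip(), "driver": driver.strip()})
    return gpus


def detect_gpus():
    """Return the visible GPUs, or an empty list if the tool or driver is missing."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(QUERY, capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []
    return parse_gpu_rows(result.stdout)


if __name__ == "__main__":
    found = detect_gpus()
    if found:
        for gpu in found:
            print(f"{gpu['name']} (driver {gpu['driver']})")
    else:
        print("No usable GPU or driver detected")
```

An empty result only shows that the OS cannot currently see a usable GPU. It does not say whether the application itself is written to use one, which still has to be confirmed with the software vendor or documentation.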
Second, if you're installing server GPU cards as an aftermarket expansion device, consider the GPU's hardware requirements. An enterprise-class GPU card can hold up to four GPU chips -- each with hundreds of cores. A single card can add hundreds of watts of load to the server's power supply, and a multi-card configuration can add more than a kilowatt. Common white box servers may not support enterprise-class GPU add-ons without a major power supply upgrade.
Such a substantial load means the expansion bus alone cannot adequately power the GPU. The server's power supply needs enough spare capacity to run the card, and it needs one or two auxiliary power connectors available to feed the GPU directly.
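The arithmetic behind that requirement can be sketched as follows. The connector limits used here are the commonly cited PCI Express figures (75 W from an x16 slot, 75 W per 6-pin and 150 W per 8-pin auxiliary connector); the card, base load and power supply wattages in the example are hypothetical values chosen only to show the calculation, so check the actual card and server specifications before relying on any of them.

```python
# Commonly cited PCI Express power limits, in watts.
SLOT_WATTS = 75
AUX_WATTS = {"6-pin": 75, "8-pin": 150}


def deliverable_watts(aux_connectors):
    """Power available to one card from the slot plus its auxiliary connectors."""
    return SLOT_WATTS + sum(AUX_WATTS[c] for c in aux_connectors)


def psu_headroom(psu_watts, base_load_watts, card_watts, cards=1):
    """Watts left on the power supply after the server and its GPU cards are fed."""
    return psu_watts - base_load_watts - card_watts * cards


if __name__ == "__main__":
    # Hypothetical example: a 300 W card in a server drawing 400 W on a 750 W supply.
    card_watts = 300
    available = deliverable_watts(["8-pin", "6-pin"])
    print("Connectors can deliver", available, "W; card needs", card_watts, "W")
    print("PSU headroom:", psu_headroom(750, 400, card_watts), "W")
```

A negative or very small headroom figure is the signal that the server needs a power supply upgrade before the card goes in.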
The increased power load means that GPU cooling is critical. The GPU card has its own cooling devices, but you must ensure that there is plenty of unobstructed physical space and airflow for the GPU hardware. The additional heat exhausted by the GPU also winds up in the server rack, potentially affecting server spacing, rack cooling and even rack power distribution -- especially when you deploy multiple GPU-based servers in close proximity to one another.
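To get a feel for the rack-level effect, the sketch below converts electrical load into heat output, on the simplifying assumption that essentially all power drawn by a server ends up as heat in the rack. The conversion factor (1 W is about 3.412 BTU per hour) is standard; the server counts and wattages are hypothetical.

```python
BTU_PER_HOUR_PER_WATT = 3.412  # standard conversion: 1 W is about 3.412 BTU/h


def rack_heat_btu_per_hour(servers, watts_per_server):
    """Approximate heat load of a rack, assuming all drawn power becomes heat."""
    return servers * watts_per_server * BTU_PER_HOUR_PER_WATT


if __name__ == "__main__":
    # Hypothetical rack of 10 servers, before and after adding a 300 W card to each.
    before = rack_heat_btu_per_hour(10, 400)
    after = rack_heat_btu_per_hour(10, 700)
    print(f"Without GPUs: {before:.0f} BTU/h")
    print(f"With GPUs:    {after:.0f} BTU/h")
```

Comparing the two figures against the cooling capacity allotted to the rack shows whether the GPU servers can sit side by side or need to be spread out.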
Third, think about the effect of server GPU card deployment on workload resilience. Enterprise-class GPUs are expensive, so for the foreseeable future, not every enterprise server will include a GPU. This can affect the ability of IT administrators to establish clusters, migrate or restart workloads, and manage workload availability. If a workload relies on server GPU cards and only a few GPU-based servers can run it, deployment and migration options are limited.
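The constraint is easy to see in a small placement sketch. The `Host` type, the host names and the `migration_targets` function are all hypothetical; a real cluster manager would also weigh capacity, licensing and affinity rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    gpus: int  # number of GPU cards installed in this server


def migration_targets(hosts, current, needs_gpu):
    """Names of hosts a workload could move to, other than the one it runs on."""
    return [
        h.name
        for h in hosts
        if h.name != current and (h.gpus > 0 or not needs_gpu)
    ]


if __name__ == "__main__":
    # Hypothetical four-server cluster in which only two servers carry GPUs.
    cluster = [Host("srv-a", 0), Host("srv-b", 2), Host("srv-c", 0), Host("srv-d", 1)]
    print("GPU workload can move to:    ", migration_targets(cluster, "srv-b", True))
    print("Non-GPU workload can move to:", migration_targets(cluster, "srv-b", False))
```

In this example the GPU-dependent workload has a single failover target, while an ordinary workload on the same host has three. That gap is the resilience cost to plan for.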