Best practices for server benchmark testing

Following the scientific method when benchmark testing a server will help reduce variables and produce accurate performance results.

This tip is the first in a series on measuring server performance. Read part two on conducting benchmark tests and part three on server benchmarking tools and stress testing.

Server performance is not subjective. Even if a server manages to run a workload–and run it well–IT engineers need clear and objective means for measuring a machine's metrics on demand and gauging its performance over time. In almost every case, benchmarks are used to measure and monitor server performance. This tip provides an overview of server metric and benchmark testing, and a six-step process for proper server performance testing.

Understanding server metric and benchmark testing
Server metric and benchmark testing techniques aren't new concepts; these ideas have been around for years and were utilized with some of the very first computer systems. However, designing server benchmark tests to measure performance is a complete science unto itself. The idea is this: Perform a process that is typical of what the system will be expected to perform. Then, execute and time the process. Finally, perform the exact same test on different systems and measure the results.
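
As a rough illustration of the "execute and time the process" idea, the following Python sketch times a task several times and summarizes the runs. The names sample_workload and time_workload are hypothetical, and the arithmetic loop is only a stand-in for a process that is actually typical of the server's job.

```python
import statistics
import time


def sample_workload(n=200_000):
    """Hypothetical stand-in for a task typical of the server's real job."""
    return sum(i * i for i in range(n))


def time_workload(workload, repeats=5):
    """Run the workload several times; return each wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution timer
        workload()
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    runs = time_workload(sample_workload)
    print(f"median: {statistics.median(runs):.4f} s over {len(runs)} runs")
    print(f"min: {min(runs):.4f} s, max: {max(runs):.4f} s")
```

Running the same script unchanged on each system under comparison keeps the process identical, so differences in the timings are more likely to reflect the machines than the test.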

As server architecture advanced, it became more difficult to compare different computer systems by simply analyzing their specifications. As a result, metric and benchmark testing began to show up in server environments. But there are problems: all machines operate differently depending on their design and the demands of the operating system and workloads. Suddenly, there are many different variables to contend with.

We’ve all used Windows Task Manager to see how a certain application or process affects our RAM or CPU usage. That’s metric testing–albeit at a very simple level. The problem with Windows Task Manager is that it doesn’t show how a machine is really performing. Hierarchical cache subsystems, custom applications, custom hardware, massive databases, non-uniform memory and simultaneous processor multithreading have made a huge impact on the performance of modern computing systems.

The “science” of performance testing
Server performance is rarely affected by just one factor, so a server performance test should resemble a science experiment. One of the best ways to conduct such a test is to apply the scientific method during the analysis. The method is a six-step process: observation, hypothesis, prediction, setting controls, testing, and a final theory and conclusion. The conclusion is supported by the best evidence gathered while running the test, and the same evidence establishes both the optimum and minimum server performance levels. The six steps are as follows:

1. Observation: Let’s assume a systems admin has just purchased a server and now needs to see how well it performs. The first task is to establish what the server will be doing. Is it a virtual platform or will it be running a dedicated application? Knowing the answers to these questions will set a baseline for where the tests can begin. Remember, metric and benchmark testing will vary depending on what is being tested and what is being utilized on the machine. For example, a system slated for database work may emphasize processor testing, while another system for network services may highlight network performance.

2. Hypothesis: In this step, the engineer establishes a benchmarking goal. What is the assumption, or what does the test need to accomplish? Simply conducting a metric test will reveal some results; however, without direction or a clear goal, these results may be useless. Create a base idea to test and build the testing methodology around it. For example, the engineer may be trying to determine the amount of RAM that a given application requires to run optimally. He or she can therefore hypothesize that ‘x’ amount of RAM will be the optimal amount for the given workload. This can be based on previous research, vendor benchmarks or other sources. Make sure that your hypothesis is testable. That is, don’t make an assumption based on data that cannot be confirmed through benchmark testing.

3. Prediction: Next, make a general prediction about the outcome of the server benchmark test. Let’s assume that the machine is being dedicated as an application server. A systems admin could predict that by assigning additional cores to the workload, machine performance will increase and, in turn, the application’s performance will also improve. In some cases, an engineer may even predict the amount of improvement and wish to verify it through benchmarking.

4. Setting controls: A control is now set. For example, there may be a certain number of cores assigned to the server. At this point, an admin should change only one setting at a time until he or she observes a change in performance that is more acceptable. An engineer may want to set the machine at 6 GB of RAM and test with that in conjunction with all other preset settings (CPU, video and hard drive all go untouched). Setting a different control may include modifying processor settings while leaving the other settings in their original state.

5. Testing: Now that the controls are set, begin the metric test. The test is conducted from a baseline (a known good starting point) and the machine's setup is adjusted systematically. Each test sequence will have a result, and that result should be recorded so it can be referenced later. In this scenario, a test sequence can be seen as a change in the settings on the hardware. Each time a new setting is applied, the test must be rerun and the results recorded. Once enough cycles have run, the engineer should have a solid spreadsheet of data from which to draw a conclusion.

6. Theory and conclusion: Review the test results and confirm whether the application actually performs as expected with the resources or setup assigned to it. For example, the results may show that the application performs optimally with half as many cores as expected. From there, determine that a certain core setting provides the best server performance in combination with all other current variables (amount of total required memory, number of applications currently running, software upgrades/service packs, etc.). Note that any change in the variables will require further experimentation.
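
The controls, testing and conclusion steps above can be sketched in Python as a one-setting-at-a-time sweep. Everything in this example is an assumption made for illustration: the BASELINE values, the setting names and especially fake_benchmark, which returns an invented score rather than a real measurement. On a real machine, that function would be replaced by an actual benchmark run after each setting change.

```python
import csv
import io

# Hypothetical controlled settings; every run starts from this baseline.
BASELINE = {"cores": 4, "ram_gb": 6}


def fake_benchmark(config):
    """Placeholder that returns a made-up score (higher is better).

    Invented model: the score grows with cores but flattens after 8.
    Replace with a real measurement taken on the machine under test.
    """
    return min(config["cores"], 8) * 100 + config["ram_gb"]


def sweep(baseline, setting, values, benchmark):
    """Vary one setting while holding all others at their baseline values."""
    rows = []
    for value in values:
        config = dict(baseline)  # copy, so the baseline itself is never modified
        config[setting] = value
        rows.append({**config, "score": benchmark(config)})
    return rows


def best_row(rows):
    """Return the highest-scoring row; on a tie, the earliest value tried wins."""
    return max(rows, key=lambda row: row["score"])


def to_csv(rows):
    """Render the recorded results as CSV text for a spreadsheet."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    results = sweep(BASELINE, "cores", [2, 4, 8, 16], fake_benchmark)
    print(to_csv(results))
    print("best setting:", best_row(results))
```

Because only the cores value changes between runs, any difference in score can be attributed to that one setting. With the invented scoring model, 8 and 16 cores tie, and the tie-breaking rule reports the smaller value, which mirrors the case where an application performs optimally with fewer cores than expected.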

That's the basic approach to metric testing or one-time performance baselining. The next tip in this series looks at Microsoft's Performance Monitor tool and evaluates the potential effects of adjusting critical settings.

ABOUT THE AUTHOR: Bill Kleyman, MBA, MISM, is an avid technologist with experience in network infrastructure management. His engineering work includes large virtualization deployments as well as business network design and implementation. Currently, he is the Director of Technology at World Wide Fittings Inc., a global manufacturing firm with locations in China, Europe and the United States.
