It’s just not enough to install new servers, set them up, install applications and then walk away. Servers need regular performance monitoring to ensure that your hardware investment will deliver the service you expect – and provide ample early warning of impending trouble, such as resource shortages or hardware issues. Performance monitoring tools can provide a wealth of useful information, but only when those tools are set up and running properly. Fortunately, a few important insights will help any administrator get the best results from performance monitoring.
Achieving accuracy in performance monitoring
Monitoring is useless if it delivers erroneous information, so pay close attention to the factors that affect accuracy.
Interoperability. For this discussion, interoperability is basically the ability of a performance monitoring tool to access and read data points from the various pieces of hardware within your data center environment. Homogeneous environments focused on a single vendor’s product line can take advantage of performance monitoring tools that use hooks deliberately integrated into the hardware. These hooks can deliver detailed information to the tool.
The situation can be far more challenging for heterogeneous environments, where tools and hardware don’t mesh. A vendor’s tool may look for data that certain pieces of hardware simply cannot provide with the required level of consistency (if at all). It’s a similar problem for third-party performance monitoring tools that often cannot detect every sensor or hardware nuance on every possible device, and instead rely more on operating system-level data, which usually lacks granularity. In either case, the result is missing data or inaccurate data points that reduce the insight gained from performance monitoring.
This unavoidable correlation between tools and hardware requires comprehensive testing. For example, run the tools before you buy them, and verify compatibility with a long-term proof-of-principle project that will take the tool from a lab setting into a production environment. But the problem also extends beyond the initial purchase to future upgrades and technology refresh cycles. When you change hardware or update the tools, you’ll need to test interoperability to ensure the continued integrity of your performance monitoring system.
Sampling. Accuracy will also depend on the sampling rate and the sampling window used to gather data. This is particularly important when workloads or operating parameters can change radically over time. Ideally, performance monitoring should capture the entire “operational cycle” of the machine. The trick is to determine what that cycle should be, and it will depend on the way that each workload and host machine is used. For example, watching the memory performance of a server may require a fast sampling rate with a window that spans just a few minutes. Conversely, watching the CPU utilization of a corporate HR system may require monitoring at a lower rate, but over a 30-day period or longer. There is no single answer, and various system attributes may be monitored at several different rates and windows.
“If you’re testing how well a server environment will work during a usage spike, the administrator should set their schedule that will look at regular operations, then the performance spike, and finally the return to normal operations,” said Bill Kleyman, virtualization architect at MTM Technologies Inc. “Setting the schedule too far back will capture useless data, and setting scheduling too short will miss some of the important data-at-rest statistics prior to the performance peak.”
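To make the sampling trade-off concrete, here is a minimal Python sketch. The utilization numbers, the `sample` and `summarize` helpers, and the polling intervals are all hypothetical and chosen only for illustration; they do not come from any particular monitoring tool. The sketch shows how a slow sampling rate can miss a short spike entirely, even though the window covers normal operation, the spike, and the return to normal.

```python
from statistics import mean

def sample(series, interval):
    """Keep every `interval`-th reading, as a slower poller would."""
    return series[::interval]

def summarize(samples):
    """Report the average and the highest reading seen in the window."""
    return {"mean": mean(samples), "peak": max(samples)}

# Hypothetical per-second CPU utilization (%) over a 60-second window:
# 25 s of normal load, a 5 s spike, then 30 s back at normal load.
cpu = [20] * 25 + [95] * 5 + [20] * 30

fast = summarize(sample(cpu, 1))   # one reading per second
slow = summarize(sample(cpu, 20))  # one reading every 20 seconds

print("1 s polling: ", fast)
print("20 s polling:", slow)
```

With these made-up numbers, the one-second poller records the spike while the 20-second poller reports a flat 20 percent, which is why the rate and the window both need to match the behavior you are trying to observe.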
Tool architecture. Performance monitoring tools rarely operate without agents or drivers installed on each host system (or even each virtual machine). Agents are a mixed blessing. They are useful because they can collect and deliver far more granular information than “agentless” monitoring tools. However, agents are also software “clients” that report back to a central server, which collects and processes the data. Each agent therefore consumes some computing resources, which can affect the performance of the workload running on the same system.
“All the computers in my environment have two agents,” said Chris Steffen, principal technical architect at Kroll Factual Data. “An application agent monitors the health of our applications, and we have System Center [Virtual Machine Manager] agents on all virtual machine hosts.”
The negative impact of agents is generally lower now than in years past, but it should still be evaluated, especially on mission-critical or performance-dependent workloads. Steffen also notes that emerging tools may provide features that automate the installation, reinstallation and maintenance of agents within the environment.
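One rough way to evaluate agent overhead is to measure how much CPU time a single collection pass consumes and divide it by the polling interval. The Python sketch below assumes a stand-in `collect_metrics` function and a 10-second interval; both are hypothetical, and a real agent would need to be measured with its own workload and on the target host.

```python
import time

def collect_metrics():
    """Stand-in for an agent's collection step (hypothetical workload)."""
    return sum(i * i for i in range(50_000))

def cpu_overhead_fraction(collect, interval_s, runs=20):
    """Estimate the share of one CPU core used by calling `collect`
    once every `interval_s` seconds, averaged over `runs` calls."""
    start = time.process_time()
    for _ in range(runs):
        collect()
    cpu_per_run = (time.process_time() - start) / runs
    return cpu_per_run / interval_s

# Assumed polling interval of 10 seconds.
overhead = cpu_overhead_fraction(collect_metrics, interval_s=10.0)
print(f"Estimated agent CPU overhead: {overhead:.4%} of one core")
```

This only accounts for CPU time; memory, disk and network use by the agent would need separate measurement.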
Virtualization awareness. Virtualization software works by abstracting the application workloads from the underlying hardware. When traditional performance monitoring tools attempt to report within a virtual environment, the abstraction layer often causes erroneous results because the older tool may attempt to monitor hardware directly rather than through the hypervisor, which controls computing resources. Considering the popularity and importance of virtualization technology, administrators should certainly select performance monitoring tools that are virtualization aware. This allows performance monitoring to take place in both physical and virtual targets, and administrators can gather accurate information about the system’s resource utilization and behavior.
“Administrators will sometimes gather metrics of the VMs and the physical host that they are running on,” Kleyman said. “This way, performance can be monitored at the virtual and physical level to ensure the best workload performance and a solid end-user experience.”
Sensor calibration. Don’t overlook the importance of the sensors themselves. Digital data produced by a network switch or server may be quite reliable over time. But some sensors, such as temperature, humidity, air flow or other environmental sensors with an analog element, may require regular calibration and periodic battery replacement to ensure reliable long-term operation.
Making the most of performance monitoring tools
Tools have little value if they are not employed productively. In far too many cases, performance monitoring tools are deployed, but there is no clear plan on how to use the vast amount of detailed data that the tool produces. The tool winds up being marginalized as administrators only use it for spot checks or occasional troubleshooting; it’s a wasted investment.
Experts suggest boosting the value of your performance monitoring tool by understanding the business implications – why it’s needed and how its data will be used – long before the tool is actually deployed. Also, take full advantage of the tool’s analytical features to help evaluate and report on collected data. It may take time to configure the tool’s reporting features for your specific environment, but the insights gained from proper analytics are worthwhile.
Performance monitoring reports can also provide a factual foundation for capacity planning or help make a case for technology refresh projects. “Performance metrics can help show ROI [return on investment],” Kleyman said. “By knowing what older systems did and how well new ones are performing, we are able to put a dollar figure on our environment and garner more funding for further improvements.”
But Steffen also urges caution, recommending a “trust but verify” attitude toward performance monitoring tools. He notes that some server tools have proven to be quite accurate when compared against similar tools, but recalls that some network tools produced inconsistent responses. Good business decisions require good data, and tools with inconsistent or unverifiable results make it difficult to make critical business decisions with confidence.
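Comparing a tool against a similar tool can be as simple as checking the same metrics from both and flagging any that disagree by more than a chosen tolerance. The following Python sketch is one possible approach; the metric names, the readings and the 5 percent tolerance are assumptions made for illustration, not values from any specific product.

```python
def disagreements(tool_a, tool_b, tolerance=0.05):
    """Return the metrics both tools report whose values differ by more
    than `tolerance`, measured relative to the larger reading."""
    flagged = {}
    for metric in tool_a.keys() & tool_b.keys():
        a, b = tool_a[metric], tool_b[metric]
        baseline = max(abs(a), abs(b))
        if baseline and abs(a - b) / baseline > tolerance:
            flagged[metric] = (a, b)
    return flagged

# Hypothetical readings for the same host taken at the same time.
server_tool = {"cpu_pct": 62.0, "mem_pct": 48.0, "net_mbps": 410.0}
second_tool = {"cpu_pct": 61.0, "mem_pct": 48.5, "net_mbps": 530.0}

suspect = disagreements(server_tool, second_tool)
print(suspect)
```

A flagged metric does not say which tool is wrong, only that the readings need a closer look before they are used to support a decision.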
This was first published in December 2011