Application performance management has become a black art. When end users start calling in with complaints about apps hanging up and taking too long, the finger pointing begins. Where is the problem? Is it the network, the database, the app server?
In today's virtualized, Web-tiered, multiplatform application environment, doing triage on a poorly performing application often means all hands on deck. You wind up with an application flow that looks like a plate of spaghetti, and everyone -- from those on the help desk to those in Web development -- winds up on the phone call.
According to Jasmine Noel of Ptak, Noel & Associates, the responsibility of performance management is distributed -- but organizations don't like to work that way. So the server management teams end up taking on the broader role of Web applications managers and becoming the one throat to choke.
In this guide, we've pulled together our best multidisciplinary performance management content, including new tips on using free and collaborative software tools. Over time, we will add to the guide with blog posts and with input from thought leaders at systems management vendors like BMC and CA and from analysts who outline the role of end-user testing and application root-cause analysis. Stay tuned to our new distributed computing blog Server Farming, and bookmark this page for updates.
TABLE OF CONTENTS
Turn to collaborative tools for systems performance management
Some vendors and service providers have recognized the need for collaborative performance management and now provide tools with their products for just this purpose. These emerging methods can be grouped into two categories: sharing configuration and group problem solving.
Use Nagios to trend and troubleshoot performance issues
An ever-vigilant computer that probes system health and can monitor the system performance of several machines is a huge help to an IT administrator who might actually want to catch some sleep now and again.
One solution is the open source offering Nagios. Its probes are modular and its plug-ins are fairly easy to write, so if Nagios doesn't check an attribute out of the box, admins can write a new script that does or find one from the open source community. Additionally, a large set of third-party plug-ins is available online that goes beyond system load and ping checks and moves into storage area network multipathing and more advanced Apache monitoring.
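To give a sense of how small such a plug-in can be, here is a minimal sketch in Python. It relies on the standard Nagios plug-in convention: the exit code signals the state (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), and the plug-in prints one status line, optionally followed by performance data after a `|`. The check itself (1-minute load average), the function names and the thresholds are illustrative assumptions, not taken from any shipped plug-in.

```python
import os

# Exit codes defined by the Nagios plug-in convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate_load(load1, warn, crit):
    """Map a 1-minute load average to an (exit_code, status_line) pair.

    The thresholds are hypothetical; tune them per host.
    """
    if load1 >= crit:
        code, label = CRITICAL, "CRITICAL"
    elif load1 >= warn:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    # Text after '|' is performance data in the form label=value;warn;crit
    line = f"LOAD {label} - load1={load1:.2f} | load1={load1:.2f};{warn};{crit}"
    return code, line


def main():
    """Run the check once and return the exit code Nagios should see."""
    try:
        load1 = os.getloadavg()[0]
    except (OSError, AttributeError):
        # Load average is not available on this platform.
        print("LOAD UNKNOWN - load average unavailable")
        return UNKNOWN
    code, line = evaluate_load(load1, warn=4.0, crit=8.0)
    print(line)
    return code
```

When saved as a script, the file would end with `sys.exit(main())` (after `import sys`) so that Nagios receives the exit code. Keeping the threshold logic in `evaluate_load` makes it possible to test the plug-in without touching a live system.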
Is effective performance management in the data center possible?
As the infrastructure, applications and functions required to deliver business services grow more complex and interdependent, performance management grows more complex too, which makes the task harder for system administrators. Strained budgets and fewer experienced personnel compound the problem.
But new intelligent automation systems provide nonexpert IT staff the access and ability to effectively use analytics, reporting and visualization capabilities in operational settings to conduct performance monitoring activities without extensive training. These automation services save time and reduce downtime for critical business services.
Stop server monitoring tools from crying wolf
If you've ever set up an intrusion detection system (IDS) or central monitoring for a new network, you're familiar with the stream of email you get when you first turn on the system.
Of course, the downside of any alert-based system is that with a steady stream of alerts that amount to nothing, you naturally take these alerts less seriously. A poorly tuned monitoring server or IDS can be more harmful than having no server monitoring system at all. But to stay informed of genuine problems, the next step is to weed out the false alarms.
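One common way to weed out false alarms is to alert only after a check has failed several times in a row, so a single transient blip stays quiet. The sketch below illustrates that idea in Python. The class name, the default threshold of three and the return values are assumptions made for illustration; they are not the interface of any particular monitoring product.

```python
class ConsecutiveFailureFilter:
    """Suppress alerts until a check fails `threshold` times in a row."""

    def __init__(self, threshold=3):
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        self.failures = 0      # current run of consecutive failures
        self.alerting = False  # True once an alert has been raised

    def observe(self, check_ok):
        """Record one check result.

        Returns "ALERT" when the failure run first reaches the threshold,
        "RECOVERY" when a check passes after an alert, and None otherwise.
        """
        if check_ok:
            was_alerting = self.alerting
            self.failures = 0
            self.alerting = False
            return "RECOVERY" if was_alerting else None
        self.failures += 1
        if self.failures >= self.threshold and not self.alerting:
            self.alerting = True
            return "ALERT"
        return None


# Usage: one isolated failure is ignored; three in a row raise one alert.
checks = [False, True, False, False, False, False, True]
alert_filter = ConsecutiveFailureFilter(threshold=3)
events = [alert_filter.observe(ok) for ok in checks]
```

The trade-off is latency: a higher threshold means fewer false alarms but a longer wait before a genuine outage generates email.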
With end-user performance management tool, credit union hones troubleshooting
When a credit union brought its online banking operations in-house, it focused on monitoring a few aspects of end users' experience: the amount of time a page took to load and the time it took a Web server to respond to a user request. The company evaluated several tools, including one from Gomez Inc., but ultimately opted for Symphoniq Corp.'s TrueView. The software is loaded onto the Web servers that serve up online banking pages. The back-end database system collects all performance data and generates reports. The credit union now uses the software in test and production environments, where it has enabled data center staff to identify the root causes of a few nagging performance problems.
Sun Solaris DTrace troubleshoots system bottlenecks
An Internet gambling site that claims to process more than 3 million transactions daily has turned to DTrace, a feature of Sun Microsystems Inc.'s Solaris 10 operating system, to track and resolve issues in its development environment and, ultimately, to catch miscues before they go into production.
DTrace intercepts calls to various system services. In doing so, it can detect the bottlenecks in an application's code. End users can write scripts that poke into an OS and its applications to see where they perform slowly. Originally designed by Sun for the Solaris operating system, DTrace is also currently being built into the FreeBSD and Mac OS X operating systems.
Solaris system performance tuning
Factors that can undermine a computer system's performance include available memory, disk I/O, and CPU utilization. Solaris provides the vmstat (virtual memory statistics) and iostat utilities (among other tools) to monitor these system aspects. These utilities are helpful for performance tuning because once you know the statistics about your system, you can tune it to ensure that physical memory, virtual memory, disk (capacity and I/O), and CPU utilization are all evenly balanced and that no single factor is loaded more than the rest. Solaris also provides sar (System Activity Reporting) for historical monitoring, prstat (real-time process and thread activity monitoring), and truss (system and library call tracing). Solaris 10 adds DTrace, which dynamically traces system aspects and application processes without having to "wrap" a tool such as truss around the process.
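As a rough illustration of turning these statistics into something actionable, the Python sketch below pulls the CPU figures out of one vmstat data line. It assumes the Solaris layout in which the last three columns are user, system and idle CPU percentages (us, sy, id); other platforms append extra columns, so the slicing would need adjusting there. The sample line, the function names and the 10 percent idle floor are hypothetical values chosen for the example.

```python
def cpu_from_vmstat_line(line):
    """Extract CPU percentages from one vmstat data line.

    Assumes the last three whitespace-separated fields are us, sy and id.
    Header lines raise ValueError because their fields are not integers.
    """
    fields = line.split()
    if len(fields) < 3:
        raise ValueError("not a vmstat data line")
    user, system, idle = (int(field) for field in fields[-3:])
    return {"user": user, "system": system, "idle": idle}


def is_cpu_bound(line, idle_floor=10):
    """Flag a sample whose idle percentage falls below a hypothetical floor."""
    return cpu_from_vmstat_line(line)["idle"] < idle_floor


# Hypothetical sample line, made up for illustration only.
sample = " 0 0 0 1043216 241680 3 11 2 0 0 0 0 1 0 0 0 406 220 137 1 1 98"
cpu = cpu_from_vmstat_line(sample)
```

A single sample proves little on its own; in practice the same check would be applied to a series of interval reports before concluding that CPU, rather than memory or disk, is the loaded factor.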
Improving performance: Virtualization management guide
In virtual environments, managing performance with old standby server metrics like CPU and memory usage won't cut it anymore. These increasingly dynamic and fluid environments don't lend themselves to isolating root causes. Still, infrastructure managers can find an alternative to simply measuring isolated metrics and achieve the true goals of performance management: ensuring uptime, improving application performance and maximizing resources in their data centers.
The guide features advice on minimizing virtual sprawl, setting performance benchmarks, analyzing system metrics, mapping your virtual environment, minimizing downtime and maximizing resources.