Modern Infrastructure Editor-in-Chief
Published: 17 Jun 2014
If ever there was a cautionary tale about the need for application performance management, the HealthCare.gov debacle was it.
The reasons for the spectacular failures of the U.S. government's HealthCare.gov are well-documented: Poor technical design, lack of communication and integration between stakeholders, insufficient testing and an overzealous release plan, to name a few. Those flaws manifested in egregious ways: At the end of October 2013, the site had Webpage load times of eight seconds, error rates of 6% and uptime of 43%. A month before the initial deadline, fewer than 27,000 people had successfully enrolled through the federal exchange -- a small fraction of expectations.
"HealthCare.gov is what happens when a bunch of non-cloud people build a cloud system. It doesn't work," said Michael Coté, research director for infrastructure software at 451 Research.
And it was only when the cloud people -- a team of technologists led by Google site reliability engineer Mikey Dickerson -- stepped in that things started to improve. By the extended December 31 deadline, the "tech surge" had produced a Website with error rates of less than 0.5%, sub-second load times and 95% uptime, and enrollments reached 1.9 million by January 31, 2014.
Here's where the application performance management (APM) vendors fit in. The first thing Dickerson's tech surge team did was install an APM dashboard tool from New Relic to report on and troubleshoot Website performance. APM was the reason HealthCare.gov was able to make strides.
The more data the merrier
Even in complex infrastructures, performance management players have no lack of data to work with. Thanks to the rise of open source, open application programming interfaces (APIs) and easy-to-deploy Software as a Service delivery models, gaining access to, storing and reporting on system-level data is easy.
"There aren't that many monitoring tools that don't look at everything because it's not that complicated to look at all the data," said Coté. That blurred the lines between APM vendors and data aggregation and analytics providers like Boundary, DataDog and to a certain extent, Splunk.
Meanwhile, infrastructure improvements around central processing unit (CPU) speeds, memory, bandwidth and data storage capacities further reduce any incentive to cut back on the number of systems monitored and analyzed, said Appnomic's Ray Solnik.
No matter how many systems you're monitoring, today's focus has shifted to the end-user experience rather than devices, as evidenced by HealthCare.gov.
The end-user experience is the central focus of many of today's application performance management vendors. Whereas older APM tools insert instrumentation into the application code and monitor servers and networks, the next generation of APM products takes it one step further and displays application performance from the perspective of the users interacting with the application or Website.
If nothing else, application performance management vendors have HealthCare.gov to thank for highlighting the finer points of contemporary APM issues. "It certainly made it a lot easier for me to explain to my family what I do for a living," said Steve Tack, vice president of product management at Compuware, a longtime provider of APM tools. "End-user experience was what everyone was talking about."
We the users of the application
There are two basic ways to get the end-user experience: Generate synthetic load and transactions and measure how the application responds, or instrument the client code and measure the actual experience. For example, Web application performance monitoring players like Keynote Systems run a global network of computers and measurement devices that simulate what an end user would experience when visiting a given Website.
Services like these give IT teams insight into performance from around the world -- a tall order for any one organization. "We are the ones doing the heavy lifting creating this [monitoring] infrastructure," said Aaron Rudger, Keynote's marketing owner for Web performance. The company recently expanded its platform to include the mobile device user experience, which involves monitoring not just Internet providers, but telcos as well.
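To make the synthetic approach concrete, here is a minimal Python sketch of a scripted check. It is not how Keynote or any other vendor implements its service; the names `run_synthetic_check`, `fetch` and `http_fetch` are hypothetical, and the check runs from a single machine rather than a global network of measurement points.

```python
import time
from statistics import mean
from typing import Callable, Dict, List


def run_synthetic_check(
    fetch: Callable[[str], int],
    url: str,
    attempts: int = 5,
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[str, float]:
    """Issue scripted requests and summarize how the site responded.

    `fetch` is a hypothetical callable that requests `url` and returns an
    HTTP status code, or raises if the request fails outright.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    durations: List[float] = []
    errors = 0
    for _ in range(attempts):
        start = clock()
        try:
            failed = fetch(url) >= 400   # treat 4xx and 5xx as errors
        except Exception:
            failed = True                # timeouts, DNS failures, resets
        durations.append(clock() - start)
        if failed:
            errors += 1
    return {
        "attempts": attempts,
        "error_rate": errors / attempts,
        "avg_seconds": mean(durations),
    }


def http_fetch(url: str) -> int:
    """One possible `fetch`: a plain GET using the standard library."""
    from urllib.error import HTTPError
    from urllib.request import urlopen

    try:
        with urlopen(url, timeout=10) as response:
            return response.status
    except HTTPError as err:
        return err.code
```

Calling `run_synthetic_check(http_fetch, "https://example.com")` on a schedule, from several locations, would approximate the kind of outside-in view described above.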
Keynote's competitors in the mobile space, such as Flurry and Crittercism, meanwhile, take the other tack. Their "observed approach" measures the actual performance of a specific mobile user by instrumenting the mobile application code before the app is published on an app store.
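The observed approach can be sketched in a few lines as well. The Python below is only an illustration of the idea of instrumenting application code so that every real call is timed; it does not reflect Flurry's or Crittercism's SDKs, and `observed`, `RECORDED` and `load_profile` are hypothetical names.

```python
import functools
import time
from typing import Any, Callable, Dict, List

# Hypothetical in-memory sink; a real agent would batch and ship these records.
RECORDED: List[Dict[str, Any]] = []


def observed(action: str, clock: Callable[[], float] = time.perf_counter):
    """Wrap a user-facing function so each real call is timed and recorded."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = clock()
            ok = True
            try:
                return func(*args, **kwargs)
            except Exception:
                ok = False
                raise                      # the user still sees the failure
            finally:
                RECORDED.append(
                    {"action": action, "seconds": clock() - start, "ok": ok}
                )
        return wrapper
    return decorator


@observed("load_profile")
def load_profile(user_id: int) -> dict:
    # Stand-in for real application work.
    return {"id": user_id}
```

Because the wrapper runs inside the application on every call, it adds work of its own to each transaction.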
There are pros and cons to each approach, said Ray Solnik, president at Appnomic, a provider of predictive IT performance management tools. Synthetic transactions are useful, but they are "only there because you couldn't [reasonably] measure all the transactions," he said. Measuring actual transactions, meanwhile, tends to color the story by adding load to the environment, as much as 10% CPU overhead, he said.
His company's tools take a "sidecar" approach that uses neither an agent nor a network sniffer. "Basically, we have our software sitting on a box adjacent to the switch," he said. Instead, "we watch and monitor and use network introspection technology to understand the packet."
Other APM products measure both synthetic and real user experience. Compuware's User Experience Management, for instance, provides a synthetic view of Website performance. It also measures real user experience by instrumenting the client, such as a Web browser or an app on a mobile device.
Ultimately, it's not necessarily the experience that needs to be monitored, but rather the complete user transaction across all the services that make up an application, said Jyoti Bansal, CEO and founder of AppDynamics. "Every time you click, millions of lines of code are executed in hundreds of places," he said. By focusing on the complete end-to-end transaction rather than any one part of the session, you only need to drill down to the device-level views when you see the overall transaction is slow.
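The "watch the whole transaction, drill down only when it is slow" idea can be sketched as follows. This is an illustrative Python example under simple assumptions, not AppDynamics' implementation: `Segment`, `drill_down` and the tier names are hypothetical, and each transaction is assumed to arrive as a list of per-service timings.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    service: str     # hypothetical tier name, e.g. "web", "auth", "db"
    seconds: float   # time the transaction spent in that tier


def total_seconds(segments: List[Segment]) -> float:
    """End-to-end time for one user transaction."""
    return sum(s.seconds for s in segments)


def drill_down(segments: List[Segment], slow_threshold: float) -> Optional[Segment]:
    """Return the slowest tier, but only when the whole transaction is slow."""
    if not segments or total_seconds(segments) <= slow_threshold:
        return None   # transaction is healthy; no need to look at tiers
    return max(segments, key=lambda s: s.seconds)
```

With a one-second threshold, a transaction that spends 0.2 seconds in "web", 0.1 in "auth" and 2.5 in "db" would be flagged, and the drill-down would point at "db".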
AppDynamics customer ExactTarget.com corroborates Bansal's account. "If a user has a problem, we can quickly diagnose it by looking at the transaction at the exact time of the problem, rather than pore through logs and try and mine out the correlating problems," said Kevin Siminski, senior director of global product operations at the digital marketing company.
The ExactTarget application consists of about 15,000 virtual servers, and since the company deployed AppDynamics two years ago, it has slashed the mean time to resolution for application performance problems from upward of an hour to "a matter of minutes," Siminski said.
Data does double duty
Recently, application performance management vendors have started exposing data stored in their systems to non-technical users. After all, if you're going to collect reams of data about IT systems and their users, it makes sense to extend that data beyond the developer and operations teams and out to the business.
ExactTarget, for example, recently started rolling out an AppDynamics module called Transaction Analytics to its support staff to give them a macro view of performance problems customers may have.
Likewise, Advanced Computer Software, a software development house in the U.K., uses New Relic Insight to extend its performance data to business executives, including the company's managing director. "Insight adds a whole other level," said Martin Reynolds, an application architect and development manager for the firm. "It provides us with an extra tool to engage the customer to give them a better experience," he said.
Then again, extending APM tools to business users isn't a huge technical leap. "The majority of the information that you want already exists in the data somewhere," said Bill Hodak, New Relic's senior director of product marketing. Making that data palatable to business users is a matter of providing a new interface, plus a natural language query option to help users find what they're looking for.
For their next trick, application performance management companies must go beyond merely reporting on what the user is doing -- what transactions they are executing and what performance they are experiencing -- and start to deliver insight into what the user is going to do next. Fortunately, engineers and big data scientists are working on that very problem.