Betfair, an Internet gambling site, claims it processes more than three million betting transactions every day, and any minor application hiccup can slow traffic to a crawl. To make sure that doesn't happen, the firm uses DTrace, a troubleshooting feature of Sun Microsystems Inc.'s Solaris 10 operating system.
The U.K.-based site started in 2000 and calls itself an "online gaming exchange." Rorie Devine, the chief technology officer (CTO), said his IT infrastructure dealt with 1.2 billion debit and credit card transactions last year, twice as many as any other European Web site.
"Our load and usage doubles every year," he said. "So we're all about high growth and reliable transaction processing."
The company has three main data centers -- one in England, one in Malta and another in Australia. Together, they are running "hundreds" of Sun Solaris servers, from the Advanced Micro Devices Inc. (AMD) Opteron-based x4100s -- which start around $2,000 -- to the UltraSPARC-based E6900s, which start at about $240,000.
Devine started reading up on DTrace when Solaris 10 came out in 2005. Betfair eventually worked Solaris 10 into its upgrade schedule and now uses DTrace regularly.
"Recently, we had a real performance issue with a piece of back-end storage," Devine said. "It was critical back-end storage for one of our databases. The problem would have been a showstopper, but DTrace was really helpful. It was able to quantify the issue and help us fix it."
How does DTrace work?
According to Al Gillen, IDC research vice president for system software, DTrace works by intercepting calls to various system services. By doing that, it can sense where the bottlenecks are in an application's code.
"It's going to say that this portion of your code is where the majority of your cycles are being burned," he said. "From a programmer's perspective, it doesn't say, 'This statement is the problem.' It says, 'This is the page you need to be worried about.' "
With DTrace, users write scripts that probe the operating system and applications to see where they are performing slowly. DTrace won't explicitly say how to fix a problem, but it acts as a pointer so system administrators know where to start looking. Originally designed by Sun employees for the Solaris operating system, it is also being built into the FreeBSD and Mac OS X operating systems.
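The idea Gillen describes, intercepting calls and reporting which part of the code burns the most time, can be sketched in a few lines. The example below is only an analogy written in Python, not DTrace or its D scripting language: it uses Python's standard `sys.setprofile` hook to watch function calls inside a single Python program, whereas DTrace instruments the operating system and running applications. The names `profile_calls`, `handle_request` and `parse` are hypothetical and exist only for illustration.

```python
import sys
import time
from collections import Counter


def profile_calls(func, *args):
    """Run func(*args), counting calls and inclusive time per Python function.

    An analogy only: this watches one Python program, not the whole system.
    """
    calls = Counter()    # function name -> number of completed calls
    elapsed = Counter()  # function name -> inclusive seconds spent
    starts = []          # stack of start times for calls still in progress

    def hook(frame, event, arg):
        # Only Python-level calls are tracked; C-level events are ignored.
        if event == "call":
            starts.append(time.perf_counter())
        elif event == "return" and starts:
            name = frame.f_code.co_name
            calls[name] += 1
            elapsed[name] += time.perf_counter() - starts.pop()

    sys.setprofile(hook)
    try:
        func(*args)
    finally:
        sys.setprofile(None)  # always remove the hook, even on error
    return calls, elapsed


# Hypothetical workload standing in for an application under study.
def parse(n):
    total = 0
    for i in range(n):
        total += i
    return total


def handle_request():
    parse(100)
    parse(100)
    return "done"


calls, elapsed = profile_calls(handle_request)
for name, count in calls.most_common():
    print(f"{name}: {count} call(s), {elapsed[name]:.6f}s inclusive")
```

Like the tool Gillen describes, the report points at the function to worry about rather than at the offending statement.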
DTrace on Linux?
Sun also made Solaris 10, along with the DTrace feature, open source under its OpenSolaris project. But the code is licensed under the Common Development and Distribution License (CDDL) rather than the General Public License (GPL) that governs Linux, a sore point for some Linux users. As a result, Solaris source code, such as that in DTrace, cannot be used directly in Linux, although DTrace can be used on Linux applications that run on Solaris through Solaris Containers and BrandZ, virtualization technologies for Solaris 10.
Since Sun touts DTrace as being one of the main differentiators between Solaris 10 and Linux, don't expect the Solaris-to-Linux port to happen soon.
Despite the limitation, DTrace has developed a good fan base. It has won numerous awards for technology innovation from publications like the Wall Street Journal. RedMonk analyst James Governor called DTrace "one of the most powerful instrumentation, performance optimization and troubleshooting platforms ever developed, period."
Bryan Cantrill, one of the engineers who developed DTrace (the others were Mike Shapiro and Adam Leventhal), said it came about as Sun was working on improving Solaris but kept encountering bottlenecks. In one case, it took two weeks to find the source of a problem: One of the production systems thought it was a router when it was supposed to be benchmarking Solaris.
"First and foremost, Sun built DTrace so they could improve the performance of the operating system," Gillen said. "It's allowed them to get some improvements to Solaris, and the same things are applicable to an application, as well."
For Joyent Inc., a developer of online collaboration software, DTrace helped tune Ruby on Rails, the Web application framework that runs Twitter Inc.'s online social networking site. Using DTrace, Joyent discovered that a lot of compute power was being used when "raised exceptions would generate back traces that were going through hundreds of frames," Jason Hoffman, CTO at Joyent, wrote in the Joyent blog.
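To see why deep back traces are costly, consider that the record attached to an exception grows with the depth of the call stack it travels through. The sketch below is a Python illustration of that general effect, not Joyent's Ruby code or its DTrace scripts; `descend` and `backtrace_depth` are hypothetical names.

```python
import traceback


def descend(depth):
    """Recurse `depth` levels down, then raise from the bottom of the stack."""
    if depth == 0:
        raise ValueError("raised from the bottom of a deep stack")
    descend(depth - 1)


def backtrace_depth(depth):
    """Return how many frames the exception's back trace records."""
    try:
        descend(depth)
    except ValueError as exc:
        return len(traceback.extract_tb(exc.__traceback__))


shallow = backtrace_depth(10)
deep = backtrace_depth(300)
print(f"10 levels deep: {shallow} frames in the back trace")
print(f"300 levels deep: {deep} frames in the back trace")
```

Every extra level of nesting adds a frame that has to be recorded each time an exception is raised, so code that raises often from deep inside a framework pays that cost repeatedly.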
DTrace does its troubleshooting while systems are in production, something that Sun's Cantrill is particularly proud of.
"One of the things we were trying to solve was what we thought was a glaring problem in computing systems," he said. "When you've got this complicated running system, the system itself can't really be seen. The reasons for that are borderline philosophical, but the software is in this blurry area between information and the machine. The upshot is you can have software that is pathological, and it's difficult to see what's actually going on with it. We wanted to develop a framework facility that could answer arbitrary questions about the system."
For Betfair, that's in stark contrast to how it used to have to operate.
"We've had situations in the past where with each piece of an operation, we would have to write logs and figure out delays in checkpoints," Devine said. "It's just harder when every time it does something you have to write it down. Now we can just dive in there with DTrace."
Let us know what you think about the story; e-mail: Mark Fontecchio, News Writer.