If the Olympics awarded a medal for the biggest and most complex computing environment, New York-based Merrill Lynch & Co. Inc. surely would take the prize. But should it come, the win will not have been cheap or easy, according to an analyst familiar with the deployment.
Jeffrey Birnbaum, Merrill Lynch's managing director and chief technology architect, told LinuxWorld Conference & Expo attendees that Merrill Lynch's "stateless computing" environment, a private, corporate cloud of on-demand IT resources, provides the ultimate flexibility in anywhere, anytime computing.
Stateless computing no cakewalk
Under construction for several years, Merrill Lynch's global compute network consumed a large chunk of its $1 billion IT budget but transformed IT from a fixed number of dedicated servers to a flexible pool of utility compute capacity that can be re-allocated in response to changing requirements, he said. The five-tier network includes 15,000 servers running Red Hat Enterprise Linux and Microsoft Windows in nine different locations and includes a Monte Carlo grid for complex risk analysis computations, according to Rachel Chalmers, the research director of the New York-based 451 Group.
In remarks during and after LinuxWorld's Executive Summit, Chalmers praised Merrill Lynch as an "extraordinary global leader" in technology but observed that deployment of such a large innovative network stretched the limits of existing technologies to the breaking point, with plenty of expensive missteps along the way.
Just getting everything up and running, the physical deployment of a network with so many servers, was daunting, Chalmers said. Merrill Lynch also experienced problems with networking, provisioning and storage, she said.
In particular, the Wall Street firm "really stumbled" on network monitoring using Tivoli Software and Simple Network Management Protocol (SNMP) industry standards, as the environment vastly exceeded the volume and complexity of what these tools can handle, she said.
In addition, application support was "a gigantic mess," given the disconnect between the original development team and those providing support in the field, she said. In the future, Merrill Lynch needs to consolidate these functions in service centers, where the expertise of the original team can help the support staff resolve problems, she said.
Another concern is Merrill Lynch's desktop virtualization initiative. While praising the initiative as "at the forefront" technologically, Chalmers said it exchanged cheap PC storage for more costly back-end storage area network (SAN) storage. In the future, companies may look at cutting costs by storing virtualized desktops in a local server closet instead of a data center, she said.
"Everybody has problems, but theirs are bigger," Chalmers said. "They are pushing the tolerances of the system to the breaking point. So they can spot problems like SAN problems much earlier."
Despite all these problems, Merrill Lynch still has one of the best grid deployments in the industry, she said. And the centralization of resources repositions IT as a service provider whose costs can be charged back to business units in proportion to their usage, she said.
The secret sauce: A master file system
In his keynote, Birnbaum explained that the key to Merrill Lynch's stateless computing network is a master file system where all applications, operating systems, software dependencies and libraries reside, eliminating the need for a software stack on individual servers. In Merrill Lynch's centralized data centers, software is pushed up to the master file system, with servers accessing it indirectly through a cache. Similarly, instead of fulfilling workload demands directly, servers route all requests to the placement engine, which manages all pooled resources and dispenses them on demand, he said.
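To make the idea of a placement engine concrete, here is a minimal sketch in Python of a broker that owns a pool of servers and dispenses capacity on request. It is purely illustrative and does not describe Merrill Lynch's actual software: the names `Server`, `PlacementEngine`, `place` and `release`, the CPU-count model of capacity, and the "most free capacity first" policy are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Server:
    """A pooled server that carries no workload-specific state (hypothetical model)."""
    name: str
    free_cpus: int


@dataclass
class PlacementEngine:
    """Hypothetical broker: every request goes through it, never straight to a server."""
    pool: List[Server] = field(default_factory=list)

    def place(self, workload: str, cpus: int) -> Optional[str]:
        """Reserve capacity for a workload; return the chosen server name or None."""
        # Assumed policy: pick the server with the most free capacity that fits.
        candidates = [s for s in self.pool if s.free_cpus >= cpus]
        if not candidates:
            return None  # the pool cannot satisfy the request right now
        chosen = max(candidates, key=lambda s: s.free_cpus)
        chosen.free_cpus -= cpus
        return chosen.name

    def release(self, server_name: str, cpus: int) -> None:
        """Return capacity to the pool so it can be re-allocated to other work."""
        for s in self.pool:
            if s.name == server_name:
                s.free_cpus += cpus
                return
        raise KeyError(server_name)


engine = PlacementEngine([Server("node-a", 4), Server("node-b", 8)])
first = engine.place("risk-analysis", 6)   # lands on the server with the most room
second = engine.place("reporting", 4)
third = engine.place("batch", 4)           # nothing fits until capacity is released
engine.release(first, 6)
fourth = engine.place("batch", 4)
```

Because no server holds a software stack of its own in this model, any machine with spare capacity is an equally valid target, which is what allows the same pool to be re-allocated as requirements change.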
The master file system also orchestrates stateless desktop computing, streaming an operating system and applications to standardized desktop hardware as requested, he said.
The backbone "fabric of choice" for stateless computing is 10 Gigabit Ethernet, Birnbaum added. Having a single, high-throughput connection between all resources enables companies to boost server utilization above the average rate of 61%, he said. The downside is that a lot of redundancy must be built into the network, which is very expensive, he said.
"It's utility computing," Birnbaum said. "It's a different paradigm."