Tom Vaughan, the director of IT infrastructure at Roswell Park Cancer Institute in Buffalo, N.Y., faced a challenge all too familiar to many data center managers.
He operated two redundant data centers, chock-full of 300 servers and managed by a lean staff of six. "The state of affairs typical in many data centers," Vaughan said. "Servers in silos all over the place, running out of space and power. Things were starting to get unwieldy, out of control."
Because Roswell is a leading cancer research institute and hospital, its small IT staff was constantly picking up support for new research tools, patient records or new applications around the hospital. "We were suffering from the death from a thousand cuts," Vaughan said. "It's the usual way you get server sprawl -- we kept buying them one more at a time."
To complicate matters, Vaughan's team dealt with three sets of auditors coming through the data center annually: one from Roswell's board, one for financial audits and one for audits related to the Health Insurance Portability and Accountability Act (HIPAA).
"We're constantly being audited one way or another, and a lot of times they asked 'Are all servers built the same?' The answer was 'no,'" Vaughan said. "We were losing control of everything and it was going to get worse."
Vaughan said he knew it was time for a server consolidation using virtualization. Roswell is primarily a Windows shop, but Vaughan went with VMware over Windows Server and Hyper-V because the latter "didn't have the tools VMware had."
Physical server sprawl gives way to VM sprawl
But virtualization would solve only the power and physical space problems. "That just transferred the problem from physical rackmount servers to virtual machines," Vaughan said. "I read different articles about the next-generation data center, and came across HP's BladeSystem Matrix, which handled a lot of our problems for us."
Converged infrastructure platforms like Hewlett-Packard Co.'s BladeSystem Matrix, Cisco Unified Computing System (UCS) and Oracle Exadata ship with their storage, IP and network configurations set up in advance. These systems allow users to create service profiles and templates, so IT managers can quickly add new resources to a data center by automating the provisioning of servers with the same configuration.
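The template idea can be illustrated with a minimal Python sketch. It is not the BladeSystem Matrix or UCS interface; the class names, fields and the fingerprint check are hypothetical, chosen only to show how building every server from one template makes the auditors' "are all servers built the same?" question easy to answer.

```python
from dataclasses import dataclass, asdict
from typing import List
import hashlib
import json


@dataclass(frozen=True)
class ServerTemplate:
    """Hypothetical service template: one definition, many identical servers."""
    name: str
    cpus: int
    memory_gb: int
    os_image: str
    vlan: str

    def fingerprint(self) -> str:
        # Stable hash of the configuration, usable for audit comparisons.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


@dataclass(frozen=True)
class Server:
    hostname: str
    template: ServerTemplate


def provision(template: ServerTemplate, count: int, prefix: str) -> List[Server]:
    """Stamp out `count` servers that all share the template's configuration."""
    return [Server(f"{prefix}-{i:02d}", template) for i in range(1, count + 1)]


def all_built_the_same(servers: List[Server]) -> bool:
    """True when every server carries the same configuration fingerprint."""
    return len({s.template.fingerprint() for s in servers}) <= 1
```

Calling `provision(template, 3, "web")` on a single template yields three servers for which `all_built_the_same` returns `True`; adding one hand-built server with a different template makes it return `False`.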
Jonathan Eunice, a principal IT adviser at Illuminata Inc., said the complexity of integrating mix-and-match IT components is making these converged systems more attractive to organizations like Roswell.
"Doing the research to select and acquire those components, then assemble them, requires expertise -- staff and effort -- time and money. And it adds risk, because the enterprise is then responsible for their integration on initial build, and at all future points when components are upgraded, patches are applied," Eunice said. "It just basically puts the buyer into the computer engineering, assembly and operation business."
For huge companies with large IT departments or a core competency in delivering IT services, custom-built and custom-maintained systems make a lot of sense. In fact, they're a competitive advantage. But most enterprises, according to Eunice, do not have those resources.
Converged hardware means vendor lock-in
When Cisco UCS and HP BladeSystem Matrix customers are touted in the press, the picture is somewhat misleading, because these customers are basically one-vendor shops. Operators of heterogeneous data centers are leery of lock-in. But Eunice said blade servers have had the same effect for several years.
"You have to buy a chassis, and generally be emotionally and culturally prepared to make further purchases from the same vendor as you fill up the chassis. It limits the variability and heterogeneity of the shop. But for the most part, that's a good thing given that the mix-and-match-with-abandon strategy of the past few decades has led to considerable sprawl and extra cost of operations," Eunice said. "Every IT shop I know of that's figured out how to operate at high speed and high scale has been extremely disciplined at reducing variability and options in their data centers."
Roswell did not consider Cisco's UCS offering. "I believe in products from within a company's core competency," Vaughan said. "Cisco is a networking/IP phone company. HP servers and C-Class blades are number one, descendant from Compaq -- a healthy heritage. Cisco to me is a 'me-too' company when it comes to servers. I don't buy networking or phones from HP.
"I am an old Digital-Compaq and now HP customer. Maybe that makes me a bit biased," Vaughan said. "But after 25 years of experience, I've seen many computer companies come along as me too -- Next, AST, NCR -- and fail."
Roswell worked with an HP value-added reseller (VAR), Affinity Enterprises, on the configuration and consultations needed to identify the correct SKU and place the order with HP. Hewlett-Packard then sent specialized engineers, who brought up the Matrix systems and integrated them into Roswell's environment.
Roswell now has a pair of BladeSystem Matrix systems in two data centers in a redundant configuration, and Vaughan is currently migrating as many workloads as he can to the new platform. "We're moving over anything that's supported by VMware," Vaughan said. "Our electronic medical records system is now supported in test on VMware. And a lot of production systems that have to do with patient accounting, cash flow, payroll, email, have been in VMware for two years."
Vaughan said there are a few holdouts, such as smaller vendors that haven't come onboard with official VMware support. But he hopes to eventually move everything over to the new platform, so in the future when auditors ask whether Vaughan can verify that all the machines are built the same, he can say 'yes.'
The new system has also made it much faster to roll machines out. "In the past year, to provision a new server took about six weeks," Vaughan said. "Now, if we have the resources, we can provision a server within hours."
But that easy provisioning opened its own can of worms. Vaughan said he sold the new system to executives on the promise that IT could produce servers more quickly, and that speed has now become a problem in itself.
"We're seeing VM sprawl, and one of the things [that] kept us under control is the economic climate," Vaughan said. "We've held off increasing our storage capacity for one year. I tell people we don't have the storage to grow uncontrollably while we come up with some policies on what gets approved."
Reining in server sprawl
That is one of the big problems with self-service IT systems: How can you control server sprawl?
"Because virtualization and self-provisioning are the technical enablers, they get blamed for what is, in fact, a business policy and governance issue," Eunice said.
For self-service IT to work, organizations need to set up IT governance up front: policies, asset inventory and change management procedures.
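As a rough illustration of those prerequisites, the Python sketch below gates each self-service request on an approval and a storage budget, and records what it grants in an inventory. The class names, the approver label and the budget figure are hypothetical; this is not Roswell's change management system or any vendor's product.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class ProvisionRequest:
    requester: str
    purpose: str
    storage_gb: int
    approved_by: Optional[str] = None  # filled in by the approval workflow


class CapacityGate:
    """Hypothetical governance gate: sign-off plus a hard storage budget."""

    def __init__(self, storage_budget_gb: int, approvers: Set[str]):
        self.remaining_gb = storage_budget_gb
        self.approvers = approvers
        self.inventory: List[ProvisionRequest] = []  # asset inventory of granted VMs

    def review(self, request: ProvisionRequest) -> str:
        # Policy 1: no VM without sign-off from an authorized approver.
        if request.approved_by not in self.approvers:
            return "rejected: needs sign-off from an authorized approver"
        # Policy 2: no VM that would exceed the remaining storage budget.
        if request.storage_gb > self.remaining_gb:
            return "rejected: exceeds remaining storage budget"
        self.remaining_gb -= request.storage_gb
        self.inventory.append(request)
        return "granted"
```

With a 500 GB budget, an approved 400 GB request is granted, while a later approved 200 GB request is rejected because only 100 GB remains. The point of the sketch is that the limit lives in policy rather than in the provisioning tool.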
"You have to be able to set up workflow approvals and authority," Vaughan said. "We have a change management system in place, but the next phase is getting ITIL [the IT Infrastructure library]. We're trying to eat the elephant one piece at a time."
What did you think of this feature? Write to SearchDataCenter.com's Matt Stansberry about your data center concerns at firstname.lastname@example.org.