As data centers proliferate and bigger data centers emerge each year, IT directors undergoing the data center planning process at a smaller scale can still learn something from next-generation data centers like those built by Facebook.
Data centers are growing larger as traditional in-house IT operations are outsourced to managed service providers and cloud hosting companies, and as IT behemoths like Apple and Google build equally behemoth-size data centers.
Customers want fast response times on applications, leading many data centers to be built close to end users, where real estate is at a premium. As a result, efficiency and scalability are top of mind for new data center builds.
Despite advances in programming, the typical answer to making an application faster is to throw more hardware at it. New hardware resources are easier and cheaper to procure than application-centric toolsets. But simply increasing capacity to meet application demands exacerbates efficiency and scalability problems.
The Open Compute Project
These data center realities set the stage for the Open Compute Project. Some of the largest application providers -- Facebook, Google, Apple -- face the same operating costs for data center real estate, power and scalability that enterprise data centers and service providers do. The Open Compute Project allows these leaders to share some of the data center planning and development lessons learned from the tens or hundreds of thousands of servers they have deployed. At that scale, even minor tweaks to design or power efficiency can greatly reduce support and operating costs, as well as hardware issues.
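To see why small tweaks matter at that scale, here is a back-of-the-envelope sketch in Python. Every input is a made-up assumption for illustration, not a figure from Facebook or the Open Compute Project: the fleet size, the per-server power draw, the electricity rate and the `overhead_factor` (a hypothetical multiplier standing in for cooling and power-distribution overhead).

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def annual_energy_cost(servers, watts_per_server, price_per_kwh, overhead_factor=1.5):
    """Estimate the yearly electricity cost of a server fleet.

    overhead_factor is an assumed multiplier for facility overhead
    (cooling, power distribution); 1.0 would mean no overhead at all.
    """
    it_load_kw = servers * watts_per_server / 1000.0
    facility_kwh = it_load_kw * HOURS_PER_YEAR * overhead_factor
    return facility_kwh * price_per_kwh


# Hypothetical fleet: 100,000 servers, 0.10 per kWh, before and after a
# design tweak that trims 5 watts from each server.
before = annual_energy_cost(100_000, 300, 0.10)
after = annual_energy_cost(100_000, 295, 0.10)
annual_savings = before - after

print(f"Estimated annual savings: {annual_savings:,.0f}")
```

With these illustrative inputs, a 5-watt saving per server works out to roughly 657,000 per year in whatever currency the electricity rate uses. The same tweak on a 200-server deployment would save about 1,300, which is why the payoff of sharing such lessons grows with fleet size.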
Data center hardware is generally very similar from vendor to vendor, with manufacturing standards and a certain level of interoperability. And as anyone with data center operations experience can attest, bad things happen even with big-name vendor hardware. Hard drives fail, memory goes awry and network interfaces go down. The real test is how your infrastructure and application can handle these situations.
Although the Open Compute Project started by looking at basic hardware approaches and design, the group's roadmap is broader. The Open Compute Project has the potential to do two things at once. It could provide a vetted channel for best practices and hardware design improvements with a distinct operational focus, and also expose hardware vendors to operational feedback based on the realities of the data center.
One area of the project that's under development, Hardware Management, reviews best practices for remote machine management. This could easily dive into solution architecture and push vendors to increase application programming interface exposure to simplify deployment and integration.
The project also covers server technology, including motherboard specifications, power supplies, chassis layout and other considerations. Standard hardware and interoperability designs benefit from best practice and tools information. In storage technology, the Open Compute Project is developing virtual I/O to better tie in to the compute model with resource allocation.
One ambitious area under development, Data Center Technology, considers those physical layers that tend to be taken for granted in data center planning and operations, including mechanical and electrical specifications. Project members defined an approach to data center cooling that could apply to a wide range of data centers with similar environmental conditions.
How IT directors and system admins benefit from the Open Compute Project
The Open Compute Project is very ambitious in scope, with industry heavyweights contributing actively and providing useful data. So what does this mean for the typical IT director or system administrator?
Pros. Depending on the size of your organization, you'll be exposed to some or all of the subject matter areas the Open Compute Project addresses. The line between server, storage, network and overall data center operations is more blurred every day, so understanding changes in other environments is a great way to keep on top of trends that will also affect your plans.
Upsides for all data centers include access to best practices developed by the Open Compute Project, holistic information on data center operations and a sneak preview of innovative data center practices and policies.
Even though every aspect of operations affects the whole data center, the elements are typically disparate. From the power entering the data center to the data packets leaving the network, the infrastructure will be more efficient when these separate elements are integrated.
Engineering staff can sometimes be stuck in reactive mode, unable to put effort into more proactive measures to improve operations. The Open Compute Project exposes best practices and provides real-world improvements for any organization. I especially appreciate the Hardware Management area, because there are so many different ways to do remote management. Having some additional vetted data points is very helpful in identifying what will scale with the business and how to avoid mistakes. Tying management in with best practices information is helpful.
Cons. You'll need to do some legwork transforming Open Compute Project information into a digestible data center roadmap for your organization. And the project's data is only as good as the submissions it gets.
It's easy to get excited and try to do everything at once. When presenting new approaches to upper management, provide timelines for implementation along with projected cost savings. Ideally, you can use a conventional staged approach with a limited deployment that proves key concepts and savings.
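A simple payback calculation is one way to put numbers behind such a proposal. The sketch below is only an illustration: the function name `payback_month` and the pilot figures are hypothetical, and a real business case would also account for staffing, financing and risk.

```python
def payback_month(upfront_cost, monthly_savings, horizon_months=60):
    """Return the first month in which cumulative savings cover the
    upfront cost, or None if that never happens within the horizon."""
    if monthly_savings <= 0:
        return None
    cumulative = 0.0
    for month in range(1, horizon_months + 1):
        cumulative += monthly_savings
        if cumulative >= upfront_cost:
            return month
    return None


# Hypothetical limited deployment: 24,000 up front, 1,500 saved per month.
pilot_payback = payback_month(upfront_cost=24_000.0, monthly_savings=1_500.0)

if pilot_payback is None:
    print("Pilot does not pay for itself within the planning horizon")
else:
    print(f"Pilot pays for itself in month {pilot_payback}")
```

Running the same function on the projected figures for each later stage gives upper management a timeline and a savings estimate side by side, and the pilot's measured savings can replace the projections once they are available.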
The Open Compute Project is only as good as the vetted and approved content submissions. The team is taking this seriously, and initial data submissions are high quality. As more content becomes available and the scope of the Open Compute Project expands, the team will need to maintain content quality with a larger pool of contributors. The "best practice" that may work for one organization isn't always right for another, so content must be modular and maintained as standards.
With so much content published, the next logical question is, "How does this apply to my organization?" The context of the published work will vary depending on the original source and contributor's editing style. This is important when talking about hardware specifications as well as best practices. Without at least an understanding of the context inherent in the publication, your internal data center planning process will be more difficult. The Open Compute Project has some turnkey elements, but plenty of them will require assembly.
A larger data center development community
IT operations folks and systems administrators can also turn to other communities and enterprises for information and best practices on data center planning. IP Best Current Operational Practices (IP-BCOP), hosted by NANOG (North American Network Operators' Group), for example, is network-centric but is also a way for engineers and administrators to share information and best practices. Because the group works closely with operators on submission ideas, there is a good flow from initial draft to ratification. For more information, check out bcop.nanog.org.
While eBay technically falls into the "enterprise" category in most people's minds, it is truly a service provider that relies heavily on its infrastructure for revenue. Because of its growth and history, eBay is taking some very interesting steps to look at how to measure operational efficiency as a whole -- and make it transparent to the end user. The company also shares these metrics for other data centers' benefit; check out eBay's Digital Service Efficiency dashboard at http://dse.ebay.com/. This is simply the natural evolution of how businesses leveraging various computing platforms will begin to test and confirm efficiencies as the systems that manage and monitor them catch up.
Pete Sclafani is chief information officer with 6connect. Contact him at firstname.lastname@example.org