Server virtualization has lowered IT costs and improved server utilization. But its proliferation has increased the amount of storage required to keep performance acceptable in virtual environments.
Server virtualization has caused many enterprises to overhaul their data backup and disaster recovery (DR) strategies because of virtual server sprawl and the strain that applications distributed across virtual servers place on backup and DR. EMC, Hitachi Data Systems, IBM, NetApp and Dell have addressed the server virtualization storage issue by incorporating storage virtualization, deduplication and thin provisioning in their offerings.
Server virtualization storage issues occur when data centers use storage technologies designed for physical servers in their virtual environments. Part of what makes virtual server sprawl a problem is that virtual servers can consume up to 30% more disk space than physical servers. Another problem is the VM “I/O blender” effect: traditional storage architectures cannot effectively manage the random, scrambled I/O patterns created by virtual servers. Storage management is also more complex in virtual server environments than in physical environments -- provisioning a virtual server means provisioning storage as well.
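To make the “I/O blender” concrete, here is a small illustrative sketch (the function names and offsets are invented for this example, not taken from any product): each guest issues neatly sequential block writes, but once the hypervisor multiplexes the guests onto a single queue, the storage array sees offsets that jump all over the device.

```python
# Illustrative only: simulate three VMs each writing sequentially to its
# own region of a LUN, then interleave the streams as a hypervisor would.
from itertools import zip_longest

def guest_stream(vm_id, start, count, stride=1):
    """Sequential block offsets as one VM would issue them."""
    return [(vm_id, start + i * stride) for i in range(count)]

# Three VMs, each writing 4 sequential blocks in its own region.
streams = [guest_stream(vm, vm * 1000, 4) for vm in range(3)]

# The hypervisor multiplexes the per-VM streams onto one request queue.
blended = [req for batch in zip_longest(*streams)
           for req in batch if req is not None]

offsets = [off for _, off in blended]
print(offsets)  # offsets jump between regions instead of advancing sequentially
```

Each stream on its own is perfectly sequential; the blended queue is what the array actually has to service, which is why architectures tuned for sequential access struggle.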
Solving the server virtualization storage problem
As an IT manager, you have options for solving this server virtualization storage problem, starting with some less practical solutions. One is to adopt server virtualization at a slower rate. You can run fewer virtual servers per host, reducing the impact of the I/O blender problem. Another is to overprovision storage, but at a high cost.
A much better option is to approach storage purchases more intelligently and seek technologies like storage virtualization, deduplication and thin provisioning, which optimize the storage requirements of virtual environments. Applying this strategy means becoming familiar with new technology and developing relationships with new vendors, such as Virsto, DataCore and FalconStor.
Storage virtualization as a solution
Many analysts and storage vendors recommend storage virtualization as a remedy to the server virtualization storage problem. Even without the problem, storage virtualization reduces data center costs, improves business agility and is an important component of any private cloud.
Conceptually, storage virtualization is similar to server virtualization: abstracting physical storage systems hides the complexity of the underlying devices. Storage virtualization pools the physical storage from multiple network storage devices into what appears to be a single storage device, and it can virtualize disks, blocks, tape systems and file systems. One advantage of storage virtualization is that it helps storage administrators manage storage devices and perform tasks such as backup/recovery and archiving more quickly.
Storage virtualization architectures maintain a map of virtual disks and other physical storage. The storage virtualization software layer (a logical abstraction layer) sits between physical storage systems and the host running virtual servers. When a virtual server needs to access data, the storage virtualization abstraction layer provides the mapping between virtual disks and physical storage devices and transmits data between hosts and physical storage.
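The mapping described above can be sketched in a few lines. This is a minimal, illustrative model (the class and device names are invented, not any vendor's API): virtual disk extents are mapped to (physical device, physical extent) pairs, and every access is resolved through the map.

```python
# Minimal sketch of a storage virtualization mapping layer.
# All names are illustrative; real implementations also handle caching,
# migration, replication and failure of the backing devices.

class VirtualDisk:
    def __init__(self):
        # virtual extent number -> (physical device name, physical extent)
        self.map = {}

    def attach_extent(self, vext, device, pext):
        """Back a virtual extent with capacity from a physical device."""
        self.map[vext] = (device, pext)

    def resolve(self, vext):
        """Translate a virtual extent to its physical location."""
        return self.map[vext]

# Pool two physical arrays behind one virtual disk.
vdisk = VirtualDisk()
vdisk.attach_extent(0, "arrayA", 17)
vdisk.attach_extent(1, "arrayB", 4)

print(vdisk.resolve(1))  # ('arrayB', 4)
```

Because hosts only ever see virtual extents, the abstraction layer is free to place (or move) the physical extents wherever it likes, which is what enables the pooling and migration benefits discussed above.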
While the implementation of server virtualization is well understood, there are differing views of how storage virtualization should be implemented and what technology should be used. Contributing to this confusion, storage vendors host storage virtualization in different ways, some directly on storage controllers and others on SAN appliances. Also, some send storage virtualization commands and data along the same path between server and storage (in-band), while others split the command and data paths (out-of-band).
Various technologies deliver storage virtualization: host- (software-), appliance- or network-based. Host-based virtualization provides a virtualization layer on the server and presents storage as a single drive to applications. Appliance-based virtualization uses a hardware appliance that sits on the storage network. Network-based virtualization is similar to appliance-based, but works at the switching level.
Storage virtualization technologies have some drawbacks. Tools that enable host-based storage virtualization are essentially volume managers and have been around for many years. A volume manager on a server is configured to take several drives and present them as a single resource, which can then be divided up as needed, but that configuration has to be done for each server. This approach is best suited for smaller systems.
With the appliance-based approach, hosts just query the appliance as if it were a storage unit, and the appliance redirects the host request to the appropriate unit. Because the appliance-based approach places both block data and control information (metadata) on the same link, there is a potential bottleneck, slowing the movement of data to the host. To keep latency low, appliances often maintain a cache for both reading and writing operations, increasing their price.
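The caching trade-off can be shown with a toy model. This is an assumption-laden sketch (the class name and backend layout are invented for illustration): an in-band appliance keeps a read cache so that repeated reads are served locally instead of making a second trip over the shared data path to the backing array.

```python
# Illustrative in-band appliance with a simple read cache.
# Real appliances use bounded, battery-backed caches with eviction and
# write-back policies; this sketch only counts trips to the backend.

class InBandAppliance:
    def __init__(self, backend):
        self.backend = backend   # block number -> data on the physical array
        self.cache = {}          # read cache held in the appliance
        self.backend_reads = 0   # trips over the shared data path

    def read(self, block):
        if block in self.cache:      # cache hit: no trip to the array
            return self.cache[block]
        self.backend_reads += 1      # cache miss: fetch over the data path
        data = self.backend[block]
        self.cache[block] = data
        return data

appliance = InBandAppliance({7: b"payload"})
appliance.read(7)
appliance.read(7)                    # second read served from cache
print(appliance.backend_reads)  # 1
```

Keeping hot blocks in the appliance is what holds latency down on the shared in-band link, and the memory required for that cache is a large part of why these appliances cost more.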
Server virtualization storage innovations: Thin provisioning and deduplication
Two storage innovations, thin provisioning and deduplication, are also ways to reduce the storage capacity required in server virtualization environments. These two innovations can be combined with storage virtualization to provide a solid approach to controlling storage capacity.
Thin provisioning makes storage “go farther” by eliminating allocated but unused capacity. It relies on on-demand allocation of blocks of data rather than allocating all the blocks up front. This approach eliminates almost all white space, helping avoid the poor utilization rates -- often as low as 10% -- that occur when large pools of storage capacity are allocated to individual servers but remain unused.
In many implementations, thin provisioning provides storage to applications on an as-required basis from a common pool of storage. In this case, thin provisioning works in combination with storage virtualization.
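The allocate-on-write behavior behind thin provisioning can be sketched as follows. This is a simplified illustration (the class and method names are invented, not a real product API): the volume advertises a large logical size to the host, but consumes blocks from the shared pool only when they are first written.

```python
# Illustrative thin-provisioned volume: logical capacity is promised up
# front, physical blocks are allocated only on first write.

class ThinVolume:
    def __init__(self, logical_blocks):
        self.logical_blocks = logical_blocks  # size advertised to the host
        self.allocated = {}                   # logical block -> data

    def write(self, block, data):
        if block >= self.logical_blocks:
            raise IndexError("write past end of volume")
        self.allocated[block] = data          # allocate on first write only

    def physical_usage(self):
        """Blocks actually consumed from the shared pool."""
        return len(self.allocated)

vol = ThinVolume(logical_blocks=1_000_000)    # large volume promised to the host
vol.write(0, b"boot")
vol.write(42, b"app data")
print(vol.physical_usage())  # 2
```

The host believes it owns a million blocks, but only the two it has written occupy real capacity -- the “white space” stays in the common pool for other volumes to use.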
Data deduplication entails detecting and removing duplicate data from a storage medium or file system. Detection of duplicate data may be performed at the file, block or bit level. Deduplication identifies identical sections of data and replaces them with references to a single copy. For example, with file-level deduplication, a file system that holds a copy of the same document in 50 folders can be reduced to a single copy plus 49 links back to the original document.
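At the block level, the usual approach is to hash each incoming block and store a reference rather than a second copy when the hash has been seen before. The sketch below is illustrative only (the class is invented, and real systems must also handle hash collisions, reference counting and garbage collection):

```python
# Block-level deduplication sketch: identical blocks are stored once,
# and the logical volume keeps only references (digests) to them.
import hashlib

class DedupStore:
    def __init__(self):
        self.blocks = {}   # digest -> data, each unique block stored once
        self.refs = []     # logical sequence of block references

    def write(self, data):
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blocks:
            self.blocks[digest] = data   # first time we see this content
        self.refs.append(digest)         # duplicates cost only a reference

store = DedupStore()
for chunk in [b"AAAA", b"BBBB", b"AAAA", b"AAAA"]:
    store.write(chunk)

print(len(store.refs), len(store.blocks))  # 4 2
```

Four logical blocks were written but only two unique blocks consume capacity -- the same effect, at much larger scale, that makes deduplication so valuable for the highly repetitive virtual server backup files discussed below.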
Data deduplication can be applied in server virtualization environments to reduce storage requirements. Each virtual server is contained in a file, which can be quite large. One virtual server feature is that a system administrator can stop a virtual server, copy it, back it up and then restart it, placing it back online. These backup files are stored on a file server somewhere, generally with a great deal of duplicate data across the files. Without deduplication, this type of backup practice can increase storage requirements significantly.
Change how you purchase storage
Because storage virtualization, deduplication and thin provisioning can help contain massive data growth, organizations may need to change their storage purchase criteria. For example, if you purchase storage with deduplication available, you may not need as much capacity as you first thought. With thin provisioning, storage capacity utilization can be automatically driven up toward 100% with very little administrative overhead.
Traditional storage purchasing includes estimating baseline capacity for workloads needed, estimating potential growth rates over a three-year period, adding some extra capacity for various other requirements and issuing an RFP with the storage configuration details. With the advent of server virtualization and cloud computing, buying more capacity the traditional way is not sensible, especially since budget is the biggest limitation of purchasing storage.
Here are some simple storage purchasing guidelines:
- Unless absolutely necessary, don’t buy storage unique to an application that solves a specific business problem. Doing this precludes economies of scale that storage architected for sharing can provide.
- Focus on storage solutions that support multiple protocols and provide flexibility.
- Consider the breadth of applications/workloads that a storage solution can support.
- Become familiar with concepts such as deduplication and thin provisioning, which can help solve storage problems.
- Become familiar with the types of storage management software available for automation to reduce system administration costs.
Many organizations that have some server virtualization in place and are thinking about private clouds are going to spend on storage hardware. It is important that the storage budget is used to purchase the right hardware or software. Don’t focus on getting the most storage for the lowest price. Instead, start with the business problems that need to be solved, and work at getting the most value from storage that helps address those problems.
ABOUT THE AUTHOR: Bill Claybrook is a marketing research analyst with more than 30 years of experience in the computer industry, with the last 10 years in Linux and open source. From 1999 to 2004, Bill was Research Director, Linux and Open Source, at Aberdeen Group in Boston. He resigned his competitive analyst/Linux product marketing position at Novell in June 2009 after spending over four and a half years at the company. He is now president of New River Marketing Research in Concord, Mass. He holds a doctorate in computer science.
What did you think of this feature? Write to SearchDataCenter.com's Matt Stansberry about your data center concerns at firstname.lastname@example.org.
This was first published in November 2010