Put 20 data center managers in a room, ask their opinions on the best data storage methods, and you'll likely leave with a list of 45 ideas.
Storage is evolving rapidly after decades of stagnation, as highlighted by Dell's planned EMC acquisition. Choices range from open source to proprietary storage, from iSCSI block storage to Ceph, or from solid-state drives (SSD) to cloud-based offerings. Data centers must plan a roadmap through this maze.
Disk is dying
The slow hard disk drive (HDD) is completely outmatched by the SSD as an enterprise data storage method. The SSD supposedly costs a fortune, but many of these price comparisons are simply aimed at creating fear, uncertainty and doubt. In the server, SSD already beats 10,000 RPM HDD on price.
The only place HDD is cheaper than SSD today is in low-end 3.5" drives. Even these bulk 3.5" HDDs are coming under pressure, with SSDs matching them size for size. There is still a premium for the SSD, but expect that to erode through 2016 and 2017.
SSD -- and all-flash arrays for larger data centers -- ought to be where you make storage investments. SSD performance significantly changes data center dynamics. With SSDs as the primary storage tier, data center operators can add compression to a bulk secondary storage tier built on cheap hard drives.
The speed of flash and SSD also makes it possible to rebalance primary and secondary data storage, so we'll see far fewer terabytes in primary storage. Thanks to data compression, data centers will need perhaps only 20% of the secondary-tier capacity that today's estimates call for.
The result is footprint shrinkage and some big capital expenditure savings, as well as much better performance.
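To make that capacity arithmetic concrete, here is a rough sketch of how the two tiers might be sized. The 5:1 compression ratio simply restates the "perhaps 20%" figure above; the 1,000 TB total and the 10% share of hot data are hypothetical inputs chosen for illustration, not numbers from any vendor or survey.

```python
def tiered_capacity(total_tb, hot_fraction, compression_ratio):
    """Estimate raw capacity for a flash primary tier and a compressed HDD secondary tier.

    total_tb          -- logical data to store, in terabytes (hypothetical input)
    hot_fraction      -- share of data kept on the primary tier (assumption)
    compression_ratio -- e.g. 5.0 means compressed data takes 1/5 of its logical size
    """
    if not 0.0 <= hot_fraction <= 1.0:
        raise ValueError("hot_fraction must be between 0 and 1")
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")

    # Primary tier holds the hot data uncompressed.
    primary_tb = total_tb * hot_fraction
    # Secondary tier holds everything else, shrunk by compression.
    secondary_tb = total_tb * (1.0 - hot_fraction) / compression_ratio
    return primary_tb, secondary_tb


# Illustrative numbers only: 1,000 TB of data, 10% hot, 5:1 compression.
primary, secondary = tiered_capacity(1000, 0.10, 5.0)
print(f"primary flash: {primary:.0f} TB, compressed secondary: {secondary:.0f} TB")
```

Under these assumptions, 900 TB of cold data shrinks to about 180 TB of raw disk. Real compression ratios vary widely by data type, so treat the result as a planning estimate rather than a guarantee.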
Backup and primary cloud storage
Cloud-based storage is probably on many minds as an option for backup and archiving. It makes sense -- the cloud automatically creates an offsite backup location for disaster recovery. But not all backup is equal. Google's Nearline is disk-based and reportedly retrieves data hours faster than Amazon's Glacier.
Beyond backup and archiving, cloud-based storage becomes a contested topic. There is always a debate about workloads living in public clouds or on premises. Companies increasingly commit important IT workloads to public clouds; the primary storage for these workloads has to move as well. Transferring data between in-house data center hardware and public clouds is slow and expensive, which creates a dilemma for the popular hybrid cloud approach.
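A back-of-the-envelope calculation shows why bulk transfers are slow. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 Gbps = 10^9 bits per second) and a hypothetical `efficiency` factor for protocol overhead and link contention; the 100 TB data set and 1 Gbps link are illustrative, not measurements.

```python
def transfer_time_hours(data_tb, link_gbps, efficiency=1.0):
    """Hours needed to move data_tb terabytes over a link of link_gbps gigabits/s.

    efficiency is the usable share of the nominal link rate (an assumption;
    real-world throughput depends on protocol, latency and contention).
    """
    if data_tb < 0 or link_gbps <= 0 or not 0.0 < efficiency <= 1.0:
        raise ValueError("invalid input")

    bits = data_tb * 1e12 * 8                   # terabytes -> bits
    usable_bps = link_gbps * 1e9 * efficiency   # usable bits per second
    return bits / usable_bps / 3600.0           # seconds -> hours


# Illustrative: 100 TB over a 1 Gbps link at 80% usable throughput.
hours = transfer_time_hours(100, 1.0, efficiency=0.8)
print(f"{hours:.1f} hours, or about {hours / 24:.1f} days")
```

With these inputs the transfer takes roughly 278 hours, or about 11.6 days, before any egress charges are counted.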
The question of which data storage methods suit hybrid cloud workloads won't be resolved any time soon. Telcos keep coming up with excuses for why fiber isn't needed unless Google already offers it in a location.
How we store data is also in transition. Block storage has ruled us for so long that it's hard to drop the idea -- apps are built around it. File storage is actually just as fast. The real question, however, is can we use object storage instead?
Storage and computing vendor DataDirect Networks has demonstrated that object storage can be fast, but it is primarily targeted at storing complete objects, not files such as databases that workloads munch on all the time. This makes an object store less useful when the app handles data that changes frequently.
Object storage software has made progress on this issue. There is now a way of supporting block storage volumes as objects in the object store. This method of data storage works for a variety of object storage systems, including the popular Ceph open-source package. Such universal storage approaches will become the norm in future IT organizations, especially as big data and object-oriented storage patterns become dominant.
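The idea of presenting a block volume on top of an object store can be sketched in a few lines. This is a toy model, not how Ceph or any other product implements it: a plain Python `dict` stands in for the object store, and the class name, key format and 4 MB default object size are all assumptions made for illustration. The point is only that a write touches the small objects it overlaps instead of rewriting one huge object.

```python
class ObjectBackedVolume:
    """Toy block volume whose contents live in fixed-size objects.

    `store` is any dict-like mapping of key -> bytes (a stand-in for an
    object store). Objects that were never written read back as zeros.
    """

    def __init__(self, store, name, size, object_size=4 * 1024 * 1024):
        self.store = store
        self.name = name
        self.size = size
        self.object_size = object_size

    def _key(self, index):
        # Hypothetical naming scheme: volume name plus zero-padded object index.
        return f"{self.name}.{index:08d}"

    def _check(self, offset, length):
        if offset < 0 or length < 0 or offset + length > self.size:
            raise ValueError("access outside volume bounds")

    def write(self, offset, data):
        self._check(offset, len(data))
        pos = 0
        while pos < len(data):
            index, within = divmod(offset + pos, self.object_size)
            chunk = data[pos:pos + self.object_size - within]
            # Read-modify-write only the object this chunk falls into.
            obj = bytearray(self.store.get(self._key(index), bytes(self.object_size)))
            obj[within:within + len(chunk)] = chunk
            self.store[self._key(index)] = bytes(obj)
            pos += len(chunk)

    def read(self, offset, length):
        self._check(offset, length)
        out = bytearray()
        while len(out) < length:
            index, within = divmod(offset + len(out), self.object_size)
            take = min(self.object_size - within, length - len(out))
            obj = self.store.get(self._key(index), bytes(self.object_size))
            out += obj[within:within + take]
        return bytes(out)


# Usage: a 64-byte volume split into 16-byte objects (tiny sizes for readability).
store = {}
vol = ObjectBackedVolume(store, "vol0", size=64, object_size=16)
vol.write(10, b"hello world!")   # spans the first and second objects
data = vol.read(10, 12)
```

A production system would add caching, replication and consistency guarantees on top of this mapping; the sketch leaves all of that out.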
If all this isn't enough for the IT guys, software-defined storage (SDS) -- a combination of hype cycle and technology shift -- moves the data services that run in most appliances today into VMs, while the data itself is stored on much cheaper hardware from third parties.
SDS will be much more than storage resource management, but it will take time to become a mainstream alternative. Even so, it may be disruptive enough today to motivate EMC to sell out to Dell.
What to buy
Go easy on the 10,000 RPM storage media, since SSD beats it so easily. Data center managers should consider a mix of data storage methods. Invest in SSD/flash front-end storage and repurpose your HDD gear as bulk secondary storage, compressed by the all-flash array or a similar appliance. That should trim quite a bit from the storage budget.
The high-volume vendors -- the ones that supply Google and AWS data centers -- will enter the enterprise market and channels with drastically lower pricing for arrays, and especially for object stores, than traditional storage suppliers. We already have Lenovo licensing IBM's storage and Supermicro delivering boxes; Quanta and the rest won't be far behind. Enterprises could end up paying only a small premium over Google's bulk pricing for drives, waving goodbye to EMC's notion of over $1,000 per drive.
These are good times for data center managers deciding how to buy storage. Data centers will get lower costs and improved features. It takes work and some knowledge of where the industry is moving to keep up, but the savings and benefits are worth it.