
Data growth: The worst enemy of your recovery time objective

If your data has grown to the point that your disaster recovery plan can no longer meet your recovery time objective, it may be time to look at technologies that shrink the recovery data set. Deduplication, storage tiering and refined data management policies can work together to reduce the amount of physical data that must be restored after a disaster.

What do you do if your recovery data set has grown so large that traditional methods of disaster recovery (DR) won't meet your objectives? New technologies such as deduplication, storage tiering and data management policies can alleviate the high costs of a DR plan while also achieving the desired recovery time objective (RTO).

In a previous tip, I told the story of a company that neglected to keep its DR plan up to date. What hit it was the curse of ever-expanding data storage: data growth over a four-year period had rendered its tape restore process completely inadequate for meeting its RTO. Once the problem was discovered, the company changed recovery technology from linear tape to asynchronously replicated storage co-located more than 400 miles from its primary data center -- a solution that costs a great deal more money.

In today's economy, many IT shops do not have the budget to make this kind of change. Also, tape restores have always been slower than backups. The industry has been so focused on tape backup time windows that restore times tend to get lost. When was the last time you saw a vendor publish its technology's restore time?

There are a number of relatively new technologies on the market that, when coupled with some discipline, can help stave off the need to acquire a more expensive solution in order to meet RTOs. Let's look at each of these:


Data deduplication has been bandied about by vendors recently. If you have not done so yet, you should take a look at this technology. Data deduplication is a higher form of compression that seeks out duplicate files within a file system. It can also be implemented at the data block level, seeking duplicate data blocks within a disk volume. What does this give you? It finds and removes duplicate files or blocks from your storage volume and replaces them with small pointers to one "master" data file or block, which can significantly reduce the stored data size. Another advantage is that it can be used in conjunction with traditional file compression, so the "master" file is also smaller as a result of the compression algorithms.
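To make the "pointer to one master copy" idea concrete, here is a minimal sketch of file-level deduplication in Python. It is not any vendor's implementation: hard links stand in for the small pointers a real product would keep in its own metadata, and SHA-256 content hashing is an assumed way to detect duplicates.

```python
import hashlib
import os

def dedupe_files(root):
    """Replace duplicate files under `root` with hard links to one
    "master" copy, keyed by SHA-256 content hash. Hard links stand in
    for the pointers a real deduplication product keeps in metadata."""
    masters = {}    # content digest -> path of the first ("master") copy
    reclaimed = 0   # bytes of duplicate data removed
    for dirpath, _, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            if digest in masters:
                reclaimed += os.path.getsize(path)
                os.remove(path)
                os.link(masters[digest], path)  # pointer to the master copy
            else:
                masters[digest] = path
    return reclaimed
```

Run over a file share where hundreds of employees saved the same attachment, this would leave one physical copy plus hundreds of links, and a restore would only need to move one copy of the data.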

The question to ask is: "Does data deduplication help me with all of the data in my data center?" The answer is no. This technology is especially helpful with unstructured data -- those file servers that hold employee office productivity files. Just think of the case when the human resources department sends out information on benefits open enrollment to all employees. Most employees are going to save that information -- and now you have nearly as many copies of the same information as you have employees. And worse, you are forced to restore all those copies in the case of a disaster. But what about your structured data? Those databases consist of only a handful of large files, so file-level data deduplication is not going to help you. Block-level data deduplication can, however. The savings may not be as great, but in today's world, every little bit helps. Where similar database records yield identical data blocks, those can be deduplicated. However, databases with little repetitive information will not yield noticeable savings.
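A rough feel for why block-level deduplication helps repetitive database data but not high-entropy data can be had from a small sketch. The fixed 4 KB block size is an illustrative assumption; real systems work at the volume level and often use variable-length chunking.

```python
import hashlib

def block_dedupe_ratio(data, block_size=4096):
    """Fraction of fixed-size blocks in `data` that duplicate an
    earlier block, i.e., the savings block-level deduplication could
    achieve on this data. The 4 KB block size is an assumption."""
    seen = set()
    total = 0
    for i in range(0, len(data), block_size):
        seen.add(hashlib.sha256(data[i:i + block_size]).digest())
        total += 1
    return 1 - len(seen) / total if total else 0.0
```

Feeding it a volume of near-identical records yields a ratio close to 1, while random (or compressed/encrypted) data yields essentially 0, matching the point above that databases with little repetitive information will not see noticeable savings.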

Most backup and restore vendors' latest products include data deduplication. You should consider upgrading to leverage these features, especially for your unstructured data volumes. Smaller data sets will always result in shorter restore times in a DR situation.


Storage tiering may offer relief from data growth as well. Older data is moved to secondary tiers of storage, and vendor products that can split off the secondary storage tier can help you meet your RTOs. The process involves a level of data categorization: the data on the primary tier is the most important and is what the RTO applies to. The data on the secondary tier is less critical and can be recovered later, such as older data that is not required for day-to-day business operations but must be available for regulatory or other reasons. Let's look at an example. A data volume holding unstructured data typically has only up to 20% of its data in active use; the other 80% has not been accessed for six months or longer. Applying storage tiering so that the active 20% stays on the primary tier and the older 80% moves to the secondary tier can make restoring the primary data roughly five times faster, and RTOs for the business can easily be met. However, don't forget to designate RTOs for the secondary data. It will eventually need to be recovered.
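The age-based policy behind this kind of tiering can be sketched in a few lines of Python. This is a simplification under stated assumptions: the six-month cutoff comes from the example above, last-access time is read from the file system, and a plain move stands in for the transparent stub or pointer a real tiering product would leave behind on the primary tier.

```python
import os
import shutil
import time

SIX_MONTHS = 180 * 24 * 3600  # cutoff from the example above, in seconds

def tier_by_age(primary, secondary, max_age=SIX_MONTHS, now=None):
    """Move files not accessed within `max_age` seconds from the
    primary tier to the secondary tier, preserving relative paths.
    A real product would leave a stub behind so access stays
    transparent; this sketch simply relocates the file."""
    now = time.time() if now is None else now
    moved = []
    for dirpath, _, names in os.walk(primary):
        for name in names:
            src = os.path.join(dirpath, name)
            if now - os.path.getatime(src) > max_age:
                rel = os.path.relpath(src, primary)
                dst = os.path.join(secondary, rel)
                os.makedirs(os.path.dirname(dst) or secondary, exist_ok=True)
                shutil.move(src, dst)
                moved.append(rel)
    return moved
```

After a run like this, a DR restore of the primary tier only has to move the active minority of the data, which is where the roughly fivefold speedup in the example comes from.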

Unfortunately, the number of data tiering solutions that allow for split recovery of primary and secondary tiers is limited. Make sure you check with storage tiering vendors to ensure that the primary can be restored without the secondary tier. Hopefully more solutions will appear in the future.

Data management policies within your organization can also help get a handle on your RTOs. Similar in concept to storage tiering, this includes corporate policies for deleting older data after it has been archived to offline media (tape, DVD, etc.). Such policies groom the active data by removing older data to keep the size of your active data volumes in check. This is accomplished by automatically purging database records and deleting files that are older than a particular date. Policies need to be determined per data type and importance. While many financial records must be kept permanently, most do not need to be kept on active storage for longer than three years; a good archive of the data is sufficient. Likewise, end-user office productivity data often doesn't need to be kept on active storage for more than 18 months. These policies depend greatly on the line of business and regulatory requirements, so these approaches may not work in all cases.
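An automated grooming job of this kind might look like the sketch below. The retention windows mirror the examples above but are assumptions to be set per business and regulation, and the `archived` set is a stand-in for a real archive catalog; note that it refuses to delete anything not confirmed archived, for the regulatory reasons discussed next.

```python
import os
import time

# Assumed retention windows on ACTIVE storage, in days. Actual values
# depend on the line of business and regulatory requirements.
RETENTION_DAYS = {
    "financial": 3 * 365,  # ~3 years active; permanent copy lives in archive
    "office": 540,         # end-user productivity data, ~18 months
}

def groom(root, data_type, archived, now=None):
    """Delete files under `root` older than the retention window for
    `data_type`, but only if their relative path appears in `archived`
    (a set standing in for a real archive catalog). Returns the lists
    of deleted and skipped (old but unarchived) relative paths."""
    now = time.time() if now is None else now
    cutoff = now - RETENTION_DAYS[data_type] * 24 * 3600
    deleted, skipped = [], []
    for dirpath, _, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            if os.path.getmtime(path) < cutoff:
                if rel in archived:
                    os.remove(path)
                    deleted.append(rel)
                else:
                    skipped.append(rel)  # never delete unarchived data
    return deleted, skipped
```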

The drawback with data management policies is that they require some level of effort to manage and audit. Deleting data that has not been archived could spell regulatory trouble for a company, and the recent Federal Rules of Civil Procedure changes will override deletion policies if a legal hold is declared in your organization. Additionally, search-engine technologies can make all data appear recently accessed: to create content indexes, a search engine must open and read every file, updating its access time in the process. Make sure your search technology keeps a record of the files it has already indexed; otherwise an automated deletion system will never find any files that qualify as not recently accessed.

If your data growth rate is such that these measures will only buy you a short amount of time, you still need to implement alternative technologies to ensure you can continue to meet your RTOs.

