Getting the Most Out of the Deduplication Option and Deduplication Storage Folders

Article: 100003452
Last Published: 2021-08-10
Product(s): Backup Exec


This document provides information about using the Deduplication Option in Backup Exec 2010 R2 and later with deduplication storage folders. It does not necessarily apply when the Deduplication Option is used with OpenStorage devices.


How Backup Exec Deduplication Works With Deduplication Storage Folders

Deduplication works by dividing data into 128 KB segments and storing them in a deduplication storage folder, along with a database that tracks the segments. When a backup encounters a segment that is already in the deduplication storage folder, that segment is not stored again. So if you back up the same unchanged file over and over again, it is stored only one time in the deduplication storage folder.
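
The following Python sketch illustrates the idea described above. It is not Backup Exec code: the SegmentStore class, its method names, and the use of SHA-256 as the segment fingerprint are assumptions made for illustration. Only the 128 KB segment size and the store-each-segment-once behavior come from this article.

```python
import hashlib

SEGMENT_SIZE = 128 * 1024  # 128 KB segments, as described above


class SegmentStore:
    """Toy model of a deduplication storage folder (hypothetical, illustrative only)."""

    def __init__(self):
        self.segments = {}  # fingerprint -> segment bytes, each stored once
        self.catalog = {}   # backup name -> ordered fingerprints (the tracking database)

    def backup(self, name, data):
        """Store data under name and return the number of newly written segments."""
        fingerprints = []
        new_segments = 0
        for offset in range(0, len(data), SEGMENT_SIZE):
            segment = data[offset:offset + SEGMENT_SIZE]
            fingerprint = hashlib.sha256(segment).hexdigest()
            if fingerprint not in self.segments:
                # Only segments that are not already stored take up new space.
                self.segments[fingerprint] = segment
                new_segments += 1
            fingerprints.append(fingerprint)
        self.catalog[name] = fingerprints
        return new_segments

    def restore(self, name):
        """Rebuild a backup by following its list of fingerprints."""
        return b"".join(self.segments[fp] for fp in self.catalog[name])

    def stored_bytes(self):
        """Return the space used by the stored segments (database overhead ignored)."""
        return sum(len(segment) for segment in self.segments.values())


# Four distinct 128 KB segments of deterministic sample data.
data = b"".join(hashlib.sha256(bytes([i])).digest() * 4096 for i in range(4))

store = SegmentStore()
first_new = store.backup("monday", data)    # every segment is new
second_new = store.backup("tuesday", data)  # unchanged data, nothing new to store
print(first_new, second_new, store.stored_bytes())
```

In this sketch the second backup of the same data writes no new segments, so the space used stays at four segments even though two backups can be restored.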

Where the Backup Exec Deduplication Option Works Best

Deduplication happens only when the Deduplication Option detects blocks of data that are in fact the same.  Operating system files deduplicate well.  They are the same across multiple systems and do not change often.  

Deduplication works well in the following scenarios:

  • With Windows and Linux file system data
  • Where the same file is backed up multiple times
  • Where the percentage of data that changes is small

Where Other Backup Exec Options Work Best

Deduplication does not work well if data changes frequently or if the Deduplication Option cannot detect the duplicated blocks of data. For example, when a small amount of new data is inserted at the beginning of a large file, every following block of data is shifted, so none of the blocks match the ones that are already stored. Therefore, the file is not deduplicated.

This segment shift works against the Deduplication Option when a non-file system backup is sent to the deduplication storage folder. Such a backup appears to the deduplication storage folder as one very large stream. Because of this, adding data early in the stream causes the rest of the stream to deduplicate poorly, if at all. Exchange database maintenance is one example of an operation that can have this effect.
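
The segment shift can be demonstrated with a short Python sketch. It assumes that segments are cut at fixed 128 KB offsets from the start of the stream, which matches the behavior described above, and it uses SHA-256 fingerprints purely for illustration. The helper names fingerprints and sample_stream are hypothetical.

```python
import hashlib

SEGMENT_SIZE = 128 * 1024  # 128 KB segments cut at fixed offsets (assumption)


def fingerprints(data, size=SEGMENT_SIZE):
    """Return the set of SHA-256 fingerprints of the fixed-size segments of data."""
    return {
        hashlib.sha256(data[i:i + size]).hexdigest()
        for i in range(0, len(data), size)
    }


def sample_stream(segments=8):
    """Build deterministic, non-repeating sample data that fills whole segments."""
    chunks_per_segment = SEGMENT_SIZE // 32  # a SHA-256 digest is 32 bytes
    return b"".join(
        hashlib.sha256(n.to_bytes(4, "big")).digest()
        for n in range(segments * chunks_per_segment)
    )


original = sample_stream()
stored = fingerprints(original)  # segments already in the storage folder

appended = original + b"X"   # one new byte at the end of the stream
prepended = b"X" + original  # one new byte at the beginning of the stream

matches_after_append = len(stored & fingerprints(appended))
matches_after_prepend = len(stored & fingerprints(prepended))
print(matches_after_append, matches_after_prepend)
```

With this sample data, appending a byte leaves all eight original segments matching, while inserting a single byte at the start leaves none of them matching, because every segment boundary now falls one byte later in the data.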

Expectations for the Deduplication Option

Deduplication is data-dependent. The amount of deduplication that you get from a particular data set depends on what is in that data set. Data that is entirely unique does not benefit from deduplication. Data that contains many copies of the same data does.

If a terabyte of source data contains no duplicate information, the deduplication storage folder needs a terabyte of space to store it.
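
As a rough illustration of this data dependence, the Python sketch below compares two data sets of the same size. The storage_needed and make_segment helpers are hypothetical, SHA-256 is an assumed fingerprint, and the estimate ignores the space used by the tracking database.

```python
import hashlib

SEGMENT_SIZE = 128 * 1024  # 128 KB segments


def storage_needed(data, size=SEGMENT_SIZE):
    """Return the bytes needed when each distinct segment is kept only once."""
    unique = {}
    for i in range(0, len(data), size):
        segment = data[i:i + size]
        unique[hashlib.sha256(segment).digest()] = len(segment)
    return sum(unique.values())


def make_segment(seed):
    """Build one deterministic 128 KB segment that differs for each seed."""
    return hashlib.sha256(bytes([seed])).digest() * (SEGMENT_SIZE // 32)


all_unique = b"".join(make_segment(seed) for seed in range(8))  # 8 different segments
all_copies = make_segment(0) * 8                                # 8 copies of one segment

unique_cost = storage_needed(all_unique)
copies_cost = storage_needed(all_copies)
print(len(all_unique), unique_cost)
print(len(all_copies), copies_cost)
```

Both data sets are the same size, but the all-unique data needs its full size in storage, while the data made of copies needs space for only one segment.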

A deduplication storage folder has significant memory and disk space requirements. Make sure to review the requirements for the Deduplication Option before implementing it. The option may initially work on a system that does not meet these requirements, but as time goes by and the deduplication storage folder fills up, a lack of memory and disk space will cause problems.

A deduplication storage folder is significantly more complex than a backup-to-disk folder (known as disk storage in Backup Exec 2012). Detecting duplicate data, tracking it in a database, and managing the interconnected links in the deduplication folder all add up to significant memory and CPU usage. Memory, processing, and time are traded for reduced storage space requirements. Consider this trade-off when choosing a deduplication storage folder over a backup-to-disk folder (or disk storage in Backup Exec 2012).



