About best practices for using the FileStore deduplication feature

  • Article ID: 000065638



The following are best practices when using the FileStore deduplication feature:

  • For file systems with block sizes of 4K and above, deduplication is most effective when the file system block size and the deduplication block size are the same. Matching the two also allows the deduplication process to estimate space savings more accurately.

  • The smaller the file system block size and the deduplication block size, the longer deduplication takes. Smaller block sizes, for example, 1K and 2K, increase the number of data fingerprints that the deduplication database has to store.

    Though the optimal file system block size is data-dependent, the recommended block size for deduplication is 4K for file systems smaller than 1 TB, and 8K for file systems of 1 TB and above.

  • For VMware NFS datastores that store Virtual Machine Disk Format (VMDK) images, a 4K block size is optimal.

  • Compressed media files, such as JPEG images, MP3 audio, and MOV video, as well as databases, do not deduplicate or compress effectively.

  • Home directory file systems are good candidates for deduplication.

  • Data archive and retention (DAR) file systems may be good candidates for deduplication, depending on the workload. FileStore deduplication is more effective if deduplication is turned off in Symantec Enterprise Vault (EV) beforehand.

  • Deduplication is a CPU and I/O intensive process. It is a best practice to schedule deduplication when the load on your file systems is expected to be low.

  • Changes to the file system are evaluated through the file system's File Change Log (FCL). If deduplication is scheduled too infrequently, the FCL may roll over, causing file system changes, and therefore deduplication opportunities, to be missed.

  • After you enable deduplication on a file system with existing data, the first deduplication run performs a full deduplication. This can be time-consuming and may take 12 to 15 hours per TB, so plan accordingly.

  • The deduplication database takes up 1% to 7% of the logical file system data. In addition, temporary storage space is required while deduplication runs. Although 15% free space is enforced, 30% free space is recommended when the deduplication block size is less than 4096 bytes (4K).

  • Any file system can be enabled for both DAR and deduplication, but you must enable DAR first and then deduplication. The reverse sequence is not supported.

  • If you plan to use the deduplication scheduler, you must have a Network Time Protocol (NTP) server enabled and configured.

    See About coordinating cluster nodes to work with NTP servers.
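The block-size recommendations above amount to a simple rule. As an illustration only (this is not a FileStore command or API, just a sketch of the guidance), it can be expressed as:

```python
def recommended_dedup_block_size(fs_size_tb):
    """Recommended deduplication block size in bytes, per the
    guidance above: 4K for file systems under 1 TB, 8K for
    file systems of 1 TB and above."""
    return 4096 if fs_size_tb < 1 else 8192

print(recommended_dedup_block_size(0.5))  # 4096
print(recommended_dedup_block_size(2))    # 8192
```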
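To plan a maintenance window for the first full deduplication run, the 12 to 15 hours per TB figure above can be turned into a rough estimate. This is an illustrative calculation only; actual times vary with hardware, load, and data:

```python
def first_full_run_hours(size_tb, low_per_tb=12, high_per_tb=15):
    """Rough (low, high) duration estimate in hours for the first
    full deduplication run, at 12 to 15 hours per TB."""
    return size_tb * low_per_tb, size_tb * high_per_tb

low, high = first_full_run_hours(4)
print(f"A 4 TB file system may take {low} to {high} hours")  # 48 to 60
```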
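The space figures above (a deduplication database of 1% to 7% of logical data, 15% free space enforced, and 30% recommended below a 4K deduplication block size) can likewise be sketched as a rough capacity check. Again, this is illustrative arithmetic, not FileStore functionality:

```python
def dedup_space_estimate(logical_tb, dedup_block_bytes):
    """Return (db_low_tb, db_high_tb, recommended_free_fraction):
    the database size range (1% to 7% of logical data) and the
    recommended free space (30% below a 4K deduplication block
    size, 15% otherwise)."""
    db_low_tb = logical_tb * 0.01
    db_high_tb = logical_tb * 0.07
    free_fraction = 0.30 if dedup_block_bytes < 4096 else 0.15
    return db_low_tb, db_high_tb, free_fraction

# A 10 TB file system with a 2K deduplication block size:
# database of roughly 0.1 to 0.7 TB, with 30% free space recommended.
print(dedup_space_estimate(10, 2048))
```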

Terms of use for this information are found in Legal Notices.


