How to adjust the MSDP configuration for sampling and predictive cache from the deduplication shell

Article: 100060440
Last Published: 2025-09-22
Product(s): NetBackup, Appliances

Description

New installations starting with NetBackup (NBU) 10.2 have a new set of parameters that can be set from the deduplication shell (WORM CLISH), but the NetBackup 10.2 deduplication guide does not cover this topic. This article fills that gap for the queries that are likely to come in. It explains the concepts of sampling cache and predictive cache and shows how to adjust the related parameters when needed.

Note: Performance problems may be observed after NetBackup Appliances are upgraded from 4.1.x.x, 5.0.x.x, or 5.1.x.x. Symptoms may include slow backups and replications after the upgrade, logins to the appliance that take several minutes, and performance problems that a reboot resolves only for a short time before they return. These symptoms may be resolved by editing contentrouter.cfg on the appliance and enabling Predictive Sampling Cache with the default settings listed in the MSDP non-BYO platforms table below.

Introduction to sampling and predictive cache

 

MSDP uses memory up to a size configured in MaxCacheSize to cache fingerprints for efficient deduplication lookup. A new fingerprint cache lookup data scheme introduced in NetBackup release 10.1 reduces memory usage.

It splits the current memory cache into two components, sampling cache (S-cache) and predictive cache (P-cache):

      S-cache caches a percentage of the fingerprints from each backup and is used to find similar data from the samples of previous backups for deduplication.

      P-cache caches the fingerprints that are most likely to be used in the immediate future for deduplication lookup.

At the start of a job, a small portion of the fingerprints from its last backup is loaded into P-cache as initial seeding. Fingerprint lookup is first done against P-cache to find duplicates. Lookup misses are then searched in the S-cache samples to find possible matches in previous backup data. If a match is found, part of the matched backup's fingerprints is loaded into P-cache for future deduplication.
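The lookup flow described above can be illustrated with a toy model. The following Python sketch is not MSDP code. The class name, the dictionary-based caches, and the sample_every and load_count parameters are assumptions made purely for illustration, and the model ignores cache capacity limits.

```python
class TwoTierFingerprintCache:
    """Toy model of P-cache/S-cache lookup (illustrative only, not MSDP code)."""

    def __init__(self, backups, sample_every=4, load_count=8):
        # backups: hypothetical mapping of backup ID -> ordered list of fingerprints
        self.backups = backups
        self.load_count = load_count
        self.p_cache = set()
        # S-cache: keep a sample of each backup's fingerprints, mapped to
        # (backup ID, position of the sample in that backup).
        self.s_cache = {}
        for backup_id, fps in backups.items():
            for pos in range(0, len(fps), sample_every):
                self.s_cache[fps[pos]] = (backup_id, pos)

    def seed(self, last_backup_id, count):
        # Initial seeding: load a small portion of the last backup into P-cache.
        self.p_cache.update(self.backups[last_backup_id][:count])

    def lookup(self, fp):
        # 1. Try P-cache first.
        if fp in self.p_cache:
            return "p-cache hit"
        # 2. On a miss, search the S-cache samples.
        match = self.s_cache.get(fp)
        if match is None:
            return "miss"
        # 3. Load part of the matched backup into P-cache for future lookups.
        backup_id, pos = match
        self.p_cache.update(self.backups[backup_id][pos:pos + self.load_count])
        return "s-cache hit"


backups = {"monday": [f"fp{i}" for i in range(20)]}
cache = TwoTierFingerprintCache(backups, sample_every=4, load_count=8)
cache.seed("monday", count=2)
print(cache.lookup("fp0"))  # p-cache hit (seeded)
print(cache.lookup("fp8"))  # s-cache hit (sampled; loads fp8..fp15 into P-cache)
print(cache.lookup("fp9"))  # p-cache hit (loaded by the previous lookup)
print(cache.lookup("fp3"))  # miss (not sampled and not yet loaded)
```

The last lookup shows why cache sizing matters in this model: a fingerprint that exists in a previous backup is still treated as new data when it is neither sampled nor already loaded.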

The S-cache and P-cache fingerprint lookup method is enabled for local and cloud storage volumes on MSDP non-BYO deployments, including Flex, Flex WORM, Flex Scale, NetBackup Appliance, AKS, and EKS deployments. It is also enabled for cloud-only volumes on MSDP BYO platforms. On platforms with cloud-only volume support, the local volume still uses the original cache lookup method. The S-cache and P-cache configuration parameters are in the Cache section of the configuration file contentrouter.cfg.

Starting with NetBackup 10.2, new setups of Flex, Flex WORM, and NetBackup Appliance use the S-cache and P-cache fingerprint lookup method for local storage. An upgrade does not change the fingerprint lookup method that is already in use.

The default values for MSDP BYO platforms:

Configuration                                            Default value
MaxCacheSize                                             50%
MaxPredictiveCacheSize                                   20%
MaxSamplingCacheSize                                     5%
EnableLocalPredictiveSamplingCache (contentrouter.cfg)   false
EnableLocalPredictiveSamplingCache (spa.cfg)             false


The default values for MSDP non-BYO platforms:

Configuration                                            Default value
MaxCacheSize                                             512MiB
MaxPredictiveCacheSize                                   40%
MaxSamplingCacheSize                                     20%
EnableLocalPredictiveSamplingCache (contentrouter.cfg)   true
EnableLocalPredictiveSamplingCache (spa.cfg)             true


For MSDP non-BYO deployments, the local volume and the cloud volume share the same S-cache and P-cache size. For BYO deployments, S-cache and P-cache are used only for the cloud volume, and MaxCacheSize is still used for the local volume. If the system is not used for cloud backups, MaxPredictiveCacheSize and MaxSamplingCacheSize can be set to a small value, for example, 1% or 128MiB, and MaxCacheSize can be set to a large value, for example, 50% or 60%. Similarly, if the system is used for cloud backups only, MaxCacheSize can be set to 1% or 128MiB, and MaxPredictiveCacheSize and MaxSamplingCacheSize can be set to larger values.

The S-cache size is determined by the back-end MSDP capacity, that is, by the number of fingerprints in the back-end data. Assuming an average segment size of 32 KB, the S-cache needs about 100 MB per TB of back-end capacity. The P-cache size is determined by the number of concurrent jobs and by the data locality (working set) of the incoming data. Plan for a working set of about 250 MB per stream (about 5 million fingerprints).

          Example: 100 concurrent streams need at least 25 GB of memory (100 * 250 MB).

The working set can be larger for certain applications with multiple streams and large data sets. P-cache is used for fingerprint deduplication lookup, and every fingerprint that is loaded into P-cache stays there until its allocated capacity is reached. A larger P-cache therefore gives a better potential lookup hit rate at the cost of more memory. Under-sizing S-cache or P-cache reduces deduplication rates, and over-sizing increases the memory cost.
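The sizing rules of thumb above can be turned into a quick estimate. The following Python sketch is only a back-of-the-envelope calculator. The function name and the 200 TB example capacity are hypothetical, and it assumes the figures of 100 MB per TB and 250 MB per stream apply unchanged to your workload.

```python
S_CACHE_MB_PER_TB = 100      # from the text: about 100 MB per TB of back-end capacity
P_CACHE_MB_PER_STREAM = 250  # from the text: about 250 MB working set per stream


def estimate_cache_sizes_mb(backend_capacity_tb, concurrent_streams):
    """Return (s_cache_mb, p_cache_mb) using the rules of thumb above."""
    s_cache_mb = backend_capacity_tb * S_CACHE_MB_PER_TB
    p_cache_mb = concurrent_streams * P_CACHE_MB_PER_STREAM
    return s_cache_mb, p_cache_mb


# Hypothetical system: 200 TB of back-end capacity, 100 concurrent streams.
s_mb, p_mb = estimate_cache_sizes_mb(backend_capacity_tb=200, concurrent_streams=100)
print(f"S-cache: {s_mb / 1000:.1f} GB, P-cache: {p_mb / 1000:.1f} GB")
# prints "S-cache: 20.0 GB, P-cache: 25.0 GB"
```

Treat the result as a starting point only, because the working set per stream can be larger for some applications.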

 

Tuning the MSDP configuration from the deduplication shell

 

The default MSDP configuration should work for most installations. If you need to adjust it, use the following restricted shell commands to set or view the parameters.

Parameter: AllocationUnitSize
Description: The allocation unit size for the data on the server.
To set the parameter: setting set-MSDP-param allocation-unit-size value=<number of MiB>
To view the parameter: setting get-MSDP-param allocation-unit-size

Parameter: DataCheckDays
Description: The number of days to check the data for consistency.
To set the parameter: setting set-MSDP-param data-check-days value=<number of days>
To view the parameter: setting get-MSDP-param data-check-days

Parameter: LogRetention
Description: The length of time to keep logs.
To set the parameter: setting set-MSDP-param log-retention value=<number of days>
To view the parameter: setting get-MSDP-param log-retention

Parameter: MaxCacheSize
Description: The maximum size of the NetBackup Deduplication Engine (spoold) fingerprint cache.
To set the parameter: setting set-MSDP-param max-cache-size value=<number of GB>
To view the parameter: setting get-MSDP-param max-cache-size

Parameter: MaxRetryCount
Description: The maximum number of times to retry a failed transmission.
To set the parameter: setting set-MSDP-param max-retry-count value=<number of retry times>
To view the parameter: setting get-MSDP-param max-retry-count

Parameter: SpadLogging
Description: The log level for the NetBackup Deduplication Manager (spad).
To set the parameter: setting set-MSDP-param spad-logging log_level=<value>
To view the parameter: setting get-MSDP-param spad-logging

Parameter: SpooldLogging
Description: The log level for the NetBackup Deduplication Engine (spoold).
To set the parameter: setting set-MSDP-param spoold-logging log_level=<value>
To view the parameter: setting get-MSDP-param spoold-logging

Parameter: WriteThreadNum
Description: The number of threads for writing data to the data container in parallel.
To set the parameter: setting set-MSDP-param write-thread-num value=<number of threads>
To view the parameter: setting get-MSDP-param write-thread-num

Parameter: MaxCacheSize (Cluster)
Description: The maximum size of the spoold fingerprint cache for all nodes in a cluster.
To set the parameter: setting set-MSDP-param max-cache-size-cluster value=<number>
To view the parameter: setting get-MSDP-param max-cache-size-cluster

Parameter: MaxPredictiveCacheSize (Cluster)
Description: The maximum size of the spoold predictive cache for all nodes in a cluster.
To set the parameter: setting set-MSDP-param max-predictive-cache-size-cluster value=<number of bytes>
To view the parameter: setting get-MSDP-param max-predictive-cache-size-cluster

Parameter: MaxSamplingCacheSize (Cluster)
Description: The maximum size of the spoold sampling cache for all nodes in a cluster.
To set the parameter: setting set-MSDP-param max-sampling-cache-size-cluster value=<number of bytes>
To view the parameter: setting get-MSDP-param max-sampling-cache-size-cluster

Parameter: UsableMemoryLimit (Cluster)
Description: The maximum usable memory size in spoold for all nodes in a cluster.
To set the parameter: setting set-MSDP-param usable-memory-limit-cluster value=<number>
To view the parameter: setting get-MSDP-param usable-memory-limit-cluster

Parameter: EnableLocalPredictiveSamplingCache (Cluster)
Description: Enables or disables the local predictive sampling cache for all nodes in the cluster. Both spoold and spad have this parameter, and it should be kept in sync between them.
To set the parameter: setting set-MSDP-param enable-local-predictive-sampling-cache-cluster value=<true/false>
To view the parameter: setting get-MSDP-param enable-local-predictive-sampling-cache-cluster
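As a convenience, the cluster-level commands listed above can be generated from a small lookup table. The following Python sketch only builds the command strings. It does not run them, and the helper names are hypothetical. The example value of 25000000000 bytes (25 GB) is an assumption carried over from the 100-stream sizing example, so confirm the value format that the deduplication shell accepts on your system before use.

```python
# Shell parameter names for the cluster-level settings, as listed above.
CLUSTER_PARAMS = {
    "MaxCacheSize": "max-cache-size-cluster",
    "MaxPredictiveCacheSize": "max-predictive-cache-size-cluster",
    "MaxSamplingCacheSize": "max-sampling-cache-size-cluster",
    "UsableMemoryLimit": "usable-memory-limit-cluster",
    "EnableLocalPredictiveSamplingCache": "enable-local-predictive-sampling-cache-cluster",
}


def set_command(parameter, value):
    """Build the 'set' command string for a cluster-level parameter."""
    return f"setting set-MSDP-param {CLUSTER_PARAMS[parameter]} value={value}"


def get_command(parameter):
    """Build the 'view' command string for a cluster-level parameter."""
    return f"setting get-MSDP-param {CLUSTER_PARAMS[parameter]}"


# Example: a 25 GB predictive cache expressed in bytes (assumed value format).
print(set_command("MaxPredictiveCacheSize", 25 * 1000**3))
print(get_command("MaxPredictiveCacheSize"))
print(set_command("EnableLocalPredictiveSamplingCache", "true"))
```

The printed lines are meant to be reviewed and then typed into the deduplication shell by an administrator.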


 

 

References

Etrack : 4133395
