How to complete MSDP Fingerprint Pre-Conversion - Best Practices

Article: 100045193
Last Published: 2019-10-30
Product(s): NetBackup

Description

MSDP Fingerprint Pre-Conversion Best Practices

Executive Summary 

NetBackup 8.1+ changes the MSDP fingerprint algorithm from MD5 to SHA2. This change was necessary to meet current industry security standards such as FIPS 140-2. Because fingerprint comparisons are central to the function of a deduplication system, all existing data in the pool must have its fingerprints converted from MD5 to SHA2. Several issues have been identified that can occur during this process and result in poor performance or unexpected storage growth.

This document is targeted at upgrades from pre-8.1 (NBU Software)/3.1 (NBU Appliance) to 8.1/3.1 or later. Fingerprint conversion is a one-time event and no further conversions are required on subsequent releases (e.g. 3.1 to 3.1.1 or 3.1.2). Careful planning of the upgrade is critical to success. This document provides best practices for this upgrade and will assist in planning to avoid most issues that may arise.

Please read this document in its entirety to understand the entire process and incorporate the relevant best practices into your upgrade plan. If additional questions arise, please contact your Veritas Account Team or Veritas Technical Support. 

 

Background Information 

MSDP Structure 

To fully understand the conversion process and its impact on NetBackup, it is important to understand the basic structure of MSDP. There are several terms that will be used throughout this document: 

Term | Definition
---- | ----------
Deduplication Manager (spad) | Process that collects metadata information about backup jobs.
Deduplication Engine (spoold) | Process that manages the storage of deduplicated data.
Segment Object (SO) | One deduplicated segment of data. Identified with an MD5 (pre-8.1) or SHA2 fingerprint. SOs are stored in Data Containers. SO fingerprints are converted from MD5 to SHA2 after upgrade to 8.1+.
Data Container | Physical file that contains data in the deduplication pool. Can contain multiple SOs and DOs. Managed by the Deduplication Engine. Each data container has a Data Container ID (DCID).
Data Object (DO) | Contains a map of the SOs needed to reconstruct a file stored in MSDP. Corresponds to a single PO. Each DO has its own fingerprint. Pre-conversion DOs use an MD5 fingerprint; post-conversion DOs use a SHA2 fingerprint. The DO fingerprints are not converted, as they are used for internal MSDP reference only. Stored in Data Containers.
Path Object (PO) | Contains metadata information about the backup data such as image name, path, and other information. Contains a reference to the corresponding DO fingerprint. Stored in the Deduplication Manager’s mini-catalog.
Deduplication Plugin (pdplugin) | Deduplication OST plugin. Performs data segmentation and fingerprinting. Runs on the MSDP server and fingerprint media servers. Also runs on the client when Client Direct is used.
Client Fingerprint Cache | In-memory fingerprint cache for a given client-policy combination that is created dynamically as needed by the deduplication plugin. Contains the most recent fingerprints for the client. The default size of this cache is 20 MB.
Global Fingerprint Cache | In-memory fingerprint cache containing the most recent fingerprints from the entire MSDP pool.
Client Direct | Also known as client-side deduplication. The deduplication plugin runs on the client and communicates directly with the deduplication engine.
Fingerprint Map | During the conversion, a map is created for each container that contains the MD5 to SHA2 mappings for each converted SO.
Mount Point | Large MSDP pools will have multiple filesystems in use to store the MSDP data. Each of these is a mount point.
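
The relationships between these objects can be pictured with a small data model. The sketch below is illustrative only: the class names, field names, and the `containers_for` helper are hypothetical and do not reflect the actual on-disk MSDP format. It shows how a PO points to a DO, how the DO lists the SOs needed to rebuild a file, and how each SO lives in a data container identified by a DCID.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SegmentObject:
    fingerprint: str  # MD5 before 8.1, SHA2 afterwards
    dcid: int         # ID of the Data Container that holds this segment


@dataclass
class DataObject:
    fingerprint: str            # DO fingerprint, internal reference only
    so_fingerprints: List[str]  # ordered SOs needed to reconstruct the file


@dataclass
class PathObject:
    image_name: str
    path: str
    do_fingerprint: str         # reference to the corresponding DO


def containers_for(po: PathObject,
                   dos: Dict[str, DataObject],
                   sos: Dict[str, SegmentObject]) -> List[int]:
    """Return the sorted DCIDs that hold the segments behind one PO."""
    data_object = dos[po.do_fingerprint]
    return sorted({sos[fp].dcid for fp in data_object.so_fingerprints})
```

In this model, converting a container means recalculating the fingerprint of every SO it holds, which is why conversion progress is tracked per DCID.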

 

Objectives During Conversion 

There are two primary objectives during fingerprint conversion: 

  1. Limit impact on customer operations and backup performance 

  2. Limit unexpected data growth during conversion 
     

Pre vs Post Upgrade Conversion 

Fingerprint pre-conversion is available for NetBackup 7.7.3 and 8.0 MSDP pools running on Windows, RHEL, Flex, and NetBackup Appliance platforms only. On these platforms, this is the recommended procedure to follow, as it creates the appropriate MD5 to SHA2 fingerprint maps before they are needed. This has multiple advantages: 

  1. Reduced operational impact from the conversion process. Since the SHA2 fingerprints are calculated before they are needed, conversion can be performed at a slower rate without any impact to the operation of the system. 

  2. Reduced risk. Pre-calculation of SHA2 fingerprints reduces several areas of risk, such as unexpected storage growth from fingerprint cache overflows and conversion failures due to undetected issues in the pool. 
     

Planning the Upgrade 

Planning the upgrade is critical to success:

  1. Plan an outage. Expect it to be longer than a typical outage due to installation of post-upgrade EEBs and setting of tuning parameters. If possible, take an extended outage to allow fingerprint conversion to run without competition for system resources. 

  2. Inventory your systems: 

     a. Masters 

     b. Media servers, identifying replication and opt-dup pairs 

     c. Clients using Client Direct 

     d. Policies using Accelerator 

  3. Obtain the software and required EEBs. 

  4. Understand normal operating parameters (system load, throughput, deduplication rates) before the upgrade. 
     

Running Commands on Appliances 

There are several command line tools referenced in this document. These commands can be run as a NetBackupCLI user and do not require the elevated Maintenance prompt.
 

Required EEBs 

The following hotfixes/EEBs are available in the Download Center.  For an overview on how to use the Download Center go here.

Pre-Upgrade for Systems Supporting Pre-Upgrade Conversion 

These EEBs should be applied to all systems using the pre-upgrade conversion method. 

NetBackup 7.7.3/NetBackup Appliance 2.7.3 

NetBackup 7.7.3 / 2.7.3 HotFix - MD5-SHA256 Fingerprint Pre-conversion (article 100044681)

NetBackup 8.0/NetBackup Appliance 3.0 

NetBackup 8.0 / 3.0 MSDP Hotfix - MD5-SHA256 FP pre-conversion (article 100044682)

Post-Upgrade, all systems 

These EEBs should be applied to all systems post-upgrade. 

NetBackup 8.1 / NetBackup Appliance 3.1: 

NetBackup 8.1 / 3.1 MSDP EEB Bundle (article 100041057)

NetBackup 8.1.1 / NetBackup Appliance 3.1.1: 

NetBackup 8.1.1 / 3.1.1 MSDP hotfix - EEB Bundle (article 100043461)

NetBackup 8.1.2 / NetBackup Appliance 3.1.2: 

NetBackup 8.1.2 / 3.1.2 MSDP HotFix - post-upgrade performance issues (article 100044088)

The features and best practices shared in this document depend on these EEBs being present.  
 

Pre-Upgrade Conversion Process and Performance 

The pre-upgrade conversion EEBs create the required MD5->SHA2 map files before the upgrade is performed. The conversion process is managed via two lists of DCIDs: 

  1. Normal DCID List: Created at spoold startup; all DCIDs are added to this list. A separate list is created for each mount point. 

  2. Prior DCID List: All newly created DCIDs referenced by new backup images are appended to this list. A separate list is created for each mount point. 

Each conversion thread fetches a batch of DCIDs from the Prior DCID List. If the Prior DCID List is empty, it fetches DCIDs from the Normal DCID List. After each batch is processed, the thread sleeps for a pre-defined period. 
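
The batch selection described above can be sketched as follows. This is a simplified model under stated assumptions, not the spoold implementation: the function names are hypothetical, a batch is assumed to come from a single list (it is not topped up from the Normal list when the Prior list runs short), and the default batch size and sleep interval mirror the default values this document gives for FPConvertBatchNum and FPConvertSleepSeconds.

```python
import time
from collections import deque
from typing import Callable, Deque, List


def fetch_batch(prior: Deque[int], normal: Deque[int],
                batch_num: int = 20) -> List[int]:
    """Take up to batch_num DCIDs; use the Normal list only if Prior is empty."""
    source = prior if prior else normal
    batch: List[int] = []
    while source and len(batch) < batch_num:
        batch.append(source.popleft())
    return batch


def run_conversion(prior: Deque[int], normal: Deque[int],
                   convert: Callable[[int], None],
                   batch_num: int = 20, sleep_seconds: float = 5.0,
                   sleep: Callable[[float], None] = time.sleep) -> None:
    """Convert every listed container, sleeping after each batch."""
    while prior or normal:
        for dcid in fetch_batch(prior, normal, batch_num):
            convert(dcid)  # hypothetical per-container conversion step
        sleep(sleep_seconds)
```

Because the Prior list is always drained first, a pool that receives new backups continuously can keep a thread busy with new containers, which is why progress on the Normal list is worth checking separately.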

There are three modes that can be used.  

Mode | Switch | Description
---- | ------ | -----------
Normal | | Default mode. One thread is created for each mount point. The priority of this thread is higher than CRC checking but lower than compaction.
Fast | crcontrol --fpconvertmode 1 | Conversion disables cyclic redundancy checks and compaction. Multiple conversion threads are created for each mount point, and these run at the same priority as backup threads.
Single Mount Point Fast Mode | crcontrol --fpconvertmode 2 | Like Fast mode, except only one mount point runs in fast mode. All other mount points run in normal mode. Once the fast mode mount point completes, the process will switch and run the next mount point in fast mode. Available on appliances only.

 

Client Fingerprint Cache 

The deduplication plugin loads fingerprints into a cache for each client backup. This cache is used to determine if data segments are new or not. If a fingerprint is not found in the client fingerprint cache, the segment is passed to the deduplication engine and compared against the global fingerprint cache. If there is no match in the global fingerprint cache, the segment is considered new and is written to disk.  
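
The two-level lookup described above can be sketched as a small function. This is a minimal illustration, assuming both caches can be modeled as sets of fingerprint strings; the function name and return values are hypothetical and are not part of the deduplication plugin.

```python
from typing import Set


def classify_segment(fingerprint: str,
                     client_cache: Set[str],
                     global_cache: Set[str]) -> str:
    """Report where a fingerprint matched, or 'new' if it must be written."""
    if fingerprint in client_cache:
        return "client-cache hit"   # resolved by the deduplication plugin
    if fingerprint in global_cache:
        return "global-cache hit"   # resolved by the deduplication engine
    return "new"                    # no match: segment is written to disk
```

Only segments that miss both caches consume additional storage, so anything that reduces the hit rate of these caches shows up as data growth in the pool.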
 

Pre-Upgrade Conversion Process Considerations and Best Practices 

The pre-upgrade conversion EEBs provide the least disruptive process for fingerprint conversion, and generally also require less monitoring and intervention during the process.

 

Stage 1: Tuning parameters after applying 8.0/7.7.3 EEB (several days) 

After installing the pre-conversion EEB, fingerprint conversion will run automatically in normal mode. Allow the process to run for 24-48 hours with normal backup and replication jobs. After this time, the progress of the conversion can be checked with: 

nbuser@appliance:~> crcontrol --fpconvertstate 
***** Fingerprint Conversion (MD5 to SHA256) ****** 
Status                    : ON 
Mode                      : Normal 
Busy                      : No 
Converted Prior DC count  : 9 
Total Prior DC Count      : 10 
Converted Normal DC Count : 10 
Total Normal DC Count     : 1000 
Converted percentage      : 1.9%(total), 90.0%(prior), 1.0%(normal) 

 

There will be some inaccuracy in this report, as the DC count is not decreased at the time of container deletion for performance reasons. If a more accurate picture of progress is required, progress can be re-calculated with: 

nbuser@appliance:~> dcscan --check-fp-convert 
Container count need to be converted : 991 
Total container count                : 1010 
Converted percentage                 : 1.9% 
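
The percentages in the crcontrol report can be recomputed from its raw counters, which is convenient when progress is logged over time. The sketch below parses the counter lines from the sample output shown above. The function names are hypothetical, the parsing relies only on the line labels visible in that sample, and treating an empty list as 100% complete is an assumption.

```python
import re
from typing import Dict

# Counter lines copied from the sample crcontrol --fpconvertstate output.
SAMPLE = """\
Converted Prior DC count  : 9
Total Prior DC Count      : 10
Converted Normal DC Count : 10
Total Normal DC Count     : 1000
"""

_COUNTER = re.compile(
    r"^\s*(Converted|Total) (Prior|Normal) DC [Cc]ount\s*:\s*(\d+)")


def parse_counts(text: str) -> Dict[str, int]:
    """Extract the four DC counters, keyed like 'converted_prior'."""
    counts: Dict[str, int] = {}
    for line in text.splitlines():
        match = _COUNTER.match(line)
        if match:
            kind, dc_list, value = match.groups()
            counts[f"{kind.lower()}_{dc_list.lower()}"] = int(value)
    return counts


def percent(done: int, total: int) -> float:
    """Percentage to one decimal; an empty list is assumed complete."""
    return 100.0 if total == 0 else round(100.0 * done / total, 1)


def summarize(counts: Dict[str, int]) -> Dict[str, float]:
    """Recompute the total, prior, and normal percentages."""
    done = counts["converted_prior"] + counts["converted_normal"]
    total = counts["total_prior"] + counts["total_normal"]
    return {
        "total": percent(done, total),
        "prior": percent(counts["converted_prior"], counts["total_prior"]),
        "normal": percent(counts["converted_normal"], counts["total_normal"]),
    }
```

For the sample, `summarize(parse_counts(SAMPLE))` gives 1.9 (total), 90.0 (prior), and 1.0 (normal), matching the "Converted percentage" line in the report.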

 

There are two items to check at this point. First, confirm that the converted percentage of normal DCIDs shows progress. If it does not, only new DCs are being converted, and the conversion process needs to be made more aggressive. Second, use the total converted percentage to extrapolate how long the conversion is expected to take. For example, if 3.3% of the pool is completed in 24 hours, the total conversion should take about 30 days. If the conversion rate is slower than desired, adjustments should be made to make the process more aggressive.
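
The extrapolation in the example above is a simple proportion. The sketch below assumes the conversion rate stays constant, which is an approximation because the rate varies with backup load; the function name is hypothetical.

```python
def estimate_total_days(percent_done: float, hours_elapsed: float) -> float:
    """Linear estimate of the total conversion time in days."""
    if percent_done <= 0:
        raise ValueError("no progress yet; cannot extrapolate")
    days_elapsed = hours_elapsed / 24.0
    return days_elapsed * 100.0 / percent_done


# 3.3% converted after 24 hours gives roughly 30 days in total.
print(round(estimate_total_days(3.3, 24)))
```

Re-running the estimate after each tuning change gives a quick check on whether the change actually increased the conversion rate.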

 

If minor tuning is needed, the following parameters can be added to contentrouter.cfg. The example below shows the default values: 

[FPConvert] 
FPConvertCheckBusyEachTime=true 
FPConvertSleepSeconds=5 
FPConvertBatchNum=20  

 

This section does not exist by default and must be created.

The table below provides detail on these parameters.  

 

Parameter | Default Value | Takes Effect After | Description
--------- | ------------- | ------------------ | -----------
FPConvertCheckBusyEachTime | true | Stop->Start Conversion | If true, check whether the pool is busy (backup/CRQP) before each container. If false, check whether the pool is busy (backup/CRQP) before each batch. Setting this to false makes conversion more aggressive in normal mode.
FPConvertSleepSeconds | 5 | Stop->Start Conversion | The interval in seconds that threads sleep between converting each container in normal mode.
FPConvertBatchNum | 20 | Stop->Start Conversion | The number of containers in each batch in normal mode.
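
To avoid typing mistakes when preparing a tuned section, the fragment can be generated from the documented defaults. The sketch below only builds the text of an [FPConvert] section; it does not read or modify contentrouter.cfg, and the helper name is hypothetical. The override used in the example is for illustration, not a recommended value.

```python
from typing import Dict, Union

# Defaults as listed in this document for normal-mode tuning.
DEFAULTS: Dict[str, str] = {
    "FPConvertCheckBusyEachTime": "true",
    "FPConvertSleepSeconds": "5",
    "FPConvertBatchNum": "20",
}


def render_fpconvert(overrides: Dict[str, Union[bool, int, str]]) -> str:
    """Return the [FPConvert] section text with the given overrides applied."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown parameter(s): {sorted(unknown)}")
    merged = dict(DEFAULTS)
    for key, value in overrides.items():
        merged[key] = str(value).lower()  # True/False become true/false
    lines = ["[FPConvert]"] + [f"{k}={v}" for k, v in merged.items()]
    return "\n".join(lines) + "\n"


# Example: check pool activity per batch instead of per container.
print(render_fpconvert({"FPConvertCheckBusyEachTime": False}))
```

Rejecting unknown keys catches misspelled parameter names before they reach the configuration file.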

 

After changing any of these values, conversion must be stopped/started in order for the changes to take effect: 

nbuser@appliance:~> crcontrol --fpconvertoff 
FP conversion turned off 
nbuser@appliance:~> crcontrol --fpconverton 
FP conversion turned on 

 

If more aggressive tuning is required on a NetBackup Appliance, conversion should be switched to mode 2: 

nbuser@appliance:~> crcontrol --fpconvertmode 2 
FP conversion changed to single mount point fast mode 

 

For this mode choice to be preserved across MSDP services restart, the following must be added to contentrouter.cfg: 

[FPConvert] 
FPConvertMode=2 

 

The number of threads running on each mount point in fast mode can be controlled with the following entry in contentrouter.cfg: 

[FPConvert] 
FPConvertThreadNum=2 

 

The default is 2. Note that this only impacts the mount points running in fast mode. Mount points running in normal mode will always use one thread. Changes to this parameter take effect by stopping/starting conversion as noted above. 

 

Stage 2: Running normal jobs with 8.0/7.7.3 EEB  

After initial tuning, allow the conversion to run, checking progress occasionally. Further tuning can be performed at any point during the process, if desired. For some heavily loaded systems, it may be necessary to switch between modes from time to time. For example, when full backups are running over the weekend, run the conversion in normal mode to reduce the impact to backup performance. After fulls have completed, switch to single mount point fast mode during the week when fewer backups are running. 
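
The weekly pattern in the example above can be written down as a small rule. This sketch assumes full backups run on Saturday and Sunday, as in the example; the function name and the returned mode labels are hypothetical, and the rule should be adapted to the actual backup schedule of the pool.

```python
import datetime


def mode_for(day: datetime.date) -> str:
    """Pick a conversion mode for the given calendar day."""
    # weekday() returns 5 for Saturday and 6 for Sunday.
    if day.weekday() >= 5:
        return "normal"                   # fulls running: limit impact
    return "single mount point fast"      # lighter weekday load


print(mode_for(datetime.date(2019, 10, 26)))  # a Saturday
```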

 

Stage 3: Finalizing Conversion 

Once conversion is more than 90% complete, it should be monitored more closely. The goal is to reach 100% of fingerprints converted before upgrading. The completed percentage may plateau at a value below 100% depending on the workloads on the appliance. If backups are running continuously, there will likely always be unprocessed containers in the prior container list. It is suggested that the process be allowed to run in normal mode until crcontrol --fpconvertstate shows the normal DCID list at 100% completed. 

Once the normal DCID list is completed and the upgrade RPM and post-upgrade EEBs are downloaded, backups and SLPs should be suspended, and the fingerprint conversion run in fast mode to complete the remaining containers. To ensure no additional data is written to the pool, writing of new data should be disabled at the pool level: 

nbuser@appliance:~> crcontrol -m Put=No 

The PUT mode of the pool can be validated with: 

nbuser@appliance:~> crcontrol --getmode 
Mode : GET=Yes PUT=No DEREF=Yes SYSTEM=Yes STORAGED=Yes REROUTE=No COMPACTD=Yes 

Note that any backups or replications to the pool will fail when in this mode. Setting “Put=No” with this method is not persistent across service restarts and the pool will be fully operational after a service restart. Conversion should then be switched to fast mode: 

nbuser@appliance:~> crcontrol --fpconvertmode 1 
FP conversion changed to fast mode 

Monitor conversion until completed: 

# /usr/openv/pdde/pdcr/bin/crcontrol --fpconvertstate 
***** Fingerprint Conversion (MD5 to SHA256) ****** 
Status                    : ON 
Mode                      : Normal 
Busy                      : No 
Converted Prior DC count  : 0 
Total Prior DC Count      : 0 
Converted Normal DC Count : 10080 
Total Normal DC Count     : 10080 
Converted percentage      : 100.0%(total), 100.0%(prior), 100.0%(normal) 

 

At this point, the pool is ready to be upgraded. This final stage of conversion should only take a few hours at most. If the pool is upgraded before the conversion has reached 100%, storage growth up to the unconverted percentage of containers is possible. 
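
The storage risk of upgrading early can be roughly bounded from the converted percentage. The sketch below reads the statement above as "growth of up to the unconverted percentage of the data currently in the pool", which is an interpretation rather than a documented formula; the function name and the example figures are hypothetical.

```python
def max_growth_tb(pool_used_tb: float, converted_percent: float) -> float:
    """Rough upper bound on storage growth if upgrading before 100%."""
    unconverted_percent = max(0.0, 100.0 - converted_percent)
    return pool_used_tb * unconverted_percent / 100.0


# Example: 200 TB in the pool at 97.5% converted leaves up to 5 TB at risk.
print(max_growth_tb(200.0, 97.5))
```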

 

Stage 4: Upgrade to 8.1+ and applying 8.1+ EEB 

Follow standard upgrade procedures and, immediately after the upgrade, apply the appropriate post-upgrade EEB. After the upgrade, it is desirable to wait for fingerprint cache loading to complete before performing any backups. This can be done by monitoring the spoold log and waiting for the message that begins with “ThreadMain: Data Store nodes have completed cache loading”.
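
Checking the spoold log for the cache loading message can be scripted. The sketch below scans an iterable of log lines for the message text quoted above. The function name is hypothetical, the location of the spoold log is not given in this document and must be supplied by the operator, and the message is matched anywhere in the line on the assumption that log lines carry a timestamp or other prefix.

```python
from typing import Iterable

MARKER = "ThreadMain: Data Store nodes have completed cache loading"


def cache_loading_done(lines: Iterable[str]) -> bool:
    """Return True once any log line contains the completion message."""
    return any(MARKER in line for line in lines)
```

An open file object can be passed directly, for example `cache_loading_done(open(spoold_log_path))`, where `spoold_log_path` is a placeholder for the log location on the MSDP server.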

 

Stage 5: Post-Upgrade  

Post-Upgrade Background Processing 

After the upgrade has completed, the pool will show a low completion percentage: 

nbuser@appliance:~> crcontrol --dataconvertstate 
***** Data Conversion ***** 
Status            : ON 
Mode              : Normal 
Busy              : No 
Current Group ID  : 8 
Current DCID      : 8192 
Min     Group ID  : 0 
Max     Group ID  : 10 
Progress          : 0% 

 

This behavior is normal and expected. While the SHA2 fingerprints exist and are being used, several tasks must still be completed, such as any necessary encryption conversion from Blowfish to AES and moving the SHA2 fingerprints from the map files into the data containers. These processes can continue in the background at normal priority, and the performance and storage utilization of the pool will not be impacted. 

 

Accelerator Backups  

Post-Upgrade, attention must be paid to accelerator backups. Until a successful full backup is run post-upgrade, all accelerator backups (full and incremental) will need to read all fingerprints for the previous full backup before they can run. This can cause client time-outs on busy systems resulting in job failures. In many cases the jobs will succeed on retry. Once a full backup has been completed post-upgrade, the system will again behave as expected. For this reason, it is recommended to run full backups for policies that use accelerator as soon as possible after upgrade. 

Failures due to client timeout can be mitigated by increasing the client read timeout on the media servers for the impacted MSDP pool (both the host media server and any fingerprint media servers) from the default of 300 seconds to 3600 seconds. Making this change does not remove the recommendation to run a full backup as soon as possible after upgrade, as significant additional workload is generated reading all the fingerprints from the last full before the upgrade. The client timeout can be changed in the timeouts section of the relevant media server properties.

 

Once the full backups have completed, the timeout should be returned to the normal value (default is 300). 

Best Practice (Performance): For policies that use accelerator, run a full backup (instead of incremental) as soon as possible after upgrade. Where possible, stagger the first runs of accelerator backups to minimize load on the MSDP server. Temporarily increase the client read timeout to avoid backup failures. 

 

 
