NetBackup deduplication client WAN backup: how to seed the fingerprint cache to speed up the initial backup

Article: 100003816
Last Published: 2014-06-16
Product(s): NetBackup & Alta Data Protection

Problem

The first backup of a large data set from a NetBackup deduplication client from a remote site to a data center over a WAN can be very time consuming.

Cause

Because it's the first backup, the NetBackup Deduplication Engine on the storage server/media server at the data center has no knowledge of the client's data. Therefore, all of the client data must be sent over the WAN to the storage server at the data center.

Solution

The solutions require that you seed the fingerprint cache. The fingerprint cache contains a set of fingerprints that are known to exist on the storage server. When a backup runs, fingerprints are checked against this cache before the storage server is queried. When the cache hit rate is high, network communication between the client and the storage server is significantly reduced. Reducing such communication is especially important when you use client direct to back up remote clients over high-latency networks.

Determining which fingerprints should be loaded into the fingerprint cache before the backup begins is essential to ensuring a high cache hit rate. By default, fingerprints from the previous full backup, and subsequent incremental backups, are loaded into the cache. Fingerprints for a previous backup are found by examining backup images for that client and policy within the PureDisk Catalog:

<Deduplication Catalog Home>/<client>/<policy name>/*.img  
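As a rough illustration of that lookup, the following Python sketch lists the image files for one client and policy. The function name and the catalog home argument are hypothetical; only the <client>/<policy name>/*.img layout comes from the text above, and the sketch does not reflect how NetBackup itself reads the catalog.

```python
from pathlib import Path


def seed_image_files(catalog_home, client, policy):
    """Return the sorted *.img files under <catalog_home>/<client>/<policy>."""
    policy_dir = Path(catalog_home) / client / policy
    if not policy_dir.is_dir():
        # No directory means no previous backup images for this client/policy.
        return []
    return sorted(policy_dir.glob("*.img"))
```

An empty result for a candidate source client and policy suggests that there is nothing to seed from at that location.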

The first step in seeding a client is to identify the source client and policy to seed from. This can be a backup policy from a similar client, or a seeding policy that was used to back up data from the client locally on the storage server from a transfer drive. Consider these scenarios:

  • A new remote Windows 2008 client, windows123, is being backed up for the first time. The majority of the data on windows123 is Windows system files. Several Windows 2008 machines are already being backed up. One of the existing Windows 2008 machines and its policy would be a good seeding client/policy. Loading the cache from a similar Windows 2008 machine should produce a high cache hit rate, because most of the data being backed up in this case is Windows system files, which should be similar between the two clients.
  • A new remote Windows 2008 client, windows456, is being backed up for the first time. The majority of data on this system is user data. Data from windows456 has been copied to a transfer drive and sent to the site where the storage server is located. The contents of the transfer drive have been backed up to the storage server locally. The client and policy used for this backup should be used as the seeding client and policy.

To ensure that NetBackup uses the seeded backup images, the first backup of a client after you configure seeding must be a full backup with a single stream. Specifically, the following two conditions must be met in the backup policy:

  • The Attributes tab "Allow multiple data streams" attribute must be unchecked.
  • The backup selection cannot include any NEW_STREAM directives.

If these two conditions are not met, NetBackup may use multiple streams. If the Attributes tab "Limit jobs per policy" is set to a number less than the total number of streams, only that number of streams use the seeded images to populate the cache. Any streams beyond the "Limit jobs per policy" value do not benefit from seeding, and their cache hit rates may be close to 0%. After the first backup, you can restore the original backup policy parameter settings.
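The two policy conditions can be expressed as a small pre-flight check. This is a minimal sketch, assuming the attribute value and the backup selection entries have been copied by hand from the policy; the function name and its inputs are hypothetical, and nothing here queries NetBackup.

```python
def seeding_policy_problems(allow_multiple_data_streams, backup_selections):
    """Return a list of reasons the policy may not run as a single stream."""
    problems = []
    if allow_multiple_data_streams:
        problems.append('"Allow multiple data streams" is enabled')
    # A NEW_STREAM directive appears as its own entry in the backup selection.
    if any(entry.strip().upper() == "NEW_STREAM" for entry in backup_selections):
        problems.append("backup selection contains a NEW_STREAM directive")
    return problems
```

An empty list means that both conditions described above are met for the values supplied.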

The following are the solutions, which depend on the NetBackup release level:

 

For NetBackup 7.0, 7.0.1, and NetBackup 7.1

Note: This method works on all deduplication releases of NetBackup. However, changes in NetBackup 7.1.0.4 and 7.5 add seeding into the product; therefore, Veritas recommends that you use the other seeding methods for those releases.

  1. At the remote office, copy the data set to a removable storage device such as a portable disk drive or large USB flash drive.
  2. Send that portable device to a NetBackup administrator at the data center.
  3. At the data center, attach the portable device to a computer of the same type as at the remote site.
  4. At the data center, create and run a backup policy that backs up the data set.  Because you want to back up only the data set, select only the storage location on which the data set resides.  Do not select All local drives.
    The backup must be to the same Media Server Deduplication Pool that will receive the remote client's backups. The backup does not have to be from a deduplication client; the storage server can do the deduplication.
  5. On the MSDP storage at the data center, locate the catalog files for the DataCenterClient's backup.  The catalog files are stored in the MSDP storage directory structure and use the client name and the backup policy name, as in the following example:
      E:\DedupeStorage\databases\catalog\2\DataCenterClient\Backup_Policy_DC
  6. Create the directories on the MSDP database directory that correspond to the catalog location for the remote deduplication client.  The directory names use the client name and the backup policy name, as in the following example:
    E:\DedupeStorage\databases\catalog\2\RemoteClient\Backup_Policy_Remote
  7. Using the command-line 'copy' or 'xcopy' command, copy all of the catalog files from the DataCenterClient catalog directory into the catalog directory for the remote client. Note: Using Windows Explorer to copy and paste this data can unintentionally copy or create files such as Shortcut.lnk and Thumbs.db in the destination location. If you use Windows Explorer, be sure to configure Explorer to show hidden files and operating system files. Do not copy __dirpo__ files along with the image files.
  8. Rename all of the files so that the client name portion of each file name corresponds to the client name of the remote client, as shown in the following example:
      E:\DedupeStorage\databases\catalog\2\RemoteClient\Backup_Policy_Remote
  9. Start the remote client backup. Only data that is unknown to the NetBackup Deduplication Engine should be backed up.
  10. After the remote client backup completes, delete the catalog files created in step 8.
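Steps 6 through 8 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name is hypothetical, the source client name is assumed to appear literally in each catalog file name, and any file whose name contains __dirpo__ is assumed to be one of the files that must not be copied. Check the actual file names in your catalog before relying on anything like it.

```python
import shutil
from pathlib import Path

# Stray files that Windows Explorer may create; they must not reach the catalog.
SKIP_NAMES = {"thumbs.db"}


def copy_catalog_for_seeding(src_dir, dst_dir, src_client, dst_client):
    """Copy catalog files from src_dir to dst_dir, renaming the client portion.

    ASSUMPTION: the source client name appears literally in each file name.
    Returns the files created, so that they can be deleted after the backup.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)  # step 6: create the directories
    created = []
    for src in sorted(src_dir.iterdir()):
        name = src.name
        if not src.is_file():
            continue
        lowered = name.lower()
        if "__dirpo__" in name or lowered in SKIP_NAMES or lowered.endswith(".lnk"):
            continue  # skip files that step 7 says not to copy
        dst = dst_dir / name.replace(src_client, dst_client)  # step 8: rename
        shutil.copy2(src, dst)  # step 7: copy
        created.append(dst)
    return created
```

Keeping the returned list makes the cleanup in step 10 straightforward: after the remote client backup completes, delete each path in that list.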

For NetBackup 7.1.0.4 and later, NetBackup 7.5 and later, and NetBackup 7.6 and later

The following table shows the NetBackup releases on which the two seeding methods are supported.

Seeding methods support

Seeding host                                 Supported releases
Configuring seeding on the client            NetBackup 7.1.0.4 and later; NetBackup 7.5 and later; NetBackup 7.6 and later
Configuring seeding on the storage server    NetBackup 7.1.0.4 and later; NetBackup 7.5.0.2 and later; NetBackup 7.6 and later

Configuring seeding on the client

Note: Applies to NetBackup 7.1.0.4 and later, NetBackup 7.5 and later, and NetBackup 7.6 and later.

Seeding configuration on the client is accomplished by setting the FP_CACHE_CLIENT_POLICY field in the pd.conf file of the new client:
 
FP_CACHE_CLIENT_POLICY = clienthostmachine,backuppolicy,date
 
This setting consists of three fields:
  • clienthostmachine – Name of source seeding client
  • backuppolicy – Name of source seeding policy
  • date – Last date to use this setting, in mm/dd/yyyy format. This date expires the setting in case it has not been removed from pd.conf. If the setting did not expire, the cache from this client and policy would continue to be loaded even after the first backup of the new client has completed.
For example, if the client to seed from is windows789, the policy to seed from is “full_local_drives”, and the current date is January 19, 2012, then FP_CACHE_CLIENT_POLICY would be set to:
 
FP_CACHE_CLIENT_POLICY = windows789,full_local_drives,01/19/2012
 
Using the example setting above, the fingerprint cache would be populated using backup images from:
 
<Deduplication Catalog Home>/windows789/full_local_drives/*.img
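The three-field format can be generated and checked with a few lines of Python. This is a hedged sketch: the function names are hypothetical, and treating the setting as still usable on the listed date itself is an assumption based on the phrase "last date to use this setting".

```python
from datetime import date, datetime


def fp_cache_client_policy_line(client, policy, last_date):
    """Format the pd.conf line; last_date is a datetime.date."""
    return "FP_CACHE_CLIENT_POLICY = {},{},{}".format(
        client, policy, last_date.strftime("%m/%d/%Y"))


def parse_fp_cache_client_policy(line):
    """Split a pd.conf line into (client, policy, last_date)."""
    key, _, value = line.partition("=")
    if key.strip() != "FP_CACHE_CLIENT_POLICY":
        raise ValueError("not an FP_CACHE_CLIENT_POLICY line")
    client, policy, raw_date = [part.strip() for part in value.split(",")]
    return client, policy, datetime.strptime(raw_date, "%m/%d/%Y").date()


def is_expired(last_date, today):
    # ASSUMPTION: the setting is still usable on last_date itself.
    return today > last_date
```

Calling fp_cache_client_policy_line("windows789", "full_local_drives", date(2012, 1, 19)) reproduces the example line shown above.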
 
If a large number of clients need to be seeded, then configuring this way is not recommended as it requires manual effort on each client. Instead, client seeding should be configured on the storage server, which will be described next.
 
 
 

Configuring seeding on the storage server

Note: Applies to NetBackup 7.1.0.4 and later, NetBackup 7.5.0.2 and later, and NetBackup 7.6 and later.

Note: Veritas recommends that you use this seeding method for client-side deduplication backups. This seeding method is not intended for normal deduplication backups because it will affect all backup jobs on the storage server.

Note: The client and policy names used by seedutil are treated as case sensitive and need to match the case of the client and policy names in NetBackup.

A more flexible method of configuring client seeding is to use the new “seedutil” utility. This utility, run on the storage server, creates a special seeding directory in the PureDisk catalog for a client, and populates it with image references to another source client and policy’s backup images. The special seeding directory will appear in the PureDisk catalog as follows:

<PureDisk Catalog Home>/#pdseed/<client>
 
When a backup is run on a client, a check will be made first to find images from the previous backup. If images are found, they are used for the cache. If no images are found for a previous backup, a second check will be made to find images in the special seeding directory.
 
Changes to the special seeding directory should be made using the seedutil program. Assuming default installation locations, this utility can be found at:
  • UNIX:  /usr/openv/pdde/pdag/bin/seedutil
  • Windows:  C:\Program Files\Veritas\pdde\seedutil.exe
Full usage info for this tool is as follows (from seedutil -help):
 
Usage: seedutil[-v <log level>] [-seed -sclient <source client name> -spolicy <policy name> -dclient <destination client name> [-backupid <backup id>]] [-clear <client name>] [-clear_all] [-list_clients] [-list_images <client name>] [-help]
 
where:
 -v : Verbose mode
 
-seed -sclient <client name> -spolicy <policy name> -dclient <destination client name>
                       : Create links in the <destination client name> directory to all the
                       *.img, *.fmk and *.hdr files found in the path <client name>/<policy name>
 
-seed -sclient <client name> -spolicy <policy name> -dclient <destination client name> -backupid <backup id>
                       : Create links in the <destination client name> directory to all *.img,
                       *.fmk and *.hdr files found in the path <client name>/<policy name> that
                       have <backup id> in their names
 
-clear <client name> : Clear the contents of the directory specified by <client name> in the
                       seeding location
 
-clear_all           : Clear the contents of the seeding directory
 
-list_clients        : List the contents of the seeding directory
 
-list_images <client name> : List the contents of the <client name> directory
 
The most common options are -seed, -list_images, and -clear. The -seed option sets up a special seeding directory for a client and populates it with references to images from another client and policy. The -list_images option can be used to verify the contents of the special seeding directory for a client. Finally, the -clear option should be run to remove the contents of the special seeding directory after it is no longer needed.
 
For example, assume two new remote clients, remote_client1 and remote_client2, are being backed up for the first time. Data for both clients has been copied via a transfer drive and backed up locally to the media server media1, using a policy called “transfer_drive”. 
  1. Run the following commands on the media server to set up a special seeding directory using the transfer_drive backup images for each client:

    $ seedutil -seed -sclient media1 -spolicy transfer_drive -dclient remote_client1
    $ seedutil -seed -sclient media1 -spolicy transfer_drive -dclient remote_client2
     
  2. Verify the seeding directory has been populated for each client:

    $ seedutil -list_images remote_client1
    $ seedutil -list_images remote_client2
     
  3. Run backups for remote_client1 and remote_client2.
     
  4. Clean up the special seeding directory:

    $ seedutil -clear remote_client1
    $ seedutil -clear remote_client2

Clearing the special seeding directory is important. The source backup images referenced in the special seeding directory will not be expired until they are no longer referenced. To help with this, the special seeding directory for a client is automatically cleared whenever NetBackup expires an image for that client. Even so, it is good practice to explicitly clear the special seeding directory when it is no longer needed.
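When many clients need to be seeded, the seed, verify, and clear commands can be generated instead of typed. The following Python sketch only builds argument lists; it uses just the seedutil options listed in the usage text above, and the function name and the seedutil path argument are hypothetical.

```python
def seedutil_commands(source_client, source_policy, dest_clients, seedutil="seedutil"):
    """Return (seed, verify, clear) lists of argument lists for the given clients."""
    seed = [[seedutil, "-seed", "-sclient", source_client,
             "-spolicy", source_policy, "-dclient", dest]
            for dest in dest_clients]
    verify = [[seedutil, "-list_images", dest] for dest in dest_clients]
    clear = [[seedutil, "-clear", dest] for dest in dest_clients]
    return seed, verify, clear
```

On the storage server, each argument list could be passed to subprocess.run(argv, check=True). The clear commands should run only after the first backups of the destination clients have completed. Because client and policy names are case sensitive, pass them exactly as they appear in NetBackup.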
 
Considering all seeding configuration techniques, NetBackup chooses the directory for fingerprint cache loading in the following order:
 
  1. Client and policy set in FP_CACHE_CLIENT_POLICY, if it is not expired
  2. Client and policy from the previous backup
  3. Special seeding directory, if no images from the previous backup were found
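The selection order can be modeled as a short function. This is a model of the documented order only, not NetBackup's implementation; the function name, its parameters, and the returned labels are hypothetical.

```python
def cache_source(fp_setting, fp_setting_expired, previous_images, seed_images):
    """Return a label naming where fingerprints would be loaded from, or None."""
    # 1. An unexpired FP_CACHE_CLIENT_POLICY setting takes precedence.
    if fp_setting is not None and not fp_setting_expired:
        return "FP_CACHE_CLIENT_POLICY"
    # 2. Otherwise, images from the client's previous backup are used.
    if previous_images:
        return "previous backup"
    # 3. The special seeding directory is consulted only as a fallback.
    if seed_images:
        return "seeding directory"
    return None
```

The model makes one consequence of the order visible: once a client has its own backup images, the special seeding directory is no longer consulted, while an unexpired FP_CACHE_CLIENT_POLICY setting still overrides both.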
 
The following flow chart shows the order in which fingerprint cache is loaded:
 
 
These improvements provide the ability to seed the cache for remote client backups, which should significantly improve the initial backup performance with client direct.
 
 

 

 

 

 


Applies To

A NetBackup client at a remote office deduplicates its own data.  The data is a large data set, such as a database.  The data is sent over a WAN to a Media Server Deduplication Pool at a different site (a data center).  All hosts are in the same NetBackup domain.
