NetBackup deduplication client WAN backup: how to seed the fingerprint cache to speed up the initial backup
Problem
The first backup of a large data set from a NetBackup deduplication client at a remote site to a data center over a WAN can be very time consuming.
Cause
Because it's the first backup, the NetBackup Deduplication Engine on the storage server/media server at the data center has no knowledge of the client's data. Therefore, all of the client data must be sent over the WAN to the storage server at the data center.
Solution
Each of the solutions below seeds the fingerprint cache. The fingerprint cache contains a set of fingerprints that are known to exist on the storage server. When a backup runs, fingerprints are checked against this cache before the storage server is queried. When the cache hit rate is high, network communication between the client and the storage server is significantly reduced. Reducing this communication is especially important when you use Client Direct to back up remote clients over high-latency networks.
Determining which fingerprints should be loaded into the fingerprint cache before the backup begins is essential to ensuring a high cache hit rate. By default, fingerprints from the previous full backup, and subsequent incremental backups, are loaded into the cache. Fingerprints for a previous backup are found by examining backup images for that client and policy within the PureDisk Catalog:
<Deduplication Catalog Home>/<client>/<policy name>/*.img
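To check which images the default cache-loading behavior would draw on, the following Python sketch lists the *.img files under <Deduplication Catalog Home>/<client>/<policy name>. It is an illustration, not a NetBackup tool; the catalog home you pass in is an assumption and must be the actual catalog location on your storage server.

```python
from pathlib import Path


def catalog_images(catalog_home: str, client: str, policy: str) -> list[Path]:
    """Return the *.img catalog files for one client and policy.

    Mirrors the layout <Deduplication Catalog Home>/<client>/<policy name>/*.img.
    catalog_home is an assumption: pass your own storage server's catalog path.
    """
    image_dir = Path(catalog_home) / client / policy
    # sorted() gives a stable order; a missing directory simply yields no files,
    # which means there is nothing to load the cache from.
    return sorted(image_dir.glob("*.img"))
```

An empty result for the new client's own client and policy is exactly the first-backup situation described under Cause.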
The first step in seeding a client is to identify the source client and policy to seed from. This can be a backup policy from a similar client, or a seeding policy that was used to back up the client's data locally on the storage server from a transfer drive. Consider these scenarios:
- A new remote Windows 2008 client, windows123, is being backed up for the first time. Most of the data on windows123 consists of Windows system files. Several Windows 2008 machines are already being backed up, so one of those existing machines and its policy is a good seeding candidate. Loading the cache from a similar Windows 2008 machine should produce a high cache hit rate, because the Windows system files that make up most of the backup should be similar on both clients.
- A new remote Windows 2008 client, windows456, is being backed up for the first time. Most of the data on this system is user data. The data from windows456 has been copied to a transfer drive and sent to the site where the storage server is located, and the contents of the transfer drive have been backed up to the storage server locally. The client and policy used for that backup should be used as the seeding client and policy.
To ensure that NetBackup uses the seeded backup images, the first backup of a client after you configure seeding must be a full backup with a single stream. Specifically, the following two conditions must be met in the backup policy:
- The Attributes tab "Allow multiple data streams" attribute must be unchecked.
- The backup selection cannot include any NEW_STREAM directives.
If these two conditions are not met, NetBackup may use multiple streams. In that case, if the Attributes tab "Limit jobs per policy" is set to a number less than the total number of streams, only the first streams, up to that limit, use the seeded images to populate the cache. Any streams beyond the "Limit jobs per policy" value do not benefit from seeding, and their cache hit rates may be close to 0%. After the first backup, you can restore the original backup policy settings.
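The single-stream conditions and the effect of "Limit jobs per policy" can be modeled as a small check. This is a minimal sketch under stated assumptions: the function and parameter names are hypothetical, and they take the policy settings as plain values rather than reading a real NetBackup policy.

```python
def seeding_readiness(allow_multiple_streams: bool,
                      backup_selections: list[str]) -> list[str]:
    """Return reasons the first backup might not use the seeded images.

    Hypothetical helper; an empty list means both conditions are met.
    """
    problems = []
    if allow_multiple_streams:
        problems.append('"Allow multiple data streams" is checked')
    # A NEW_STREAM directive on its own line starts an additional stream.
    if any(line.strip() == "NEW_STREAM" for line in backup_selections):
        problems.append("backup selection contains NEW_STREAM directives")
    return problems


def streams_using_seed(total_streams: int, limit_jobs_per_policy: int) -> int:
    """Number of streams that load their cache from the seeded images.

    Streams beyond the "Limit jobs per policy" value do not benefit.
    """
    return min(total_streams, limit_jobs_per_policy)
```

For example, four streams with "Limit jobs per policy" set to 2 leaves two streams with no seeding benefit.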
The following are the solutions, which depend on the NetBackup release level:
- NetBackup 7.0, 7.0.1, and NetBackup 7.1
- NetBackup 7.1.0.4 and later, NetBackup 7.5 and later, and NetBackup 7.6 and later
For NetBackup 7.0, 7.0.1, and NetBackup 7.1
Note: This method works on all deduplication releases of NetBackup. However, NetBackup 7.1.0.4 and 7.5 add seeding to the product, so Veritas recommends that you use the other seeding methods for those releases.
- At the remote office, copy the data set to a removable storage device such as a portable disk drive or large USB flash drive.
- Send that portable device to a NetBackup administrator at the data center.
- At the data center, attach the portable device to a computer of the same type as at the remote site.
- At the data center, create and run a backup policy that backs up the data set. Because you want to back up only the data set, select only the storage location on which the data set resides. Do not select All local drives. The backup must be to the same Media Server Deduplication Pool that will receive the remote client's backups. The backup does not have to be from a deduplication client; the storage server can do the deduplication.
- On the MSDP storage at the data center, locate the catalog files for the backup of the data center computer, referred to here as DataCenterClient. The catalog files are stored in the MSDP storage directory structure, in a directory path that uses the client name and the backup policy name.
- Create the directories in the MSDP database directory that correspond to the catalog location for the remote deduplication client. The directory names use the client name and the backup policy name, as in the following example:
E:\DedupeStorage\databases\catalog\2\RemoteClient\Backup_Policy_Remote
- Using the command-line copy or xcopy command, copy all of the catalog files from the DataCenterClient catalog directory into the catalog directory for the remote client. Do not copy __dirpo__ files along with the image files. Note: Using Windows Explorer to copy and paste this data can unintentionally copy or create files such as Shortcut.lnk and Thumbs.db in the destination location. If you use Windows Explorer, be sure to configure it to show hidden files and operating system files.
- Rename all of the files so that the client name portion of each file name matches the client name of the remote client.
- Start the remote client backup. Only data that is unknown to the NetBackup Deduplication Engine should be backed up.
- After the remote client backup completes, delete the renamed catalog files that you copied into the remote client's catalog directory.
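The copy, rename, and cleanup steps above can be scripted. The Python sketch below copies only *.img files, so __dirpo__, Thumbs.db, and shortcut files are never copied, and it writes each file directly under its new name instead of copying and then renaming. It assumes that each image file name begins with the source client name followed by an underscore; check the actual file names in your catalog before relying on that. The directory paths are your own, and nothing here is a NetBackup utility.

```python
import shutil
from pathlib import Path


def seed_remote_catalog(source_dir: str, dest_dir: str,
                        source_client: str, remote_client: str) -> list[Path]:
    """Copy the source client's catalog images, renamed for the remote client.

    ASSUMPTION: each image file name starts with "<source_client>_".
    Returns the created files so they can be deleted after the first backup.
    """
    src = Path(source_dir)
    dst = Path(dest_dir)
    # Create the catalog directory for the remote client and its policy.
    dst.mkdir(parents=True, exist_ok=True)
    prefix = source_client + "_"
    created = []
    # Only *.img files are considered; __dirpo__ and Explorer debris are skipped.
    for image in sorted(src.glob("*.img")):
        if not image.name.startswith(prefix):
            continue
        target = dst / (remote_client + "_" + image.name[len(prefix):])
        shutil.copy2(image, target)
        created.append(target)
    return created


def remove_seed_files(created: list[Path]) -> None:
    """Delete the seeded catalog files after the remote backup completes."""
    for path in created:
        path.unlink(missing_ok=True)
```

Keeping the returned list makes the final cleanup step exact: only the files this script created are deleted.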
For NetBackup 7.1.0.4 and later, NetBackup 7.5 and later, and NetBackup 7.6 and later
The following table shows the NetBackup releases on which the two seeding methods are supported.
Seeding method | Supported releases |
Configuring seeding on the client | NetBackup 7.1.0.4 and later; NetBackup 7.5 and later; NetBackup 7.6 and later |
Configuring seeding on the storage server | NetBackup 7.1.0.4 and later; NetBackup 7.5.0.2 and later; NetBackup 7.6 and later |
Configuring seeding on the client
Note: Applies to NetBackup 7.1.0.4 and later, NetBackup 7.5 and later, and NetBackup 7.6 and later.
Seeding configuration on the client is accomplished by setting the FP_CACHE_CLIENT_POLICY field in the pd.conf file of the new client, using the following syntax:
FP_CACHE_CLIENT_POLICY = clienthostmachine,backuppolicy,date
- clienthostmachine – Name of source seeding client
- backuppolicy – Name of source seeding policy
- date – The last date on which to use this setting, in mm/dd/yyyy format. The date makes the setting expire in case it is not removed from pd.conf. Without an expiration date, the cache from this client and policy would continue to be loaded even after the new client's own first backup has been established.
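For example, the following setting in pd.conf on the new client seeds it from the full_local_drives policy of client windows789 through 01/19/2012:
FP_CACHE_CLIENT_POLICY = windows789,full_local_drives,01/19/2012
With this setting, the fingerprint cache is loaded from the following backup images:
<Deduplication Catalog Home>/windows789/full_local_drives/*.img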
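The following Python sketch shows how the three fields of the FP_CACHE_CLIENT_POLICY setting fit together: it parses the line, treats the date as the last day the setting applies, and builds the catalog pattern the cache would be loaded from. It models the documented format only; it does not read pd.conf and does not reproduce how the deduplication plug-in itself parses the file.

```python
from datetime import date, datetime
from typing import NamedTuple, Optional


class SeedSetting(NamedTuple):
    client: str
    policy: str
    last_day: date


def parse_seed_setting(line: str) -> SeedSetting:
    """Parse 'FP_CACHE_CLIENT_POLICY = client,policy,mm/dd/yyyy'."""
    key, _, value = line.partition("=")
    if key.strip() != "FP_CACHE_CLIENT_POLICY":
        raise ValueError("not an FP_CACHE_CLIENT_POLICY line")
    client, policy, day = (field.strip() for field in value.split(","))
    return SeedSetting(client, policy, datetime.strptime(day, "%m/%d/%Y").date())


def seed_pattern(setting: SeedSetting, today: date,
                 catalog_home: str = "<Deduplication Catalog Home>") -> Optional[str]:
    """Return the catalog pattern to load, or None once the setting has expired."""
    if today > setting.last_day:
        return None
    return f"{catalog_home}/{setting.client}/{setting.policy}/*.img"
```

With the example setting, the pattern is returned on 01/19/2012 and None from 01/20/2012 onward.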
Configuring seeding on the storage server
Note: Applies to NetBackup 7.1.0.4 and later, NetBackup 7.5.0.2 and later, and NetBackup 7.6 and later.
Note: Veritas recommends that you use this seeding method for client-side deduplication backups. This seeding method is not intended for normal deduplication backups because it will affect all backup jobs on the storage server.
Note: seedutil treats client and policy names as case sensitive; they must match the case of the client and policy names in NetBackup.
A more flexible method of configuring client seeding is to use the new “seedutil” utility. This utility, run on the storage server, creates a special seeding directory in the PureDisk catalog for a client, and populates it with image references to another source client and policy’s backup images. The special seeding directory will appear in the PureDisk catalog as follows:
<PureDisk Catalog Home>/#pdseed/<client>
The seedutil utility is installed at the following locations:
- UNIX: /usr/openv/pdde/pdag/bin/seedutil
- Windows: C:\Program Files\Veritas\pdde\seedutil.exe
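In the following example, the data for two new remote clients, remote_client1 and remote_client2, was backed up locally from a transfer drive by the media server media1, using the transfer_drive policy.
- Run the following commands on the media server to set up a special seeding directory for each client that uses the transfer_drive backup images: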
$ seedutil -seed -sclient media1 -spolicy transfer_drive -dclient remote_client1
$ seedutil -seed -sclient media1 -spolicy transfer_drive -dclient remote_client2
- Verify the seeding directory has been populated for each client:
$ seedutil -list_images remote_client1
$ seedutil -list_images remote_client2
- Run backups for remote_client1 and remote_client2.
- After the backups complete, clean up the special seeding directories:
$ seedutil -clear remote_client1
$ seedutil -clear remote_client2
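When many remote clients are seeded from the same source client and policy, the seedutil commands above can be generated rather than typed. The Python sketch below builds the same argument lists shown in the steps and only prints them unless dry_run is set to False. The SEEDUTIL constant is the UNIX location listed above; on Windows, substitute the seedutil.exe path. Client and policy names are passed through unchanged because seedutil treats them as case sensitive.

```python
import shlex
import subprocess

# UNIX location from this article; use the seedutil.exe path on Windows.
SEEDUTIL = "/usr/openv/pdde/pdag/bin/seedutil"


def seedutil_commands(action: str, dclients: list[str],
                      sclient: str = "", spolicy: str = "") -> list[list[str]]:
    """Build seedutil argument lists for each destination client.

    action is "seed", "list_images", or "clear", matching the steps above.
    """
    if action not in ("seed", "list_images", "clear"):
        raise ValueError(f"unsupported action: {action}")
    commands = []
    for dclient in dclients:
        if action == "seed":
            commands.append([SEEDUTIL, "-seed", "-sclient", sclient,
                             "-spolicy", spolicy, "-dclient", dclient])
        else:
            commands.append([SEEDUTIL, "-" + action, dclient])
    return commands


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

For example, run(seedutil_commands("seed", ["remote_client1", "remote_client2"], "media1", "transfer_drive")) prints the two seed commands without running them.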
Applies To
A NetBackup client at a remote office deduplicates its own data. The data is a large data set, such as a database. The data is sent over a WAN to a Media Server Deduplication Pool at a different site (a data center). All hosts are in the same NetBackup domain.