How to tune NetBackup Auto Image Replication (A.I.R.) operations for maximum performance

Article: 100046559
Last Published: 2021-01-13
Product(s): NetBackup

Description

Getting maximum performance when replicating images across domains often requires some tuning. This article describes the tuning options available in NetBackup, beginning with the 8.1.2 release. The goal of tuning is to optimize use of the storage server hardware, network bandwidth, and NetBackup infrastructure to maximize the amount of data that can be replicated. An additional goal is to prevent replication jobs from oversubscribing resources and blocking backup jobs from running. Tuning parameters are available to adjust the number and size of jobs that will be submitted, how many jobs will be submitted concurrently, how many jobs can actually run concurrently, and how often jobs will be submitted.

A.I.R. performance depends on storage servers and networks that are operating properly.  Any hardware, storage, or networking issues should be resolved before applying these tuning parameters.

Veritas recommends using the Targeted A.I.R. option when replicating data. The throttling operation described below applies only to Targeted A.I.R., and is applicable to all OST storage servers (MSDP, DataDomain, etc.).

Step 1 - Update NetBackup

Veritas has released several EEBs that impact A.I.R. performance.  Those EEBs should be installed before proceeding with the tuning operations.

Please contact Veritas Technical Support to obtain the latest EEBs, or locate them using the ET number on the download site: https://www.veritas.com/content/support/en_US/downloads
  • Release 8.0
    • ET 3993104, version 1 - Replication delays due to polling wait times. To be installed on the MSDP media server.
  • Release 8.1
    • ET 4012830, version 1 - Implement A.I.R. throttling (Targeted AIR only). To be installed on the master server.
  • Release 8.1.1
    • ET 3994191, version 1 - Implement A.I.R. throttling (Targeted AIR only). To be installed on the master server.
    • ET 3994012, version 1 - Replication delays due to polling wait times. To be installed on the MSDP media server.
    • ET 3942191, version 27 - MSDP hotfix bundle. To be installed on the MSDP media server.
  • Release 8.1.2
    • ET 3987112, version 2 - Implement A.I.R. throttling (Targeted AIR only). To be installed on the master server.
    • ET 3996230, version 1 - Replication delays due to polling wait times. To be installed on the MSDP media server.
    • ET 3956103, version 14 - MSDP hotfix bundle. To be installed on the MSDP media server.
  • Release 8.2
    • ET 3987486, version 1 - Implement A.I.R. throttling (Targeted AIR only). To be installed on the master server.
    • ET 3981133, version 11 - MSDP hotfix bundle. To be installed on the MSDP media server.

Note: EEB versions can change, so always download and install the latest available version.

For EEB installation steps, please see Knowledge Base articles:

Using the NetBackup Emergency Engineering Binary (EEB) installer

How to install EEBs, HotFixes and Maintenance Releases on NetBackup Appliances
 

Step 2 - Set the limit on how many active jobs can run concurrently

A large number of replication jobs running concurrently can cause performance problems due to contention.  The jobs contend for NetBackup resources, for storage server cycles, and for network bandwidth.  This contention causes all replication jobs to perform poorly.  The A.I.R. throttling introduced by the EEBs noted above can limit the number of active jobs (the throttling operation applies only to Targeted A.I.R.).  Throttling is controlled based on the target storage server that will receive the replicated images.  The limit should describe how many jobs can run concurrently. When setting the limit, the capacity of the source and target storage servers should be considered along with the expected load on both servers and the bandwidth available for the network connection between the two servers. In addition, the limit should be lowered if backup jobs will run concurrently with replication jobs.

The limit is set using a new SLP parameter named SLP.REPLICATION_TARGET_JOB_LIMIT and is set using the nbsetconfig command on the source master server.  The parameter syntax is:

SLP.REPLICATION_TARGET_JOB_LIMIT = <limit_spec>[,<limit_spec>][,...]

Each "limit_spec" value can be a numeric value, which sets the limit for every target storage server. It can also take the form "<storage_server_name>:<number>", which sets the limit for a specific target storage server. Both forms can be combined in the same parameter setting. Examples:

# Set the replication limit for each target storage server to 10.
->nbsetconfig
nbsetconfig>SLP.REPLICATION_TARGET_JOB_LIMIT = 10
nbsetconfig><end-file marker - Unix: Ctrl+D Enter, Windows: Ctrl+Z Enter>

# Set the limit for two named target storage servers to 12 and 6.  Set the limit for
# all other target storage servers to 8.
->nbsetconfig
nbsetconfig>SLP.REPLICATION_TARGET_JOB_LIMIT = targetServerA:12, targetServerB:6, 8
nbsetconfig><end-file marker - Unix: Ctrl+D Enter, Windows: Ctrl+Z Enter>

On Unix systems, the nbsetconfig program is located at /usr/openv/netbackup/bin/nbsetconfig.  On Windows systems, it is located at install_path\NetBackup\bin\nbsetconfig.  Note that each use of the SLP.REPLICATION_TARGET_JOB_LIMIT parameter replaces any previous specification.
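
The limit_spec syntax can be illustrated with a short Python sketch. It is not part of NetBackup: the function names parse_target_job_limit and effective_limit are hypothetical, and the validation rules (whole-number limits, at most one default value) are assumptions made for this illustration rather than documented NetBackup behavior.

```python
def parse_target_job_limit(value):
    """Split a limit_spec list into (default_limit, per_server_limits).

    Illustrative only: the validation rules here are assumptions,
    not documented NetBackup behavior.
    """
    default_limit = None
    per_server = {}
    for spec in value.split(","):
        spec = spec.strip()
        if not spec:
            raise ValueError("empty limit_spec")
        # "server:12" -> ("server", ":", "12"); "8" -> ("", "", "8")
        name, sep, number = spec.rpartition(":")
        if not number.isdigit():
            raise ValueError(f"limit is not a whole number: {spec!r}")
        if sep:
            if not name:
                raise ValueError(f"missing storage server name: {spec!r}")
            per_server[name] = int(number)
        else:
            if default_limit is not None:
                # Assumption: only one server-independent limit is expected.
                raise ValueError("more than one default limit given")
            default_limit = int(number)
    return default_limit, per_server


def effective_limit(server, default_limit, per_server):
    """Return the limit for one target storage server (None if unset)."""
    return per_server.get(server, default_limit)


default, named = parse_target_job_limit("targetServerA:12, targetServerB:6, 8")
print(effective_limit("targetServerA", default, named))  # 12
print(effective_limit("someOtherServer", default, named))  # 8
```

In this sketch a named entry takes precedence over the numeric default, which matches the second example above: the two named servers get 12 and 6, and every other target storage server gets 8.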

 

Step 3 - Set the limit on how many jobs can be submitted

If a large number of images are waiting to be replicated, the active job throttle described in Step 2 will leave the many remaining jobs in a queued state. An excessive number of queued jobs slows down NetBackup job processing and resource allocation on the master server, so it is desirable to also limit the number of queued jobs. It is also important, however, to keep enough jobs in the queue that a job is ready to start as soon as an active job finishes. SLP processing uses the SLP.REPLICATION_RESOURCE_MULTIPLIER parameter to control the number of submitted jobs. The default value is 2 and can be changed via the nbsetconfig command:

# Set the resource multiplier for replication
->nbsetconfig
nbsetconfig>SLP.REPLICATION_RESOURCE_MULTIPLIER = 3
nbsetconfig><end-file marker - Unix: Ctrl+D Enter, Windows: Ctrl+Z Enter>

The SLP.REPLICATION_TARGET_JOB_LIMIT value for the target storage server is multiplied by the SLP.REPLICATION_RESOURCE_MULTIPLIER value to determine the submission limit. SLP processing sessions run every 5 minutes by default. During each session, the submission limit is determined and the number of active/queued replication jobs that use the target storage server is computed. Subtracting the active/queued job count from the submission limit determines how many additional jobs should be submitted. This keeps a ready set of queued jobs available to run without overwhelming the NetBackup job infrastructure. Example:

  • The SLP.REPLICATION_TARGET_JOB_LIMIT for the target storage server is 10
  • The SLP.REPLICATION_RESOURCE_MULTIPLIER value is 2.  The submission limit is 20.
  • There are 10 active jobs running and 5 that are queued waiting for resources.
  • During the SLP processing session, 5 additional jobs will be submitted to bring the number of active/queued jobs back up to 20.
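
The arithmetic in the example above can be written out as a short Python sketch. The function name jobs_to_submit is hypothetical, and the code only mirrors the calculation described in this step; it is not NetBackup code.

```python
def jobs_to_submit(target_job_limit, resource_multiplier, active_jobs, queued_jobs):
    """Return how many new replication jobs one SLP session would submit.

    Mirrors the calculation described in this article:
    submission limit = job limit x multiplier, minus jobs already
    active or queued for the target storage server.
    """
    submission_limit = target_job_limit * resource_multiplier
    outstanding = active_jobs + queued_jobs
    # Never submit a negative number of jobs when the queue is already full.
    return max(0, submission_limit - outstanding)


# Values from the example: limit 10, multiplier 2, 10 active, 5 queued.
print(jobs_to_submit(10, 2, 10, 5))  # 5
```

Raising the multiplier to 3 with the same job counts would give a submission limit of 30 and 15 newly submitted jobs in that session.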

If the average job size is large and runs for one or more SLP processing sessions, the multiplier value can be low.  If the average job size is small and jobs run quickly (less than the 5 minute session time), the multiplier value should be increased so that more jobs will be submitted during each session.  The goal is to submit enough jobs to keep the storage servers busy.

Step 4 - Set the limit on the size of replication jobs

The maximum size of a replication job is an important factor in determining how long each replication job will take to run and how many jobs will be created. Both factors influence the behavior noted in the sections above. Images assigned to a given replication job are processed sequentially. If there are only a few very large jobs, each job processes a single image at a time and there is little parallelism. This can leave the storage servers underutilized. On the other hand, there is overhead involved in starting, controlling, and stopping each job. This happens at both the NetBackup level and the storage level. A large number of small jobs generates unwanted overhead, which impacts performance. The goal is to create a reasonable number of reasonably sized jobs. What counts as "reasonable" is subjective and will vary for each environment.

The job size limit is set with the SLP.MAX_SIZE_PER_BACKUP_REPLICATION_JOB parameter. As with the other parameters, the nbsetconfig command is used to set the value:

# Set the maximum job size for replication
->nbsetconfig
nbsetconfig>SLP.MAX_SIZE_PER_BACKUP_REPLICATION_JOB = 200 GB
nbsetconfig><end-file marker - Unix: Ctrl+D Enter, Windows: Ctrl+Z Enter>

The default value is 100 GB (gigabytes). The image size used for the size computation is the front-end size of the data moved from the NetBackup client. Deduplication performed by the source and target storage servers will affect the amount of data actually replicated and how long the job actually takes to run.
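
The trade-off between job size and job count can be explored with a rough Python sketch. It assumes that images are packed into jobs in order until the next image would exceed the size limit, and that an image larger than the limit gets a job of its own. This grouping rule and the function name group_images_into_jobs are assumptions made for illustration; NetBackup's actual batching logic may differ.

```python
def group_images_into_jobs(image_sizes_gb, max_job_size_gb=100):
    """Greedily pack front-end image sizes (GB) into size-capped jobs.

    Assumed model, not NetBackup's actual batching algorithm.
    """
    jobs = []
    current, current_size = [], 0
    for size in image_sizes_gb:
        # Close the current job if adding this image would exceed the cap.
        if current and current_size + size > max_job_size_gb:
            jobs.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        jobs.append(current)
    return jobs


backlog = [60, 50, 30, 150, 20]  # hypothetical image sizes in GB
print(len(group_images_into_jobs(backlog, 100)))  # 4
print(len(group_images_into_jobs(backlog, 200)))  # 2
```

Under this model, doubling the limit halves the number of jobs for the sample backlog, which reduces per-job overhead but also reduces the number of images that can be replicated in parallel.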

Step 5 - Set the limit on how often to poll for completion of a replicated image

The poll interval that NetBackup uses to check for completion of a replicated image can be adjusted.  By default, NetBackup will compute a poll interval based on the size of the image to be replicated.  If this value is not optimal, it can be adjusted.  Shortening the interval will improve throughput for replication operations at the cost of higher overhead.  See KB article 100045506 for the details on how to change the value.
