Aggressive NetBackup buffer tunings may cause resource exhaustion on NetBackup Appliances

Article: 100031616
Last Published: 2015-12-01
Product(s): Appliances

Problem

Aggressive NetBackup buffer tunings may cause resource exhaustion on NetBackup Appliances

NetBackup Appliances are provisioned with memory based on expected Media Server Deduplication Pool (MSDP) storage needs. All Appliance performance testing is done with the NetBackup default values for the size and number of data buffers. Values higher than the defaults can exhaust Appliance resources as the MSDP storage fills and additional roles are added to the Appliance, most notably the VMware backup host role. As the load on the Appliance increases, signs of resource exhaustion gradually become more acute until the Appliance becomes inaccessible or unresponsive on any interface (for example ssh, console, or remote console). Inability to access IPMI is not a related symptom.

Some or all of the following symptoms may be present prior to the machine becoming completely unresponsive:
  • bptm and bpbkarv allocate too much physical RAM, forcing spoold and spad to swap memory
  • Inability to run certain CLISH commands
    • Main > Monitor > Hardware > ShowHealth
    • Main > Support > DataCollect
  • Erratic behavior of some system commands
    • su - admin (Fails with ACCESS NOT ALLOWED)
  • Loss of Network Stack

Cause

System Overload

Buffer tunings that are common on legacy dedicated (non-MSDP) tape/disk media servers have been applied to a deduplication appliance, for example:

# /usr/openv/netbackup/bin/goodies/bpconverttouch -f

Touch files found by this utility:
Found file: /usr/openv/netbackup/db/config/CD_NUMBER_DATA_BUFFERS      Contents:128
Found file: /usr/openv/netbackup/db/config/CD_SIZE_DATA_BUFFERS        Contents:524288
Found file: /usr/openv/netbackup/db/config/CD_UPDATE_INTERVAL          Contents:180
Found file: /usr/openv/netbackup/db/config/NUMBER_DATA_BUFFERS         Contents:256
Found file: /usr/openv/netbackup/db/config/NUMBER_DATA_BUFFERS_DISK    Contents:512
Found file: /usr/openv/netbackup/db/config/NUMBER_DATA_BUFFERS_FT      Contents:16
Found file: /usr/openv/netbackup/db/config/OST_CD_BUSY_RETRY_LIMIT     Contents:1500
Found file: /usr/openv/netbackup/db/config/SIZE_DATA_BUFFERS           Contents:262144
Found file: /usr/openv/netbackup/db/config/SIZE_DATA_BUFFERS_DISK      Contents:1048576
Found file: /usr/openv/netbackup/db/config/SIZE_DATA_BUFFERS_FT        Contents:262144

The factory-configured buffer settings are shown below. Values can be returned to the defaults by removing the touch files.

nbappliance.Settings> NetBackup Misc Defaults
DEFERRED_IMAGE_LIMIT                              : 64
DPS_PROXYDEFAULTRECVTMO                           : 800
nbappliance.Settings> NetBackup DataBuffers Number Defaults
NUMBER_DATA_BUFFERS                               : 30
NUMBER_DATA_BUFFERS_DISK                          : 30
NUMBER_DATA_BUFFERS_FT                            : 16
NUMBER_DATA_BUFFERS_RESTORE                       : 30

nbappliance.Settings> NetBackup DataBuffers Size Defaults
SIZE_DATA_BUFFERS                                 : 262144 B
SIZE_DATA_BUFFERS_DISK                            : 262144 B
SIZE_DATA_BUFFERS_FT                              : 262144 B
SIZE_DATA_BUFFERS_MULTCOPY                        : 262144 B
SIZE_DATA_BUFFERS_NDMP                            : 262144 B
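
The comparison between the touch files on a server and the factory defaults above can be scripted. The following Python sketch is illustrative only and is not a vendor-supplied tool: the function name find_raised_buffers is hypothetical, it assumes each touch file contains a single integer, and it assumes Python is available on the host where it is run. It only reads files and reports values that are higher than the defaults listed above. It changes nothing.

```python
from pathlib import Path

# Appliance defaults, copied from the listing in this article.
APPLIANCE_DEFAULTS = {
    "NUMBER_DATA_BUFFERS": 30,
    "NUMBER_DATA_BUFFERS_DISK": 30,
    "NUMBER_DATA_BUFFERS_FT": 16,
    "NUMBER_DATA_BUFFERS_RESTORE": 30,
    "SIZE_DATA_BUFFERS": 262144,
    "SIZE_DATA_BUFFERS_DISK": 262144,
    "SIZE_DATA_BUFFERS_FT": 262144,
    "SIZE_DATA_BUFFERS_MULTCOPY": 262144,
    "SIZE_DATA_BUFFERS_NDMP": 262144,
}

# Touch file directory shown in the bpconverttouch output in this article.
CONFIG_DIR = Path("/usr/openv/netbackup/db/config")


def find_raised_buffers(config_dir=CONFIG_DIR, defaults=APPLIANCE_DEFAULTS):
    """Return {name: (current, default)} for touch files set above the default.

    Hypothetical helper; assumes each touch file holds one integer.
    """
    raised = {}
    for name, default in defaults.items():
        touch_file = Path(config_dir) / name
        if not touch_file.is_file():
            continue  # no touch file present, so the default applies
        try:
            current = int(touch_file.read_text().split()[0])
        except (ValueError, IndexError):
            continue  # empty or non-numeric file: inspect it by hand
        if current > default:
            raised[name] = (current, default)
    return raised


if __name__ == "__main__":
    for name, (current, default) in sorted(find_raised_buffers().items()):
        print(f"{name}: {current} (default {default})")
```

Any touch file it reports is a candidate for review against the defaults. Touch files that are empty or non-numeric are skipped and should be checked manually.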

Solution

Appliances are optimized and tested at the default settings for all NetBackup tunables. Remove any touch files related to buffer size or number, with the exception of those listed below:
 
/usr/openv/netbackup/NET_BUFFER_SZ    Contents:0

Further attempts to tune the Appliance for greater performance in a particular role should be undertaken with caution: work upward in small increments and test under load until a balance of performance and system resource utilization is reached.

Repeat the testing as roles (backup host, NDMP host, tape out, replication target or source) are added to the Appliance, as additional storage is added, and as storage use approaches 60% of the maximum capacity of the Appliance.

Note that an increase in the size and number of the data buffers uses more shared memory, which is a limited system resource. The total amount of shared memory used for each stream is:

number_data_buffers * size_data_buffers

For an MSDP server, the total can be calculated as:

number_data_buffers * size_data_buffers * MaxIO streams

With the addition of the backup host role, it can be estimated as:

number_data_buffers * size_data_buffers * MaxIO streams + (No. VMware Backup Host Streams * 524288)
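
To show the scale of the difference, the estimate above can be worked through for the default disk buffer settings and for the aggressive NUMBER_DATA_BUFFERS_DISK / SIZE_DATA_BUFFERS_DISK values listed under Cause. This is a minimal Python sketch of that arithmetic only. The function name is hypothetical, and the figure of 100 concurrent streams is an assumed value chosen for illustration, not a recommended or measured setting.

```python
# Per-stream term for the VMware backup host role, taken from the
# estimate in this article.
VMWARE_BYTES_PER_STREAM = 524288


def estimate_shared_memory(number_data_buffers, size_data_buffers,
                           max_io_streams, vmware_streams=0):
    """Estimate shared memory use in bytes (hypothetical helper)."""
    msdp_bytes = number_data_buffers * size_data_buffers * max_io_streams
    return msdp_bytes + vmware_streams * VMWARE_BYTES_PER_STREAM


STREAMS = 100  # assumed stream count, for illustration only

default_bytes = estimate_shared_memory(30, 262144, STREAMS)
tuned_bytes = estimate_shared_memory(512, 1048576, STREAMS)

print(f"defaults: {default_bytes / 2**20:.0f} MiB")  # defaults: 750 MiB
print(f"tuned:    {tuned_bytes / 2**20:.0f} MiB")    # tuned:    51200 MiB
```

Under these assumptions the tuned values need about 50 GiB of shared memory where the defaults need 750 MiB, before any VMware backup host streams are counted.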
