NetBackup™ Backup Planning and Performance Tuning Guide
- NetBackup capacity planning
- Primary server configuration guidelines
- Size guidance for the NetBackup primary server and domain
- Factors that limit job scheduling
- More than one backup job per second
- Stagger the submission of jobs for better load distribution
- NetBackup job delays
- Selection of storage units: performance considerations
- About file system capacity and NetBackup performance
- About the primary server NetBackup catalog
- Guidelines for managing the primary server NetBackup catalog
- Adjusting the batch size for sending metadata to the NetBackup catalog
- Methods for managing the catalog size
- Performance guidelines for NetBackup policies
- Legacy error log fields
- Media server configuration guidelines
- NetBackup hardware design and tuning considerations
- About NetBackup Media Server Deduplication (MSDP)
- Data segmentation
- Fingerprint lookup for deduplication
- Predictive and sampling cache scheme
- Data store
- Space reclamation
- System resource usage and tuning considerations
- Memory considerations
- I/O considerations
- Network considerations
- CPU considerations
- OS tuning considerations
- MSDP tuning considerations
- MSDP sizing considerations
- Cloud tier sizing and performance
- Accelerator performance considerations
- Media configuration guidelines
- About dedicated versus shared backup environments
- Suggestions for NetBackup media pools
- Disk versus tape: performance considerations
- NetBackup media not available
- About the threshold for media errors
- Adjusting the media_error_threshold
- About tape I/O error handling
- About NetBackup media manager tape drive selection
- How to identify performance bottlenecks
- Best practices
- Best practices: NetBackup SAN Client
- Best practices: NetBackup AdvancedDisk
- Best practices: Disk pool configuration - setting concurrent jobs and maximum I/O streams
- Best practices: About disk staging and NetBackup performance
- Best practices: Supported tape drive technologies for NetBackup
- Best practices: NetBackup tape drive cleaning
- Best practices: NetBackup data recovery methods
- Best practices: Suggestions for disaster recovery planning
- Best practices: NetBackup naming conventions
- Best practices: NetBackup duplication
- Best practices: NetBackup deduplication
- Best practices: Universal shares
- NetBackup for VMware sizing and best practices
- Best practices: Storage lifecycle policies (SLPs)
- Best practices: NetBackup NAS-Data-Protection (D-NAS)
- Best practices: NetBackup for Nutanix AHV
- Best practices: NetBackup Sybase database
- Best practices: Avoiding media server resource bottlenecks with Oracle VLDB backups
- Best practices: Avoiding media server resource bottlenecks with MSDPLB+ prefix policy
- Best practices: Cloud deployment considerations
- Measuring performance
- Measuring NetBackup performance: overview
- How to control system variables for consistent testing conditions
- Running a performance test without interference from other jobs
- About evaluating NetBackup performance
- Evaluating NetBackup performance through the Activity Monitor
- Evaluating NetBackup performance through the All Log Entries report
- Table of NetBackup All Log Entries report
- Evaluating system components
- About measuring performance independent of tape or disk output
- Measuring performance with bpbkar
- Bypassing disk performance with the SKIP_DISK_WRITES touch file
- Measuring performance with the GEN_DATA directive (Linux/UNIX)
- Monitoring Linux/UNIX CPU load
- Monitoring Linux/UNIX memory use
- Monitoring Linux/UNIX disk load
- Monitoring Linux/UNIX network traffic
- Monitoring Linux/UNIX system resource usage with dstat
- About the Windows Performance Monitor
- Monitoring Windows CPU load
- Monitoring Windows memory use
- Monitoring Windows disk load
- Increasing disk performance
- Tuning the NetBackup data transfer path
- About the NetBackup data transfer path
- About tuning the data transfer path
- Tuning suggestions for the NetBackup data transfer path
- NetBackup client performance in the data transfer path
- NetBackup network performance in the data transfer path
- NetBackup server performance in the data transfer path
- About shared memory (number and size of data buffers)
- Default number of shared data buffers
- Default size of shared data buffers
- Amount of shared memory required by NetBackup
- How to change the number of shared data buffers
- Notes on number data buffers files
- How to change the size of shared data buffers
- Notes on size data buffer files
- Size values for shared data buffers
- Note on shared memory and NetBackup for NDMP
- Recommended shared memory settings
- Recommended number of data buffers for SAN Client and FT media server
- Testing changes made to shared memory
- About NetBackup wait and delay counters
- Changing parent and child delay values for NetBackup
- About the communication between NetBackup client and media server
- Processes used in NetBackup client-server communication
- Roles of processes during backup and restore
- Finding wait and delay counter values
- Note on log file creation
- About tunable parameters reported in the bptm log
- Example of using wait and delay counter values
- Issues uncovered by wait and delay counter values
- Estimating the effect of multiple copies on backup performance
- Effect of fragment size on NetBackup restores
- Other NetBackup restore performance issues
- NetBackup storage device performance in the data transfer path
- Tuning other NetBackup components
- When to use multiplexing and multiple data streams
- Effects of multiplexing and multistreaming on backup and restore
- How to improve NetBackup resource allocation
- Encryption and NetBackup performance
- Compression and NetBackup performance
- How to enable NetBackup compression
- Effect of encryption plus compression on NetBackup performance
- Information on NetBackup Java performance improvements
- Information on NetBackup Vault
- Fast recovery with Bare Metal Restore
- How to improve performance when backing up many small files
- How to improve FlashBackup performance
- Veritas NetBackup OpsCenter
- Tuning disk I/O performance
Proper mindset for performance issue RCA
Troubleshooting a performance issue is often compared to looking for a needle in a haystack. The problem is vague and unstructured; moreover, it can be anywhere in the product and can originate from both the hardware components and the software stack. Most non-performance engineers struggle with where to start troubleshooting, and many dive straight into their own area of expertise. For example, a file system expert will start at the file system component, while a network engineer may start by investigating the network layer. The mindset detailed in this section provides a structured approach to guide the resolution of an otherwise unstructured problem.
By following these guidelines, it becomes easier to find an entry point from which to start drilling down into a performance issue.
Develop a block-level understanding of both the hardware and the software components. Understanding the process flow helps narrow down the problem area.
Drill down into the issue systematically, top down and outside in, like peeling an onion. Always start by confirming that the system has enough hardware resource bandwidth to handle the workload before jumping straight into application tuning.
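As a first pass, a quick snapshot of host resource usage can confirm whether there is headroom before any application-level tuning begins. The following is a minimal sketch, assuming the third-party psutil Python package is installed; the saturation thresholds are illustrative only, not values from this guide.

```python
# Quick headroom check before any application-level tuning.
# Requires the third-party psutil package (pip install psutil).
# The 90%/20% thresholds below are illustrative only.
import psutil

def headroom_snapshot(sample_seconds=5):
    cpu = psutil.cpu_percent(interval=sample_seconds)  # % busy over the sample window
    mem = psutil.virtual_memory()                       # physical memory usage
    swap = psutil.swap_memory()                         # swap use hints at memory pressure
    print(f"CPU busy:    {cpu:.1f}%")
    print(f"Memory used: {mem.percent:.1f}% ({mem.available / 2**30:.1f} GiB available)")
    print(f"Swap used:   {swap.percent:.1f}%")
    # Flag obvious saturation so the RCA entry point is chosen from data, not guesswork.
    if cpu > 90:
        print("-> CPU looks saturated; start the RCA at the CPU consumers.")
    if mem.percent > 90 or swap.percent > 20:
        print("-> Memory pressure detected; start the RCA at memory usage.")

if __name__ == "__main__":
    headroom_snapshot()
```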
Tailor the tuning for each customer if necessary. Tuning that works for one customer may not work for another, because differences in workload create different tuning needs. Do not blindly apply a known tuning to another system unless the root cause is the same.
Be meticulous in data collection. Troubleshooting performance problems is an iterative process: as one bottleneck is resolved, a new one may emerge. Automating data collection, so that the same data is gathered consistently throughout the RCA process, is therefore critical for efficient problem resolution. In addition, avoid adding jobs or allowing unrelated jobs to run on the system while data collection is in progress.
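One way to automate consistent collection of the four major resources is a small sampler that appends one row per interval to a CSV file. This is a sketch under assumptions: it uses the third-party psutil package, and the interval, sample count, column set, and file name are illustrative choices rather than recommendations from this guide.

```python
# Periodic sampler for the four major resources (CPU, memory, disk I/O, network),
# appending one row per interval to a CSV file for later comparison.
import csv
import datetime
import time

import psutil

def collect(outfile="rca_samples.csv", interval=30, samples=120):
    fields = ["timestamp", "cpu_pct", "mem_pct",
              "disk_read_mb", "disk_write_mb", "net_sent_mb", "net_recv_mb"]
    with open(outfile, "a", newline="") as f:
        writer = csv.writer(f)
        if f.tell() == 0:                      # write the header only for a new file
            writer.writerow(fields)
        psutil.cpu_percent(interval=None)      # prime the CPU counter
        prev_disk = psutil.disk_io_counters()
        prev_net = psutil.net_io_counters()
        for _ in range(samples):
            time.sleep(interval)
            disk = psutil.disk_io_counters()
            net = psutil.net_io_counters()
            writer.writerow([
                datetime.datetime.now().isoformat(timespec="seconds"),
                psutil.cpu_percent(interval=None),   # average since the last call
                psutil.virtual_memory().percent,
                (disk.read_bytes - prev_disk.read_bytes) / 2**20,
                (disk.write_bytes - prev_disk.write_bytes) / 2**20,
                (net.bytes_sent - prev_net.bytes_sent) / 2**20,
                (net.bytes_recv - prev_net.bytes_recv) / 2**20,
            ])
            f.flush()                          # keep the file usable if the run is cut short
            prev_disk, prev_net = disk, net

if __name__ == "__main__":
    collect()
```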
Remain relentless in RCA. Do not attempt to tune the system until a root cause is identified. Without knowing the root cause, tuning becomes trial and error, which is time consuming and risky. Incorrect tuning can destabilize the system and result in further performance degradation.
Keep a laser focus on the four major resources: CPU, memory, I/O, and network. All performance issues manifest themselves in one or more of these four major hardware resources. By focusing on their usage patterns, you can quickly identify an entry point from which to start the iterative RCA. Look for patterns that defy common sense or the norm. For example, higher throughput generally consumes more CPU cycles; if throughput decreases while CPU usage increases or remains the same, then your entry point should be the CPU, and you may want to look for processes that consume more CPU. Another example is when throughput has plateaued but disk queue length increases. This indicates an I/O subsystem bottleneck, and the entry point for RCA should be the I/O code path.
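Those rules of thumb can be expressed as simple checks over averaged samples. The sketch below is illustrative only: the metric names, sample values, and 10% change threshold are hypothetical, and the averages can come from any collector (the CSV sampler above, dstat, or the Windows Performance Monitor).

```python
# Turning the rules of thumb into explicit checks over averaged samples.
# Metric names, values, and the 10% change threshold are hypothetical.
def suggest_entry_point(norm, current, threshold=0.10):
    def change(key):
        return (current[key] - norm[key]) / norm[key]

    tp = change("throughput_mb_s")
    cpu = change("cpu_pct")
    queue = change("disk_queue_len")

    if tp < -threshold and cpu >= -threshold:
        print("Throughput fell but CPU did not: start the RCA at the CPU and look for "
              "processes that consume more cycles than before.")
    if abs(tp) <= threshold and queue > threshold:
        print("Throughput has plateaued while disk queue length grows: start the RCA at "
              "the I/O code path and storage subsystem.")

# Hypothetical averages over two comparable backup windows.
suggest_entry_point(
    norm={"throughput_mb_s": 450, "cpu_pct": 60, "disk_queue_len": 4},
    current={"throughput_mb_s": 300, "cpu_pct": 65, "disk_queue_len": 5},
)
```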
Performance numbers, both throughput and performance statistics, are relative. A number is meaningless until you compare it with another number. For example, a disk queue length of 10 means little until you compare it with a similar workload that has a queue length of 5. That is why it is important to keep a set of performance data from when the system is running normally, and to collect the same kind of data for comparison when a performance problem occurs. Having a set of baseline numbers to compare against throughout the iterative process is key to successful problem resolution.
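A baseline comparison can be as simple as averaging each metric over the baseline capture and over the problem-period capture, then reporting the relative deviation. The sketch below assumes the CSV layout produced by the sampler above, and the file names are hypothetical; adjust the column names to match your own data.

```python
# Compare a baseline capture with a problem-period capture, metric by metric.
import csv
from statistics import mean

def column_averages(path):
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    numeric = [k for k in rows[0] if k != "timestamp"]
    return {k: mean(float(r[k]) for r in rows) for k in numeric}

def compare(baseline_csv, problem_csv):
    base = column_averages(baseline_csv)
    prob = column_averages(problem_csv)
    for metric in base:
        delta = (prob[metric] - base[metric]) / base[metric] * 100 if base[metric] else 0.0
        print(f"{metric:>15}: baseline {base[metric]:10.2f}  "
              f"problem {prob[metric]:10.2f}  ({delta:+.1f}%)")

# Example (hypothetical file names):
# compare("baseline_samples.csv", "problem_samples.csv")
```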
Identify changes in the environment, such as newly implemented security requirements, changes in workloads or applications, hardware or network infrastructure changes, and growth in the size of the data in the workloads.