Veritas NetBackup™ Appliance Capacity Planning and Performance Tuning Guide

Product(s): Appliances (3.2, 3.1.2)
Platform: NetBackup Appliance OS

About CPU monitoring and tuning

Table: Sample vmstat output (collected with vmstat 5) shows sample output of the vmstat command while 120 concurrent backup streams with a 98% deduplication rate are running on a 53xx appliance.

Table: Sample vmstat output (collected with vmstat 5)

  r  b     swpd       free   buff     cache  si  so  us  sy  id  wa
 89  0  1006344  348907856  37632  11694512   0   0  62  30   8   0
 84  0  1006316  348450264  37640  12016276  11   0  62  30   8   0
 63  0  1006316  348104004  37664  12260816   0   0  63  30   7   0
 76  0  1006288  347857280  37664  12491148   5   0  61  29   9   0
 46  0  1006288  347538340  37684  12756108   0   0  61  30   8   0
 72  0  1006260  347111556  37692  13083760   3   0  62  30   8   0
 72  0  1006252  346786820  37692  13332416   6   0  62  30   8   0
 61  0  1006164  346485836  37712  13612680  28   0  59  29  13   0
 92  0  1006156  346136540  37720  13902248   0   0  60  30  10   0
106  0  1006132  345721588  37724  14190992   6   0  61  31   9   0
 82  0  1006128  345355448  37732  14465996   0   0  61  30   9   0
113  0  1005972  345072276  37740  14760008  30   0  61  30  10   0
 66  0  1005964  344747824  37740  15004520   1   0  61  30   9   0
 98  0  1005924  344446500  37748  15282376   8   0  60  30  10   0
118  0  1005920  344035148  37760  15582400   0   0  61  30   9   0
 96  0  1005900  343802084  37764  15882380   4   0  62  30   9   0
 60  0  1005900  343406276  37784  16175128   0   0  58  29  13   0
 61  0  1005872  343038168  37792  16470724   3   0  62  30   7   0
 60  0  1005868  342653976  37792  16747684   1   0  61  30   9   0
116  0  1005836  342343076  37800  17001952   5   0  62  30   8   0

Note:

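Some of the columns from the output have been removed to simplify the display. The memory columns (swpd, free, buff, cache) are in KiB, si and so are in KiB per second, and us, sy, id, and wa are percentages of total CPU time.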
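The simplified table keeps 12 of the columns that vmstat prints. As a minimal sketch, assuming standard procps vmstat output (a group banner line, a column-name line, then one line per sample), the following Python function keeps only those columns and drops the others, such as bi, bo, in, cs, and, on newer systems, st. The function name simplify_vmstat is hypothetical. The first data line of vmstat output reports averages since the last reboot, so you may want to discard it before you analyze a run.

```python
# Columns shown in the simplified sample table.
KEEP = ["r", "b", "swpd", "free", "buff", "cache",
        "si", "so", "us", "sy", "id", "wa"]


def simplify_vmstat(text: str) -> list[list[str]]:
    """Return the header row followed by one row per sample, KEEP columns only.

    Assumes the procps vmstat layout: line 1 is the group banner
    ("procs ---memory--- ..."), line 2 holds the column names, and
    later lines are samples (possibly with repeated header lines).
    """
    lines = [line.split() for line in text.strip().splitlines()]
    header = lines[1]
    positions = [header.index(name) for name in KEEP]
    rows = [list(KEEP)]
    for fields in lines[2:]:
        # Keep only complete numeric sample lines; this skips repeated headers.
        if len(fields) == len(header) and fields[0].isdigit():
            rows.append([fields[i] for i in positions])
    return rows
```

For example, simplify_vmstat(subprocess.run(["vmstat", "5", "20"], capture_output=True, text=True).stdout) collects 20 samples at 5-second intervals and returns them in the simplified layout.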

From the above table, you can conclude that the system is CPU bound, because the id column (the percentage of CPU time spent idle) is mostly in the single digits. CPU utilization (us + sy) ranges from 87% to 93% and is 90% or higher in 18 of the 20 samples. The first column, r, gives another indication that the system is CPU bound. The r column shows the run queue: the number of processes that are either running or ready to run and waiting for a free CPU. Its value fluctuates between 46 and 118. Because the 53xx has 40 logical CPU threads, it can run at most 40 processes at a time. Subtract 40 from the value in column r to derive the number of processes that are ready to run but waiting for CPU cycles. For example, an r value of 118 means that about 78 processes are waiting for a CPU.
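The reasoning above can be expressed as a small check. The following Python sketch encodes the r, us, sy, and id values from the sample table, subtracts the 40 logical CPU threads from r to estimate how many processes are waiting, and flags the system as CPU bound when most idle readings are in the single digits and the run queue always exceeds the thread count. The function names and the 10% idle threshold are assumptions for illustration, not NetBackup tools or documented limits.

```python
from dataclasses import dataclass

# Logical CPU threads on a 53xx appliance, as stated in this guide.
LOGICAL_CPUS = 40


@dataclass
class Sample:
    r: int     # run queue: processes running or waiting for a CPU
    us: int    # % of CPU time in user code
    sy: int    # % of CPU time in kernel code
    idle: int  # vmstat "id" column: % of CPU time idle


# (r, us, sy, id) for the 20 samples in the sample vmstat table.
SAMPLES = [Sample(*row) for row in [
    (89, 62, 30, 8), (84, 62, 30, 8), (63, 63, 30, 7), (76, 61, 29, 9),
    (46, 61, 30, 8), (72, 62, 30, 8), (72, 62, 30, 8), (61, 59, 29, 13),
    (92, 60, 30, 10), (106, 61, 31, 9), (82, 61, 30, 9), (113, 61, 30, 10),
    (66, 61, 30, 9), (98, 60, 30, 10), (118, 61, 30, 9), (96, 62, 30, 9),
    (60, 58, 29, 13), (61, 62, 30, 7), (60, 61, 30, 9), (116, 62, 30, 8),
]]


def waiting_for_cpu(s: Sample, cpus: int = LOGICAL_CPUS) -> int:
    """Processes ready to run but waiting for a CPU (r minus thread count)."""
    return max(s.r - cpus, 0)


def looks_cpu_bound(samples: list[Sample], cpus: int = LOGICAL_CPUS,
                    idle_threshold: int = 10) -> bool:
    """Hypothetical heuristic: idle is below idle_threshold in most samples
    and the run queue exceeds the number of logical CPUs in every sample."""
    low_idle = sum(1 for s in samples if s.idle < idle_threshold)
    return low_idle > len(samples) / 2 and all(s.r > cpus for s in samples)


if __name__ == "__main__":
    busy = [s.us + s.sy for s in SAMPLES]
    waiting = [waiting_for_cpu(s) for s in SAMPLES]
    print(f"CPU busy (us+sy): {min(busy)}-{max(busy)}%")
    print(f"Processes waiting for a CPU: {min(waiting)}-{max(waiting)}")
    print(f"CPU bound: {looks_cpu_bound(SAMPLES)}")
```

Run against the sample table, the script reports CPU busy at 87-93%, between 6 and 78 processes waiting for a CPU, and CPU bound: True.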

Given these CPU statistics, and because they were collected while the system was running 120 concurrent backup streams at 98% deduplication, you can take either of two actions to lower CPU consumption:

  • Lower the job batch size. When the CPU is overloaded, jobs can spend too much time waiting for available CPU cycles. Lowering the number of concurrent jobs per batch can improve overall performance.

  • Add another 53xx as a fingerprint server to double the CPU capacity.

In a quick internal experiment with an additional fingerprint server, performance increased by almost 40%, to about 10 GB/sec, while CPU usage on the appliance dropped by almost 50%. At that point, the bottleneck shifted to the network: the 53xx supports at most ten 10 Gbps NICs, which caps network throughput at around 10 GB/sec. Performance would likely improve further with more network bandwidth than ten 10 Gbps NICs provide.
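The network ceiling follows from simple arithmetic: ten 10 Gbps NICs give 100 Gbps of aggregate line rate, which is 12.5 GB/sec at 8 bits per byte, before protocol overhead. The Python sketch below, with a hypothetical helper name, computes that raw ceiling and shows that the roughly 10 GB/sec observed in the experiment is about 80% of it.

```python
def nic_line_rate_gb_per_s(nic_count: int, gbps_per_nic: float) -> float:
    """Aggregate raw line rate in gigabytes per second (8 bits per byte).

    Ignores Ethernet, IP, and TCP overhead, so achievable throughput is lower.
    """
    return nic_count * gbps_per_nic / 8


ceiling = nic_line_rate_gb_per_s(10, 10)  # ten 10 Gbps NICs on a 53xx
observed = 10.0                           # approximate GB/sec from the experiment
print(f"Raw ceiling: {ceiling} GB/sec")   # Raw ceiling: 12.5 GB/sec
print(f"Observed share of line rate: {observed / ceiling:.0%}")  # 80%
```

The gap between the raw ceiling and the observed rate is consistent with protocol overhead, which is why the practical cap is around 10 GB/sec rather than 12.5 GB/sec.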