V-269-38 - Improving slow backups and poor performance in Backup Exec.

Article: 100017820
Last modified: 2018-04-20
Product(s): Backup Exec

Problem

V-269-38 - This article provides information designed for troubleshooting poor Backup Exec job rate performance.

Cause

 

Backup operations run across a chain of systems. This chain can be compared to a pipeline whose sections vary in size, running from the disk that contains the data all the way to the backup destination.  If any one section of the pipe is constricted, it becomes the bottleneck that slows down the entire backup process.  The goal of this document is to identify the bottleneck, or bottlenecks, in a system that are causing backup (or restore) performance issues.
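
The pipeline idea can be sketched in a few lines of Python. The stage names and rates below are illustrative assumptions, not measurements from any real system; the sketch only shows that the end-to-end rate cannot exceed the rate of the slowest stage.

```python
# Hypothetical sustained rates, in MB/s, for each stage of a backup path.
# All names and numbers here are made up for illustration.
stage_rates_mb_s = {
    "source disk read": 90.0,
    "network link": 11.0,
    "media server": 200.0,
    "backup device write": 120.0,
}

def find_bottleneck(rates):
    """Return (stage, rate) for the slowest stage in the pipeline."""
    stage = min(rates, key=rates.get)
    return stage, rates[stage]

stage, rate = find_bottleneck(stage_rates_mb_s)
print(f"Bottleneck: {stage} at {rate} MB/s")
```

With real measurements in place of the made-up numbers, the slowest stage is the one to troubleshoot first.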

Many variables can affect throughput performance of backups and restores.  These can include:

Hardware
 
The speed of the disk controller, and hardware errors caused by the disk drive, the tape drive, the disk controller, the SCSI bus, or improper cabling/termination, can slow performance. Confirm that the controller is rated for the tape backup hardware.  For example, do not expect a 120 MB/second tape drive to reach its rated speed when it is connected to a 10 MB/second SCSI controller.  Check that the SCSI BIOS settings are set properly as follows:
  • Initiate Wide Negotiation is set to Yes when the tape device is connected to a 68 pin wide SCSI Cable Connector.
  • Tape drives are not connected to a SCSI RAID Controller.

System
 
The capacity and speed of the media server performing the backup, or of the remote system being backed up, significantly impact performance. System activity during the backup also impacts performance. Heavily fragmented hard disks affect not only the rate at which data is written to tape but also overall system performance. Fragmented files take longer to back up because each segment of data is stored at a different location on the disk, which increases the time the disk needs to access the data. Defragment disks on a regular basis.
 

Memory
  
The amount of available memory will impact backup speed. Insufficient memory, improper page file settings, and a lack of free hard disk space will cause excessive paging and slow performance. Also check that processes running on the system release the memory they consume once they no longer need it; a process that does not do so has a memory leak.
Definition: Memory leak
Memory that a process allocates should be released when it is no longer needed, and at the latest when the process stops or terminates. If a process keeps holding memory that it no longer uses, this is called a memory leak.

File Types
  
The average file compresses at a 2:1 ratio when hardware compression is used. Higher and lower compression ratios occur depending on the type of files being backed up. Average compression can double the backup speed, while no compression runs the tape device at its rated speed. Image and picture files stored in compressed formats are already fully compressed on disk. Therefore, no further hardware compression takes place during the backup, and the tape drive operates at its native (non-compression) rate of speed. Hardware compression is performed by the tape device and not by the backup software.
 

Compression
  
Successful compression can increase the tape drive's data transfer rate up to twice the native rate. Compression can be highly variable depending on the input data.  Uncompressed image files, such as those from a graphical program like Microsoft Paint, may compress at 4.5:1 or more, while binary files may compress at just 1.5:1. Data that has already been compressed, or random data (such as encrypted data or MPEG files), may actually expand by about five percent when the drive attempts to compress it further. This can reduce drive throughput.
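
The effect of these ratios on throughput can be estimated with a simplified model. The sketch below assumes that throughput scales linearly with the compression ratio and is capped at twice the native rate, following the figures in this section; the native rate of 60 MB/s is a hypothetical value chosen only for illustration.

```python
def estimated_rate(native_rate, compression_ratio):
    """Estimate tape throughput for a given compression ratio.

    Simplified model: throughput scales with the ratio and is capped
    at twice the native rate.
    """
    return native_rate * min(compression_ratio, 2.0)

NATIVE_MB_S = 60.0  # hypothetical native drive rate

for label, ratio in [
    ("uncompressed images (4.5:1)", 4.5),
    ("binary files (1.5:1)", 1.5),
    ("already-compressed data (expands about 5%)", 1 / 1.05),
]:
    print(f"{label}: {estimated_rate(NATIVE_MB_S, ratio):.1f} MB/s")
```

A ratio below 1 models data that grows when compressed, which is why such data can run slower than the native rate.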
 
 
Files
  
The total number of files on a disk and the relative size of each file impact backup performance. The fastest backups occur when the disk contains a small number of large files. The slowest backups occur when the disk contains thousands of small files. A large number of files located in the same directory path backs up more efficiently than the same files spread across multiple directory locations.
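
To see whether a backup selection is dominated by small files, the file count and average file size can be measured directly. The following is a minimal sketch using only the Python standard library; the function name `profile_tree` and the placeholder path are assumptions made for this example.

```python
import os

def profile_tree(root):
    """Walk `root` and return (file_count, total_bytes, average_bytes)."""
    count = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
                count += 1
            except OSError:
                # Skip files that vanish or cannot be read.
                continue
    average = total / count if count else 0.0
    return count, total, average

if __name__ == "__main__":
    # "." is a placeholder; point this at the backup selection.
    files, size, avg = profile_tree(".")
    print(f"{files} files, {size / 1e6:.1f} MB total, {avg / 1e3:.1f} KB average")
```

A very high file count combined with a low average size suggests that slower backup rates for that selection are expected.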
 

Block Size
  
Larger block sizes can improve the compression ratio, which helps the drive to achieve better throughput and more tape capacity. Make sure that the block and buffer size are set properly. The throughput will increase in proportion to the compression achieved, until the drive's maximum throughput is reached. Veritas does not recommend increasing the Block Size above the default settings.
 

Network
  
The backup speed for a remote disk is limited by the speed of the physical connection. Local disk drives on the media server can usually be backed up at a faster rate than remote servers across a network. The rate at which a remote server's hard disks can be backed up depends on:
  • The make/model of network cards.
  • The mode/frame type configuration for the adapter.
  • The connectivity equipment (hubs, switches, routers, and so on).
  • Windows settings.
A common reason for slow network backups is the networking configuration.
 
Features such as "full-duplex" and "auto-detect" may not be fully supported in every environment. Manually set the speed to 100 Mb/s and the duplex to half or full on the server side. Find out which Ethernet port the server is connected to on the switch, and set that switch port to 100 Mb/s and the same duplex mode. Do this for the backup server switch port and for the switch ports of any machines being backed up.
 
Note: When a hub is in place instead of a switch, full duplex may not be supported. Consult the Original Equipment Manufacturer (OEM) for details on device features.
 
Note: The switch and the network card must have matching settings. For instance, if the switch port is set to 100 half, the NIC for the server should also be set to 100 half.
 
If a full duplex backup is slower than the half duplex backup, full duplex may not be supported for the combination of NIC, driver and switch. Contact the NIC and Switch manufacturer for updated drivers, firmware, or other support documentation.
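
Job rates are reported in MB/min, while link speeds are quoted in Mb/s, so it can help to convert one to the other before comparing them. The sketch below is a plain unit conversion that assumes 8 bits per byte and ignores protocol overhead; the function name is an assumption made for this example.

```python
def link_ceiling_mb_per_min(link_megabits_per_s):
    """Convert a nominal link speed in Mb/s to a ceiling in MB/min.

    Uses 8 bits per byte and ignores protocol overhead, so real
    throughput will be lower than this figure.
    """
    return link_megabits_per_s / 8 * 60

print(link_ceiling_mb_per_min(100))  # nominal 100 Mb/s link
```

A backup rate well below this ceiling on a link forced to 100 Mb/s points to something other than the nominal link speed, such as a duplex mismatch or another stage of the pipeline.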
 
Another common cause could be the NIC driver. The NIC driver can be easily overwritten by an operating system service pack. If a service pack has been applied and the driver has been overwritten, reinstall the Original Equipment Manufacturer (OEM) driver.
 

Debugging
 
Debugging that is enabled for troubleshooting purposes can also affect system performance.
 
Debugging that occurs through the Services applet is temporary and cycling the services or rebooting the machine will stop the debugging. Debugging configured through the Windows Registry allows for continuous debugging. Leaving the services in debugging mode will cause the logs to build up. For this reason, it is advisable to either take the services out of debugging mode when the problem is resolved, delete the older debug files, or configure the logs directory to be compressed. Please keep this in mind when determining which method of debugging to perform on a system.
 
Veritas Quick Assist (VQA) can automatically detect if the Backup Exec debugging has been left enabled from the Windows registry.
The tool can be downloaded from:  https://www.veritas.com/support/en_US/vqa
For more information on how to disable debugging, please refer to the Related Documents Section.
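
To check whether debug logs have built up, the older files in the logs directory can be listed along with their sizes. This is a minimal sketch using the Python standard library; the function name `stale_files`, the 30-day threshold, and the placeholder directory are assumptions, and the actual logs location depends on the installation.

```python
import os
import time

def stale_files(log_dir, max_age_days):
    """Return (path, size_bytes) for files in log_dir older than max_age_days."""
    cutoff = time.time() - max_age_days * 86400
    stale = []
    for entry in os.scandir(log_dir):
        if entry.is_file():
            info = entry.stat()
            if info.st_mtime < cutoff:
                stale.append((entry.path, info.st_size))
    return stale

if __name__ == "__main__":
    # Placeholder path; substitute the actual debug logs directory.
    log_dir = "path/to/debug/logs"
    if os.path.isdir(log_dir):
        old = stale_files(log_dir, max_age_days=30)
        total_mb = sum(size for _path, size in old) / 1e6
        print(f"{len(old)} files older than 30 days, {total_mb:.1f} MB")
```

The sketch only reports files; whether to delete or compress them remains a manual decision.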
 
Backup Exec Database
 
Installing the Backup Exec Database (BEDB) into an existing SQL instance that is used by other applications can also cause performance issues. This is particularly observed in a Central Administration Server Option (CASO) environment. Other applications may cause resource issues and use all the available resources within the instance.

 

Solution

Troubleshooting Performance Issues:

Listed below are the possible troubleshooting steps that can be performed to improve Backup Exec performance.  Skip ahead to the scenario of interest.  The scenarios are:
  • Local Backup to Disk.
  • Remote Backup to Disk.
  • Local Backup to Tape.
  • Remote Backup to Tape.
     
Note:  For both Local and Remote Backup to Disk, run the Backup to Disk Test Tool (B2DTest.Exe) before performing the steps below to confirm that the device will function properly with Backup Exec.  www.veritas.com/docs/000037869
 
Local Backup to Disk:
 
1. Get a baseline. Review the old job logs and note the speed of previous jobs and the overall time these backups required. (Go to the Job Monitor tab in Backup Exec and select the job log in the Job History window at the bottom.)  Look at the total time the job takes to complete rather than the byte count rate, and compare it with the old logs. If the job is taking considerably more time to finish than before, or is not matching the expected speed, continue troubleshooting.
 
2. Narrow the problem.  If multiple drives or agents are being backed up in one job, split the job into separate jobs for each of those drives and agents. For example, if a backup job backs up C, D, E, and the Exchange agent, create four separate jobs: one for C, one for D, one for E, and one for Exchange. (To do this, click the Backup button, select the C$ drive, schedule the job, and click Submit. Follow this procedure for each of the drives and agents.)  If the speed is slow only for a particular job, continue troubleshooting that job.
 
3. Narrow the problem further.  If only one particular job above is slow, split that job further to determine whether one particular part of the data is causing the performance impact.  If a particular data set is identified as the problem, consider the following suggestions:
 
Check that the data is not being redirected elsewhere.  Some file systems allow a directory to remotely mount data.  The files in this directory can be located on a remote server, which may slow down the whole backup.
 
Check if there are many small files and directories in this section of the backup.  Many small files and directories will slow down the backup, and this is expected.
 
4.  Test B2D disk throughput.  Use a Windows copy (drag and drop from one drive to another) to copy at least 2 GB of the data in the backup job to the B2D disk.  Compare the performance to that of the backup.  If the performance matches, the bottleneck is probably the disk subsystem where the B2D folders are located.  Relocate the B2D folders to a faster disk subsystem, or troubleshoot the disk subsystem further.
 
5. Test system throughput.  Create a similar backup in NTBackup (Windows Backup) and run the backup to disk (assuming that only a file-based job is performed and that Exchange, SQL, or other databases are not being backed up). To start NTBackup, go to Start | Run and type in ntbackup. Compare the jobs. If Exchange, SQL, or other database agents are being backed up in Backup Exec, create a Backup to Disk job inside Backup Exec that backs up 2 GB of data wherever that agent resides, and then perform the same test with NTBackup. It is possible to back up Exchange using NTBackup if it is local, and to compare the performance. If the performance rates are similar, Backup Exec is performing at the capacity of the system.
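
The disk throughput test in step 4 can also be timed with a script rather than by watching a drag-and-drop copy. The following is a minimal sketch that copies one file and reports the rate in MB/min; the function name and the paths in the comment are placeholders, not real locations.

```python
import os
import shutil
import time

def timed_copy_mb_per_min(source, destination):
    """Copy one file and return the observed rate in MB/min."""
    size_mb = os.path.getsize(source) / 1_000_000
    start = time.perf_counter()
    shutil.copyfile(source, destination)
    # Guard against a zero interval on very small files.
    elapsed_s = max(time.perf_counter() - start, 1e-9)
    return size_mb / (elapsed_s / 60)

# Example call with placeholder paths (not real locations):
# rate = timed_copy_mb_per_min(r"D:\data\sample.bin", r"E:\B2D\sample.bin")
```

Operating system write caching can inflate the result for small files, which is one reason to use a sample of at least 2 GB as step 4 suggests.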
 
 
Remote Backup to Disk:
 
1. Get a baseline. Review the old job logs and note the speed of previous jobs and the overall time these backups required. (Go to the Job Monitor tab in Backup Exec and select the job log in the Job History window at the bottom.)  Look at the total time the job takes to complete rather than the byte count rate, and compare it with the old logs. If the job is taking considerably more time to finish than before, or is not matching the expected speed, continue troubleshooting.
 
2. Narrow the problem. If multiple drives or agents are being backed up in one job, split the job into separate jobs for each of those drives and agents. For example, if a backup job backs up C, D, E, and the Exchange agent, create four separate jobs: one for C, one for D, one for E, and one for Exchange. (To do this, click the Backup button, select the C$ drive, schedule the job, and click Submit. Follow this procedure for each of the drives and agents.) If the speed is slow only for a particular job, continue troubleshooting that job.
 
3. Narrow the problem further.  If only one particular job above is slow, split that job further to determine whether one particular part of the data is causing the performance impact.  If a particular data set is identified as the problem, consider the following suggestions:
 
Check that the data is not being redirected elsewhere.  Some file systems allow a directory to remotely mount data.  The files in this directory can be located on a remote server, which may slow down the whole backup.
 
Check if there are many small files and directories in this section of the backup.  Many small files and directories will slow down the backup, and this is expected.
 
4. Test network throughput. Copy 500 MB to 1 GB of data from the backup server to the remote server and note the time required to complete the copy operation (this can be done by opening a path to the other server by going to Start | Run, typing in <\\remote servername\c$>, and then copying the data once the drive is displayed). Follow the same steps to copy data from the remote server to the backup server and note the time required. Divide the amount of data copied by the time taken to get the speed in MB/min, and compare it with Backup Exec's performance.  If Backup Exec's performance is similar to the file copy tests, the backup is running at the capacity of the network.  If the file copy itself is slow, focus on troubleshooting the network components.  If Backup Exec's performance is slower than the file copy tests, the network is probably not the bottleneck.
 
If the network does appear to be the bottleneck, consider performing this same test to a different remote server, or between two completely different servers to determine if the performance issue is associated with the network in general, or a particular server on the network.
 
5. Test system throughput. Perform the following only if the network test in step 4 has been performed and no network issue has been detected.
 
Try backing up the remote server with NTBackup (Windows Backup) (go to Start | Run and type in ntbackup). If the remote server is not visible in NTBackup, create a mapped drive to the server's drive (assuming the slow speeds are not caused by an agent backup such as Exchange or SQL) and try again. Follow similar steps and back up at least 2 GB of data. Review the log and compare it with the Backup Exec logs. This helps narrow down the problem and indicates whether the issue lies with the server or with Backup Exec. If remote backups do not work with NTBackup, open NTBackup locally on the remote server and run it there (back up from one drive to another drive on that server).
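
The arithmetic in step 4 can be sketched as follows. The measurements, the 20% tolerance for what counts as "similar", and the choice to compare against the slower of the two copy directions are all assumptions made for this example.

```python
def rate_mb_per_min(megabytes, seconds):
    """Convert an amount of data and an elapsed time to MB/min."""
    return megabytes / (seconds / 60)

def compare(backup_rate, copy_rate, tolerance=0.2):
    """Classify the backup rate relative to the file-copy rate.

    `tolerance` (20% here) is an arbitrary assumption for what
    counts as "similar".
    """
    if backup_rate >= copy_rate * (1 - tolerance):
        return "similar: backup is running at network capacity"
    return "slower: network is probably not the bottleneck"

# Hypothetical measurements: 1000 MB copied in each direction.
to_remote = rate_mb_per_min(1000, 120)
from_remote = rate_mb_per_min(1000, 150)
print(compare(backup_rate=350.0, copy_rate=min(to_remote, from_remote)))
```

The backup rate passed in would come from the Backup Exec job log for the same remote server.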
 
 
Local Backup to Tape:
 
1. Get a baseline. Review the old job logs and note the speed of previous jobs and the overall time these backups required. (Go to the Job Monitor tab in Backup Exec and select the job log in the Job History window at the bottom.)  Look at the total time the job takes to complete rather than the byte count rate, and compare it with the old logs. If the job is taking considerably more time to finish than before, or is not matching the expected speed, continue troubleshooting.
 
2.  Clear out temporary hardware glitches. Power cycle the server and tape drive/library:
 
Shut down the backup server first, then the tape drive or library (as applicable). Wait a couple of seconds, then power up the tape drive/library and wait until it reports ready or until everything stops moving. Then power up the server. Run the backup job again and check the speed.
 
3. Check the SCSI subsystem.  The speed of the disk controller, and hardware errors caused by the disk drive, the tape drive, the disk controller, the SCSI bus, or improper cabling/termination, can slow performance. Ensure that the controller is rated for the tape backup hardware and that the SCSI BIOS settings are set properly.  Additionally, make sure that:
  • Initiate Wide Negotiation is set to Yes when the tape device is connected to a 68 pin wide SCSI Cable Connector.
  • Tape drives are not connected to a SCSI RAID Controller.
 
The performance of the verify operation shows the health of the SCSI subsystem.  Because the verify operation only reads data and performs in-memory operations on the media server, it is usually limited by the speed of the SCSI subsystem. Verify performance can be checked by looking at the logs of jobs that include a verify operation.  If the verify speeds are low, the SCSI subsystem is probably a performance bottleneck.
 
4. Narrow the problem. If multiple drives or agents are being backed up in one job, split the job into separate jobs for each of those drives and agents. For example, if a backup job backs up C, D, E, and the Exchange agent, create four separate jobs: one for C, one for D, one for E, and one for Exchange.  If the speed is slow only for a particular job, continue troubleshooting that job.
 
5. Narrow the problem further.  If only one particular job above is slow, split that job further to determine whether one particular part of the data is causing the performance impact.  If a particular data set is identified as the problem, consider the following suggestions:
 
Check that the data is not being redirected elsewhere.  Some file systems allow a directory to remotely mount data.  The files in this directory can be located on a remote server, which may slow down the whole backup.
 
Check if there are many small files and directories in this section of the backup.  Many small files and directories will slow down the backup, and this is expected.
 
6. Test system throughput. Open NTBackup (Windows Backup) and run a backup job of the drive/agent (to start NTBackup, go to Start | Run and type in ntbackup). Use the tape drive for the backup by clicking the Backup tab and selecting the drive from the drop-down box under the Backup Destination field. If the drive is not listed, uninstall the Veritas drivers and install the OEM drivers for this drive. This will require a couple of reboots. Preferably back up the whole drive or agent, or back up at least 2 GB of data. Note that a particular area of the drive might be the cause, such as a drive with bad sectors or a drive that contains millions of small files. Review the log and compare the speed with the Backup Exec log.  If the speeds are similar, the backup is running at the capacity of this system.
 
7. Check compression.  Successful compression can increase the tape drive's data transfer rate up to twice the native rate.  Compression can be highly variable depending on the input data. Uncompressed image files, such as those from a graphical program like Microsoft Paint, may compress at 4.5:1 or more, while binary files may compress at just 1.5:1. Data that has already been compressed, or random data (such as encrypted data or MPEG files), may actually expand by about five percent when the drive attempts to compress it further. This can reduce the drive throughput. If hardware compression is not performing as expected, switch to software compression, or vice versa (this can be done by editing the backup job properties, clicking General under Settings, and then selecting a different type of compression in the Compression Type drop-down).
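
The baseline comparison in step 1 can be made less subjective by comparing the current job duration with the median of earlier runs. The sketch below is one way to do that; the durations are hypothetical, and the factor of 1.5 is an arbitrary stand-in for "considerably more time".

```python
from statistics import median

def exceeds_baseline(previous_minutes, current_minutes, factor=1.5):
    """Return True if the current job ran much longer than the baseline.

    The baseline is the median of earlier job durations; `factor` is an
    arbitrary threshold standing in for "considerably more time".
    """
    return current_minutes > median(previous_minutes) * factor

# Hypothetical durations, in minutes, read from old job logs.
history = [182, 175, 190, 178]
print(exceeds_baseline(history, 310))
```

The median is used so that a single unusually slow or fast job in the history does not skew the baseline.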

Remote Backup to Tape:

All the points mentioned under the Local Backup to Tape section apply here. Additionally, refer to the following points:
 
1. Test network throughput. Copy 500 MB to 1 GB of data from the backup server to the remote server and note the time required to complete the copy operation (this can be done by opening a path to the other server by going to Start | Run, typing in <\\remote servername\c$>, and then copying the data once the drive is displayed). Follow the same steps to copy data from the remote server to the backup server and note the time required. Divide the amount of data copied by the time taken to get the speed in MB/min, and compare it with Backup Exec's performance.  If Backup Exec's performance is similar to the file copy tests, the backup is running at the capacity of the network.  If the file copy itself is slow, focus on troubleshooting the network components.  If Backup Exec's performance is slower than the file copy tests, the network is probably not the bottleneck.
 
If the network does appear to be the bottleneck, consider performing this same test to a different remote server, or between two completely different servers to determine if the performance issue is associated with the network in general, or a particular server on the network.
 
2. Test system throughput. Perform the following only if the network test in point 1 of this section has been performed and no network issue has been detected.
 
Try backing up the remote server with NTBackup (go to Start | Run and type in ntbackup). If the remote server is not visible in NTBackup, create a mapped drive to the server's drive (assuming the slow speeds are not caused by an agent backup such as Exchange or SQL) and try again. Follow similar steps and back up at least 2 GB of data. Review the log and compare it with the Backup Exec logs. This helps narrow down the problem and indicates whether the issue lies with the server or with Backup Exec.
 
Note: If remote backups are not possible with NTBackup, open NTBackup locally on the remote server and run it there (back up from one drive to another drive on that server). In that case, also run a backup to disk job of the same data through Backup Exec so that a proper comparison can be drawn. However, in most cases, backup to disk jobs will be faster than backup to tape jobs.
 
For additional performance troubleshooting assistance, see the Related Articles section.
 

 

References

UMI : V-269-38
