Problem
Unix client backup jobs hang while the backup is running.
Error Message
status 13 (file read failed), status 41 (network connection timed out), errno = 110 (ETIMEDOUT on Linux)
Cause
The NetBackup bpbkar process on the client may be:
- busy reading an extremely large file, or scanning a large disk on which few files have changed, so that little or no data reaches the media server for a long time.
- waiting for an application to release a lock on a file.
- waiting for a response from the OS, which may itself be hung on a corrupt or inaccessible file or directory.
Troubleshooting
To test a problem data path without running a backup:
- Open a terminal session on the Unix client.
- Run the tar, ls, and bpbkar commands below manually on the client to check for data corruption or directory access problems.
- The bpbkar test only reads the client data and writes it to /dev/null. No backup is performed.
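To test with the Unix 'tar' and 'ls' commands:
# ls -ltR /YOUR-DATA-PATH-TO-TEST-HERE
# tar cvf - /YOUR-DATA-PATH-TO-TEST-HERE > /dev/null
Note: Write the tar archive to standard output and discard it, as shown, instead of running 'tar cvf /dev/null'. GNU tar detects an archive named /dev/null and skips reading file contents, so that form may not actually read the data.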
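The ls and tar checks above block the terminal if the path hangs. The following is a minimal sketch that runs the same two checks under a time limit and reports a hang instead, assuming the GNU coreutils `timeout` command is installed on the client (standard on Linux, but possibly missing on other Unix systems). The function names `run_limited` and `read_test` and the 600-second default are hypothetical choices for illustration, not NetBackup tools.

```shell
# Hypothetical helpers for illustration; not part of NetBackup.
# Assumes the GNU coreutils 'timeout' command is available.

# run_limited LABEL SECONDS COMMAND [ARGS...]
# Runs COMMAND with its output discarded. Prints HANG if the time limit
# is hit (timeout exits with 124) or ERROR on any other non-zero exit,
# and returns the command's exit status.
run_limited() {
    label=$1
    limit=$2
    shift 2
    rc=0
    timeout "$limit" "$@" > /dev/null 2>&1 || rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "HANG: $label did not finish within $limit seconds"
    elif [ "$rc" -ne 0 ]; then
        echo "ERROR: $label exited with status $rc"
    fi
    return "$rc"
}

# read_test PATH [SECONDS]   (the 600-second default is an assumption)
read_test() {
    path=$1
    limit=${2:-600}
    # Walk the directory tree, as in the manual 'ls -ltR' check.
    run_limited "ls -ltR $path" "$limit" ls -ltR "$path" || return $?
    # Read every file's contents. The archive goes to stdout and is
    # discarded, because GNU tar skips reading file data when the
    # archive is named /dev/null.
    run_limited "tar of $path" "$limit" tar cf - "$path" || return $?
    echo "OK: $path was listed and read within $limit seconds"
}
```

For example, `read_test /YOUR-DATA-PATH-TO-TEST-HERE 900` prints an OK, HANG, or ERROR line and returns the corresponding status, so it can also be called from other scripts.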
If these commands hang or report errors, have the system administrator resolve the client issue before continuing.
To test with the NetBackup 'bpbkar' command:
To find the last file or directory read before the backup hangs, do the following on the client:
- Create the bpbkar debug log directory by running:
mkdir /usr/openv/netbackup/logs/bpbkar
- Create an empty touch file named bpbkar_path_tr to enable path-level debug logging in the bpbkar log.
touch /usr/openv/netbackup/bpbkar_path_tr
- Note: This touch file makes the bpbkar logs larger than usual.
- Set client logging to verbose level 5 through the NetBackup GUI on the master server:
- Under NetBackup Management > Host Properties > CLIENTS > double-click client-name > Properties > Logging > Global > 5
- Or, add the entry VERBOSE = 5 to the /usr/openv/netbackup/bp.conf file on the client.
- Note: No restart is needed after these settings are added.
- Run a backup, or the manual bpbkar test below, to generate the log information.
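The client-side logging steps above can be scripted so they are safe to re-run. This is a sketch under the assumption of a POSIX shell; the function `enable_bpbkar_logging` and the `NB_ROOT` variable (defaulting to /usr/openv/netbackup) are hypothetical, and `NB_ROOT` exists only so the function can be tried against a scratch directory first. It does not change the logging level set through the GUI.

```shell
# Sketch only: enable_bpbkar_logging and NB_ROOT are hypothetical names.
# NB_ROOT defaults to /usr/openv/netbackup; point it at a scratch
# directory to try the function safely first.
enable_bpbkar_logging() {
    nb_root=${NB_ROOT:-/usr/openv/netbackup}
    conf="$nb_root/bp.conf"

    # bpbkar debug log directory (mkdir -p is a no-op if it exists).
    mkdir -p "$nb_root/logs/bpbkar" || return 1

    # Empty touch file that turns on the extra path logging.
    touch "$nb_root/bpbkar_path_tr" || return 1

    # Append VERBOSE = 5 only if no VERBOSE entry exists yet; an existing
    # entry is reported rather than overwritten.
    if grep -q '^[[:space:]]*VERBOSE[[:space:]]*=' "$conf" 2>/dev/null; then
        echo "VERBOSE is already set in $conf; check its value by hand"
    else
        # Keep the new entry off the previous line if bp.conf lacks a
        # trailing newline.
        if [ -s "$conf" ] && [ -n "$(tail -c 1 "$conf")" ]; then
            printf '\n' >> "$conf"
        fi
        echo "VERBOSE = 5" >> "$conf" || return 1
    fi
}
```

Run it as `enable_bpbkar_logging` on the client. Because an existing VERBOSE line is left alone, a lower level already in bp.conf still has to be raised by hand.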
Run the commands to start the test:
# cd /usr/openv/netbackup/bin
# ./bpbkar -dt 0 -r 888 -nocont -nfsok /YOUR-DATA-PATH-TO-TEST-HERE > /dev/null
If the bpbkar command stops or hangs, check the end of the /usr/openv/netbackup/logs/bpbkar/log.<date> file for the last directory or file name processed and any OS error messages.
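To find and read the end of the newest bpbkar log, a short helper can pick the most recently modified log.* file. The function name `tail_bpbkar_log`, the `LOG_DIR` override variable, and the 20-line default are assumptions for illustration; the log.* pattern follows the log.<date> naming described above.

```shell
# Hypothetical helper: print the last N lines of the newest bpbkar log.
# LOG_DIR is an assumed override; it defaults to the bpbkar log directory.
tail_bpbkar_log() {
    log_dir=${LOG_DIR:-/usr/openv/netbackup/logs/bpbkar}
    lines=${1:-20}
    # ls -t lists newest first; head keeps only the most recent file.
    newest=$(ls -t "$log_dir"/log.* 2>/dev/null | head -n 1)
    if [ -z "$newest" ]; then
        echo "no bpbkar logs found in $log_dir" >&2
        return 1
    fi
    echo "== $newest =="
    tail -n "$lines" "$newest"
}
```

For example, `tail_bpbkar_log 50` shows the last 50 lines of the current log while the hung bpbkar command is still running in another terminal.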
Example:
17:03:25.110 [642176] <2> bpbkar SelectFile: cwd=/var/apache/logs path=access_log
17:03:25.297 [642176] <2> bpbkar SelectFile: cwd=/var/apache/logs path=error_log
Note: The process hangs here; there are no further messages for PID [642176].
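Because each log line records the working directory and the file name separately, a small awk filter can join them into the full path of the last file a given PID selected. This sketch assumes the `SelectFile: cwd=... path=...` layout shown in the example; `last_selected_file` is a hypothetical helper name.

```shell
# Hypothetical helper: print cwd + "/" + path from the last SelectFile
# line written by the given PID. Returns non-zero if none is found.
last_selected_file() {
    log=$1
    pid=$2
    awk -v tag="[$pid]" '
        # index() is a literal substring match, so the brackets are safe.
        index($0, tag) && index($0, "SelectFile: cwd=") { line = $0 }
        END {
            if (line == "") exit 1
            c = index(line, "cwd=")
            p = index(line, " path=")
            cwd  = substr(line, c + 4, p - (c + 4))
            file = substr(line, p + 6)
            sub(/\/$/, "", cwd)    # avoid "//" when cwd is "/"
            print cwd "/" file
        }
    ' "$log"
}
```

With the example lines above saved in a log file, `last_selected_file <log file> 642176` prints /var/apache/logs/error_log, the path to test next.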
File Hang:
The last file named in the bpbkar log may be corrupt. Test it with Unix commands such as ls, cp, and mv.
Directory / Mount Point Hang:
If the last entry is a directory or mount point, cd to that path and check whether the shell hangs. Press Ctrl+C to break out of the hang.
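The manual cd and file checks above can leave the terminal stuck. This is a sketch that probes one suspect path under a time limit, again assuming the GNU coreutils `timeout` command; `probe_path` and its 60-second default are hypothetical. Reading the file with cat stands in for the ls, cp, and mv tests and does not modify the file.

```shell
# Hypothetical helper: probe one file or directory with a time limit.
# Assumes GNU coreutils 'timeout'.
probe_path() {
    path=$1
    limit=${2:-60}
    rc=0
    # Every filesystem access, including the directory test itself, runs
    # inside the time limit, because even a stat() can block on a hung
    # mount point.
    timeout "$limit" sh -c '
        if [ -d "$1" ]; then
            cd "$1" && ls -la    # same as the manual cd check
        else
            cat "$1"             # read the whole file
        fi' probe "$path" > /dev/null 2>&1 || rc=$?
    case $rc in
        0)   echo "OK: $path" ;;
        124) echo "HANG: $path did not respond within $limit seconds" ;;
        *)   echo "ERROR: $path (exit status $rc)" ;;
    esac
    return "$rc"
}
```

A HANG or ERROR result for the last path in the bpbkar log points to an OS-level problem rather than a NetBackup one.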
Solution
- For a file or directory hang confirmed at the OS level:
- As a workaround until the file system issue is fixed, add the path to the /usr/openv/netbackup/exclude_list file on the client.
echo "/Path/file_name" >> /usr/openv/netbackup/exclude_list
- For hangs on active files in use by applications, add the 'LOCKED_FILE_ACTION = SKIP' entry to the /usr/openv/netbackup/bp.conf file on the client.
echo "LOCKED_FILE_ACTION = SKIP" >> /usr/openv/netbackup/bp.conf
- This setting can also be made through the NetBackup GUI on the master server.
- Under NetBackup Management > Host Properties > CLIENTS > double-click client-name > Properties > Unix Client > Client settings > Locked File Action > 'Skip'
- Note: No restart of any server is needed after making this change with either method.
- For clients with very large files or disks, or servers that are heavily loaded, give the media server more time to wait for client data before it fails the backup job.
- Increase the media server's "Client read timeout":
- Open the NetBackup GUI on the master server.
- Under NetBackup Management > Host Properties > Media Server > double-click media server name > Properties > Timeouts
- Increase the media server's "Client read timeout" to 1800 or 3600 seconds, or higher.
- Warning: Do not adjust the "Client connect timeout". That setting should remain at its default of 300 seconds.
- Note: No restart of any server is needed after making this change.
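The `echo ... >>` commands above append unconditionally, so running them twice leaves duplicate entries in exclude_list or bp.conf. This is a minimal sketch of an append-if-missing helper, assuming a POSIX shell; `add_line_once` is a hypothetical name. It matches whole lines exactly, so an existing entry written with different spacing (for example LOCKED_FILE_ACTION=SKIP) is not detected.

```shell
# Hypothetical helper: append LINE to FILE only if that exact line is
# not already present.
add_line_once() {
    line=$1
    file=$2
    # -q quiet, -x whole-line match, -F fixed string (no regex).
    if grep -qxF -- "$line" "$file" 2>/dev/null; then
        echo "already present in $file: $line"
        return 0
    fi
    # If the file does not end in a newline, add one so the new entry
    # is not glued onto the previous line.
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
        printf '\n' >> "$file"
    fi
    printf '%s\n' "$line" >> "$file"
}
```

Usage on the client:

add_line_once "/Path/file_name" /usr/openv/netbackup/exclude_list
add_line_once "LOCKED_FILE_ACTION = SKIP" /usr/openv/netbackup/bp.conf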