Backing up raw native or VxVM volumes with a 'Standard' policy on RedHat 6.x or 7.x to a local storage unit can fail with status 6

Article: 100032477
Last Published: 2016-05-20
Product(s): NetBackup

Problem

Whether the failure described below occurs depends on several factors:
  • RedHat kernel version in use.
  • Storage Foundation (VxVM) / InfoScale software version in use.
Only raw (character device) volume backups that use a 'Standard' policy type can encounter this issue. Regular file system backups are not affected.

The error message, cause and solution to this problem are outlined below.

Error Message

The Detailed Status tab of the failed job in the NetBackup Activity Monitor shows errors similar to the following. The key lines are the bpbrm read error (Errno = 5: Input/output error) and the final status 6:

01/22/2016 10:44:32 - Info bpbrm (pid=20999) starting bpbkar on client
01/22/2016 10:44:32 - Info bpbkar (pid=21009) Backup started
01/22/2016 10:44:32 - Info bpbrm (pid=20999) bptm pid: 21010
01/22/2016 10:44:32 - Info bptm (pid=21010) start
01/22/2016 10:44:32 - Info bptm (pid=21010) using 262144 data buffer size
01/22/2016 10:44:32 - Info bptm (pid=21010) using 30 data buffers
01/22/2016 10:44:32 - Info bptm (pid=21010) start backup
01/22/2016 10:44:34 - Warning bpbrm (pid=20999) from client dn: WRN - /dev/vx/rdsk/shdg/vol1 is a character special file. Backing up the raw partition.
01/22/2016 10:44:37 - Error bpbrm (pid=20999) from client dn: ERR - Read error at byte 1073737728 reading 262144 bytes in file /dev/vx/rdsk/shdg/vol1. Errno = 5: Input/output error
01/22/2016 10:44:37 - Info bptm (pid=21010) waited for full buffer 113 times, delayed 186 times
01/22/2016 10:44:37 - Info bpbkar (pid=21009) bpbkar waited 0 times for empty buffer, delayed 0 times
01/22/2016 10:44:43 - Error bptm (pid=21010) media manager terminated by parent process
01/22/2016 10:44:46 - Info bpbkar (pid=21009) done. status: 6: the backup failed to back up the requested files


The syslog (/var/log/messages) will also report attempted access beyond the end of the device at the same time:

Jan 22 10:44:37 dn kernel: attempt to access beyond end of device
Jan 22 10:44:37 dn kernel: VxVM12000: rw=0, want=2097656, limit=2097152
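The job log and the syslog describe the same read. The kernel's want and limit values are counted in 512-byte sectors, so bpbkar's byte offset and read size can be converted and compared with them. The shell arithmetic below is a cross-check that uses only the values shown above:

```shell
#!/bin/sh
# Values copied from the job log and syslog above.
offset=1073737728      # bpbkar: "Read error at byte 1073737728"
length=262144          # bpbkar: "reading 262144 bytes" (the data buffer size)
limit_sectors=2097152  # kernel: "limit=2097152" (device size in sectors)
sector=512             # kernel want/limit values are in 512-byte sectors

end_byte=$((offset + length))            # first byte after the requested range
want_sectors=$((end_byte / sector))      # should match the kernel's "want"
device_bytes=$((limit_sectors * sector)) # total size of the volume

echo "device size:              $device_bytes bytes"
echo "read ends at sector:      $want_sectors"
echo "bytes left at offset:     $((device_bytes - offset))"
echo "bytes requested past end: $((end_byte - device_bytes))"
```

This prints an end sector of 2097656, matching want=2097656, on a volume of exactly 1073741824 bytes (1 GiB). Only 4096 bytes remain at the failing offset, so 258048 of the 262144 requested bytes lie past the end of the device.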
 

Cause

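Starting with RedHat 6 (RHEL 6), generic_file_aio_read() is no longer responsible for detecting and handling the corner case of a direct read that extends beyond the end of the device. Instead, blkdev_aio_read() was introduced in the block device code to handle this case before it calls generic_file_aio_read(). A read path that still calls generic_file_aio_read() directly therefore returns an I/O error (Errno = 5) when a request runs past the end of the device, which is what happens to bpbkar's final read of the raw volume in the log above.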
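The corner case itself is simple arithmetic. In the kernel's block device code (fs/block_dev.c), blkdev_aio_read() returns 0 for a read that starts at or beyond the end of the device, and shortens a read that straddles the end to the bytes that remain, before handing it to generic_file_aio_read(). The hypothetical shell function below models only that arithmetic (it is not kernel code) and applies it to the failing read from the log, on a device of 2097152 512-byte sectors (1073741824 bytes):

```shell
#!/bin/sh
# clamped_read_len OFFSET REQUEST DEVICE_SIZE   (all values in bytes)
# Hypothetical model of the end-of-device clamp: prints how many bytes
# the read should return instead of failing with EIO.
clamped_read_len() {
    offset=$1 request=$2 size=$3
    if [ "$offset" -ge "$size" ]; then
        echo 0                      # starts at or past the end: nothing to read
    elif [ $((size - offset)) -lt "$request" ]; then
        echo $((size - offset))     # straddles the end: short read
    else
        echo "$request"             # entirely inside the device
    fi
}

# Failing read from the job log, on the 1 GiB volume.
clamped_read_len 1073737728 262144 1073741824
```

This prints 4096: with the clamp in place, the final read returns the last 4096 bytes of the volume instead of failing with Errno = 5.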

Storage Foundation 6.x and InfoScale 7.0.1 (the renamed product) are affected with any version of NetBackup, because the issue lies in the storage layer beneath the level at which NetBackup operates.
 

Solution

SOLUTION

Depending on the volume type that you are backing up, you may need to follow step 1 OR both steps 1 and 2.

1.  If you are backing up a native RedHat raw device, upgrade the kernel to the version listed below, which resolves the problem (see https://access.redhat.com/solutions/1191763):
For RHEL 6.x, upgrade to kernel 2.6.32-504.16.2.el6 (or later)
For RHEL 7.x, upgrade to kernel 3.10.0-253.el7 (or later)
 
2.  If you are backing up a VxVM volume, you must upgrade the kernel as described in step 1 and also apply Hotfix 002 for VxVM 7.0.1 (package VRTSvxvm-7.0.1.002-RHEL6).
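A minimal sketch for checking the kernel requirement from step 1 (which also applies to VxVM volumes in step 2), assuming a POSIX shell with awk and a standard RHEL release string from `uname -r`, such as 2.6.32-504.16.2.el6.x86_64. The function names are hypothetical. It compares the leading numeric fields and ignores the el6/el7 and architecture tags, which is adequate for the two minimums listed above but is not a full RPM version comparison:

```shell
#!/bin/sh
# kernel_at_least CURRENT MINIMUM
# Succeeds if CURRENT >= MINIMUM, comparing the leading numeric fields
# split on '.' and '-' (2.6.32-504.16.2.el6 -> 2 6 32 504 16 2).
# Everything from the first non-numeric field (el6, x86_64) is ignored.
kernel_at_least() {
    awk -v cur="$1" -v min="$2" 'BEGIN {
        nc = split(cur, c, "[.-]")
        for (i = 1; i <= nc; i++) if (c[i] !~ /^[0-9]+$/) { nc = i - 1; break }
        nm = split(min, m, "[.-]")
        for (i = 1; i <= nm; i++) if (m[i] !~ /^[0-9]+$/) { nm = i - 1; break }
        n = (nc > nm) ? nc : nm
        for (i = 1; i <= n; i++) {
            a = (i <= nc) ? c[i] + 0 : 0   # missing fields count as 0
            b = (i <= nm) ? m[i] + 0 : 0
            if (a > b) exit 0
            if (a < b) exit 1
        }
        exit 0                             # versions are equal
    }'
}

# Minimum kernels from step 1, selected by the el6/el7 tag.
required_kernel() {
    case "$1" in
        *.el6*) echo "2.6.32-504.16.2.el6" ;;
        *.el7*) echo "3.10.0-253.el7" ;;
        *) return 1 ;;
    esac
}

running=$(uname -r)
if min=$(required_kernel "$running"); then
    if kernel_at_least "$running" "$min"; then
        echo "kernel $running is at or above $min"
    else
        echo "kernel $running is older than $min: upgrade (step 1)"
    fi
else
    echo "kernel $running is not an el6/el7 kernel: check not applicable"
fi
```

For example, `kernel_at_least 2.6.32-504.8.1.el6.x86_64 2.6.32-504.16.2.el6` fails (8 is less than 16), while any 2.6.32-573 kernel passes.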

These changes ensure that blkdev_aio_read() is used instead of generic_file_aio_read(), which cannot handle reads or writes that extend past the end of the device.
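For step 2, the installed VxVM package version can be read with `rpm -q --queryformat '%{VERSION}\n' VRTSvxvm`. The sketch below assumes, based only on the package name VRTSvxvm-7.0.1.002-RHEL6, that the hotfix level is the last field of a 7.0.1.NNN version string; the function name and its output labels are hypothetical. Because this article names a hotfix only for VxVM 7.0.1, other versions (such as Storage Foundation 6.x) are reported as not-7.0.1 rather than as fixed:

```shell
#!/bin/sh
# vxvm_hotfix_status VERSION
# Prints needs-hotfix, ok, or not-7.0.1.
# Assumption: 7.0.1 hotfix levels appear as 7.0.1.NNN (e.g. 7.0.1.002).
vxvm_hotfix_status() {
    case "$1" in
        7.0.1.*)
            # awk compares numerically, so leading zeros (002) are harmless.
            if awk -v p="${1##*.}" 'BEGIN { exit !(p + 0 < 2) }'; then
                echo needs-hotfix
            else
                echo ok
            fi
            ;;
        *)
            echo not-7.0.1
            ;;
    esac
}

if ver=$(rpm -q --queryformat '%{VERSION}\n' VRTSvxvm 2>/dev/null); then
    echo "VRTSvxvm $ver: $(vxvm_hotfix_status "$ver")"
else
    echo "VRTSvxvm is not installed (or rpm is unavailable)"
fi
```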
 

WORKAROUND

1) If you are unable to apply the solution above (kernel upgrade and VxVM hotfix update), you can get backups to succeed by creating the 'NOSHM' touchfile on the media server, using the following command:

touch /usr/openv/netbackup/NOSHM

Backup performance will be impacted slightly, but the backup will be successful.

2) Alternatively, back up to a remote media server instead of the local media server. This has the same effect as the NOSHM touchfile, because bpbkar opens a socket to transfer data blocks to bptm rather than copying the data it reads directly into shared memory.

3) Use a regular file system backup instead of a raw device backup.
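Workaround 1 depends only on whether the touchfile exists, so once the kernel upgrade and VxVM hotfix are applied it can be reverted by deleting the file. A small hypothetical helper for the media server; the `noshm` function and the `NB_DIR` override are illustrative and not part of NetBackup:

```shell
#!/bin/sh
# noshm on|off|status: manage the NOSHM touchfile on the media server.
# NB_DIR defaults to the NetBackup directory used in workaround 1;
# override it only for testing.
NB_DIR=${NB_DIR:-/usr/openv/netbackup}

noshm() {
    f="$NB_DIR/NOSHM"
    case "$1" in
        on)     touch "$f" ;;   # enable the workaround
        off)    rm -f "$f" ;;   # revert once the fixes are in place
        status) if [ -e "$f" ]; then echo "NOSHM present"; else echo "NOSHM absent"; fi ;;
        *)      echo "usage: noshm on|off|status" >&2; return 2 ;;
    esac
}

noshm status
```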
 

References

Etrack: 3879236, 3880027, 3866915
