Problem
A NetBackup for VMware agent backup fails with Status 11 for one client.
Error Message
Operation Status: 11
system call failed (11)
NetBackup Activity Monitor parent job for backup reports:
06/30/2014 23:23:35 - Info bpbrm (pid=97412) INF - vmwareLogger: WaitForTaskCompleteEx: Unable to access file <unspecified filename> since it is locked <197>
06/30/2014 23:23:35 - Info bpbrm (pid=97412) INF - vmwareLogger: WaitForTaskCompleteEx: SYM_VMC_ERROR: TASK_REACHED_ERROR_STATE
06/30/2014 23:23:35 - Info bpbrm (pid=97412) INF - vmwareLogger: ConsolidateVMDisks: SYM_VMC_ERROR: TASK_REACHED_ERROR_STATE
06/30/2014 23:23:35 - Info bpbrm (pid=97412) INF - vmwareLogger: ConsolidateVMDisksAPI: SYM_VMC_ERROR: TASK_REACHED_ERROR_STATE
06/30/2014 23:23:40 - Info bpfis (pid=97421) done. status: 0
06/30/2014 23:23:40 - end VMware: Delete Snapshot; elapsed time 0:00:26
06/30/2014 23:23:40 - Info bpfis (pid=97421) done. status: 0: the requested operation was successfully completed
06/30/2014 23:23:40 - end writing
Operation Status: 0
Operation Status: 11
system call failed (11)
NetBackup Activity Monitor child job for backup reports:
06/30/2014 23:10:18 - begin writing
06/30/2014 23:11:06 - Critical bpbrm (pid=95625) from client <client_name>: FTL - cleanup() failed, status 11
06/30/2014 23:11:09 - Error bptm (pid=95637) media manager terminated by parent process
06/30/2014 23:11:34 - Info <media_server_name> (pid=95637) StorageServer=PureDisk:<media_server_name>; Report=PDDO Stats for (<media_server_name>): scanned: 3 KB, CR sent: 0 KB, CR sent over FC: 0 KB, dedup: 100.0%, cache disabled
06/30/2014 23:11:35 - Info bpbkar (pid=0) done. status: 11: system call failed
06/30/2014 23:11:35 - end writing; write time: 0:01:17
system call failed (11)
Cause
A stale bpbkar process on the NetBackup VMware backup host holds a lock on VMware snapshot-related files. The lock prevents deletion and consolidation of the earlier, successfully created snapshot, which results in the message:
INF - vmwareLogger: WaitForTaskCompleteEx: Unable to access file <unspecified filename> since it is locked <197>
The stale bpbkar process was identified on the NetBackup VMware backup host (appliance/Linux) with the following command:
# /usr/openv/netbackup/bin/bpps | grep bpbkar
When no backups are running, this command should return no bpbkar processes.
In this case, this resulted in the following output:
# /usr/openv/netbackup/bin/bpps | grep bpbkar
root 92023 1 0 Jun10 ? 00:00:03 bpbkar -r 8035200 -ru root -dt 87099 -to 0 -bpstart_time 1402453164 -clnt <vm_name> -class <policy_name> -sched <schedule_name> -st INCR -bpstart_to 300 -bpend_to 300 -read_to 300 -blks_per_buffer 512 -use_otm -fso -ifr -pid 91994 -mediasvr <media_server_name> -bt 1402452864 -t 1 -b <vm_name>_1402452864 -kl 14 -fi -S <master_server_name> -fim NONE -ct 40 -fscp -S <master_server_name> -storagesvr <media_server_name> -bidlist bid@<policy_name>_<vm_name>_1402452864 -shm
Notice:
1. -clnt <vm_name> is the name of the virtual machine failing with Status 11.
2. Jun10 is the date on which this bpbkar process and its backup were started.
In this case, the bpbkar process had been started about three weeks before the reported failure. The virtual machine had been failing for those three weeks, dating back to the start of this stale bpbkar process.
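The fields called out above (PID, start date, and the -clnt value) can be pulled out of the process listing with a small filter. The following is a minimal sketch, not part of NetBackup: the function name summarize_bpbkar is hypothetical, and it assumes the column layout shown in the output above (user, PID, parent PID, CPU, start time, and so on).

```shell
#!/bin/sh
# Hypothetical helper: reads ps-style lines on stdin and prints
# "PID START CLIENT" for every bpbkar line.
# Assumes the column layout shown above: UID PID PPID C STIME TTY TIME CMD...
summarize_bpbkar() {
    # The pattern [b]pbkar matches "bpbkar" but not this awk command's own
    # entry in a process listing.
    awk '/[b]pbkar/ {
        client = "unknown"
        for (i = 1; i < NF; i++) {
            if ($i == "-clnt") { client = $(i + 1); break }
        }
        print $2, $5, client
    }'
}

# Example with a shortened copy of the line above; vm01 and pol1 are
# placeholder names.
echo 'root 92023 1 0 Jun10 ? 00:00:03 bpbkar -r 8035200 -clnt vm01 -class pol1' |
    summarize_bpbkar
```

On the backup host, the same filter could be fed from the listing command, for example /usr/openv/netbackup/bin/bpps | summarize_bpbkar. A start date well before the current day, at a time when no backups are running, points to a stale process.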
Solution
1. Log in to the NetBackup VMware backup host at the command line at a time when no virtual machine backups are in progress.
2. Execute the following command to identify any bpbkar processes:
# /usr/openv/netbackup/bin/bpps | grep bpbkar
If no backups are running, there should be no bpbkar processes.
3. If a bpbkar process is observed, for example:
# /usr/openv/netbackup/bin/bpps | grep bpbkar
root 92023 1 0 Jun10 ? 00:00:03 bpbkar -r 8035200 -ru root -dt 87099 -to 0 -bpstart_time 1402453164 -clnt <vm_name> -class <policy_name> -sched <schedule_name> -st INCR -bpstart_to 300 -bpend_to 300 -read_to 300 -blks_per_buffer 512 -use_otm -fso -ifr -pid 91994 -mediasvr <media_server_name> -bt 1402452864 -t 1 -b <vm_name>_1402452864 -kl 14 -fi -S <master_server_name> -fim NONE -ct 40 -fscp -S <master_server_name> -storagesvr <media_server_name> -bidlist bid@<policy_name>_<vm_name>_1402452864 -shm
a. Confirm that the process belongs to the failing virtual machine by checking the -clnt <vm_name> entry in the output.
b. Make sure the process does not belong to a currently running backup, because the steps below will cause that backup to fail.
c. Attempt to stop the process by stopping and restarting NetBackup services when no other backup or restore operations are in progress on this backup host.
d. If stopping and restarting NetBackup services does not end the process, locate its process ID (PID) in the output above. The PID is the second field, in this case 92023.
e. Kill this process with the relevant operating system command for this backup host. For most Linux operating systems this will be:
kill <PID>
or, in this case:
kill 92023
If the bpps command above still shows this bpbkar process, the following command may work:
kill -9 <PID>
or
kill -9 92023
f. Attempt another backup once bpps shows no bpbkar processes for this virtual machine.
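The escalation in step e (a plain kill first, then kill -9 only if the process survives) can be sketched as a small shell function. This is an illustration under stated assumptions, not a Veritas-supplied script: the function name stop_pid and the default 10-second wait are hypothetical choices, and the function does not check whether the PID belongs to an active backup, so steps a and b still apply.

```shell
#!/bin/sh
# Hypothetical helper: send SIGTERM, wait up to a timeout, then send SIGKILL
# only if the process is still present.
# Usage: stop_pid <PID> [seconds_to_wait]
stop_pid() {
    pid=$1
    wait_secs=${2:-10}

    # Ask the process to exit (same as "kill <PID>").
    kill "$pid" 2>/dev/null

    # Poll once per second; kill -0 only tests whether the PID still exists.
    while [ "$wait_secs" -gt 0 ] && kill -0 "$pid" 2>/dev/null; do
        sleep 1
        wait_secs=$((wait_secs - 1))
    done

    # Escalate only if the process survived the wait (same as "kill -9 <PID>").
    if kill -0 "$pid" 2>/dev/null; then
        kill -9 "$pid"
    fi
}
```

For the process in this example, the call would be stop_pid 92023. Trying the default signal first gives the process a chance to exit on its own, whereas kill -9 cannot be caught or handled by the process, so it is kept as the last step.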
Applies To
NetBackup 7.6.0.x VMware Agent backups.
VMware backup host is a Linux server or a Veritas NetBackup appliance.
VMware vCenter Server version 5.5.