Backup images under Storage Lifecycle Policy (SLP) control are marked as complete, or expire, even though the duplication or replication was never performed.
Problem
A second copy did not exist for certain images that were under SLP control. The bpimagelist output for each affected image showed it marked as complete. In some cases the backup image expired before the expected secondary copies were made.
Error Message
The raw unified log for nbstserv (OID 226) captures one possible reason for the issue:
51216-226-1359191174-150506-0000000002.log:0,51216,226,226,834880,1430903891043,2228,2292,0:,140:Canceling un-started copies for image clientfs01_1430866654 as all the copies have exceeded longest retention period(DiskWorkGroup.cpp:82),16:BuildDiskBatches,1
The vxlogview output for the same nbstserv log entry shows the details as follows:
05/06/15 09:18:11.140 [Debug] NB 51216 nbstserv 226 PID:834880 TID:1430903891043 File ID:226 [No context] 1 [BuildDiskBatches] Cancelling un-started copies for image clientfs01_1430866654 as all the copies have exceeded longest retention period(DiskWorkGroup.cpp:82)
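To find every image affected by this condition, the nbstserv log text can be searched for the message shown above. The following is a minimal Python sketch, not a NetBackup tool: the function name `canceled_backup_ids` and the regular expression are illustrative assumptions based only on the two log lines quoted in this article (which spell the first word both "Canceling" and "Cancelling"). Other NetBackup versions may word the message differently.

```python
import re

# Pattern derived from the two log lines quoted in this article.
# "Cancell?ing" accepts both spellings seen in the raw and vxlogview output.
CANCEL_RE = re.compile(
    r"Cancell?ing un-started copies for image (\S+) "
    r"as all the copies have exceeded longest retention period"
)


def canceled_backup_ids(lines):
    """Return the backup IDs named in matching log lines, in first-seen order."""
    ids = []
    for line in lines:
        match = CANCEL_RE.search(line)
        if match and match.group(1) not in ids:
            ids.append(match.group(1))
    return ids


# Message text copied from the log excerpts above (prefix fields shortened).
SAMPLE_LINES = [
    "0,51216,226,226,834880,1430903891043,2228,2292,0:,140:Canceling "
    "un-started copies for image clientfs01_1430866654 as all the copies "
    "have exceeded longest retention period(DiskWorkGroup.cpp:82),"
    "16:BuildDiskBatches,1",
    "05/06/15 09:18:11.140 [Debug] NB 51216 nbstserv 226 [BuildDiskBatches] "
    "Cancelling un-started copies for image clientfs01_1430866654 as all the "
    "copies have exceeded longest retention period(DiskWorkGroup.cpp:82)",
]

print(canceled_backup_ids(SAMPLE_LINES))
```

The same function can be fed the lines of a saved raw log file or of captured vxlogview output. Each backup ID it returns can then be checked with bpimagelist to confirm which copies actually exist.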
Cause
The general purpose of a Storage Lifecycle Policy (SLP) is to guarantee the duplication or replication of an image. Some scenarios, however, can interrupt or prevent the duplications from occurring.
Examples:
- Canceling an image using the nbstlutil cancel -backupid <backupid> command.
Canceling an image in this manner effectively marks the image complete, and any operations that would have followed a successful duplication still take place. For example, if the backup step was set to expire on copy, the image is expired when the cancel is performed. Note that once the cancel is initiated, the expiration of the image cannot be prevented if the backup was set to expire on copy or if the retention of the backup copy has passed.
- If the longest retention of any copy within the SLP has been exceeded, NetBackup cancels the image.
For example, an SLP was defined with a replication step, and the retention level for copy 1 (the backup copy) was very short (1 day). The retention level on the source-side replication step was also very short (2 days). If the backup took 2 days to complete, or the replication could not be performed within 2 days for some other reason (for example, the destination storage was down), the image would not be replicated. Once the retention levels have been exceeded, making the copy no longer makes sense because the copy would expire immediately. The NetBackup nbstserv daemon recognizes this and does not waste resources duplicating or replicating an image that would immediately expire.
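The retention rule in the second example can be illustrated with a small Python sketch. This is only a model of the behavior described above, not the nbstserv implementation. The function name `copies_would_be_canceled`, the choice to measure every retention from the backup time, and the treatment of the exact boundary are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def copies_would_be_canceled(backup_time, copy_retentions, now):
    """Illustrative model: un-started copies are canceled once the longest
    retention period of any copy in the SLP has passed.

    Assumption: each retention period is counted from the backup time.
    """
    longest = max(copy_retentions)
    return now >= backup_time + longest


# Values from the example above: copy 1 is kept for 1 day and the
# source-side replication step for 2 days.
backup_time = datetime(2015, 5, 5, 22, 57, 34)
retentions = [timedelta(days=1), timedelta(days=2)]

# Replication attempted 36 hours after the backup: still worth making.
print(copies_would_be_canceled(
    backup_time, retentions, backup_time + timedelta(hours=36)))

# Destination storage unavailable until 49 hours after the backup:
# the longest retention (2 days) has passed, so the copy is canceled.
print(copies_would_be_canceled(
    backup_time, retentions, backup_time + timedelta(hours=49)))
```

Under these assumptions, the time available for all secondary copies to start is bounded by the longest retention in the SLP, so that value needs to exceed the worst expected backup duration plus any likely storage outage.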
Note: The nbstlutil command is located in the following paths:
- Unix/Linux path: /usr/openv/netbackup/bin/admincmd/
- Windows path: install_path\NetBackup\bin\admincmd\
Solution
Set the retention levels so that all copies can be made before the image expires. This prevents the problem from occurring.
Also, before canceling a backup image, make sure you understand what the SLP would do once its steps complete, because a cancel triggers that same behavior. This avoids losing a backup image prematurely.