Problem
Handling image cleanup failures for Amazon Glacier vault when vault lock policy is applied to the vault
Cause
The retention period set in the NetBackup policy is shorter than the period enforced by the vault lock policy applied to the Amazon Glacier vault storage unit. This causes the image cleanup to fail.
Note: There can be instances in which your backup is successful but, because of an accidental deletion or a ransomware attack, the metadata in the S3 bucket is lost. In such scenarios, your metadata and data are still safe in the Amazon Glacier vault.
Now, if you perform image cleanup, even though your cleanup is successful, the orphaned images in the vault are not deleted. To check for such issues, see the bpdm logs for error messages such as gateway: Failed to get image prop by name, error code = 2060044.
If you want to recover image data that is safe in the vault, you can rebuild the S3 metadata cache and then import the images. Contact Veritas Support for data recovery from the vault.
To clean up the data from the vault, see Cleaning up orphaned images manually.
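The bpdm log check described above can be sketched as a small script. This is a minimal illustration, not a Veritas tool: the function names are hypothetical, the search string is the error code quoted above, and the log directory must be supplied by the caller (the article gives it as <installpath>/netbackup/logs/bpdm).

```python
from pathlib import Path

# Substring taken from the error message quoted in this article.
ORPHAN_ERROR = "error code = 2060044"


def find_orphan_errors(lines):
    """Return (line_number, text) pairs for lines containing the error code."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if ORPHAN_ERROR in line
    ]


def scan_log_directory(log_dir):
    """Scan every regular file in log_dir and collect matches per file name.

    log_dir is an assumption of this sketch: pass the bpdm log directory
    of your own installation.
    """
    hits = {}
    for path in sorted(Path(log_dir).iterdir()):
        if path.is_file():
            with path.open(errors="replace") as handle:
                found = find_orphan_errors(handle)
            if found:
                hits[path.name] = found
    return hits
```

Any file that appears in the result of scan_log_directory points to a cleanup run that may have left orphaned images in the vault.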
Solution
Complete at least ONE of the following options to avoid a storage leak:
Note: The commands bpplschedrep, nbstl, bppllist, bpstulist, bpexpdate, bpimport, nbdelete, nbdevquery, and bpdbjobs can be found in <installpath>/netbackup/bin/admincmd.
The log files can be found at <installpath>/netbackup/logs/bpdm.
Option 1: Set the retention level to be greater than the Amazon Glacier vault lock policy. Apply the new retention level value to all images. Import (involves additional cost) the expired images.
Note: If you already know the policy or SLP name, go to step 5.
- Get the disk pool name and media ID from the job details.
Command: bpdbjobs -all_columns -jobid <job_id>
Example: bpdbjobs -all_columns -jobid 4
Output example:
granted resource MediaID=@aaaag;DiskVolume=nbu-vault-worm;DiskPool=dp-vault;Path=nbu-vault-worm;StorageServer=amazonfake.com;MediaServer=hostname.veritas.com
- Info bpdm (pid=65193) initial volume nbu-vault-worm: Kbytes total capacity: 9007199254740991, used space: 0, free space: 9007199254740991
- Critical bpdm (pid=65193) Storage Server Error: Access denied to delete Amazon Glacier archive. Ensure image retention period is greater than the period enforced by the vault lock policy.
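To pull the media ID and disk pool name out of the job details without reading them by eye, the "granted resource" line can be split into key=value pairs. The sketch below is only an illustration based on the sample output above; the function name is hypothetical, and the line layout is assumed to match that sample.

```python
def parse_granted_resource(line):
    """Extract key=value pairs from a 'granted resource' job-detail line.

    The layout (semicolon-separated key=value pairs after the words
    'granted resource') is assumed from the sample output in this article.
    """
    marker = "granted resource"
    start = line.find(marker)
    if start == -1:
        return {}
    fields = {}
    for item in line[start + len(marker):].strip().split(";"):
        key, separator, value = item.partition("=")
        if separator:
            fields[key.strip()] = value.strip()
    return fields


sample = (
    "granted resource MediaID=@aaaag;DiskVolume=nbu-vault-worm;"
    "DiskPool=dp-vault;Path=nbu-vault-worm;"
    "StorageServer=amazonfake.com;MediaServer=hostname.veritas.com"
)
resource = parse_granted_resource(sample)
print(resource["MediaID"], resource["DiskPool"])  # prints: @aaaag dp-vault
```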
- Get the storage unit name from the disk pool.
Command: bpstulist | grep -i <diskpool name>
Example: bpstulist | grep -i dp-nbu-vault-worm
Output example:
stu-nbu-vault-worm 0 _STU_NO_DEV_HOST_ 0 -1 -1 1 0 "*NULL*" 1 1 524288 amazonfake.com 10 6 0 0 0 0 dp-nbu-vault-worm *NULL* 2186455
stu2-nbu-vault-worm 0 _STU_NO_DEV_HOST_ 0 -1 -1 1 0 "*NULL*" 0 1 524288 amazonfake.com 10 6 0 0 0 0 dp-nbu-vault-worm *NULL* 2186455
- Get the list of policies that use the storage units listed above. Note the policy name and schedules.
To get the policy names:
Command: bppllist
Example: bppllist
Output example:
pol-non-slp
pol-slp
To get the details of each policy:
Command: bppllist <policy_name>
Example: bppllist pol-non-slp
Output example:
CLASS pol-non-slp ...
RES stu-nbu-vault-worm ...
SCHED full 0 1 604800 5 (Retention level of the schedule) 0 0 0 *NULL* 0 0 0 0 0 0 -1 0 0
SCHEDRES *NULL*...
SCHED diff 1 1 604800 6 0 0 0 *NULL* 0 0 0 0 0 0 -1 0 0
SCHEDRES stu2-nbu-vault-worm...
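The retention level of each schedule can also be read from this output programmatically. The following sketch assumes the field position shown in the annotated sample above (the retention level is the sixth whitespace-separated token of a SCHED line). Because a retention level is an index and not a number of days, the comparison relies on a level-to-days mapping that you supply yourself; that mapping, the function names, and the sample values are assumptions of this example, not NetBackup output.

```python
def schedule_retention_levels(policy_listing):
    """Map schedule label -> retention level from `bppllist <policy_name>` output.

    Field position is assumed from the annotated sample in this article:
    SCHED <label> <field> <field> <field> <retention_level> ...
    """
    levels = {}
    for line in policy_listing.splitlines():
        tokens = line.split()
        # SCHEDRES lines are skipped because their first token is not "SCHED".
        if len(tokens) > 5 and tokens[0] == "SCHED":
            levels[tokens[1]] = int(tokens[5])
    return levels


def schedules_to_change(levels, level_days, vault_lock_days, buffer_days=20):
    """Return schedule labels whose retention is shorter than lock period + buffer.

    level_days maps a retention level to its period in days; it is supplied
    by the caller and must contain every level that appears in `levels`.
    """
    required = vault_lock_days + buffer_days
    return sorted(
        label for label, level in levels.items() if level_days[level] < required
    )
```

For example, with a hypothetical mapping {5: 90, 6: 4000} and a 3650-day vault lock policy, only the "full" schedule from the sample output would be reported as needing a new retention level.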
- Get the list of SLPs that use the vault storage unit.
To get the list of SLP names:
Command: nbstl -b
Example: nbstl -b
Output example: slp-worm
To get the details of each SLP and check if the vault storage unit is being used:
Command: nbstl <slp_name> -l
Example: nbstl slp-worm -l
Output example:
slp-worm *NULL* 0 0x0 8
0 stu-nbu-vault-worm (Storage Unit) *NULL* *NULL* 0 8 (Retention Level) *NULL* 0 0x0 0 0 *NULL* 1(Operation Index) Default_24x7_Window *NULL* *NULL* ..
1 stu2-nbu-vault-worm *NULL* *NULL* 0 8 *NULL* 0 0x0 1 0 *NULL* 2 Default_24x7_Window *NULL* *NULL* ..
- Change the retention level of all policies that use the Glacier vault storage unit so that it equals the vault lock policy period plus 20 days. Modify only those policies or SLPs whose schedule retention is shorter than the vault lock policy period plus 20 days (the 20-day buffer ensures that the archives are unlocked before they are deleted).
Note: Create a custom retention period (if not already created) with a number of days that equals the vault lock policy period plus 20 days. See https://www.veritas.com/support/en_US/article.100024413.
- Policies:
Command: bpplschedrep <policy_name> <sched_label> -rl <retention_level>
Example: bpplschedrep pol-non-slp full -rl 15
- SLPs: Change the retention level only for the storage unit being used.
Command: nbstl <storage_lifecycle_name> -modify | -modify_current | -modify_version -rl <retention_level_1>,<retention_level_2>,<new_retention_level>,<retention_level_4>,...,<retention_level_n>
Example: nbstl slp-worm -modify -rl 15,8
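Because the -rl list carries one value per SLP operation, the levels of operations that should stay unchanged still have to be repeated in their original positions. The sketch below shows one way to assemble that list; the function name and its inputs (operations as storage unit and retention level pairs, in operation order, as read from nbstl <slp_name> -l) are assumptions made for illustration.

```python
def build_rl_argument(operations, storage_units_to_change, new_level):
    """Build the comma-separated value for `nbstl ... -rl`.

    operations: list of (storage_unit, retention_level) in SLP operation order.
    storage_units_to_change: storage units whose level should be replaced.
    Levels of all other operations are repeated unchanged.
    """
    return ",".join(
        str(new_level if storage_unit in storage_units_to_change else level)
        for storage_unit, level in operations
    )


operations = [("stu-nbu-vault-worm", 8), ("stu2-nbu-vault-worm", 8)]
print(build_rl_argument(operations, {"stu-nbu-vault-worm"}, 15))  # prints: 15,8
```

The printed value matches the -rl argument used in the example above.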
- Clean the failed image list from NetBackup.
Command: nbdelete -purge_deletion_list -media_id <media_id> -force
Example: nbdelete -purge_deletion_list -media_id @aaaad -force
- [Optional if the retention period is equal to the vault lock policy period] Perform a Phase 1 import on the media ID.
Note: This task will incur additional cost.
For CLI, see https://www.veritas.com/support/en_US/article.100023376.html.
For GUI, see https://www.veritas.com/support/en_US/article.100017201.html.
- Perform a Phase 2 import for the specific images you require.
Note: This task will incur additional cost to read metadata of all backed up images.
For CLI, see https://www.veritas.com/support/en_US/article.100023376.html.
For GUI, see https://www.veritas.com/support/en_US/article.100017201.html.
- Calculate the expiration time of the backed up images for the media ID and apply the value to the respective backed up image.
Note: Get the list of images for the media ID and change the expiration time only if the backed up image expires earlier than its vault lock policy period plus 20 days.
- To get the stype for the storage server:
Command: nbdevquery -liststs -storage_server <storage_server_name>
Example: nbdevquery -liststs -storage_server amazonfake.com
Output: V7.5 amazonfake.com amazon_raw 9
- To get the list of images:
Command: bpimmedia -mediaid <media_id> -dp <disk_pool_name> -stype <stype>
Example: bpimmedia -mediaid @aaaad -dp dp-nbu-vault-worm -stype amazon_raw
Output:
IMAGE hostname.veritas.com hostname.veritas.com_1517932400 pol-slp 0 0 0
- To get detailed information about an image, and to note its copy number, creation time, and expiration time:
Command: bpimagelist -backupid <backup_id>
Example: bpimagelist -backupid hostname.veritas.com_1517932400
Output:
IMAGE hostname.veritas.com 0 0 13 hostname.veritas.com_1517932400 pol-slp 0 *NULL* ......
FRAG 1 (Copy Number) 1 32 0 0 0 0 @aaaad hostname.veritas.com 262144 0 0 -1 4 1;amazon_raw;amazonfake.com;dp-nbu-vault-worm;nbu-vault-worm;0 1671417000 (Expiration) 0 65544 0 0 0 6 1549468400 1517932414 (Copy Creation Time) ....
FRAG 2 1 32 0 0 0 0 @aaaad hostname.veritas.com 262144 0 0 -1 4 1;amazon_raw;amazonfake.com;dp-nbu-vault-worm;nbu-vault-worm;0 1549468400 0 65544 0 1 0 6 1549468400 1517933044 ....
- Change the expiration time if the image expires earlier than the vault lock policy period plus 20 days:
Command: bpexpdate -backupid <backup_id> -copy <copy_number> -d <new date mm/dd/yy hh:mm:ss>
where <new date> = copy creation date + retention period enforced by the vault lock policy (in days) + 20 days. (Example: 02/06/2018 08:00:00 + 3650 days + 20 days = 02/24/2028 03:53:34)
Example: bpexpdate -backupid hostname.veritas.com_1517932400 -copy 1 -d 02/24/2028 03:53:34
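The date arithmetic for the new expiration time can be sketched as follows. This is an illustrative calculation only: the function names are hypothetical, the epoch values are the copy creation time and expiration values from the sample bpimagelist output, and the 3650-day lock period is taken from the example above. The sketch works in UTC, so the time of day it prints may differ from a value computed in the server's local time zone; adjust the result to the zone and date format your server expects before passing it to bpexpdate.

```python
from datetime import datetime, timedelta, timezone

BUFFER_DAYS = 20  # buffer recommended in this article


def required_expiration(copy_creation_epoch, vault_lock_days, buffer_days=BUFFER_DAYS):
    """Copy creation time + vault lock period + buffer, as a UTC datetime."""
    created = datetime.fromtimestamp(copy_creation_epoch, tz=timezone.utc)
    return created + timedelta(days=vault_lock_days + buffer_days)


def needs_extension(expiration_epoch, copy_creation_epoch, vault_lock_days):
    """True if the image currently expires before the required expiration."""
    current = datetime.fromtimestamp(expiration_epoch, tz=timezone.utc)
    return current < required_expiration(copy_creation_epoch, vault_lock_days)


def format_for_bpexpdate(moment):
    """Format as mm/dd/yyyy hh:mm:ss, the shape used in the example above."""
    return moment.strftime("%m/%d/%Y %H:%M:%S")


new_date = required_expiration(1517932414, 3650)
print(needs_extension(1671417000, 1517932414, 3650))  # prints: True
print(format_for_bpexpdate(new_date))                 # prints: 02/24/2028 15:53:34
```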
Option 2: Set the retention level to be greater than the Amazon Glacier Vault lock policy. Apply the new retention level value to all images. Manually clean up the orphaned images from storage.
- Complete steps 1 to 6 from Option 1 to set the retention level to be greater than the Amazon Glacier vault lock policy. Apply the new retention level value to all images.
- See https://www.veritas.com/support/en_US/article.100042314 to manually clean up orphaned images.
Option 3: Keep the retention level as is. Manually clean up the orphaned images from storage.
- Clean the failed image list from NetBackup.
Command: nbdelete -purge_deletion_list -media_id <media_id> -force
- See https://www.veritas.com/support/en_US/article.100042314 to manually clean up orphaned images.