Discrepancy in the sizes of storage pools used for AIR replication

Article: 100011123
Last Published: 2018-02-02
Product(s): NetBackup

Problem

In an AIR (Auto Image Replication) environment, either the source or the target storage pool uses more disk space than the other.

Cause

There are several possible causes.

1. Backups are taken outside of the AIR replication or import SLPs (storage lifecycle policies), so the storage pools are not mirror images of each other. Common examples are catalog backups and other backups with retentions short enough not to need replication. To check, run the command below on both master servers and compare the outputs to see which images exist in one NetBackup catalog but not the other.

# /usr/openv/netbackup/bin/admincmd/bpimmedia -stype PureDisk -dp $diskpoolnamehere | grep IMAGE > /tmp/images.out

Images unique to one site do not necessarily indicate a problem. Backups that have been taken but not yet replicated also show up as a discrepancy, which is a false positive. To avoid this, check for images when no backups or replications are in progress, or disregard images less than twelve hours old.
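
The comparison described above can be sketched as a small shell function. It assumes the /tmp/images.out file from each master server has been copied to one host, and that the backup ID is the fourth whitespace-separated field of each IMAGE line. That field position is an assumption, so check it against your own output. The function name and the file names in the usage comment are hypothetical.

```shell
# Sketch: print backup IDs that appear in the first image list but not the second.
# Assumption: the backup ID is field 4 of each IMAGE line; pass a different
# field number as the third argument if your bpimmedia output differs.
only_in_first() {
    first=$1 second=$2 field=${3:-4}
    tmp_a=$(mktemp) || return 1
    tmp_b=$(mktemp) || { rm -f "$tmp_a"; return 1; }
    # comm needs both inputs sorted in the same collation order.
    awk -v f="$field" '$1 == "IMAGE" {print $f}' "$first"  | LC_ALL=C sort -u > "$tmp_a"
    awk -v f="$field" '$1 == "IMAGE" {print $f}' "$second" | LC_ALL=C sort -u > "$tmp_b"
    # -23 hides lines unique to the second list and lines common to both.
    LC_ALL=C comm -23 "$tmp_a" "$tmp_b"
    rm -f "$tmp_a" "$tmp_b"
}

# Usage (hypothetical file names):
#   only_in_first /tmp/images_source.out /tmp/images_target.out
#   only_in_first /tmp/images_target.out /tmp/images_source.out
```

Running it in both directions shows which images are missing on each side. Images younger than twelve hours should still be disregarded, as noted above.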

2. The retention periods of the replicated images differ from those on the source storage pool. One side then retains more data than the other and uses more space. To verify the current retention levels, check the SLP configuration for replications (source) and imports (target) on the master servers. If the retention levels may have been changed in the past, check the retention of each image manually after confirming that cause 1 above is not a factor.

# /usr/openv/netbackup/bin/admincmd/bpimage -backupid $backupidhere | grep Retention

Check whether the retention levels of the images match on both sides. This is easiest on the command line with the "diff" command, which shows only the lines that differ, but it can also be done with utilities such as Notepad++ or TextPad.
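
As an alternative to reading raw "diff" output, the sketch below uses "join" to line up the two sites by backup ID. It assumes you have already collected the retention of each image into a two-column file per site, with the backup ID first and a single-word retention level second. That file layout and the function name are assumptions made for illustration, not something the NetBackup commands produce directly.

```shell
# Sketch: report backup IDs whose retention differs between the two sites.
# Assumption: each input file has two whitespace-separated columns,
# "backup_id retention_level", prepared separately on each master server.
retention_mismatches() {
    src_sorted=$(mktemp) || return 1
    tgt_sorted=$(mktemp) || { rm -f "$src_sorted"; return 1; }
    # join needs both inputs sorted on the join field in the same collation order.
    LC_ALL=C sort -k1,1 "$1" > "$src_sorted"
    LC_ALL=C sort -k1,1 "$2" > "$tgt_sorted"
    # Output columns: backup_id source_retention target_retention
    LC_ALL=C join "$src_sorted" "$tgt_sorted" | awk '$2 != $3'
    rm -f "$src_sorted" "$tgt_sorted"
}

# Usage (hypothetical file names):
#   retention_mismatches /tmp/retention_source.txt /tmp/retention_target.txt
```

Only images present on both sides are compared; images that exist on one side only are covered by cause 1.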

3. RecoverCR may have been run on one of the storage pools previously. It can create backups of the container files used to store data, and these are often very large. If they are not needed, they can be removed. Contact support before removing files manually from the storage pool, as removing files that services depend on can cause data loss.

4. Log rotation is not functioning as it should, so log files are not compressed and removed automatically and continue to use space. A simple way to verify this is to calculate the total size of all log files in the deduplication file system and compare the two environments.

# find /Storage/. -name '*log*' -exec du -k {} \; | awk '{total+=$1} END {print total}'
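
To run the same measurement on both environments, the calculation can be wrapped in a function. This is a sketch under stated assumptions: the function name is hypothetical, and unlike the command above it adds -type f, so a directory named "log" is not counted together with the files inside it. The trade-off is that files inside such a directory are counted only if their own names contain "log".

```shell
# Sketch: total size in KB of files whose names contain "log" under a directory.
log_usage_kb() {
    dir=${1:-/Storage}
    # -type f counts each matching file once; printf "%.0f" prints 0 when
    # nothing matches and avoids exponent notation for large totals.
    find "$dir/." -type f -name '*log*' -exec du -k {} + |
        awk '{total += $1} END {printf "%.0f\n", total}'
}

# Usage: run on each storage server and compare the two numbers.
#   log_usage_kb /Storage
```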

5. The source and destination storage pools are not on the same NetBackup or PureDisk version, or do not have the same EEBs (emergency engineering binaries) installed. If so, upgrade both to the latest version, upgrading the target storage server first.

6. There is a CRQP (content router queue processing) backlog on one of the content routers, which makes the reported space statistic misleading. To check this, verify that the oldest transaction log (also called t-log) is not older than twelve hours, the default interval for CRQP.

MSDP: # /usr/openv/pdde/pdcr/bin/crcontrol --queueinfo 
PureDisk: # /opt/pdcr/bin/crcontrol --queueinfo

If the oldest t-log is more than twelve hours old, check whether CRQP is running:

MSDP: # /usr/openv/pdde/pdcr/bin/crcontrol --processqueueinfo
PureDisk: # /opt/pdcr/bin/crcontrol --processqueueinfo

If it is not active, manually start it:

MSDP: # /usr/openv/pdde/pdcr/bin/crcontrol --processqueue
PureDisk: # /opt/pdcr/bin/crcontrol --processqueue

If CRQP does not complete on its own as expected, contact Veritas Technical Support for further troubleshooting. Do not move, rename, or delete transaction logs, as doing so may cause data loss. Restarting MSDP or PureDisk services during CRQP is also not recommended: transactions could be lost or corrupted, which could lead to data loss and longer startup times.
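
As a read-only cross-check of the twelve-hour rule, the file timestamps of the t-logs can be inspected directly. The sketch below rests on assumptions that do not come from this article: that t-logs are files named *.tlog and that they sit in a queue directory such as /Storage/queue. Confirm both for your installation before relying on the result. The function name is hypothetical.

```shell
# Sketch: list transaction logs older than a given number of hours.
# Assumptions (not from the article): t-logs are files named *.tlog in a
# queue directory such as /Storage/queue; adjust both to your layout.
# Requires a find that supports -mmin (GNU and BSD find do).
old_tlogs() {
    queue_dir=${1:-/Storage/queue}
    hours=${2:-12}
    # Only reads file timestamps; never moves, renames, or deletes t-logs.
    find "$queue_dir" -type f -name '*.tlog' -mmin +"$((hours * 60))"
}

# Usage: an empty result suggests there is no backlog older than twelve hours.
#   old_tlogs /Storage/queue 12
```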

7. A large amount of data is waiting to be compacted on a CR (content router). The easiest check is to run 'crcontrol --dsstat' on the storage servers and look for the "Space needs compaction" line, which approximates the "white space" in the containers. For more information on how data is stored in containers and how the compaction process operates, see TECH124914.
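
A minimal sketch of that search follows. The helper name is hypothetical, and the exact wording and units of the dsstat output may differ between versions, so the match is case-insensitive and prints the line unchanged.

```shell
# Sketch: print only the compaction line from 'crcontrol --dsstat' output.
compaction_line() {
    # Reads dsstat output on stdin; -i tolerates differences in capitalization.
    grep -i 'space needs compaction'
}

# Usage on an MSDP storage server (crcontrol path as given in this article):
#   /usr/openv/pdde/pdcr/bin/crcontrol --dsstat | compaction_line
```

Running it on both the source and the target storage server makes it easy to see whether one side is holding noticeably more uncompacted space.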

8. PDDODataRemoval is not functioning on the MBE (metabase engine). This is a rare condition; if it is suspected, contact Veritas Technical Support.

9. Orphaned images could be consuming space needlessly, usually as a result of changes made outside of the SLP configuration or of storage leaks. Take care if replications or imports are in progress: images that have been replicated but not yet imported are not known to NetBackup and are treated as orphans. Failure to take this into account could lead to accidental data loss. If this condition is suspected, contact Veritas Technical Support.

10. Remnants of patches and old log files from previous troubleshooting could be consuming space. To look for files that can be deleted, generate a list of all files on the storage pool, sorted by file size in descending order.

# find /Storage/. -type f -exec du -k {} \; | egrep -v "bin|bhd" | sort -nr > /tmp/storage_file_list.out

If patch files (usually with a .tar or .tar.gz extension) exist, they can be removed. If in doubt, contact Veritas Technical Support before manually removing files from the storage file system.
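
To narrow the full file list down to likely patch archives, a sketch such as the following lists only .tar and .tar.gz files, largest first. The function name is hypothetical, and the function only lists files; it deletes nothing, so each entry can be reviewed before any manual removal.

```shell
# Sketch: list .tar and .tar.gz files under a directory, largest first (sizes in KB).
list_patch_archives() {
    dir=${1:-/Storage}
    find "$dir/." -type f \( -name '*.tar' -o -name '*.tar.gz' \) -exec du -k {} + |
        sort -nr
}

# Usage:
#   list_patch_archives /Storage > /tmp/storage_patch_list.out
```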

Solution

The solution ultimately depends on the cause. Before troubleshooting in depth, apply the latest patches for NetBackup and/or PureDisk, as some issues may have been fixed between versions. If none of the possible causes above appears relevant, contact Veritas Technical Support to create a case.

Applies To

This can occur on NetBackup versions 7.1 and higher where AIR is used. PureDisk 6.6 and higher may also be affected when used in a PDDO (PureDisk Deduplication Option) configuration.
