NetBackup appliance sends an alert when the Log partition fills up or reaches capacity

Article: 100032261
Last Published: 2016-04-13
Product(s): Appliances

Problem

After a NetBackup appliance is upgraded to version 2.7.1 or 2.7.2, its /log partition fills up and reaches 100% usage.

Error Message

Below is a sample alert message for an appliance named nbmaster.bkup.abc.com:

The following list contains all of the currently unresolved hardware alerts for your appliance nbmaster.bkup.abc.com:
•    Partition usage has exceeded critical threshold: full capacity imminent.
o    Time of event: 2016-04-11 15:31:26 (-04:00)
o    UMI Event code: V-475-103-1000
o    Component Type: Partition
o    Component: Log
o    Status: 100%
o    State: ERROR
o    Additional information about this error is available at following link:
V-475-103-1000
If AutoSupport is enabled on your appliance, this information is automatically transmitted to Veritas for further analysis.

Cause

The callhome data directories are not being cleared out, so the /log/upload directory grows to an abnormally large size.

Solution

Running the command 'du -h --max-depth=1' against the /log directory shows that /log/upload has grown to more than 100 GB, taking up a large portion of the 184 GB allotted to the /log partition.

This directory holds the callhome data, and the upload script does not delete old data that is no longer needed. For example:

    hostname:~ # du -h /log/upload
    41G     /log/upload/2016_03_22
    122G    /log/upload/2016_03_21
    174G    /log/upload
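The usage check described above can be wrapped in a small shell function for the maintenance-mode shell. This is a minimal sketch, not a Veritas-supplied tool: the function name show_log_usage is hypothetical, and it assumes the GNU versions of du and df found on the appliance's Linux shell. It prints overall usage of the partition, then the first-level subdirectories sorted largest first, so an oversized /log/upload appears near the top.

```shell
# Hypothetical helper (not part of the appliance software): report how
# full the partition holding DIR is, then list DIR's first-level
# subdirectories by size, largest first.
show_log_usage() {
    dir="${1:-/log}"          # defaults to the appliance /log partition
    df -h "$dir" || return 1  # overall partition usage
    # -k reports sizes in KiB so a plain numeric sort works;
    # -x stays on one filesystem; --max-depth=1 limits the listing to
    # DIR itself and its immediate subdirectories.
    du -kx --max-depth=1 "$dir" | sort -rn
}
```

For example, 'show_log_usage /log' should list /log/upload directly after the /log total when the upload directory is the one consuming the space.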

As a result, the /log partition eventually reaches capacity and the appliance sends alerts to the administrator. A defect has been logged for this behavior and is scheduled to be fixed in version 2.7.3. Until then, a manual workaround is required. Enter maintenance mode using the normal procedure, then perform the steps outlined below from the command shell:

Step 1:
    Delete all files and directories under the /log/upload directory.
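Step 1 can be carried out with a single find command. The following is a sketch only: the function name clear_upload_dir is hypothetical, and it assumes a find that supports -mindepth and -delete (GNU find does). It removes everything beneath the given directory while keeping the directory itself, and refuses to run if the path is not an existing directory.

```shell
# Hypothetical helper: delete every file and subdirectory beneath DIR
# (default /log/upload) but keep DIR itself.
clear_upload_dir() {
    dir="${1:-/log/upload}"
    # Guard against typos: only proceed if DIR is an existing directory.
    if [ ! -d "$dir" ]; then
        echo "clear_upload_dir: $dir is not a directory" >&2
        return 1
    fi
    # -mindepth 1 excludes DIR itself; -delete works depth-first, so
    # files are removed before the directories that contain them.
    find "$dir" -mindepth 1 -delete
}
```

Running 'du -sh /log/upload' afterwards should report only a few kilobytes.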

Step 2:
    Delete the rotated logs (files whose names end in a digit) under /log/autosupport by issuing the following command:
find /log/autosupport/ -type f -name '*[0-9]' -exec rm -f {} +
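Because Step 2 deletes files by pattern, it can help to preview the matches first. The sketch below adds a dry-run switch; the function name remove_rotated_logs and its '-n' option are hypothetical, and it uses the same '*[0-9]' pattern as the command in Step 2, restricted to regular files.

```shell
# Hypothetical helper: remove rotated logs (regular files whose names end
# in a digit, e.g. alertmanager.log.1) under DIR (default /log/autosupport).
# Pass -n as the second argument to only list what would be removed.
remove_rotated_logs() {
    dir="${1:-/log/autosupport}"
    if [ "${2:-}" = "-n" ]; then
        # Dry run: print the matching files without deleting anything.
        find "$dir" -type f -name '*[0-9]' -print
    else
        # -exec ... {} + handles file names containing spaces safely.
        find "$dir" -type f -name '*[0-9]' -exec rm -f {} +
    fi
}
```

For example, 'remove_rotated_logs /log/autosupport -n' lists the candidates, and the same call without '-n' deletes them.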

Step 3:
    Empty each of the remaining logs under /log/autosupport by issuing the following command once per file:
cat /dev/null > filename
    Example (run from within /log/autosupport): 'cat /dev/null > alertmanager.log'
        The size of alertmanager.log and of the other logs should now be 0.
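Running 'cat /dev/null > filename' by hand for every file is tedious when /log/autosupport holds many logs. A minimal sketch that empties every regular file under the directory in one pass follows; the function name empty_autosupport_logs is hypothetical. Truncating rather than deleting matters here: if a process still has a log open, deleting the file does not free its disk space until the process closes it, whereas truncating releases the space immediately.

```shell
# Hypothetical helper: truncate every regular file under DIR
# (default /log/autosupport) to zero bytes, the same effect as running
# "cat /dev/null > filename" on each one.
empty_autosupport_logs() {
    dir="${1:-/log/autosupport}"
    if [ ! -d "$dir" ]; then
        echo "empty_autosupport_logs: $dir is not a directory" >&2
        return 1
    fi
    # The inner sh loop truncates each file with ": > file";
    # {} + passes many file names to each sh invocation.
    find "$dir" -type f -exec sh -c 'for f do : > "$f"; done' sh {} +
}
```

Afterwards, 'ls -l /log/autosupport' should show a size of 0 for each file.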

Step 4:
    Modify the script /opt/NBUAppliance/scripts/trigger_upload_logs.pl as follows:

    In the subroutine run_utility, add the following line between lines 164 and 165:
rmtree($dirname);

    Before modifying:
        158 CALLHOME_LOG(HWMON_LOG_DBG,"Executing CallHomeDataGather script");
        159
        160 my %result = execCmd($cmd, TIMEOUT => SHORT_TIMEOUT);
        161 if ($result{"rv"} != 0) {
        162     CALLHOME_LOG(HWMON_LOG_ERROR,
        163         "Error in running CallhomeDataGather Command: %s Output: %s Error: %s Return value %d.",
        164         $cmd, $result{"stdout"}, $result{"stderr"}, $result{"rv"});
        165     return $result{"rv"}; # Return if failed to collect logs
        166 }

    After modifying:
        158 CALLHOME_LOG(HWMON_LOG_DBG,"Executing CallHomeDataGather script");
        159
        160 my %result = execCmd($cmd, TIMEOUT => SHORT_TIMEOUT);
        161 if ($result{"rv"} != 0) {
        162     CALLHOME_LOG(HWMON_LOG_ERROR,
        163         "Error in running CallhomeDataGather Command: %s Output: %s Error: %s Return value %d.",
        164         $cmd, $result{"stdout"}, $result{"stderr"}, $result{"rv"});
        165     rmtree($dirname);
        166     return $result{"rv"}; # Return if failed to collect logs
        167 }
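After editing, it is worth confirming that the change is in place and that the script still compiles. The sketch below is one way to do that; the function name check_upload_patch is hypothetical. It assumes rmtree is provided by Perl's core File::Path module, where rmtree normally comes from, so it warns (without failing) if the script has no 'use File::Path' line: in that case the new line would compile but fail at run time with an undefined-subroutine error, unless rmtree is imported some other way.

```shell
# Hypothetical helper: sanity-check the edited trigger_upload_logs.pl.
check_upload_patch() {
    script="${1:-/opt/NBUAppliance/scripts/trigger_upload_logs.pl}"
    # 1. The new line must be present (-F matches it literally, not as a regex).
    if ! grep -nF 'rmtree($dirname);' "$script"; then
        echo "check_upload_patch: rmtree(\$dirname); not found in $script" >&2
        return 1
    fi
    # 2. rmtree normally comes from File::Path; warn if the module is not loaded.
    if ! grep -Eq '^[[:space:]]*use[[:space:]]+File::Path' "$script"; then
        echo "check_upload_patch: warning: no 'use File::Path' line in $script" >&2
    fi
    # 3. Ask perl to compile the script without running it.
    if command -v perl >/dev/null 2>&1; then
        perl -c "$script"
    else
        echo "check_upload_patch: perl not found; skipping compile check" >&2
    fi
}
```

Taking a copy of the script before editing (for example 'cp -p trigger_upload_logs.pl trigger_upload_logs.pl.orig') makes the change easy to revert if the check fails.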

 

References

UMI: V-475-103-1000
Etrack: 3873761
