NetBackup appliances hang/crash after OS partition fills to capacity due to temporary files and log files
Problem
NetBackup 5230, 5330, 5240, 5340 appliances hang/crash after OS partition fills to capacity due to temporary files and log files.
Cause
The OS partition ('/') fills to capacity because appliance processes and NetBackup processes leave behind temporary files and log files. The following two sources are known:
1. Appliance processes append to the following files, which grow large over time:
- /MegaSAS.log file
- /root/MegaSAS.log file
- /CmdTool.log file
- /root/CmdTool.log file
- /var/log/CacheRiver/CacheRiver-*.log files
- /hs_err_pid* files
 
2. In some environments, NetBackup appliances acting as VMware backup host media servers leave behind /usr/openv/tmp/align@ files.
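To see which of the files listed above are present on an appliance and how large they have grown, a short scan such as the following may help. This is a minimal sketch, not a Veritas tool. It assumes you have a root-level shell on the appliance (which may require elevated mode or Veritas Technical Support), and the "align@*" wildcard is an assumption because the article does not give full file names. The function name find_candidate_files and the 100 MiB reporting threshold are hypothetical.

```python
import glob
import os

# File locations named in this KB article. The "align@*" wildcard is an
# assumption: the article only says "align@ files" and gives no full names.
CANDIDATE_PATTERNS = [
    "/MegaSAS.log",
    "/root/MegaSAS.log",
    "/CmdTool.log",
    "/root/CmdTool.log",
    "/var/log/CacheRiver/CacheRiver-*.log",
    "/hs_err_pid*",
    "/usr/openv/tmp/align@*",
]


def find_candidate_files(root="", min_bytes=0, patterns=CANDIDATE_PATTERNS):
    """Return (path, size_in_bytes) pairs for matching files, largest first.

    `root` is prepended to every pattern so the scan can be pointed at a
    test directory; leave it empty to scan the live filesystem.
    """
    found = []
    prefix = glob.escape(root)  # treat the root itself literally, not as a pattern
    for pattern in patterns:
        for path in glob.glob(prefix + pattern):
            if not os.path.isfile(path):
                continue  # skip directories and anything that is not a regular file
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # file vanished or is unreadable; ignore it
            if size >= min_bytes:
                found.append((path, size))
    return sorted(found, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Report matches of at least 100 MiB (an arbitrary example threshold).
    for path, size in find_candidate_files(min_bytes=100 * 1024 * 1024):
        print(f"{size / 1024**2:10.1f} MiB  {path}")
```

The scan only reports sizes; it deliberately deletes nothing, so cleanup decisions stay with the administrator or Veritas Technical Support.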
Solution
Veritas Technologies LLC is aware of these issues.
1. The first of the aforementioned issues is formally resolved in NetBackup Appliance version 3.2. For versions 3.1.2 and earlier, please contact Veritas Technical Support and reference this KB article along with the following Emergency Engineering Binary (EEB) information:
- Etrack 3982246 for 3.1.1
- Etrack 3992356 for 3.1.2
2. The second of the aforementioned issues (VMware backup host temp files) will be resolved in a future NetBackup version that has not yet been determined. In the meantime, EEBs are available for the following versions. Please contact Veritas Technical Support and reference this KB article along with the following Emergency Engineering Binary (EEB) information:
- Etrack 3982086 for 3.2
- Etrack 4010914 for 3.1.2
- Etrack 3975101 for 3.1.1
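The two Etrack lists above can be restated as a single lookup table keyed by appliance version, which makes it easier to see which numbers to quote to Support. This is a minimal sketch that only restates the article's lists; the keys "logs" (issue 1) and "vmware_tmp" (issue 2) and the function name etracks_for are hypothetical labels, not Veritas identifiers.

```python
# Etrack numbers listed in this article, keyed by appliance version.
# "logs": issue 1 (MegaSAS.log, CmdTool.log, CacheRiver-*.log, hs_err_pid* growth).
# "vmware_tmp": issue 2 (/usr/openv/tmp/align@ files on VMware backup hosts).
# Both key names are hypothetical labels chosen for this sketch.
EEB_ETRACKS = {
    "3.1.1": {"logs": "3982246", "vmware_tmp": "3975101"},
    "3.1.2": {"logs": "3992356", "vmware_tmp": "4010914"},
    # Issue 1 is resolved in 3.2 itself, so only the issue 2 EEB applies.
    "3.2": {"vmware_tmp": "3982086"},
}


def etracks_for(version):
    """Return a copy of the Etrack numbers listed for `version`.

    An empty dict means this article lists no EEB for that version.
    """
    return dict(EEB_ETRACKS.get(version, {}))


if __name__ == "__main__":
    for version in ("3.1.1", "3.1.2", "3.2"):
        print(version, etracks_for(version))
```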
 
To prevent the NetBackup Appliance operating system partition (known as the 'System' partition) from reaching a 100% full state, configure the appliance for SMTP alerts so that the administrator is notified of a potential space problem. Appliances can also be configured for SNMP alerts; for details on hardware and software monitoring, refer to the 'About hardware monitoring and alerts' section in the Veritas NetBackup™ Appliance Administrator's Guide.
Example SMTP configuration steps using CLISH from Main_Menu:
Settings> Alerts> Email SMTP Add <Server> <Account> <Password>
Settings> Alerts> Email SenderID Set <Address>
Settings> Alerts> Email Hardware Add <Address>
Settings> Alerts> Email Software Add <Address>
NetBackup Appliances raise either a WARNING or an ERROR alert when a disk partition reaches specific usage thresholds, as follows:
     - 'WARNING' with UMI code V-475-103-1001 when the System partition (or the /log partition) reaches 80% full but is still under 98% full.
     - 'ERROR' with UMI code V-475-103-1000 once the partition reaches 98% full or higher. This UMI is also sent to Call Home if Call Home is configured.
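The threshold rules above can be expressed as a small classification function, for example to check a partition by hand between alerts. This is a minimal sketch under stated assumptions: it treats exactly 80% as WARNING and exactly 98% as ERROR, and it computes usage the way `df` does (used divided by used plus space available to non-root users, rounded up), which only approximates whatever calculation the appliance's own monitoring performs. The names usage_percent and classify are hypothetical.

```python
import math
import shutil

# Thresholds as described in this article (percent of partition used).
WARNING_THRESHOLD = 80  # WARNING, UMI V-475-103-1001
ERROR_THRESHOLD = 98    # ERROR,   UMI V-475-103-1000


def usage_percent(path="/"):
    """Approximate the df-style Use% for the filesystem containing `path`."""
    usage = shutil.disk_usage(path)
    denominator = usage.used + usage.free  # free = space available to non-root users
    if denominator == 0:
        return 0
    return math.ceil(100 * usage.used / denominator)


def classify(percent):
    """Map a usage percentage to (state, UMI code) using the thresholds above."""
    if percent >= ERROR_THRESHOLD:
        return "ERROR", "V-475-103-1000"
    if percent >= WARNING_THRESHOLD:
        return "WARNING", "V-475-103-1001"
    return "OK", None


if __name__ == "__main__":
    pct = usage_percent("/")
    state, umi = classify(pct)
    print(f"/ is {pct}% used: {state}" + (f" ({umi})" if umi else ""))
```

The ERROR check runs first so that a value such as 99% is never reported as a mere WARNING.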
Example WARNING alert when the OS or /log partition usage is 80% or greater but less than 98%:
The following list contains all of the currently unresolved hardware alerts for your appliance hostname_here (VTAS5555555): 
• The partition usage has exceeded warning threshold and will soon reach full capacity. Cleanup the partition and re-check status. If the issue is not resolved, contact Veritas Technical Support for assistance. 
o Time of event: 2017-08-20 15:54:17 (+08:00)
o UMI Event code: V-475-103-1001
o Component Type: Partition
o Component: System
o Status: 81%
o State: WARNING
o Additional information about this error is available at following link:
V-475-103-1001
Example ERROR alert when the OS or /log partition usage is 98% or greater:
The following error(s) are detected on your appliance hostname_here (VTAS5555555): 
* The partition usage has exceeded critical threshold: full capacity imminent. Cleanup the partition and re-check status. If the issue is not resolved, contact Veritas Technical Support for assistance. 
o Time of event: 2018-12-05 18:04:22 (+08:00)
o UMI Event code: V-475-103-1000
o Component Type: Partition
o Component: System
o Status: 99%
o State: ERROR
o Additional information about this error is available at following link:
V-475-103-1000 
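If alert emails are forwarded to a mailbox that a script can read, the "o Key: Value" detail lines shown in the two examples above can be pulled into a dictionary. This is a minimal sketch that assumes the body layout matches those examples; the exact format may differ between appliance versions. The names parse_alert and status_percent are hypothetical.

```python
import re

# Matches detail lines such as "o Status: 99%" from the example alerts.
# Splits at the first colon, so values that contain colons (timestamps) stay whole.
FIELD_LINE = re.compile(r"^\s*o\s+([^:]+):\s*(.*?)\s*$")


def parse_alert(text):
    """Return a dict of the 'o Key: Value' fields found in an alert email body."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_LINE.match(line)
        if match and match.group(2):  # skip lines with no value, e.g. the link caption
            fields[match.group(1).strip()] = match.group(2)
    return fields


def status_percent(fields):
    """Convert a 'Status' value such as '99%' to an int, or None if absent."""
    digits = fields.get("Status", "").rstrip("%")
    return int(digits) if digits.isdigit() else None
```

The parsed "UMI Event code" and "Status" values can then be compared against the WARNING and ERROR thresholds described earlier.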
If AutoSupport is enabled, this information is automatically transmitted to Veritas for further analysis. 
Producing a DataCollect package before you engage Technical Support may help to expedite the resolution process. For information on how to gather the logs that the DataCollect utility creates, refer to your appliance Administrator's Guide.
