Problem
Incremental backups with True Image Restore (TIR) enabled may produce the following message in the Activity Monitor:
5/26/2014 2:01:32 PM - Warning bpbrm(pid=1968) from client MyClient: WRN - old TIR info file 'C:\Program Files\Veritas\NetBackup\tir_info\C\NetBackup_file_info.MYCLIENT' is missing. Backing up everything in 'C:'
As stated in the message, the Incremental job behaves like a Full job for the drive letter in question.
Cause
TIR has two requirements of the Full job that must be met for the subsequent Incremental job to function properly:
1. The Full job must finish with a Status 0
2. At the end of the Full job, once the bpbkar process (on the client) informs bpbrm (on the media server) that the job completed with Status 0, bpbkar must receive acknowledgement from bpbrm that bpbrm received the Status 0.
If either of these two requirements is not met, the "TIR info" file will not be properly renamed at the end of the Full job, and the Incremental job will produce the message seen above.
Solution
As per TIR requirement 1, if the Full job is not finishing with Status 0, identify the root cause of the non-zero status code and address it accordingly.
Example:
If a busy file is being skipped, and the job is partially successful, consider excluding the busy file.
As per TIR requirement 2, if the Full job is finishing with Status 0, collect the bpbkar and bpbrm logs and identify what is happening when the Status Code is reported between bpbkar and bpbrm.
At the end of the job, the bpbkar log (at General Log Level 2) shows the communication of the Status Code to the Media Server (bpbrm):
<2> tar_base::backup_finish: TAR - backup: 156 files
<2> tar_base::backup_finish: TAR - backup: file data: 42891321 bytes
<2> tar_base::backup_finish: TAR - backup: image data: 43281408 bytes
<2> tar_base::backup_finish: TAR - backup: elapsed time: 13 secs 3329339 bps
<2> tar_base::V_vTarMsgW: INF - Client completed sending data for backup
<2> tar_base::V_vTarMsgW: INF - EXIT STATUS 0: the requested operation was successfully completed
The bpbrm log on the Media Server shows bpbrm acknowledging the final Status Code back to bpbkar on the Client:
<2> inform_client_of_status: INF - Server status = 0
The acknowledgement is noted in the bpbkar log:
<4> tar_backup::readServerMessage: INF - 'INF - Server status = 0' received
When the above line does not appear in the bpbkar log, the 'TIR info' file is not properly renamed and the subsequent Incremental will run as a Full.
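The check described above can be sketched as a small Python helper. This is only an illustration, not a Veritas tool: the function names are hypothetical, the patterns are taken solely from the log excerpts shown in this article, and the sketch assumes the log text passed in covers a single backup job.

```python
import re

# Patterns copied from the log excerpts in this article. Real log lines carry
# a timestamp and PID prefix, so we search within each line instead of
# matching from the start of the line.
EXIT_STATUS_RE = re.compile(r"EXIT STATUS (\d+)")
SERVER_ACK_RE = re.compile(r"'INF - Server status = (\d+)' received")


def check_bpbkar_handshake(log_text):
    """Return (exit_status, acked_status) found in a bpbkar log excerpt.

    Either element is None when the corresponding line is absent.
    If a line type appears more than once, the last occurrence wins.
    """
    exit_status = None
    acked_status = None
    for line in log_text.splitlines():
        match = EXIT_STATUS_RE.search(line)
        if match:
            exit_status = int(match.group(1))
        match = SERVER_ACK_RE.search(line)
        if match:
            acked_status = int(match.group(1))
    return exit_status, acked_status


def tir_ack_complete(log_text):
    """True only if bpbkar reported Status 0 AND logged the server's acknowledgement."""
    exit_status, acked_status = check_bpbkar_handshake(log_text)
    return exit_status == 0 and acked_status == 0
```

If `tir_ack_complete` returns False for a Full job that finished with Status 0, the acknowledgement line is missing and the troubleshooting tips below apply.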
Troubleshooting tips for TIR requirement 2:
1. Check the bpbrm log to see if the "inform_client_of_status" operation was attempted. If it was not attempted, identify why and address the situation.
2. If "inform_client_of_status" was attempted, yet the "Server status = 0" acknowledgement still does not appear in the bpbkar log, there is likely a network or socket related cause. Look for errors in both the bpbrm and bpbkar logs.
Example:
<16> dtcp_read: TCP - failure: recv socket (380) (TCP 10054: Connection reset by peer)
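The two troubleshooting tips can be combined into one triage routine, sketched below in Python. The function name and the verdict strings are hypothetical. The search for "TCP - failure" is an assumption based only on the single example line above, so other socket errors may be worded differently and should still be looked for by hand.

```python
def triage_status_ack(bpbrm_log, bpbkar_log):
    """Classify why the TIR acknowledgement may be missing.

    Returns (verdict, evidence), where evidence is a list of matching log lines.
    """
    bpbrm_lines = bpbrm_log.splitlines()
    bpbkar_lines = bpbkar_log.splitlines()

    # The client logged the acknowledgement: nothing to troubleshoot.
    if any("Server status = 0' received" in line for line in bpbkar_lines):
        return "ack-received", []

    # Tip 1: did bpbrm attempt to inform the client at all?
    if not any("inform_client_of_status" in line for line in bpbrm_lines):
        return "ack-not-attempted", []

    # Tip 2: bpbrm sent it but bpbkar never saw it; collect socket errors
    # from both logs as supporting evidence.
    evidence = [
        line for line in bpbrm_lines + bpbkar_lines if "TCP - failure" in line
    ]
    return "ack-lost-in-transit", evidence
```

A verdict of "ack-lost-in-transit" with a non-empty evidence list points toward the network or socket related causes discussed next.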
The following two TCP-related registry changes have been known to correct this situation. Apply them to the Client, and to the Media Server if it runs Windows:
Reference: www.microsoft.com/en-us/download/details.aspx
***** TcpMaxDataRetransmissions *****
Description: This parameter controls the number of times that TCP retransmits an individual data segment before aborting the connection. The retransmission time-out is doubled with each successive retransmission on a connection.
1. Navigate to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
2. Create a new REG_DWORD entry named TcpMaxDataRetransmissions
3. Give it a value of Decimal 10
4. A reboot is required to make this setting active
***** KeepAliveTime *****
Description: This parameter controls how often TCP attempts to verify that an idle connection is still intact by sending a keep-alive packet. If the remote system is still reachable and functioning, it acknowledges the keep-alive transmission.
1. Navigate to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
2. Create a new REG_DWORD entry named KeepAliveTime
3. Give it a value of Decimal 900000 (the value is in milliseconds, so this is 15 minutes)
4. A reboot is required to make this setting active
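As an alternative to editing the registry by hand on several machines, both values could be placed in a .reg file. The Python sketch below only generates the file text and does not touch the registry. The function name `build_reg_file` is hypothetical. The key path and decimal values are the ones given in the steps above, and .reg files express DWORD values in hexadecimal, which the code converts.

```python
# Key and values taken from the manual steps in this article.
TCPIP_PARAMETERS_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"
)
TIR_TCP_SETTINGS = {
    "TcpMaxDataRetransmissions": 10,  # decimal 10
    "KeepAliveTime": 900000,          # decimal 900000
}


def build_reg_file(key, dword_values):
    """Return the text of a .reg file that sets REG_DWORD values under one key."""
    lines = ["Windows Registry Editor Version 5.00", "", "[" + key + "]"]
    for name, value in dword_values.items():
        if not 0 <= value <= 0xFFFFFFFF:
            raise ValueError("%s does not fit in a DWORD: %r" % (name, value))
        # .reg syntax stores DWORDs as eight hexadecimal digits.
        lines.append('"%s"=dword:%08x' % (name, value))
    return "\r\n".join(lines) + "\r\n"


reg_text = build_reg_file(TCPIP_PARAMETERS_KEY, TIR_TCP_SETTINGS)
```

The resulting text can be saved with a .reg extension and imported from an elevated session on each affected Windows host. Review the file before importing it; the reboot noted in the steps above is still required.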