VMware backup with "File Level Recovery" fails with Status 636 due to disconnect between Master and Media Server

Article: 100012397
Last Published: 2014-04-14
Product(s): NetBackup & Alta Data Protection

Problem

When a VMware backup with File Level Recovery runs, the media server may send no updates to the master server for an extended period. A firewall between the master and media servers may close the idle socket between them during that time. By default, many firewalls disconnect a connection after 2 hours with no traffic.

Error Message

Detail Status:

3/31/2014 11:55:45 AM - begin writing
3/31/2014 11:57:09 AM - Info bpbkar32(pid=2844) 0 entries sent to bpdbm
read from input socket failed(636)
3/31/2014 7:24:46 PM - Error bpbrm(pid=3644) could not write FILE ADDED message to OUTSOCK
3/31/2014 7:24:51 PM - Error bpbrm(pid=3644) could not write FILE ADDED message to OUTSOCK
3/31/2014 7:24:56 PM - Error bpbrm(pid=3644) could not write FILE ADDED message to OUTSOCK
3/31/2014 7:25:01 PM - Error bpbrm(pid=3644) could not write FILE ADDED message to OUTSOCK
3/31/2014 7:25:06 PM - Error bpbrm(pid=3644) could not write FILE ADDED message to OUTSOCK
3/31/2014 7:25:11 PM - Error bpbrm(pid=3644) could not write FILE ADDED message to OUTSOCK
3/31/2014 7:25:16 PM - Error bpbrm(pid=3644) could not write FILE ADDED message to OUTSOCK
3/31/2014 7:25:21 PM - Error bpbrm(pid=3644) could not write FILE ADDED message to OUTSOCK
3/31/2014 7:25:27 PM - Error bpbrm(pid=3644) could not write FILE ADDED message to OUTSOCK
3/31/2014 7:25:32 PM - Error bpbrm(pid=3644) could not write FILE ADDED message to OUTSOCK
3/31/2014 7:25:39 PM - Error bpbrm(pid=3644) could not write FILE ADDED message to OUTSOCK
3/31/2014 7:25:45 PM - Error bpbrm(pid=3644) db_FLISTsend failed: no entity was found (227)
3/31/2014 7:27:22 PM - Info bpbkar32(pid=0) bpbkar waited 0 times for empty buffer, delayed 0 times.
3/31/2014 7:27:22 PM - Critical bpbrm(pid=3644) unexpected termination of client VMCLIENT
3/31/2014 7:29:23 PM - Error bpbrm(pid=3644) could not write EXIT STATUS to OUTSOCK
3/31/2014 7:29:23 PM - Info bpbkar32(pid=0) done. status: 227: no entity was found

bpbrm log snippet from the media server:

12:10:23.472 [3644.4264] <2> bpbrm main: ADDED FILES TO DB FOR VMCLIENT_1396281339 250 v4recovery
19:24:46.066 [3644.4264] <2> put_strlen_str: cannot write data to network: An existing connection was forcibly closed by the remote host.
19:24:46.066 [3644.4264] <16> bpbrm main: could not write FILE ADDED message to OUTSOCK
19:24:46.066 [3644.4264] <2> set_job_details: Tfile (263102): LOG 1396308286 16 bpbrm 3644 could not write FILE ADDED message to OUTSOCK
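The telltale sign in the bpbrm log is the long silence between the last successful entry (12:10:23) and the failed write (19:24:46). That gap is far longer than a typical firewall idle timeout. The following minimal Python sketch flags such gaps in a bpbrm log. It assumes every entry starts with an HH:MM:SS.mmm timestamp, as in the snippet above. The function name find_idle_gaps and the 2-hour default threshold are illustrative and are not part of NetBackup.

```python
import re
from datetime import datetime, timedelta

# Assumed bpbrm line format (from the snippet above): "HH:MM:SS.mmm [pid.tid] <level> message"
TIMESTAMP_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3}) ")


def find_idle_gaps(lines, threshold=timedelta(hours=2)):
    """Return (earlier_line, later_line, gap) for each gap between entries longer than threshold."""
    gaps = []
    prev_time = prev_line = None
    for line in lines:
        match = TIMESTAMP_RE.match(line)
        if not match:
            continue  # skip continuation lines that carry no timestamp
        current = datetime.strptime(match.group(1), "%H:%M:%S.%f")
        if prev_time is not None:
            gap = current - prev_time
            if gap < timedelta(0):
                gap += timedelta(days=1)  # assumption: the log rolled over midnight
            if gap > threshold:
                gaps.append((prev_line, line, gap))
        prev_time, prev_line = current, line
    return gaps


# Usage with two lines from the bpbrm snippet above
snippet = [
    "12:10:23.472 [3644.4264] <2> bpbrm main: ADDED FILES TO DB FOR VMCLIENT_1396281339 250 v4recovery",
    "19:24:46.066 [3644.4264] <2> put_strlen_str: cannot write data to network: "
    "An existing connection was forcibly closed by the remote host.",
]
for earlier, later, gap in find_idle_gaps(snippet):
    print(f"idle for {gap}: {earlier[:12]} -> {later[:12]}")
# prints: idle for 7:14:22.594000: 12:10:23.472 -> 19:24:46.066
```

Because the log lines carry only a time of day, the sketch cannot distinguish a gap of more than 24 hours. Treat its output as a pointer to where to look, not as proof of a firewall disconnect.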

Cause

The issue occurs because a device between the master and media servers (typically a firewall) closes the open but idle socket.

Solution

  1. Identify the device that is closing the socket and address the issue there.
  2. Lower the operating system's TCP keepalive time (KeepAliveTime on Windows) to a value below that device's idle disconnect time.

For example, if the firewall closes idle connections after 2 hours, an OS keepalive time of 15 minutes may help avoid the issue.

To keep the firewall from dropping idle sockets, either lengthen the idle socket timeout on the firewall or shorten the TCP keepalive interval on the hosts on both sides of the firewall. The keepalive interval must be shorter than the firewall's idle socket timeout. The default interval is 2 hours, which is much too long for most sites. An interval of 15 minutes is usually appropriate, but use a shorter interval if needed.
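The rule above (the keepalive interval must be shorter than the firewall's idle timeout) can be checked on a Linux host with a short Python sketch. It reads /proc/sys/net/ipv4/tcp_keepalive_time, which holds the idle time in seconds before the kernel sends the first keepalive probe. The 2-hour firewall timeout is a hypothetical value; replace it with the timeout actually configured on your firewall.

```python
from pathlib import Path

# Linux: idle time, in seconds, before the kernel sends the first keepalive probe
LINUX_KEEPALIVE_TIME = Path("/proc/sys/net/ipv4/tcp_keepalive_time")


def read_keepalive_time(path=LINUX_KEEPALIVE_TIME):
    """Return the host's current keepalive idle time in seconds."""
    return int(Path(path).read_text().strip())


def keepalive_is_safe(keepalive_s, firewall_idle_s):
    """True if the first probe is sent before the firewall drops the idle socket."""
    return keepalive_s < firewall_idle_s


if __name__ == "__main__" and LINUX_KEEPALIVE_TIME.exists():
    firewall_idle_s = 2 * 60 * 60  # hypothetical firewall idle timeout: 2 hours
    current = read_keepalive_time()
    print(f"tcp_keepalive_time={current}s, safe={keepalive_is_safe(current, firewall_idle_s)}")
```

With the Linux default of 7200 seconds and a 2-hour firewall timeout, the check reports safe=False. After sysctl -w net.ipv4.tcp_keepalive_time=900, it reports safe=True.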

 
Operating System Parameter for frequency of probes Values Commands
AIX tcp_keepidle 1,800 half secs $ no -o tcp_keepidle=1800
HP-UX 11i tcp_keepalive_interval 900,000 ms $ ndd -set /dev/tcp tcp_keepalive_interval 900000
Linux tcp_keepalive_time 900 secs $ sysctl -w net.ipv4.tcp_keepalive_time=900
Solaris tcp_keepalive_interval 900,000 ms $ ndd -set /dev/tcp tcp_keepalive_interval 900000
Windows KeepAliveTime 900,000 ms See Related Documentation below.
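The table expresses the same 15-minute interval in three different units: seconds on Linux, milliseconds on HP-UX, Solaris, and Windows, and half-seconds on AIX. The Python helper below is a sketch that converts one interval, given in whole seconds, into each of those values and reproduces the commands from the table. The function name keepalive_settings is hypothetical. Windows has no command here because the article defers to the Related Documentation for it.

```python
def keepalive_settings(seconds):
    """Map a keepalive idle time (whole seconds) to each OS's value and command from the table."""
    ms = seconds * 1000       # HP-UX, Solaris, and Windows use milliseconds
    half_secs = seconds * 2   # AIX tcp_keepidle uses half-seconds
    return {
        "AIX": (half_secs, f"no -o tcp_keepidle={half_secs}"),
        "HP-UX 11i": (ms, f"ndd -set /dev/tcp tcp_keepalive_interval {ms}"),
        "Linux": (seconds, f"sysctl -w net.ipv4.tcp_keepalive_time={seconds}"),
        "Solaris": (ms, f"ndd -set /dev/tcp tcp_keepalive_interval {ms}"),
        "Windows": (ms, None),  # KeepAliveTime value only; see Related Documentation
    }


# Usage: print the values and commands for a 15-minute interval
for os_name, (value, command) in keepalive_settings(15 * 60).items():
    print(f"{os_name:<10} {value:>7}  {command or 'see Related Documentation'}")
```

Working from a single value in seconds avoids the most common mistake with this table: entering 900 on a platform that expects milliseconds or half-seconds.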

Related Documentation: NetBackup™ Backup Planning and Performance Tuning Guide (veritas.com)