VMware backup with "File Level Recovery" fails with Status 636 due to disconnect between Master and Media Server

Article: 100012397

Last Published: 2014-04-14

Product(s): NetBackup & Alta Data Protection

Problem

When performing a VMware backup with File Level Recovery, if no updates are sent to the Master server for an extended period of time, a firewall between the Master and media server may close the idle socket that connects them. By default, most firewalls disconnect an idle connection after 2 hours.

Error Message

Detail Status:

3/31/2014 11:55:45 AM - begin writing
3/31/2014 11:57:09 AM - Info bpbkar32(pid=2844) 0 entries sent to bpdbm
read from input socket failed(636)
3/31/2014 7:24:46 PM - Error bpbrm(pid=3644) could not write FILE ADDED message to OUTSOCK
3/31/2014 7:24:51 PM - Error bpbrm(pid=3644) could not write FILE ADDED message to OUTSOCK
3/31/2014 7:24:56 PM - Error bpbrm(pid=3644) could not write FILE ADDED message to OUTSOCK
3/31/2014 7:25:01 PM - Error bpbrm(pid=3644) could not write FILE ADDED message to OUTSOCK
3/31/2014 7:25:06 PM - Error bpbrm(pid=3644) could not write FILE ADDED message to OUTSOCK
3/31/2014 7:25:11 PM - Error bpbrm(pid=3644) could not write FILE ADDED message to OUTSOCK
3/31/2014 7:25:16 PM - Error bpbrm(pid=3644) could not write FILE ADDED message to OUTSOCK
3/31/2014 7:25:21 PM - Error bpbrm(pid=3644) could not write FILE ADDED message to OUTSOCK
3/31/2014 7:25:27 PM - Error bpbrm(pid=3644) could not write FILE ADDED message to OUTSOCK
3/31/2014 7:25:32 PM - Error bpbrm(pid=3644) could not write FILE ADDED message to OUTSOCK
3/31/2014 7:25:39 PM - Error bpbrm(pid=3644) could not write FILE ADDED message to OUTSOCK
3/31/2014 7:25:45 PM - Error bpbrm(pid=3644) db_FLISTsend failed: no entity was found (227)
3/31/2014 7:27:22 PM - Info bpbkar32(pid=0) bpbkar waited 0 times for empty buffer, delayed 0 times.
3/31/2014 7:27:22 PM - Critical bpbrm(pid=3644) unexpected termination of client VMCLIENT
3/31/2014 7:29:23 PM - Error bpbrm(pid=3644) could not write EXIT STATUS to OUTSOCK
3/31/2014 7:29:23 PM - Info bpbkar32(pid=0) done. status: 227: no entity was found

bpbrm log snippet from the media server:

12:10:23.472 [3644.4264] <2> bpbrm main: ADDED FILES TO DB FOR VMCLIENT_1396281339 250 v4recovery
19:24:46.066 [3644.4264] <2> put_strlen_str: cannot write data to network: An existing connection was forcibly closed by the remote host.
19:24:46.066 [3644.4264] <16> bpbrm main: could not write FILE ADDED message to OUTSOCK
19:24:46.066 [3644.4264] <2> set_job_details: Tfile (263102): LOG 1396308286 16 bpbrm 3644 could not write FILE ADDED message to OUTSOCK
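The gap between the last successful entry and the first failed write in the snippet above can be worked out as a quick check against the firewall's idle timeout. The following is a minimal sketch, assuming both timestamps fall on the same day and that no other traffic crossed the socket between them; the helper name `idle_gap` is hypothetical and not part of NetBackup.

```python
from datetime import datetime, timedelta


def idle_gap(first: str, second: str) -> timedelta:
    """Return the time between two HH:MM:SS.mmm log timestamps on the same day."""
    fmt = "%H:%M:%S.%f"
    return datetime.strptime(second, fmt) - datetime.strptime(first, fmt)


# Timestamps copied from the bpbrm log snippet above.
gap = idle_gap("12:10:23.472", "19:24:46.066")

print(gap)                          # elapsed time between the two log entries
print(gap > timedelta(hours=2))     # compare against a 2-hour idle timeout
```

Under those assumptions the socket sat idle for a little over seven hours, well past a 2-hour idle timeout.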

Cause

The issue occurs when a device (firewall) between the Master and Media server closes the open socket.

Solution

Identify what is closing the socket and address the issue.
Change the OS KeepAliveTime from its default to a value lower than the idle timeout of the identified device.

For example, if the firewall closes the connection in 2 hours, then a 15 minute OS KeepAliveTime may help avoid the issue.

To keep the firewall from dropping idle sockets, either lengthen the idle socket timeout on the firewall or shorten the TCP keepalive interval on the hosts on either side of the firewall. The interval must be less than the idle socket timeout setting on the firewall. The default interval is 2 hours, which is much too long for most sites. An interval of 15 minutes is usually appropriate, but use a shorter one if needed.
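The rule above (keepalive time strictly lower than the firewall idle timeout) can be checked with a short script. This is only a sketch: the function names are hypothetical, the 2-hour firewall timeout is the example value used in this article rather than a measured one, and the read of the current value applies to Linux hosts only, where the setting is exposed under `/proc/sys/net/ipv4/tcp_keepalive_time`.

```python
from pathlib import Path

# Linux exposes the system-wide keepalive idle time (in seconds) at this procfs path.
LINUX_KEEPALIVE_PATH = Path("/proc/sys/net/ipv4/tcp_keepalive_time")


def read_keepalive_seconds(path: Path = LINUX_KEEPALIVE_PATH) -> int:
    """Read the current keepalive idle time, in seconds, from a procfs-style file."""
    return int(path.read_text().strip())


def keepalive_beats_firewall(keepalive_s: int, firewall_idle_timeout_s: int) -> bool:
    """Return True if a keepalive probe is sent before the firewall idle timeout expires."""
    return keepalive_s < firewall_idle_timeout_s


# Assumed firewall idle timeout of 2 hours, the example value used in this article.
FIREWALL_IDLE_TIMEOUT_S = 2 * 60 * 60

print(keepalive_beats_firewall(2 * 60 * 60, FIREWALL_IDLE_TIMEOUT_S))  # default 2-hour keepalive
print(keepalive_beats_firewall(15 * 60, FIREWALL_IDLE_TIMEOUT_S))      # 15-minute keepalive

# On a Linux host, also check the value currently in effect.
if LINUX_KEEPALIVE_PATH.exists():
    current = read_keepalive_seconds()
    print(current, keepalive_beats_firewall(current, FIREWALL_IDLE_TIMEOUT_S))
```

With a 2-hour firewall timeout, the default 2-hour keepalive fails the check and a 15-minute keepalive passes it. The comparison is strict because a probe sent at exactly the timeout may arrive after the firewall has already dropped the connection.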


Operating System	Parameter for frequency of probes	Values	Commands
AIX	tcp_keepidle	1,800 half secs	$ no -o tcp_keepidle=1800
HP-UX 11i	tcp_keepalive_interval	900,000 ms	$ ndd -set /dev/tcp tcp_keepalive_interval 900000
Linux	tcp_keepalive_time	900 secs	$ sysctl -w net.ipv4.tcp_keepalive_time=900
Solaris	tcp_keepalive_interval	900,000 ms	$ ndd -set /dev/tcp tcp_keepalive_interval 900000
Windows	KeepAliveTime	900,000 ms	See Related Documentation below.
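Each operating system in the table expects the same 15-minute target in a different unit. The following sketch shows the arithmetic behind the Values column; the dictionary and function names are hypothetical, and the units are taken from the table itself.

```python
from typing import Tuple

# Parameter name and units per second for each OS, as listed in the table above.
KEEPALIVE_UNITS = {
    "AIX": ("tcp_keepidle", 2),                      # half seconds
    "HP-UX 11i": ("tcp_keepalive_interval", 1000),   # milliseconds
    "Linux": ("tcp_keepalive_time", 1),              # seconds
    "Solaris": ("tcp_keepalive_interval", 1000),     # milliseconds
    "Windows": ("KeepAliveTime", 1000),              # milliseconds
}


def keepalive_value(os_name: str, minutes: int) -> Tuple[str, int]:
    """Return the parameter name and the value to set for a keepalive time in minutes."""
    param, units_per_second = KEEPALIVE_UNITS[os_name]
    return param, minutes * 60 * units_per_second


# Print the value each OS needs for a 15-minute keepalive time.
for os_name in KEEPALIVE_UNITS:
    param, value = keepalive_value(os_name, 15)
    print(f"{os_name}: {param} = {value}")
```

Calling `keepalive_value` with a smaller number of minutes gives the values to use when the firewall idle timeout is shorter than 15 minutes.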

Related Documentation: NetBackup™ Backup Planning and Performance Tuning Guide (veritas.com)
