Backups of the same client using some media servers fail with status 21: socket open failed

Article: 100001648
Last Published: 2018-03-30
Product(s): NetBackup

Problem

Backups of the same client using some media servers fail with status 21: socket open failed

Error Message

status 21: socket open failed

Cause

Overview:

This host is a NetBackup client and can be successfully backed up by some remote media servers, but not by other remote media servers and not by itself acting as a NetBackup SAN media server.

Troubleshooting:

The job details confirm that the job went active and that bpbrm encountered a problem connecting to the bpcd process on the client host.

4/19/2010 14:23:55 PM - Error bpbrm(pid=9629926) bpcd on myclient exited with status 21: socket open failed  
4/19/2010 14:26:00 PM - Error bpbrm(pid=9629926) cannot send mail because BPCD on myclient exited with status 21: socket open failed
4/19/2010 14:26:02 PM - end writing

The vnetd debug log on the client host confirms that the media server host can connect to the client host and start the bpcd service.

14:20:34.319 [9621694] <2> ProcessRequests: vnetd.c.288: msg: VNETD ACCEPT FROM 2.2.2.2.59499 TO 2.2.2.2.13724 fd = 4
14:20:34.344 [9621694] <2> ProcessRequests: vnetd.c.349: msg: Request VN_REQUEST_SERVICE_SOCKET(6)
14:20:34.344 [9621694] <2> process_service_socket_plus: vnetd.c.1538: service_name: bpcd
...snip...
14:20:34.344 [9621694] <2> launch_command: vnetd.c.2149: path: /usr/openv/netbackup/bin/bpcd

It also shows that the media server host can connect to the client host to create the forwarding socket. However, the local process that should have attached to the socket never did; vnetd timed out after 120 seconds and then exited with status 9.

14:20:34.377 [12439640] <2> ProcessRequests: vnetd.c.288: msg: VNETD ACCEPT FROM 2.2.2.2.59500 TO 2.2.2.2.13724 fd = 4
14:20:34.554 [12439640] <2> ProcessRequests: vnetd.c.370: msg: Request VN_REQUEST_CONNECT_FORWARD_SOCKET(10)
...snip...
14:20:34.584 [12439640] <2> process_connect_forward_socket: vnetd.c.1919: ipc_string: ...
14:20:34.584 [12439640] <2> process_connect_forward_socket: vnetd.c.1932: hash_str1: ...
14:22:34.764 [12439640] <2> vnet_sock_ready: vnet.c.507: max_time: 120 0x00000078
14:22:34.764 [12439640] <2> vnet_sock_ready: vnet.c.508: sock: 5 0x00000005
14:22:34.764 [12439640] <2> vnet_sock_ready: vnet.c.509: Function failed: 11 0x0000000b
14:22:34.764 [12439640] <2> vnet_accept_from_vnetd: vnet_vnetd.c.1152: status: 0 0x00000000
14:22:34.765 [12439640] <2> process_connect_forward_socket: vnetd.c.1986: status: 11 0x0000000b
14:22:34.783 [12439640] <2> ProcessRequests: vnetd.c.372: status: 11 0x0000000b
14:22:38.330 [12439640] <2> vnet_pop_byte: vnet.c.186: errno: 2 0x00000002
14:22:38.330 [12439640] <2> vnet_pop_byte: vnet.c.188: Function failed: 9 0x00000009
14:22:38.330 [12439640] <2> vnet_pop_string: vnet.c.268: Function failed: 9 0x00000009
14:22:38.330 [12439640] <2> vnet_pop_signed: vnet.c.312: Function failed: 9 0x00000009
14:22:38.330 [12439640] <2> ProcessRequests: vnetd.c.298: vnet_pop_signed failed: 9 0x00000009
14:22:38.330 [12439640] <2> ListenForConnection: vnetd.c.594: Function failed: 9 0x00000009
14:22:38.331 [12439640] <16> main: Terminating with status 9

The bpcd debug log from the client host confirms that the service was started. After validating the connecting host against the SERVER list, bpcd attempted to attach to the forwarding socket but was unsuccessful, which resulted in the status 21.

14:20:34.371 [9621694] <2> logconnections: BPCD ACCEPT FROM 123.123.123.123.59499 TO 123.123.123.123.13724
...snip...
14:20:34.386 [9621694] <2> bpcd peer_hostname: Connection from host myclient (123.123.123.123) port 59499
14:20:34.386 [9621694] <2> bpcd valid_server: comparing mymaster and myclient
14:20:34.387 [9621694] <2> bpcd valid_server: comparing mm1 and myclient
...snip...
14:20:34.466 [9621694] <2> bpcd valid_server: comparing mm2 and myclient
14:21:36.329 [9621694] <2> bpcd valid_server: comparing mm3 and myclient
14:21:36.331 [9621694] <2> hosts_equal: gethostbyname failed for admin1: NO_ADDRESS (4)
14:21:36.331 [9621694] <2> bpcd valid_server: comparing dev1 and myclient
14:21:36.332 [9621694] <2> bpcd valid_server: comparing dev2 and myclient
...snip...
14:21:36.340 [9621694] <2> bpcd valid_server: comparing dev9 and myclient
14:22:38.328 [9621694] <2> bpcd valid_server: comparing qa1 and myclient
14:22:38.328 [9621694] <2> bpcd valid_server: comparing myclient and myclient
14:22:38.328 [9621694] <4> bpcd valid_server: hostname comparison succeeded
14:22:38.329 [9621694] <2> bpcd main: output socket port number = 1
14:22:38.329 [9621694] <2> vnet_connect_by_vnetd: vnet_vnetd.c.1446: save_errno: 2 0x00000002
14:22:38.329 [9621694] <2> vnet_connect_by_vnetd: vnet_vnetd.c.1462: save_errno: 2 0x00000002
14:22:38.329 [9621694] <2> get_vnetd_forward_socket: vnet_connect_by_vnetd failed: 10
14:22:38.329 [9621694] <16> bpcd main: get_vnetd_forward_socket failed: 21

Notice, however, that the server validation sequence above consumed 124 seconds. During that time, the vnetd process holding the forwarding socket timed out and closed the socket, so it was no longer available when bpcd tried to attach.
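The timing can be checked directly from the timestamps in the log excerpts. The following is a minimal sketch in Python, assuming the timestamps all fall on the same day; the start of the vnetd wait is taken from the last vnetd line logged before the timeout, which is an approximation.

```python
from datetime import datetime

FMT = "%H:%M:%S.%f"   # timestamp format used by the debug logs
VNETD_TIMEOUT = 120   # max_time reported by vnet_sock_ready, in seconds


def seconds_between(start: str, end: str) -> float:
    """Elapsed seconds between two same-day log timestamps."""
    return (datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds()


# bpcd: first valid_server comparison to the final comparison against myclient.
validation = seconds_between("14:20:34.386", "14:22:38.328")

# vnetd: last line logged before the wait to the vnet_sock_ready timeout.
socket_wait = seconds_between("14:20:34.584", "14:22:34.764")

print(f"SERVER list validation took {validation:.3f} s")
print(f"vnetd waited about {socket_wait:.3f} s for bpcd to attach")
print("bpcd arrived after the timeout:", validation > VNETD_TIMEOUT)
```

The validation time of roughly 124 seconds exceeds the 120-second wait, which is consistent with bpcd finding the forwarding socket already closed.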

Notice also that the server validation included two delays of about 62 seconds each, spent in gethostbyname calls for two of the servers. The name resolution process used by the operating system scanned several DNS servers before succeeding. There is also a server hostname that cannot be resolved at all. All of the remote media servers that can back up this client host are at the top of the SERVER list. All of the failing media servers, including the local host, are at the bottom of the SERVER list, after the slow entries.
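Delays like these can be located by measuring the gap between consecutive log lines. The sketch below is one way to do that in Python; the helper name find_gaps and the 5-second threshold are illustrative choices, and the sample lines are a subset of the bpcd log shown above. It assumes every line of interest begins with an HH:MM:SS.mmm timestamp and that the log does not cross midnight.

```python
from datetime import datetime

FMT = "%H:%M:%S.%f"

LOG = """\
14:20:34.386 [9621694] <2> bpcd valid_server: comparing mymaster and myclient
14:20:34.387 [9621694] <2> bpcd valid_server: comparing mm1 and myclient
14:20:34.466 [9621694] <2> bpcd valid_server: comparing mm2 and myclient
14:21:36.329 [9621694] <2> bpcd valid_server: comparing mm3 and myclient
14:21:36.331 [9621694] <2> hosts_equal: gethostbyname failed for admin1: NO_ADDRESS (4)
14:21:36.340 [9621694] <2> bpcd valid_server: comparing dev9 and myclient
14:22:38.328 [9621694] <2> bpcd valid_server: comparing qa1 and myclient
"""


def find_gaps(lines, threshold=5.0):
    """Return (gap_seconds, preceding_line) for each gap of at least threshold seconds."""
    gaps = []
    prev_time = prev_line = None
    for line in lines:
        stamp = line.split(" ", 1)[0]
        try:
            now = datetime.strptime(stamp, FMT)
        except ValueError:
            continue  # skip lines without a leading timestamp, such as "...snip..."
        if prev_time is not None:
            delta = (now - prev_time).total_seconds()
            if delta >= threshold:
                gaps.append((delta, prev_line.strip()))
        prev_time, prev_line = now, line
    return gaps


gaps = find_gaps(LOG.splitlines())
for delta, line in gaps:
    print(f"{delta:7.3f} s after: {line}")
```

The line logged immediately before each gap is the likely candidate for the slow lookup, although the log itself does not state which call was blocking.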

Solution

This problem can be resolved as follows.

  1. Ensure that the client host can perform successful and timely hostname resolution of all SERVER entries in the bp.conf file.
  2. Remove any SERVER entries for hosts that should not be connecting to this host.

In this case, the three problematic servers had been moved to another domain and no longer needed to contact this host. Removing their entries was the best solution.
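To check step 1, the lookup time for each SERVER entry can be measured on the client host. The following Python sketch is not a NetBackup tool; the function names server_entries, time_lookups and report are hypothetical, and it assumes the UNIX default location /usr/openv/netbackup/bp.conf and entries of the form SERVER = hostname. It uses the same style of lookup (gethostbyname) that appears in the bpcd log, but the operating system resolver may behave differently when called from bpcd.

```python
import socket
import time


def server_entries(bp_conf_text):
    """Return host names from 'SERVER = <host>' lines, in file order."""
    hosts = []
    for line in bp_conf_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "SERVER" and value.strip():
            hosts.append(value.strip())
    return hosts


def time_lookups(hosts, resolve=socket.gethostbyname):
    """Resolve each host and return (host, address_or_error, elapsed_seconds)."""
    results = []
    for host in hosts:
        start = time.monotonic()
        try:
            address = resolve(host)
        except OSError as exc:  # socket.gaierror and socket.herror are subclasses
            address = f"FAILED ({exc})"
        elapsed = time.monotonic() - start
        results.append((host, address, elapsed))
    return results


def report(path="/usr/openv/netbackup/bp.conf"):
    """Print per-entry and cumulative lookup times for the SERVER list."""
    with open(path) as handle:
        hosts = server_entries(handle.read())
    total = 0.0
    for host, address, elapsed in time_lookups(hosts):
        total += elapsed
        print(f"{host:30} {address:40} {elapsed:8.3f} s (cumulative {total:8.3f} s)")
```

Calling report() on the client prints one line per entry. Any entry that fails or takes many seconds is a candidate for correction or removal, and a cumulative time approaching 120 seconds suggests that servers listed after that point may hit the vnetd timeout.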
