Problem
Backups of the same client using some media servers fail with status 21: socket open failed
Error Message
status 21: socket open failed
Cause
Overview:
This host is a NetBackup client and can be successfully backed up by some remote media servers, but not by other remote media servers and not by itself acting as a NetBackup SAN media server.
Troubleshooting:
The job details confirm that the job went active and that bpbrm encountered a problem connecting to the bpcd process on the client host.
4/19/2010 14:23:55 PM - Error bpbrm(pid=9629926) bpcd on myclient exited with status 21: socket open failed
4/19/2010 14:26:00 PM - Error bpbrm(pid=9629926) cannot send mail because BPCD on myclient exited with status 21: socket open failed
4/19/2010 14:26:02 PM - end writing
The vnetd debug log on the client host confirms that the media server host can connect to the client host and start the bpcd service.
14:20:34.319 [9621694] <2> ProcessRequests: vnetd.c.288: msg: VNETD ACCEPT FROM 2.2.2.2.59499 TO 2.2.2.2.13724 fd = 4
14:20:34.344 [9621694] <2> ProcessRequests: vnetd.c.349: msg: Request VN_REQUEST_SERVICE_SOCKET(6)
14:20:34.344 [9621694] <2> process_service_socket_plus: vnetd.c.1538: service_name: bpcd
...snip...
14:20:34.344 [9621694] <2> launch_command: vnetd.c.2149: path: /usr/openv/netbackup/bin/bpcd
It also shows that the media server host can connect to the client host to create the forwarding socket. However, the local process that should have attached to the socket never did. vnetd timed out after 120 seconds and then exited with status 9.
14:20:34.377 [12439640] <2> ProcessRequests: vnetd.c.288: msg: VNETD ACCEPT FROM 2.2.2.2.59500 TO 2.2.2.2.13724 fd = 4
14:20:34.554 [12439640] <2> ProcessRequests: vnetd.c.370: msg: Request VN_REQUEST_CONNECT_FORWARD_SOCKET(10)
...snip...
14:20:34.584 [12439640] <2> process_connect_forward_socket: vnetd.c.1919: ipc_string: ...
14:20:34.584 [12439640] <2> process_connect_forward_socket: vnetd.c.1932: hash_str1: ...
14:22:34.764 [12439640] <2> vnet_sock_ready: vnet.c.507: max_time: 120 0x00000078
14:22:34.764 [12439640] <2> vnet_sock_ready: vnet.c.508: sock: 5 0x00000005
14:22:34.764 [12439640] <2> vnet_sock_ready: vnet.c.509: Function failed: 11 0x0000000b
14:22:34.764 [12439640] <2> vnet_accept_from_vnetd: vnet_vnetd.c.1152: status: 0 0x00000000
14:22:34.765 [12439640] <2> process_connect_forward_socket: vnetd.c.1986: status: 11 0x0000000b
14:22:34.783 [12439640] <2> ProcessRequests: vnetd.c.372: status: 11 0x0000000b
14:22:38.330 [12439640] <2> vnet_pop_byte: vnet.c.186: errno: 2 0x00000002
14:22:38.330 [12439640] <2> vnet_pop_byte: vnet.c.188: Function failed: 9 0x00000009
14:22:38.330 [12439640] <2> vnet_pop_string: vnet.c.268: Function failed: 9 0x00000009
14:22:38.330 [12439640] <2> vnet_pop_signed: vnet.c.312: Function failed: 9 0x00000009
14:22:38.330 [12439640] <2> ProcessRequests: vnetd.c.298: vnet_pop_signed failed: 9 0x00000009
14:22:38.330 [12439640] <2> ListenForConnection: vnetd.c.594: Function failed: 9 0x00000009
14:22:38.331 [12439640] <16> main: Terminating with status 9
The bpcd debug log from the client host confirms that the service was started. After validating the connecting host against the SERVER list, bpcd attempted to attach to the forwarding socket but was unsuccessful, which resulted in the status 21.
14:20:34.371 [9621694] <2> logconnections: BPCD ACCEPT FROM 123.123.123.123.59499 TO 123.123.123.123.13724
...snip...
14:20:34.386 [9621694] <2> bpcd peer_hostname: Connection from host myclient (123.123.123.123) port 59499
14:20:34.386 [9621694] <2> bpcd valid_server: comparing mymaster and myclient
14:20:34.387 [9621694] <2> bpcd valid_server: comparing mm1 and myclient
...snip...
14:20:34.466 [9621694] <2> bpcd valid_server: comparing mm2 and myclient
14:21:36.329 [9621694] <2> bpcd valid_server: comparing mm3 and myclient
14:21:36.331 [9621694] <2> hosts_equal: gethostbyname failed for admin1: NO_ADDRESS (4)
14:21:36.331 [9621694] <2> bpcd valid_server: comparing dev1 and myclient
14:21:36.332 [9621694] <2> bpcd valid_server: comparing dev2 and myclient
...snip...
14:21:36.340 [9621694] <2> bpcd valid_server: comparing dev9 and myclient
14:22:38.328 [9621694] <2> bpcd valid_server: comparing qa1 and myclient
14:22:38.328 [9621694] <2> bpcd valid_server: comparing myclient and myclient
14:22:38.328 [9621694] <4> bpcd valid_server: hostname comparison succeeded
14:22:38.329 [9621694] <2> bpcd main: output socket port number = 1
14:22:38.329 [9621694] <2> vnet_connect_by_vnetd: vnet_vnetd.c.1446: save_errno: 2 0x00000002
14:22:38.329 [9621694] <2> vnet_connect_by_vnetd: vnet_vnetd.c.1462: save_errno: 2 0x00000002
14:22:38.329 [9621694] <2> get_vnetd_forward_socket: vnet_connect_by_vnetd failed: 10
14:22:38.329 [9621694] <16> bpcd main: get_vnetd_forward_socket failed: 21
Notice, however, that the server validation sequence above consumed 124 seconds. During that time, the vnetd process holding the forwarding socket timed out and closed the socket, which was then unavailable to bpcd.
Notice also that the server validation included two delays of 62 seconds each while attempting gethostbyname system calls for two of the servers. The name resolution process used by the operating system scanned several DNS servers before succeeding. There is also a server hostname that cannot be resolved. All of the remote media servers that can back up this client host are at the top of the SERVER list. All of the failing media servers, including the local host, are at the bottom of the SERVER list.
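Stalls like these are easy to miss when reading a long debug log by eye. The following Python sketch is one possible way to find them: it scans log entries that begin with a HH:MM:SS.mmm timestamp and a bracketed PID, as in the excerpts above, and reports any pause longer than a threshold within the same process. The function name find_gaps and the 30-second threshold are illustrative choices, not part of NetBackup, and the sketch assumes all entries come from a single day's log (no midnight rollover).

```python
import re
from datetime import datetime

# Matches the leading "HH:MM:SS.mmm [pid]" of a debug log entry.
ENTRY = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3}) \[(\d+)\]")

def find_gaps(lines, threshold_seconds=30.0):
    """Return (seconds, previous_line, next_line) for each stall per PID.

    Hypothetical helper: the threshold is an arbitrary illustrative value.
    """
    last_seen = {}  # pid -> (timestamp, line)
    gaps = []
    for line in lines:
        match = ENTRY.match(line)
        if not match:
            continue  # skip "...snip..." markers and unrelated text
        stamp = datetime.strptime(match.group(1), "%H:%M:%S.%f")
        pid = match.group(2)
        if pid in last_seen:
            prev_stamp, prev_line = last_seen[pid]
            delta = (stamp - prev_stamp).total_seconds()
            if delta >= threshold_seconds:
                gaps.append((delta, prev_line, line))
        last_seen[pid] = (stamp, line)
    return gaps

# Entries copied from the bpcd log excerpt above.
sample = [
    "14:20:34.466 [9621694] <2> bpcd valid_server: comparing mm2 and myclient",
    "14:21:36.329 [9621694] <2> bpcd valid_server: comparing mm3 and myclient",
    "14:21:36.340 [9621694] <2> bpcd valid_server: comparing dev9 and myclient",
    "14:22:38.328 [9621694] <2> bpcd valid_server: comparing qa1 and myclient",
]

for seconds, before, after in find_gaps(sample):
    print(f"{seconds:.3f}s stall between:\n  {before}\n  {after}")
```

Run against the four sample entries, the sketch flags the two stalls of roughly 62 seconds. Together they account for nearly all of the 124 seconds spent in server validation, which exceeds the 120-second max_time shown in the vnetd log.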
Solution
This problem can be resolved as follows.
- Ensure that the client host can perform successful and timely hostname resolution of all SERVER entries in the bp.conf file.
- Remove any SERVER entries for hosts that should not be connecting to this host.
In this case, the three problematic servers had been moved to another domain and no longer needed to contact this host, so removing their entries was the best solution.
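To check the first point before a backup fails, the lookups can be timed directly on the client host. The Python sketch below is a minimal illustration, not a NetBackup tool: it assumes SERVER entries appear in bp.conf as lines of the form "SERVER = hostname", and the helper names (server_entries, time_lookups, check_bp_conf) are hypothetical. It uses the standard library's socket.gethostbyname, which goes through the operating system resolver, so slow or failing DNS servers should show up as long or failed lookups.

```python
import re
import socket
import time

# Assumed bp.conf syntax: one "SERVER = <hostname>" entry per line.
SERVER_LINE = re.compile(r"^\s*SERVER\s*=\s*(\S+)")

def server_entries(bp_conf_text):
    """Return the host names from SERVER lines, in file order."""
    hosts = []
    for line in bp_conf_text.splitlines():
        match = SERVER_LINE.match(line)
        if match:
            hosts.append(match.group(1))
    return hosts

def time_lookups(hosts, resolve=socket.gethostbyname):
    """Resolve each host; return (host, elapsed_seconds, error_or_None)."""
    results = []
    for host in hosts:
        start = time.monotonic()
        try:
            resolve(host)
            error = None
        except OSError as exc:  # socket.gaierror and socket.herror included
            error = str(exc)
        results.append((host, time.monotonic() - start, error))
    return results

def check_bp_conf(path):
    """Print one line per SERVER entry with lookup time and any error."""
    with open(path) as handle:
        hosts = server_entries(handle.read())
    for host, seconds, error in time_lookups(hosts):
        status = "ok" if error is None else f"FAILED ({error})"
        print(f"{host:30s} {seconds:8.3f}s  {status}")
```

Calling check_bp_conf with the path to the client's bp.conf (typically under /usr/openv/netbackup on UNIX hosts, although the location is an assumption here) lists each entry in order. Any entry that takes many seconds or fails to resolve is a candidate for a DNS fix or for removal, because the cumulative lookup time has to stay well under the 120-second vnetd timeout.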