Important Update: Cohesity Products Knowledge Base Articles
All Cohesity Knowledge Base Articles are now managed via the Cohesity Support Portal: https://support.cohesity.com/s/searchunify. The Knowledge Base articles available here will not reflect the latest information or may no longer be accessible.
Problem
When running any backup, the following error is returned: Status 40: network connection broken.
In particular, this network connection error may be returned when performing a full backup of a large mount point.
Error Message
bpbkar FATAL exit status = 40: network connection broken
Enable logging at high verbosity and review the bpbkar and bptm logs:
BPBKAR log snippets:
20:30:15.294[21395] <16> bpbkar: ERR - bpbkar killed by SIGPIPE
20:30:15.332[21395] <16> bpbkar: ERR - bpbkar FATAL exit status = 40: network connection broken
20:30:15.332[21395] <4> bpbkar: INF - EXIT STATUS 40: network connection broken
20:30:15.355[21395] <4> bpbkar: INF - setenv FINISHED=0
BPTM log snippets:
18:22:09.061[25730] <2> io_init: using 8 data buffers <----*
...
04:17:33.074[25733] <2> fill_buffer: [25730] socket is closed, waited for empty buffer 598398 times, delayed 807663 times, read 311454061 Kbytes
...
04:17:33.138[25730] <2> job_monitoring_exex: ACK disconnect
04:17:33.143[25730] <2> job_disconnect: Disconnected
04:17:33.143[25730] <2> db_error_add_to_file: dberrorq.c:midnite =1224475200
04:17:33.163[25730] <16> catch_signal: media manager terminated by parent process
...
04:18:11.341[25730] <2> process_tapealert: TapeAlert returned 0x00000000 0x00000000(from io_terminate_tape)
04:18:11.341[25730] <2> catch_signal: EXITING with status 82
These traces show that the bpbkar process was terminated (killed by SIGPIPE), and that bptm was also terminated because it did not receive any data within the timeout period defined by the CLIENT_READ_TIMEOUT parameter.
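To find the same indicators in your own logs, the relevant lines can be filtered with grep. The sketch below is a minimal helper, assuming the debug log has already been located (on UNIX hosts the legacy logs are typically written under /usr/openv/netbackup/logs/bpbkar and /usr/openv/netbackup/logs/bptm). The function name `find_status40_traces` is hypothetical, and its patterns are taken directly from the log excerpts above.

```shell
#!/bin/sh
# Hypothetical helper: print the lines of a NetBackup debug log that match
# the status 40 indicators seen in the bpbkar and bptm excerpts above.
find_status40_traces() {
    log_file=$1
    # Patterns copied from the excerpts; "terminated by parent" matches
    # both "parent process" and "parentprocess" spellings.
    grep -E 'killed by SIGPIPE|EXIT STATUS 40|socket is closed|terminated by parent|EXITING with status' "$log_file"
}

# Example usage (path is illustrative only):
#   find_status40_traces /usr/openv/netbackup/logs/bptm/log.102008
```

Matching lines from both the client (bpbkar) and media server (bptm) logs at roughly the same time point to the same broken connection.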
Cause
The bptm log shows that the job was using 8 data buffers and that the socket was closed after bptm had waited a very long time for empty buffers (598398 waits and 807663 delays):
18:22:09.061[25730] <2> io_init: using 8 data buffers <----*
...
04:17:33.074[25733] <2> fill_buffer: [25730] socket is closed, waited for empty buffer 598398 times, delayed 807663 times, read 311454061 Kbytes
For background on how NetBackup data buffers work, see the Related Articles.
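The counters in the fill_buffer line can be pulled out and combined with the timestamps to estimate the average read rate of the job. The sketch below is illustrative; the function names `parse_fill_buffer` and `avg_kb_per_sec` are hypothetical. The elapsed time assumes the job ran continuously from the io_init line (18:22:09) to the fill_buffer line (04:17:33) and crossed midnight once, which gives 5:37:51 + 4:17:33 = 9:55:24, or 35724 seconds.

```shell
#!/bin/sh
# Hypothetical helper: extract "waits delays Kbytes" from a bptm fill_buffer
# line. The optional space after "buffer" tolerates both "buffer598398" and
# "buffer 598398".
parse_fill_buffer() {
    printf '%s\n' "$1" | sed -nE \
        's/.*waited for empty buffer *([0-9]+) times, delayed ([0-9]+) times, read ([0-9]+) Kbytes.*/\1 \2 \3/p'
}

# Integer average rate in KB/s from Kbytes read and elapsed seconds.
avg_kb_per_sec() {
    echo $(( $1 / $2 ))
}

line='04:17:33.074[25733] <2> fill_buffer: [25730] socket is closed, waited for empty buffer 598398 times, delayed 807663 times, read 311454061 Kbytes'
parse_fill_buffer "$line"        # prints: 598398 807663 311454061
avg_kb_per_sec 311454061 35724   # prints: 8718 (roughly 8.5 MB/s)
```

Under these assumptions, about 297 GB was read over almost ten hours before the socket closed.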
Solution
- Check for a firewall between the servers and clients
- If no performance tuning parameters have been set, run the ls command and the bpbkar test with its output sent to /dev/null to verify that the client file system can be read normally:
# ls -ltR <directory_name>
# bpbkar -dt0 -r 888 -nocont <directory_name> > /dev/null
- Check the TCP fusion ("tcp_fusion") status on Solaris hosts
- Set the tcp_time_wait_interval parameter to an appropriate value:
# ndd -set /dev/tcp tcp_time_wait_interval <time in milliseconds>
- Increase the CLIENT_READ_TIMEOUT value.
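On UNIX media servers, CLIENT_READ_TIMEOUT is normally set in /usr/openv/netbackup/bp.conf as a line of the form `CLIENT_READ_TIMEOUT = <seconds>`. The sketch below is a minimal helper, assuming that file format; the function name `set_client_read_timeout` and the value 1800 are hypothetical examples, not recommendations. Back up bp.conf before changing it.

```shell
#!/bin/sh
# Hypothetical helper: set CLIENT_READ_TIMEOUT (in seconds) in a bp.conf file,
# replacing an existing entry or appending one if none exists.
set_client_read_timeout() {
    conf=$1
    seconds=$2
    if grep -q '^CLIENT_READ_TIMEOUT *=' "$conf"; then
        # Edit via a temporary copy (portable alternative to sed -i), then
        # write back with cat so the original file's owner and mode are kept.
        sed "s/^CLIENT_READ_TIMEOUT *=.*/CLIENT_READ_TIMEOUT = $seconds/" "$conf" > "$conf.tmp" &&
            cat "$conf.tmp" > "$conf" &&
            rm -f "$conf.tmp"
    else
        echo "CLIENT_READ_TIMEOUT = $seconds" >> "$conf"
    fi
}

# Example usage (value is illustrative only):
#   set_client_read_timeout /usr/openv/netbackup/bp.conf 1800
```

The current TCP setting can be checked before changing it with `ndd -get /dev/tcp tcp_time_wait_interval`.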