STATUS CODE 41: Possible causes of exit status 41 (network connection timed out) on NetBackup for Lotus Notes Agent backups

  • Article ID:100016299
  • Last Published:
  • Product(s):NetBackup

Problem

Backups made with the NetBackup for Lotus Notes Agent fail with exit status 41 (network connection timed out).

Error Message

EXIT STATUS 41: network connection timed out

Solution

Overview: There are several causes of status code 41 with Lotus Notes backups. This article outlines those causes.

Troubleshooting:
Typical situations where Lotus Notes backups will fail with status 41 include:

1. Client Read Timeout value set too low to allow backup processes to complete
2. Lotus Notes ID file incorrectly configured
3. Lotus Notes ID file incorrectly specified in notes.ini
4. Networking issues related to the TcpMaxDataRetransmissions setting in the registry
5. Incorrect configuration of the NetBackup (tm) for Lotus Notes extension

Each of these potential problems is covered in depth below.

1. Client Read Timeout value set too low to allow backup processes to complete:

The NetBackup for Lotus Notes backup agent proceeds through four different processes to back up a database. These processes are visible in the bpbkar log (if logging is set high enough) as seen below:

First, the agent will locate the databases:
06/07/02 05:21:39 AM: [441]: INF - FileAction() NSFSearch() Found <database_name>.nsf

Next, the agent will save the database location:
06/07/02 05:22:57 AM: [439]: INF - CopyLocalToMaster() Process object <database_name>.nsf
06/07/02 05:22:57 AM: [439]: INF - CopyLocalToMaster() Allocate memory for the object

Then, the agent will open the database for interrogation:
06/07/02 05:24:43 AM: [439]: INF - NBLN_FindNextFile() <Enter>

Finally, the agent will transfer the data to a NetBackup storage unit:
06/07/02 05:24:44 AM: [439]: TAR - Backup: C:\Notes\<database_name>.nsf
06/07/02 05:24:44 AM: [439]: INF - read non-blocking message of length 1
<snip>
06/07/02 05:31:57 AM: [439]: INF - read non-blocking message of length 1
06/07/02 05:31:57 AM: [439]: FIL - 970574 7 5530 19 33216 root root 970390 1022105311 1022105311 1013440636 /C/Notes/<database_name>.nsf

If any one of these processes takes longer than the configured Client Read Timeout, the job will fail.
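To see which stage is running long, it can help to measure the silence between consecutive bpbkar log entries. The following is a minimal sketch, assuming the legacy timestamp format shown in the excerpts above; the function name longest_gap and the sample lines are illustrative only and are not part of NetBackup.

```python
import re
from datetime import datetime

# Matches the bpbkar timestamp prefix shown above, e.g. "06/07/02 05:24:44 AM:"
STAMP = re.compile(r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} [AP]M):")


def longest_gap(lines):
    """Return (gap_seconds, line_before_gap) for the longest silence in the log."""
    previous_time, previous_line = None, None
    worst = (0.0, None)
    for line in lines:
        match = STAMP.match(line)
        if not match:
            continue  # skip "<snip>" markers and lines without a timestamp
        current = datetime.strptime(match.group(1), "%m/%d/%y %I:%M:%S %p")
        if previous_time is not None:
            gap = (current - previous_time).total_seconds()
            if gap > worst[0]:
                worst = (gap, previous_line)
        previous_time, previous_line = current, line
    return worst


# Illustrative lines only; a real log has many entries between these two times.
sample = [
    "06/07/02 05:24:43 AM: [439]: INF - NBLN_FindNextFile() <Enter>",
    "06/07/02 05:24:44 AM: [439]: TAR - Backup: C:\\Notes\\<database_name>.nsf",
    "06/07/02 05:31:57 AM: [439]: INF - read non-blocking message of length 1",
]
gap, line = longest_gap(sample)
print(f"longest gap: {gap:.0f} s, after: {line}")
```

A gap close to or above the Client Read Timeout points at the stage logged just before the silence. The gap between log entries is only an approximation of what the server sees on the socket.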

The bpbrm log will show the timeout message:

17:21:39 [1892.752] <2> bpbrm spawn_child: "D:\Veritas NetBackup\NetBackup\bin\bptm.exe" -w -pid 1892 -c mailatm -den 13 -rt 8 -rn 0 -stunit adicsrvr-dlt-robot-tld-0 -cl NotesTest -bt 1013041286 -b mailatm_1013041286 -st 0 -cj 6 -p Notes -ru root -rclnt mailatm -rclnthostname mailatm -rl 1 -rp 1209600 -sl Full -ct 25 -v -mediasvr adicsrvr -jobid 633 -masterversion 340000
17:21:39 [1892.752] <2> bpbrm create_mm_terminate: created terminate event pid 1860
17:21:39 [1892.752] <2> bpbrm write_continue_backup: wrote CONTINUE BACKUP on COMM_SOCK
17:21:40 [1892.752] <4> bpbrm main: from client mailatm: TRV - BACKUP 2/6/02 7:19:45 PM mailatm NotesTest Full FULL
17:22:10 [1892.752] <2> bpbrm mm_sig: received ready signal from media manager
17:31:51 [1892.752] <2> bpbrm readline: bpbrm timeout after 300 seconds
17:31:51 [1892.752] <2> bpbrm kill_child_process: start
17:33:13 [1892.752] <2> bpbrm wait_for_child: start
17:33:13 [1892.752] <2> bpbrm wait_for_child: child exit_status = 150
17:33:13 [1892.752] <2> inform_client_of_status: INF - Server status = 41
17:33:18 [1892.752] <4> bpbrm Exit: client backup EXIT STATUS 41: network connection timed out
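The timeout in effect and the final status can be pulled out of a bpbrm log with a short script. This is a sketch that assumes the message wording shown in the excerpt above; summarize_bpbrm is a hypothetical helper name.

```python
import re

# Patterns taken from the bpbrm excerpt above.
TIMEOUT = re.compile(r"bpbrm timeout after (\d+) seconds")
EXIT_STATUS = re.compile(r"EXIT STATUS (\d+)")


def summarize_bpbrm(lines):
    """Return (timeout_seconds, exit_status); either is None if not found."""
    timeout_seconds = None
    exit_status = None
    for line in lines:
        found = TIMEOUT.search(line)
        if found:
            timeout_seconds = int(found.group(1))
        found = EXIT_STATUS.search(line)
        if found:
            exit_status = int(found.group(1))
    return timeout_seconds, exit_status


sample = [
    "17:31:51 [1892.752] <2> bpbrm readline: bpbrm timeout after 300 seconds",
    "17:33:18 [1892.752] <4> bpbrm Exit: client backup EXIT STATUS 41: network connection timed out",
]
print(summarize_bpbrm(sample))
```

If the reported timeout matches the configured Client Read Timeout and the status is 41, the resolution below is the likely fix.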

Resolution:
By increasing the Client Read Timeout, backups are allowed to proceed through all the backup processes to finish successfully.

To increase the Client Read Timeout in NetBackup 4.5 using the administrative console, go to the master server and open the administrative console. Then locate the Lotus Notes server in the Clients section of the Host Properties area. Right-click the Notes server, and select Properties. Go to the Universal Settings tab, and increase the Client Read Timeout value. Depending on the number of databases, this value may need to be set to 1800 seconds or higher.

To increase the Client Read Timeout in NetBackup 5.0 and 5.1 using the administrative console, go to the master server and open the administrative console. Then locate the Lotus Notes server in the Clients section of the Host Properties area. Right-click the Notes server, and select Properties. Go to the Timeouts section, and increase the Client Read Timeout value. Depending on the number of databases, this value may need to be set to 1800 seconds or higher.

To increase the Client Read Timeout by modifying the registry, start Regedit by selecting Start | Run | Regedit. Then go to HKEY_LOCAL_MACHINE\SOFTWARE\VERITAS\NetBackup\CurrentVersion\Config. In the Config key, find the CLIENT_READ_TIMEOUT value. If it does not exist, create it by selecting Edit | New | DWORD Value, and name the new value CLIENT_READ_TIMEOUT. Then open the value, switch the Base setting from Hexadecimal to Decimal, and enter the timeout in seconds. Depending on the number of databases, this value may need to be set to 1800 seconds or higher.
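Registry DWORD values are displayed in hexadecimal by default, which is why the base must be switched before typing 1800. The sketch below shows the conversion by rendering the same setting as the text of a .reg file. Importing a .reg file instead of editing by hand is an assumption of this example, not a step from the procedure above, and reg_file_for_timeout is a hypothetical helper name.

```python
# Registry key named in the procedure above.
CONFIG_KEY = r"HKEY_LOCAL_MACHINE\SOFTWARE\VERITAS\NetBackup\CurrentVersion\Config"


def reg_file_for_timeout(seconds):
    """Render a REGEDIT4-style .reg file that sets CLIENT_READ_TIMEOUT."""
    if not 0 <= seconds <= 0xFFFFFFFF:
        raise ValueError("a DWORD must fit in 32 bits")
    return (
        "REGEDIT4\r\n\r\n"
        f"[{CONFIG_KEY}]\r\n"
        # .reg files store DWORD data as eight hexadecimal digits.
        f'"CLIENT_READ_TIMEOUT"=dword:{seconds:08x}\r\n'
    )


print(reg_file_for_timeout(1800))
```

The printed data is dword:00000708, because 1800 decimal is 708 hexadecimal. Entering 1800 while the editor is still in hexadecimal mode would instead store 6144 seconds.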

2. Lotus Notes ID file incorrectly configured:

When backing up Lotus Notes databases, the backup fails with status 41 and the backup consistently "hangs" on the same database. A review of the bpbkar log file shows the backup is in the NBLN_FindNextFile() portion of the backup, but provides little else in terms of detail. This is because the "hang" is on the Lotus side of the backup.

05/23/02 10:14:12 AM: [153]: INF - CopyLocalToMaster() Get next object
05/23/02 10:14:12 AM: [153]: INF - CopyLocalToMaster() <Exit>
05/23/02 10:14:12 AM: [153]: INF - SearchForElements() Buffer size: 352176
05/23/02 10:14:12 AM: [153]: INF - SearchForElements() <Exit>
05/23/02 10:14:12 AM: [153]: INF - NBLN_OpenEnumerate() <Exit>
05/23/02 10:14:12 AM: [153]: INF - NBLN_FindNextFile() <Enter>
05/23/02 10:14:13 AM: [153]: INF - NBLN_FindNextFile() <Enter>
05/23/02 10:14:13 AM: [153]: INF - NBLN_FindNextFile() <Enter>
05/23/02 10:14:13 AM: [153]: INF - NBLN_FindNextFile() <Enter>

Please note that several conditions can cause a backup to fail at this point (NBLN_FindNextFile), including corrupt databases and a Client Read Timeout that is set too low.

Resolution:
By adding the following lines to the server's notes.ini file, more detailed logs (specifically, a debug.txt file) can be generated by Notes.

DEBUG_OUTFILE=C:\DEBUG.TXT
DEBUG_THREADID=1
DEBUG_CAPTURE_TIMEOUT=1
DEBUG_SHOW_TIMEOUT=1

Please refer to Lotus article 162400 on the Lotus Support site (https://www-3.ibm.com/software/lotus/support/) for more information about these parameters.
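When several servers need the same change, the four debug lines can be appended by script. This is a minimal sketch that works on the text of notes.ini. It assumes setting names are matched case-insensitively and that appending at the end of the file is acceptable. The helper name with_debug_settings is hypothetical. Back up notes.ini before changing it.

```python
# The four debug parameters listed above.
DEBUG_SETTINGS = {
    "DEBUG_OUTFILE": r"C:\DEBUG.TXT",
    "DEBUG_THREADID": "1",
    "DEBUG_CAPTURE_TIMEOUT": "1",
    "DEBUG_SHOW_TIMEOUT": "1",
}


def with_debug_settings(ini_text):
    """Return ini_text with any missing debug settings appended."""
    present = set()
    for line in ini_text.splitlines():
        name, separator, _ = line.partition("=")
        if separator:
            present.add(name.strip().upper())
    missing = [
        f"{name}={value}"
        for name, value in DEBUG_SETTINGS.items()
        if name not in present
    ]
    if not missing:
        return ini_text  # nothing to add, leave the file untouched
    if ini_text and not ini_text.endswith("\n"):
        ini_text += "\n"
    return ini_text + "\n".join(missing) + "\n"


original = "[Notes]\nKeyFilename=server.id\nDEBUG_THREADID=1\n"
print(with_debug_settings(original))
```

Settings that already exist are left alone, so running the function twice does not duplicate lines.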

After adding these parameters to the notes.ini file, reboot the Notes server, and run another backup. Then review the debug.txt file, searching for the text "password." Something similar to the following should be found in the debug text file.

[0034:0002-012B] The ID file being used is: c:\lotus\notes\ids\bfender.id
Enter password (press the Esc key to abort): [0190:0002-018F] 07/12/2002 02:13:19 PM Pushing mailbox01.nsf to tcpip
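On a busy server the debug file can be large, so the search can be scripted. The sketch below assumes the two message formats shown in the excerpt above and reports each ID file that was in use when a password prompt appeared. The function name ids_prompting_for_password is hypothetical.

```python
import re

# Wording taken from the debug.txt excerpt above.
ID_LINE = re.compile(r"The ID file being used is:\s*(.+?)\s*$")


def ids_prompting_for_password(lines):
    """Return ID files that were active when 'Enter password' was logged."""
    last_id = None
    prompted = []
    for line in lines:
        found = ID_LINE.search(line)
        if found:
            last_id = found.group(1)
        elif "Enter password" in line and last_id and last_id not in prompted:
            prompted.append(last_id)
    return prompted


sample = [
    r"[0034:0002-012B] The ID file being used is: c:\lotus\notes\ids\bfender.id",
    "Enter password (press the Esc key to abort): [0190:0002-018F] "
    "07/12/2002 02:13:19 PM Pushing mailbox01.nsf to tcpip",
]
print(ids_prompting_for_password(sample))
```

Each path returned is an ID file to open in Lotus Notes and check for the password prompt option described next.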

Note the ID for which a password is being requested, and open that ID file from within Lotus Notes. At the bottom of the dialog box, there is an option to not prompt for password. This option must be selected.

 

Once this option is selected, saved, and the server rebooted, the problem should be resolved.

3. Lotus Notes ID file incorrectly specified in notes.ini:

In the Lotus Notes notes.ini file, there are two fields which, if present, may need to be changed. The two entries are KeyFilename and ServerKeyFilename. The problem will occur if the ID specified for either of those values does not have necessary permissions to access the databases on the Notes server. This occurs most often when either the KeyFilename or the ServerKeyFilename is set to some ID file other than the server.id file.

Resolution:
Search the server's notes.ini file for the values KeyFilename and ServerKeyFilename. If they are present, change both values to server.id.
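The check can be sketched as a small script over the text of notes.ini. It assumes entry names are case-insensitive and compares only the file name portion of each value, so a full path that ends in server.id is accepted. The helper name unexpected_key_files is hypothetical.

```python
import ntpath  # Windows path rules, usable on any platform

KEY_ENTRIES = ("keyfilename", "serverkeyfilename")


def unexpected_key_files(ini_text, expected="server.id"):
    """Return {entry: value} for key file entries that do not name expected."""
    findings = {}
    for line in ini_text.splitlines():
        name, separator, value = line.partition("=")
        if not separator:
            continue
        if name.strip().lower() in KEY_ENTRIES:
            file_name = ntpath.basename(value.strip()).lower()
            if file_name != expected:
                findings[name.strip()] = value.strip()
    return findings


sample = "[Notes]\nKeyFilename=admin.id\nServerKeyFilename=server.id\n"
print(unexpected_key_files(sample))
```

An empty result means that both entries, if present, already point at server.id.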


4. Networking issues related to the TcpMaxDataRetransmissions setting in the registry:

A review of the bpbkar log file shows the following messages:

12:19:41.700 AM: [93.142] <4> ov_log::OVLoop: Timestamp
12:22:50.372 AM: [93.142] <4> ov_log::OVLoop: Timestamp
12:24:54.950 AM: [93.142] <4> ov_log::OVLoop: Timestamp
12:26:34.044 AM: [93.142] <4> ov_log::OVLoop: Timestamp
12:27:34.825 AM: [93.142] <4> ov_log::OVLoop: Timestamp
12:28:35.763 AM: [93.142] <4> ov_log::OVLoop: Timestamp
12:28:37.606 AM: [93.142] <4> tar_base::V_vTarMsgW: INF - tar message received from tar_backup::backup_data_state
12:28:37.606 AM: [93.142] <2> tar_base::V_vTarMsgW: FTL - tar file write error (10054)
12:28:37.606 AM: [93.142] <4> lotus_access::V_CloseForRead: INF - <Enter>

Resolution:
Increasing the TcpMaxDataRetransmissions value from the default of 5 to 10 allows backups to complete successfully. Refer to Microsoft Knowledge Base article 170359 for details on how to change this value.

The text of the article is reproduced below for convenience:

*Begin Microsoft Article
-----------------------------------------------------------------------------------------------------------------------------------------
How to Modify the TCP/IP Maximum Retransmission Timeout
The information in this article applies to:

   * Microsoft Windows 2000 Server
   * Microsoft Windows 2000 Advanced Server
   * Microsoft Windows 2000 Professional
   * Microsoft Windows 2000 Datacenter Server
   * Microsoft Windows NT Workstation 4.0
   * Microsoft Windows NT Server 4.0

This article was previously published under Q170359
IMPORTANT: This article contains information about modifying the registry. Before you modify the registry, make sure to back it up and make sure that you understand how to restore the registry if a problem occurs. For information about how to back up, restore, and edit the registry, click the following article number to view the article in the Microsoft Knowledge Base: 256986 Description of the Microsoft Windows Registry

SUMMARY
TCP starts a retransmission timer when each outbound segment is handed down to IP. If no acknowledgment has been received for the data in a given segment before the timer expires, then the segment is retransmitted, up to the TcpMaxDataRetransmissions times. The default value for this parameter is 5.

The retransmission timer is initialized to three seconds when a TCP connection is established; however, it is adjusted on the fly to match the characteristics of the connection using Smoothed Round Trip Time (SRTT) calculations as described in RFC793. The timer for a given segment is doubled after each retransmission of that segment. Using this algorithm, TCP tunes itself to the normal delay of a connection. TCP connections over high-delay links will take much longer to time out than those over low-delay links.

By default, after the retransmission timer hits 240 seconds, it uses that value for retransmission of any segment that needs to be retransmitted. This can be a cause of long delays for a client to time out on a slow link.

For additional information about the latest service pack for Windows 2000, click the article number below to view the article in the Microsoft Knowledge Base:
260910 - How to Obtain the Latest Windows 2000 Service Pack

MORE INFORMATION
Warning: If you use Registry Editor incorrectly, you may cause serious problems that may require you to reinstall your operating system. Microsoft cannot guarantee that you can solve problems that result from using Registry Editor incorrectly. Use Registry Editor at your own risk.

Windows provides a mechanism to control the initial retransmit time, and then the retransmit time is self-tuning. To change the initial retransmit time, modify the following values in the following registry key:
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Tcpip\Parameters
Value Name:  TcpMaxDataRetransmissions
Data Type:   REG_DWORD - Number
Valid Range: 0 - 0xFFFFFFFF
Default:     5

Description: This parameter controls the number of times TCP retransmits an individual data segment (non connect segment) before aborting the connection. The retransmission timeout is doubled with each successive retransmission on a connection. It is reset when responses resume. The base timeout value is dynamically determined by the measured round-trip time on the connection.

Change the following key in Windows NT 4.0:
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Tcpip\Parameters
Value Name:  InitialRtt
Data Type:   REG_DWORD
Valid Range: 0-65535 (decimal)
Default:     0xBB8 (3000 decimal)

Description: This parameter controls the initial retransmission timeout used by TCP on each new connection. It applies to the connection request (SYN) and to the first data segment(s) sent on each connection.

For example, the value data 5000 decimal sets the initial retransmit time to five seconds.

Change the following key in Windows 2000:
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Tcpip\Parameters\Interfaces\ID for Adapter
Value Name:  TCPInitialRtt
Data Type:   REG_DWORD
Valid Range: 3000-65535 (decimal)
Default:     0xBB8 (3000 decimal)

Description: This parameter controls the initial retransmission timeout used by TCP on each new connection. It applies to the connection request (SYN) and to the first data segments sent on each connection. For example, the value data 5000 decimal sets the initial retransmit time to five seconds.
------------------------------------------------------------------------------------------------------------------------------------------------------------
*End of Microsoft Article


Note: You can only increase the value for the initial timeout. Decreasing the value is not supported. For additional information about retransmit time, click the article numbers below to view the articles in the Microsoft Knowledge Base:
232512 - TCP/IP may Retransmit Packets Prematurely
223450 - TCP Initial Retransmission Timer Adjustment Added to Windows NT
For additional information, search the Web for RFC 793 (Section 3.7) TCP Protocol Specification.
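The effect of raising TcpMaxDataRetransmissions can be estimated from the doubling rule described in the Microsoft article. The sketch below is a worst-case approximation only. It assumes the timer starts at the 3 second initial value, doubles after every retransmission, is capped at 240 seconds, and that the connection is aborted when the timer for the last retransmission expires. On a real connection the base timeout follows the measured round-trip time and is usually much shorter.

```python
def worst_case_abort_seconds(max_retransmissions, initial_timeout=3.0, ceiling=240.0):
    """Estimate seconds of unacknowledged data before TCP aborts a connection."""
    total = 0.0
    timeout = initial_timeout
    # One wait for the original send, plus one wait per retransmission.
    for _ in range(max_retransmissions + 1):
        total += timeout
        timeout = min(timeout * 2, ceiling)  # doubled each time, up to the cap
    return total


for setting in (5, 10):
    print(setting, worst_case_abort_seconds(setting))
```

Under these assumptions the default of 5 tolerates about 189 seconds of silence, while a value of 10 tolerates about 1341 seconds, which gives a slow or congested link far longer to recover before the backup stream is reset.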


5. Incorrect configuration of the NetBackup for Lotus Notes extension:

In the bpbkar log file, the last two lines associated with the backup show the following:

11:38:14.985 AM: [333.379] <4> dos_backup::V_VerifyFileList: INF - Replaced: F:\notes\data with Lotus Notes:\F:\notes\data
11:38:14.985 AM: [333.379] <333> nbex_DebugLog: INF - NBLN_Connect() <Enter> NotesIniPath:'F:\notes\notes.ini'

Resolution:
A review of the NetBackup registry entries on the Lotus Notes server shows that the LOTUS_NOTES_INI and LOTUS_NOTES_PATH values are not configured correctly. These entries are not required for all installations, but if they exist, their values must be correct for the backup to run successfully. Syntax is critical.

1. Click Start | Run and type regedt32
2. Select the HKEY_LOCAL_MACHINE key
3. Navigate to SOFTWARE\VERITAS\NetBackup\CurrentVersion\Config
4. Highlight the Config key, and from the Edit menu, click Add Value... to add a new value
5. Set the Value Name to LOTUS_NOTES_PATH, and select REG_SZ for the Data Type
6. Click OK to add value, and the String Editor dialog box appears
7. For the value data, enter the path to the directory that contains the Notes nserver.exe file; for example, if a search of the hard drive finds nserver.exe in D:\Lotus\domino, enter D:\Lotus\domino, and click OK to accept the value
8. Repeat this process for the LOTUS_NOTES_INI value. Specify both the directory location and the file name; for example, D:\Lotus\domino\notes.ini for the value data.
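Because syntax is critical, the two values can be sanity-checked before they are entered. This is a minimal sketch of such a check, based only on the rules in the steps above: LOTUS_NOTES_PATH is a directory and LOTUS_NOTES_INI is a full path ending in notes.ini. The function name check_lotus_values is hypothetical, and the check does not confirm that the files exist.

```python
import ntpath  # Windows path rules, usable on any platform


def check_lotus_values(notes_path, notes_ini):
    """Return a list of problems found in the two registry value strings."""
    problems = []
    if ntpath.basename(notes_path).lower() == "nserver.exe":
        problems.append("LOTUS_NOTES_PATH must be the directory, not nserver.exe")
    if not ntpath.isabs(notes_path):
        problems.append("LOTUS_NOTES_PATH must be a full path with a drive letter")
    if ntpath.basename(notes_ini).lower() != "notes.ini":
        problems.append("LOTUS_NOTES_INI must end with the notes.ini file name")
    if not ntpath.isabs(notes_ini):
        problems.append("LOTUS_NOTES_INI must be a full path with a drive letter")
    return problems


# Values from the example in the steps above.
print(check_lotus_values(r"D:\Lotus\domino", r"D:\Lotus\domino\notes.ini"))
# A common mistake: the executable in one value, only the directory in the other.
print(check_lotus_values(r"D:\Lotus\domino\nserver.exe", r"D:\Lotus\domino"))
```

An empty list means that both strings follow the expected form.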

