STATUS CODE 41: Possible causes of exit status 41 (network connection timed out) on NetBackup for Lotus Notes Agent backups

  • Article ID:100016299
  • Product(s):NetBackup

Problem

STATUS CODE 41: Possible causes of exit status 41 (network connection timed out) on NetBackup for Lotus Notes Agent backups

Error Message

EXIT STATUS 41: network connection timed out

Solution

Overview: There are several causes of status code 41 with Lotus Notes backups. This article outlines those causes.

Troubleshooting:
Typical situations where Lotus Notes backups will fail with status 41 include:

1. Client Read Timeout value set too low to allow backup processes to complete
2. Lotus Notes ID file incorrectly configured
3. Lotus Notes ID file incorrectly specified in notes.ini
4. Networking issues related to the TcpMaxDataRetransmissions setting in the registry
5. Incorrect configuration of the NetBackup (tm) for Lotus Notes extension

Each of these potential problems is covered in depth below.

1. Client Read Timeout value set too low to allow backup processes to complete:

The NetBackup for Lotus Notes backup agent proceeds through four different processes to back up a database. These processes are visible in the bpbkar log (if logging is set high enough) as seen below:

First, the agent will locate the databases:
06/07/02 05:21:39 AM: [441]: INF - FileAction() NSFSearch() Found <database_name>.nsf

Next, the agent will save the database location:
06/07/02 05:22:57 AM: [439]: INF - CopyLocalToMaster() Process object <database_name>.nsf
06/07/02 05:22:57 AM: [439]: INF - CopyLocalToMaster() Allocate memory for the object

Then, the agent will open the database for interrogation:
06/07/02 05:24:43 AM: [439]: INF - NBLN_FindNextFile() <Enter>

Finally, the agent will transfer the data to a NetBackup storage unit:
06/07/02 05:24:44 AM: [439]: TAR - Backup:C:\Notes\<database_name>.nsf
06/07/02 05:24:44 AM: [439]: INF - read non-blocking message of length 1
<snip>
06/07/02 05:31:57 AM: [439]: INF - read non-blocking message of length 1
06/07/02 05:31:57 AM: [439]: FIL - 970574 7 5530 19 33216 root root 970390 1022105311 1022105311 1013440636 /C/Notes/<database_name>.nsf

If any of these processes take longer than the configured setting for Client Read Timeout, the job will fail.
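As a rough aid for spotting a slow stage, the timestamps in a bpbkar log can be compared against the Client Read Timeout. The Python sketch below is illustrative only and is not part of NetBackup: the function name is hypothetical, the timestamp layout (MM/DD/YY followed by a 12-hour time) is assumed from the excerpts above, and a long gap between two log entries is only a hint that a stage may be running long, not proof that the timeout was exceeded.

```python
import re
from datetime import datetime

# Timestamp layout assumed from the bpbkar excerpts in this article,
# for example "06/07/02 05:24:44 AM: [439]: ...".
STAMP_RE = re.compile(r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} [AP]M)")


def longest_gap_seconds(log_lines):
    """Return the longest gap, in seconds, between consecutive timestamped lines."""
    stamps = []
    for line in log_lines:
        match = STAMP_RE.match(line)
        if match:
            stamps.append(datetime.strptime(match.group(1), "%m/%d/%y %I:%M:%S %p"))
    gaps = [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]
    return max(gaps, default=0.0)


if __name__ == "__main__":
    sample = [
        "06/07/02 05:24:44 AM: [439]: INF - read non-blocking message of length 1",
        "06/07/02 05:31:57 AM: [439]: INF - read non-blocking message of length 1",
    ]
    gap = longest_gap_seconds(sample)
    # 300 is used here only as an example Client Read Timeout value.
    print(gap, "seconds;", "exceeds 300" if gap > 300 else "within 300")
```

A gap larger than the configured Client Read Timeout suggests that the value may need to be raised as described in the resolution below.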

The bpbrm log will show the timeout message:

17:21:39 [1892.752] <2> bpbrm spawn_child: "D:\Veritas NetBackup\NetBackup\bin\bptm.exe" -w -pid 1892 -c mailatm -den 13 -rt 8 -rn 0 -stunit adicsrvr-dlt-robot-tld-0 -cl NotesTest -bt 1013041286 -b mailatm_1013041286 -st 0 -cj 6 -p Notes -ru root -rclnt mailatm -rclnthostname mailatm -rl 1 -rp 1209600 -sl Full -ct 25 -v -mediasvr adicsrvr -jobid 633 -masterversion 340000
17:21:39 [1892.752] <2> bpbrm create_mm_terminate: created terminate event pid 1860
17:21:39 [1892.752] <2> bpbrm write_continue_backup: wrote CONTINUE BACKUP on COMM_SOCK
17:21:40 [1892.752] <4> bpbrm main: from client mailatm: TRV - BACKUP 2/6/02 7:19:45 PM mailatm NotesTest Full FULL
17:22:10 [1892.752] <2> bpbrm mm_sig: received ready signal from media manager
17:31:51 [1892.752] <2> bpbrm readline: bpbrm timeout after 300 seconds
17:31:51 [1892.752] <2> bpbrm kill_child_process: start
17:33:13 [1892.752] <2> bpbrm wait_for_child: start
17:33:13 [1892.752] <2> bpbrm wait_for_child: child exit_status = 150
17:33:13 [1892.752] <2> inform_client_of_status: INF - Server status = 41
17:33:18 [1892.752] <4> bpbrm Exit: client backup EXIT STATUS 41: network connection timed out
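To confirm that a status 41 came from the Client Read Timeout, it can help to pull the reported timeout value out of the bpbrm log. The following Python sketch is an illustration only, not a NetBackup utility: the function name is hypothetical, and the pattern is an assumption based on the sample "bpbrm timeout after 300 seconds" line above, so other NetBackup versions may word the message differently.

```python
import re

# Pattern modeled on the sample line
# "bpbrm readline: bpbrm timeout after 300 seconds" (wording assumed from this article).
TIMEOUT_RE = re.compile(r"bpbrm timeout after\s*(\d+)\s*seconds")


def find_read_timeouts(log_lines):
    """Return the timeout values (in seconds) reported in bpbrm log lines."""
    timeouts = []
    for line in log_lines:
        match = TIMEOUT_RE.search(line)
        if match:
            timeouts.append(int(match.group(1)))
    return timeouts


if __name__ == "__main__":
    sample = [
        "17:22:10 [1892.752] <2> bpbrm mm_sig: received ready signal from media manager",
        "17:31:51 [1892.752] <2> bpbrm readline: bpbrm timeout after 300 seconds",
    ]
    print(find_read_timeouts(sample))
```

The value found this way is the timeout that was in effect for the failed job, which gives a starting point when choosing a larger setting.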

Resolution:
Increasing the Client Read Timeout allows backups to proceed through all the backup processes and finish successfully.

To increase the Client Read Timeout in NetBackup 4.5 using the administrative console, go to the master server and open the administrative console. Then locate the Lotus Notes server in the Clients section of the Host Properties area. Right-click on the Notes server, and select Properties. Go to the Universal Settings tab, and increase the Client Read Timeout value. Depending on the number of databases, this value may need to be set to 1800 seconds or higher.

To increase the Client Read Timeout in NetBackup 5.0 and 5.1 using the administrative console, go to the master server and open the administrative console. Then locate the Lotus Notes server in the Clients section of the Host Properties area. Right-click on the Notes server, and select Properties. Go to the Timeouts section, and increase the Client Read Timeout value. Depending on the number of databases, this value may need to be set to 1800 seconds or higher.

To increase the Client Read Timeout by modifying the registry, start Regedit by selecting Start | Run | Regedit. Then go to HKEY_LOCAL_MACHINE\SOFTWARE\VERITAS\NetBackup\CurrentVersion\Config. In the Config key, find the CLIENT_READ_TIMEOUT value. If it does not exist, create it by selecting Edit | New | DWORD Value, and call the new value CLIENT_READ_TIMEOUT. Then open the value, change the base from hexadecimal to decimal, and then set the value. Depending on the number of databases, this value may need to be set to 1800 seconds or higher.
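The reason for switching the base to decimal before typing the value can be shown with a little arithmetic. The Python sketch below is only an illustration with hypothetical function names; it does not read or write the registry. It shows what 1800 seconds looks like as a hexadecimal DWORD, and what would be stored if the digits 1800 were typed while the base was still hexadecimal.

```python
def as_dword_hex(seconds):
    """Format a decimal timeout as a 32-bit hexadecimal DWORD string."""
    if not 0 <= seconds <= 0xFFFFFFFF:
        raise ValueError("a DWORD must fit in 32 bits")
    return f"0x{seconds:08x}"


def misread_as_hex(digits):
    """Value that results if decimal digits are interpreted as hexadecimal."""
    return int(str(digits), 16)


if __name__ == "__main__":
    print(as_dword_hex(1800))    # 1800 seconds expressed as a hexadecimal DWORD
    print(misread_as_hex(1800))  # seconds stored if the base is left on hexadecimal
```

With the base left on hexadecimal, typing 1800 would store 6144 seconds rather than the intended 1800, so it is worth checking the base before saving the value.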

2. Lotus Notes ID file incorrectly configured:

When backing up Lotus Notes databases, the backup fails with status 41 and consistently "hangs" on the same database. A review of the bpbkar log file shows the backup is in the NBLN_FindNextFile() portion of the backup, but provides little else in terms of detail. This is because the "hang" is on the Lotus side of the backup.

05/23/02 10:14:12 AM: [153]: INF - CopyLocalToMaster() Get next object
05/23/02 10:14:12 AM: [153]: INF - CopyLocalToMaster() <Exit>
05/23/02 10:14:12 AM: [153]: INF - SearchForElements() Buffer size: 352176
05/23/02 10:14:12 AM: [153]: INF - SearchForElements() <Exit>
05/23/02 10:14:12 AM: [153]: INF - NBLN_OpenEnumerate() <Exit>
05/23/02 10:14:12 AM: [153]: INF - NBLN_FindNextFile() <Enter>
05/23/02 10:14:13 AM: [153]: INF - NBLN_FindNextFile() <Enter>
05/23/02 10:14:13 AM: [153]: INF - NBLN_FindNextFile() <Enter>
05/23/02 10:14:13 AM: [153]: INF - NBLN_FindNextFile() <Enter>

Note that a backup can fail at this point (NBLN_FindNextFile) for several reasons, including corrupt databases and Client Read Timeout values set too low.

Resolution:
Adding the following lines to the server's notes.ini file causes Notes to generate more detailed logs (specifically, a debug.txt file).

DEBUG_OUTFILE=C:\DEBUG.TXT
DEBUG_THREADID=1
DEBUG_CAPTURE_TIMEOUT=1
DEBUG_SHOW_TIMEOUT=1

Please refer to Lotus article 162400 on the Lotus Support site (https://www-3.ibm.com/software/lotus/support/) for more information about these parameters.

After adding these parameters to the notes.ini file, reboot the Notes server, and run another backup. Then review the debug.txt file, searching for the text "password." Something similar to the following should be found in the debug text file.

[0034:0002-012B] The ID file being used is: c:\lotus\notes\ids\bfender.id
Enter password (press the Esc key to abort): [0190:0002-018F] 07/12/2002 02:13:19 PM Pushing mailbox01.nsf to tcpip
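On a large debug.txt file, the search for password prompts can be scripted. The Python sketch below is a hedged illustration rather than a Lotus or NetBackup tool: the function name is hypothetical, and it assumes, based only on the excerpt above, that the "ID file being used" line appears before the password prompt it relates to.

```python
import re

# Wording assumed from the debug.txt excerpt in this article.
ID_FILE_RE = re.compile(r"ID file being used\s*is:\s*(.+)", re.IGNORECASE)


def id_files_prompting_for_password(debug_lines):
    """Return ID files that were followed by a password prompt, in order seen."""
    prompted = []
    last_id = None
    for line in debug_lines:
        match = ID_FILE_RE.search(line)
        if match:
            # Remember the most recent ID file; skip the prompt check for this line.
            last_id = match.group(1).strip()
            continue
        if "password" in line.lower() and last_id and last_id not in prompted:
            prompted.append(last_id)
    return prompted


if __name__ == "__main__":
    sample = [
        r"[0034:0002-012B] The ID file being used is: c:\lotus\notes\ids\bfender.id",
        "Enter password (press the Esc key to abort): [0190:0002-018F] Pushing mailbox01.nsf to tcpip",
    ]
    for id_file in id_files_prompting_for_password(sample):
        print(id_file)
```

Each ID file reported this way is a candidate for the password setting change described next.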

Note the ID for which a password is being requested, and open that ID file from within Lotus Notes. At the bottom of the dialog box, there is an option to not prompt for a password. This option must be selected.

Once this option is selected and saved, and the server has been rebooted, the problem should be resolved.

3. Lotus Notes ID file incorrectly specified in notes.ini:

In the Lotus Notes notes.ini file, there are two entries which, if present, may need to be changed: KeyFilename and ServerKeyFilename. The problem will occur if the ID specified for either of those values does not have the necessary permissions to access the databases on the Notes server. This occurs most often when either the KeyFilename or the ServerKeyFilename is set to some ID file other than the server.id file.

Resolution:
Search the server's notes.ini file for the values KeyFilename and ServerKeyFilename. If they are present, change both values to server.id.
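The check described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a supported tool: the function name is hypothetical, the notes.ini content is passed in as text, and only the file name portion of each value is compared against server.id (an assumption, since a value might also be written with a full path).

```python
from pathlib import PureWindowsPath

KEY_NAMES = ("keyfilename", "serverkeyfilename")


def key_file_problems(ini_text, expected="server.id"):
    """Return (entry, value) pairs whose ID file name is not the expected one."""
    problems = []
    for raw_line in ini_text.splitlines():
        name, separator, value = raw_line.partition("=")
        if not separator:
            continue  # not a name=value line
        key = name.strip()
        if key.lower() in KEY_NAMES:
            file_name = PureWindowsPath(value.strip()).name
            if file_name.lower() != expected.lower():
                problems.append((key, value.strip()))
    return problems


if __name__ == "__main__":
    sample = "KeyFilename=admin.id\nServerKeyFilename=server.id\n"
    for entry, value in key_file_problems(sample):
        print(f"{entry} is set to {value}; expected server.id")
```

An empty result means that neither entry is present or that both already point to server.id.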


4. Networking issues related to the TcpMaxDataRetransmissions setting in the registry:

A review of the bpbkar log file shows the following messages:

12:19:41.700 AM: [93.142] <4> ov_log::OVLoop:Timestamp
12:22:50.372 AM: [93.142] <4> ov_log::OVLoop:Timestamp
12:24:54.950 AM: [93.142] <4> ov_log::OVLoop:Timestamp
12:26:34.044 AM: [93.142] <4> ov_log::OVLoop:Timestamp
12:27:34.825 AM: [93.142] <4> ov_log::OVLoop:Timestamp
12:28:35.763 AM: [93.142] <4> ov_log::OVLoop:Timestamp
12:28:37.606 AM: [93.142] <4> tar_base::V_vTarMsgW: INF - tar message received from tar_backup::backup_data_state
12:28:37.606 AM: [93.142] <2> tar_base::V_vTarMsgW: FTL - tar file write error (10054)
12:28:37.606 AM: [93.142] <4> lotus_access::V_CloseForRead: INF - <Enter>

Resolution:
Increasing the TcpMaxDataRetransmissions value from the default of 5 to 10 allows backups to complete successfully. Refer to Microsoft TechArticle 170359 for details on how to change this value.

The text of the article is reproduced below for convenience:

*Begin Microsoft Article
-----------------------------------------------------------------------------------------------------------------------------------------
How to Modify the TCP/IP Maximum Retransmission Timeout
The information in this article applies to:

   * Microsoft Windows 2000 Server
   * Microsoft Windows 2000 Advanced Server
   * Microsoft Windows 2000 Professional
   * Microsoft Windows 2000 Datacenter Server
   * Microsoft Windows NT Workstation 4.0
   * Microsoft Windows NT Server 4.0

This article was previously published under Q170359
IMPORTANT: This article contains information about modifying the registry. Before you modify the registry, make sure to back it up and make sure that you understand how to restore the registry if a problem occurs. For information about how to back up, restore, and edit the registry, click the following article number to view the article in the Microsoft Knowledge Base: 256986 Description of the Microsoft Windows Registry

SUMMARY
TCP starts a retransmission timer when each outbound segment is handed down to IP. If no acknowledgment has been received for the data in a given segment before the timer expires, then the segment is retransmitted, up to TcpMaxDataRetransmissions times. The default value for this parameter is 5.

The retransmission timer is initialized to three seconds when a TCP connection is established; however, it is adjusted on the fly to match the characteristics of the connection using Smoothed Round Trip Time (SRTT) calculations as described in RFC 793. The timer for a given segment is doubled after each retransmission of that segment. Using this algorithm, TCP tunes itself to the normal delay of a connection. TCP connections over high-delay links will take much longer to time out than those over low-delay links.

By default, after the retransmission timer hits 240 seconds, it uses that value for retransmission of any segment that needs to be retransmitted. This can be a cause of long delays for a client to time out on a slow link.

For additional information about the latest service pack for Windows 2000, click the article number below to view the article in the Microsoft Knowledge Base:
260910 - How to Obtain the Latest Windows 2000 Service Pack

MORE INFORMATION
Warning: If you use Registry Editor incorrectly, you may cause serious problems that may require you to reinstall your operating system. Microsoft cannot guarantee that you can solve problems that result from using Registry Editor incorrectly. Use Registry Editor at your own risk.

Windows provides a mechanism to control the initial retransmit time, and then the retransmit time is self-tuning. To change the initial retransmit time, modify the following values in the following registry key:
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Tcpip\Parameters
Value Name:  TcpMaxDataRetransmissions
Data Type:   REG_DWORD - Number
Valid Range: 0 - 0xFFFFFFFF
Default:     5

Description: This parameter controls the number of times TCP retransmits an individual data segment (non connect segment) before aborting the connection. The retransmission timeout is doubled with each successive retransmission on a connection. It is reset when responses resume. The base timeout value is dynamically determined by the measured round-trip time on the connection.

Change the following key in Windows NT 4.0:
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Tcpip\Parameters
Value Name:  InitialRtt
Data Type:   REG_DWORD
Valid Range: 0-65535 (decimal)
Default:     0xBB8 (3000 decimal)

Description: This parameter controls the initial retransmission timeout used by TCP on each new connection. It applies to the connection request (SYN) and to the first data segment(s) sent on each connection.

For example, the value data 5000 decimal sets the initial retransmit time to five seconds.

Change the following key in Windows 2000:
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Tcpip\Parameters\Interfaces\ID for Adapter
Value Name:  TCPInitialRtt
Data Type:   REG_DWORD
Valid Range: 3000-65535 (decimal)
Default:     0xBB8 (3000 decimal)

Description: This parameter controls the initial retransmission timeout used by TCP on each new connection. It applies to the connection request (SYN) and to the first data segments sent on each connection. For example, the value data 5000 decimal sets the initial retransmit time to five seconds.
------------------------------------------------------------------------------------------------------------------------------------------------------------
*End of Microsoft Article


Note: You can only increase the value for the initial timeout. Decreasing the value is not supported. For additional information about retransmit time, click the article numbers below to view the articles in the Microsoft Knowledge Base:
232512 - TCP/IP may Retransmit Packets Prematurely
223450 - TCP Initial Retransmission Timer Adjustment Added to Windows NT
For additional information, search the Web for RFC 793 (Section 3.7) TCP Protocol Specification.
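To get a feel for how much extra time the change from 5 to 10 retransmissions buys, the doubling behavior described in the Microsoft article can be worked through numerically. The Python sketch below is a simplified model, not a description of the actual Windows TCP/IP implementation: it assumes the timer starts at the 3-second initial value, doubles after every retransmission, is capped at 240 seconds, and that the connection is aborted when the timer expires after the last allowed retransmission. In practice the base timer is adjusted to the measured round-trip time, so real figures will differ. The function name is hypothetical.

```python
def seconds_until_abort(max_retransmissions, initial_timer=3, timer_cap=240):
    """Approximate total wait before TCP gives up on one unacknowledged segment.

    Simplified model: fixed initial timer, doubled after each retransmission,
    never exceeding timer_cap. Round-trip-time tuning is ignored.
    """
    total = 0
    timer = initial_timer
    # One wait for the original send, plus one wait per retransmission.
    for _ in range(max_retransmissions + 1):
        total += timer
        timer = min(timer * 2, timer_cap)
    return total


if __name__ == "__main__":
    for setting in (5, 10):
        print(setting, "retransmissions:", seconds_until_abort(setting), "seconds")
```

Under these assumptions the default of 5 gives up after roughly 189 seconds, while a setting of 10 tolerates an outage of roughly 1341 seconds, which illustrates why the larger value can let a backup ride out a longer network interruption.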


5. Incorrect configuration of the NetBackup for Lotus Notes extension:

In the bpbkar log file, the last two lines associated with the backup show the following:

11:38:14.985 AM: [333.379] <4> dos_backup::V_VerifyFileList: INF - Replaced: F:\notes\data with LotusNotes:\F:\notes\data
11:38:14.985 AM: [333.379] <333> nbex_DebugLog: INF - NBLN_Connect() <Enter> NotesIniPath:'F:\notes\notes.ini'

Resolution:
A review of the NetBackup registry on the Lotus Notes server shows that the LOTUS_NOTES_INI and LOTUS_NOTES_PATH values are not configured correctly. These entries are not required for all installations, but if they do exist, the values must be correct for the backup to run successfully. Syntax is critical.

1. Click Start | Run and type regedt32
2. Select the HKEY_LOCAL_MACHINE key
3. Navigate to SOFTWARE\VERITAS\NetBackup\CurrentVersion\Config
4. Highlight the Config key, and from the Edit menu, click Add Value... to add a new value
5. Set the Value Name to LOTUS_NOTES_PATH, and select REG_SZ for the Data Type
6. Click OK to add the value, and the String Editor dialog box appears
7. For the value data, enter the path to the Notes nserver.exe file; for example, if a search of the hard drive for the nserver.exe file determined the file was located in D:\Lotus\domino, enter D:\Lotus\domino, and click OK to accept the value
8. Repeat this process for the LOTUS_NOTES_INI value. Specify both the directory location and the file name; for example, D:\Lotus\domino\notes.ini for the value data.
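Because syntax is critical for these two values, a quick sanity check before saving them can catch the most common slips. The Python sketch below is only an illustration: the function name and the specific checks are assumptions drawn from the example values in the steps above (a directory for LOTUS_NOTES_PATH, a directory plus file name for LOTUS_NOTES_INI), and it inspects the strings only, without touching the registry or the file system.

```python
from pathlib import PureWindowsPath


def check_lotus_values(notes_path, notes_ini):
    """Return a list of likely syntax problems in the two registry values."""
    problems = []
    path = PureWindowsPath(notes_path)
    ini = PureWindowsPath(notes_ini)

    # Assumption: both values are full paths including a drive letter.
    if not path.is_absolute():
        problems.append("LOTUS_NOTES_PATH should be a full directory path")
    if path.suffix.lower() == ".exe":
        problems.append("LOTUS_NOTES_PATH should name the directory, not nserver.exe")
    if not ini.is_absolute():
        problems.append("LOTUS_NOTES_INI should be a full path")
    if ini.suffix.lower() != ".ini":
        problems.append("LOTUS_NOTES_INI should include the file name, for example notes.ini")
    return problems


if __name__ == "__main__":
    for problem in check_lotus_values(r"D:\Lotus\domino\nserver.exe", r"D:\Lotus\domino"):
        print(problem)
```

The example values from the steps above, D:\Lotus\domino and D:\Lotus\domino\notes.ini, pass these checks with no problems reported.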

