Backup job completes with: errno = 62 - Timer expired status: 41: network connection timed out

Article: 100043604
Last Published: 2019-10-01
Product(s): NetBackup & Alta Data Protection

Problem

A backup of a Windows Client regularly completes with the following Job Details within Activity Monitor:

<date> 10:39:13 - Info bpbrm (pid=108186) starting bpbkar on client
<date> 10:39:13 - connected; connect time: 0:00:00
<date> 10:44:13 - Error bpbrm (pid=108186) socket read failed: errno = 62 - Timer expired
<date> 10:44:13 - Info bpbkar (pid=0) done. status: 13: file read failed
<date> 10:44:14 - end writing
file read failed  (13)

...or:

<date> 12:30:47 - Info bpbrm (pid=584) starting bpbkar32 on client
<date> 12:30:47 - connected; connect time: 0:00:00
<date> 12:35:47 - Info bpbkar32 (pid=0) done. status: 41: network connection timed out
<date> 12:35:47 - end writing
termination requested by administrator  (150)

Error Message

errno = 62 - Timer expired
status: 13: file read failed
status: 41: network connection timed out

Cause

The above error codes occur when the media server's bpbrm process does not receive a response from the client's bpbkar process for longer than the configured CLIENT_READ_TIMEOUT threshold.

In this instance, the client was a node in a Cluster.

Investigation of the client-side bpbkar log (at General Log Level 2) shows that bpbkar spends an excessive amount of time waiting for "GetServerType for local machine" to return information on each volume.

Example observed for a single volume:

10:40:40.670 [10164.11100] <2> ov_log::V_GlobalLog: INF - 'C:\' NTFS: {ENCRYPTION} {COMPRESSION} {SUPPORTS_ATTRIBUTES} {DATES} {ACCESS DATES} {DATA_SECURITY} {UNICODE} {CASE_PRESERVING} f=0x0B111479 x=0x06000000 volflags=0x03E700FF
10:41:21.980 [10164.11100] <2> ov_log::V_GlobalLog: INF - GetServerType for local machine

Note: The above operation runs for each drive letter.

Because this delay repeats for every drive letter, it adds up: in this case the bpbkar process spent over 8 minutes performing this operation. This duration exceeds the default CLIENT_READ_TIMEOUT threshold of 300 seconds.
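To quantify the delay in a given bpbkar log, the gap before each "GetServerType for local machine" entry can be measured from the timestamps. The following is a minimal Python sketch, assuming every log line begins with an HH:MM:SS.mmm timestamp as in the example above. The function name and the shortened sample lines are illustrative only and are not part of NetBackup.

```python
import re
from datetime import datetime, timedelta

# Leading HH:MM:SS.mmm timestamp, as seen in the bpbkar log excerpt above.
TIMESTAMP = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3})\s")
MARKER = "GetServerType for local machine"


def get_server_type_delays(lines):
    """Return the seconds elapsed between each marker line and the
    timestamped line immediately before it."""
    delays = []
    previous = None
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue  # skip lines without a leading timestamp
        current = datetime.strptime(match.group(1), "%H:%M:%S.%f")
        if previous is not None and MARKER in line:
            delta = current - previous
            if delta < timedelta(0):
                delta += timedelta(days=1)  # log crossed midnight
            delays.append(delta.total_seconds())
        previous = current
    return delays


# Shortened copies of the two log lines shown above (volume flags omitted).
sample = [
    "10:40:40.670 [10164.11100] <2> ov_log::V_GlobalLog: INF - 'C:\\' NTFS:",
    "10:41:21.980 [10164.11100] <2> ov_log::V_GlobalLog: INF - GetServerType for local machine",
]

delays = get_server_type_delays(sample)
print(delays)       # one entry per GetServerType line
print(sum(delays))  # compare the total against CLIENT_READ_TIMEOUT
```

For the single volume shown above, the gap is roughly 41 seconds. Summing the gaps across all volumes gives the figure to compare against CLIENT_READ_TIMEOUT.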

Among the data that bpbkar collects for each volume is information about the volume's cluster resource assignments. Debug analysis of the observed delay shows that collecting this cluster information accounts for the time bpbkar spends on each volume.

This delay can also be observed when running the PowerShell command Get-ClusterResource.

Solution

There are two possible ways to resolve this issue:

  1. Increase the CLIENT_READ_TIMEOUT setting to a value larger than the time it takes for the above operation to complete.
  2. Identify which cluster resources are causing the delay in the Get-ClusterResource command's response, and address it.

Delays in the Get-ClusterResource command have been tied to legacy cluster resources which no longer function properly. Removing the legacy resources reduces the response time of the Get-ClusterResource command and, in turn, the time NetBackup spends processing volume-level cluster information.
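For the first option, the new timeout can be estimated from the measured per-volume delays. The sketch below is only an arithmetic illustration: the 1.5x headroom factor, the rounding to whole minutes, the function name, and the 12-volume example are all assumptions made here, not NetBackup recommendations. Only the 300-second default comes from this article.

```python
import math

# Default CLIENT_READ_TIMEOUT in seconds, as stated in this article.
DEFAULT_CLIENT_READ_TIMEOUT = 300


def suggested_timeout(per_volume_delays, headroom=1.5, round_to=60):
    """Estimate a CLIENT_READ_TIMEOUT (seconds) that covers the summed
    per-volume delays with some headroom, rounded up to a whole minute.

    headroom and round_to are illustrative assumptions, not vendor guidance.
    """
    padded = sum(per_volume_delays) * headroom
    rounded = math.ceil(padded / round_to) * round_to
    # Never suggest a value below the default.
    return max(DEFAULT_CLIENT_READ_TIMEOUT, rounded)


# Hypothetical client: 12 volumes, each showing the ~41.31 s gap seen above.
example_delays = [41.31] * 12
print(round(sum(example_delays), 2))  # total delay, over 8 minutes
print(suggested_timeout(example_delays))
```

Raising the timeout only accommodates the delay. Addressing the slow cluster resources (the second option) removes its cause.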
