
SQL Server parent backup job fails with status code 636 even though the child backup jobs complete successfully

Article: 100011149

Last Published: 2014-01-31


Product(s): NetBackup & Alta Data Protection

Problem

SQL Server parent backup job fails with status code 636 even though the child backup jobs complete successfully.

The same SQL Server parent job completes successfully only if the destination storage unit resides on the master server itself.

File system backups are not affected, regardless of the destination storage unit.

Error Message

From Job Details
================

BPHDB (the parent job that runs the batch file on the client) completed with status code 0, but the job itself ended with status code 636:

7/5/2013 2:33:40 PM - Info bphdb(pid=7696) done. status: 0: the requested operation was successfully completed
read from input socket failed(636)

From BPBRM log
==============

The BPBRM process received status code 0 from the client but then encountered an error when closing the socket to NBJM:

14:33:40.566 [7472.10224] <2> bpbrm Exit: client backup EXIT STATUS 0: the requested operation was successfully completed
14:34:17.179 [7472.10224] <8> vnet_close_socket_safely: [vnet.c:2017] error on read EOF 0 0x0
14:34:17.179 [7472.10224] <2> vnet_close_socket_safely: [vnet.c:2029] safe close 9 0x9
14:34:17.179 [7472.10224] <2> bpbrm Exit: Error occured during closure of socket to nbjm, vnet status 9

From NBJM log
=============

NBJM never received the exit status from BPBRM and, about 5 minutes after the backup had completed, failed the job with status code 636:

7/5/2013 14:39:45.679 [Diagnostic] NB 51216 nbjm 117 PID:5984 File ID:117 [jobid=42856 parentid=42856] 1 V-117-239 [BackupJob::terminateThisJob] terminated job, jobid=42856, status=636
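
The timing in the log excerpts above can be checked with a few lines of Python. This is only an illustrative sketch: the timestamps are copied from the BPBRM and NBJM log lines, while the helper name `seconds_between` is hypothetical and assumes all three events occurred on the same day.

```python
from datetime import datetime


def seconds_between(start: str, end: str) -> float:
    """Return the elapsed seconds between two HH:MM:SS.mmm timestamps (same day assumed)."""
    fmt = "%H:%M:%S.%f"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()


# Timestamps copied from the log excerpts in this article.
BPBRM_EXIT = "14:33:40.566"          # bpbrm reports client EXIT STATUS 0
SOCKET_CLOSE_ERROR = "14:34:17.179"  # bpbrm fails to close the socket to nbjm
NBJM_TERMINATED = "14:39:45.679"     # nbjm terminates the job with status 636

close_delay = seconds_between(BPBRM_EXIT, SOCKET_CLOSE_ERROR)
nbjm_delay = seconds_between(SOCKET_CLOSE_ERROR, NBJM_TERMINATED)

print(f"bpbrm exit -> socket close error: {close_delay:.3f} s")
print(f"socket close error -> nbjm termination: {nbjm_delay:.3f} s")
```

The second interval comes to roughly five and a half minutes, consistent with the delay described before NBJM failed the job.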

Cause

The TCP KeepAliveTime value on the master server, although already reduced to 900,000 ms (15 minutes), was still too high for this environment.

Solution

Reduce the TCP KeepAliveTime setting on the master server to 300,000 ms (5 minutes), then reboot the master server. After this change, SQL Server parent backup jobs completed successfully when backing up to the affected media servers.
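
The article does not say how the value was changed. As a rough sketch, assuming the setting is the standard Windows registry value `KeepAliveTime` (a `REG_DWORD` in milliseconds) under `HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters`, the following Python builds the corresponding `reg add` command without executing it. The helper names `minutes_to_ms` and `keepalive_reg_command` are hypothetical.

```python
from typing import List

# Assumption: the standard Windows location of the TCP keepalive setting.
REG_KEY = r"HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"


def minutes_to_ms(minutes: int) -> int:
    """Convert minutes to the milliseconds unit used by KeepAliveTime."""
    return minutes * 60 * 1000


def keepalive_reg_command(keepalive_ms: int) -> List[str]:
    """Build (but do not run) a `reg add` command that sets KeepAliveTime."""
    if keepalive_ms <= 0:
        raise ValueError("KeepAliveTime must be a positive number of milliseconds")
    return [
        "reg", "add", REG_KEY,
        "/v", "KeepAliveTime",
        "/t", "REG_DWORD",
        "/d", str(keepalive_ms),
        "/f",  # overwrite an existing value without prompting
    ]


# 5 minutes, the value that resolved the issue in this article.
command = keepalive_reg_command(minutes_to_ms(5))
print(" ".join(command))
```

The printed command would need to be run from an elevated prompt on the master server and, as noted above, followed by a reboot. Review it against your own environment before applying it.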

Applies To

-- Master server "NBMASTER1" running Windows 2008 R2 and NBU 7.5.0.4.
-- Master server also functions as a Media Server.
-- SQL backups to master server's storage unit are successful.

-- SQL Servers run Windows 2008 R2 and NBU 7.5.0.4 and are part of a 2-node Microsoft cluster ("SQLNODE1" and "SQLNODE2").
-- SQL Servers also function as SAN Media servers.
-- SQL backups (parent jobs) to BOTH media servers' storage units are failing with status code 636.

-- TCP KeepAliveTime setting on both master and media servers was already set to 900,000 ms (15 mins), previously reduced from the default of 7,200,000 ms (2 hours).

-- Windows Firewall disabled on master server "NBMASTER1" and on the affected media servers "SQLNODE1" and "SQLNODE2".

-- BPBRM on the affected media servers and NBJM on the master server communicate over secondary / backup NICs using HP FlexFabric 10Gb 2-port 554FLB Adapters.
-- These HP 10Gb NICs are connected via a 10Gb SAN switch, so there is no firewall on this 10Gb link.
