failed waiting for child process(34) and media manager - system error occurred (174) experienced when attempting a Replication SLP operation using NetBackup Replication Director feature

Article: 100030879
Last Published: 2015-10-15
Product(s): NetBackup

Problem

A Replication SLP operation that uses the NetBackup Replication Director feature fails with "failed waiting for child process (34)" and "media manager - system error occurred (174)".

These errors are seen with both the SnapVault and SnapMirror replication methods.

Error Message

04/03/2012 23:44:37 - Error (pid=5036) ReplicationJob::WaitForReplicationCommandStatus: Replication failed for backup id netapp1_1330904461: media manager - system error occurred (174)
04/03/2012 23:44:37Replicate failed for backup id netapp1_1330904461 with status 174
04/03/2012 23:44:37 - end operation
failed waiting for child process(34)

Cause

Common causes for Status 174 and 34 are:

  • Incorrect source/destination storage controller relationships/access permissions
  • Name resolution failures, network issues
  • DFM member(s) removed from a dataset (a member is the volume/path being protected in a dataset)
  • Broken or corrupted member relationships within DFM
  • Hosts (storage controllers) being added to the "ignore list" within DFM
  • Non-conformant datasets (suffering from the above symptoms within DFM)

Solution

Consider the following:

The first operation in the SLP is a snapshot to the primary storage unit. The second operation is a replication to a target storage unit. This operation fails with the error:

04/03/2012 23:44:37 - Error (pid=5036) ReplicationJob::WaitForReplicationCommandStatus: Replication failed for backup id netapp1_1330904461: media manager - system error occurred (174)
04/03/2012 23:44:37Replicate failed for backup id netapp1_1330904461 with status 174
04/03/2012 23:44:37 - end operation
failed waiting for child process(34)
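
When several jobs fail, it can help to pull the backup ID and the status codes out of the job details programmatically. The following is a minimal Python sketch, assuming the job details have been saved as text in the same layout as the sample above. The function name summarize_job_log and the regular expressions are illustrative assumptions, not part of NetBackup.

```python
import re

# Patterns are assumptions based only on the sample job details shown above.
BACKUP_ID = re.compile(r"backup id (\w+)")
TRAILING_CODE = re.compile(r"\((\d+)\)\s*$")      # e.g. "... occurred (174)"
WITH_STATUS = re.compile(r"with status (\d+)")    # e.g. "... with status 174"

SAMPLE_LOG = """\
04/03/2012 23:44:37 - Error (pid=5036) ReplicationJob::WaitForReplicationCommandStatus: Replication failed for backup id netapp1_1330904461: media manager - system error occurred (174)
04/03/2012 23:44:37Replicate failed for backup id netapp1_1330904461 with status 174
04/03/2012 23:44:37 - end operation
failed waiting for child process(34)
"""


def summarize_job_log(text):
    """Return (sorted backup IDs, sorted status codes) found in the text."""
    backup_ids = set()
    statuses = set()
    for line in text.splitlines():
        match = BACKUP_ID.search(line)
        if match:
            backup_ids.add(match.group(1))
        for pattern in (TRAILING_CODE, WITH_STATUS):
            match = pattern.search(line)
            if match:
                statuses.add(int(match.group(1)))
    return sorted(backup_ids), sorted(statuses)


print(summarize_job_log(SAMPLE_LOG))
```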

Storage controller relationships/access permissions/name resolution/network issues:

Check the source and destination filer consoles for messages similar to the following (in this case, the message was displayed on the destination filer's console):

Sun Mar  4 21:02:34 GMT [netapp2:replication.dst.err:error]: SnapVault: destination transfer from netapp1:/vol/vol2/- to /vol/vol2_1/NetBackup_1330893853_netapp1_vol2 : cannot connect to source filer.

The snapmirror log:

dst Sun Mar  4 21:02:29 GMT netapp1:/vol/vol2/- netapp2:/vol/vol2_1/NetBackup_1330893853_netapp1_vol2 Request (Initialize)
dst Sun Mar  4 21:02:34 GMT netapp1:/vol/vol2/- netapp2:/vol/vol2_1/NetBackup_1330893853_netapp1_vol2 Abort (cannot connect to source filer)
dst Sun Mar  4 21:03:03 GMT netapp1:/vol/vol2/- netapp2:/vol/vol2_1/NetBackup_1330893853_netapp1_vol2 Request (Retry)
dst Sun Mar  4 21:03:04 GMT netapp1.:/vol/vol2/- netapp2:/vol/vol2_1/NetBackup_1330893853_netapp1_vol2 Abort (cannot connect to source filer)
cmd Sun Mar  4 21:03:09 GMT - netapp2:/vol/vol2_1/NetBackup_1330893853_netapp1_vol2 Stop_command

Other network-related errors can also appear in the snapmirror log:

dst Tue Jun 25 13:01:32 BST netapp2.domain.com:xxx_xxx netapp1:NetBackup_1364xxx_mirror_netapp2_xxx_xxx Request (Update)
dst Tue Jun 25 13:01:40 BST netapp2.domain.com:xxx_xxx netapp1:NetBackup_1364xxx_mirror_netapp2_xxx_xxx Abort (transfer aborted because of network error)
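
On a busy filer the snapmirror log can be long, so filtering it for Abort entries is a quick way to see which relationships are failing and why. The sketch below is an illustration only: it assumes the log has been copied off the filer as text and that each entry has the whitespace-separated layout shown in the excerpts above (type, five timestamp fields, source, destination, event). The function name find_aborts is hypothetical.

```python
import re

# "Abort (<reason>)" as it appears in the excerpts above.
ABORT = re.compile(r"^Abort \((.*)\)$")

SAMPLE_LOG = """\
dst Sun Mar  4 21:02:29 GMT netapp1:/vol/vol2/- netapp2:/vol/vol2_1/NetBackup_1330893853_netapp1_vol2 Request (Initialize)
dst Sun Mar  4 21:02:34 GMT netapp1:/vol/vol2/- netapp2:/vol/vol2_1/NetBackup_1330893853_netapp1_vol2 Abort (cannot connect to source filer)
cmd Sun Mar  4 21:03:09 GMT - netapp2:/vol/vol2_1/NetBackup_1330893853_netapp1_vol2 Stop_command
dst Tue Jun 25 13:01:40 BST netapp2.domain.com:xxx_xxx netapp1:NetBackup_1364xxx_mirror_netapp2_xxx_xxx Abort (transfer aborted because of network error)
"""


def find_aborts(log_text):
    """Return (source, destination, reason) for every Abort entry."""
    aborts = []
    for line in log_text.splitlines():
        # Fields 0-5 are the entry type and timestamp; 6 and 7 are the
        # source and destination; everything after that is the event.
        parts = line.split(None, 8)
        if len(parts) < 9:
            continue
        source, destination, event = parts[6], parts[7], parts[8].strip()
        match = ABORT.match(event)
        if match:
            aborts.append((source, destination, match.group(1)))
    return aborts


for source, destination, reason in find_aborts(SAMPLE_LOG):
    print(f"{source} -> {destination}: {reason}")
```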

Ensure connectivity and correct name resolution between the source and destination filers, via /etc/hosts or DNS.

Check:

  • A mixture of short and long host names, or of upper- and lower-case host names
  • Connectivity on the designated IP addresses, using a basic ping test. For example, the following output shows a lack of connectivity:
    • netapp1> ping 10.x.x.16
      no answer from 10.x.x.16
      netapp1> ping netapp2
      no answer from netapp2
  • That the DNS domain name is correct ("options dns")
  • If using "legacy" for snapmirror.access, that the names in snapmirror.allow resolve correctly (forward and reverse lookups). Using "snapmirror.access host=" and "snapvault.access host=" is preferred.
  • The snapmirror.checkip.enable on/off setting, which determines how the IP addresses and host names in snapmirror.allow are verified (refer to the NetApp documentation for more information)
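
The forward and reverse lookup check in the list above can be sketched in Python as follows. This is only an illustration under stated assumptions: it runs on an administration host, so it reflects that host's resolver rather than the filers' own /etc/hosts or DNS settings, which still need to be checked on each filer. The function name check_name_resolution is hypothetical. The resolver functions are passed as parameters so that the logic can be exercised without a network.

```python
import socket


def check_name_resolution(hostname,
                          forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Return a list of problems found when resolving one host name.

    forward maps a name to an IP address; reverse maps an IP address to
    a (name, aliases, addresses) tuple, as socket.gethostbyaddr does.
    """
    try:
        address = forward(hostname)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return [f"{hostname}: forward lookup failed ({exc})"]
    try:
        reverse_name = reverse(address)[0]
    except OSError as exc:  # socket.herror is a subclass of OSError
        return [f"{hostname}: reverse lookup of {address} failed ({exc})"]
    if reverse_name != hostname:
        # Catches short/long name mixtures and upper/lower case differences.
        return [f"{hostname}: resolves to {address}, "
                f"which maps back to {reverse_name}"]
    return []
```

Calling check_name_resolution("netapp1") and check_name_resolution("netapp2") with the default resolvers returns an empty list for each name that resolves consistently in both directions.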

DFM configuration:

When a NetBackup Replication Director policy is run for the first time (this example uses an NDMP policy), a "Create relationship" job is created in DFM. This job creates and defines the dataset, and the relationship between the primary member and its destination. Subsequent runs of the policy generate "On-demand protection" jobs in DFM instead of the initial "Create relationship" job, provided nothing has been added to the policy's backup selection list.

Ensure protected members are not missing from the dataset:

[root@rhdfm45 /]# dfpm dataset list
Id  Name                               Protection Policy              Provisioning Policy Application Policy Storage Service
--- ---------------------------------- ------------------------------ ------------------- ------------------ ---------------
200 NetBackup_create_import_1372358579 NetBackup Local backups only
202 NetBackup_filer1_vol1_q_958nbmast  NetBackup Mirror, then back up
218 NetBackup_filer1_vol1_q_968nbmast  NetBackup Mirror
132 NetBackupUnprotected

[root@rhdfm45 /]# dfpm dataset list -m 202
Id  Node Name    Dataset Id Dataset Name                      Member Type Name
--- ------------ ---------- --------------------------------- ----------- --------------------
152 Primary data 202        NetBackup_filer1_vol1_q_958nbmast qtree       filer1:/vol1/primary
204 Mirror       202        NetBackup_filer1_vol1_q_958nbmast volume      filer2:/vol1_1
210 Backup       202        NetBackup_filer1_vol1_q_958nbmast volume      filer2:/vol1_2
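
The member check can be automated by parsing the "dfpm dataset list -m" output and comparing the node names found against the ones the protection policy is expected to have. The following Python sketch assumes the output has the six columns shown above and that only the Node Name column contains spaces. The function names and the expected node list are illustrative assumptions taken from this example.

```python
import re

# Id, Node Name (may contain spaces), Dataset Id, Dataset Name,
# Member Type, Name. Header and ruler lines do not start with a digit
# and are therefore skipped.
ROW = re.compile(r"^\s*(\d+)\s+(.+?)\s+(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s*$")

SAMPLE_OUTPUT = """\
Id  Node Name    Dataset Id Dataset Name                      Member Type Name
--- ------------ ---------- --------------------------------- ----------- --------------------
152 Primary data 202        NetBackup_filer1_vol1_q_958nbmast qtree       filer1:/vol1/primary
204 Mirror       202        NetBackup_filer1_vol1_q_958nbmast volume      filer2:/vol1_1
210 Backup       202        NetBackup_filer1_vol1_q_958nbmast volume      filer2:/vol1_2
"""


def parse_members(output):
    """Return one dict per member row in the command output."""
    members = []
    for line in output.splitlines():
        match = ROW.match(line)
        if match:
            member_id, node, dataset_id, dataset, member_type, name = match.groups()
            members.append({
                "id": int(member_id),
                "node": node,
                "dataset_id": int(dataset_id),
                "dataset": dataset,
                "type": member_type,
                "name": name,
            })
    return members


def missing_nodes(members, expected_nodes):
    """Return the expected node names that have no member in the dataset."""
    present = {member["node"] for member in members}
    return [node for node in expected_nodes if node not in present]


# Node names expected for dataset 202 in this example (an assumption).
EXPECTED = ["Primary data", "Mirror", "Backup"]
print(missing_nodes(parse_members(SAMPLE_OUTPUT), EXPECTED))
```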

If a member, such as ID 152 above (Primary data), were removed from the dataset or the environment, then the next time NetBackup runs the Replication Director policy, and consequently the DFM "On-demand protection" job, the operation may fail with Status 174 or 34.

Non-conformant datasets:

Inconsistencies in the dataset may result in a "Nonconformant" error being displayed in the "Status" box of the "Datasets" window in the NetApp Management Console.

To view possible reasons why the dataset is nonconformant, or to look for inconsistencies that are not already apparent, run "dfpm dataset conform -D (dataset ID)". Be sure to use the -D option: it performs a non-destructive dry run that makes no changes. It is wise to engage NetApp for further assistance with a nonconformant dataset.
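
When there are many datasets, the dry-run command can be generated for each one from the "dfpm dataset list" output. The Python sketch below only builds and prints the command lines and does not execute anything. It assumes that each dataset row begins with the numeric ID followed by a name without spaces, as in the listing earlier in this article. The function names are hypothetical, and the only option used is the -D dry-run option described above.

```python
import re

# A dataset row starts with the numeric ID, then the dataset name.
DATASET_ROW = re.compile(r"^\s*(\d+)\s+(\S+)")

SAMPLE_LIST = """\
Id  Name                               Protection Policy
--- ---------------------------------- ------------------------------
200 NetBackup_create_import_1372358579 NetBackup Local backups only
202 NetBackup_filer1_vol1_q_958nbmast  NetBackup Mirror, then back up
218 NetBackup_filer1_vol1_q_968nbmast  NetBackup Mirror
132 NetBackupUnprotected
"""


def dataset_ids(list_output):
    """Return (id, name) pairs found in `dfpm dataset list` output."""
    pairs = []
    for line in list_output.splitlines():
        match = DATASET_ROW.match(line)
        if match:
            pairs.append((int(match.group(1)), match.group(2)))
    return pairs


def dry_run_command(dataset_id):
    """Build the dry-run conformance command; -D is always included."""
    return ["dfpm", "dataset", "conform", "-D", str(dataset_id)]


for dataset_id, name in dataset_ids(SAMPLE_LIST):
    print(f"# {name}")
    print(" ".join(dry_run_command(dataset_id)))
```

The printed commands are intended to be reviewed and then run manually on the DFM server.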

Hosts set to ignore:

Ensure hosts involved in Replication Director operations are not set to ignore.

If a host is ignored and should not be, right-click the host and select "Undo Ignore".

Applies To

DFM 5.0 - OnCommand

Data ONTAP 8.1

NetBackup 7.5
