A potential for skipped duplications has been discovered in NetBackup 6.5.4 when using Vault, or in NetBackup versions 6.5.1 - 6.5.3 if a specific vltrun binary has been applied.

Problem

A potential for skipped duplications has been discovered in NetBackup 6.5.4 when using Vault, or in NetBackup versions 6.5.1 - 6.5.3 if a specific vltrun binary has been applied. Vault may end with status code 0, 306 or 308 if duplication rules are configured. If a Status 0 occurs due to this issue, duplications are skipped without indication.

Error Message

vltrun@AnalyzeAndClearDupJobs^1436: Duplication Step completed with StatusCode=0
vltrun@AnalyzeAndClearDupJobs^1434: Duplication Step completed with StatusCode=306
vltrun@AnalyzeAndClearDupJobs^891: Duplication Step completed with StatusCode=308

Solution

Introduction:
A potential for skipped duplications has been discovered in NetBackup Server / Enterprise Server 6.5.4 when using Vault, or in versions 6.5.1 - 6.5.3 if a specific vltrun binary has been applied.  This issue only occurs when duplication rules are configured.  If duplication rules are used for any of the NetBackup versions described in this document, it is suggested to apply a fix or workaround immediately.

Where Vault is configured to duplicate images, images found to be eligible for duplication are grouped into batches according to the Media Servers for which duplication rules are configured. For each Media Server, images are grouped into batches in a manner that will optimize resource consumption and manipulation for better efficiency.

Under certain conditions, some duplication batches may be skipped and the Vault parent job may end with status code 0, 306 or 308. In the event of the Vault parent job ending incorrectly with Status Code 0 due to this issue, duplications will appear to have been successful but will have been skipped.


What is affected:
The following versions of NetBackup are affected on all supported Master Server platforms:
 
- NetBackup Server / Enterprise Server 6.5.4
 
- This issue also affects environments running NetBackup Server / Enterprise Server 6.5.1 - 6.5.3 if an affected vltrun binary has been applied.  In this case, please contact Symantec Technical Support to receive an updated binary. If no vltrun engineering binaries have been applied to these versions, then the environment is not affected by this issue.
 
If Emergency Engineering Binaries (EEBs) for vltrun were used for any of the following 3 Etracks and were obtained/applied prior to August 12, 2009, they should be replaced with an updated EEB by calling support immediately.
Etrack 1461578 (6.5.1), Etrack 1513041 (6.5.2a), Etrack 1474130 (6.5.3).


How to determine if affected:
Duplications can be skipped if ALL of the following conditions are met:
• The Master Server is running NetBackup Server / Enterprise Server 6.5.4 with NetBackup Vault 6.5.4, OR is running NetBackup Vault 6.5.1 - 6.5.3 with an affected vltrun binary installed.
• Vault is configured and used in the environment to duplicate images.
• Duplication rules have been configured in the Vault Profile (in the Duplication tab using the Advanced Configuration feature).
• Upon capturing and batching the images to duplicate, one or both of the following conditions are satisfied:
(i) Vault has at least one duplication rule with 0 batches to do
(ii) Vault has at least one duplication rule with fewer batches to do than there are WRITE drives configured for that duplication rule, AND there are NO EXTRA batches to do. Extra batches occur when Vault finds images to duplicate that belong to Media Servers for which no duplication rule is defined.


How to determine if condition (i) or (ii) is met:
Use a verbose detail.log to determine whether (i), (ii), or both are met.
 
- Verbosity can be enabled through the NetBackup Administration Console, as follows:
NetBackup management > Host Properties > Master Server > Master Server properties > Logging > Vault logging level > set to 5
 
- The detail.log can be found under the session directory:
/usr/openv/netbackup/vault/sessions/<vault_name>/sidXXX  (UNIX)
<install_path>\VERITAS\NetBackup\vault\sessions\<vault_name>\sidXXX (Windows)
 
where XXX is the vault session number.
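When several vaults and sessions need to be checked, the detail.log files can be located programmatically. The following is a minimal sketch, assuming the session layout shown above (<vault_name>/sidXXX/detail.log under the sessions directory). The function name find_detail_logs and the constant UNIX_SESSIONS_ROOT are hypothetical helpers for illustration, not part of NetBackup.

```python
from pathlib import Path

# Default UNIX sessions root, taken from the path shown in this article.
# Pass a different root for Windows or non-default install paths.
UNIX_SESSIONS_ROOT = Path("/usr/openv/netbackup/vault/sessions")


def find_detail_logs(sessions_root=UNIX_SESSIONS_ROOT):
    """Return (vault_name, session_dir_name, log_path) for each detail.log found.

    Assumes the layout <sessions_root>/<vault_name>/sidXXX/detail.log.
    Session directories without a detail.log are ignored.
    """
    root = Path(sessions_root)
    found = []
    for log_path in sorted(root.glob("*/sid*/detail.log")):
        session_dir = log_path.parent
        found.append((session_dir.parent.name, session_dir.name, log_path))
    return found
```

On a Windows Master Server, the same function could be called with the <install_path>\VERITAS\NetBackup\vault\sessions directory as its argument.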
 

Case (i)
The detail.log will show a line similar to the following:
<4> vltrun@LogDuplicationBatches^1420: DupRule for MS=ms1 #Batches=0

Case (ii)
The detail.log will show lines similar to the following:
  • A duplication rule that has fewer batches to do than there are write drives configured for that rule:
<4> vltrun@save_vault_conf()^1440: DUPLICATION_ITEM BackupServer=ms5 AltReadHost= ReadDrives=4 WriteDrives=4
..
<4> vltrun@LogDuplicationBatches^1440: DupRule for MS=ms5 #Batches=1
<4> vltrun@LogDuplicationBatches^1440: TapeBatchInfo:
<4> vltrun@LogDuplicationBatches^1440: MS=ms5 #IMGs=2 RL=3 MDA=SU2326 #MDA=2 SZ=66564448
 
  • AND there are NO EXTRA batches:
<4> vltrun@LogDuplicationBatches^1440: Logging 0 EXTRA Batch(es)

For simplicity, we will refer to the duplication rule satisfying (i) or (ii) as the "offending" duplication rule.
The scenario examples that follow use case (i) only, but they apply equally to case (ii).
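The checks for case (i) and case (ii) can be sketched as a small log scan. This is an illustrative sketch only: it assumes the detail.log lines have the same shape as the excerpts in this article (DUPLICATION_ITEM, DupRule for MS=, and Logging N EXTRA Batch(es)), and the function name find_offending_rules is hypothetical. It is not a Symantec-supplied tool, and real logs may contain variations that it does not handle.

```python
import re

# Patterns modeled on the detail.log excerpts shown in this article.
DUP_ITEM_RE = re.compile(r"DUPLICATION_ITEM BackupServer=(\S+).*?WriteDrives=(\d+)")
DUP_RULE_RE = re.compile(r"DupRule for MS=(\S+) #Batches=(\d+)")
EXTRA_RE = re.compile(r"Logging (\d+) EXTRA Batch\(es\)")


def find_offending_rules(log_text):
    """Return a list of (media_server, case) pairs, where case is "i" or "ii"."""
    write_drives = {}  # media server -> configured write drives
    batches = {}       # media server -> batches to do (kept in log order)
    extra = None       # number of EXTRA batches, if logged

    for line in log_text.splitlines():
        m = DUP_ITEM_RE.search(line)
        if m:
            write_drives[m.group(1)] = int(m.group(2))
            continue
        m = DUP_RULE_RE.search(line)
        if m:
            batches[m.group(1)] = int(m.group(2))
            continue
        m = EXTRA_RE.search(line)
        if m:
            extra = int(m.group(1))

    offending = []
    for ms, count in batches.items():
        if count == 0:
            # Case (i): a rule with 0 batches to do.
            offending.append((ms, "i"))
        elif extra == 0 and ms in write_drives and count < write_drives[ms]:
            # Case (ii): fewer batches than write drives, and no EXTRA batches.
            offending.append((ms, "ii"))
    return offending
```

An empty result from this sketch only means that none of the patterns above matched an offending rule in the text supplied; it should not be taken as confirmation that a session was unaffected.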


Scenarios:

Status Code 308
Status Code 308 will occur if the FIRST duplication rule is the offending rule. In the example below, the first rule satisfies (i) above:

04:30:55.215 [2101372] <4> vltrun@LogDuplicationBatches^893: DupRule for MS=ms1 #Batches=0
04:30:55.215 [2101372] <4> vltrun@LogDuplicationBatches^893: DupRule for MS=ms2 #Batches=27
04:30:55.217 [2101372] <4> vltrun@LogDuplicationBatches^893: DupRule for MS=ms3 #Batches=23
....
Since ms1 is first in the list of duplication rules and has 0 batches to do, duplication will abort immediately with status 308, ignoring the other duplication rules (for ms2 and ms3) that do have batches to do:

04:30:55.253 [2101372] <2> vltrun@RunDuplicationBatches^893: No more batches for this DupRuleItem MS=>ms1<
04:30:55.253 [2101372] <4> vltrun@RunDuplicationBatches^893: Fired initial set=0 of Dup Jobs
04:30:55.253 [2101372] <2> vltrun@AnalyzeAndClearDupJobs^893: Dup Return value: 0
04:30:55.274 [2101372] <4> vltrun@AnalyzeAndClearDupJobs^893: Total Dup Stats: 0 of 0 images duplicated successfully
04:30:55.274 [2101372] <4> vltrun@AnalyzeAndClearDupJobs^893:Duplication Step completed with StatusCode=308


Status Code 306
Status Code 306 will occur if:
- there is at least one offending duplication rule (other than the FIRST rule) which satisfies (i) or (ii) above;
- one or more duplication rules earlier in the list (before the offending duplication rule) have duplications performed;
- at least one image in these duplications fails with an error.

For example, the offending rule satisfies (i) above:

10:20:46.180 [1588] <4> vltrun@LogDuplicationBatches^1434: DupRule for MS=ms1 #Batches=4
10:20:46.182 [1588] <4> vltrun@LogDuplicationBatches^1434: DupRule for MS=ms2 #Batches=4
10:20:46.183 [1588] <4> vltrun@LogDuplicationBatches^1434: DupRule for MS=ms3 #Batches=0
10:20:46.183 [1588] <4> vltrun@LogDuplicationBatches^1434: DupRule for MS=ms4 #Batches=2

Here (4 + 4 =) 8 duplication jobs would be started.
When Vault detects that the duplication rule for ms3 contains 0 batches to do, it will NOT consider the duplication batches for ms4. It will completely ignore them. It will, however, continue with duplications for ms1 and ms2 until all 8 batches have been done.

If any of the images in these 8 duplication jobs fail for whatever reason (regardless of error code), the Vault parent job will end with status 306:

13:47:11.262 [1588] <4> vltrun@AnalyzeAndClearDupJobs^1434: Total Dup Stats: 52 of 56 images duplicated successfully
13:47:11.273 [1588] <4> vltrun@AnalyzeAndClearDupJobs^1434: Duplication Step completed with StatusCode=306
..
13:48:20.013 [1588] <16> vltrun@VltSession::lock_and_operate^1434 OP_STEP=duplicate_bymid FAILED
13:48:20.032 [1588] <16> vltrun@VltSession::lock_and_operate^1434 FAILed NB_EC=306 NB_MSG=vault duplication partially succeeded


Status Code 0
Status Code 0 will occur if:
- there is at least one offending duplication rule (other than the FIRST rule) which satisfies (i) or (ii) above;
- one or more duplication rules earlier in the list (before the offending duplication rule) have duplications performed;
- ALL images in these duplications are successfully duplicated.

For example, the offending rule satisfies (i) above:

11:20:46.180 [1499] <4> vltrun@LogDuplicationBatches^1436: DupRule for MS=ms1 #Batches=3
11:20:46.182 [1499] <4> vltrun@LogDuplicationBatches^1436: DupRule for MS=ms2 #Batches=2
11:20:46.183 [1499] <4> vltrun@LogDuplicationBatches^1436: DupRule for MS=ms3 #Batches=0
11:20:46.183 [1499] <4> vltrun@LogDuplicationBatches^1436: DupRule for MS=ms4 #Batches=5
....
14:47:11.262 [1499] <4> vltrun@AnalyzeAndClearDupJobs^1436: Total Dup Stats: 56 of 56 images duplicated successfully
14:47:11.273 [1499] <4> vltrun@AnalyzeAndClearDupJobs^1436: Duplication Step completed with StatusCode=0

Here (3 + 2 =) 5 duplication jobs would be started.  When Vault detects that the duplication rule for ms3 contains 0 batches to do, it will NOT consider the duplication batches for ms4; it will ignore them. It will, however, continue with duplications for ms1 and ms2 until all 5 batches have been done.  If all images from these 5 batches are duplicated successfully, a Status 0 (successful) will be reported even though NO duplications for ms4 are run.
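The three scenarios can be summarized as a simplified model. The sketch below covers case (i) only and encodes nothing more than the behavior described in this article; it is not the actual vltrun logic, and the function name predict_vault_outcome and its parameters are hypothetical.

```python
def predict_vault_outcome(rule_batches, any_image_failed=False):
    """Model the affected behavior for case (i), as described in this article.

    rule_batches: ordered list of (media_server, batch_count) pairs, in the
                  same order as the DupRule lines in detail.log.
    any_image_failed: True if any image in the duplications that did run failed.

    Returns (status_code, ran, skipped), or None if no rule has 0 batches
    (in which case this model does not apply).
    """
    offending_index = next(
        (i for i, (_, count) in enumerate(rule_batches) if count == 0), None
    )
    if offending_index is None:
        return None

    # Rules before the offending rule are duplicated as normal.
    ran = [ms for ms, _ in rule_batches[:offending_index]]
    # Rules after the offending rule that had batches to do are ignored.
    skipped = [ms for ms, count in rule_batches[offending_index + 1:] if count > 0]

    if offending_index == 0:
        status = 308  # first rule is the offending rule
    elif any_image_failed:
        status = 306  # at least one image failed in the duplications that ran
    else:
        status = 0    # reported as successful although rules were skipped
    return status, ran, skipped
```

Applied to the Status Code 0 example above, predict_vault_outcome([("ms1", 3), ("ms2", 2), ("ms3", 0), ("ms4", 5)]) returns status 0 with ms1 and ms2 run and ms4 skipped, which is the situation in which skipped duplications go unreported.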


Workaround:
A fix for this issue (for NetBackup 6.5.4 only) has been posted and can be obtained by accessing this TechFile:
 http://support.veritas.com/docs/333127

Updated fixes for 6.5.1 - 6.5.3 servers that have applied an affected vltrun binary can be obtained by calling Technical Support.

In addition, there are two possible workarounds, as follows:

1. Remove all duplication rules from the Vault configuration. Configure duplication to go only to one Media Server (where the destination storage unit resides) regardless of the source Media Servers on which the images were backed up. This will enable all images captured by Vault to be duplicated by this one Media Server.  An alternate read host may be used to prevent duplicating over the network, providing the Media Servers have access to the same robot.
Note that configuring duplications to be performed by a single Media Server may not allow all duplications to finish within the desired time frame.

2. Configure a separate Vault profile for each Media Server of interest (i.e. for which a duplication rule would have been configured) to capture images of that Media Server and to duplicate them either to the Media Server's own destination storage unit, or to a storage unit on another Media Server. In the case of the latter, an alternate read host should be used to prevent duplicating over the network, providing the Media Servers have access to the same robot.


Formal Resolution:
The formal resolution to this issue (Etrack 1750001) is currently scheduled to be included in the following release:
- NetBackup 6.5 Release Update 5 (6.5.5), scheduled for release in Q4 of calendar year 2009.

When NetBackup 6.5.5 is released, please visit the following link for download and readme information:
 http://www.symantec.com/business/support/overview.jsp?pid=15143

Please note that a formal resolution will prevent future duplications from being skipped due to this issue, but cannot recover data from duplications that were not run due to this issue.


Best Practices:
Symantec strongly recommends the following best practices:
1. Always perform a full backup prior to and after any changes to your environment
2. Always make sure that your environment is running the latest version and patch level
3. Perform periodic "test" restores
4. Subscribe to technical articles


How to Subscribe to Email Notification:
Article Subscription:
Subscribe to this TechNote for any updates that are made to this article, by clicking on the following link:
 http://maillist.support.veritas.com/notification.asp?doc=327941


Software Alerts:
If you have not received this from the Symantec Technical Support Email Notification Service, please click on the following link to subscribe to future Notifications:
 http://maillist.entsupport.symantec.com/subscribe.asp
 

 
