alert: A rare potential for data loss has been discovered if an incremental backup is restarted using Checkpoint Restart (CPR) feature or if there is a delay starting bpbkar and files are created/changed within an exact window of…

  • Article ID: 100019287
  • Last Published:
  • Product(s): NetBackup


alert: A rare potential for data loss has been discovered if an incremental backup is restarted using the Checkpoint Restart (CPR) feature, or if there is a delay in bpbkar starting for an incremental backup and files are created or changed within an exact window of time.


An issue has been discovered in NetBackup where a window of time may be created that allows new or changed files to be skipped by an incremental backup.  This can be due to a suspend/resume of an incremental backup using the Checkpoint Restart (CPR) feature, or from a delay in the bpbkar process starting on the client (due to network or system/system resource issues).  Bpbkar is the client process used to generate backup images.

When an incremental backup is run, a delta time is calculated to determine the new or changed files that the incremental will need to back up.  The issue described in this document is due to this delta time being based on when the client's bpbkar process starts, instead of when the job goes active on the Master Server.  A padding of 10 seconds exists to compensate for a possible delay, but a difference of more than 10 seconds between the job initially going active on the Master Server and the client starting the backup (by starting the bpbkar process) will create a window where new/changed files could be skipped by an incremental.  Without suspending and resuming a Checkpoint Restart backup, it is unlikely that this issue would be encountered.
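The relationship between the delay and the resulting window can be sketched as follows. This is only an illustration of the arithmetic described above, not NetBackup code; the function name skip_window and its parameters are hypothetical.

```python
from datetime import timedelta

# Padding described in this article: 10 seconds.
PADDING = timedelta(seconds=10)


def skip_window(delay: timedelta, padding: timedelta = PADDING) -> timedelta:
    """Length of the window in which new/changed files could be skipped.

    `delay` is the time between the job going active on the Master Server
    and the client starting (or resuming) the backup via bpbkar.
    A delay within the padding produces no window.
    """
    return max(delay - padding, timedelta(0))


print(skip_window(timedelta(seconds=8)))   # 0:00:00 (covered by the padding)
print(skip_window(timedelta(seconds=45)))  # 0:00:35
```

A delay of 10 seconds or less yields an empty window, which is why the issue is unlikely without a suspend/resume or a significant bpbkar delay.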

What is Affected:
The following NetBackup versions are affected, on all supported Master Server platforms*:
-  NetBackup Server/Enterprise Server 6.0 GA through 6.0 Maintenance Pack 7 (MP7)
-  NetBackup Server/Enterprise Server 6.5 through 6.5.2A

* Versions prior to 6.0 (on supported platforms) are also affected by this issue if a delay condition is experienced with bpbkar (see below).  Please note that these versions have reached the End of Engineering Support date and no future patches are planned for those versions.  It is recommended that affected users of these older versions apply one of the workarounds below, then upgrade to NetBackup 6.x as soon as the formal resolution is available.

How to Determine if Affected:  
Data loss may occur under either of the following scenarios:

Scenario 1:  A Checkpoint Restart backup is suspended or fails, and is then resumed.
Data loss has been known to occur if ALL of the following conditions are met:
- The Master Server is running one of the NetBackup versions listed above on any supported Operating System.
- Incremental backups are in use.
- Checkpoint Restart is in use for a given backup policy.
- An incremental backup from this policy is suspended or fails, and is then resumed later.  The elapsed time between the initial active time of this backup and the time it is resumed (minus 10 seconds) is the duration of the window where new/changed files could be skipped.
- Between the time that the last (base) backup started and the time this window ends, files are created or changed on the client.
- One or more of these changed/created files were not backed up before the backup was suspended.
- Files affected by this window are not modified again before the incremental backup is able to back them up.
- These changed files are required for restore before a subsequent full or incremental backup is run.

Example of the issue occurring due to the use of Checkpoint Restart:
- The NetBackup Master Server is running version 6.5.2.  A NetBackup client is running 6.x.
- One or more policies are using Checkpoint Restart
- A full backup starts at 2:00am on the client for one of these policies and completes successfully.
- Later that day at 6:00pm, an Incremental backup is run for this same client and policy using Checkpoint Restart
- This backup is suspended at 7:15:27pm.  It is resumed at 7:45:49pm, 1hr 45min and 49 seconds after the initial start time.
- Within 1hr 45min and 39 seconds of the last backup (the "base" backup) starting (between 2:00am and 3:45:39am), files were created or changed on the client.
- One or more of these changed/created files were not backed up before the backup was suspended.
- The same file(s) are not modified again before the resumed incremental backup is able to back them up.
- The next cumulative incremental or full backup will protect the files that were new/changed during the window, but any unique file revisions from the 2:00am to 3:45:39am window will not be available.
Additionally, some cumulative backups may still back up files that would otherwise be skipped due to this issue, because of how cumulative incrementals operate.
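
The times in the example above can be checked with a short calculation. This is a minimal sketch of the arithmetic only; the calendar date is arbitrary and the variable names are hypothetical.

```python
from datetime import datetime, timedelta

PADDING = timedelta(seconds=10)  # padding described in this article

# Times from the example; the calendar date itself is arbitrary.
base_start = datetime(2009, 1, 1, 2, 0, 0)       # full (base) backup starts at 2:00am
incr_active = datetime(2009, 1, 1, 18, 0, 0)     # incremental goes active at 6:00pm
incr_resumed = datetime(2009, 1, 1, 19, 45, 49)  # incremental resumed at 7:45:49pm

# Window length: elapsed time from initial active time to resume, minus padding.
window_length = (incr_resumed - incr_active) - PADDING
# The window is measured from the start of the base backup.
window_end = base_start + window_length

print(window_length)      # 1:45:39
print(window_end.time())  # 03:45:39
```

Files created or changed between base_start and window_end are the ones at risk if they were not already backed up before the suspend.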

Scenario 2: Bpbkar is delayed on the client (affects UNIX clients only by default):
To a much lesser extent, it is possible that this issue could occur without the use of Checkpoint Restart, if a UNIX client's bpbkar process is delayed more than 10 seconds in starting after the job goes active on the Master Server.  This delay would require an outstanding system or network issue and would likely create a much smaller window (i.e. seconds) for affected files.  To further illustrate the unlikelihood, the system/network problems would have to be timed so that they only delay the initial start of bpbkar and do not cause the backup job to fail.  Windows clients are not affected by this by default, as they use the Archive Bit for incremental backups and also have an additional 60 minute overlap for delay.

Similarly, in the case that the client's system time is manually moved forward between the base and incremental backups, a time window would be created for that time difference, minus 10 seconds.

Formal Resolution:
The formal resolution to this issue (Etrack 1282984) is included in the following patch release:
  • NetBackup 6.5 Release Update 3 (6.5.3)
This release is available at the support web site at:

A formal resolution to this issue is tentatively scheduled for inclusion in the following NetBackup release:
  • NetBackup Server/Enterprise Server 6.0 Maintenance Pack 8 (MP8), tentatively scheduled for release in September of 2009.
The Checkpoint Restart issue (Scenario 1) will no longer be present in these versions.  
Additionally, in these releases, a bpbkar delay of up to 5 minutes will be allowed by default.  Therefore, bpbkar would have to be delayed by more than 5 minutes in starting, without causing the backup to fail, for Scenario 2 to be seen.
Furthermore, this allowed delay may be increased by placing a touch file on the Master Server:
UNIX: /usr/openv/netbackup/DeltaOffset
Windows: <install_path>\NetBackup\DeltaOffset
This allows configuring a value from 0-7200 seconds (0 to 2 hours) of additional time to allow for bpbkar to start.
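
As a rough sketch of how the default allowance and a DeltaOffset value might combine: this article does not describe the format of the DeltaOffset file or exactly how NetBackup applies the value, so the additive behavior below is an assumption, and the function name allowed_delay is hypothetical.

```python
DEFAULT_ALLOWED_DELAY = 300  # 5 minutes, the default stated in this article
MAX_DELTA_OFFSET = 7200      # 2 hours, the upper bound stated in this article


def allowed_delay(delta_offset: int = 0) -> int:
    """Total bpbkar start delay (in seconds) tolerated before a window opens.

    ASSUMPTION: the DeltaOffset value is added to the 5-minute default.
    The article only says it gives "additional time".
    """
    if not 0 <= delta_offset <= MAX_DELTA_OFFSET:
        raise ValueError("DeltaOffset must be between 0 and 7200 seconds")
    return DEFAULT_ALLOWED_DELAY + delta_offset


print(allowed_delay())      # 300
print(allowed_delay(1800))  # 2100
```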

Please note that the formal resolution will prevent this issue with future backups, but cannot recover data from previous backups affected by this issue.

Until the formal resolution is available for this issue, Veritas strongly recommends implementing the Workaround described in the next section of this article.

Workaround:
To prevent this issue from occurring due to a checkpoint restart resume, the feature may be disabled:
1. Identify the policies that use Checkpoint Restart:
-  Within the Windows or Java Administration Console, Expand NetBackup Management and click Policies.
-  Observe the policy summary screen in the far right pane (in the Java Console, it may be necessary to click Summary of all Policies).
-  Sort by "Checkpoint Interval (minutes)".
-  Identify each policy that has a number greater than 0 defined for "Checkpoint Interval (minutes)".
2. Disable Checkpoint Restart for all policies:
-  Double click on a policy that has a Checkpoint Interval defined that is greater than 0.
-  In the Policy Attributes screen, uncheck the "Take checkpoints every" check box.
-  Click OK.  Repeat this process for any other policies that still display a Checkpoint Interval that is greater than 0.
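
The selection rule in step 1 (a Checkpoint Interval greater than 0) can be expressed as a simple filter. This is an illustrative sketch only: the PolicySummary record and the example rows are hypothetical and do not correspond to any NetBackup export format or API.

```python
from dataclasses import dataclass


@dataclass
class PolicySummary:
    # Hypothetical record: one per row of the policy summary pane.
    name: str
    checkpoint_interval_minutes: int  # 0 means no checkpoint interval is defined


def policies_using_checkpoint_restart(policies):
    """Return the names of policies whose Checkpoint Interval is greater than 0."""
    return [p.name for p in policies if p.checkpoint_interval_minutes > 0]


# Made-up example rows, for illustration only.
summary = [
    PolicySummary("policy_a", 0),
    PolicySummary("policy_b", 15),
    PolicySummary("policy_c", 60),
]
print(policies_using_checkpoint_restart(summary))  # ['policy_b', 'policy_c']
```

The workaround is complete when this list is empty for the current policy summary.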

Veritas Strongly Recommends the Following Best Practices:
1. Always perform a Full backup prior to and after any changes to your environment, including manual changes to the system time.
2. Always make sure that your environment is running the latest version and patch level.


