How Veritas NetBackup determines if a tape should be frozen or the status of a tape drive should be changed to down, and how to change this behaviour
Article: 100023335
Last Published: 2021-10-19
Product(s): NetBackup & Alta Data Protection
Problem
How Veritas NetBackup determines if a tape should be frozen or the status of a tape drive should be changed to down, and how to change this behaviour
Solution
When a read, write, or position error occurs on a tape, it is difficult to know whether the error is caused by the media or by the drive itself, because the only error reported comes from the operating system and says only "I/O ERROR". To prevent bad media or a bad drive from causing every backup in a given timeframe to fail, NetBackup uses the history of past errors to judge whether a piece of media or a drive is bad.
Each time an I/O error occurs on a read, write, or position, bptm logs the error into an errors file. Each entry consists of the time of the error, the media ID, the drive index, and the type of error.
The errors file is located on each media server in
/usr/openv/netbackup/db/media/errors (Unix)
<install_path>\Veritas\NetBackup\db\media\errors (Windows)
Sample entries in this file are:
05/21/06 04:15:17 A00167 4 WRITE_ERROR
05/26/06 12:37:47 A00168 4 READ_ERROR
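As an illustration only, the sample entries above can be split into the four fields the article describes (time, media ID, drive index, error type). This is a minimal sketch, not NetBackup code: the `ErrorEntry` type and `parse_error_line` function are hypothetical names, and the MM/DD/YY date layout is an assumption inferred from the two sample lines.

```python
from datetime import datetime
from typing import NamedTuple


class ErrorEntry(NamedTuple):
    """One line of the errors file (hypothetical helper type)."""
    timestamp: datetime
    media_id: str
    drive_index: int
    error_type: str


def parse_error_line(line: str) -> ErrorEntry:
    """Parse a line such as '05/21/06 04:15:17 A00167 4 WRITE_ERROR'."""
    date_part, time_part, media_id, drive_index, error_type = line.split()
    # Assumption: dates are MM/DD/YY with a two-digit year.
    timestamp = datetime.strptime(f"{date_part} {time_part}", "%m/%d/%y %H:%M:%S")
    return ErrorEntry(timestamp, media_id, int(drive_index), error_type)


sample = [
    "05/21/06 04:15:17 A00167 4 WRITE_ERROR",
    "05/26/06 12:37:47 A00168 4 READ_ERROR",
]
entries = [parse_error_line(line) for line in sample]
for entry in entries:
    print(entry.timestamp.isoformat(), entry.media_id, entry.drive_index, entry.error_type)
```

A line with a different number of fields raises `ValueError`, which is probably the safest behaviour for a script that only reads the file.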
Each time an entry is made, past entries in the file are scanned to determine if the same media ID or drive has had the same type of error in the past "n" hours, where "n" is the TIME_WINDOW. The default time window is 12 hours. The command to freeze a media or down a drive does not normally occur the first time the error is encountered. There are two other parameters, MEDIA_ERROR_THRESHOLD and DRIVE_ERROR_THRESHOLD, the default value for each being 3.
For example:
- If the same media ID gets write errors three times within the time window, on more than one drive, it is assumed that the media is bad and NetBackup freezes the media.
- If different media IDs get the same error three times within the time window on the same drive, it is assumed the drive is bad and NetBackup places that drive into a "DOWN" state.
- If the same drive gets errors three times within the time window with the same media ID, then NetBackup assumes the media is bad and freezes it.
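The three cases above can be sketched as a small decision function. This is an approximation of the behaviour the article describes, not bptm's actual implementation: `decide_action` and `ErrorEntry` are hypothetical names, the new error is assumed to count toward the threshold, and the media check is placed first because the article states that the media threshold is looked at before the drive threshold.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple, Optional, Tuple


class ErrorEntry(NamedTuple):
    timestamp: datetime
    media_id: str
    drive_index: int
    error_type: str


def decide_action(
    history: List[ErrorEntry],
    new: ErrorEntry,
    time_window_hours: int = 12,
    media_error_threshold: int = 3,
    drive_error_threshold: int = 3,
) -> Optional[Tuple[str, str]]:
    """Return ('FREEZE', media_id), ('DOWN', drive_index), or None."""
    cutoff = new.timestamp - timedelta(hours=time_window_hours)
    # Same error type inside the time window, counting the new entry itself.
    recent = [
        e for e in history + [new]
        if e.error_type == new.error_type and cutoff <= e.timestamp <= new.timestamp
    ]
    media_errors = [e for e in recent if e.media_id == new.media_id]
    drive_errors = [e for e in recent if e.drive_index == new.drive_index]

    # Media is checked first, so a freeze takes precedence over a down.
    if len(media_errors) >= media_error_threshold:
        return ("FREEZE", new.media_id)
    if len(drive_errors) >= drive_error_threshold:
        return ("DOWN", str(new.drive_index))
    return None


t0 = datetime(2006, 5, 21, 4, 0, 0)
hour = timedelta(hours=1)

# Three different tapes fail on drive 4: the drive is the suspect.
drive_history = [
    ErrorEntry(t0, "A00167", 4, "WRITE_ERROR"),
    ErrorEntry(t0 + hour, "A00168", 4, "WRITE_ERROR"),
]
drive_result = decide_action(
    drive_history, ErrorEntry(t0 + 2 * hour, "A00169", 4, "WRITE_ERROR")
)

# The same tape fails on three different drives: the tape is the suspect.
media_history = [
    ErrorEntry(t0, "A00167", 1, "WRITE_ERROR"),
    ErrorEntry(t0 + hour, "A00167", 2, "WRITE_ERROR"),
]
media_result = decide_action(
    media_history, ErrorEntry(t0 + 2 * hour, "A00167", 3, "WRITE_ERROR")
)
print(drive_result, media_result)
```

Because the media count is tested first, three errors for one tape on one drive also resolve to a freeze rather than a down, which matches the third case in the list.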
The TIME_WINDOW, MEDIA_ERROR_THRESHOLD and DRIVE_ERROR_THRESHOLD values are all configurable. If the MEDIA_ERROR_THRESHOLD or DRIVE_ERROR_THRESHOLD value is set to 0, freeze or down occurs on the first error. MEDIA_ERROR_THRESHOLD is looked at first, so if both are set to 0, the freeze of the media overrides the downing of the drive. This configuration is not recommended.
If any one or a combination of the above files exists, bptm logs a message indicating which value is used each time it goes through the algorithm. The log messages are:
"using time window of %d hours"
"using media error threshold of %d"
"using drive error threshold of %d"
where the %d comes from the number obtained from the file.
In general, the freeze and down behavior is designed to aid in getting backups completed successfully. If read errors occur during a restore attempt, freezing of the media has little effect, as it is still necessary to have that same tape to perform the restore (or another copy if it exists). In the case of a restore, downing a bad drive may help, assuming the problem is with the drive.
To view the error threshold and window settings, run the following nbemmcmd command:
Windows
<Install_Path>\Veritas\NetBackup\bin\admincmd>nbemmcmd -listsettings -machinename <machine name>
Unix
/usr/openv/netbackup/bin/admincmd/nbemmcmd -listsettings -machinename <machine name>
Several parameters will display, including the following:
DRIVE_ERROR_THRESHOLD="2"
MEDIA_ERROR_THRESHOLD="2"
TIME_WINDOW="12"
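If the output needs to be checked from a script, the `NAME="value"` lines shown above can be picked out with a short parser. This is a sketch under the assumption that every relevant setting appears on its own line in exactly that form; `parse_settings` is a hypothetical helper, and the sample text is copied from the output above rather than captured from a live system.

```python
import re
from typing import Dict

# Matches lines of the form NAME="value", as in the sample output.
SETTING_RE = re.compile(r'^\s*([A-Z_]+)="([^"]*)"\s*$')


def parse_settings(output: str) -> Dict[str, str]:
    """Collect NAME="value" pairs from nbemmcmd -listsettings output."""
    settings: Dict[str, str] = {}
    for line in output.splitlines():
        match = SETTING_RE.match(line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


sample_output = '''DRIVE_ERROR_THRESHOLD="2"
MEDIA_ERROR_THRESHOLD="2"
TIME_WINDOW="12"
'''

settings = parse_settings(sample_output)
wanted = ("DRIVE_ERROR_THRESHOLD", "MEDIA_ERROR_THRESHOLD", "TIME_WINDOW")
thresholds = {name: int(settings[name]) for name in wanted}
print(thresholds)
```

Lines that do not match the pattern are ignored, so other parameters in the real output with different layouts would simply be skipped.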
To change the error threshold and window settings, run the following nbemmcmd command:
Windows
<Install_Path>\Veritas\NetBackup\bin\admincmd>nbemmcmd -changesetting -machinename <machine name>
Unix
/usr/openv/netbackup/bin/admincmd/nbemmcmd -changesetting -machinename <machine name>
The parameters are specified as follows:
DRIVE_ERROR_THRESHOLD <unsigned integer>
MEDIA_ERROR_THRESHOLD <unsigned integer>
TIME_WINDOW <unsigned integer>
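For scripted changes, the command line can be assembled and validated before it is run. The sketch below only builds the argument list and does not execute anything. It assumes each parameter is passed as `-NAME value` after `-changesetting`, which should be confirmed against the nbemmcmd reference for the installed version; `build_changesetting_command` and the machine name `mediaserver1` are hypothetical.

```python
from typing import List

# Unix path as given in this article.
NBEMMCMD = "/usr/openv/netbackup/bin/admincmd/nbemmcmd"
ALLOWED = ("DRIVE_ERROR_THRESHOLD", "MEDIA_ERROR_THRESHOLD", "TIME_WINDOW")


def build_changesetting_command(machine_name: str, **settings: int) -> List[str]:
    """Build (but do not run) an nbemmcmd -changesetting argument list."""
    command = [NBEMMCMD, "-changesetting", "-machinename", machine_name]
    for name, value in settings.items():
        if name not in ALLOWED:
            raise ValueError(f"unsupported setting: {name}")
        # The article specifies unsigned integers for all three parameters.
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            raise ValueError(f"{name} must be an unsigned integer")
        # Assumption: each setting is passed as "-NAME value".
        command += [f"-{name}", str(value)]
    return command


command = build_changesetting_command(
    "mediaserver1", TIME_WINDOW=12, MEDIA_ERROR_THRESHOLD=3
)
print(" ".join(command))
```

Keeping the command as a list makes it straightforward to hand to `subprocess.run` later without shell quoting concerns.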