Configuring and troubleshooting NetBackup database agents with snapshots

Article: 100037947
Last Published: 2015-09-14
Product(s): NetBackup & Alta Data Protection

Description

1.     Identification of the type of snapshot required.

 

The snapshot type is chosen based on the customer’s requirements and the source environment. The following points are considered when deciding on the type of snapshot.

All snapshots fall into two broad categories:

  • Copy-on-Write (COW), space optimized
  • Mirror based

A copy-on-write based snapshot is suitable if:

  • Storage space is limited (only changed blocks are copied).
  • Frequency of backups is high.
  • Retention period for snapshots is short (for instant recovery).
  • Only a small amount of data changes per snapshot cycle.
  • Local client snapshot backup is desired.  

A mirror-based snapshot is suitable if:

  • Storage space is not a concern.
  • Frequency of backups is low (each backup requires a dedicated snapshot volume).
  • Retention period for snapshots is long (for instant recovery).
  • A large amount of source data changes per snapshot cycle.
  • Off-host alternate client snapshot backup is desired.  
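The criteria in the two lists above can be turned into a rough checklist. The following Python sketch is illustrative only: the `Requirements` fields, the function name and the equal-weight majority vote are assumptions made for this example, not a NetBackup feature. A real decision should weigh each criterion against the actual environment.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Hypothetical summary of the environment (field names are illustrative)."""
    storage_limited: bool         # storage space is limited
    high_backup_frequency: bool   # backups run frequently
    long_retention: bool          # snapshots are retained for a long time
    large_change_rate: bool       # a lot of data changes per snapshot cycle
    off_host_backup: bool         # off-host alternate client backup is desired


def suggest_snapshot_category(req: Requirements) -> str:
    """Count how many of the five criteria favour copy-on-write.

    Assumption: every criterion carries equal weight. With five
    criteria there can never be a tie.
    """
    cow_votes = sum([
        req.storage_limited,
        req.high_backup_frequency,
        not req.long_retention,
        not req.large_change_rate,
        not req.off_host_backup,
    ])
    return "copy-on-write" if cow_votes >= 3 else "mirror"
```

For example, an environment with limited storage, frequent backups, short retention, a low change rate and local client backups satisfies all five copy-on-write criteria.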

Both categories of snapshot can be either hardware based or software based.

If the source volume resides on one of the hardware arrays listed below that support snapshots, hardware snapshots can be used. The available hardware snapshot options are as follows:

  • EMC CLARiiON arrays provide EMC_CLARiiON_SnapView (_clone and _snapshot)
  • EMC Symmetrix arrays provide EMC_TimeFinder (_mirror, _clone and _snap)
  • HP EVA arrays provide Vsnap, snapshot and snapclone types of snapshot
  • IBM DS6000 and DS8000 arrays provide IBM_DiskStorage_FlashCopy
  • IBM DS4000 series arrays provide IBM_StorageManager_FlashCopy

Below are the legacy (pre-VxFI) snapshot methods for hardware arrays:

  • BusinessCopy for mirror snapshots on HP XP series arrays
  • ShadowImage for Hitachi Data Systems disk arrays
  • TimeFinder for EMC Symmetrix/DMX series arrays with SYMCLI

If software based snapshots are needed irrespective of array type, the following options are available:

  • VxFS_checkpoint
  • VxFS_snapshot
  • VxVM
  • FlashSnap (Veritas Volume Manager based snapshot for off-host backups)
  • NAS_Snapshot
  • VSS (Windows)
  • nbu_snap
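The hardware and software options listed above can be kept in a small lookup so that the candidates for a given source volume are easy to enumerate. This is a minimal Python sketch: the dictionary keys, the function name and the expanded method spellings (for example `EMC_TimeFinder_mirror`) are assumptions derived from the lists in this article. The exact method names to select in a policy should be confirmed in the Snapshot Client guide. The legacy methods are left out for brevity.

```python
from typing import Dict, List, Optional

# Hardware snapshot options keyed by array family, as listed in this article.
HARDWARE_METHODS: Dict[str, List[str]] = {
    "EMC CLARiiON": ["EMC_CLARiiON_SnapView_clone", "EMC_CLARiiON_SnapView_snapshot"],
    "EMC Symmetrix": ["EMC_TimeFinder_mirror", "EMC_TimeFinder_clone", "EMC_TimeFinder_snap"],
    "HP EVA": ["Vsnap", "snapshot", "snapclone"],
    "IBM DS6000": ["IBM_DiskStorage_FlashCopy"],
    "IBM DS8000": ["IBM_DiskStorage_FlashCopy"],
    "IBM DS4000": ["IBM_StorageManager_FlashCopy"],
}

# Software snapshot options that do not depend on the array type.
SOFTWARE_METHODS: List[str] = [
    "VxFS_checkpoint",
    "VxFS_snapshot",
    "VxVM",
    "FlashSnap",
    "NAS_Snapshot",
    "VSS",
    "nbu_snap",
]


def candidate_methods(array_family: Optional[str] = None) -> List[str]:
    """Return hardware options for the array (if it is listed) plus all software options."""
    hardware = HARDWARE_METHODS.get(array_family, []) if array_family else []
    return hardware + SOFTWARE_METHODS
```

An array family that is not in the lookup simply yields the software options, which matches the "irrespective of array type" wording above.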

Once the type of snapshot is identified, continue with the respective configuration and validation steps described in the NetBackup 7.1 Administrator’s Guide for Snapshot Clients (linked below).

 

For hardware based snapshots, refer to "Configuration of snapshot methods for disk arrays" in the NetBackup 7.1 Administrator’s Guide for Snapshot Clients (chapter 10, page 167).

Software snapshots work regardless of the underlying array type, provided that a suitable software snapshot provider is available. The configuration steps for the software snapshot providers are explained in the NetBackup 7.1 Administrator’s Guide for Snapshot Clients (chapter 8, page 145).


2.      Disk/volume setup and configuration at the OS level.

  • Allocate disk LUNs from the desired storage array to the systems as required. (Refer to the respective storage array or hardware manuals for the procedure.)
  • Format the allocated LUNs.
  • Initialize the LUNs into the respective volume manager stack (LVM or VxVM).

    Example:

    For HP LVM: use commands such as vgcreate and pvcreate. (Refer to the operating system's manual pages for the respective commands.)

    For VxVM: use commands such as vxdg, vxassist, vxvol and mkfs. (Refer to the VxVM Administrator’s Guide.)

  • Have at least 3 volumes/file systems allocated:
  1. Source volume containing the data files
  2. Snapshot volume (for mirror/COW based snapshots)
  3. Separate volume for database executables, redo logs and other files

Once the volumes are configured, install the database files. Note that the source volume must contain only data files.


3.     Database specific configurations:

All the database management systems mentioned below provide application programming interfaces (APIs) that NetBackup uses to perform backup and restore operations. These APIs are used by the respective database agent modules, such as the Oracle agent and the SAP agent.

SAP, Oracle and DB2 provide two ways to back up files: stream-based backup and file-based backup.

In a stream-based backup, the DBMS is responsible for moving the backup data as streams. When taking snapshots, however, NetBackup must control the movement of data. For this purpose these database systems support a special method called the “proxy method”, which allows NetBackup to control the movement of data.

Each database agent has its own log directory where it logs all the relevant operations that have been performed.

Below are a few key points to consider when taking snapshot based backups of various databases.

  • DB2:
      - The “proxy copy” method must be used for taking snapshots.
      - Use the 'bpdb2proxy' command to perform a snapshot based backup of DB2 databases. (Example: "bpdb2proxy -backup -d sample -s 3 -n 0")
      - Symbolic links (if any) must point to files on the same volume/file system.
      - Snapshot backups do not back up all database objects. The backup configuration must also include policies to perform file-based and stream-based backups. DB2 does not support proxy backups of transaction logs.
      - Snapshot backups must be initiated from a backup script. A template cannot be used to initiate a snapshot backup.

  • SAP:
      - The "util_file_online" option of brbackup must be used to perform non-RMAN snapshot based backups.
      - For RMAN based SAP backups, enable proxy based backup by setting the environment variable: rman_proxy = yes
      - Most of the points listed in the Oracle section also apply to SAP when it is used with RMAN.
      - To perform a rollback restore, set the environment variable: SAP_RESTORE=rollback
      - Snapshots are not supported for the MaxDB back end.

  • Oracle:
      - The RMAN BACKUP command is used to initiate stream-based backups of data files, archived redo logs and control files. These backups do not use snapshots.
      - If the PROXY keyword is added to the RMAN BACKUP command, the database, tablespaces or data files can be backed up using a snapshot, provided that the policy is configured appropriately.
      - For control files and archived redo logs, Oracle RMAN performs conventional stream-based backups only. NetBackup for Oracle must use stream-based backups for control files and archived redo logs even when Snapshot Client methods are used for the other database objects. However, Oracle 10g extends RMAN functionality to allow the PROXY keyword to be used with the RMAN BACKUP ARCHIVELOG command.
      - Snapshot backups must be initiated from an RMAN script. A template cannot be used to initiate a snapshot backup.

  • MS-SharePoint:
      - Only VSS snapshots are supported.
      - Snapshots are only used for GRT backups.

  • Lotus:
      - Snapshots are not supported for standard Lotus databases (plain NSF files).
      - However, Lotus databases can be stored in a DB2 database at the back end. The snapshot capabilities of the NetBackup DB2 agent can be used in such an environment.

  • Informix:
      - Snapshots are not supported for Informix database backups.
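Because DB2 and Oracle snapshot backups must be started from a script rather than a template, it can help to generate the invocation in one place. The Python sketch below is an illustration under stated assumptions. The `bpdb2proxy` arguments are copied verbatim from the example in this article, and their meaning is not interpreted here. The RMAN text is a bare-bones script that uses the PROXY keyword with an SBT channel. The function names, the channel name `ch00` and the default values are hypothetical, and a production script would also need the NetBackup-specific settings described in the agent guides.

```python
from typing import List


def build_bpdb2proxy_backup(database: str, s_value: int = 3, n_value: int = 0) -> List[str]:
    """Build the argument list for a DB2 proxy (snapshot) backup.

    The flags -backup, -d, -s and -n and the defaults 3 and 0 mirror the
    article's example; consult the NetBackup for DB2 guide for their meaning.
    """
    return ["bpdb2proxy", "-backup", "-d", database,
            "-s", str(s_value), "-n", str(n_value)]


def render_rman_proxy_script(channel: str = "ch00") -> str:
    """Render a minimal RMAN script that requests a proxy backup of the database.

    Assumption: one SBT channel is sufficient. Control files and archived
    redo logs still need a separate stream-based backup.
    """
    lines = [
        "RUN {",
        f"  ALLOCATE CHANNEL {channel} TYPE 'SBT_TAPE';",
        "  BACKUP PROXY DATABASE;",
        f"  RELEASE CHANNEL {channel};",
        "}",
    ]
    return "\n".join(lines)
```

Joining the result of `build_bpdb2proxy_backup("sample")` with spaces reproduces the command shown in the DB2 section, and the list form can be passed directly to `subprocess.run` from a backup script.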

 

4.     Environment validation and best practices:

 

  • Confirm that data files are located on a separate, dedicated volume and that other files, such as Oracle executables, redo logs, parameter files and control files, are located elsewhere.
      - Most database agents support snapshots of data files only.
      - Because snapshots are always taken at the volume level, data files should not share a file system with other files such as control files, redo logs, parameter files and database executables. During a snapshot the source file system is frozen, which makes those other files unavailable.
      - Hence, when installing a database, always place the data files on a separate volume, which becomes the source volume for the snapshot.

  • For other Veritas products such as the Veritas File System and Volume Manager or Storage Foundation, install the latest patches and updates for those products.
  • A snapshot may not be removed if there is a system failure, such as a system crash or abnormal backup termination. In that case, remove the snapshot manually. (See the NetBackup Snapshot Client Administrator’s Guide for more information on removing a snapshot.)
  • During snapshot rollback, if the data file you want to restore has not changed since it was backed up, the rollback may fail. Initiate the restore from a script and use the FORCE option. (See the NetBackup Snapshot Client Administrator’s Guide.)
  • For the off-host alternate client method, the snapshot mirror must be visible (exposed) to the alternate client, and it must be possible to import it on the alternate client.
  • For hardware based snapshots, the respective CLI/API libraries from the array vendor must be compatible and properly installed.
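The first check in the list above, that data files do not share a volume with other database files, is easy to automate once each file has been mapped to its volume. The following Python sketch assumes that mapping has already been collected by some other means (for example from mount tables). The function name and the sample paths in the usage note are hypothetical.

```python
from typing import Dict, List


def find_layout_violations(file_to_volume: Dict[str, str],
                           data_files: List[str]) -> List[str]:
    """Return the non-data files that live on a volume holding data files.

    file_to_volume maps each database-related file to the volume or file
    system it resides on; data_files lists the files that belong on the
    snapshot source volume.
    """
    data_set = set(data_files)
    data_volumes = {file_to_volume[path] for path in data_files}
    return sorted(
        path for path, volume in file_to_volume.items()
        if path not in data_set and volume in data_volumes
    )
```

With a layout such as `{"/u01/data/users01.dbf": "/u01", "/u01/redo01.log": "/u01", "/u02/control01.ctl": "/u02"}` and `/u01/data/users01.dbf` as the only data file, the function reports `/u01/redo01.log`, because the redo log would be frozen together with the data volume during a snapshot.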

 

 

5.     Troubleshooting and typical issues:

Various failures can occur during backups of databases with snapshot configurations. The most important logs to look at are the database agent logs and the bpfis/bppfi logs. The following table shows the primary logs to examine after a failure for the various database agents:


| Agent | BACKUP | RESTORE |
|---|---|---|
| SAP | bphdb, backint, bpfis, bpbkar, bpbrm, user_ops, progress log | backint, bpfis, bppfi, tar, bpbrm, user_ops, progress log |
| Oracle | bphdb, dbclient, bpfis, bpbkar, bpbrm, user_ops, progress log | dbclient, bpfis, bppfi, tar, bpbrm, user_ops, progress log |
| MS-Exchange | bpbkar, bpfis, bpbrm, BEDS, bpresolver (Exchange 2010) | tar (for streamed GRT), ncfgre (for non-stream GRT), bpfis, bppfi, bpbrm, BEDS |
| MS-SQL Server | dbclient, bpbkar, bpfis, bpbrm, user_ops, progress log | dbclient, tar, bpfis, bppfi, bpbrm, user_ops, progress log |
| DB2 | bphdb, dbclient, bpdbsdb2, bpbkar, bpfis, bpbrm, bpdb2, user_ops | dbclient, tar, bpfis, bppfi, bpbrm, bpdb2, bpubsdb2, user_ops |
| MS-SharePoint | nbfsd, bpbkar, bpfis, bpbrm, BEDS, bpresolver, event viewer logs | ncf (6.5.x), ncfgre (7.x), nbfsd, bpbrm, BEDS |
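
The log lists above lend themselves to a simple lookup, for instance in a script that gathers logs after a failed job. The Python sketch below transcribes the entries as they appear in this article. The dictionary layout and the function name are assumptions made for illustration.

```python
from typing import Dict, List

# Primary logs per database agent and operation, transcribed from this article.
PRIMARY_LOGS: Dict[str, Dict[str, List[str]]] = {
    "SAP": {
        "backup": ["bphdb", "backint", "bpfis", "bpbkar", "bpbrm", "user_ops", "progress log"],
        "restore": ["backint", "bpfis", "bppfi", "tar", "bpbrm", "user_ops", "progress log"],
    },
    "Oracle": {
        "backup": ["bphdb", "dbclient", "bpfis", "bpbkar", "bpbrm", "user_ops", "progress log"],
        "restore": ["dbclient", "bpfis", "bppfi", "tar", "bpbrm", "user_ops", "progress log"],
    },
    "MS-Exchange": {
        "backup": ["bpbkar", "bpfis", "bpbrm", "BEDS", "bpresolver (Exchange 2010)"],
        "restore": ["tar (for streamed GRT)", "ncfgre (for non-stream GRT)",
                    "bpfis", "bppfi", "bpbrm", "BEDS"],
    },
    "MS-SQL Server": {
        "backup": ["dbclient", "bpbkar", "bpfis", "bpbrm", "user_ops", "progress log"],
        "restore": ["dbclient", "tar", "bpfis", "bppfi", "bpbrm", "user_ops", "progress log"],
    },
    "DB2": {
        "backup": ["bphdb", "dbclient", "bpdbsdb2", "bpbkar", "bpfis", "bpbrm", "bpdb2", "user_ops"],
        "restore": ["dbclient", "tar", "bpfis", "bppfi", "bpbrm", "bpdb2", "bpubsdb2", "user_ops"],
    },
    "MS-Sharepoint": {
        "backup": ["nbfsd", "bpbkar", "bpfis", "bpbrm", "BEDS", "bpresolver", "event viewer logs"],
        "restore": ["ncf (6.5.x)", "ncfgre (7.x)", "nbfsd", "bpbrm", "BEDS"],
    },
}


def logs_to_check(agent: str, operation: str) -> List[str]:
    """Return the primary logs for an agent and operation ('backup' or 'restore').

    Raises KeyError if the agent or operation is not in the lookup.
    """
    return list(PRIMARY_LOGS[agent][operation.lower()])
```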

 

Problems with backups/restores usually occur in the following three components:

1.       Database agent

2.       Snapshot mechanism

3.       Other NetBackup areas

 

A typical snapshot based backup starts at least two jobs in the Activity Monitor (for a single-stream backup). The first job creates and mounts the snapshot using the bpfis process, and the subsequent job(s) run the bpbkar process to back up the data from the snapshot to the storage unit, such as disk or tape. If the policy is configured to create only a snapshot, that is, with no backup to a storage unit, there is just a single job that creates the snapshot.
SLP-managed snapshots have separate jobs for mounting the snapshot and importing the snapshot.

If the first job reports errors, there is a problem with the snapshot mechanism. There can be several reasons for this. Examine the bpfis log along with the database agent log to determine the cause of the problem. (Refer to the table above to find out which logs to examine.)

 

  • Examine the database agent log to check whether the error occurs before the database is quiesced. If so, the problem is usually with the database agent configuration or the policy configuration. Recheck these configurations.
  • If the database agent log indicates that the database was successfully quiesced, check the bpbrm log to see whether the bpfis process was started on the client. If the bpfis process started successfully, check the bpfis logs on the client. Typically, bpfis fails with status code 156, which can have several different causes.

     

If the first job in the Activity Monitor is successful, the snapshot was taken successfully. It also indicates that the database was unquiesced successfully and can be brought back online. The database agent log helps to confirm this.

A new job is then started to back up data from the snapshot to the storage unit. The bpfis process mounts the snapshot and the bpbkar process takes the backup. If this new job fails:
  • Check the bpbrm log to determine whether the bpfis and bpbkar processes were launched on the client.
  • Check the bpfis log to determine whether the snapshot was mounted correctly.
  • Check the bpbkar log to determine whether the backup was done correctly.
  • Check the bptm log to determine whether the data was written correctly to tape.
A failure of this job is not necessarily caused by the database agent or the snapshot configuration. It can also result from problems in other NetBackup components, for example storage unit issues or network issues. Refer to the NetBackup Troubleshooting Guide (linked below) for more details.
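
The triage order described above can be summarised as a small decision helper. This Python sketch only encodes the sequence given in this article. The function name, the `"snapshot"` and `"backup"` labels for the first and second job, and the wording of the returned steps are assumptions made for illustration.

```python
from typing import List


def triage_snapshot_backup(failed_job: str, database_quiesced: bool = True) -> List[str]:
    """Return, in order, what to examine after a failed snapshot based backup.

    failed_job is "snapshot" for the first job (snapshot creation) or
    "backup" for the job that copies data from the snapshot to storage.
    database_quiesced only matters when the snapshot job failed.
    """
    if failed_job == "snapshot":
        if not database_quiesced:
            # The error occurred before the quiesce: suspect configuration.
            return ["database agent configuration", "policy configuration"]
        return [
            "bpbrm log: was bpfis started on the client?",
            "bpfis log on the client (status code 156 is typical)",
        ]
    if failed_job == "backup":
        return [
            "bpbrm log: were bpfis and bpbkar launched on the client?",
            "bpfis log: was the snapshot mounted correctly?",
            "bpbkar log: was the backup done correctly?",
            "bptm log: was the data written correctly?",
        ]
    raise ValueError("failed_job must be 'snapshot' or 'backup'")
```

The quiesce state itself comes from reading the database agent log, so that log is always the starting point before calling a helper like this.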
 
Restore:

If a rollback restore of a snapshot backup is selected, 'bppficorr' on the master server invokes 'bppfi' on the client, which validates the snapshot fragment/IR image.
If validation succeeds, bppfi invokes the “bpfis restore” command to perform a volume-level rollback.

If a copy-back restore from a snapshot is selected, 'bppficorr' invokes 'bppfi' to validate the snapshot fragment. On success, 'bppfi' mounts the snapshot volume, constructs the list of files to be restored (copied back from the snapshot) and invokes 'bpbkar', passing it that file list.

If a tape image restore is selected, the restore runs through the 'tar' process, as in the standard process flow.
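
The three restore paths described above differ mainly in which processes are chained together, which is useful to know when deciding which logs to read first. The following Python sketch records those chains as described in this article. The key names (`"rollback"`, `"copy-back"`, `"tape"`) and the function name are assumptions made for illustration.

```python
from typing import Dict, List

# Process chain per restore type, in the order the processes are invoked.
RESTORE_FLOWS: Dict[str, List[str]] = {
    "rollback": [
        "bppficorr (master server)",
        "bppfi (client): validate snapshot fragment/IR image",
        "bpfis restore: volume-level rollback",
    ],
    "copy-back": [
        "bppficorr (master server)",
        "bppfi (client): validate fragment, mount snapshot, build file list",
        "bpbkar: restore the files in the list",
    ],
    "tape": [
        "tar: standard restore flow",
    ],
}


def restore_process_chain(restore_type: str) -> List[str]:
    """Return the ordered process chain for a restore type.

    Raises ValueError for a restore type that this article does not describe.
    """
    try:
        return list(RESTORE_FLOWS[restore_type])
    except KeyError:
        raise ValueError(f"unknown restore type: {restore_type!r}") from None
```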
 
If a snapshot fails with “instant recovery” or “off-host alternate client” configured, check whether the snapshot mirror is attached to and synchronized with the source volume before the backup runs.

 
