Recommended Oracle RMAN backup piece name format for efficient backup, restore, and crosscheck

Article: 100028185
Last Published: 2022-05-26
Product(s): NetBackup & Alta Data Protection

Problem

What RMAN FORMAT should be used for successful and efficient backup using NetBackup for Oracle?

The most common problem is a delay in RMAN operations. The delay can range from a few minutes to over an hour in extreme cases. Several symptoms are visible to the NetBackup administrator and the DBA.

A) The Oracle Application backup job completes with status 0 in the Activity Monitor and shows good throughput, but there is a long delay before the next Application job starts or before the Automatic backup job exits. The bpbrm and bptm processes on the media server will have completed before the status 0, but the dbclient processes are still running, even if this was the last backupset piece in the overall backup of the Oracle database.

B) There may be a long delay between when RMAN initiates a restore and when the job appears in the NetBackup Activity Monitor.

C) RMAN crosscheck or delete expired operations progress very slowly.

D) In rare instances the delay may be long enough to cause an application timeout.

E) Concurrent RMAN crosscheck operations from many instances cause excessive load on the NetBackup primary server and affect its responsiveness.

Less common problems are detailed in the related articles.

Error Message

For backups, the Job Details typically show that the job completed quickly with status 0, but the next job did not queue within a minute as expected.

For the first job:

07/15/2013 11:49:18 - end writing; write time: 0:00:10
the requested operation was successfully completed (0)

For the next job:

07/15/2013 12:44:56 - Info nbjm (pid=14599) starting backup job ...


The comm file (18700.0.1373989759) on the client confirms that the server status was updated and the job is complete. This file is located in the /usr/openv/netbackup/logs/user_ops/dbext/logs directory.

11:49:18 INF - Server status = 0
11:49:18 INF - Backup by oracle on client myclient using policy mypolicy, sched mysched: the requested operation was successfully completed


The dbclient debug log shows that the job is complete from both the NetBackup dbclient (EXIT STATUS) and the NetBackup server perspective (server EXIT STATUS) and that sbtclose2 processing returned control to Oracle.

11:49:18.684 [18700] <2> sbtclose2: INF - entering
11:49:18.684 [18700] <2> int_CloseImage: INF - Backup - closing <SID_160729794_20130715.ctl>
...snip...
11:49:18.697 [18700] <4> closeApi: INF - EXIT STATUS 0: the requested operation was successfully completed
...snip...
11:49:18.900 [18700] <4> closeApi: INF - server EXIT STATUS = 0: the requested operation was successfully completed
...snip...
11:49:18.901 [18700] <2> sbtclose2: INF - leaving

But the subsequent sbtinfo2 request from Oracle, which looks up the media ID to which the backup was written, is either still active or, in this case, took nearly an hour to complete while waiting for a query to the bprd service on the primary server. Oracle does not consider the backup successful until this lookup completes.

11:49:18.901 [18700] <2> sbtinfo2: INF - entering
11:49:18.901 [18700] <2> sbtinfo2: INF - requesting image info for <SID_160729794_20130715.ctl>
...snip...
11:49:18.902 [18700] <2> int_logDateRange: INF - Start Time = 12/26/95 00:00:00
11:49:18.902 [18700] <2> int_logDateRange: INF - End Time = 07/16/13 15:49:19
...snip...
11:49:18.905 [18700] <4> BuildBprdRequest: request_string=<7.1 myclient myclient *NULL* 4 819936000 1373989759 /SID_160729794_20130715.ctl>
...snip...

11:49:19.519 [18700] <2> logconnections: BPRD CONNECT FROM 10.x.x.3.39654 TO 10.x.x.1.1556 fd = 31
...long delay here...

12:44:42.584 [18700] <4> dbc_GetMediaListByName:        Media ID : </NBU/DSU/myclient_1373989763_C1_F1>
...snip...
12:44:42.597 [18700] <2> sbtinfo2: INF - leaving


The bprd debug log shows that the inbound request was forwarded to bpdbm, but that bprd did not receive a response for nearly an hour. Notice that the 'starttime' is much older than the 'endtime'.

11:49:19.531 [16623] <2> logconnections: BPRD ACCEPT FROM 10.x.x.3.39654 TO 10.x.x.1.1556 fd = 31
...snip...
11:49:19.532 [16623] <2> process_request: command C_MEDIA_LIST_BY_FILE_3_2 (67) received
11:49:19.532 [16623] <2> get_image_by_file: client = myclient
11:49:19.532 [16623] <2> get_image_by_file: pathname = /SID_160729794_20130715.ctl
11:49:19.532 [16623] <2> get_image_by_file: starttime = 819936000
11:49:19.532 [16623] <2> get_image_by_file: endtime = 1373989759
11:49:19.532 [16623] <2> get_image_by_file: client_type = 4
...snip...

11:49:19.534 [16623] <2> logconnections: BPDBM CONNECT FROM 10.x.x.1.55628 TO 10.x.x.1.1556 fd = 6
...long delay here...

12:44:42.298 [16623] <2> get_image_by_file: Sent to client @aaacX 1373903341 1376581741 myclient_1373903341 PDW
12:44:42.299 [16623] <2> process_request: EXIT STATUS 0


Older versions of NetBackup and Oracle may fail the lookup completely, possibly with little delay. In those instances the RMAN output may show messages similar to these.

ORA-27016, 00000, "skgfcls: sbtinfo returned error"
ORA-27192: skgfcls: sbtclose2 returned error - failed to close file

ORA-19513: failed to identify sequential file
ORA-27206: requested file not found in media management catalog

The dbclient debug log may show a search range that is outside of the time at which the backup occurred.  In this case, the backup took place on August 13, but the start and end time are for June 10-12.

17:10:54 [8663] <4> get_bfs_date_range: Start Time = 06/10/99 07:10:21
17:10:54 [8663] <4> get_bfs_date_range: End Time = 06/12/99 07:10:21
...snip...
17:10:54 [8663] <4> dbc_GetMediaListByName: Request String = <3.4 myclient myclient *NULL* 4 929013021 929185821 /i0h_CATAPRD24055367827021>
...snip...
17:10:55 [8663] <16> sbtinfo: No media found

 

Cause

Oracle does not consider a backup piece completely saved until the sbtbackup/sbtwrite/sbtclose/sbtinfo sequence is complete.

In this case the site was not using the recommended RMAN format for the name of the backupset piece. Because the piece name has no timestamp to key off, dbclient cannot request a narrow search of the image directory for the piece and must wait for bpdbm to search all images, as shown by the 'Start Time' months or years in the past. If the client has many images and the primary server is under significant load, the search can take a long time. Several of these wide-ranging searches running concurrently degrade the situation even further.
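
The difference between a wide and a narrow search can be made concrete by decoding the two request strings that appear in the dbclient debug logs above. The following Python sketch is only an illustration: it assumes the fields are whitespace-separated and that the sixth and seventh fields are the search start and end in Unix epoch seconds, which is what the 'starttime' and 'endtime' lines in the bprd log suggest. The function name search_window is hypothetical.

```python
from datetime import datetime, timezone

def search_window(request_string):
    """Return (start, end, width_in_days) for a dbclient request string.

    Assumption: field layout is
    <version client client *NULL* client_type start_epoch end_epoch path>.
    """
    fields = request_string.strip("<>").split()
    start_epoch, end_epoch = int(fields[5]), int(fields[6])
    start = datetime.fromtimestamp(start_epoch, tz=timezone.utc)
    end = datetime.fromtimestamp(end_epoch, tz=timezone.utc)
    return start, end, (end_epoch - start_epoch) / 86400

# Request strings copied from the debug log excerpts in this article.
wide = "<7.1 myclient myclient *NULL* 4 819936000 1373989759 /SID_160729794_20130715.ctl>"
narrow = "<3.4 myclient myclient *NULL* 4 929013021 929185821 /i0h_CATAPRD24055367827021>"

for label, request in (("wide", wide), ("narrow", narrow)):
    start, end, days = search_window(request)
    print(f"{label}: {start:%Y-%m-%d %H:%M:%S} to {end:%Y-%m-%d %H:%M:%S} UTC ({days:.1f} days)")
```

The first request covers more than 6,400 days (about 17.5 years) of images for the client. The second covers exactly 2 days, which matches a +/- 24 hour window, even though in that older example the window was centered on the wrong date.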


The same delays shown above may be observed during sbtinfo requests associated with RMAN crosscheck, delete expired, and restore operations. For crosscheck and delete expired operations, these delays are cumulative across all the pieces being checked or deleted.


Once a sbtinfo lookup has completed, the next sbtbackup/sbtrestore/sbtclose/sbtremove/sbtinfo should occur without delay.  Delays between SBT API calls indicate that NetBackup is waiting for Oracle.  Delays within SBT API calls indicate that Oracle is waiting for NetBackup.
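
To tell the two kinds of delay apart in a dbclient debug log, the 'entering' and 'leaving' lines for each SBT call can be paired up and timed. The Python sketch below is a rough aid, not a NetBackup tool. It assumes the line layout seen in the excerpts above (time, pid, verbosity, function name, 'INF - entering' or 'INF - leaving'), and it assumes all lines fall on the same day, because the log lines carry no date. The name sbt_gaps is hypothetical.

```python
import re
from datetime import datetime

# Matches lines such as:
#   11:49:18.901 [18700] <2> sbtinfo2: INF - entering
LINE = re.compile(
    r"^(\d\d:\d\d:\d\d\.\d+) \[\d+\] <\d+> (sbt\w+): INF - (entering|leaving)$"
)

def sbt_gaps(lines):
    """Return a list of (kind, name, seconds); kind is 'within' or 'between'."""
    results = []
    entered = None    # (call name, time) of the SBT call in progress
    last_left = None  # (call name, time) of the most recent 'leaving'
    for line in lines:
        m = LINE.match(line.strip())
        if not m:
            continue  # not an SBT entry or exit line
        when = datetime.strptime(m.group(1), "%H:%M:%S.%f")
        name, event = m.group(2), m.group(3)
        if event == "entering":
            if last_left is not None:
                gap = (when - last_left[1]).total_seconds()
                results.append(("between", last_left[0] + " -> " + name, gap))
            entered = (name, when)
        elif entered is not None and entered[0] == name:
            results.append(("within", name, (when - entered[1]).total_seconds()))
            last_left = (name, when)
            entered = None
    return results

# Lines copied from the dbclient excerpt in this article.
sample = """\
11:49:18.684 [18700] <2> sbtclose2: INF - entering
11:49:18.684 [18700] <2> int_CloseImage: INF - Backup - closing <SID_160729794_20130715.ctl>
11:49:18.901 [18700] <2> sbtclose2: INF - leaving
11:49:18.901 [18700] <2> sbtinfo2: INF - entering
12:44:42.597 [18700] <2> sbtinfo2: INF - leaving
""".splitlines()

for kind, name, seconds in sbt_gaps(sample):
    print(f"{kind:8} {name:22} {seconds:10.3f} s")
```

For the excerpt in this article, the time between sbtclose2 and sbtinfo2 is zero, while the time within sbtinfo2 is about 3,324 seconds (roughly 55 minutes). That places the delay on the NetBackup side of the interface.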


Note: If the RMAN operation is using multiple channels, it will not issue a sbtend for any channel until the prior operations on all channels have completed.

Solution

Part 1 - Make sure future backups use an RMAN FORMAT that can be easily converted to a date search range

Do not include any space characters in the format, and end the format with a '_%t'.

The presence of the '_%t' allows NetBackup to search only the images created within +/- 24 hours, instead of the entire catalog for the client.

If anything follows the '_%t' in the piece name format, modify the RMAN FORMAT syntax so that '_%t' is at the end of the piece name. For example:

In the backup script:

  BACKUP FORMAT 'df_%s_%p_%t' ... DATABASE;
  BACKUP FORMAT 'al_%s_%p_%t' ... ARCHIVELOG ...;
  BACKUP FORMAT 'cf_%s_%p_%t' ... CONTROLFILE;

In the RMAN persistent configuration for the target database:

  CONFIGURE CHANNEL DEVICE TYPE sbt FORMAT 'bk_%s_%p_%t';

In the NetBackup template wizard:

  On the Backup Options panel, set the 'Backup file name format'

  bk_%s_%p_%t
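
Where many backup scripts need to be reviewed, the two rules above (no space characters, '_%t' at the end) can be checked mechanically. The Python sketch below is a minimal illustration of that check. The function name check_piece_format is hypothetical, and the sketch inspects only the format string itself, not how a script passes it to RMAN.

```python
def check_piece_format(fmt):
    """Return a list of problems found in an RMAN FORMAT string (empty if none)."""
    problems = []
    if any(ch.isspace() for ch in fmt):
        problems.append("contains a space character")
    # Case-sensitive on purpose: '%t' and '%T' are different RMAN variables.
    if not fmt.endswith("_%t"):
        problems.append("does not end with '_%t'")
    return problems

for fmt in ("bk_%s_%p_%t", "bk_%t_%s_%p", "bk %s_%p"):
    print(repr(fmt), check_piece_format(fmt) or "ok")
```

Of the three sample strings, only the first meets both rules. The second has text after the '%t', and the third contains a space and has no '_%t' at all.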

Note-1: The RMAN AUTOBACKUP CONTROLFILE automatically uses a format that includes the timestamp for the piece creation at a well known location.  Those piece names will not encounter delays.

Note-2: The '_%t' format is still useful and should continue to be used after the primary server is upgraded to NetBackup 7.6 and makes use of the new quick lookup table.

Note-3: See the related articles for less common problem symptoms when the recommended format is not used.

Part 2 - The benefit of adding the '_%t' will be partially immediate, but will also accrue over time.

New backups should have a relatively quick sbtinfo2 lookup because the +/- 24 hour date range will be used, subject of course to the load on and responsiveness of the primary server. This will generally decrease the overall backup time for the instance.

But RMAN maintenance such as crosscheck and delete for this instance will be making queries for all of the known backup piece names, and the processing of the older piece names will still incur delays and create load on the primary server.

Similarly, backup, crosscheck, and delete operations for other instances within the domain will have a similar effect on the primary server.

This condition will improve over time as older images expire and no longer need to be queried for existence.

While images without the '_%t' at the end of the format are present, it is often best to limit crosscheck and other maintenance operations to once per day or week (not once per backup) and to schedule them at a time independent of any backup operations.

Part 3 - Queries with the wide date range may still occur, just much less frequently.

The dbclient and bprd debug logs may still show an occasional query using a wider date range.  This is normal and expected due to several conditions.

1) When a backup image initially expires from the NetBackup catalog, the RMAN catalog is unaware. The subsequent crosscheck or delete operation results in a search using the precise date range from the '_%t' format, but the search fails (status 227) because NetBackup no longer has a backup image containing that piece name. The dbclient then searches again using the wider date range, in case the host clocks were significantly out of sync at the time of backup or query. When this search also fails, dbclient advises RMAN that the piece is not found, so that it is marked as expired and then deleted from the RMAN catalog.

2) The same can happen if a configuration error causes RMAN to query NetBackup using the wrong CLIENT_NAME or NB_ORA_CLIENT value. Because NetBackup searches for the image under the wrong client name, the status 227 occurs and the wider search takes place. Typically, the DBA will notice that many, perhaps all, of the backup pieces for the instance suddenly became expired. But note that since the images are not expired in NetBackup, they can be reinserted into the RMAN catalog if desired.
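
The two conditions above follow the same pattern: a narrow search fails with status 227, and a wide search is then attempted. The Python sketch below models only that control flow, as an aid to reading the debug logs. It is not NetBackup code. The catalog is a hypothetical in-memory list, the piece names and times are made up, and the bounds of the wide search (epoch 0 to one day in the future) are an assumption chosen for illustration.

```python
STATUS_OK = 0
STATUS_NOT_FOUND = 227  # status the article cites when no image holds the piece
DAY = 24 * 3600

def lookup_piece(catalog, client, piece, piece_time, now):
    """Search a narrow date range first, then fall back to a wide one.

    catalog is a hypothetical list of (client, piece_name, backup_epoch)
    tuples standing in for the NetBackup image catalog.
    Returns (status, list of the ranges that were searched).
    """
    def search(start, end):
        for c, name, when in catalog:
            if c == client and name == piece and start <= when <= end:
                return STATUS_OK
        return STATUS_NOT_FOUND

    tried = ["narrow"]
    status = search(piece_time - DAY, piece_time + DAY)
    if status == STATUS_NOT_FOUND:
        # Fall back in case host clocks were out of sync at backup or query time.
        tried.append("wide")
        status = search(0, now + DAY)
    return status, tried

# Made-up catalog content: one piece that still exists for client 'myclient'.
catalog = [("myclient", "/bk_101_1_900000001", 1373989759)]
now = 1374000000

print(lookup_piece(catalog, "myclient", "/bk_101_1_900000001", 1373989759, now))
print(lookup_piece(catalog, "myclient", "/bk_55_1_800000001", 1360000000, now))      # expired image
print(lookup_piece(catalog, "otherclient", "/bk_101_1_900000001", 1373989759, now))  # wrong client
```

In this model, a piece that still exists is found by the narrow search alone, while an expired image or a wrong client name costs both a narrow and a wide search before the piece is reported as not found.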

Applies To

Any NetBackup version

Any platform

 
