
DB backups - image status on tape on failure

mpatt
Level 4
Certified

Our backup environment consists mostly of RMAN backups. The full backup script first backs up the DB files (.dbf), then the control file, then the archives, and finally deletes the backed-up archives.

Recently, I observed that our backup failed in the last stage, the deletion of the archives. The log indicated that, after backing up the archives and while deleting them, one of the archives to be deleted was missing. It had been removed by a DBA who, instead of copying the archive to another server for a clone, moved it. The RMAN log returned the following errors (and the job ended with status code 6, of course).

RMAN-03009: failure of backup command on ch00 channel at xxxxxx 03:22:25
ORA-19625: error identifying file /db/arc01/.....xxxx.arc
ORA-27037: unable to obtain file status
SVR4 Error: 2: No such file or directory
Additional information: 3
 
I was just wondering, how good is this backup in case a restore was required? Are the images still on tape? 

Does NetBackup immediately expire all the images of the backup pieces (as RMAN refers to them)? Technically, is the DB backup still good (restorable), since everything was backed up and it only failed in the deletion stage?

 

9 REPLIES 9

pri3006
Level 4
Certified

I do not think it is even reaching the deletion stage; it seems to be failing when backing up the archive logs because of that missing archive log. Since you would have had separate channel allocations for the database backups, I think those should be fine and restorable, but I doubt that the archive logs are. Secondly, deleting the archive logs the way your DBA has done is not a good idea, because RMAN expects to find that archive log. Even if you have a good backup of the archive logs, your database is recoverable only up to the archive log just before the one shown in the error message.
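
The command name in the RMAN-03009 line is what points to the backup step rather than the delete step. A minimal shell sketch for pulling that out of a saved RMAN log follows; the function names and the idea of a saved log file are assumptions for illustration, not part of the original script.

```shell
# Hypothetical helpers: each reads a saved RMAN log file given as $1.

# Print the command named in each
# "RMAN-03009: failure of <command> command ..." line.
failed_rman_command() {
    sed -n 's/^RMAN-03009: failure of \([A-Za-z ]*\) command.*/\1/p' "$1"
}

# Print the path named in each
# "ORA-19625: error identifying file <path>" line.
missing_archive_file() {
    sed -n 's/^ORA-19625: error identifying file \(.*\)$/\1/p' "$1"
}
```

For the error stack quoted in the question, `failed_rman_command` would print `backup`, which suggests the error was raised by the BACKUP command on channel ch00 and not by the later DELETE.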

 

 

muhanad_daher
Level 6
Partner Accredited Certified

Try this:

change archivelog all crosscheck;
Crosschecks, "delete obsolete" and "delete expired" should be part of the backup script.

I also suggest going to an Oracle forum; you may find something helpful about this issue there.

Marianne
Level 6
Partner    VIP    Accredited Certified

In addition to above excellent posts, your DBA should know better than to manually 'fiddle' with archive logs.

To check if 1st part of backup was kept by NBU, use 'bplist' to list successful backups:

# bplist -C <client-name> -s <start-date> -e <end-date> -t 4 -R /

mpatt
Level 4
Certified

Marianne,

Unfortunately, it was a little worse. The DBA had actually had it deleted via a cron job. I have asked them to investigate the issue.

I will validate the backup using the bplist command you provided, but I am trying to understand how NetBackup behaves in this case, when the parent backup job (initiated by NetBackup) terminates with status 6 because the RMAN script returned an error about that archive not existing during deletion.

pri3006,

Why would having a separate "ALLOCATE CHANNEL" prevent the data from being set to expire, given that the backup is marked as failed? For the record, this is what our script does after backing up the database and control file:

==================================

SQL 'alter system switch logfile';

 

crosscheck archivelog all;
delete noprompt expired archivelog all;
ALLOCATE CHANNEL ch1 
    TYPE 'SBT_TAPE';
ALLOCATE CHANNEL ch2 
    TYPE 'SBT_TAPE';
SEND 'NB_ORA_POLICY=xxxxxx,NB_ORA_SERV=xxxxxx,NB_ORA_CLIENT=xxxxx';
BACKUP
    archivelog all not backed up 1 times;
RELEASE CHANNEL ch1;
RELEASE CHANNEL ch2;
delete noprompt archivelog all backed up 1 times to device type sbt;
resync catalog;

=====================================================

In most of my scripts, the above is not the case. I just allocate channels at the beginning of the RMAN backup (when it backs up the database) and use the same channels for the control file and archive log backups. RMAN automatically releases the channels once the RUN {...} block ends. The reason is that if a channel fails (say ch02), RMAN retries the same piece on another channel, and the script then errors out at the "RELEASE CHANNEL ch02" command. So I stopped using RELEASE CHANNEL. Tell me if that is wrong!
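
The single-RUN-block approach described above could look roughly like the sketch below. It is an illustration only: the function name is hypothetical, the channel names and NB_ORA_* values are placeholders, and the actual scripts were not posted in full.

```shell
# Hypothetical helper: print RMAN commands for a full backup that allocates
# channels once and relies on the end of the RUN block to release them.
# $1 = policy, $2 = master server, $3 = client (all placeholders).
full_backup_cmds() {
    cat <<EOF
RUN {
  ALLOCATE CHANNEL ch1 TYPE 'SBT_TAPE';
  ALLOCATE CHANNEL ch2 TYPE 'SBT_TAPE';
  SEND 'NB_ORA_POLICY=$1,NB_ORA_SERV=$2,NB_ORA_CLIENT=$3';
  BACKUP DATABASE;
  BACKUP CURRENT CONTROLFILE;
  SQL 'alter system switch logfile';
  BACKUP ARCHIVELOG ALL NOT BACKED UP 1 TIMES;
}
EOF
}

# Example (assumes rman is on PATH and the Oracle environment is set):
#   full_backup_cmds my_policy my_master my_client | rman target /
```

There is no RELEASE CHANNEL here, so there is no explicit release step that could fail after a channel has already died; both channels go away when the closing brace is reached.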

mpatt
Level 4
Certified

I think the bplist command may have confirmed that NetBackup expires the image.

 bplist -C client -s 05/10/2012 19:00:00 -e 05/11/2012 07:00:00

EXIT STATUS 227: no entity was found
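
Note that this run leaves out the `-t 4 -R /` arguments from the command suggested earlier in the thread. As far as I know, bplist only looks at Standard policy type backups unless `-t` is given, so status 227 here may not mean the Oracle images are gone. A small sketch of a wrapper that always passes the Oracle policy type (the function name and the BPLIST override are assumptions for illustration):

```shell
# Hypothetical wrapper: list Oracle (policy type 4) backups for a client.
# $1 = client name; remaining arguments are passed through (e.g. -s/-e dates).
# BPLIST can be overridden for a dry run; it defaults to bplist on PATH.
list_oracle_backups() {
    client=$1
    shift
    "${BPLIST:-bplist}" -C "$client" "$@" -t 4 -R /
}

# Example (needs a NetBackup client or server install):
#   list_oracle_backups client -s 05/10/2012 19:00:00 -e 05/11/2012 07:00:00
```

Setting `BPLIST=echo` first prints the arguments instead of running the command, which is a cheap way to check them.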

Marianne
Level 6
Partner    VIP    Accredited Certified

Even worse! The question is:

WHY is there a cron job to AUTOMATICALLY delete archive logs?

Like I've said before, your Oracle DBA should know better...

The moment RMAN is used to do online, hot backups, the Archive Logs MUST be managed/deleted by RMAN.

If there is a problem with space, the cron job should kick off a separate script to back up the logs and then delete them.
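
A rough sketch of what such a cron-driven script could hand to RMAN, reusing the archive log commands already posted in this thread. The function name is hypothetical and the NB_ORA_* values are placeholders; treat it as a starting point, not a tested production script.

```shell
# Hypothetical helper: print RMAN commands that back up archive logs to tape
# and then delete only the logs that already have a tape backup.
# $1 = policy, $2 = master server, $3 = client (all placeholders).
arch_backup_cmds() {
    cat <<EOF
RUN {
  ALLOCATE CHANNEL ch1 TYPE 'SBT_TAPE';
  SEND 'NB_ORA_POLICY=$1,NB_ORA_SERV=$2,NB_ORA_CLIENT=$3';
  BACKUP ARCHIVELOG ALL NOT BACKED UP 1 TIMES;
}
DELETE NOPROMPT ARCHIVELOG ALL BACKED UP 1 TIMES TO DEVICE TYPE sbt;
EOF
}

# Example cron-driven use (assumes rman is on PATH and the Oracle environment is set):
#   arch_backup_cmds my_policy my_master my_client | rman target /
```

Because the DELETE is limited to logs that have been backed up to tape at least once, space is freed without removing anything RMAN still needs to back up.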

 

mpatt
Level 4
Certified

They confirmed that a few recent RMAN failures led to the archive disk storage filling up, causing the DB to hang. This prompted them to set up the cron job (without knowing the implications, of course).

Wish NetBackup did not expire those images. 

How soon does NetBackup expire the image after the backup fails? I know that it marks all the images related to the failed backup as expired in the catalog. I know it sounds crazy, but I am curious to know whether I could freeze that particular tape if I were really monitoring that backup.

Marianne
Level 6
Partner    VIP    Accredited Certified

I honestly have no idea how long after the backup the image is expired because of the failed archive log backups. The bpdbm log will probably have the answer (if enabled).

Freezing a tape will not prevent image cleanup/expiration.
If tapes are not yet overwritten, you could possibly try to import them.
Question is - can database be successfully restored if it cannot be 'recovered'?
Test restore to alternate system will tell...

Hopefully the DBAs got the scare of their lives and have learnt from this!!

mpatt
Level 4
Certified

All,

Not sure if the above solutions are conclusive. 

This morning one of our Prod RMAN backups failed while backing up the archives (after backing up the DB and control file), due to loss of connectivity to its catalog DB.

Six hours later, our DBA noticed that the DB had been backed up and that the job had only failed while backing up the archives. He manually ran an archive log backup (via the script), which uses a different policy. I checked the catalog (from the GUI) and can see all the images of the backup, including the manually run archive backup. NetBackup has not expired any of the images, even though a few image cleanup jobs ran in the interim.