Solved: Netbackup RMAN Restore Problem

grosstech · ‎12-27-2011

Hi:

I was asked to restore RMAN files to a specific date (approximately 2 weeks before the request date) with NetBackup 6.5.4 on a Solaris 10u6 master. I went into the BAR window on the master, filled in all the necessary information, and was pleased to see that I could perform the restore immediately, as all the files were on a disk storage unit.

Three weeks passed, and then I heard from the DBA that the files I had restored were too old; files from a later date would be required. He also gave me the name of a file that RMAN was requesting. Searches for that RMAN file from the CLI turned up nothing. The most disappointing thing for me, however, was that I could not recreate the restore conditions: in the GUI I could no longer see the files in the path where they had resided 3 weeks earlier.

Files on disk are migrated to tape when the storage unit hits a particular threshold. On a few occasions I have observed that duplication jobs failed. I had not been able to track down the reason, but I now believe that some bad media caused the failures and was subsequently frozen. Unfortunately logging is minimal. However, I don't believe that NBU deletes backup images from the disk storage unit until they are safely migrated to tape. I did check the catalog, and it does show the primary copy of the images. The images that were migrated to tape show up as copy 2 but are the primary, and I believe they should be used in a restore request. Some of the backup images still reside on disk and some reside on tape. To complicate the situation, the RMAN backups are client initiated, and I believe that they are errant.

Now to really twist things around: I collect statistics using bperror, which are published daily to business stakeholders. I see from my stats that there was a client-initiated backup at the end of October. However, there is nothing in the NBU catalog to correlate with it. I would suspect that bperror queries the EMM database, which could potentially get out of sync with the catalog, but the catalog shows me backups initiated 6 months ago. How could this one slip through?

The other question, which I find quite serious, is my inability to reproduce the list of RMAN files that I originally restored. Is it possible that, because of the disk-to-tape migration failure, some of the images that were on disk expired? The retention time for disk images is, I believe, 2 weeks. Disk-to-tape duplication jobs run daily. The tape retention times are long. I would hope that NBU uses an algorithm that selects the image due to expire earliest when determining what is migrated to tape. If disk thresholds are met and an image is left on disk and its retention time expires, what happens? I'm thinking the worst, which would explain what I have seen.

Thanks for any thoughts or suggestions. My current inclination is to extend the disk retention period for the staging policies by another week.

-- Mark

Marianne · ‎12-27-2011

About bperror and catalog info:

bperror takes info from the 'error' database (/usr/openv/netbackup/db/error). This error database can be compared with an event log and keeps info by default for 28 days. Successful and failed jobs are captured.

The image catalog stores ONLY successful backups. Reports and image lists show the valid backups in the image catalog, as dictated by the retention period.
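To make the timing concrete, here is a rough Python sketch of how long a record would stay visible in each place. It is only date arithmetic, not anything NetBackup runs. The 28-day figure is the default mentioned above and the 14-day figure is the disk retention quoted in the question; the backup date, the helper names, and the assumption that the backup only ever had a disk copy are all hypothetical.

```python
from datetime import date, timedelta

ERROR_LOG_DAYS = 28       # default error database retention mentioned above
DISK_RETENTION_DAYS = 14  # "2 weeks" disk retention quoted in the question


def visible_until(backup_day, keep_days):
    """Last day a record written on backup_day is assumed to be kept."""
    return backup_day + timedelta(days=keep_days)


def still_visible(backup_day, keep_days, today):
    """True while today is on or before the assumed last day."""
    return today <= visible_until(backup_day, keep_days)


# Hypothetical date standing in for "end of October".
backup_day = date(2011, 10, 28)

for today in (date(2011, 10, 31), date(2011, 11, 20), date(2011, 12, 27)):
    in_error_db = still_visible(backup_day, ERROR_LOG_DAYS, today)
    # Assumes the image had a disk copy only, so it leaves the
    # catalog once the disk retention expires.
    in_catalog = still_visible(backup_day, DISK_RETENTION_DAYS, today)
    print(today, "error db:", in_error_db, "catalog:", in_catalog)
```

Under these assumptions a report run a few days after the backup sees it in both places, while a query two months later finds it in neither, which would match a daily stat that cannot be reproduced afterwards.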

Where disk staging is used, images on disk are controlled first of all by the retention period. If the retention period has expired, images on disk will be expired and deleted, regardless of whether duplication succeeded. The second disk retention criterion is the high water mark. When the high water mark is reached, images that have been duplicated successfully and have not yet reached their retention period will be considered for expiration/deletion to make space for new backups.
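The two rules can be modelled in a few lines of Python. This is a simplified sketch of the behaviour described in this post, not NetBackup's actual implementation; the `DiskImage` class, the `cleanup_candidates` function, and the sample image names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DiskImage:
    backup_id: str
    expires_on: date   # end of the disk retention period
    duplicated: bool   # True once a copy has been written to tape


def cleanup_candidates(images, today, high_water_mark_reached):
    """Return the backup IDs that the two rules would put up for removal."""
    candidates = []
    for img in images:
        if img.expires_on <= today:
            # Rule 1: retention has expired, so the image goes
            # even if it was never duplicated.
            candidates.append(img.backup_id)
        elif high_water_mark_reached and img.duplicated:
            # Rule 2: space is needed, and only images that were
            # duplicated successfully are considered.
            candidates.append(img.backup_id)
    return candidates


today = date(2011, 12, 27)
images = [
    DiskImage("img_a", date(2011, 12, 1), duplicated=False),  # expired, no tape copy
    DiskImage("img_b", date(2012, 1, 5), duplicated=True),    # unexpired, on tape
    DiskImage("img_c", date(2012, 1, 5), duplicated=False),   # unexpired, no tape copy
]

print(cleanup_candidates(images, today, high_water_mark_reached=False))
print(cleanup_candidates(images, today, high_water_mark_reached=True))
```

The case to watch is `img_a`: it is a candidate in both calls although no tape copy exists, which is the data-loss scenario in this thread. `img_c` is never a candidate while its retention is still running.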

So, you should NEVER ignore failed duplications - troubleshoot and re-run ASAP.

Another option, which will never expire/delete disk backups unless duplication has completed successfully, is Storage Lifecycle Policies. SLPs cannot be used with Basic Disk, only Advanced Disk. An Enterprise Disk license is needed, which is a capacity-based license.

Hope this helps!

grosstech · ‎12-27-2011

Hi Marianne:

Thank you for the clear and prompt response. I didn't ignore the failed duplication jobs. If memory serves, the GUI didn't allow me to restart the job the way it does other failed backups. I did read a technote which stated that they needed to be re-run manually. I assumed, apparently incorrectly, that since the duplication jobs ran daily, the images that failed to migrate yesterday because of bad media would be picked up the following day. If the disk images that were used to populate the restore GUI never made it to tape, this would explain why I could no longer see the files no matter what start and end dates I entered. The only thing that still puzzles me is that the bperror stats run at the end of October, which cannot be reproduced today, showed that the client-initiated backup was successful. Based upon your comments, the image catalog should have stored this backup. Ahh ... if this only went to disk and the retention had expired, it would have been deleted from the image catalog upon expiration. Mystery solved. Very dangerous. I do not like losing data. Thank you again.

Mark
