RMAN clone/restore problem

Question

Hi,
&nbsp;&nbsp;&nbsp;&nbsp; I am seeing intermittent issues when restoring an Oracle database using RMAN.
The backup is done to a 5220 at our live site, with the database going to dedupe, and the archive logs going to advanced disk.
The database is optimised duplicated to a remote 5520 (same NBU domain).
The archive logs are duplicated to the same remote 5220, then duplicated to tape.
We identify the images required, and make the relevant offsite copy the primary. (copy 2 (dedupe) for database, copy 3 (tape) for arcs)
When we start the clone there appears to be a delay for netbackup to send the data and RMAN thinks the pieces are unavailable. See error below.
The pieces for restore are available, and if we restart it, it usually works. If we rehydrate to tape and use the tape copies, it works.
The strange thing is, if I manually duplicate (rehydrate) the offending images to tape, there appears to be a delay there as well. The tape mounts and sits there for up to 15 mins or so, and then kicks into life. See second section below. (9 minute delay this time)
So its not RMAN, is it a rehydration problem. These databases are small, about 25Gb in size.
CLIENT_READ_TIMEOUT settings etc are set to 3600.
Output from RMAN fail.
Starting restore at 02.10.13 22:38:25
	allocated channel: ORA_AUX_DISK_1
	channel ORA_AUX_DISK_1: SID=353 device type=DISK
	allocated channel: ORA_AUX_DISK_2
	channel ORA_AUX_DISK_2: SID=3 device type=DISK
	allocated channel: ORA_AUX_DISK_3
	channel ORA_AUX_DISK_3: SID=51 device type=DISK
	allocated channel: ORA_AUX_SBT_TAPE_1
	channel ORA_AUX_SBT_TAPE_1: SID=103 device type=SBT_TAPE
	channel ORA_AUX_SBT_TAPE_1: Veritas NetBackup for Oracle - Release 7.5 (2012091610)
	allocated channel: ORA_AUX_SBT_TAPE_2
	channel ORA_AUX_SBT_TAPE_2: SID=153 device type=SBT_TAPE
	channel ORA_AUX_SBT_TAPE_2: Veritas NetBackup for Oracle - Release 7.5 (2012091610)
	
	channel ORA_AUX_SBT_TAPE_1: starting datafile backup set restore
	channel ORA_AUX_SBT_TAPE_1: restoring control file
	channel ORA_AUX_SBT_TAPE_1: reading from backup piece DBNAME_HOT_FULL_DAILY_m8ol6ckq_827536026
	channel ORA_AUX_SBT_TAPE_1: ORA-19870: error while restoring backup piece DBNAME_HOT_FULL_DAILY_m8ol6ckq_827536026
	ORA-19507: failed to retrieve sequential file, handle="DBNAME_HOT_FULL_DAILY_m8ol6ckq_827536026", parms=""
	ORA-27029: skgfrtrv: sbtrestore returned error
	ORA-19511: Error received from media manager layer, error text:
	&nbsp;&nbsp; Failed to open backup file for restore.
&nbsp;
Output from manual duplication to tape
&nbsp;
03/10/2013 10:36:47 - begin Duplicate
	03/10/2013 10:36:53 - requesting resource nbu5520-LTO4-1
	03/10/2013 10:36:53 - requesting resource @aaaab
	03/10/2013 10:36:53 - reserving resource @aaaab
	03/10/2013 10:36:55 - reserved resource @aaaab
	03/10/2013 10:36:55 - granted resource LT4811
	03/10/2013 10:36:55 - granted resource tapelib-DRIVE06
	03/10/2013 10:36:55 - granted resource nbu5520-LTO4-1
	03/10/2013 10:36:55 - granted resource MediaID=@aaaab;DiskVolume=PureDiskVolume;DiskPool=dp_disk_nbu5520;Path=PureDiskVolume;StorageServer=nbu5520;MediaServer=nbu5520
	03/10/2013 10:36:56 - Info bptm(pid=24450) start&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;
	03/10/2013 10:36:56 - started process bptm (24450)
	03/10/2013 10:36:58 - Info bptm(pid=24450) start backup&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;
	03/10/2013 10:36:58 - Info bpdm(pid=24455) started&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;
	03/10/2013 10:36:58 - started process bpdm (24455)
	03/10/2013 10:36:58 - Info bpdm(pid=24455) reading backup image&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;
	03/10/2013 10:36:59 - Info bpdm(pid=24455) using 64 data buffers&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;
	03/10/2013 10:36:59 - Info bpdm(pid=24455) requesting nbjm for media&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;
	03/10/2013 10:36:59 - Info bptm(pid=24450) Waiting for mount of media id LT4811 (copy 3) on server nbu5520.
	03/10/2013 10:36:59 - started process bptm (24450)
	03/10/2013 10:36:59 - mounting LT4811
	03/10/2013 10:36:59 - Info bptm(pid=24450) INF - Waiting for mount of media id LT4811 on server nbu5520 for writing.
	03/10/2013 10:38:02 - Info bptm(pid=24450) media id LT4811 mounted on drive index 6, drivepath /dev/nst3, drivename tapelib-DRIVE06, copy 3
XXXXX 9 minute gap here before we write to tape XXXXX

03/10/2013 10:47:15 - begin reading
	03/10/2013 10:47:26 - Info bptm(pid=24450) waited for full buffer 149 times, delayed 34112 times&nbsp;&nbsp; &nbsp;
	03/10/2013 10:47:30 - end reading; read time: 00:00:15
	03/10/2013 10:47:30 - Info bpdm(pid=24455) completed reading backup image
&nbsp;
We are using NBU 7.5.4 and OS is hp-ux. Oracle is 11.2.03
&nbsp;
Many thanks in anticipation. - Darren
	
	&nbsp;

doctorski · Accepted Answer

We opened a call with Symantec and sent them full verbose logs from the client and the media server.
It turns out our problem is related to ET3016768.&nbsp;
We ran a script on the appliance that shows the nbMSDPpoCounts totals.
Our offending images were all in one policy, but with different schedules.
The counts for this policy were many thousands higher than any other policy. See below
The issue is how the appliance "paths" the images, and thousands of backups under the same "path" means the appliance has to traverse the paths to find the images required.
Count&nbsp;&nbsp; Client/Policy
&nbsp;&nbsp;&nbsp; ------- ------------------------------------------
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 65907 clienta-b/CLIENTA-B-B_ORA_BKUP
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 5269 clientb-b/CLIENTB-B_BKUP
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 4060 clientc-b/CLIENTC-B_ORA_BKUP
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 4019 clientd-b/CLIENTD-B_BKUP
&nbsp;&nbsp;&nbsp; . . . SNIP . . .
Symantec recommend using more policies to spread the load on the "paths".
We have rehydrated lots of long-term retention backups off the appliance to tape, and reduced the above counts significantly. We have also changed our SLP's to do the same thing moving forward.
The above actions have fixed our optimised duplication / rehydration delays, and timeouts on RMAN restores.
There is currently no fix for this, but may be fixed in 7.6.1.
Thanks for your help.
&nbsp;
&nbsp;
&nbsp;

doctorski · Answer

Also within RMAN we set NB_ORA_DISK_MEDIA_SERVER=nbu5520 to use the remote 5520.

doctorski · Answer

This is to an alternate client. Anybody got any thoughts ?

marianne · Answer

I am not sure that the delay in duplication is related to the restore problem.
You will need logs to troubleshoot each of the 2 issues.
Restore:
	bprd on the master
	bptm and bpbrm on media server
	bpcd and dbclient on destination client
duplication:
	admin on master server
	bptm bpdm and bpbrm on remote media server.

doctorski · Answer

Thanks Marianne,
I will look into the duplication issue first as we are working around the restore issue by rehydrating to tape via SLP and using the tape copy. This works first time every time.
I will update when I have something and then move on to the restore from dedupe issue.
- Darren

Forum Discussion

RMAN clone/restore problem

5 Replies

Related Content

Encrypt Oracle RMAN Backup with NetBackup

Removing Rman Catalog from oracle backups

Restore hard drive from cloned disk

RMAN backup schedule problem

Re: bphdb exit status = 6: the backup failed to back up the requested files

Recent Discussions

"failed, status 6" error after increasing datastore (LUN) capacity

command: bperror

MS-SharePoint policy restore error (2804) .

How to restore a backup

How to configure RBAC