Oracle RMAN restore hangs

DHoffman_2 · ‎07-24-2007

Environment

Master/media server running Windows Server 2003 SP2, dual quad core CPU, 8 GB RAM

Tape drive: IBM TS3310 library with 2 LTO3 tape drives connected to the NetBackup server via 2-4 Gbps Fibre Channel connections

Client is an Oracle database 10g R2 running on IBM p650 LPAR with AIX 5.3 TL5 SP5

NetBackup software: Server and clients are both running NetBackup version 6.0 MP4 as installed from CDs. The Oracle client is also version 6.0 MP4. No updates have been installed.

We are attempting to perform a redirected restore from an Oracle instance running on one AIX LPAR to a different one (both running the same version of AIX and Oracle). The backup of the source database was initiated manually on July 20 from the NetBackup master server and completed successfully. To perform the restore, we are following the steps outlined in the Veritas NetBackup 6.0 for Oracle System Administrator Guide for UNIX, in the section titled "Redirecting a Restore to a Different Client". We attempted the restore on July 23 and 24 with differing results. We were able to restore the control files successfully on both days. On the 23rd, we restored 120 data files in less than 30 minutes, but the restore seemed to hang when it came time to restore the redo logs and the temporary tablespace files. The attempt on the 24th restored 20 data files within 15 minutes, but then nothing further was restored. The AIX fuser command showed that some of the data files were currently open by the Oracle instance.

The master server shows 2 restore jobs running. The details of both restore jobs show that they are attempting to read from the same tape volume. One restore job shows normal progress messages for a period, and then the updates stop. The second restore job shows that it is waiting for the tape. Eventually, we get either a network connection timeout error or a System Error = 5. I had increased the client read timeout value to 3600 seconds prior to the restore attempts. The restores were attempted during periods when the master server was not running any other jobs.

I have examined the NetBackup logs on the client and do not see any errors until the timeout occurs.

DHoffman_2 · ‎07-25-2007

Additional information:

- RMAN commands are being entered in an interactive RMAN session

- Oracle hot backups were performed using a NetBackup RMAN template created on the client to be run as the Oracle instance ID

We retried the restore, making sure that only 1 restore session was started on the NetBackup master server. One file from the backup was completed, and the server log showed that the next one was being started. 120 files were restored and then the restore hung again. The logs in /usr/openv/netbackup/logs/dbclient/log.072507 stopped getting updates after a period of time. The "detailed status" tab for the job showed no increase in the current kilobytes written, although the elapsed time field kept increasing after the hang started.
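For anyone trying to pin down exactly when a hang like this begins, here is a minimal sketch in Python. It is not a NetBackup tool; the function name find_stall and the sampling approach are my own assumptions. It takes a series of (time in seconds, value) samples, for example the size of the dbclient log file or the kilobytes-written figure noted by hand, and reports the time of the last change once the value has stayed flat for a chosen idle period.

```python
from typing import Iterable, Optional, Tuple


def find_stall(
    samples: Iterable[Tuple[float, int]],
    idle_seconds: float,
) -> Optional[float]:
    """Return the time of the last change if the sampled value then stayed
    flat for at least idle_seconds; return None if no such stall is seen.

    samples: (time_in_seconds, value) pairs in chronological order.
    The value could be a log file size or a kilobytes-written counter
    (both are hypothetical inputs chosen for illustration).
    """
    last_value: Optional[int] = None
    last_change: Optional[float] = None
    for t, value in samples:
        if last_value is None or value != last_value:
            # The value moved (or this is the first sample): reset the idle clock.
            last_value, last_change = value, t
        elif t - last_change >= idle_seconds:
            # No movement for long enough: report when progress stopped.
            return last_change
    return None
```

The samples could be collected with something like os.path.getsize() on the log file once a minute. Knowing the time progress stopped makes it easier to line up the client-side logs with the server-side logs.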

Omar_Villa · ‎07-25-2007

If the DB is very large, increase the client connect timeout to 7200, then check the bptm log to find out whether you have buffer issues. The drive is probably waiting too long for data because the buffers are not properly configured. Take a look at the following link; it may help:

http://seer.entsupport.symantec.com/docs/244652.htm
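As a rough way to check the bptm log for buffer trouble, the sketch below totals the wait and delay counters. It assumes the log contains summary lines of the form "waited for full buffer N times, delayed M times" (or "empty buffer"); verify that wording against your own bptm log, since the exact format is an assumption here. The function name summarize_buffer_waits is also just illustrative.

```python
import re
from typing import Dict, Iterable

# Assumed wording of the bptm buffer summary lines; adjust if your log differs.
WAIT_RE = re.compile(
    r"waited for (full|empty) buffer (\d+) times, delayed (\d+) times"
)


def summarize_buffer_waits(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Sum the 'waited' and 'delayed' counters per buffer kind (full/empty)."""
    totals = {
        "full": {"waited": 0, "delayed": 0},
        "empty": {"waited": 0, "delayed": 0},
    }
    for line in lines:
        match = WAIT_RE.search(line)
        if match:
            kind = match.group(1)
            totals[kind]["waited"] += int(match.group(2))
            totals[kind]["delayed"] += int(match.group(3))
    return totals
```

To use it, open the bptm log file from the media server and pass the file object in, e.g. summarize_buffer_waits(open(path_to_bptm_log)), where path_to_bptm_log is whatever path your installation uses. Large delay counts relative to the amount of data moved suggest that one side of the transfer is regularly waiting on the other.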
