Forum Discussion

stwali005
Level 4
10 years ago
Solved

RMAN restore random error

Hi,

We are having an intermittent issue where our Oracle DB team is experiencing restore errors from our Release environment to their Prod environment for 2 databases.

Sometimes one fails, sometimes it's the other. The RMAN restores are controlled by them.

I have logs for the dbclient that suggest a timeout. I have already increased the timeout on the master, media servers and clients from 1800 to 7200.

I also thought it might be the control file, but that gets restored as part of the restore process. I can't find anything different when comparing with the other Dev and QA environments; everything seems to be consistent. Any suggestions on where to look would be appreciated.

 

Here is a snippet from the dbclient log file:

07:56:39.657 [8867] <16> readCommFile: ERR - timed out after 7200 seconds while reading from /usr/openv/netbackup/logs/user_ops/dbext/logs/8867.0.1435200986

07:56:39.657 [8867] <32> serverResponse: ERR - could not read from comm file </usr/openv/netbackup/logs/user_ops/dbext/logs/8867.0.1435200986>

07:56:39.657 [8867] <16> RestoreFileObjects: ERR - serverResponse() failed

07:56:39.657 [8867] <4> closeApi: entering closeApi.

07:56:39.657 [8867] <4> closeApi: INF - EXIT STATUS 5: the restore failed to recover the requested files

07:56:39.657 [8867] <16> VxBSAGetObject: ERR - System error occurred trying to retrieve object in RestoreFileObject. Status: 3

07:56:39.658 [8867] <2> xbsa_ProcessError: INF - entering

07:56:39.658 [8867] <2> xbsa_ProcessError: INF - leaving

07:56:39.658 [8867] <16> xbsa_GetObject: ERR - VxBSAGetObject: Failed with error:

Server Status: Communication with the server has not been initiated or the server status has not been retrieved from the serve

07:56:39.658 [8867] <2> xbsa_GetObject: INF - leaving (3)

07:56:39.658 [8867] <16> int_StartJob: ERR - Failed to open backup file for restore.

07:56:39.658 [8867] <2> int_StartJob: INF - leaving

07:56:39.658 [8867] <2> sbtrestore: INF - leaving

07:56:39.658 [8867] <2> sbterror: INF - entering

07:56:39.658 [8867] <2> sbterror: INF - Error=7501: Failed to open backup file for restore.

.

07:56:39.658 [8867] <2> sbterror: INF - leaving
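
For anyone correlating this with the master-side logs: the timeout line marks the end of the wait, so subtracting the timeout gives roughly when the client started waiting on the comm file. A minimal Python sketch, assuming the dbclient line format shown above (`wait_window` is a name made up here for illustration, and the log line carries only a time of day, not a date):

```python
import re
from datetime import datetime, timedelta

# Matches the readCommFile timeout line as it appears in the snippet above.
TIMEOUT_RE = re.compile(
    r"^(\d{2}:\d{2}:\d{2}\.\d{3}) \[(\d+)\] <\d+> readCommFile: "
    r"ERR - timed out after (\d+) seconds"
)

def wait_window(line):
    """Return (pid, wait_start, wait_end) as HH:MM:SS strings, or None."""
    m = TIMEOUT_RE.match(line.strip())
    if m is None:
        return None
    stamp, pid, seconds = m.groups()
    end = datetime.strptime(stamp, "%H:%M:%S.%f")
    start = end - timedelta(seconds=int(seconds))
    return pid, start.strftime("%H:%M:%S"), end.strftime("%H:%M:%S")

line = (
    "07:56:39.657 [8867] <16> readCommFile: ERR - timed out after 7200 "
    "seconds while reading from "
    "/usr/openv/netbackup/logs/user_ops/dbext/logs/8867.0.1435200986"
)
print(wait_window(line))  # ('8867', '05:56:39', '07:56:39')
```

For the line above, the window to search in the master's logs would start at about 05:56:39.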

 

Master server version: 7.5.0.5

Client on the Oracle DB server: 7.5.0.3

OS: Red Hat (kernel 2.6.18)

Many thanks

6 Replies

  • Is the restore job even getting to the Activity Monitor on the master server? If so, with what error is it failing (anything in the detailed status of the failed job)?

    How many media servers are there in your environment? If there is more than one, are both successes and failures happening on the same media server?

  • Hi,

    All the restores in the Activity Monitor are successful. When I run a report for failures, there are none. These errors are being pointed out to me by the Oracle DBAs, which is why I am finding it hard to troubleshoot.

    There are 2 media servers.

    Here is another log I have just received:

    The log for CFTREL restore had the usual 'failover to previous backup' error.

    channel aux3: ORA-19870: error while restoring backup piece DI0_CFTPROD_0kqabc4h_1_1_883273873_446484

    ORA-19507: failed to retrieve sequential file, handle="DI0_CFTPROD_0kqabc4h_1_1_883273873_446484", parms=""

    ORA-27029: skgfrtrv: sbtrestore returned error

    ORA-19511: Error received from media manager layer, error text:

    Failed to open backup file for restore.

    channel aux1: ORA-19870: error while restoring backup piece DI0_CFTPROD_0hqabc3h_1_1_883273841_446481

    ORA-19507: failed to retrieve sequential file, handle="DI0_CFTPROD_0hqabc3h_1_1_883273841_446481", parms=""

    ORA-27029: skgfrtrv: sbtrestore returned error

    ORA-19511: Error received from media manager layer, error text:

    Failed to open backup file for restore.

    failover to previous backup

    Then at the end of the log had this:

    Oracle instance shut down

    RMAN-00571: ===========================================================

    RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============

    RMAN-00571: ===========================================================

    RMAN-03002: failure of Duplicate Db command at 06/25/2015 11:40:04

    RMAN-05501: aborting duplication of target database

    RMAN-03015: error occurred in stored script Memory Script

    RMAN-20506: no backup of archived log found

    RMAN-06053: unable to perform media recovery because of missing log

    RMAN-06025: no backup of archived log for thread 1 with sequence 54225 and starting SCN of 8366539190711 found to restore
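
To compare what RMAN asked for against what is in the NetBackup catalog, it can help to pull the backup piece handles out of the RMAN log first. A rough Python sketch, assuming the ORA-19507 line format shown above (`failed_pieces` is a hypothetical helper name, not part of RMAN or NetBackup):

```python
import re

# Captures the handle="..." value on ORA-19507 lines.
HANDLE_RE = re.compile(r'ORA-19507: .*?handle="([^"]+)"')

def failed_pieces(rman_log_text):
    """Return the distinct backup piece handles that failed, in log order."""
    seen = []
    for m in HANDLE_RE.finditer(rman_log_text):
        handle = m.group(1)
        if handle not in seen:
            seen.append(handle)
    return seen

sample = '''channel aux3: ORA-19870: error while restoring backup piece DI0_CFTPROD_0kqabc4h_1_1_883273873_446484
ORA-19507: failed to retrieve sequential file, handle="DI0_CFTPROD_0kqabc4h_1_1_883273873_446484", parms=""
ORA-27029: skgfrtrv: sbtrestore returned error
channel aux1: ORA-19870: error while restoring backup piece DI0_CFTPROD_0hqabc3h_1_1_883273841_446481
ORA-19507: failed to retrieve sequential file, handle="DI0_CFTPROD_0hqabc3h_1_1_883273841_446481", parms=""
ORA-27029: skgfrtrv: sbtrestore returned error
'''

for handle in failed_pieces(sample):
    print(handle)
```

The resulting list is what would need to be looked up on the NetBackup side for the source client.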

  • It seems the restores of the individual files were not started because they could not be found in the NBU catalog. We need to see the initial request in the dbclient log and the received request in the bprd log on the master server. Please copy the logs to dbclient.txt and bprd.txt and upload them as file attachments.
  • Hi Marianne,

    Unfortunately I can't upload those files because the domain and server names are in them. Is there anything specific you are looking for that I can try to find? If not, I might just have to log a call directly with Symantec.

    Apologies, and thanks for the assistance as always.

  • You can replace all names with generic names, for example replace the master server hostname (including the domain name) with 'master', the media server with 'media1', and the client name with 'client1'.

    It is unfortunately not possible to troubleshoot without seeing logs.

    What I normally do is to look for <16> and then look at the lines above that to get a feel for what led up to the failure.

    bprd is not that easy - here you need to look for the connection request from the client IP address and then how the request was interpreted and handled by the master.
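
The two suggestions above (generic names, then reading the lines before each <16>) can be sketched in Python roughly as follows. This is only an illustration: `sanitize`, `error_context` and the hostnames in the example are made up here, and the name mapping has to be filled in by hand for the real environment.

```python
import re
from collections import deque

def sanitize(text, replacements):
    """Replace real host names with generic ones, case-insensitively.

    Longer names go first so a fully qualified name is replaced before
    its short form.
    """
    for real in sorted(replacements, key=len, reverse=True):
        generic = replacements[real]
        # A function as replacement avoids backslash interpretation.
        text = re.sub(re.escape(real), lambda m, g=generic: g, text,
                      flags=re.IGNORECASE)
    return text

def error_context(lines, before=3, marker="<16>"):
    """Return (preceding_lines, error_line) for every line containing marker."""
    recent = deque(maxlen=before)
    hits = []
    for line in lines:
        if marker in line:
            hits.append((list(recent), line))
        recent.append(line)
    return hits

# Hypothetical host names, for illustration only.
names = {
    "nbmaster01.example.com": "master",
    "nbmaster01": "master",
    "oradb7": "client1",
}
log = [
    "07:56:39.657 [8867] <4> closeApi: entering closeApi.",
    "07:56:39.658 [8867] <2> xbsa_ProcessError: INF - leaving",
    "07:56:39.658 [8867] <16> int_StartJob: ERR - Failed to open backup file for restore.",
]
for context, error in error_context(log, before=2):
    for line in context:
        print("   ", sanitize(line, names))
    print(">>>", sanitize(error, names))
```

Sanitizing on output rather than editing the log in place keeps the original file intact in case support asks for it later.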