Solved: Oracle redirected restore: bug from Netbackup version 8.3 still affecting version 10?

rme2023 · ‎09-20-2023

Hi folks.

I'm not a Netbackup admin/engineer but a member of the database team trying to understand the issue.

Platform and setup:

Netbackup ver. 10
Oracle 19c, RAC with Dataguard.
Netbackup-managed backup.
Multiple databases in RAC

Background:

We are attempting a redirected restore via RMAN. The source client is the production RAC database, referred to in the RMAN command using the appropriate Netbackup host catalog name (NB_ORA_CLIENT=<DBNAME>_<DBID>). We are aware of the redirected restore procedure and are following it correctly, as far as we know.

The restore fails and RMAN aborts with this message:

ORA-19507: failed to retrieve sequential file, handle="bk_dBLAHPROD_u9g217066_s6448_p1_t1142128838", parms=""
ORA-27029: skgfrtrv: sbtrestore returned error
ORA-19511: non RMAN, but media manager or vendor specific failure, error text:
   Failed to open backup file for restore.

We've enabled debugging information in Netbackup and we saw these messages in the dbclient logs:

00:47:07.541 [765266] <4> sendRequest: sending buf = 1689256954 1689256954 /bk_dBLAHPROD_u9g217066_s6448_p1_t1142128838
00:47:07.541 [765266] <4> sendRequest: Date range: <-s 07/14/23 2:02:34>, <-e 07/14/23 02:02:34>
00:47:07.541 [765266] <4> serverResponse: entering serverResponse.
00:47:07.541 [765266] <4> serverResponse: initial client_read_timeout = <900>
00:47:07.541 [765266] <4> readCommMessages: Entering readCommMessages
00:47:09.541 [765266] <4> serverResponse: read comm file:<00:47:08 INF - Server status = 227>
00:47:09.541 [765266] <16> serverResponse: ERR - server exited with status 227: no entity was found
00:47:09.541 [765266] <16> RestoreFileObjects: ERR - serverResponse() failed
00:47:09.541 [765266] <4> closeApi: entering closeApi.
00:47:09.541 [765266] <4> closeApi: INF - EXIT STATUS 5: the restore failed to recover the requested files
00:47:09.541 [765266] <8> VxBSAGetObject: WRN - RestoreFileObject was not able to find the object. Status: 26
00:47:09.541 [765266] <2> xbsa_ProcessError: INF - entering
00:47:09.541 [765266] <2> xbsa_ProcessError: INF - leaving
00:47:09.541 [765266] <16> xbsa_GetObject: ERR - VxBSAGetObject: Failed with error: There is no copy of the requested object.
00:47:09.541 [765266] <2> xbsa_GetObject: INF - leaving (26)
00:47:09.541 [765266] <16> int_StartJob: ERR - Failed to open backup file for restore.
00:47:09.541 [765266] <2> int_StartJob: INF - leaving
00:47:09.541 [765266] <2> sbtrestore: INF - leaving
00:47:09.541 [765266] <2> sbterror: INF - entering
00:47:09.541 [765266] <2> sbterror: INF - Error=7501: Failed to open backup file for restore. .
00:47:09.541 [765266] <2> sbterror: INF - leaving
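A quick way to sanity-check the request in that log is to decode the number in the "sending buf" line. The sketch below assumes that number is Unix epoch seconds and that the "Date range" line is the same instant printed in the client's local time; neither is confirmed by the log itself, so treat the result as a hint only.

```python
from datetime import datetime, timezone

# Values copied from the dbclient log above.
SENT_EPOCH = 1689256954             # appears twice in the "sending buf" line
LOGGED_LOCAL = "07/14/23 02:02:34"  # the "-e" value in the "Date range" line

# Assumption: the number is Unix epoch seconds, so decode it as UTC.
as_utc = datetime.fromtimestamp(SENT_EPOCH, tz=timezone.utc)

# Tag the logged wall-clock time as UTC only so the two can be subtracted;
# the difference is then the UTC offset implied by the log.
logged = datetime.strptime(LOGGED_LOCAL, "%m/%d/%y %H:%M:%S").replace(
    tzinfo=timezone.utc
)
offset = logged - as_utc

print("epoch decoded as UTC:", as_utc.isoformat())
print("logged date range   :", LOGGED_LOCAL)
print("implied UTC offset  :", offset)
```

If the gap comes out as a whole number of hours, that is consistent with both lines describing the same single instant, which would mean the search start and end are identical.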

In researching this, we came across this Veritas KB article (100049320) which describes the same situation that we are experiencing along with similar debug messages, but for a lower version (excerpt below):

When attempting an Oracle RAC restore with NetBackup 8.3, the restore fails with an error 227, stating that the image needed for the restore cannot be found. A bplist of the oracle images will show that the image is actually present. This can occur on RAC clusters that have more than one database.

In verifying the conditions relating to this bug, we indeed see the required backup piece via bplist:

-rw-rw---- oracle asmadmin 700448768 Jul 14 02:00 /bk_dBLAHPROD_u9g217066_s6448_p1_t1142128838

But, as mentioned in the KB document:

NetBackup will look for all possible images needed for the restore across all possible RAC catalog names associated with the client. The failure occurs when the last catalog name does not have any backup images in the requested time range and an error is returned.

Note that in the bplist output above the timestamp for the backup piece is 02:00, but the time range that the restore attempt searches in the Netbackup catalog is (as shown in the debug log):

00:47:07.541 [765266] <4> sendRequest: sending buf = 1689256954 1689256954 /bk_dBLAHPROD_u9g217066_s6448_p1_t1142128838
00:47:07.541 [765266] <4> sendRequest: Date range: <-s 07/14/23 2:02:34>, <-e 07/14/23 02:02:34>

... which would obviously exclude the backup piece it is interested in.
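The comparison can be reproduced with a small script. This is only a sketch of the check described in this post: the helper names are made up for illustration, the year is borrowed from the log line because bplist does not print one, and it assumes the minute-resolution time in the bplist listing is the value the catalog search is matched against, which may not be the case.

```python
import re
from datetime import datetime

# Lines copied from this post.
DATE_RANGE_LINE = ("00:47:07.541 [765266] <4> sendRequest: Date range: "
                   "<-s 07/14/23 2:02:34>, <-e 07/14/23 02:02:34>")
BPLIST_LINE = ("-rw-rw---- oracle asmadmin 700448768 Jul 14 02:00 "
               "/bk_dBLAHPROD_u9g217066_s6448_p1_t1142128838")


def parse_date_range(line):
    """Return (start, end) parsed from a dbclient 'Date range' line."""
    match = re.search(r"<-s ([^>]+)>, <-e ([^>]+)>", line)
    if match is None:
        raise ValueError("no date range found in line")
    fmt = "%m/%d/%y %H:%M:%S"
    start, end = (datetime.strptime(s.strip(), fmt) for s in match.groups())
    return start, end


def parse_bplist_time(line, year):
    """Return the listing timestamp; the year is supplied by the caller."""
    fields = line.split()  # mode, owner, group, size, month, day, HH:MM, path
    stamp = " ".join(fields[4:7])
    return datetime.strptime(f"{stamp} {year}", "%b %d %H:%M %Y")


start, end = parse_date_range(DATE_RANGE_LINE)
piece_time = parse_bplist_time(BPLIST_LINE, start.year)

window_seconds = (end - start).total_seconds()
in_window = start <= piece_time <= end
gap_seconds = (start - piece_time).total_seconds()

print("search window width (s):", window_seconds)
print("piece time inside window:", in_window)
print("seconds between piece time and window start:", gap_seconds)
```

With the values above, the window has zero width and the listed piece time falls before it, which matches the mismatch described here.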

Questions:

Is it possible that we are hitting this bug despite running a different version from the one mentioned in the KB article? I'm aware that the KB document says "This issue has been seen in all present versions", but I wanted to confirm, as the article doesn't clearly indicate whether the bug has been resolved in versions after 8.3.
The KB article refers to resorting to an EEB to work around this issue. Do we have any other options?

Thank you all in advance.

Nicolai · ‎09-21-2023

hi @rme2023

If you look at this page, it states the issue is fixed in NBU 9.1: https://www.veritas.com/support/en_US/doc/81225970-146133007-0/v149907937-146133007

To obtain an EEB you need to open a support ticket with Veritas support. If the bug hits a lot of users, the EEB will be made available for public download. Reference the Etrack number from the knowledge base article in the support case.

/Nicolai

rme2023 · ‎09-21-2023

Hi @Nicolai

Thank you very much for your response.

That link is very helpful, thanks.

That is what I'm puzzled about: if the bug was fixed in a version prior to the one we are using, why are we experiencing its symptoms? I don't know how the debug messages and the bplist output could be any clearer, but I believe it's the declaration that the bug has been fixed that is causing the engineers we are in contact with to shrug off this evidence. Maybe something reintroduced the bug in ver. 10, or maybe some new factor is in play?

Anyway, I appreciate the time you took to answer my post.

Kind regards.

Nicolai · ‎09-22-2023

If your support ticket goes nowhere, I suggest you call the on-duty escalation manager and explain the issue.

@mph999 Can you help here?

/nicolai

rme2023 · ‎09-28-2023

Thanks for your replies, @Nicolai

We've received an EEB for this issue and it appears to have solved it.

I believe the Etrack is 4131540. I'm looking forward to seeing the details once it becomes publicly available.
