Oracle database restore fail

Arshad_Khateeb
Level 5
Certified

We are trying to restore an Oracle database from one server to another. It all starts well, but this is where it gets stuck.

Error bpbrm (pid=11257) cannot get server's peername on client rome_pres-bu
Error bpbrm (pid=11257) listen for client protocol error - couldn't write necessary information on /usr/openv/netbackup/logs/user_ops/dbext/logs/10774.0.1517947887

19 REPLIES 19

Amol_Nair
Level 6
Employee
What is the NBU version on the master and the client?
Does the control file restore complete, with this message reported while restoring the database, or is the control file restore itself reporting it?

Please verify basic connectivity from master to client as well as client to master, and if there is a separate media server involved, then from media server to client and vice versa as well.

Please give us more details on the commands used for the restore.

Since this is an alternate client restore, have you changed CLIENT_NAME in the bp.conf file? I also hope you are providing the appropriate names in the SEND parameters used in the restore script.

We are running an older version, NetBackup 6.5.6, on the master, media server and client.

Connectivity looks good from all sides. Yes, we are restoring to a different client and using a different media server. We also changed the client name on the destination server to the name of the original server and tried again, but every attempt ends with the error mentioned in my previous post.

The RMAN restore script was created by the DBAs, and we have configured a policy to call that script.

6.5...
It has been ages since that version ran out of support. Why wasn't NBU upgraded? Now you can't even reach out to us with a support case.

Could we get the details of the script used by your DBA? I am most interested in the SEND parameters they are using.

Since you mentioned that a different media server is involved as well, did you verify connectivity from that media server to the client? I hope you did not just run a simple ping and conclude that connectivity is good, but also verified forward and reverse lookups and checked that the ports are open in both directions.
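A ping alone does not prove name resolution. As a minimal sketch of the forward and reverse lookup check mentioned above, the function below asks the OS resolver through `getent`. The function name `check_lookups` is made up for illustration, and NetBackup's own view of name resolution may differ from what the OS reports.

```shell
# Check that a hostname resolves to an IP and that the IP resolves back to a name.
# Uses the OS resolver (hosts file, DNS) via getent.
check_lookups() {
    local host="$1" ip back
    # Forward lookup: first address returned for the name.
    ip=$(getent hosts "$host" | awk 'NR==1 {print $1}')
    if [ -z "$ip" ]; then
        echo "FAIL forward: $host does not resolve"
        return 1
    fi
    # Reverse lookup: first name returned for that address.
    back=$(getent hosts "$ip" | awk 'NR==1 {print $2}')
    if [ -z "$back" ]; then
        echo "FAIL reverse: $ip ($host) has no name"
        return 1
    fi
    echo "OK $host -> $ip -> $back"
}
```

Run it on the master, the media server and the client, each time for the names of the other hosts, for example `check_lookups rome_pres-bu`. NetBackup also ships `bpclntcmd -hn <name>` and `bpclntcmd -ip <address>` for the same check from its own point of view, and `bptestbpcd -client <name>` on the server side to test the bpcd connection; check the exact options against the command reference for your version.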

Marianne
Moderator
Moderator
Partner    VIP    Accredited Certified

Please tell us more about "using different media server".
Have you changed image or tape ownership to the "different media server"?

To troubleshoot the error, you need to trace the process flow via the logs.

  1. dbclient on the destination client.
    Ensure that the folder exists under /usr/openv/netbackup/logs with 777 permissions.
    This is where everything starts - the client initiates the request that includes parameters such as user name, source client name and local hostname. This request is sent to bprd on the master server.
    1.a)  bpcd log on the client (same location) - used in point 4 below.

  2. bprd on the master server.
    Ensure that the folder exists under /usr/openv/netbackup/logs. Restart NBU if you need to create the folder.
    bprd will show us how NBU interprets the request from the client. 
    Once the image is found in the database, bprd will send the restore instruction to the media server.
    Here we need to see if the instruction is sent to the original or different media server.
    The restore instruction is sent to bpbrm on the relevant media server.

  3. bpbrm on the media server.
    Ensure that the folder exists under /usr/openv/netbackup/logs. No need to restart.
    bpbrm will show restore instruction from the master server.
    bpbrm will then show the attempt to connect to bpcd on the client, with the client name and client IP address.
    This is where you need to confirm that this is the destination client.

  4. Check the bpcd log on the client to see whether the media server connection was received and whether the client connected back.

Since you are not getting past this initial comms process, we need to check all of these logs to see where it's going wrong.

If you need assistance with reading the logs, please copy them to .txt files (dbclient.txt, bpcd.txt, bprd.txt, bpbrm.txt) and upload as attachments.
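The log directories named in the steps above can be created with a small helper. This is only a sketch: the function name `make_log_dir` and the default mode of 755 are assumptions, while the directory names and the 777 mode for dbclient come from the steps above.

```shell
# Create one legacy debug-log directory under a NetBackup logs root.
# Arguments: <logs_root> <process_name> [mode]   (mode defaults to 755, an assumption)
make_log_dir() {
    local root="$1" name="$2" mode="${3:-755}"
    mkdir -p "$root/$name" && chmod "$mode" "$root/$name"
}

# Example calls, commented out so that nothing is created by accident.
# On the destination client (dbclient must be writable by the oracle user):
#   make_log_dir /usr/openv/netbackup/logs dbclient 777
#   make_log_dir /usr/openv/netbackup/logs bpcd
# On the master server (restart NBU afterwards, as noted in step 2):
#   make_log_dir /usr/openv/netbackup/logs bprd
# On the media server (no restart needed, as noted in step 3):
#   make_log_dir /usr/openv/netbackup/logs bpbrm
```

Create the directories before the next restore attempt so that all four logs cover the same attempt.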

This is just to explain how things are set up.

Master Server A
Source Client B using Media Server C
Destination Client D using Media Server E

Last week, we attempted the restore on D using E and it got stuck as I shared in my first post.

This week, we renamed D to B and used E for restore but it failed again at the same point. (Attaching logs for this attempt from Feb 6th)

 

I am attaching the restore script. It is called by the small script below so that it runs as the oracle user.

#!/bin/bash

su - oracle -c /usr/openv/netbackup/scripts/restore.sh

Marianne

It seems your Oracle DBA is not trying to restore from NBU but rather from a local directory on the Oracle server:

restore controlfile from '/u01/app/oracle/product/10.2.0.5.0/db_1/dbs/iwhprod_bkup_control_db.20180120090005'

But they are specifying the channel as NBU: 
SBT_TAPE' parms='SBT_LIBRARY=/usr/openv/netbackup/bin/libobk.so64.1'

This is why we see this info message in dbclient log:

VxBSAQueryObject: INF - Object </u01/app/oracle/product/10.2.0.5.0/db_1/dbs/iwhprod_bkup_control_db.20180120090005> was not found in the NetBackup catalog.

So, the DBA first needs to determine where exactly the control file was backed up to and whether an RMAN catalog or autobackup was used.
NBU cannot restore from a disk location that was not written (backed up) by NBU.

PS: 
In future, please only post logs as requested. We really don't need a zip of the entire log folder.
We only need the log for the restore that failed for each of the processes mentioned.
Files with .txt extensions can easily be opened on any device - including mobile. Any other format can only be downloaded from a computer.

Thanks a lot Marianne! I am sorry about uploading all the logs, and in zip format.


I will bring this up with the DBA. In the meantime, I am testing with the FQDN of the source client under altnames. We already had an empty file named No.Restrictions. Now I am planning to use an empty file named after the destination client's FQDN.


master> ls /usr/openv/netbackup/db/altnames/
No.Restrictions                 romepres-sol10-bu.eus.elnk.net
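For reference, a per-client altnames file is named after the client that requests the restore (the destination), and as far as I know an empty file places no limit on which source clients it may restore from. A minimal sketch, with the helper name `allow_alt_restore` made up for illustration:

```shell
# Allow <peer> (the destination client, as the master sees its peername)
# to browse and restore backups made by other clients.
# Arguments: <altnames_dir> <peer> [source_client ...]
# With no source clients listed, the file is left empty.
allow_alt_restore() {
    local dir="$1" peer="$2"
    shift 2
    mkdir -p "$dir" || return 1
    # Truncate or create the file named after the requesting client.
    : > "$dir/$peer" || return 1
    local src
    for src in "$@"; do
        # One permitted source client name per line.
        printf '%s\n' "$src" >> "$dir/$peer"
    done
}
```

On the master this would be called as `allow_alt_restore /usr/openv/netbackup/db/altnames romepres-sol10-bu.eus.elnk.net`. With No.Restrictions already present, any client is allowed, so a per-client file should not change the outcome.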

Also, the DBA changed the script to match the control file, and we tried the restore three more times this morning, with the restore always getting stuck at the same point :(

As Marianne mentioned, the problem is that the DBA is trying to restore from a disk image, not from an NBU image, so using the FQDN or making any other changes on the NBU side is not going to help you here.

From the NBU side, run this command to get the list of Oracle backups in NBU:

bplist -C <source_client_name> -S <master_server_name> -t 4 -l -R /

**Note that the names are case sensitive, so any incorrect name will result in a "no entity found" error.

From the images listed by this command, the DBA can select one and attempt to restore it with the sbt_tape channel.

Marianne
We need to see bprd log on the master to see the reason for this error:

14:39:39.507 [2770] <16> xbsa_GetObject: ERR - VxBSAGetObject: Failed with error:
Server Status: operation requested by an invalid server

Please have a look at dbclient 'take2' file and help us understand what these 2 different hostnames on the destination mean (I won't copy the actual names here unless you're okay with it):

Xxxpres-bu
Xxxpres-sol10

I can see that image name for NBU backup was found for source client iXXprod.

The problem seems to be with the way the master interprets the 2 names on the destination client.

So, as I have explained - the request goes to bprd on the master server. This is where the client validation is failing.
' operation requested by an invalid server '

@Amol_Nair - We already tested the option with no luck and it fails at the same point. You can see the take1 logs.

@Marianne - Thanks again for your input. Using two different names for the destination client in bp.conf was just part of a test (in our take2).

Master > atlbaxxxxmaster, Source Client > rome_prexx-bu using media server virt-phoxxx-bu, destination client > romepresxxxsol10-bu using media server virt-hooxxx-bu

I'll upload the bprd logs from the master for the date of our restore testing.

Marianne

Source client name as CLIENT_NAME in bp.conf is good. 

We now need to see bprd log to see why the master server is rejecting the restore request.

It is important that we see one set of logs for the same restore attempt - firstly dbclient and bprd.
Only when all looks good in these logs will we go to bpbrm on the media server.

I am still confused about the restore via different media server.
How are you doing that?
Have you moved tape/image ownership to the other media server?
Or added FORCE_RESTORE... entry in master's bp.conf? 

So you mean the bplist command failed? Or did it run successfully and list the files backed up?

When you attempt an RMAN restore, it executes a bplist request in the background before the actual restore is initiated, and any failure of that bplist request results in the job failing.

You mentioned the destination client as "destination client > romepresxxxsol10-bu" and the source client as "Source Client > rome_prexx-bu", with an underscore (_).

The names I see in the take1 logs have a hyphen (-). There are no names with underscores in the bplist request sent to the master server.

Search for these keywords in the take1 log and you should be able to find the entries I am referring to 

---------------------

14:24:37.307 [2440] <4> dbc_GetServerClientConfig: 

14:24:37.309 [2440] <4> sendRequest:    request: 

14:26:38.026 [2440] <4> serverResponse: 

---------------------

 

Once you get the bplist command working to list all the Oracle images, it will be a lot easier to simply use the same names as SEND parameters in the restore script. So if the bplist command itself is failing at the moment, I would suggest focusing on that command first. If the bprd log shows an error such as an invalid name being used to make the request (usually seen with alternate client restores when neither the No.Restrictions file nor an altnames entry exists on the master server), then the same error will also be reported when you run the bplist command.

You can start by running the bplist command on the master server. There you do not need any extra entries, nor can you get an "invalid server made the request" error. Once the bplist command works on the master server, use the same names for the -S and -C parameters and run it on the client machine, as the root user as well as the oracle user.

Running the command as the oracle user is what is actually required, but if there is a permission-related issue the command may work as root and fail as oracle, hence the suggestion to try both.
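To keep the names identical across the three runs (master, client as root, client as oracle), the command can be built once and reused. A small sketch; the helper name `bplist_cmd` is hypothetical, and the bplist options are the ones quoted earlier in this thread.

```shell
# Print the bplist command for a given source client and master server.
# -t 4 selects the Oracle policy type; the names are case sensitive.
bplist_cmd() {
    local client="$1" master="$2"
    printf '/usr/openv/netbackup/bin/bplist -C %s -S %s -t 4 -l -R /\n' "$client" "$master"
}
```

On the destination client, run it as root with `eval "$(bplist_cmd <source_client_name> <master_server_name>)"` and then as the database user with `su - oracle -c "$(bplist_cmd <source_client_name> <master_server_name>)"`. If the first works and the second fails, the problem is most likely permissions rather than names.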

@Marianne

Yes, the source client name is in bp.conf on the destination.

rome_prexx-bu is a non-global server on media server virt-phooxx-bu

romeprexx-sol10-bu is a non-global server on media server virt-hooxx-bu

The source client with its media server and the destination client with its media server are on different networks, which is why we are using a different media server for the restore.

Yes, I have added the force restore entry to bp.conf on the master.

---------------------------------------------------------------

@Amol_Nair

bplist is successful.

Don't confuse the source and destination client names just because they look similar (the source client is rome_prxx-bu and the destination client is rompresx-sol10-bu).

We already have No.Restrictions in place on the master.

Marianne

We are still waiting to see bprd log.... 

And here is the fix that worked.....

After plenty of troubleshooting, it was found that the DBAs were not using the same DBID as the source client in the restore script on the destination client. The SID was also incorrect. The restore is running smoothly at the moment.
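The actual restore script is not shown in this thread, so the following is only a sketch of where the DBID and the NetBackup names typically go in an RMAN control file restore. It assumes control file autobackup was in use and that the instance is already started NOMOUNT with the correct ORACLE_SID. The function name `gen_controlfile_restore` and all argument values are placeholders.

```shell
# Print an RMAN script that restores the control file from NetBackup.
# Arguments: <dbid_of_source_database> <source_client> <master_server>
# Assumes control file autobackup and an instance already in NOMOUNT state.
gen_controlfile_restore() {
    local dbid="$1" client="$2" master="$3"
    cat <<EOF
SET DBID ${dbid};
RUN {
  ALLOCATE CHANNEL ch1 TYPE 'SBT_TAPE'
    PARMS 'SBT_LIBRARY=/usr/openv/netbackup/bin/libobk.so64.1';
  SEND 'NB_ORA_CLIENT=${client},NB_ORA_SERV=${master}';
  RESTORE CONTROLFILE FROM AUTOBACKUP;
  RELEASE CHANNEL ch1;
}
EOF
}
```

As the oracle user on the destination, with ORACLE_SID exported, the output could be piped into RMAN, for example `gen_controlfile_restore <dbid> <source_client_name> <master_server_name> | rman target /`. The DBID must be that of the source database, and NB_ORA_CLIENT must be the source client name exactly as it appears in the NetBackup catalog.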

The DBAs are moving to Oracle 12c and are proposing to do their own backups instead of using NetBackup, so that they have full control over backup and restore. They are a little nervous after the restore issue, but it was completely due to their poor script.

What are the benefits of using NetBackup for database backups (to tape/disk) compared with the DBAs doing their own? This will help me make the case for NetBackup in our upcoming meeting with them.

Marianne

Please get proof of all these mistakes when you go into your meeting.
We all agree that they are trying to blame NetBackup for their mess-ups.
The outcome would be no different if they restored from disk backup.

I guess with 'own backup' they want to use disk instead of NBU sbt_tape.

My issues with that: 

The additional space needed for this.
The additional task to keep an eye on disk usage and manual cleanup of older backups.
Backing up to disk followed by file-level backup to NBU will take twice as long.
Scheduling becomes an issue as disk backup must be finished before file-level backup can start.
If disk backup has been removed and restore from NBU is needed, the restore will take twice as long because of the 2-step process.

Just one more thing - you really need to upgrade NBU.
If anything goes wrong on NBU side, you cannot log Support calls with Veritas.
As you have experienced, we do not have the time and in-depth experience to assist here on VOX.
Highest level logs and insight into the Oracle side is needed, which we here on VOX simply do not have the time for.

All the best for your meeting!

Thanks Marianne for your inputs. You are right, they don't want NetBackup as a mediator between their database and the backup target. They want to use RMAN to back up to disk.

Yesterday's restore surprised them by taking just 6 hours for approx. 9 TB with 4 channels running at a time. We used 4 channels because the same number of channels is configured in the RMAN backup script. So here are the questions, which we are also planning to test:

Question 1 - Does the RMAN restore script need to have the same number of channels as the RMAN backup script? If not, then I think we can restore in less time.

Question 2 (includes 3 questions :) ) - All channels completed, but the parent job failed with status code 6. The DBA found the restore successful and applied the archive logs. Does the failed parent restore job make any difference? Is it considered a success? Or should I try a test restore from another date?

Yes, I have notes on what happened before. The DBAs were not even using the DBID in the previous RMAN restore scripts.

We may upgrade and have Data Domain too.

It is usually recommended to use the same number of channels for the restore as were used at the time of backup. Increasing or decreasing the number of channels would strongly affect the performance of RMAN restores.

A parent stream failure means that RMAN returned a non-zero exit status: the child streams completed successfully, but some RMAN command that ran returned a failure. The DBA needs to check which command reported an error. It could be a channel searching for an archive log sequence that reported the error.

By the way, regarding the main issue where the UID and GID were not the same: matching them is more or less a prerequisite for alternate client restore with RMAN. If the DBA does not want this, then they need to set "BKUP_IMAGE_PERM=ANY" when taking backups.

With this parameter set, the images backed up to NBU effectively have 777 permissions. This does not change the permissions on the existing Oracle files, but it is still something of a security risk because any user or group gets permission to perform the restore.

You can refer to the link below for details on how this variable can be used:
https://www.veritas.com/support/en_US/doc/16226115-126559565-0/v33480652-126559565

If the DBA wants to use his own commands, the same way he would take a backup to disk, I would suggest the NBU Copilot dump and sweep method. This gives the DBA the choice of taking RMAN backups to local disk, and NBU then also takes an sbt_tape backup. You could read up on this and see whether it can be implemented in your environment. Note that you need an NBU appliance in order to configure Copilot backups.