Restore of catalog on DR server failing

Noor_Toorabally
Level 4

The error I get while recovering the catalog on our freshly installed DR server is as follows:

 

Catalog recovery is in progress. Please wait...
EXIT STATUS 2818: NBU-Catalog policy restore error
ERR - Failed to execute command /usr/openv/netbackup/bin/bprestore -w -T -X -C sesegx10.sagir.qc -t 35 -p NBU-Catalog -e 1364461370 -L "/usr/openv/netbackup/logs/user_ops/root/logs/Recover1366033056.log" "/usr/openv/netbackup/db/images" on host sesegx10 (5)
 

 

Here are the details of PRODUCTION MASTER server and DR MASTER SERVER:

Both servers run NetBackup 7.5.0.4.

PRODUCTION MASTER SERVER:

[root@sesegx10 ~]# hostname
sesegx10.sagir.qc
[root@sesegx10 ~]# uname -n
sesegx10.sagir.qc
[root@sesegx10 ~]# cat /proc/sys/kernel/hostname
sesegx10.sagir.qc
[root@sesegx10 ~]# sysctl kernel.hostname
kernel.hostname = sesegx10.sagir.qc
[root@sesegx10 ~]# cat /etc/sysconfig/network | grep -i hostname
HOSTNAME=sesegx10.sagir.qc
[root@sesegx10 ~]# hostname -f
sesegx10.sagir.qc
[root@sesegx10 ~]# hostname -a
sesegx10
[root@sesegx10 ~]# nbemmcmd -listhosts | grep sesegx10
master           sesegx10
server           sesegx10
[root@sesegx10 ~]# nbemmcmd -listhosts
NBEMMCMD, Version: 7.5.0.4
The following hosts were found:
media            sedefu05
media            sedefu06
media            sepfou04
media            sepfou03
media            seprdu01
media            seprdu02
media            seprdu04
media            seprdu03
media            sesegu02
media            sesegu01
media            sepfou02
media            sepfou01
media            sedefu04
media            sedefu03
media            sedefu01
media            sedefu02
media            seppuu02
media            seppuu01
media            sesegx09
media            sesegx08
master           sesegx10
client           sesegx09
client           seppux26
client           seppux27
client           seppux06
client           seppux05
client           seppux11
client           seppux10
client           seppuu03
client           seppuu04
client           seprdu06
client           seprdu05
client           sesegx05
server           sesegx10
Command completed successfully.
[root@sesegx10 ~]# grep sesegx10 /usr/openv/netbackup/bp.conf
SERVER = sesegx10.sagir.qc
CLIENT_NAME = sesegx10
EMMSERVER = sesegx10
CONNECT_OPTIONS = sesegx10 0 1 0
FORCE_RESTORE_MEDIA_SERVER = sesegu01 sesegx10
FORCE_RESTORE_MEDIA_SERVER = seppuu02 sesegx10
FORCE_RESTORE_MEDIA_SERVER = sedefu02 sesegx10
FORCE_RESTORE_MEDIA_SERVER = sesegu02 sesegx10
FAILOVER_RESTORE_MEDIA_SERVERS = seppuu02 sesegx10
THROTTLE_BANDWIDTH = sesegx10 400000
[root@sesegx10 ~]# nbemmcmd -getemmserver | grep MASTER

MASTER         7.5                 sesegx10                      sesegx10
[root@sesegx10 ~]# /usr/openv/db/bin/nbdb_ping
Database [NBDB] is alive and well on server [NB_sesegx10].

 

DR MASTER SERVER:

[root@sesegx10 logs]# hostname
sesegx10.sagir.qc
[root@sesegx10 logs]# uname -n
sesegx10.sagir.qc
[root@sesegx10 logs]# cat /proc/sys/kernel/hostname
sesegx10.sagir.qc
[root@sesegx10 logs]# sysctl kernel.hostname
kernel.hostname = sesegx10.sagir.qc
[root@sesegx10 logs]# cat /etc/sysconfig/network | grep -i hostname
HOSTNAME=sesegx10.sagir.qc
[root@sesegx10 logs]# hostname -f
sesegx10.sagir.qc
[root@sesegx10 logs]# hostname -a
sesegx10
[root@sesegx10 logs]# nbemmcmd -listhosts | grep sesegx10
server           sesegx10
master           sesegx10
[root@sesegx10 logs]# nbemmcmd -listhosts
NBEMMCMD, Version: 7.5.0.4
The following hosts were found:
server           sesegx10
master           sesegx10
client           seprdu05
foreign_media    sesegx08
foreign_media    sesegx09
Command completed successfully.
[root@sesegx10 logs]# grep sesegx10 /usr/openv/netbackup/bp.conf
SERVER = sesegx10
CLIENT_NAME = sesegx10
EMMSERVER = sesegx10
FORCE_RESTORE_MEDIA_SERVER = sesegx08 sesegx10
FORCE_RESTORE_MEDIA_SERVER = sesegx09 sesegx10
FORCE_RESTORE_MEDIA_SERVER = seprdu01 sesegx10
FORCE_RESTORE_MEDIA_SERVER = seprdu02 sesegx10
FORCE_RESTORE_MEDIA_SERVER = seprdu03 sesegx10
FORCE_RESTORE_MEDIA_SERVER = seprdu05 sesegx10
FORCE_RESTORE_MEDIA_SERVER = sesegx12 sesegx10
FORCE_RESTORE_MEDIA_SERVER = seppuu01 sesegx10
[root@sesegx10 logs]# nbemmcmd -getemmserver | grep MASTER

MASTER         7.5                 sesegx10                      sesegx10
[root@sesegx10 logs]# /usr/openv/db/bin/nbdb_ping
Database [NBDB] is alive and well on server [NB_sesegx10].

18 REPLIES

Marianne
Level 6
Partner    VIP    Accredited Certified

Are you restoring from disk or tape?

Which media server performed the catalog backup in production?

Oh - SERVER names in bp.conf are NOT matching:

Prod:


SERVER = sesegx10.sagir.qc
CLIENT_NAME = sesegx10
EMMSERVER = sesegx10

 

DR:


SERVER = sesegx10
CLIENT_NAME = sesegx10 
EMMSERVER = sesegx10

 

All names must match EXACTLY.
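
A minimal sketch of how the two files could be compared, assuming a copy of each server's bp.conf is available locally. The function names and the example file names are hypothetical and not part of NetBackup. Entry order is kept as-is, since the first SERVER entry is the one that names the master.

```shell
#!/bin/sh
# Print the SERVER, CLIENT_NAME and EMMSERVER entries of a bp.conf file,
# normalised to "KEY=value" with trailing whitespace removed.
# File order is preserved.
bpconf_names() {
    grep -E '^(SERVER|CLIENT_NAME|EMMSERVER)[[:space:]]*=' "$1" |
        sed -e 's/[[:space:]]*=[[:space:]]*/=/' -e 's/[[:space:]]*$//'
}

# Compare the name entries of two bp.conf copies.
# Prints a diff and returns non-zero when they differ.
compare_bpconf() {
    prod_tmp=$(mktemp) && dr_tmp=$(mktemp) || return 2
    bpconf_names "$1" > "$prod_tmp"
    bpconf_names "$2" > "$dr_tmp"
    diff "$prod_tmp" "$dr_tmp"
    rc=$?
    rm -f "$prod_tmp" "$dr_tmp"
    return $rc
}

# Example (hypothetical names for copies of each server's bp.conf):
# compare_bpconf bp.conf.prod bp.conf.dr
```

With the entries quoted above, this would flag only the SERVER line (sesegx10.sagir.qc versus sesegx10).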

Do you have an alias in /etc/hosts for shortname and FQDN?

Please find the restore log ( /usr/openv/netbackup/logs/user_ops/root/logs/Recover<date-time>.log  ) and post as File attachment.

 

Noor_Toorabally
Level 4

Hello Marianne,

Thank you for your reply.

The media server which took the backup in PRODUCTION is listed as sesegx10:

sesegx10.sagir.qc_1364461370 Thu Mar 28 05:02:50 EDT 2013 NBU-Catalog vault_netbackup_catalog sesegx10 S00012 1 Yes No null No No

 

I also tried using the long name in the DR bp.conf to match the PRODUCTION bp.conf, but got the same result.

The aliases for the short name and FQDN are in /etc/hosts, and both resolve to the same IP:

[root@sesegx10 ~]# cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1               localhost.localdomain localhost
172.29.65.205   sesegx10.sagir.qc sesegx10

[root@sesegx10 ~]# ping sesegx10
PING sesegx10.sagir.qc (172.29.65.205) 56(84) bytes of data.
64 bytes from sesegx10.sagir.qc (172.29.65.205): icmp_seq=1 ttl=64 time=0.033 ms

[root@sesegx10 ~]# ping sesegx10.sagir.qc
PING sesegx10.sagir.qc (172.29.65.205) 56(84) bytes of data.
64 bytes from sesegx10.sagir.qc (172.29.65.205): icmp_seq=1 ttl=64 time=0.029 ms


 

There is no additional information in the restore log ( /usr/openv/netbackup/logs/user_ops/root/logs/Recover<date-time>.log  ), here is the content:

[root@sesegx10 logs]# more Recover1366031870.log
09:19:54 INF - scheduler is suspended
Restore started 04/15/2013 09:19:54

09:19:55 (293.xxx) /usr/openv/netbackup/db/images -s 03/28/2013 05:02:50 -e 03/28/2013 05:02:50 - no files matched in the given date range

09:19:55 (293.xxx) INF - Status = NBU-Catalog policy restore error.

09:19:55 ERR - Failed to execute command /usr/openv/netbackup/bin/bprestore -w -T -X -C sesegx10.sagir.qc -t 35 -p NBU-Catalog -e 1364461370 -L "/usr/openv/netbackup/logs/user_ops/root/logs/Recover1366031870.log" "/usr/openv/netbackup/db/images" on host sesegx10 (5)

09:20:00 INF - attempting to freeze media used in recovery
09:25:00 WRN - media S00012 is frozen to prevent overwrite of catalog backup
09:30:01 WRN - media S00012 is frozen to prevent overwrite of catalog backup
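
The value after -e in the failing bprestore command, which is also the suffix of the backup ID sesegx10.sagir.qc_1364461370, looks like a Unix epoch timestamp. A small sketch to convert it, assuming GNU date as shipped with Linux (the function name is hypothetical):

```shell
#!/bin/sh
# Convert a Unix epoch value (as seen after -e and in the backup ID suffix)
# to a UTC timestamp. Relies on GNU date's "-d @SECONDS" syntax.
epoch_to_utc() {
    date -u -d "@$1" '+%m/%d/%Y %H:%M:%S'
}

epoch_to_utc 1364461370   # 03/28/2013 09:02:50 UTC, i.e. 05:02:50 EDT
```

That matches the 03/28/2013 05:02:50 range shown in the log, so the date range being searched appears to be the right one and the "no files matched" message probably has another cause.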


 

I would like to point out something about the discrepancy between the PRODUCTION and DR bp.conf files. We installed the DR server with sesegx10 as the NetBackup master server name, because every query for the master server name on the PRODUCTION server (except the single SERVER entry in bp.conf) returns sesegx10.
Installing the DR server this way produces sesegx10 in its bp.conf, which is logical.
But I cannot understand why the entry in the PRODUCTION bp.conf is sesegx10.sagir.qc when nbemmcmd -listhosts gives the name sesegx10.

I suspect this may be due to:
1. A manual modification of the bp.conf file on PRODUCTION by some user.
2. A discrepancy arising because the PRODUCTION MASTER SERVER reached 7.5.0.4 through an upgrade from 6.5.6, while the DR MASTER SERVER is a fresh 7.5 installation that was then upgraded to 7.5.0.4.
I don't have enough knowledge to assess whether either is the case. Please give me your view.

Jean-Pierre_Bai
Level 4
Partner Accredited

What name did you use to install your master on the DR system?

I would use the long name (sesegx10.sagir.qc). That is what the bprestore command is looking for.

Also check that you don't have links to openv/db and openv/netbackup/db.
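
A quick way to check this could look like the sketch below. The function name is hypothetical, and the paths are the ones mentioned in this thread. Running it on both production and DR and comparing the output would show any difference in links.

```shell
#!/bin/sh
# Report whether each given path is a symbolic link, a real directory,
# or missing. For links, also print the link target.
report_links() {
    for p in "$@"; do
        if [ -L "$p" ]; then
            printf '%s -> %s (symlink)\n' "$p" "$(readlink "$p")"
        elif [ -d "$p" ]; then
            printf '%s (directory)\n' "$p"
        else
            printf '%s (missing or not a directory)\n' "$p"
        fi
    done
}

# Paths discussed in this thread.
report_links /usr/openv/db /usr/openv/netbackup/db /usr/openv/netbackup/db/images
```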

It would be good if you answered Marianne's question: how are you running the restore? Is the catalog backup on disk or tape? I guess you're using the DR recovery file.

Noor_Toorabally
Level 4

The catalog backup is to tape and I use a DR file.

I used the short name sesegx10 to install the DR system. I also tried reinstalling with the long name sesegx10.sagir.qc, but I get the same error.

I have a link from /usr/openv/netbackup/db/images to /u030/netbackup/db/images on both the PRODUCTION and DR servers.

I also did another test: I took a catalog backup on the DR server itself and tried to recover it on the same server, and this works fine.

I do not understand why recovery of the catalog backup coming from PRODUCTION is not working.

Jean-Pierre_Bai
Level 4
Partner Accredited

The link you have should not be a problem. Nevertheless, did you try the restore without the link?

Can you post the recovery log as Marianne asked?

"Please find the restore log ( /usr/openv/netbackup/logs/user_ops/root/logs/Recover<date-time>.log ) and post as File attachment."

When you reinstalled, did you have both names (short and long) in hosts? Did you reinstall from scratch when you used the long name?

 

Jean-Pierre_Bai
Level 4
Partner Accredited

I see you have posted the output of more on your recovery log above. What command did you use to run the recovery?

Did you try bprecover -wizard ?

Seems like the first part did not work.

Here is a sample bprecover run I did not so long ago with 7.5.0.4 under Linux RH. It failed because openv/db was on a link. Nevertheless, the output does not look the same as what you have above.

Nowhere in my output do I see the db/images -e command.

Below is output from bprecover -wizard, with the DR file name and location specified.

15:57:47 INF - scheduler is suspended

Restore started 01/16/2013 15:57:47

15:57:47 (180.xxx) -----------------------------------------------------------

15:57:47 (180.xxx) WARNING: The following files and directories will not

15:57:47 (180.xxx) be restored because they are not present when the backup

15:57:47 (180.xxx) was done on Fri 11 Jan 2013 12:42:24 PM CET. These files and

15:57:47 (180.xxx) directories were either moved or deleted prior to this

15:57:47 (180.xxx) backup, but did exist in a previous backup. If needed,

15:57:47 (180.xxx) they can be restored by doing a normal restore instead

15:57:47 (180.xxx) of doing a true image restore.

15:57:47 (180.xxx) -----------------------------------------------------------

15:57:47 (180.xxx) /produits/nbu_data_frrmmutnbup12/openv/db/staging/DARS_DATA.db is not in the true image list. Skipping.

15:57:47 (180.xxx) / ETC.

Noor_Toorabally
Level 4

Hello Jean-Pierre,

Yes I did an installation from scratch for long and short name with no result.

I am attaching the restore logs (Recover1366125997.log) generated when launching the bprecover -wizard (and without the link of the openv/db/image) as well as the full output of the wizard (wizard.txt)

Noor_Toorabally
Level 4

Files attached

Jean-Pierre_Bai
Level 4
Partner Accredited

What happens if you do a full catalog restore?

It looks like the first step, concerning images, is not done because you specified not to do a full catalog restore. So that's not the error.

So my guess is you're failing on the second step. I assume the bprestore command is available and can be executed.

I suppose you are running as root.

Anything in the logs? I guess your tape is mounted, right?

 

 

Jean-Pierre_Bai
Level 4
Partner Accredited

I had a similar problem resulting in code 2818. The issue was that /usr/openv/db was on a link, and the restore process wanted to access /usr/openv/db/emm.db.

The tar process was stuck and I had to restart the server to clear it.

I picked up the job log, file list and detailed status, and was asked for the following NBU logs at verbosity 5:

From the master server: bprd  tar  bpdbm

From the media server:  bptm  bpbrm  bpdm
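
As far as I know, these legacy debug logs are only written when a directory named after the process exists under /usr/openv/netbackup/logs, so a missing directory would explain a log that is never generated. A minimal sketch to check for them (the function name is hypothetical, and the log root is the default UNIX location, assumed here):

```shell
#!/bin/sh
# Report which per-process debug log directories exist under a log root.
# Usage: check_log_dirs LOGROOT name...
check_log_dirs() {
    root=$1
    shift
    for name in "$@"; do
        if [ -d "$root/$name" ]; then
            printf '%s: present\n' "$name"
        else
            printf '%s: missing\n' "$name"
        fi
    done
}

# Master server processes named above.
check_log_dirs /usr/openv/netbackup/logs bprd tar bpdbm
# Media server processes named above.
check_log_dirs /usr/openv/netbackup/logs bptm bpbrm bpdm
```

Any directory reported as missing would need to be created before rerunning the restore.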

Noor_Toorabally
Level 4

Hello Jean-Pierre,

If I do a full catalog restore, it proceeds with the restore, but I have not investigated that part because our DR procedure is based on partial catalog recovery.
I am running the command as root, and the tape is mounted and positioned before the error occurs.
In my case, /usr/openv/db is not a link.

The logs from the server are as follows:
bprd: none generated
tar: none generated
bpdbm: as attached
bptm: as attached
bpbrm: none generated
bpdm: none generated

Jean-Pierre_Bai
Level 4
Partner Accredited

OK, so you are saying a full restore works and it's your partial restore that does not work. Is that correct?

 

Noor_Toorabally
Level 4

Let me try that quickly. I am not sure whether the full restore works either, but it was getting further than the partial restore. I have just launched the restore and will have the result shortly. Thank you for your reply.

Noor_Toorabally
Level 4

I did the full catalog restore. It ended with some errors (in the attached recover log), but the content of the catalog tape has been copied to /u030/netbackup/db/images (which was the catalog directory on the PRODUCTION server). I stopped NetBackup, made a symbolic link at /usr/openv/netbackup/db/images pointing to /u030/netbackup/db/images, touched the file db_marker.txt in /u030/netbackup/db/images, and started NetBackup.

Next I tried to configure storage devices in the GUI, so that the storage devices restored from the catalog backup are cleaned out (because they are not present on the DR site) and the new ones are added. I get the error "Could not connect to vmd on host sesegx10.sagir.qc (70)", and I notice that vmd is not running.

[root@sesegx10 logs]# /usr/openv/volmgr/bin/vmglob -listall
Could not connect to vmd on host sesegx10 (70)

[root@sesegx10 logs]# nbemmcmd -listhosts
NBEMMCMD, Version: 7.5.0.4
Failed to initialize EMM connection.  Verify that network access to the EMM server is available and that the services nbemm and pbx_exchange are running on the EMM server. (195)
Command did not complete successfully.

 

I have attached the wizard and the recover logs.

Noor_Toorabally
Level 4

After the full catalog recovery and the procedure in my previous post, I just ran the following and everything works. My catalog is recovered, but I don't see why:

[root@sesegx10 openv]# ./db/bin/nbdb_ping
Database [NBDB] is alive and well on server [NB_sesegx10].
[root@sesegx10 openv]# ./db/bin/nbdb_admin -auto_start  NBDB
Successfully added NBDB to databases.conf.
[root@sesegx10 openv]# /usr/openv/db/bin/nbdbms_start_server
NB_dbsrv is already running.
[root@sesegx10 openv]# /opt/VRTSpbx/bin/vxpbx_exchanged start
Instance of pbx is already running. Stop it first.
[root@sesegx10 openv]# /usr/openv/db/bin/nbdbms_start_server
NB_dbsrv is already running.
[root@sesegx10 openv]# /usr/openv/db/bin/nbdb_upgrade
Verifying the running version of NBDB ...
NBDB version 7.5.0.4 verified.
Nothing to upgrade. Version unchanged.

Database [NBDB] validation successful.

Start nbemm manually in console GUI
Start vmd manually in console GUI

Everything works...

Jean-Pierre_Bai
Level 4
Partner Accredited

I don't understand your comment about the touch file. I thought you had set up NBU as it was on your production system, with the same links and so on.

I would recheck the links to make sure everything is the same, with the same names and so on, especially in /usr/openv/db and /usr/openv/netbackup/db.

Then rerun your full catalog restore.

I think this time it will run without errors.

 

Noor_Toorabally
Level 4

Hello,

 

The full catalog restore works with the link to db/images, but then NetBackup lags and gets a lot of errors, because the full restore on DR contains references to all the tape libraries, media servers, etc., which of course do not exist on DR.

 

I tried to remedy this by decommissioning all the tape devices and robots that are not relevant on the DR site, as follows:

for i in {1..100}; do ./tpconfig -delete -drive $i;done
for i in {1..100}; do ./tpconfig -delete -robot $i;done
./vmglob -listall -java
for i in `./vmglob -listall -java | awk '{print $5}' | sort -u`; do ./vmglob -delete -devhost $i; done
./vmglob -listall -java

 

I also decommissioned the invalid media servers with the nbdecommission tool. But I am still left with one last problem:

 

When I query the catalog, I see the images as primary copy. However, they do not show up in the restore GUI. bplist also lists nothing.

# bplist -C seprdu02 -l -R /
EXIT STATUS 227: no entity was found

I tried to run Catalog > select image > Verify (a media verify), which fails with the error:

04/19/2013 11:55:14 - Error bpverify (pid=32759) Expected filename /.var_disk_os_patch in database, found no more files.
04/19/2013 11:55:14 - Error bpverify (pid=32759) Expected filename /.lsof_seprdu02 in database, found no more files.
04/19/2013 11:55:14 - Error bpverify (pid=32759) Expected filename /.lesshst in database, found no more files.
04/19/2013 11:55:14 - Error bpverify (pid=32759) Expected filename /lista in database, found no more files.
04/19/2013 11:55:14 - Error bpverify (pid=32759) At least 10 database compare errors occurred, not logging any more.
04/19/2013 12:02:45 - Info bptm (pid=310) waited for empty buffer 606 times, delayed 1057 times
04/19/2013 12:02:45 - end reading; read time: 0:07:31
04/19/2013 12:02:45 - Info tar (pid=309) done. status: 0
04/19/2013 12:02:45 - Info bptm (pid=310) completed reading backup image
04/19/2013 12:02:45 - Info bptm (pid=310) EXITING with status 0 <----------
04/19/2013 12:02:46 - Error bpverify (pid=32759) Verify of policy sun_seprdu02_bd, schedule Full_Backup (seprdu02_1364153429) failed, the database contains conflicting or erroneous entries.
04/19/2013 12:02:46 - Info tar (pid=309) done. status: 0: the requested operation was successfully completed
04/19/2013 12:02:46 - Error bpverify (pid=32759) Status = no images were successfully processed.
04/19/2013 12:02:46 - end Verify; elapsed time 0:13:40
no images were successfully processed  (191)

Jean-Pierre_Bai
Level 4
Partner Accredited

I have seen that before: the image is available in the catalog but the GUI will not display the file list. It happens if your catalog is compressed and you do not have decompression software, such as ncompress, installed on your master.

Is this your case?

Or maybe you mean something else by not showing up in the GUI? Please elaborate if so.

I guess in db/images there is an entry for your client, with real files behind it?
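
A rough sketch to check both points at once. It assumes that compressed catalog files carry a .Z suffix and that the decompressor is a compress binary on PATH. The function name is hypothetical, and the client directory is taken from the bplist example in this thread.

```shell
#!/bin/sh
# Count compressed (*.Z) files under an image directory and report
# whether a "compress" binary is available on PATH.
check_compressed_catalog() {
    dir=$1
    count=$(find "$dir" -type f -name '*.Z' 2>/dev/null | wc -l | tr -d ' ')
    echo "compressed files under $dir: $count"
    if command -v compress >/dev/null 2>&1; then
        echo "compress: found"
    else
        echo "compress: not found on PATH"
    fi
}

# Client directory assumed from the bplist example above.
check_compressed_catalog /usr/openv/netbackup/db/images/seprdu02
```

A non-zero count together with "not found" would point to the situation described above.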