04-15-2013 06:57 AM
While recovering the catalog on our freshly installed DR server, the restore fails with the following error:
Catalog recovery is in progress. Please wait...
EXIT STATUS 2818: NBU-Catalog policy restore error
ERR - Failed to execute command /usr/openv/netbackup/bin/bprestore -w -T -X -C sesegx10.sagir.qc -t 35 -p NBU-Catalog -e 1364461370 -L "/usr/openv/netbackup/logs/user_ops/root/logs/Recover1366033056.log" "/usr/openv/netbackup/db/images" on host sesegx10 (5)
Here are the details of PRODUCTION MASTER server and DR MASTER SERVER:
Both servers run NetBackup 7.5.0.4.
PRODUCTION MASTER SERVER:
[root@sesegx10 ~]# hostname
sesegx10.sagir.qc
[root@sesegx10 ~]# uname -n
sesegx10.sagir.qc
[root@sesegx10 ~]# cat /proc/sys/kernel/hostname
sesegx10.sagir.qc
[root@sesegx10 ~]# sysctl kernel.hostname
kernel.hostname = sesegx10.sagir.qc
[root@sesegx10 ~]# cat /etc/sysconfig/network | grep -i hostname
HOSTNAME=sesegx10.sagir.qc
[root@sesegx10 ~]# hostname -f
sesegx10.sagir.qc
[root@sesegx10 ~]# hostname -a
sesegx10
[root@sesegx10 ~]# nbemmcmd -listhosts | grep sesegx10
master sesegx10
server sesegx10
[root@sesegx10 ~]# nbemmcmd -listhosts
NBEMMCMD, Version: 7.5.0.4
The following hosts were found:
media sedefu05
media sedefu06
media sepfou04
media sepfou03
media seprdu01
media seprdu02
media seprdu04
media seprdu03
media sesegu02
media sesegu01
media sepfou02
media sepfou01
media sedefu04
media sedefu03
media sedefu01
media sedefu02
media seppuu02
media seppuu01
media sesegx09
media sesegx08
master sesegx10
client sesegx09
client seppux26
client seppux27
client seppux06
client seppux05
client seppux11
client seppux10
client seppuu03
client seppuu04
client seprdu06
client seprdu05
client sesegx05
server sesegx10
Command completed successfully.
[root@sesegx10 ~]# grep sesegx10 /usr/openv/netbackup/bp.conf
SERVER = sesegx10.sagir.qc
CLIENT_NAME = sesegx10
EMMSERVER = sesegx10
CONNECT_OPTIONS = sesegx10 0 1 0
FORCE_RESTORE_MEDIA_SERVER = sesegu01 sesegx10
FORCE_RESTORE_MEDIA_SERVER = seppuu02 sesegx10
FORCE_RESTORE_MEDIA_SERVER = sedefu02 sesegx10
FORCE_RESTORE_MEDIA_SERVER = sesegu02 sesegx10
FAILOVER_RESTORE_MEDIA_SERVERS = seppuu02 sesegx10
THROTTLE_BANDWIDTH = sesegx10 400000
[root@sesegx10 ~]# nbemmcmd -getemmserver | grep MASTER
MASTER 7.5 sesegx10 sesegx10
[root@sesegx10 ~]# /usr/openv/db/bin/nbdb_ping
Database [NBDB] is alive and well on server [NB_sesegx10].
DR MASTER SERVER:
[root@sesegx10 logs]# hostname
sesegx10.sagir.qc
[root@sesegx10 logs]# uname -n
sesegx10.sagir.qc
[root@sesegx10 logs]# cat /proc/sys/kernel/hostname
sesegx10.sagir.qc
[root@sesegx10 logs]# sysctl kernel.hostname
kernel.hostname = sesegx10.sagir.qc
[root@sesegx10 logs]# cat /etc/sysconfig/network | grep -i hostname
HOSTNAME=sesegx10.sagir.qc
[root@sesegx10 logs]# hostname -f
sesegx10.sagir.qc
[root@sesegx10 logs]# hostname -a
sesegx10
[root@sesegx10 logs]# nbemmcmd -listhosts | grep sesegx10
server sesegx10
master sesegx10
[root@sesegx10 logs]# nbemmcmd -listhosts
NBEMMCMD, Version: 7.5.0.4
The following hosts were found:
server sesegx10
master sesegx10
client seprdu05
foreign_media sesegx08
foreign_media sesegx09
Command completed successfully.
[root@sesegx10 logs]# grep sesegx10 /usr/openv/netbackup/bp.conf
SERVER = sesegx10
CLIENT_NAME = sesegx10
EMMSERVER = sesegx10
FORCE_RESTORE_MEDIA_SERVER = sesegx08 sesegx10
FORCE_RESTORE_MEDIA_SERVER = sesegx09 sesegx10
FORCE_RESTORE_MEDIA_SERVER = seprdu01 sesegx10
FORCE_RESTORE_MEDIA_SERVER = seprdu02 sesegx10
FORCE_RESTORE_MEDIA_SERVER = seprdu03 sesegx10
FORCE_RESTORE_MEDIA_SERVER = seprdu05 sesegx10
FORCE_RESTORE_MEDIA_SERVER = sesegx12 sesegx10
FORCE_RESTORE_MEDIA_SERVER = seppuu01 sesegx10
[root@sesegx10 logs]# nbemmcmd -getemmserver | grep MASTER
MASTER 7.5 sesegx10 sesegx10
[root@sesegx10 logs]# /usr/openv/db/bin/nbdb_ping
Database [NBDB] is alive and well on server [NB_sesegx10].
04-15-2013 07:31 AM
Are you restoring from disk or tape?
Which media server performed the catalog backup in production?
Oh - SERVER names in bp.conf are NOT matching:
Prod:
SERVER = sesegx10.sagir.qc
CLIENT_NAME = sesegx10
EMMSERVER = sesegx10
DR:
SERVER = sesegx10
CLIENT_NAME = sesegx10
EMMSERVER = sesegx10
All names must match EXACTLY.
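The comparison of the name entries can also be scripted. The following is only a sketch, assuming copies of both bp.conf files are available side by side; the function names bpconf_names and bpconf_names_match are made up for illustration and are not part of NetBackup.

```shell
# Print the SERVER, CLIENT_NAME and EMMSERVER entries of a bp.conf file,
# normalised to "KEY=value" and sorted, so that two files can be compared.
# (Hypothetical helper, not a NetBackup command.)
bpconf_names() {
    # $1: path to a bp.conf file
    sed -n -E 's/^[[:space:]]*(SERVER|CLIENT_NAME|EMMSERVER)[[:space:]]*=[[:space:]]*(.*[^[:space:]])[[:space:]]*$/\1=\2/p' "$1" | sort
}

# Succeed (exit status 0) only if both files carry identical name entries.
bpconf_names_match() {
    # $1, $2: the two bp.conf copies, for example production and DR
    a=$(bpconf_names "$1")
    b=$(bpconf_names "$2")
    [ "$a" = "$b" ]
}
```

With the entries quoted above, bpconf_names_match would fail because production has SERVER = sesegx10.sagir.qc while DR has SERVER = sesegx10. Entries such as FORCE_RESTORE_MEDIA_SERVER are deliberately ignored, since those are expected to differ between sites.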
Do you have an alias in /etc/hosts for shortname and FQDN?
Please find the restore log ( /usr/openv/netbackup/logs/user_ops/root/logs/Recover<date-time>.log ) and post as File attachment.
04-15-2013 08:08 AM
Hello Marianne,
Thank you for your reply.
The media server which took the backup in PRODUCTION is listed as sesegx10:
sesegx10.sagir.qc_1364461370 Thu Mar 28 05:02:50 EDT 2013 NBU-Catalog vault_netbackup_catalog sesegx10 S00012 1 Yes No null No No
I also tried using the long name in the DR bp.conf to match the PRODUCTION bp.conf, but the result was the same.
The alias in /etc/hosts for the short name and the FQDN is there, and both resolve to the same IP:
[root@sesegx10 ~]# cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
172.29.65.205 sesegx10.sagir.qc sesegx10
[root@sesegx10 ~]# ping sesegx10
PING sesegx10.sagir.qc (172.29.65.205) 56(84) bytes of data.
64 bytes from sesegx10.sagir.qc (172.29.65.205): icmp_seq=1 ttl=64 time=0.033 ms
[root@sesegx10 ~]# ping sesegx10.sagir.qc
PING sesegx10.sagir.qc (172.29.65.205) 56(84) bytes of data.
64 bytes from sesegx10.sagir.qc (172.29.65.205): icmp_seq=1 ttl=64 time=0.029 ms
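For anyone repeating this check on several hosts, the hosts-file lookup can be scripted rather than read by eye. This is a minimal sketch that only parses a hosts-format file (it does not query DNS); hosts_ip_for and same_ip are illustrative names, not existing tools.

```shell
# Print the IP address that a hosts-format file maps to a given name
# (canonical name or alias). Prints nothing if the name is not listed.
hosts_ip_for() {
    # $1: host name to look up, $2: hosts file (defaults to /etc/hosts)
    awk -v name="$1" '
        { sub(/#.*/, "") }   # strip comments before looking at the fields
        { for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit } }
    ' "${2:-/etc/hosts}"
}

# Succeed only if both names are listed and map to the same address.
same_ip() {
    # $1: short name, $2: FQDN, $3: optional hosts file
    ip1=$(hosts_ip_for "$1" "${3:-/etc/hosts}")
    ip2=$(hosts_ip_for "$2" "${3:-/etc/hosts}")
    [ -n "$ip1" ] && [ "$ip1" = "$ip2" ]
}
```

For example, same_ip sesegx10 sesegx10.sagir.qc would succeed against the /etc/hosts content shown above, since both names sit on the 172.29.65.205 line.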
There is no additional information in the restore log ( /usr/openv/netbackup/logs/user_ops/root/logs/Recover<date-time>.log ), here is the content:
[root@sesegx10 logs]# more Recover1366031870.log
09:19:54 INF - scheduler is suspended
Restore started 04/15/2013 09:19:54
09:19:55 (293.xxx) /usr/openv/netbackup/db/images -s 03/28/2013 05:02:50 -e 03/28/2013 05:02:50 - no files matched in the given date range
09:19:55 (293.xxx) INF - Status = NBU-Catalog policy restore error.
09:19:55 ERR - Failed to execute command /usr/openv/netbackup/bin/bprestore -w -T -X -C sesegx10.sagir.qc -t 35 -p NBU-Catalog -e 1364461370 -L "/usr/openv/netbackup/logs/user_ops/root/logs/Recover 1366031870.log" "/usr/openv/netbackup/db/images" on host sesegx10 (5)
09:20:00 INF - attempting to freeze media used in recovery
09:25:00 WRN - media S00012 is frozen to prevent overwrite of catalog backup
09:30:01 WRN - media S00012 is frozen to prevent overwrite of catalog backup
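The -e 1364461370 argument in the failing bprestore command is the same Unix timestamp that appears as the suffix of the backup ID sesegx10.sagir.qc_1364461370. It can be converted to a readable date to confirm that it matches the date range reported in the log. A small sketch, assuming GNU date (as found on Red Hat); backupid_to_date is a made-up helper name.

```shell
# Convert the timestamp suffix of a backup ID (<client>_<epoch seconds>)
# into a readable date in the current time zone. Requires GNU date.
backupid_to_date() {
    # $1: backup ID, for example sesegx10.sagir.qc_1364461370
    epoch=${1##*_}   # keep only what follows the last underscore
    date -d "@$epoch" '+%Y-%m-%d %H:%M:%S'
}
```

Run with TZ set to UTC, backupid_to_date sesegx10.sagir.qc_1364461370 gives 2013-03-28 09:02:50, which is 05:02:50 EDT, the same time as in the "no files matched in the given date range" line. So the date range requested by the wizard is consistent with the backup ID.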
I would like to point out something about the discrepancy between the PRODUCTION and DR bp.conf files. We installed the DR server with sesegx10 as the NetBackup master server name, because every query for the master server name on the PRODUCTION server (except the single entry in bp.conf) returns sesegx10.
Installing the DR server this way naturally puts sesegx10 in its bp.conf.
What I cannot understand is why the entry in the PRODUCTION bp.conf is sesegx10.sagir.qc, when nbemmcmd -listhosts gives the name sesegx10.
I suspect that this may be due to:
1. A manual modification of the bp.conf file on PRODUCTION by some user.
2. The PRODUCTION MASTER SERVER being at 7.5.0.4 after an upgrade from 6.5.6, while the DR MASTER SERVER is a 7.5 installation from scratch that was then upgraded to 7.5.0.4.
I don't have enough knowledge to assess whether this is the case. Please give me your view.
04-15-2013 08:45 AM
What name did you use to install your master on the DR system?
I would use the long name (sesegx10.sagir.qc). That is what the bprestore command is looking for.
Also check that openv/db and openv/netbackup/db are not symbolic links.
It would be good if you answered Marianne's question: how are you running the restore? Is the catalog backup on disk or tape? I guess you are using a DR recovery file.
04-15-2013 11:52 AM
The catalog backup is to tape and I use a DR file.
I used the short name sesegx10 to install the DR system. I also tried reinstalling with the long name sesegx10.sagir.qc, but I get the same error.
I have a link at /usr/openv/netbackup/db/images pointing to /u030/netbackup/db/images on both the PRODUCTION and DR servers.
I also did another test: I took a catalog backup on the DR server itself and recovered it onto the same server, and that works fine.
I do not understand why the catalog backup coming from PRODUCTION does not work.
04-16-2013 12:26 AM
The link you have should not be a problem. Nevertheless, did you try the restore without the link?
Can you post the recovery log as Marianne asked?
"Please find the restore log ( /usr/openv/netbackup/logs/user_ops/root/logs/Recover<date-time>.log ) and post as File attachment."
When you reinstalled, did you have both names (short and long) in hosts? Did you reinstall from scratch when you used the long name?
04-16-2013 01:21 AM
I see you posted a "more" of your recovery log above. What command did you use to run the recovery?
Did you try bprecover -wizard?
It seems the first part did not work.
Here is a sample bprecover I ran not so long ago with 7.5.0.4 under Linux RH. It failed because openv/db was on a link. Nevertheless, the output does not look the same as what you have above.
Nowhere in my output do I see the db/images -e command.
Below is the output from bprecover -wizard with the DR file name and location specified.
15:57:47 INF - scheduler is suspended
Restore started 01/16/2013 15:57:47
15:57:47 (180.xxx) -----------------------------------------------------------
15:57:47 (180.xxx) WARNING: The following files and directories will not
15:57:47 (180.xxx) be restored because they are not present when the backup
15:57:47 (180.xxx) was done on Fri 11 Jan 2013 12:42:24 PM CET. These files and
15:57:47 (180.xxx) directories were either moved or deleted prior to this
15:57:47 (180.xxx) backup, but did exist in a previous backup. If needed,
15:57:47 (180.xxx) they can be restored by doing a normal restore instead
15:57:47 (180.xxx) of doing a true image restore.
15:57:47 (180.xxx) -----------------------------------------------------------
15:57:47 (180.xxx) /produits/nbu_data_frrmmutnbup12/openv/db/staging/DARS_DATA.db is not in the true image list. Skipping.
15:57:47 (180.xxx) / ETC.
04-16-2013 08:33 AM
Hello Jean-Pierre,
Yes I did an installation from scratch for long and short name with no result.
I am attaching the restore logs (Recover1366125997.log) generated when launching the bprecover -wizard (and without the link of the openv/db/image) as well as the full output of the wizard (wizard.txt)
04-16-2013 08:57 AM
Files attached
04-16-2013 12:02 PM
What happens if you do a full catalog restore?
It looks like the first step, concerning images, is skipped because you specified not to do a full catalog restore, so that is not the error.
So my guess is that you are failing on the second step. I assume the bprestore command is available and can be executed.
I suppose you are running as root.
Is there anything in the logs? I assume your tape is mounted correctly?
04-17-2013 02:36 AM
I had a similar problem resulting in code 2818. The issue was that /usr/openv/db was on a link while the restore process wanted to access /usr/openv/db/emm.db.
The tar process was stuck and I had to restart the server to clear it.
I collected the job log, file list and detailed status, and was asked for the following NBU logs at verbosity 5:
From the master server: bprd tar bpdbm
From the media server: bptm bpbrm bpdm
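NetBackup legacy debug logs are only written when a directory named after the process exists under the legacy log directory, which may explain a "none generated" result for some daemons. The sketch below creates those directories; make_legacy_logdirs is a hypothetical helper, and the base path /usr/openv/netbackup/logs is the default UNIX location and should be treated as an assumption for your install.

```shell
# Create one legacy log directory per process name under a base directory.
# (Hypothetical helper; it only creates directories and changes no settings.)
make_legacy_logdirs() {
    # $1: base log directory, remaining arguments: process names
    base=$1
    shift
    for proc in "$@"; do
        mkdir -p "$base/$proc" || return 1
    done
}
```

On the master this would be called as make_legacy_logdirs /usr/openv/netbackup/logs bprd tar bpdbm, and on the media server with bptm bpbrm bpdm. The verbosity itself is set separately (for example VERBOSE = 5 in bp.conf), and the daemons generally need a restart before they log at the new level.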
04-17-2013 06:45 AM
Hello Jean-Pierre,
If I do a full catalog restore, it proceeds with the restore, but I have not investigated that path because our DR procedure is based on partial catalog recovery.
I am running the command as root, and the tape is mounted and positioned before the error occurs.
In my case, the /usr/openv/db is not a link.
The log from the server is as follows:
bprd: none generated
tar: none generated
bpdbm:as attached
bptm:as attached
bpbrm: none generated
bpdm: none generated
04-17-2013 07:41 AM
OK, so you are saying a full restore works and it is your partial restore that does not work. Is that correct?
04-17-2013 09:55 AM
Let me try that quickly. I am not sure the full restore works either, but it was getting further than the partial restore. I have just launched the restore and will have the result shortly. Thank you for your reply.
04-17-2013 11:35 AM
I did the full catalog restore. It ended with some errors (see the attached recover log), but the content of the catalog tape was copied to /u030/netbackup/db/images (which was the catalog directory on the PRODUCTION server). I stopped NetBackup, made a symbolic link at /usr/openv/netbackup/db/images pointing to /u030/netbackup/db/images, touched the file db_marker.txt in /u030/netbackup/db/images and started NetBackup.
The next step I tried was to configure storage devices in the GUI, so that the storage devices restored from the catalog backup are cleaned out (because they are not present on the DR site) and the new ones are added. I get the error "Could not connect to vmd on host sesegx10.sagir.qc (70)", and I notice that vmd is not running.
[root@sesegx10 logs]# /usr/openv/volmgr/bin/vmglob -listall
Could not connect to vmd on host sesegx10 (70)
[root@sesegx10 logs]# nbemmcmd -listhosts
NBEMMCMD, Version: 7.5.0.4
Failed to initialize EMM connection. Verify that network access to the EMM server is available and that the services nbemm and pbx_exchange are running on the EMM server. (195)
Command did not complete successfully.
I have attached the wizard and the recover logs.
04-17-2013 12:02 PM
After the full catalog recovery and the procedure in my previous post, I ran the following and everything now works. My catalog is recovered, but I don't see why:
[root@sesegx10 openv]# ./db/bin/nbdb_ping
Database [NBDB] is alive and well on server [NB_sesegx10].
[root@sesegx10 openv]# ./db/bin/nbdb_admin -auto_start NBDB
Successfully added NBDB to databases.conf.
[root@sesegx10 openv]# /usr/openv/db/bin/nbdbms_start_server
NB_dbsrv is already running.
[root@sesegx10 openv]# /opt/VRTSpbx/bin/vxpbx_exchanged start
Instance of pbx is already running. Stop it first.
[root@sesegx10 openv]# /usr/openv/db/bin/nbdbms_start_server
NB_dbsrv is already running.
[root@sesegx10 openv]# /usr/openv/db/bin/nbdb_upgrade
Verifying the running version of NBDB ...
NBDB version 7.5.0.4 verified.
Nothing to upgrade. Version unchanged.
Database [NBDB] validation successful.
Start nbemm manually in console GUI
Start vmd manually in console GUI
Everything works...
04-17-2013 12:38 PM
I don't understand your comment about the touch file. I thought you had set up NBU as it was on your production system, with the same links and so on.
I would recheck the links to make sure everything is the same, with the same names and so on, especially in /usr/openv/db and /usr/openv/netbackup/db.
Then rerun your full catalog restore.
I think this time it will run without errors.
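One way to recheck the links is to print, for each directory of interest, whether it is a symbolic link and where it points, then compare the output from production and DR. A minimal sketch; describe_path is an illustrative name, and the paths to pass in are the ones discussed in this thread.

```shell
# Report whether a path is a symbolic link (and its target), a regular
# directory or file, or missing. (Hypothetical helper for comparing hosts.)
describe_path() {
    # $1: path to inspect
    if [ -L "$1" ]; then
        printf '%s -> %s\n' "$1" "$(readlink "$1")"
    elif [ -e "$1" ]; then
        printf '%s (not a link)\n' "$1"
    else
        printf '%s (missing)\n' "$1"
    fi
}
```

For example, running describe_path on /usr/openv/db, /usr/openv/netbackup/db and /usr/openv/netbackup/db/images on both servers and diffing the two outputs would show any link that exists on one side only.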
04-19-2013 09:12 AM
Hello,
The full catalog restore works with the links to db/images, but NetBackup then lags and reports a lot of errors, because the fully restored catalog on DR references all the tape libraries, media servers and so on, which of course do not exist on DR.
I tried to remedy this by removing all the tape devices and robots that are not relevant on the DR site, as follows:
for i in {1..100}; do ./tpconfig -delete -drive $i;done
for i in {1..100}; do ./tpconfig -delete -robot $i;done
./vmglob -listall -java
for i in `./vmglob -listall -java | awk '{print $5}' | sort -u`; do ./vmglob -delete -devhost $i; done
./vmglob -listall -java
I also decommissioned the invalid media servers with the nbdecommission tool. But I am still left with one last problem:
When I query the catalog, I see the images listed as primary copy. However, they do not show up in the restore GUI, and bplist lists nothing either.
# bplist -C seprdu02 -l -R /
EXIT STATUS 227: no entity was found
I tried to run a catalog>select image>verify (a media-verify) which fails with the error:
04/19/2013 11:55:14 - Error bpverify (pid=32759) Expected filename /.var_disk_os_patch in database, found no more files.
04/19/2013 11:55:14 - Error bpverify (pid=32759) Expected filename /.lsof_seprdu02 in database, found no more files.
04/19/2013 11:55:14 - Error bpverify (pid=32759) Expected filename /.lesshst in database, found no more files.
04/19/2013 11:55:14 - Error bpverify (pid=32759) Expected filename /lista in database, found no more files.
04/19/2013 11:55:14 - Error bpverify (pid=32759) At least 10 database compare errors occurred, not logging any more.
04/19/2013 12:02:45 - Info bptm (pid=310) waited for empty buffer 606 times, delayed 1057 times
04/19/2013 12:02:45 - end reading; read time: 0:07:31
04/19/2013 12:02:45 - Info tar (pid=309) done. status: 0
04/19/2013 12:02:45 - Info bptm (pid=310) completed reading backup image
04/19/2013 12:02:45 - Info bptm (pid=310) EXITING with status 0 <----------
04/19/2013 12:02:46 - Error bpverify (pid=32759) Verify of policy sun_seprdu02_bd, schedule Full_Backup (seprdu02_1364153429) failed, the database contains conflicting or erroneous entries.
04/19/2013 12:02:46 - Info tar (pid=309) done. status: 0: the requested operation was successfully completed
04/19/2013 12:02:46 - Error bpverify (pid=32759) Status = no images were successfully processed.
04/19/2013 12:02:46 - end Verify; elapsed time 0:13:40
no images were successfully processed (191)
04-20-2013 12:39 AM
I have seen that before: the image is available in the catalog but the GUI will not display the file list. It happens when your catalog is compressed and you do not have the decompression software, such as ncompress, installed on your master.
Is this your case?
Or maybe you mean something else by "not showing up in the GUI"? If so, please elaborate.
I assume there is an entry for your client in bp/images with actual files behind it?