Problem Recovering Netbackup Catalog

Alberto_Colombo
Level 4
Partner Accredited

 

Hi,
 
I have a question about restoring a FULL NetBackup 7.1 catalog using an N5000 appliance (at our DR site).
 
Here is my setup:
 
One NetBackup master server 7.1.0.3 on RHEL 5.6 in a virtual machine (VMware) - it's a CLONE of our PROD NetBackup master server (same IP, same hostname)
one N5000 with FW 1.4.1.1
one N5020 with FW 1.4.1.1
 
 
We have an SLP in our PROD environment that manages the catalog backup:
1) the catalog is backed up to the N5020, and
2) it is then duplicated to the N5000
 
At our DR site we tried to restore a catalog backup from the N5000 as follows:
 
1.       Create the disk storage server for the storage using the disk storage server wizard on the master server you are recovering to
2.       Create the disk pool for the storage using the disk pool wizard
 
At this point our NetBackup master DR clone could see the N5000.
 
We then ran the command ‘nbcatsync -sync_dr_file <disaster recovery file>’ to synchronize the disaster recovery file to the new disk pool:
 
All media resources were located
   Primary Disk Media Id      Current Disk Media Id
   =====================      =====================
   @aaaaO                     @aaaab
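
For anyone scripting around this step, the mapping table that nbcatsync prints can be pulled apart with standard tools. This is only a minimal sketch, assuming the output layout shown above (data rows with two whitespace-separated IDs that both start with "@"); the function name media_map is made up for illustration and is not a NetBackup command.

```shell
#!/bin/sh
# media_map: read `nbcatsync -sync_dr_file` output on stdin and print
# one "primary -> current" line per remapped disk media ID.
# Assumption: data rows are two columns that both start with "@".
media_map() {
    awk '$1 ~ /^@/ && $2 ~ /^@/ { print $1 " -> " $2 }'
}

# Demo on the output quoted in this post.
media_map <<'EOF'
All media resources were located
   Primary Disk Media Id      Current Disk Media Id
   =====================      =====================
   @aaaaO                     @aaaab
EOF
```

Keeping the "current" ID from this table makes it easy to compare against the media IDs that later jobs request.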
 
Once these steps were completed we ran the ‘bprecover -wizard -copy N’ command (the copy on our N5000 is copy 3, hence "-copy 3").
 
[root@srvnbms01 ~]# bprecover -wizard -copy 3
 
Welcome to the NetBackup Catalog Recovery Wizard!
 
Please make sure the devices and media that contain catalog disaster recovery
data are available
Are you ready to continue?(Y/N)
y
 
Please specify the full pathname to the catalog disaster recovery file:
/dati/INFO_DR/Catalog_bck_1336042970_FULL
srvnbms01.client.it_1336042970
All media resources were located
 
To recover the FULL catalog including the relational database (NBDB), select (F),
to recover the PARTIAL catalog including only the NetBackup catalog image
and configuration files as designated by the disaster recovery file, select (P), or
to recover only the RELATIONAL database (NBDB), select (R).
F
 
Catalog recovery is in progress. Please wait...
Import phase 1 started Tue 29 May 2012 05:49:00 PM CEST
INF - Create DB information for path @aaaab.
INF - Initiation of bpdm process to phase 1 import path @aaaab was successful.
 
The restore of the images finished successfully.
 
Then the Import job started in the Activity Monitor, but after 2 hours it gave us a lot of errors:
 
05/29/2012 18:48:16 - begin Import
05/29/2012 18:48:17 - requesting resource @aaaab
05/29/2012 18:48:17 - granted resource MediaID=@aaaab;DiskVolume=PureDiskVolume;DiskPool=N5000_Dedupe_Pool;Path=PureDiskVolume;StorageServer=nba-5000.client.it;MediaServer=srvnbms01.client.it
05/29/2012 18:48:18 - Info bpdm (pid=11944) started
05/29/2012 18:48:18 - started process bpdm (pid=11944)
05/29/2012 21:16:32 - Critical bpdm (pid=11944) sts_read_image failed: error 2060017 system call failed
05/29/2012 21:16:32 - Critical bpdm (pid=11944) image read failed: error 2060017: system call failed
05/29/2012 21:16:32 - Error bpdm (pid=11944) ERR - Invalid disk header file prdcsdb01.client.it_1333304216_C2_HDR_R3, skipping.
05/29/2012 21:16:32 - Error bpdm (pid=11944) ERR - Orphaned image fragment prdcsdb01.client.it_1333304216_C2_F1_R3, skipping.
05/29/2012 21:16:38 - Critical bpdm (pid=11944) sts_read_image failed: error 2060017 system call failed
05/29/2012 21:16:38 - Critical bpdm (pid=11944) image read failed: error 2060017: system call failed
05/29/2012 21:16:38 - Error bpdm (pid=11944) ERR - Invalid disk header file prdcsdb01.client.it_1333305469_C2_HDR_R1, skipping.
05/29/2012 21:16:38 - Error bpdm (pid=11944) ERR - Orphaned image fragment prdcsdb01.client.it_1333305469_C2_F1_R1, skipping.
05/29/2012 21:16:43 - Critical bpdm (pid=11944) sts_read_image failed: error 2060017 system call failed
05/29/2012 21:16:43 - Critical bpdm (pid=11944) image read failed: error 2060017: system call failed
05/29/2012 21:16:43 - Error bpdm (pid=11944) ERR - Invalid disk header file prdcsdb01.client.it_1333333012_C2_HDR_R2, skipping.
05/29/2012 21:16:43 - Error bpdm (pid=11944) ERR - Orphaned image fragment prdcsdb01.client.it_1333333012_C2_F1_R2, skipping.
05/29/2012 21:16:54 - Critical bpdm (pid=11944) sts_read_image failed: error 2060017 system call failed
05/29/2012 21:16:54 - Critical bpdm (pid=11944) image read failed: error 2060017: system call failed
...
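
With a log this long it can help to reduce it to the list of images whose headers were rejected. The sketch below assumes the header file names follow the pattern seen above (client_ctime_C<n>_HDR_R<n>) and that the part before _C<n> is the backup ID; failed_headers is a hypothetical helper name.

```shell
#!/bin/sh
# failed_headers: read job-detail text on stdin and print each distinct
# "<backup id> C<n>" pair taken from "Invalid disk header file" lines.
# Assumption: header names look like <client>_<ctime>_C<n>_HDR_R<n>.
failed_headers() {
    sed -n 's/.*Invalid disk header file \(.*\)_C\([0-9][0-9]*\)_HDR_R[0-9][0-9]*, skipping\..*/\1 C\2/p' |
        sort -u
}

# Demo on a few lines copied from the job details above.
failed_headers <<'EOF'
05/29/2012 21:16:32 - Error bpdm (pid=11944) ERR - Invalid disk header file prdcsdb01.client.it_1333304216_C2_HDR_R3, skipping.
05/29/2012 21:16:32 - Error bpdm (pid=11944) ERR - Orphaned image fragment prdcsdb01.client.it_1333304216_C2_F1_R3, skipping.
05/29/2012 21:16:38 - Error bpdm (pid=11944) ERR - Invalid disk header file prdcsdb01.client.it_1333305469_C2_HDR_R1, skipping.
EOF
```

In the lines quoted above, every rejected header is a C2 file. If C<n> is the copy number, as is usual for NetBackup disk image names, that may be worth comparing with the -copy 3 used for the recovery.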
 
What am I missing?
How can I solve this problem?
 
Thank you in advance, any tip is welcome!
 
Alberto
 
11 REPLIES

Jeff_Foglietta
Level 5
Partner Accredited Certified

Are you still having this problem, Alberto? I'm curious as to why you would get an invalid header error. Has Symantec support been able to solve this?

Alberto_Colombo
Level 4
Partner Accredited

Hi Jeff,

thank you for your reply.

After the DR restore attempt described in my first post, we have not had a chance to try again.

As soon as we have another chance to run a DR test, I want to do the following:

  • I've found out that I can also use this parameter:

          ALT_RESTORE_COPY_NUMBER

          to specify a copy number other than the primary one. I want to see if it helps.

  • Use a different catalog backup for the restore (even though there was NO evidence that the one used last time was corrupted).
  • If neither of these options solves our problem and we get the very same error, I'll open a case with Symantec support.

I'll post here when I have some news,

regards,

Alberto

 

Mark_Solutions
Level 6
Partner Accredited Certified

Alberto

There is a possible issue for you here, so I will explain my thinking about what may be happening ...

The first part of the catalog recovery completes successfully - this is the images part - and it does so by reading the disk images, not by reading anything from the catalog, as it doesn't have one!

Once that completes, it moves on to the second phase of the restore, which is the NetBackup databases - but by this time it does have all of the catalog images restored, so it uses those as the reference for the restore - however, they still refer entirely to the old system, so it calls the wrong files / IDs / media server etc.

So it may be that you have to do the following:

1. Select to restore images only first during your catalog recovery

2. Once that completes, run nbcatsync so that all images and fragments are correct

3. Restore the databases - you can use bprecover -r -nbdb to do that.

This tech note pretty much says the same thing:

http://www.symantec.com/docs/TECH127922
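
The order of the three steps can be written down as a dry-run script. This is only a sketch under assumptions: the post does not say which nbcatsync option to use, so the -backupid form (the one that appears later in this thread) is a guess, the backup ID is a placeholder, and with DRY_RUN=1 nothing is executed.

```shell
#!/bin/sh
# Sketch of the three-step order described above. With DRY_RUN=1 (the
# default here) the commands are only printed, never executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

recovery_plan() {
    catalog_backup_id=$1   # placeholder: ID of the catalog backup being recovered
    # Step 1: interactive wizard; choose the images-only (partial) recovery.
    run bprecover -wizard -copy 3
    # Step 2: fix the disk media IDs in the recovered image headers
    # (assumed option; the post only says to "do a nbcatsync").
    run nbcatsync -backupid "$catalog_backup_id"
    # Step 3: recover the relational database.
    run bprecover -r -nbdb
}

recovery_plan "example_backup_id"
```

Printing the plan first is a cheap way to agree on the order before touching a DR master.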

Hope this helps

Alberto_Colombo
Level 4
Partner Accredited

 

Hi Mark,
thank you for your reply!
I'll certainly try to follow your advice - thanks to your explanation I now better understand the nbcatsync utility and the need for a two-step restore.
As soon as I get to try a DR restore again, I'll post here whether it solved my problem!
 
Regards,
Alberto

Alberto_Colombo
Level 4
Partner Accredited

Still in trouble here:

1) I restored the NetBackup catalog images (using bprecover -wizard -copy 3): everything fine!

The images were successfully restored.

2) I ran nbcatsync -sync_dr_file <full path to dr file>

It changed the disk media ID:

 

# nbcatsync -sync_dr_file /dati/INFO_DR/Catalog_bck_1340794951_FULL
All media resources were located
   Primary Disk Media Id      Current Disk Media Id
   =====================      =====================
   @aaaaO                     @aaaab
 

3) Running bprecover -r -nbdb, or bprecover -wizard -copy 3 with the R option, fails because it still wants to use the PROD appliance:

 

 

begin Restore
06/28/2012 15:11:10 - restoring from image srvnbms.client.it_1340794840
06/28/2012 15:11:10 - requesting resource @aaaaM
06/28/2012 15:11:10 - Error nbjm (pid=20906) NBU status: 2067, EMM status: Disk volume not found
06/28/2012 15:11:10 - Error nbjm (pid=20906) NBU status: 2067, EMM status: Disk volume not found
06/28/2012 15:11:11 - Info bpbrm (pid=27107) srvnbms.client.it is the host to restore to
06/28/2012 15:11:11 - Info bpbrm (pid=27107) reading file list from client
06/28/2012 15:11:11 - Info bpbrm (pid=27107) connecting to bprd to get file list
06/28/2012 15:11:11 - connecting
06/28/2012 15:11:11 - Info bpbrm (pid=27107) starting bptm
06/28/2012 15:11:11 - Info tar (pid=27113) Restore started
06/28/2012 15:11:11 - connected; connect time: 0:00:00
06/28/2012 15:11:11 - Info bpbrm (pid=27107) bptm pid: 27114
06/28/2012 15:11:11 - Info bptm (pid=27114) start
06/28/2012 15:11:11 - started process bptm (pid=27114)
06/28/2012 15:11:11 - Info bpdm (pid=27114) reading backup image
06/28/2012 15:11:11 - Info bptm (pid=27114) using 128 data buffers
06/28/2012 15:11:11 - Error bptm (pid=27114) NBJM returned an extended error status: Disk volume not found (2067)
06/28/2012 15:11:11 - requesting resource @aaaaM
06/28/2012 15:11:11 - Error nbjm (pid=20906) NBU status: 2067, EMM status: Disk volume not found
06/28/2012 15:11:11 - Error nbjm (pid=20906) NBU status: 2067, EMM status: Disk volume not found
06/28/2012 15:11:12 - Critical bptm (pid=27114) sts_get_cred failed: error 3 emmerr 254 
06/28/2012 15:11:12 - Critical bptm (pid=27114) failure to open sts for storage server nba-5020.client.it: plug-in reports error 2060001 one or more invalid arguments
06/28/2012 15:11:12 - Info bptm (pid=27114) EXITING with status 83 <----------
06/28/2012 15:11:17 - Info tar (pid=27113) done. status: 83: media open error
06/28/2012 15:11:17 - Error bpbrm (pid=27107) client restore EXIT STATUS 83: media open error
06/28/2012 15:11:17 - restored from image srvnbms.client.it_1340794840; restore time: 0:00:07
06/28/2012 15:11:17 - end Restore; elapsed time 0:00:08
media open error  (83)
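
One quick check on a log like this is to list the media IDs the job asked for and compare them with the ID nbcatsync reported as current. In the log above the job requests @aaaaM and then fails opening nba-5020.client.it, while nbcatsync had mapped @aaaaO to @aaaab. A minimal sketch, assuming the "requesting resource" line format shown above; both function names are hypothetical.

```shell
#!/bin/sh
# requested_media: print each distinct media ID a job asked for,
# read from job-detail text on stdin.
requested_media() {
    sed -n 's/.* - requesting resource \(@[A-Za-z0-9]*\).*/\1/p' | sort -u
}

# check_against: compare every requested ID with the ID that nbcatsync
# reported as current (passed as $1).
check_against() {
    current=$1
    requested_media | while read -r id; do
        if [ "$id" = "$current" ]; then
            echo "$id matches $current"
        else
            echo "$id differs from $current"
        fi
    done
}

# Demo on lines copied from the restore log above.
check_against '@aaaab' <<'EOF'
06/28/2012 15:11:10 - requesting resource @aaaaM
06/28/2012 15:11:10 - Error nbjm (pid=20906) NBU status: 2067, EMM status: Disk volume not found
06/28/2012 15:11:11 - requesting resource @aaaaM
EOF
```

A requested ID that differs from the remapped one suggests the restore is being pointed at a copy other than the one nbcatsync adjusted, though the log alone does not prove why.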
 
The catalog backup copies are made via SLP.
In our DR file there are references to the N5020 (the PROD appliance that holds the primary backup copy), to the N5000 (which holds the duplicated copy), and to a tape library, to which we also make an inline copy.
I removed every reference to the N5020 and to the tape library from the DR file, but nothing changed...
I'm running out of ideas, any help is appreciated!
regards,
Alberto

chashock
Level 6
Employee Accredited Certified

This might be way over-simplifying your issue, but have you tried restarting EMM?

Alberto_Colombo
Level 4
Partner Accredited

Restarted the NetBackup services and rebooted the whole server (just to be sure), but nothing changed:

 

# bprecover -r -nbdb
 
Beginning recovery of NBDB.  Please wait...
ERR - Failed to recover NBDB on srvnbms (5)

tarangdce
Level 4
Partner

Try commenting out your primary disk media line in the DR file.

Alberto_Colombo
Level 4
Partner Accredited

 

Hi All,

we decided to try another DR test, following a more specific Symantec technote:

http://www.symantec.com/business/support/index?page=content&id=HOWTO32925

As in the previous test, we managed to restore our backup images to the local filesystem.

The steps we performed were:

1.      nbcatsync -sync_dr_file <disaster recovery file>

2.      bprecover -wizard -copy 3 (we selected P option, to recover the PARTIAL catalog including only the NetBackup catalog image)

We restored almost 65 GB of data in ~1 h.

3.      nbcatsync -backupid image_id (To fix the disk media IDs in the image headers)

I’ve read here:

http://www.symantec.com/business/support/index?page=content&id=TECH160521

that nbcatsync is not recommended in our case (in DR, the NetBackup master clone can see only the N5000).

After ~18 hours, nbcatsync is still far from finishing its job (I can see it progressing in a log file under the admin directory).

 

We are not sure that, once nbcatsync finishes, we will be able to restore NBDB and then start restoring our data.

But even if that is possible, this is a really slow solution; obviously we cannot wait days before starting to restore our data.

 

Moreover, under /usr/openv/netbackup/db/images we have 107 client directories, while our DR goal is to protect just 10 clients.

So, 2 questions:

1.      Is there a way to prune the NBDB catalog in advance (that is, before restoring it at the DR site), so that we restore only the subset of data for the clients we need to protect?

(Of course, by “pruning the catalog” I mean the catalog that is to be restored at the DR site; the PROD catalog has to remain untouched!)

2.      If we make an online NBDB backup like this (on the PROD NetBackup master):

nbdb_backup -dbn NBDB -online /root/backup/backup_1 -truncate_tlog

and then restore it to the NetBackup master clone, is it still necessary to fix the disk media IDs in the image headers using nbcatsync?

Is there something else we can do?

 

regards,

Alberto

Alberto_Colombo
Level 4
Partner Accredited

after almost 48hrs, nbcatsync exited with this error:

[root@nbms01 INFO_DR]# nbcatsync -backupid nbms01.client.it_1343473367
nbcatsync: failure occurred while locating image: prderdb01.client.it_1338862941
nbcatsync: EXIT STATUS 114
nbcatsync: unimplemented error code 114 (114)
You have new mail in /var/spool/mail/root
[root@nbms01 INFO_DR]# 

Alberto_Colombo
Level 4
Partner Accredited

One more problem - it seems that getting a catalog DB restored has become almost impossible with Symantec appliances now.

When we executed this command:
nbcatsync -backupid image_id

it changed our N5000 MediaID!!!

In fact, no SLP that uses the N5000 works in our PROD environment anymore, because in the NB catalog the N5000 has @aaaaX as its MediaID, while this command (again on the PROD NB master) reports a different one:

[root@srvnbms01 ost-plugins]# nbdevquery -listdv -stype PureDisk -dp N5000_Dedupe_Pool -U
Disk Pool Name      : N5000_Dedupe_Pool
Disk Type           : PureDisk
Disk Volume Name    : PureDiskVolume
Disk Media ID       : @aaaaY
Total Capacity (GB) : 16747.03
Free Space (GB)     : 6981.75
Use%                : 58
Status              : DOWN
Flag                : ReadOnWrite
Flag                : AdminDown
Flag                : InternalDown
Num Read Mounts     : 0
Num Write Mounts    : 1
Cur Read Streams    : 0
Cur Write Streams   : 0
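
To compare the two IDs without reading the listing by eye, the "Name : value" lines can be parsed with awk. This is a minimal sketch that assumes the -U layout shown above; dv_field is a hypothetical helper, and @aaaaX is simply the ID quoted in this post as the one held in the catalog.

```shell
#!/bin/sh
# dv_field: print the value(s) of one "Name : value" field from
# `nbdevquery -listdv ... -U` output read on stdin.
dv_field() {
    awk -F' *: *' -v key="$1" '$1 == key { sub(/[ \t]+$/, "", $2); print $2 }'
}

catalog_id='@aaaaX'   # ID the post says the catalog now holds
reported=$(dv_field 'Disk Media ID' <<'EOF'
Disk Pool Name      : N5000_Dedupe_Pool
Disk Volume Name    : PureDiskVolume
Disk Media ID       : @aaaaY
Status              : DOWN
EOF
)

if [ "$reported" != "$catalog_id" ]; then
    echo "mismatch: catalog has $catalog_id, nbdevquery reports $reported"
fi
```

The same helper can read the Status and Flag fields. In the output above the volume is also reported as DOWN with the AdminDown flag set, which is worth noting alongside the ID mismatch.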

How can we fix the disk media ID so that SLP will be working again?

 

thank you,