Backup server crashed, need recovery images

dterm
Level 3

Hello, 

I have an urgent situation and need guidance, so I would appreciate it if anyone can help me out with things to consider. You will probably wonder how I can be in such a situation without having planned a backup of the backup server, but I never got to it, and here I am.

Background: I have a Windows 2003 x32 Std Ed. backup server in a secure environment, and it just crashed. We recently implemented NetBackup 6.5.4 in that environment and were backing up physical and virtual machines. We were planning to install another backup server next year, so there is no redundancy now. I don't have an image or backup of the backup server.

I was restoring a VM for one of my customers, and while the restore was running the backup server rebooted automatically. Upon reboot, it got stuck with a "hal.dll missing or corrupt" error. (I had a ton of issues trying to view the console, including a broken KVM, but that's beside the point.) I couldn't boot from any Windows 2003 CD to get to the recovery console. I tried many things (checked the BIOS, tried different CDs), but it always bypassed the CD, booted from the hard drive, and showed the same hal.dll error asking me to replace the missing file. After talking with Dell and another third party, I am left to myself. Hardware tests were fine on the CD-ROM, hard drives, and memory.

One thing I noticed was that when I used F8 to go to safe mode, it showed me a list of operating systems, and the one listed was Windows 2003 x64 Datacenter edition. That is the customer's VM I was restoring, so I am afraid the restore might have overwritten my backup server's system files. However, I was using the server and it was fine until that automatic reboot. I chose "restore from original location" and had read the help menu before choosing that option, but I didn't find anything that would have caused it to overwrite system32 files on the backup server. Third-party support for NetBackup told me it couldn't have overwritten them, or NetBackup would have crashed rather than the OS rebooting, but I think he is wrong. I think the server might have had a system failure that caused the reboot. It could also be something else entirely, as it has been in the past with McAfee HIPS and antivirus products. Anyway...

Questions: 1) Am I in a no-win situation here? Can I recover this backup server without a true backup?

2) If I can't recover the server, am I going to be able to recover the images on tape? The tapes for all the backups, including the catalog backups, were in the library. If I rebuild Windows and reinstall NetBackup, how do I recover my inventory of media and the images on tape? All of that will be lost, and I never got the chance to remove my catalog backup from the tape library before the crash, so I am not sure which media label/number holds the catalog backup.

3) If I am able to recover in any way, do I need to have the same Windows version? Can I install Windows 2008 x32 or x64 and maybe still keep NetBackup 6.5.4 on the server to recover the images, if that is possible? I know I can't upgrade to NetBackup 7 and assume I will be able to recover. If I have to rebuild and use Windows 2008 rather than the server's original OS, I wonder whether it would work with the same NetBackup version as the original.

I would appreciate any help, links to pdfs, etc.

Thank you.

17 REPLIES

Yasuhisa_Ishika
Level 6
Partner Accredited Certified

You can reinstall and recover the backup server by following the appropriate procedure. Here is an overview.

  1. Re-install Windows and patches.
    It is better to use a new disk so that the data on the current disk is kept. You might need data on the current disk later (e.g. data in a disk storage unit, the disaster recovery information file).
  2. Install and configure the NetBackup server with the same hostname.
  3. Apply the same patch (6.5.4) to NetBackup.
  4. Configure the tape devices.
  5. Inventory the tapes. Don't forget to configure the Media ID Generation Rule first.
  6. Perform catalog recovery. If you have the disaster recovery information, there is no need to determine which tapes were used for the catalog backup. The media IDs are recorded in the DR info.
    If you lost the DR info, or the catalog backups were performed as offline catalog backups, you have to determine which tapes were used.
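
The decision in step 6 can be written down as a small sketch. This is only an illustration of the reasoning in this thread, not a NetBackup tool: the function name, the enum, and the three boolean inputs are all hypothetical, and the "no catalog backup at all" branch reflects the import approach raised later in the thread.

```python
from enum import Enum


class RecoveryPath(Enum):
    # Labels are illustrative only; they are not NetBackup terminology.
    CATALOG_RECOVERY_WITH_DR_INFO = "recover the catalog using the media IDs in the DR info"
    IDENTIFY_CATALOG_TAPES_FIRST = "determine which tapes hold the catalog backup, then recover"
    IMPORT_ALL_TAPES = "no catalog backup: phase 1 then phase 2 import of every tape"


def choose_recovery_path(has_catalog_backup: bool,
                         has_dr_info: bool,
                         is_online_catalog_backup: bool) -> RecoveryPath:
    """Pick a recovery path from what survived the crash (hypothetical helper)."""
    if not has_catalog_backup:
        # Nothing to recover the catalog from, so the tapes must be imported.
        return RecoveryPath.IMPORT_ALL_TAPES
    if has_dr_info and is_online_catalog_backup:
        # The DR info already records which media IDs hold the catalog backup.
        return RecoveryPath.CATALOG_RECOVERY_WITH_DR_INFO
    # DR info lost, or the catalog backup was an offline one.
    return RecoveryPath.IDENTIFY_CATALOG_TAPES_FIRST
```

For example, `choose_recovery_path(True, False, True)` returns `RecoveryPath.IDENTIFY_CATALOG_TAPES_FIRST`, which is the situation where the DR info is missing or out of date.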

Yasuhisa_Ishika
Level 6
Partner Accredited Certified

The Troubleshooting Guide details the disaster recovery procedure.
http://www.symantec.com/docs/TECH52829

Catalog recovery from an online backup - page 540
Catalog recovery from offline backup - page 566

By the way, have you tried a repair install of Windows from the CD-ROM?
# I'm not sure it is possible in this case. Just an idea.

Marianne
Level 6
Partner    VIP    Accredited Certified

Hopefully you DO have a catalog backup to recover from....

If not, you will need to spend a couple of days (weeks if there are lots of tapes...) doing phase 1 imports of each tape in the robot, followed by phase 2 imports.

During this time you will not be able to use any of the tapes that are currently in the robot for backups because nobody knows which ones contain backups and which ones are free....

dterm
Level 3

 

If I do phase 1 and phase 2 imports, I know it will take a long time for all the images on all the tapes to be imported. I would have to go through the media IDs one by one, correct? Without DR info, I don't know which tapes are used and which are not. I have an old DR info file, not the most recent one. I don't think the old DR info would do any good, since it won't know about the newer backups or images on tape.

It makes me happy to know that all is not lost if I am able to do phase 1 and phase 2 imports. That would require me to have the same versions of Windows and NetBackup, right? Windows 2003 and NetBackup 6.5.4. I probably can't use this incident to upgrade my software, since I am likely going to be rebuilding the OS and reinstalling NetBackup.

Thanks for the post.

 

http://www.symantec.com/business/support/index?page=content&id=TECH43584

dterm
Level 3

I have tried a repair install of Windows, but the CD-ROM drive doesn't recognize any Windows CDs (even original MS ones), so I can't get to the recovery console. I tried different versions of Windows as well. It does recognize the Dell management tools CD, though, so the CD-ROM drive works, and it passed the hardware test in the Dell utility.

If I should install NetBackup on new disks, I don't have any hardware for it. I would have to make a VM. Then the question becomes: how do I still access the old hard drives on the backup server that crashed?

OK, so even without DR info I should be able to find out which tapes are used and check the images on them, so that I can restore in the future.

 

Thanks for the post.

Mark_Solutions
Level 6
Partner Accredited Certified

Do you have disk images on the old Master, or was it just a Master Server and not a Media Server as well?

If it was a Media Server as well, then you will need to rebuild it, either on the same hardware or on new hardware, and then install NBU 6.5 to the same location as the original (so C:\Program Files\Veritas or wherever it was originally) and patch to 6.5.4.

Get your tape drive / library attached and configure that in Windows and then in NetBackup and inventory it.

If you don't have the latest DR file, you will need to try to work out which tape was the latest catalog tape.

If you don't have the DR file, then follow the instructions in the Troubleshooting Guide on how to recreate that information (page 559 in the one I have).

At least you will then have your system back

If there were images on disk on the Master then perhaps you will be able to recover these at a later date, even if you have to clean up the catalogs and then copy them across and re-import them.

If you are really stuck seek professional services to help you out

Hope this helps

Mark_Solutions
Level 6
Partner Accredited Certified

Just seen your latest post. If you know which tapes may have been used as catalog tapes, then you can use them to recreate the DR file. See my message above about the Troubleshooting Guide.

Marianne
Level 6
Partner    VIP    Accredited Certified
You have 2 choices here.

First choice: install W2003 (if you were on a later NBU version, you could have gone to W2008), install NBU and patch to 6.5.4, and use the DR info available to recover. Then start phase 1 imports of all tapes that appear to be unassigned, followed by phase 2 imports.

Second choice: start new and import all tapes. Start new means new hardware, new OS, and the latest NBU version. No need to install the same OS and NBU.

You can start as many phase 1 imports as you have drives available. Just be sure to start the phase 2 imports before the phase 1 imports start expiring. E.g. if Dailys have a retention of 1 week, do the phase 2 import within 1 week.

The first option will also put back all your policies and should complete the rest of the imports in a shorter time. It all depends on the size of the environment and the number of tapes...
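
To plan the imports described here, a rough sketch like the following may help. It only does bookkeeping and does not call NetBackup. The function names and the media IDs in the example are made up, and the deadline rule simply follows the example above (phase 2 within the retention period of the imported backups), which is an assumption to confirm against your own retention settings.

```python
from datetime import datetime, timedelta
from typing import List


def plan_phase1_batches(media_ids: List[str], drive_count: int) -> List[List[str]]:
    """Group tapes into batches of at most one tape per available drive."""
    if drive_count < 1:
        raise ValueError("at least one tape drive is required")
    return [media_ids[i:i + drive_count]
            for i in range(0, len(media_ids), drive_count)]


def phase2_deadline(phase1_finished: datetime, retention: timedelta) -> datetime:
    """Latest time to start phase 2, assuming the phase 1 results expire
    after the retention period (the rule of thumb given in the post above)."""
    return phase1_finished + retention


# Example with hypothetical media IDs and two drives.
batches = plan_phase1_batches(["A00001", "A00002", "A00003"], drive_count=2)
deadline = phase2_deadline(datetime(2011, 1, 3, 8, 0), timedelta(weeks=1))
```

With two drives the three tapes above fall into two batches, and a phase 1 import that finished on 3 January would need its phase 2 import started by 10 January under a one-week retention.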

dterm
Level 3

I was finally able to recover using the DR file, by installing a parallel copy of Windows and retrieving the latest DR file from the server. Everything went fine except that none of my backups run when scheduled. I have restarted the NetBackup services, and the server for good measure, as suggested in the troubleshooting guide. I can run manual backups successfully, but scheduled jobs fail with status code 196.
 

I am researching to see if there is any command I have to run after recovering to start the jobs, I thought I read it somewhere but also read that restarting the services will do the trick, too. Any ideas?

Thanks for all of your posts. They were helpful.

Mark_Solutions
Level 6
Partner Accredited Certified

nbpemreq -resume_scheduling

Marianne
Level 6
Partner    VIP    Accredited Certified

The status 196 indicates that the backups were queued but never went active? If queued, can you see the reason in Job Details?

Also see if this TN helps: http://www.symantec.com/docs/TECH46212

Symptoms: Backup jobs fail to start at scheduled times after having issues with the master server "hanging". Backup jobs can be initiated manually. 

The condition is created when the job state in the pempersist file is not correctly written.
........

If a graceful shutdown and restart did not correct the problem, deletion of the pempersist file is required.
1.  Cancel all jobs (do not suspend them).
2.  Stop all NetBackup services on the master server
3.  Delete the pempersist file (found in /usr/openv/netbackup/bin/bpsched.d directory on a UNIX/Linux master server, or in <install_path>\veritas\netbackup\bin\bpsched.d directory on a Windows master server)

Note: Ensure all NetBackup activity, e.g. backup, restore, duplication, etc is complete before stopping services.
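
A cautious way to carry out step 3 is sketched below. Note the deviation: the tech note says to delete the pempersist file, while this sketch renames it so it can be put back, which is my own assumption and not part of the tech note. The function name and the ".old" suffix are hypothetical. It must only be run after the jobs are cancelled and the NetBackup services are stopped, as in steps 1 and 2.

```python
from pathlib import Path
from typing import Optional


def set_aside_pempersist(bpsched_dir: Path) -> Optional[Path]:
    """Rename bpsched.d/pempersist to pempersist.old (hypothetical helper).

    Returns the new path, or None if there was no pempersist file.
    Assumes all NetBackup services are already stopped.
    """
    target = bpsched_dir / "pempersist"
    if not target.exists():
        return None
    backup = target.with_name("pempersist.old")
    # Path.replace overwrites an older pempersist.old if one is present.
    target.replace(backup)
    return backup
```

On a Windows master server the argument would be the bpsched.d directory under your own install path, as given in step 3. If scheduling works again after the services are restarted, the renamed file can be removed.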

dterm
Level 3

 

For me, the command was nbpem -resume_scheduling. I ran it, though it didn't resolve the issue.

I will check on what you suggested, Marianne.

Mark_Solutions
Level 6
Partner Accredited Certified
Make sure that no corruption has crept into the schedules and that your media servers are not showing as deactivated. Maybe recreate a policy or schedule to see if that one works, or use nbpemreq -reread_policies (check this command in the commands guide, as this is off the top of my head without reference). Hope this helps. (Did you restart NetBackup after the resume scheduling?)

dterm
Level 3


I read in one of the tech articles below that it could be a DNS resolution issue, so I also placed the server info in the hosts file. It didn't resolve the issue either. I am not able to access the NetBackup admin console GUI because it says "unable to connect to the selected netbackup host x" and tells me to make sure the user has privileges, the local host is listed in the server list of destination host x, there is a valid network connection, authentication is correct, and all services are running.

The server runs the scheduled backup jobs for a day or two and then stops. For now I have to reboot to access the GUI and get the scheduled jobs running again. On the days they don't run, the jobs fail with status 196.

I tried the bpclntcmd -self and -pn commands according to one of the tech articles below, and I noticed that -pn resolves to the 200 subnet when it is supposed to be the 100 subnet. I have two interfaces, one on the 100 subnet and one on the 200 subnet. I even disabled the 200 subnet connection and looked for a configuration file similar to bp.conf. I know that file is for Unix and that I should be able to do the same configuration through Host Properties in the NetBackup GUI, so after the reboot that gave me GUI access I looked for a setting, but I don't see anything I can change that would make bpclntcmd -pn resolve to the 100 subnet as a priority, if not to that subnet only.

tech73955
tech73691
tech37496
tech1769

Thanks.

Marianne
Level 6
Partner    VIP    Accredited Certified

PLEASE open a support ticket with Symantec.

You have been battling with this for way too long now.
A support engineer will be able to 'see' a lot more than we can if a WebEx session can be established.

dterm
Level 3

I am not able to give a support engineer remote access to any of our networks, as they are secured.

 

Thanks for everyone's help. I appreciate it.

Mark_Solutions
Level 6
Partner Accredited Certified

Have you put its own name (short or FQDN, depending on what it looks like in the Admin Console) in its own hosts file against the 100 network, so that it always works on that network?

Handy things hosts files.
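
As a quick cross-check of the hosts file idea, a sketch like this shows which of the server's resolved addresses fall in the wanted subnet and what a matching hosts line would look like. The addresses, subnet, and host names in the example are placeholders (the thread only says "100 subnet" and "200 subnet"), and the helper names are made up for illustration.

```python
import ipaddress
import socket
from typing import List


def resolve_all(hostname: str) -> List[str]:
    """Return every IPv4 address the OS resolver gives for hostname."""
    _, _, addresses = socket.gethostbyname_ex(hostname)
    return addresses


def addresses_in_subnet(addresses: List[str], subnet: str) -> List[str]:
    """Keep only the addresses that belong to the given subnet."""
    network = ipaddress.ip_network(subnet)
    return [a for a in addresses if ipaddress.ip_address(a) in network]


def hosts_line(ip: str, fqdn: str, short_name: str) -> str:
    """Format one hosts file entry: address, then the names it should answer to."""
    return f"{ip}\t{fqdn}\t{short_name}"


# Placeholder values: substitute your own backup-network subnet and names.
wanted = addresses_in_subnet(["10.0.200.5", "10.0.100.5"], "10.0.100.0/24")
entry = hosts_line(wanted[0], "master.example.com", "master")
```

Calling `resolve_all` with the master server's own name on the server itself, then filtering with `addresses_in_subnet`, shows whether the resolver can return the 100 network address at all before you pin it in the hosts file.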

Also check its registry, especially the HKLM\Software\Veritas\NetBackup\CurrentVersion\Config\ key, for anything that may not be quite right.