NDMP Backup Failures

Jig
Level 3
Recently we have been having problems with our NDMP backups. Daily backups would fail, but weekly full backups would be OK. Since yesterday, however, no backups work at all. We are trying to back up a NetApp filer with around 2 TB of data. Below are the two errors that we get:
 
1:
 
19/04/2007 09:34:11 - requesting resource aylbackup3-hcart3-robot-tld-0-aylav1file1
19/04/2007 09:34:11 - requesting resource aylbackup3.NBU_CLIENT.MAXJOBS.aylav1file1
19/04/2007 09:34:11 - requesting resource aylbackup3.NBU_POLICY.MAXJOBS.Aylav1file1_VIRUS
19/04/2007 09:34:11 - Error nbjm(pid=3532) NBU status: 800, EMM status: All compatible drive paths are down, but media is available
resource request failed(800)
 
2:
 
19/04/2007 09:12:00 - requesting resource aylbackup3-hcart3-robot-tld-0-aylav1file1
19/04/2007 09:12:00 - requesting resource aylbackup3.NBU_CLIENT.MAXJOBS.aylav1file1
19/04/2007 09:12:00 - requesting resource aylbackup3.NBU_POLICY.MAXJOBS.Aylav1file1_VIRUS
19/04/2007 09:12:01 - granted resource aylbackup3.NBU_CLIENT.MAXJOBS.aylav1file1
19/04/2007 09:12:01 - granted resource aylbackup3.NBU_POLICY.MAXJOBS.Aylav1file1_VIRUS
19/04/2007 09:12:01 - granted resource 0344L3
19/04/2007 09:12:01 - granted resource IBM.ULT3580-TD3.000
19/04/2007 09:12:01 - granted resource aylbackup3-hcart3-robot-tld-0-aylav1file1
19/04/2007 09:12:05 - started process bpbrm (3168)
19/04/2007 09:12:05 - connecting
19/04/2007 09:12:06 - connected; connect time: 00:00:01
19/04/2007 09:12:10 - mounting 0344L3
19/04/2007 09:12:20 - Error bptm(pid=4732) error requesting media, TpErrno = Robot operation failed    
19/04/2007 09:12:21 - Warning bptm(pid=4732) media id 0344L3 load operation reported an error    
19/04/2007 09:12:21 - current media 0344L3 complete, requesting next media Any
19/04/2007 09:12:50 - granted resource 0249L3
19/04/2007 09:12:50 - granted resource IBM.ULT3580-TD3.000
19/04/2007 09:12:50 - granted resource aylbackup3-hcart3-robot-tld-0-aylav1file1
19/04/2007 09:12:51 - mounting 0249L3
19/04/2007 09:19:10 - Error bptm(pid=4732) error requesting media, TpErrno = Robot operation failed    
19/04/2007 09:19:11 - Warning bptm(pid=4732) media id 0249L3 load operation reported an error    
19/04/2007 09:19:11 - current media 0249L3 complete, requesting next media Any
19/04/2007 09:19:41 - Error bptm(pid=4732) NBJM returned an extended error status: resource request failed (800)  
19/04/2007 09:19:41 - end writing
resource request failed(800)
19/04/2007 09:19:46 - Error ndmpagent(pid=4476) terminated by parent process        
 
I have rebooted the server, all the drives are up, and there are plenty of tapes and disk space available.
 
I am new to NetBackup, so please keep it simple.
 
Thanks in advance.
 
 
1 ACCEPTED SOLUTION

Nathan_Kippen
Level 6
Certified

I've had similar problems in my environment. We have an EMC Celerra NDMP device.

What was happening to me was that a job would run and, if it failed, there was often a loss of communication between the filer and the tape drive. The filer would refuse to use the tape drive for whatever reason, but the tape drive was still usable by other media servers (with SSO).

The solution, although not very practical, was to reboot the data mover (Celerra), which would "reset" the communication between the filer and the drive.

So in short, you might want to try rebooting your NAS device to get the drive talking to it again.

The solution that I actually ended up with is doing a 3-way NDMP backup via the media server. I've had a lot more success doing it that way and, to be truthful, the backup speeds are just as fast (for me).



9 REPLIES

Alexander_Harri
Level 4
Problem 1 is caused by problem 2.

When NetBackup receives a certain number of drive errors within a time period (2 in 24 hours, I think), it sets the drive to the 'Down' status. This prevents NetBackup from using it to run jobs.

When a job can't find any storage unit that it is allowed to use, it fails with status code 800.

I can't help you with problem 2; we're still working on it ourselves. Your best bet would be to contact your vendors (Symantec, NetApp, etc.) and work through it with them.
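
To make the drive-down behaviour described above concrete, here is a minimal Python sketch of a sliding error window. It is only a toy model, not NetBackup's actual logic: the class name `DriveErrorWindow` is hypothetical, and the threshold of 2 errors in 24 hours is simply the guess quoted above (the real values are configurable and may differ). The two timestamps are the load failures from Jig's second log, both of which hit drive IBM.ULT3580-TD3.000.

```python
from collections import deque
from datetime import datetime, timedelta


class DriveErrorWindow:
    """Toy model: mark a drive down after `threshold` errors inside `window`."""

    def __init__(self, threshold=2, window=timedelta(hours=24)):
        # Both defaults are assumptions taken from the post, not vendor values.
        self.threshold = threshold
        self.window = window
        self.errors = deque()
        self.down = False

    def record_error(self, when):
        """Record one drive error and return whether the drive is now down."""
        self.errors.append(when)
        # Forget errors that have aged out of the time window.
        while self.errors and when - self.errors[0] > self.window:
            self.errors.popleft()
        if len(self.errors) >= self.threshold:
            self.down = True
        return self.down


def parse(ts):
    """Parse the DD/MM/YYYY HH:MM:SS timestamps used in the job details."""
    return datetime.strptime(ts, "%d/%m/%Y %H:%M:%S")


drive = DriveErrorWindow()
drive.record_error(parse("19/04/2007 09:12:20"))  # load failure, media 0344L3
drive.record_error(parse("19/04/2007 09:19:10"))  # load failure, media 0249L3
print("drive down:", drive.down)
```

Under these assumptions the second load failure is enough to take the drive down, which would explain why the later job in log 1 fails with status 800 before it even tries to mount a tape.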

Thajwas__
Level 4
Hi Jig,
 
Are you able to run an inventory on the library? And what about the non-NDMP backups: are those successful?
Also, please check the output of the commands below:
tpautoconf -probe
vmoprcmd -d
 
thanks
thajwas

Jig
Level 3
Hi Thajwas
 
Yes, I can do an inventory.
 
Normally the non-NDMP jobs are successful, apart from last night when the nbpem service failed. This is a fairly frequent issue which is outstanding with Symantec.
 
vmoprcmd -d returns the result below.
tpautoconf -probe does not return anything.
 
E:\VERITAS\Volmgr\bin>vmoprcmd -d
                               PENDING REQUESTS
                                    <NONE>
                                 DRIVE STATUS
Drv Type   Control  User      Label  RecMID  ExtMID  Ready   Wr.Enbl.  ReqId
 0 hcart3   TLD                -                     No       -         0
 1 hcart3   TLD                -                     No       -         0
 1 hcart3   TLD                -                     No       -         0
 2 hcart3   TLD                -                     No       -         0
 3 hcart3   TLD                -                     No       -         0
 4 hcart3   TLD                -                     No       -         0
                            ADDITIONAL DRIVE STATUS
Drv DriveName            Shared    Assigned        Comment
 0 LTO3_DRIVE1           No       -
 1 IBM.ULT3580-TD3.000   No       -
 1 IBM.ULT3580-TD3.000   No       -
 2 LTO3_DRIVE3           No       -
 3 LTO3_DRIVE4           No       -
 4 LTO3_DRIVE5           No       -
E:\VERITAS\Volmgr\bin>
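
For anyone wanting to check this kind of output by script, below is a rough Python sketch that pulls the Control and Ready columns out of the DRIVE STATUS section. The function name `parse_drive_status` is hypothetical, the sample is an abridged copy of the output above, and two things are assumptions on my part: that a downed drive shows a Control value starting with DOWN, and that the last three fields of each row (Ready, Wr.Enbl., ReqId) are always populated, which is why Ready is read from the right-hand side.

```python
SAMPLE = """\
                                 DRIVE STATUS
Drv Type   Control  User      Label  RecMID  ExtMID  Ready   Wr.Enbl.  ReqId
 0 hcart3   TLD                -                     No       -         0
 1 hcart3   TLD                -                     No       -         0
 2 hcart3   TLD                -                     No       -         0
                            ADDITIONAL DRIVE STATUS
Drv DriveName            Shared    Assigned        Comment
 0 LTO3_DRIVE1           No       -
"""


def parse_drive_status(text):
    """Return one dict per data row in the DRIVE STATUS section."""
    drives = []
    in_section = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "DRIVE STATUS":
            in_section = True
            continue
        if stripped == "ADDITIONAL DRIVE STATUS":
            break
        tokens = stripped.split()
        # Data rows start with the numeric drive index; the header row does not.
        if in_section and len(tokens) >= 6 and tokens[0].isdigit():
            drives.append({
                "index": int(tokens[0]),
                "type": tokens[1],
                "control": tokens[2],
                # Middle columns may be blank, so count from the right:
                # Ready, Wr.Enbl., ReqId.
                "ready": tokens[-3],
            })
    return drives


drives = parse_drive_status(SAMPLE)
# Assumption: a downed drive has a Control value beginning with "DOWN".
down = [d["index"] for d in drives if d["control"].upper().startswith("DOWN")]
print("drives parsed:", len(drives))
print("drives down:", down)
```

On the sample above, every drive has Control = TLD, so none is reported as down; Ready = No would then just mean no tape is currently loaded.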

Thajwas__
Level 4
Hi Jig,
 
Kindly check the output of the commands below:
 
tpautoconf -probe <ndmp_host_name>
tpautoconf -verify <ndmp_host_name>
 
And on the filer: sysconfig -t
 
Thanks
Thajwas

Jig
Level 3
Hi
 
After some lengthy discussions with Symantec, it was decided that we needed to reload NetBackup. We did this yesterday, and I have now disabled our NDMP job, as I think it was causing all the other jobs to fail. Once I have a full backup of all the other servers, I will start to look into this again.
 
Thanks

Srikanth_Gubbal
Level 6
Certified

Hi,

I have the same problem. Have you found a resolution?

Regards,
Srikanth.

Darren_Dunham
Level 6
Srikanth, you may not have exactly the same problem.  It's probably better if you start a new thread for your issue.

Show any NBU logs from the failure, and backup logs from the NDMP host.  What version of NBU are you running?  What is your NDMP host and OS?  Are you doing direct, remote, or 3-way NDMP backups?

--
Darren
