
NetBackup 7.6.0.1 status 12 on duplication jobs

m_karampasis
Level 4
Partner Accredited

Dear all,

We are experiencing an issue on a specific media server: most of the duplication jobs are failing with the following status:

05/22/2015 10:06:23 - requesting resource LCM_nbumsrv2_ALL_LTO
05/22/2015 10:14:32 - Info nbrb (pid=27590880) Limit has been reached for the logical resource LCM_nbumsrv2_ALL_LTO
05/27/2015 18:16:47 - granted resource  LCM_nbumsrv2_ALL_LTO
05/27/2015 18:16:48 - begin Duplicate
05/27/2015 18:16:48 - Error bpduplicate (pid=37880058) cannot open /usr/openv/var/global/nbstserv/bid_file_rs_1432278383_0_0, A file or directory in the path name does not exist.
05/27/2015 18:16:48 - end Duplicate; elapsed time 0:00:00
05/27/2015 18:16:48 - started process RUNCMD (pid=37880058)
file open failed  (12)

 

Thank you in advance.

25 REPLIES 25

mikebounds
Level 6
Partner Accredited

One of my duplications has just finished, and no other duplication jobs are queued. The file was still there for a minute or two after the job completed, but now it is gone. So I would guess that if you still have files in /usr/openv/var/global/nbstserv, it is because the SLP is still active, as nbstlutil shows. Did you verify that image IDs that are NOT in /usr/openv/var/global/nbstserv have "inactive" set to true?
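To see which BID files are currently present and how old they are, something like the following Python sketch could be run on the server. It only reads the directory named in the error message; the helper name `list_bid_files` is hypothetical, and the `bid_file_` prefix is an assumption taken from the file name in the job details.

```python
import os
import time

def list_bid_files(directory):
    """Return (name, age_in_seconds) for each bid_file_* entry, oldest first."""
    now = time.time()
    entries = []
    for name in os.listdir(directory):
        # Assumption: BID files share the "bid_file_" prefix seen in the error.
        if name.startswith("bid_file_"):
            path = os.path.join(directory, name)
            entries.append((name, now - os.path.getmtime(path)))
    # Largest age first, so the oldest files come at the top.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)

if __name__ == "__main__":
    bid_dir = "/usr/openv/var/global/nbstserv"
    if os.path.isdir(bid_dir):
        for name, age in list_bid_files(bid_dir):
            print(f"{name}  {age / 3600:.1f} h old")
```

Comparing this listing with the file name a failed job complains about shows whether the file was never created or has already been removed.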

Mike

m_karampasis
Level 4
Partner Accredited

Dear all,

We tend to believe that Marianne is right: all the failed duplication jobs failed after waiting 5 days and some hours. Attached you will find the detailed status of many failed jobs.

Unfortunately, the unified logging for 226 does not show anything. Attached you will find the log.

@Mike, we cannot find the image IDs needed to run the command.

BR,

 

Marianne
Moderator
Partner    VIP    Accredited Certified

You need nbstserv logs for 22 and 23 May. Looking in today's log won't help.

Were all of the jobs that you show in the attachment in queued or active state for all of those days?

SIX days in this case!!

05/23/2015 21:33:09 - requesting resource LCM_nbumsrv2_ALL_LTO
05/23/2015 21:37:24 - Info nbrb (pid=27590880) Limit has been reached for the logical resource LCM_nbumsrv2_ALL_LTO
05/29/2015 05:57:49 - begin Duplicate 
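
The exact wait can be computed from the job details. This is a minimal Python sketch, assuming the `MM/DD/YYYY HH:MM:SS` timestamp prefix shown in the excerpt above; the function names `parse_ts` and `resource_wait` are hypothetical.

```python
from datetime import datetime, timedelta

TS_FORMAT = "%m/%d/%Y %H:%M:%S"

def parse_ts(line):
    """Parse the 19-character timestamp at the start of a job-details line."""
    return datetime.strptime(line[:19], TS_FORMAT)

def resource_wait(lines):
    """Return the time between the first resource request and 'begin Duplicate'."""
    requested = None
    for line in lines:
        if " - requesting resource " in line and requested is None:
            requested = parse_ts(line)
        elif " - begin Duplicate" in line and requested is not None:
            return parse_ts(line) - requested
    return None  # the job never requested or never started

job_details = [
    "05/23/2015 21:33:09 - requesting resource LCM_nbumsrv2_ALL_LTO",
    "05/23/2015 21:37:24 - Info nbrb (pid=27590880) Limit has been reached "
    "for the logical resource LCM_nbumsrv2_ALL_LTO",
    "05/29/2015 05:57:49 - begin Duplicate",
]

wait = resource_wait(job_details)
print(wait)
```

For this job the result is 5 days, 8 hours, 24 minutes and 40 seconds, so the duplication started on the sixth day after the request.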

What is weird here is that Martin's testing showed that SLP duplication failed with an error when no resources are available.

Do you have other duplication jobs in queued or active state waiting for resources?

You really need to have a good look at duplication resources  - you should never have jobs waiting for tape drives this long.

As per my post of yesterday - work through the SLP Best Practice Guide to assess and tune your environment.
All duplication jobs should be finished before next backup window starts.

 

mph999
Level 6
Employee Accredited

By default, SLP retires a duplication multiple times - I think it was 10-minute intervals for 5 or 6 attempts, then after that it tries something like 8 hrs later, though this is from memory so may not be exact.

For each attempt, it creates a new BID file, so the old BID files from previous attempts should not be required; whether they exist or not doesn't matter.
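
To make the timing concrete, here is a rough Python sketch of that schedule. The numbers are only the ones recalled above (10 minutes, 5 attempts, 8 hours), not confirmed NetBackup defaults, and the function `retry_times` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta

def retry_times(first_failure,
                short_interval=timedelta(minutes=10),  # assumed, from memory
                short_attempts=5,                      # assumed, "5 or 6"
                long_interval=timedelta(hours=8),      # assumed, "something like 8 hrs"
                long_attempts=2):                      # arbitrary, for illustration
    """Return the times at which retries would start under the assumed schedule."""
    current = first_failure
    times = []
    for _ in range(short_attempts):
        current += short_interval
        times.append(current)
    for _ in range(long_attempts):
        current += long_interval
        times.append(current)
    return times

schedule = retry_times(datetime(2015, 5, 22, 10, 0, 0))
for attempt, when in enumerate(schedule, start=1):
    print(attempt, when)
```

Under these assumptions the short retries are used up within the first hour, and every later attempt would create its own BID file.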

Marianne
Moderator
Partner    VIP    Accredited Certified

Hi Martin

I think you meant retries, not retires....

If I am not mistaken - the retries only come into effect if the initial attempt was unsuccessful, right?

Here we do not see a failure.
The duplication is started and resources are requested, but they are only granted 5 - 6 days later.
It seems that initial BID files have been removed in the meantime.

Either a bug (as initial job should have failed) or something else going on that we don't have insight into....

mph999
Level 6
Employee Accredited

Yes, I can't spell ....

Ahh, no failure, ok that changes things a bit ...

Any particular reason we are waiting 5 days for resources ?

I'll have to look to see if there is a timeout - not seen this, but then again, it's not often we are waiting 5 days ...