05-28-2015 12:26 AM
Dear all,
We are experiencing the following issue on a specific media server: most of the duplication jobs are failing with the following status:
05/22/2015 10:06:23 - requesting resource LCM_nbumsrv2_ALL_LTO
05/22/2015 10:14:32 - Info nbrb (pid=27590880) Limit has been reached for the logical resource LCM_nbumsrv2_ALL_LTO
05/27/2015 18:16:47 - granted resource LCM_nbumsrv2_ALL_LTO
05/27/2015 18:16:48 - begin Duplicate
05/27/2015 18:16:48 - Error bpduplicate (pid=37880058) cannot open /usr/openv/var/global/nbstserv/bid_file_rs_1432278383_0_0, A file or directory in the path name does not exist.
05/27/2015 18:16:48 - end Duplicate; elapsed time 0:00:00
05/27/2015 18:16:48 - started process RUNCMD (pid=37880058)
file open failed (12)
Thank you in advance.
05-28-2015 04:53 AM
One of my duplications has just finished and there are no other duplication jobs queued. The file was around for a minute or two after the job completed, but now it is gone. So I would guess that if you still have files in /usr/openv/var/global/nbstserv, it is because SLP is still active, as nbstlutil shows. Did you verify that image IDs that are NOT in /usr/openv/var/global/nbstserv have "inactive" set to true?
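To see what is still sitting in that directory and how old it is, something along these lines could be run on the master server. It is only a sketch: the directory is the one from the error message, while the `bid_file_*` pattern is an assumption based on the single file name shown there, and `list_bid_files` is a made-up helper name, not a NetBackup tool.

```python
import time
from pathlib import Path

# Directory taken from the bpduplicate error in the first post.
NBSTSERV_DIR = "/usr/openv/var/global/nbstserv"


def list_bid_files(directory=NBSTSERV_DIR, pattern="bid_file_*", now=None):
    """Return (name, age_in_seconds) for each matching file, oldest first.

    The pattern is an assumption based on the one file name in the error.
    """
    now = time.time() if now is None else now
    root = Path(directory)
    if not root.is_dir():
        return []
    entries = [
        (p.name, now - p.stat().st_mtime)
        for p in root.glob(pattern)
        if p.is_file()
    ]
    # Largest age first, so long-lived files show up at the top.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


if __name__ == "__main__":
    for name, age in list_bid_files():
        print(f"{name}  {age / 3600:.1f} h old")
```

Files that are days old while their job is still queued would be worth comparing against the job details.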
Mike
05-29-2015 02:59 AM
Dear all,
We tend to believe that Marianne is right: all the failed duplication jobs failed after waiting 5 days and some hours. Attached you will find the detailed status from many failed jobs.
Unfortunately, the unified logging for 226 does not show anything. Attached you will find the log.
@Mike, we cannot find the image IDs needed to run the command.
BR,
05-29-2015 03:26 AM
You need nbstserv logs for 22 and 23 May. Looking in today's log won't help.
Were all of the jobs that you show in the attachment in queued or active state for all of those days?
SIX days in this case!!
05/23/2015 21:33:09 - requesting resource LCM_nbumsrv2_ALL_LTO
05/23/2015 21:37:24 - Info nbrb (pid=27590880) Limit has been reached for the logical resource LCM_nbumsrv2_ALL_LTO
05/29/2015 05:57:49 - begin Duplicate
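With many failed jobs attached, it may be easier to compute the wait for each one than to read it off by eye. A minimal sketch, assuming the job details keep the `MM/DD/YYYY HH:MM:SS - message` layout shown in this thread; `parse_line` and `resource_wait` are hypothetical helper names, not part of NetBackup.

```python
from datetime import datetime, timedelta

TS_FORMAT = "%m/%d/%Y %H:%M:%S"  # timestamp layout used in the job details


def parse_line(line):
    """Split 'MM/DD/YYYY HH:MM:SS - message' into (datetime, message)."""
    stamp, _, message = line.partition(" - ")
    return datetime.strptime(stamp.strip(), TS_FORMAT), message.strip()


def resource_wait(lines, end_markers=("granted resource", "begin Duplicate")):
    """Time between the first 'requesting resource' line and the first later
    line whose message starts with one of end_markers; None if not found."""
    requested = None
    for line in lines:
        try:
            when, message = parse_line(line)
        except ValueError:
            continue  # lines without a timestamp, e.g. "file open failed (12)"
        if requested is None:
            if message.startswith("requesting resource"):
                requested = when
        elif message.startswith(end_markers):
            return when - requested
    return None


job_details = [
    "05/23/2015 21:33:09 - requesting resource LCM_nbumsrv2_ALL_LTO",
    "05/23/2015 21:37:24 - Info nbrb (pid=27590880) Limit has been reached "
    "for the logical resource LCM_nbumsrv2_ALL_LTO",
    "05/29/2015 05:57:49 - begin Duplicate",
]
print(resource_wait(job_details))
```

Running it over every job in the attachment would show whether the failures all cluster around the same wait time.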
What is weird here is that Martin's testing showed that SLP duplication failed with an error when no resources are available.
Do you have other duplication jobs in queued or active state waiting for resources?
You really need to have a good look at duplication resources - you should never have jobs waiting for tape drives this long.
As per my post of yesterday - work through the SLP Best Practice Guide to assess and tune your environment.
All duplication jobs should be finished before next backup window starts.
06-01-2015 02:29 AM
By default, SLP retires a duplication multiple times - I think it was 10-minute intervals for 5 or 6 attempts, and after that it tries something like 8 hrs later, though this is from memory so may not be exact.
For each attempt it creates a new BID file, so the old BID files from previous attempts should not be required; whether they exist or not doesn't matter.
06-01-2015 03:43 AM
Hi Martin
I think you meant retries, not retires....
If I am not mistaken - the retries only come into effect if the initial attempt was unsuccessful, right?
Here we do not see a failure.
The duplication is started and resources are requested, but they are only granted 5 - 6 days later.
It seems that initial BID files have been removed in the meantime.
Either a bug (as initial job should have failed) or something else going on that we don't have insight into....
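One detail that seems to fit this: the number in the missing file's name, bid_file_rs_1432278383_0_0, looks like a Unix epoch timestamp. That is an assumption, not documented behaviour, but decoded it gives 2015-05-22 07:06:23 UTC, which matches the 10:06:23 "requesting resource" line to the second if the server runs at UTC+3. A small sketch to decode it (`bid_file_epoch` is a hypothetical helper):

```python
import os
from datetime import datetime, timezone


def bid_file_epoch(name):
    """Return the first 10-digit field of a BID file name as a UTC datetime.

    Assumes (unconfirmed) that this field is a Unix epoch timestamp.
    Returns None when the name has no such field.
    """
    for field in os.path.basename(name).split("_"):
        if field.isdigit() and len(field) == 10:
            return datetime.fromtimestamp(int(field), tz=timezone.utc)
    return None


print(bid_file_epoch("/usr/openv/var/global/nbstserv/bid_file_rs_1432278383_0_0"))
```

If that reading is right, the job that finally got resources on 05/27 was still looking for the BID file written when it was first requested on 05/22.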
06-01-2015 04:25 PM
Yes, I can't spell ....
Ahh, no failure, ok that changes things a bit ...
Any particular reason we are waiting 5 days for resources ?
I'll have to look to see if there is a timeout - not seen this, but then again, it's not often we are waiting 5 days ...