02-02-2013 05:07 AM
After rebooting my NBU servers, I have over a thousand Duplication jobs trying to run.
A little background: we have two DD670s. Half of the backups go to one and half to the other; after each backup completes, it is duplicated to the other DD670.
All that shows up for the job is this:
On the Job Overview Tab:
Job Type: Duplication
Master Server: <server name>
Job Policy: SLP_LCP_DD02_Weekly
Job Schedule: Dup
Priority: 0
On the Detailed Status Tab:
Nothing at the top; all fields are blank
In Status:
2/2/2013 1:46:29 AM - requesting resource LCM_dd01-su
2/2/2013 1:36:35 AM - Info nbrb(pid=3248) Limit has been reached for the logical resource LCM_dd01-su
I have over 1500 jobs sitting in the queue like this. How do I keep them from launching? If I cancel them and clean them all up, 5-10 minutes later they all kick in again. It seems that even though they run and complete, they just queue up and run again.
Baffled......
Thanks,
John
02-02-2013 05:19 AM
2/2/2013 1:36:35 AM - Info nbrb(pid=3248) Limit has been reached for the logical resource LCM_dd01-su
1) Do you see any other active jobs for LCM_dd01-su?
2) What is the Max concurrent jobs setting for the STU LCM_dd01-su?
3) What is the Max I/O streams setting for the disk pool associated with STU LCM_dd01-su?
4) What is the output of /usr/openv/netbackup/bin/admincmd/nbstlutil report
"It seems even though they run and complete, they just queuue up again and run again." That definitely should not happen, unless the SLP defines one more copy.
02-02-2013 06:46 AM
02-02-2013 08:28 AM
Hi,
"called dd02_su and dd02_su and valid duplication jobs do not request this su." ---> I assume those are dd01_su and dd02_su; correct me if I am wrong.
Please let us know the names of the two SLPs that are having the issue.
Please provide the output of the following:
nbstl -L -all_versions
/usr/openv/netbackup/bin/admincmd/nbstlutil report ---> this is the full command; just run it.
bpstulist -label <storage unit name> ---> for both storage unit names
Please provide these as attachments.
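For anyone gathering the same information, here is a small Python sketch that runs the three commands requested above and saves each output to its own text file for attaching. It assumes all three commands live in the admincmd directory quoted in the thread (adjust ADMINCMD for a Windows master or a different install) and uses dd01-su and dd02-su as placeholder storage unit names; build_commands and collect are hypothetical helpers, not NetBackup tools.

```python
"""Collect the outputs requested in this thread into text files for attaching.

Assumptions: ADMINCMD is the Unix path quoted in the thread, and
dd01-su / dd02-su are placeholder storage unit names; replace both
with the values from your environment.
"""
import subprocess
from pathlib import Path
from typing import List, Tuple

ADMINCMD = Path("/usr/openv/netbackup/bin/admincmd")  # adjust for your install
STORAGE_UNITS = ["dd01-su", "dd02-su"]  # hypothetical names


def build_commands(stus: List[str]) -> List[Tuple[str, List[str]]]:
    """Return (output file name, argv) pairs for each command requested."""
    cmds = [
        ("nbstl_all_versions.txt", [str(ADMINCMD / "nbstl"), "-L", "-all_versions"]),
        ("nbstlutil_report.txt", [str(ADMINCMD / "nbstlutil"), "report"]),
    ]
    # One bpstulist call per storage unit, as asked in the thread.
    for stu in stus:
        cmds.append((f"bpstulist_{stu}.txt", [str(ADMINCMD / "bpstulist"), "-label", stu]))
    return cmds


def collect(out_dir: Path = Path("nbu_outputs")) -> None:
    """Run every command and write stdout+stderr to out_dir/<file name>."""
    out_dir.mkdir(exist_ok=True)
    for filename, argv in build_commands(STORAGE_UNITS):
        try:
            result = subprocess.run(argv, capture_output=True, text=True, check=False)
            text = result.stdout + result.stderr
        except OSError as exc:  # e.g. binary missing or not executable
            text = f"could not run {argv[0]}: {exc}\n"
        (out_dir / filename).write_text(text)
        print(f"wrote {out_dir / filename}")
```

Call collect() on the master server and attach the files from the nbu_outputs directory.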
02-02-2013 09:00 AM
02-02-2013 09:09 AM
Hi,
I am still looking for:
the names of the two SLPs that are having the issue
/usr/openv/netbackup/bin/admincmd/nbstlutil report ---> this is the full command; just run it.
bpstulist -label <storage unit name> ---> for both storage unit names
and also the detailed status of the latest duplication job.
02-02-2013 09:12 AM
02-02-2013 09:17 AM
02-02-2013 10:21 AM
It looks like Max I/O streams is the problem.
You have set Max concurrent jobs to 90 on each storage unit,
and Max I/O streams to 90 on each disk pool.
So at any point in time only 90 streams can be active on each disk pool. But because you also specified 90 on the storage units, each SLP allocates all 90 streams at its source, leaving 0 for the destination, so all the jobs end up in the queue.
It looks like this:
For DD01_SU:
  Max I/O streams in disk pool = 90
  Max concurrent jobs in disk STU = 90
For DD02_SU:
  Max I/O streams in disk pool = 90
  Max concurrent jobs in disk STU = 90
So when duplication starts for the DD01 SLP, all of these streams get allocated at the source end: the DD01 SLP takes all 90 on DD01, and nothing is left there for the DD02 SLP jobs, which need DD01 as their destination, so they queue.
The same happens in reverse for the DD02 SLP.
It is a deadlock situation.
There are three ways out. First cancel all duplication jobs, then implement one of the following:
1) Reduce the Max concurrent jobs count on each STU.
or
2) Increase Max I/O streams on the disk pool (not recommended, as a DD670 might not be able to handle more).
or
3) Deactivate one SLP until the other SLP has completed.
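The deadlock reasoning above can be checked with a toy Python model. It assumes, as the explanation does, that each duplication job claims a read stream on its source disk pool (up to the STU's Max concurrent jobs) and holds it while waiting for a write stream on the destination pool. The SLP and pool names are hypothetical, and this is a sketch of the explanation, not of how nbrb is actually implemented.

```python
"""Toy hold-and-wait model of the cross-DD670 duplication deadlock.

Assumption: a job holds its source (read) stream while waiting for a
destination (write) stream. Names are hypothetical; not nbrb's real logic.
"""
from typing import Dict, Iterable, Optional

# Each SLP duplicates from its own DD670 to the other one.
SLPS = {
    "SLP_DD01": ("dd01", "dd02"),  # (source pool, destination pool)
    "SLP_DD02": ("dd02", "dd01"),
}


def simulate(pool_streams: int, stu_max_jobs: int, queued_per_slp: int,
             active: Optional[Iterable[str]] = None) -> Dict[str, int]:
    """Return how many duplication jobs per SLP get both streams and can run."""
    active_slps = set(SLPS if active is None else active)
    free = {"dd01": pool_streams, "dd02": pool_streams}
    holding = {slp: 0 for slp in SLPS}

    # Phase 1: each active SLP claims source-side streams, capped by the STU
    # limit and by what the source pool has left, and keeps them.
    for slp, (src, _dst) in SLPS.items():
        if slp in active_slps:
            holding[slp] = min(stu_max_jobs, queued_per_slp, free[src])
            free[src] -= holding[slp]

    # Phase 2: jobs holding a read stream ask for a write stream on the
    # destination pool; only what is still free can be granted.
    running: Dict[str, int] = {}
    for slp, (_src, dst) in SLPS.items():
        running[slp] = min(holding[slp], free[dst])
        free[dst] -= running[slp]

    running["total"] = sum(running.values())
    return running


print(simulate(90, 90, 1500)["total"])                       # 0: deadlock
print(simulate(90, 45, 1500)["total"])                       # 90: option 1
print(simulate(180, 90, 1500)["total"])                      # 180: option 2
print(simulate(90, 90, 1500, active=["SLP_DD01"])["total"])  # 90: option 3
```

In this model any STU limit below the pool's Max I/O streams avoids a total deadlock, and a limit of half the pool's streams or less lets every claimed read stream get a write stream.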
02-02-2013 10:46 AM
02-02-2013 10:51 AM
02-02-2013 11:50 AM
LCM_dd01-su actually refers to the dd01-su storage unit.
The 84 might be because of the large number of I/O streams.
You will see the image ID once it starts replicating.
The first thing you need to do is reduce Max concurrent jobs on the STUs, maybe to 40 or 50, for both storage units.
Then cancel all duplication jobs in Activity Monitor and let them start again,
and see how they progress.
----------
PS: If you do not want them to duplicate, you can cancel them using the nbstlutil command.
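The steps in this reply can be scripted as a dry run first. This Python sketch only prints the command lines unless dry_run=False. The storage unit names are placeholders, and the bpsturep -cj and nbstlutil cancel -lifecycle syntax is recalled from the NetBackup Commands Reference rather than taken from this thread, so check it against your version before running for real. Keep in mind that cancelling with nbstlutil means those duplicate copies will not be made.

```python
"""Dry-run helper for the remediation steps suggested in this reply.

Assumptions: dd01-su / dd02-su are placeholder STU names, ADMINCMD is the
Unix path quoted in the thread, and the bpsturep / nbstlutil syntax must
be verified against the Commands Reference for your NetBackup version.
"""
import shlex
import subprocess
from pathlib import Path
from typing import List

ADMINCMD = Path("/usr/openv/netbackup/bin/admincmd")  # adjust for your install


def reduce_stu_jobs(stus: List[str], max_jobs: int) -> List[List[str]]:
    """Commands that lower Max concurrent jobs on each storage unit."""
    return [[str(ADMINCMD / "bpsturep"), "-label", stu, "-cj", str(max_jobs)]
            for stu in stus]


def cancel_slp_copies(slps: List[str]) -> List[List[str]]:
    """Commands that cancel pending SLP work (the queued duplications)."""
    return [[str(ADMINCMD / "nbstlutil"), "cancel", "-lifecycle", slp] for slp in slps]


def run(cmds: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in cmds:
        print(("would run: " if dry_run else "running: ") + shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


run(reduce_stu_jobs(["dd01-su", "dd02-su"], 40))
run(cancel_slp_copies(["SLP_LCP_DD02_Weekly"]))
```

Running it in dry-run mode first shows exactly what would be changed before anything touches the master server.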
02-02-2013 01:01 PM
02-18-2013 12:27 PM
Duplication jobs have finally finished after 3 solid days of running. Everything is caught up and back to normal.