
Duplication Jobs running wild

John_Deackman
Level 3

After rebooting my NBU servers, I have over a thousand Duplication jobs trying to run. 

A little info:  We have 2 DD670's.  Half backups go to one, half to the other, then each backup duplicates to the other DD670 after the backup completes.

 

All that shows up for the job is this:

On the Job Overview Tab:

Job Type: Duplication
Master Server:  <server name>
Job Policy: SLP_LCP_DD02_Weekly
Job Schedule: Dup
Priority: 0

On the Detailed Status Tab:

Nothing at top - all fields blank

in Status:
2/2/2013  1:46:29 AM - requesting resource LCM_dd01-su
2/2/2013  1:36:35 AM - Info nbrb(pid=3248) Limit has been reached for the logical resource LCM_dd01-su

 

I have over 1500 sitting in the queue like this. How do I keep these from launching? If I cancel them and clean them all up, 5-10 min later they all kick in again... It seems even though they run and complete, they just queue up again and run again.

Baffled......

Thanks,
John

 


13 REPLIES

RamNagalla
Moderator
Partner    VIP    Certified

2/2/2013  1:36:35 AM - Info nbrb(pid=3248) Limit has been reached for the logical resource LCM_dd01-su

 

1) Do you see any other active jobs for LCM_dd01-su?

2) What is the max concurrent jobs setting for the STU LCM_dd01-su?

3) What is the max I/O streams setting for the disk pool associated with the STU LCM_dd01-su?

4) What is the output of /usr/openv/netbackup/bin/admincmd/nbstlutil report?

 

"It seems even though they run and complete, they just queue up again and run again." That definitely should not happen, unless the SLP defines one more copy for them.

 

 

 

John_Deackman
Level 3
Thanks Nagalla,

1) No, I don't even have a storage unit called LCM_dd01-su, they are only called dd02_su and dd02_su and valid duplication jobs do not request this su.

2) The valid dd_01 and dd02_su are both set to 90 concurrent jobs.

3) Max I/O is also set to 90 per volume.

4) What part of that report are you specifically looking for? Lots of options...

RamNagalla
Moderator
Partner    VIP    Certified

Hi,

"called dd02_su and dd02_su and valid duplication jobs do not request this su." ---> I assume those are dd01_su and dd02_su; correct me if I am wrong.

Please let us know the names of the 2 SLPs that have the issue.

Please provide the output of the commands below:

nbstl -L -all_versions

/usr/openv/netbackup/bin/admincmd/nbstlutil report   ---> that is the whole command, just run it.

bpstulist -label <storage unit name>  ---> for both storage unit names

Please provide these outputs as attachments.
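
If it helps, the three outputs requested above could be gathered in one go. This is only a rough sketch: the function names, the output file names, and the storage unit names in the usage note are made up for illustration, and the admincmd directory is passed in because the path shown above is the UNIX one and may differ on your master server.

```python
import subprocess
from pathlib import Path


def build_commands(admincmd_dir, storage_units):
    """Return (output_file, argv) pairs for the outputs requested in this thread.

    admincmd_dir and the output file names are assumptions for illustration.
    """
    admincmd = Path(admincmd_dir)
    commands = [
        ("nbstl_all_versions.txt", [str(admincmd / "nbstl"), "-L", "-all_versions"]),
        ("nbstlutil_report.txt", [str(admincmd / "nbstlutil"), "report"]),
    ]
    # One bpstulist call per storage unit label.
    for stu in storage_units:
        commands.append(
            (f"bpstulist_{stu}.txt", [str(admincmd / "bpstulist"), "-label", stu])
        )
    return commands


def collect(admincmd_dir, storage_units, out_dir="."):
    """Run each command and save stdout plus stderr to a text file for attaching."""
    for filename, argv in build_commands(admincmd_dir, storage_units):
        result = subprocess.run(argv, capture_output=True, text=True)
        Path(out_dir, filename).write_text(result.stdout + result.stderr)
```

For example, `collect("/usr/openv/netbackup/bin/admincmd", ["dd01_su", "dd02_su"])` would write four text files to the current directory, assuming those are the real storage unit labels.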

 

John_Deackman
Level 3
File attached.

RamNagalla
Moderator
Partner    VIP    Certified

Hi,

I am still looking for:

The names of the 2 SLPs that have the issue.

/usr/openv/netbackup/bin/admincmd/nbstlutil report   ---> that is the whole command, just run it.

bpstulist -label <storage unit name>  ---> for both storage unit names

 

And also the detailed status of the latest Duplication job.

John_Deackman
Level 3
Missed that part, here they are.

John_Deackman
Level 3
Here is a current Duplication waiting to run

RamNagalla
Moderator
Partner    VIP    Certified

It looks like Max I/O streams is the problem.

You set Max concurrent jobs to 90 in each storage unit, and Max I/O streams for each disk pool is also 90.

So at any point in time only 90 streams can be active on each disk pool. But because you specified 90 in the storage units, each SLP allocates 90 streams at its source, which leaves 0 for the destination, so all the jobs sit in the queue.

It's like this:

For DD01_SU:

Max I/O streams in disk pool = 90

Max jobs in disk STU = 90

For DD02_SU:

Max I/O streams in disk pool = 90

Max jobs in disk STU = 90

So when duplication starts for the DD01 SLP, all of its streams get allocated at the source end. The DD01 SLP takes 90 at the source and nothing is left for the DD02 SLP jobs, which results in queued jobs.

It's the same for the DD02 SLP.

It's a deadlock situation.

There are 3 ways out. First cancel all Duplication jobs, then implement one of the below.

1) Reduce the Max jobs count in each STU.

or

2) Increase Max I/O streams at the disk pool (not recommended, as the DD670 might not be able to handle more).

or

3) Deactivate one SLP until the other SLP completes.
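
To make the arithmetic concrete, here is a toy model of the contention described above. It is only a sketch of the reasoning in this post, not of how the NetBackup resource broker actually allocates streams: the `simulate` function and its two-phase "grab source stream, then destination stream" order are assumptions made for illustration.

```python
def simulate(pool_limit, stu_job_limit):
    """Toy model: count duplication jobs that obtain both streams they need.

    pool_limit    -- Max I/O streams per disk pool (reads + writes combined)
    stu_job_limit -- Max concurrent jobs per storage unit
    """
    free = {"dd01": pool_limit, "dd02": pool_limit}
    # Each SLP duplicates from its own pool to the other pool.
    slps = [("dd01", "dd02"), ("dd02", "dd01")]

    # Phase 1: each SLP grabs source (read) streams, up to the STU job limit.
    holding = {}
    for src, dst in slps:
        granted = min(stu_job_limit, free[src])
        free[src] -= granted
        holding[(src, dst)] = granted

    # Phase 2: every job holding a source stream needs one destination
    # (write) stream on the other pool before it can run.
    running = 0
    for (src, dst), jobs in holding.items():
        granted = min(jobs, free[dst])
        free[dst] -= granted
        running += granted
    return running


print(simulate(90, 90))  # both limits at 90, as in this thread
print(simulate(90, 45))  # job limit halved (option 1)
```

Under this model, with both limits at 90 each SLP uses up its own pool on reads and no job gets a write stream. With the job limit at 45, each pool carries 45 reads plus 45 writes and all 90 jobs can run.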

 

 

 

John_Deackman
Level 3
OK, but I guess what I need to know is, what is LCM_dd01-su? There is a way to clear the Duplications, as I have had Symantec help me do this one other time, but why is there no information within the job as to what image it is duplicating to the other DD?

John_Deackman
Level 3
Is LCM_dd01-su just a logical name the internal operations of NBU call the SU? I also noticed I am getting a Status 84 (failed media write) on some of these.

RamNagalla
Moderator
Partner    VIP    Certified

LCM_dd01-su actually means the dd01-su storage unit.

The status 84 might be because of the large number of I/O streams.

You will see the image ID once the job starts duplicating.

 

The first thing you need to do is reduce the Max concurrent jobs in the STU, maybe to 40 or 50, for both storage units.

Cancel all Duplication jobs in the Activity Monitor and let them start again.

Then see how they are moving.

----------

PS: If you do not want them to duplicate, you can cancel them using the nbstlutil command.

John_Deackman
Level 3
Waiting to see the results of the changes. Ended up rebooting (after shutting down NetBackup) all of the servers. Just another quick question for you, I do not see a bpjobs.act.db file in the \NetBackup\db\jobs folder. Did this go away in 7.5?

John_Deackman
Level 3

Duplication jobs have finally finished after 3 solid days of running.  Everything is caught up and back to normal.