Multistream job does not start after resuming from Suspend state

rizwan84tx
Level 6
Certified

Dear all,

Master/media NBU : 6.5.6 | Win 2003 EEx64

A policy with multistreaming enabled doesn't start after I resume it from suspend. Instead, the parent job stays in Queued state and the child jobs show as Suspended. MPX level = 32 is set for all jobs, yet it still cannot write alongside the few other backup jobs (not multistreamed) that are running.

Scenario:

1) We have 4 drives mapped to the media server. I downed 3 drives and started two policies:

             Policy1 : MPX 32 | not multistreamed | 3 jobs (3 clients)

             Policy2:  MPX 32 | multistreamed | 5 jobs (1 client)

2) All 8 jobs started writing on drive D1 (the other 3 drives are in down state).

3) Suspended the Policy 2 parent job and resumed it after a while. The job is unable to get the drive.

             Parent : Queue & Child: Suspend state

4) When I upped another drive, D2, the parent job started writing on drive D1.

 

It's weird that the parent job did not start after resuming, even though only 3 jobs were running on the drive and MPX is set to 32.
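The arithmetic behind that expectation can be sketched as below. This is only an illustrative check, not anything NetBackup itself runs: the function name `fits_in_mpx_group` is hypothetical, and the numbers are the ones from the scenario above.

```python
def fits_in_mpx_group(active_streams: int, resuming_streams: int, mpx_limit: int) -> bool:
    """Return True if the resumed streams would stay within the MPX limit.

    Hypothetical helper for illustration only; it does not reflect how
    nbrb actually evaluates a resource request.
    """
    return active_streams + resuming_streams <= mpx_limit


# Scenario above: 3 Policy1 jobs still writing to D1,
# 5 Policy2 streams trying to resume, MPX = 32.
print(fits_in_mpx_group(active_streams=3, resuming_streams=5, mpx_limit=32))  # True
```

So by the MPX setting alone, 8 streams on D1 should fit with plenty of room to spare, which is why the queued parent job looks wrong.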

 

Any inputs or ideas on this case?

 

37 REPLIES

watsons
Level 6

Just like your config: PolicyA comprises 2 different clients, not multistreamed, with multiplexing = 4.

watsons
Level 6

Here I had 5 jobs running (2 from PolicyA, 3 from PolicyB), at first using just one drive. After the first checkpoint, I suspended the parent job of PolicyB, waited more than 5 minutes, and only then resumed the job.

The 3 suspended jobs did not go active, just as in your case; it reported "No drives are available". I then brought up a 2nd drive and waited about 5 minutes, but it still did not get a drive!! This is where my result differs from yours.

I then suspended PolicyA, and only once I did that did PolicyB pick up a drive.

I am still checking the log files to see how the resource request was processed. Searching the forum, I found that this issue has been reported before:

https://www-secure.symantec.com/connect/forums/unable-resume-suspended-backup-jobs

Unfortunately there is no solution/suggestion yet.

I'll share my log analysis later.

 

rizwan84tx
Level 6
Certified

Yeah!! The logic is the same. NBU needs any one drive free before it will run the resumed job on the original drive. Here comes the question: "So how do we force nbrb to allocate the drive for the job to resume?"

watsons
Level 6

I was mainly checking the bptm & nbrb logs.

Drive count:

- before suspending policyB = 0
- after suspending policyB = 0
- after enabling 2nd drive = 0 (still..)
- after suspending policyA, which released the 1st drive = 1

Only then did policyB get the resource and start writing.

Digging further into nbrb shows that MPX does play a part here.

When everything first started, all the allocations, due to the MPX setting, joined an MPX group for the same resource (drive + media). In my case, there are 3 clients and therefore 3 allocation IDs.

PolicyA:  allocID001 & allocID002

PolicyB:  allocID003

When policyB was suspended, allocID003 was dropped (per the "nbrbutil -dump" output), but maybe the MPX group was not updated to reflect that. So when the job was resumed, it checked the MPX group for an available resource and found none. In my case that explains why policyB got the resource back once I suspended policyA. Again, I don't know why yours behaved differently; you may need to check your nbrb logs.
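To make the difference between the expected and the observed behaviour concrete, here is a toy model in Python. It is purely a sketch under assumptions: the `Drive` class and all three function names are hypothetical, and none of this is NetBackup's actual nbrb logic. It only encodes the rule that seems to match Rizwan's test, namely that a resumed job is granted a resource only when at least one drive is completely free. It does not explain why, in my own test, the drive count stayed at 0 even after the 2nd drive was enabled.

```python
from dataclasses import dataclass, field


@dataclass
class Drive:
    """Toy stand-in for a tape drive (hypothetical, not an NBU object)."""
    name: str
    up: bool = True
    streams: list = field(default_factory=list)  # jobs multiplexed onto this drive


def free_drive_count(drives):
    """Count drives that are up and have no active streams."""
    return sum(1 for d in drives if d.up and not d.streams)


def expected_can_resume(drives, new_streams, mpx_limit):
    """Expected rule: some up drive's MPX group has room for the resumed streams."""
    return any(d.up and len(d.streams) + new_streams <= mpx_limit for d in drives)


def observed_can_resume(drives):
    """Rule that matches the tests (assumption): a completely free drive is required."""
    return free_drive_count(drives) >= 1


# Rizwan's scenario: D1 carries the 3 Policy1 jobs, D2 is down.
d1 = Drive("D1", streams=["p1-job1", "p1-job2", "p1-job3"])
d2 = Drive("D2", up=False)
drives = [d1, d2]

print(expected_can_resume(drives, new_streams=5, mpx_limit=32))  # True
print(observed_can_resume(drives))                               # False

d2.up = True  # bring D2 up
print(observed_can_resume(drives))                               # True
```

Under the expected rule the resume should succeed straight away on D1; under the observed rule it only succeeds once a second, idle drive exists, even though the job then goes back to D1.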

I am not sure whether it's a bug, or whether it is by design that the MPX group isn't updated during job suspension. Personally, I don't like the media multiplexing approach, so I have never needed to deal with this.

Having said that, I am interested in this unique scenario and would like to find out more... did you log a support case anyway?

rizwan84tx
Level 6
Certified

Thanks for the analysis. I did notice that the nbrb allocation ID for Policy 2 was missing after suspending the job, and that it joined the MPX group after I upped the 2nd drive. However, I will test using your method, suspending Policy 1 instead of upping the 2nd drive, and check whether the resource gets allocated.

I haven't opened a case with Symantec yet!! Will do after seeing the test result.

Thanks for your interest in this case and appreciate your help!!!

rizwan84tx
Level 6
Certified

Tested as per your setup, and I got the same result.

- Triggered Policy1 and 2

- Suspended Policy 2 (multistreamed) and resumed after a while --> unable to allocate drives.

nbrb dump after suspending - did not show the allocation for the Policy 2 job ID, and it was not added to the MPX group of the Policy 1 jobs.

nbrb dump after resuming - nbrb was unable to allocate the resource and reported "requested allocation ID".

- Suspended Policy 1 -> Drive was released to policy 2.

 

Will log a case and update!!

 

rizwan84tx
Level 6
Certified

Logged Case :  414-824-624

The Symantec tech has also reproduced the same issue with the above scenario... Hopefully they are analysing it with the backline team to find out whether it's a bug or NBU design for MPX groups.

Yogesh9881
Level 6
Accredited

I am also facing the same issue with an NBU 6.5.2A master server installed on Windows 2k3; the client is a media server with NBU 6.5.2 on Win2k3.

Please let us know if you get any updates from Symantec Support.

Thank you in advance

rizwan84tx
Level 6
Certified

So far I've not received any updates from support... except that they are analysing the logs...

Yogesh9881
Level 6
Accredited

Any updates?

rizwan84tx
Level 6
Certified

So far, the update is that backline are reproducing the same issue... I haven't got any updates from them on the cause. They are not sure whether this is a bug or some other problem.

Symantec tech support is very SLOW!!!

Andy_Welburn
Level 6

that following a tape drive issue & subsequent drive replacement, to enable testing of the replaced drive we downed all other drives & actually encountered this very issue - i.e. the suspended jobs would not restart on the replaced drive which already had active jobs running on it until I upped the other drives, at which point it started to write to the active, replaced drive.

So basically seeing the same thing as yourself - never encountered this before as we must've always had a drive free when restarting suspended, multi-streamed jobs.

((NB6.5.6 Solaris 9 master/media, child jobs were NFS mounts on master))

rizwan84tx
Level 6
Certified

Thank you, Andy, for sharing this information. Let's see what Symantec says about this. The latest update from them is: "Case has been escalated to Engineering team from Backline. And Backline is working with Engineering team on it".

rizwan84tx
Level 6
Certified

Here is the final update from the NBU engineering team on this issue.

This issue has been acknowledged as a defect.

We have received the binary for this issue for NetBackup version 7.0.

This issue will be fixed in the coming NetBackup release, 7.5.

If anyone experiencing this issue has NBU 7.0, log a case and get the binaries.

 

@ Watson,

          Thank you very much for your co-operation on testing this issue.

watsons
Level 6

My pleasure, Rizwan!

Can you share the EEB # of the fix binary? Thanks :)

rizwan84tx
Level 6
Certified

I guess the EEB # is 2424870, because that's what the filename has: eebinstaller.2424870.1.AMD64.exe

rizwan84tx
Level 6
Certified

NBU 7: 2424870.1

NBU 6.5: 2424870.2

Yogesh9881
Level 6
Accredited

Thanks for the update :)