
Multistream job does not start after resuming from Suspend state

rizwan84tx
Level 6
Certified

Dear all,

Master/media server: NBU 6.5.6 | Windows 2003 EE x64

A policy with multistreaming enabled doesn't start after I resume it from suspend. Instead, the parent job stays in the Queued state and the child jobs show as Suspended. MPX level = 32 is set for all jobs, yet the job is still unable to write alongside the few other (non-multistreamed) backup jobs that are running.

Scenario:

1) We have 4 drives mapped to the media server. I downed 3 drives and started two policies:

             Policy1: MPX 32 | not multistreamed | 3 jobs (3 clients)

             Policy2: MPX 32 | multistreamed | 5 jobs (1 client)

2) All 8 jobs started writing on drive D1 (the other 3 drives are in the down state).

3) I suspended the Policy2 parent job and resumed it after a while. The job is unable to get the drive.

             Parent: Queued & child: Suspended state

4) When I UPed the other drive D2, the parent job started writing on drive D1.

 

It's weird that the parent job did not start after resuming, though there were only 3 jobs running on the drive and MPX is set to 32.

 

Any input or ideas on this case?
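The expectation in the post above comes down to simple arithmetic, sketched here in Python. This is a simplified model, not how the resource broker actually decides allocations. It assumes the effective multiplexing limit for a drive is the lower of the storage unit's maximum streams per drive and the schedule's media multiplexing value; the thread only states "MPX 32", so both are taken as 32. The function names are hypothetical.

```python
def effective_mpx(storage_unit_max_streams: int, schedule_mpx: int) -> int:
    """Effective multiplexing limit: the lower of the two settings."""
    return min(storage_unit_max_streams, schedule_mpx)


def streams_fit(active_streams: int, resuming_streams: int, mpx_limit: int) -> bool:
    """True if the resuming streams fit on the drive alongside the active ones."""
    return active_streams + resuming_streams <= mpx_limit


# Assumed values: both settings at 32, as implied by "MPX 32" in the post.
limit = effective_mpx(storage_unit_max_streams=32, schedule_mpx=32)

# 3 Policy1 jobs are still writing to D1; 5 Policy2 streams want to resume.
print(streams_fit(active_streams=3, resuming_streams=5, mpx_limit=limit))  # True
```

By this count the five resumed streams should fit on D1, which is why the queued parent job looks like a resource allocation problem rather than a multiplexing limit.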

 

1 ACCEPTED SOLUTION


rizwan84tx
Level 6
Certified

NBU 7: 2424870.1

NBU 6.5: 2424870.2


37 REPLIES

Andy_Welburn
Level 6

i.e. what resources are not available as far as NB is concerned?

rizwan84tx
Level 6
Certified

It indicates that the storage unit is unable to allocate drives for the job.

I have attached the nbrbutil dump taken while the job was queued.

rizwan84tx
Level 6
Certified

Andy / Anyone,

Any clue or idea why this is happening?

Andy_Welburn
Level 6

then are they mounted with media inappropriate for the resumed jobs?

I have a feeling you're going to say "No. The loaded tape(s) are in the correct pool, retention level etc etc" & the drives do allow multiplexing etc etc .... if that is the case I haven't a clue.

rizwan84tx
Level 6
Certified

The loaded tape in the drive is in the correct pool and retention level, and the drive is set to multiplexing. The resumed job only goes active on the drive with the loaded tape when a different drive is available.

Andy_Welburn
Level 6

when a different drive is available."

Huh? So it waits for something it doesn't need to become available before it uses something it does need that's always been available? Where's the logic there?

rizwan84tx
Level 6
Certified

Correct, that's the problem... It waits for another drive to be available (with no jobs), but never uses it; rather, it goes active on the drive it was writing to before the suspend.

MariusD
Level 6

I can't restart some suspended jobs. I suspended them for 1 hour and now I want to start the jobs again.

Are there any commands to force-start the suspended jobs?

Or what should I do?

Regards,

Marius

Andy_Welburn
Level 6

i.e. jobs are not resuming until a second drive is made available & then the resumed job uses the other tape drive?

Or is it that you just need to know how to resume a job? If the latter it should've really been a separate discussion, but....

From the GUI's Help:

To resume a suspended or an incomplete job

1. Open the Activity Monitor and select the Jobs tab.
2. Select the suspended or the incomplete job you want to resume.
3. Select Actions > Resume Job. All selected jobs are resumed.

Note: Only the backups and the restores that contain checkpoints can be suspended.

[[Or right-click the job(s) and select "Resume Job".]]
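For a command-line equivalent of the GUI steps above, here is a minimal Python sketch. It assumes the `bpdbjobs` command and its `-resume` option are available in the NetBackup admincmd directory on the master server. The install path and the job ID are placeholders, not values from this thread. It presumably issues the same resume request as the GUI, so it should not be expected to force a job past resource allocation.

```python
import subprocess
from pathlib import PureWindowsPath

# Assumed install location; adjust to your environment.
ADMINCMD = PureWindowsPath(r"C:\Program Files\Veritas\NetBackup\bin\admincmd")


def resume_command(job_ids: list[int]) -> list[str]:
    """Build the bpdbjobs call that resumes the given suspended job IDs."""
    if not job_ids:
        raise ValueError("at least one job ID is required")
    id_list = ",".join(str(job_id) for job_id in job_ids)
    return [str(ADMINCMD / "bpdbjobs"), "-resume", id_list]


def resume_jobs(job_ids: list[int], dry_run: bool = True) -> list[str]:
    """Print the command; run it only when dry_run is False."""
    cmd = resume_command(job_ids)
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd


# 12345 is a placeholder job ID; this call is a dry run and only prints.
resume_jobs([12345])
```

Take the real job ID from the Activity Monitor and pass `dry_run=False` once the printed command looks right.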

rizwan84tx
Level 6
Certified

MariusD,

Are your jobs multistreamed and experiencing the same scenario as mentioned in my post?

watsons
Level 6

Since you used job suspend and resume, I believe the policy (non-database agent) is checkpoint-enabled. What is the checkpoint interval?

How long after the job started did you suspend it, and how long after that did you resume it?

How many times did you test? Is it always the case that the job only starts taking the 1st drive again once you have up'ed the 2nd drive?

I'd like to test it on my box to see if I get the same problem...  :)

Marianne
Level 6
Partner    VIP    Accredited Certified

Something else that might help to troubleshoot is nbrb resource allocation. It takes a while before nbrb releases resources, and it only checks periodically for available resources.

Sometimes (for some unknown reason) resources remain allocated. This can be seen by checking the output of 'nbrbutil -dump'. MDS allocations are listed at the bottom of the output. These 'orphaned' allocations can be released with 'nbrbutil -releaseMDS <mdsAllocationKey>'.
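The two commands mentioned above can be wrapped in a small Python sketch so that the dump is saved for review before anything is released. The install path is an assumption, and the helper names are hypothetical. Only `nbrbutil -dump` and `nbrbutil -releaseMDS <key>` come from the post, and the layout of the dump output is not assumed here, so the key has to be read from the saved dump by hand.

```python
import subprocess
from pathlib import PureWindowsPath

# Assumed install location; adjust to your environment.
NBRBUTIL = str(
    PureWindowsPath(r"C:\Program Files\Veritas\NetBackup\bin\admincmd") / "nbrbutil"
)


def dump_command() -> list[str]:
    """Command that dumps the resource broker's current allocations."""
    return [NBRBUTIL, "-dump"]


def release_mds_command(mds_allocation_key: str) -> list[str]:
    """Command that releases one MDS allocation by its key."""
    key = mds_allocation_key.strip()
    if not key:
        raise ValueError("an MDS allocation key is required")
    return [NBRBUTIL, "-releaseMDS", key]


def save_dump(path: str) -> None:
    """Run 'nbrbutil -dump' and save the output for review."""
    result = subprocess.run(dump_command(), capture_output=True, text=True, check=True)
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(result.stdout)


# Print the release command instead of running it; the key is a placeholder.
print(" ".join(release_mds_command("<mdsAllocationKey>")))
```

Review the saved dump first and release only allocations that no active job owns.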

rizwan84tx
Level 6
Certified

• The checkpoint interval is 15 minutes.
• This issue was observed when I noticed that the suspended (multistreamed) jobs did not take the drive after resuming. Hence I tested it.
• I tested a couple of times: I suspended the job after 20 minutes of writing, then resumed it after 2-3 minutes.
• In a real backup run, the suspended multistream job stays queued until some other drive (other than the one it wrote to earlier) is completely free.
• The job resumes on the same original drive as soon as it sees any one of the other drives free.

I would appreciate it if you shared your test results. In your test, make sure that you have 2 policies, one multistreamed and the other not. Start both, then suspend only the multistream parent and resume it, with only one drive UP.

rizwan84tx
Level 6
Certified

Marianne,

MDS allocations in EMM: the dump shows allocations for the other (non-multistreamed) jobs running on the drive. I'm afraid that if I release the MDS allocation, the active jobs allocated to the media will get cancelled.

rizwan84tx
Level 6
Certified

Any clue or update? :|

MariusD
Level 6

:)) Thanks... I know how to resume a job. The problem is that when I try to resume a suspended job, the job does not start.

The policy is set with checkpoints every 30 min and "allow multiple data streams".

The job had been running for 6 hours when I suspended it. After 30 min I tried to resume it, but it did not start. The drives were empty.

I stopped this job and started it again.

MariusD
Level 6

Yeah :)

watsons
Level 6

Rizwan, I am still running the test...

I didn't follow your config exactly due to environment differences.

Anyway, I got 5 jobs running (2 from PolicyA, 3 from PolicyB) at first, using just one drive. Then, after the first checkpoint, I suspended PolicyA (just to test) and waited for more than an hour. Only then did I resume the job, and PolicyA continued as normal. I know this is different from your scenario, but I just wanted to make sure I can resume it.

I will test suspending PolicyB (like what you did) once I have a chance.

rizwan84tx
Level 6
Certified

Is PolicyA running multistreamed jobs from a client?