Catch-up with failed duplications

IanB
Level 4

Can somebody help, please?

Our set-up:  NetBackup 6.5.6 master/media servers, 150-odd clients/policies, D2D2T, images on fibre-connected SAN duplicate to a pair of tape libraries - one local, one remote.  The DSSUs are swept every hour.

Scenario: Our fibre link to remote tape library is being respliced this weekend, so duplications to that library will fail during the hour this job takes.  Duplications to local tape library should not be affected, so parent job will log as successful, and images will in time be cleared from the DSSUs. 

Question:  There won't be any backup activity to worry about, but what is the best way to catch up the missing duplications, to ensure we still have 2 copies?  The main problem seems to be identifying what images were in the parent job's "plan" (only images completed or being processed when the fibre is cut will show in the Details, JobOverview tab).  If I could list these "planned" images easily, I'd create a Bidfile and run bpduplicate.  The only watertight way I can see, is to use bpduplicate with -s option (date/time) set far enough back to be sure to catch the oldest possible "planned" image - then all those images that already have the right number of copies will be skipped, and only the incomplete sets will be filled in.
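As a rough illustration of the date-based approach described above, the following Python sketch works out a `-s` start time by stepping back a safety margin from the moment the fibre is cut, and assembles a `bpduplicate` preview command (`-PD` lists the candidate images without duplicating anything). The cut time, the 48-hour margin and the storage unit name `remote_tape_stu` are hypothetical values chosen for the example, and the `mm/dd/yyyy hh:mm:ss` date format is an assumption that depends on the server's locale.

```python
from datetime import datetime, timedelta


def lookback_start(cut_time: datetime, lookback_hours: int) -> str:
    """Return cut_time minus a safety margin, formatted as mm/dd/yyyy HH:MM:SS."""
    start = cut_time - timedelta(hours=lookback_hours)
    return start.strftime("%m/%d/%Y %H:%M:%S")


def preview_command(cut_time: datetime, lookback_hours: int, dest_storage_unit: str) -> list:
    """Build (but do not run) a bpduplicate preview command line.

    dest_storage_unit is a hypothetical name; substitute the real
    storage unit for the remote tape library.
    """
    return [
        "bpduplicate",
        "-PD",                                        # preview only, sorted by date
        "-s", lookback_start(cut_time, lookback_hours),
        "-dstunit", dest_storage_unit,
    ]


if __name__ == "__main__":
    cut = datetime(2011, 3, 12, 9, 0, 0)              # hypothetical splice time
    print(" ".join(preview_command(cut, 48, "remote_tape_stu")))
```

Running the preview first gives a chance to check the list of images before dropping `-PD` and starting the real duplication.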

But this looks a bit like overkill - is there a better way?  I don't want to waste a lot of valuable "window" time by stopping all activity several hours before the work starts, and restarting everything afterwards - after all, no need to halt the other (local) tape library, which can keep going throughout.

Any suggestions?

TIA

Ian


Mark_Solutions
Level 6
Partner Accredited Certified

How exactly are you doing this?

Is it disk staging, Vault or SLPs in use?

Either way, you should be able to suspend the run during the work period and then allow it to carry on again once done, so that it can catch up.

With Disk Staging, Vault or SLPs, any missed images would be re-run later following a failure without you doing anything, so I just need to know how you control your duplications to know what is the best way for you.

IanB
Level 4

Thanks for your reply.  We're using disk staging, with a 24x7 window, checking each DSSU every hour.  No SLPs or Vault.

Suspend option for running & queued jobs in Admin Console is greyed out - presumably because we don't use Checkpoints.  The "take checkpoints every" option is available for backup policies, but is not ticked.  However, it's the duplications I'm interested in, and I can't see the option on the Storage Unit or Staging Schedule screens.  If I set up checkpoints for backups, will that create checkpoints for duplications as well?  What impact does checkpointing have on throughput?  I don't want to slow things down, as it's hard work getting everything duplicated through the week before the next avalanche!

Thanks

Ian

Mark_Solutions
Level 6
Partner Accredited Certified

Yes - you cannot suspend duplications - what I meant is to just stop the DSSU duplications around the time of the work you will be doing.

Checkpoints are for policies only (I do like to use them as they are handy if a large job fails a long way through) - they will not let you suspend duplications.

Assuming you have pretty good performance, and since you know when the work will be taking place, just change the schedule window on the DSSU from being 24x7 to have, say, a 4-hour gap in it: from 2 hours before the work starts (to make sure what is already queued gets finished) up to an hour after the work should be finished.
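The arithmetic behind that gap can be sketched in a few lines of Python. The work start time below is a hypothetical value; the one-hour duration comes from the original question, and the 2-hour lead and 1-hour tail are the margins suggested in this reply.

```python
from datetime import datetime, timedelta


def duplication_gap(work_start: datetime,
                    work_duration: timedelta,
                    lead: timedelta = timedelta(hours=2),
                    tail: timedelta = timedelta(hours=1)):
    """Return (close_at, reopen_at) for the gap in the DSSU schedule window.

    lead: how long before the work the window closes, so queued jobs can drain.
    tail: safety margin after the work is expected to finish.
    """
    close_at = work_start - lead
    reopen_at = work_start + work_duration + tail
    return close_at, reopen_at


if __name__ == "__main__":
    start = datetime(2011, 3, 12, 9, 0)               # hypothetical splice start
    close_at, reopen_at = duplication_gap(start, timedelta(hours=1))
    print("close window at:", close_at)
    print("reopen window at:", reopen_at)
    print("gap length:", reopen_at - close_at)
```

With a one-hour job this gives the 4-hour gap mentioned above; a longer lead is needed if the duplication queue takes more than 2 hours to drain.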

If any duplications are still running when the work is due to start, then you can just cancel them, and as they have no window they will not re-run.

Once the window opens again after the work is done, they will get re-submitted, so no need to do anything manually.

Hope this helps

IanB
Level 4

Many thanks, Mark.

The reminder about checkpoints is useful, and we should investigate this again - I don't know why we ignored it when we first started with NetBackup 10 years ago.  We do have several long backups now (12hrs+, 2TB data, millions of small files), so it is now probably worth doing every few hours, provided it doesn't slow the job down too much.

As regards duplications, there is often a long queue at weekends, and it can take a day or more for jobs to work their way through.  I can't afford to close the duplication window for a long period while the queue dissipates, so I'll just have to run the bpduplicate job as soon as possible after the fibre is up again (while most images involved in failed duplications are still on disk).  If I leave it too long, many images will be cleared off the DSSUs and the local tape library will be overloaded with the task of acting as source for the re-runs.
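A minimal sketch of the bid-file side of this, assuming the copy count for each image has already been gathered somehow (that step is not shown, and the backup IDs below are made up): it picks out the images with fewer than two copies and writes them one backup ID per line, which is the layout `bpduplicate -Bidfile` expects.

```python
from pathlib import Path


def write_bidfile(copy_counts: dict, path: Path, required_copies: int = 2) -> list:
    """Write backup IDs that have fewer than required_copies, one per line.

    copy_counts maps backup ID -> number of copies that currently exist.
    Returns the sorted list of IDs written.
    """
    missing = sorted(bid for bid, n in copy_counts.items() if n < required_copies)
    path.write_text("".join(f"{bid}\n" for bid in missing))
    return missing


if __name__ == "__main__":
    # Hypothetical data: only clientA's image is short of a second copy.
    counts = {"clientA_1299920400": 1, "clientB_1299924000": 2}
    ids = write_bidfile(counts, Path("catchup_bids.txt"))
    print(f"{len(ids)} image(s) still need a second copy")
```

The resulting file could then be passed to `bpduplicate -Bidfile catchup_bids.txt` along with the destination storage unit for the remote library.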

Thanks for your excellent advice, anyway.

Ian

 

Mark_Solutions
Level 6
Partner Accredited Certified

If an image is on a DSSU it shouldn't disappear until it has been duplicated.

Quickest way to do things would be as follows:

1. As soon as the work is about to begin, set an exclude date on the DSSU schedule.

2. Cancel all running duplications so that the work can be done.

3. As soon as the work has been done, remove the exclude date to let them all kick back in.

If you have cancelled some and then run them manually using bpduplicate, they will still run again when the window next opens - and if you restrict the number of copies allowed, they will just keep failing.

There should be no need to use bpduplicate on images stored in a DSSU.