
Netbackup Restore priority problem over duplications

Nithin_S1
Level 3

We are using SYMC NetBackup 7.5.0.4 and EMC Data Domain. Data on the EMC Data Domain is duplicated to the tape library for offsite purposes.

With just 10 drives, only 10 duplication jobs run in parallel and the others remain queued. When we get a restore request during this time, it just sits in the queue and never gets the required priority. Even if I cancel one of the duplication jobs to release a drive, another queued duplication job goes active and the restore stays queued. For some reason the restore is always given lower priority than the queued duplication jobs, which shouldn't be the case.

Any ideas? I tried suspending the SLP (nbstlutil -lifecycle inactive), but I don't want to kill/cancel all the duplications, as that's not a practical solution.

 

Regards

Nithin

2 ACCEPTED SOLUTIONS


RamNagalla
Moderator
Partner    VIP    Certified

Before cancelling the duplication job, reduce the number of concurrent drives in the tape storage unit to 9 or fewer, so that once you cancel, the freed drive will not be assigned to another SLP job and will be available for the restore.

See the NetBackup Administrator's Guide, Volume II, to understand how EMM allocates resources to jobs.
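To illustrate why lowering the storage unit's drive count frees a drive, here is a minimal Python sketch. It is a hypothetical model, not NetBackup's scheduler: `allocate_drives` and its parameters are made up for illustration, and it assumes (as the experience in this thread suggests) that the storage unit's concurrent-drive limit caps the duplication writes but not the restore.

```python
def allocate_drives(total_drives, stu_max_drives, queued_duplications, queued_restores):
    """Return (drives used by duplications, drives used by restores).

    Hypothetical model of the workaround above, not EMM's real logic.
    """
    # Duplications write through the storage unit, so they are capped by the
    # storage unit's concurrent-drive setting as well as by the hardware.
    dup_drives = min(queued_duplications, stu_max_drives, total_drives)
    # Assumption: the restore is not limited by that storage unit setting,
    # so it can use any drive the duplications were not allowed to take.
    restore_drives = min(queued_restores, total_drives - dup_drives)
    return dup_drives, restore_drives


print(allocate_drives(10, 10, 25, 1))  # (10, 0): the restore waits
print(allocate_drives(10, 9, 25, 1))   # (9, 1): one drive is left for the restore
```

With one drive per media server (one drive per storage unit), the same idea means setting that one storage unit to 0 drives while the restore runs.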


Andy_Welburn
Level 6

There is always a little confusion when higher-priority jobs sit in the queue while lower-priority queued jobs appear to get precedence in resource allocation; it's an issue we've covered a few times on these pages.

If resources are already in use (e.g. specific media & therefore tape drives) and there are jobs queued that can utilise those resources, then they will take precedence irrespective of the priority of any other queued job that requires some of those same resources (e.g. tape drive).

NetBackup works this way so that it does not unnecessarily keep loading & unloading media - if queued jobs of a lower priority can utilise the loaded media then they will take precedence over a higher priority queued job that would require different media.

Understanding the Job Priority setting on Windows
http://www.symantec.com/business/support/index?page=content&id=HOWTO34237

The only time I've personally seen a higher priority (restore) job 'jump in', as it were, is when the lower priority jobs required a media change at which point the higher priority job took control of the drive & the lower priority jobs waited until the drive (or another) became free.
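The behaviour described above can be sketched as a simplified Python model. This is only an illustration of the rule as stated here, not EMM's actual algorithm; the `Job` class, the media IDs and the priority values are hypothetical, and it assumes a larger number means higher priority.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: int  # assumption: larger number = higher priority
    media: str     # media ID the job needs in the drive (hypothetical IDs)


def pick_next_job(queue, mounted_media):
    """Choose which queued job gets a drive that has `mounted_media` loaded.

    Simplified model: jobs that can reuse the loaded tape win regardless of
    priority; only when none can does the drive go to the highest-priority
    job, which then needs a media change.
    """
    reusers = [job for job in queue if job.media == mounted_media]
    candidates = reusers or queue
    if not candidates:
        return None
    # max() returns the first of several equal-priority jobs, i.e. queue order.
    return max(candidates, key=lambda job: job.priority)


queue = [
    Job("duplication-11", 10, "TAPE01"),
    Job("restore", 90, "TAPE42"),
    Job("duplication-12", 10, "TAPE01"),
]
print(pick_next_job(queue, "TAPE01").name)  # duplication-11: it reuses TAPE01

# No queued duplication can use TAPE01 (e.g. they need a media change),
# so the higher-priority restore gets the drive.
queue2 = [Job("restore", 90, "TAPE42"), Job("duplication-13", 10, "TAPE07")]
print(pick_next_job(queue2, "TAPE01").name)  # restore
```

This also matches what happens when a duplication is cancelled while its tape still has free space: the next duplication targeting that tape is picked first.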

Do you try & mitigate this by wastefully always having one drive free just in case you get a restore request when all drives are being utilised, or do you deal with it as and when it occurs?


9 REPLIES

Nicolai
Moderator
Partner    VIP   

This is known NetBackup behaviour. Keeping a tape drive writing seems to take priority over a restore request that requires a dismount/mount operation.

The workaround is to create a storage unit with only 9 drives in it; this reserves one tape drive for restores. There is no 1:1 relationship between physical drives and storage units in NetBackup. I've done this myself.

Marianne
Level 6
Partner    VIP    Accredited Certified

Are you saying that you need to restore from the duplicate copy on tape?

What is the retention on the DD? 
One would think that with dedupe, you would be able to keep the backup copy on DD disk for longer, making restores from the DD a non-issue, right?

Restore from physical tape should really only be needed in a DR environment or if you want to restore from old backups (like a year old...).

Nithin_S1
Level 3

Thanks Nagalla, that worked :-)

 

@Nicolai, we have 10 media servers with one drive per media server, so that would be 10 different storage units. The reason is that we back up different zones, such as DMZ/External, and each needs a media server in the same subnet that can communicate with the clients.

My main concern: until I upgraded to NetBackup 7.5.0.4 and started using Data Domain, restores had the highest priority. Previously, as soon as the running job finished, the restore was automatically picked from the queue, which no longer happens.

 

@Marianne, images are kept on the DD for only 2 weeks; if the restore request is for a backup older than that, the data is on tapes that we store offsite for DR reasons. In this case the backup was more than two weeks old, so we recalled the tape and initiated the restore. The restore just kept waiting while only the duplications got priority.

And Nagalla's method worked, but it's definitely a workaround.

 

 

Marianne
Level 6
Partner    VIP    Accredited Certified

Two weeks seems very short for a dedupe device. Data Domain actually recommends 'more than 2 weeks' to ensure good dedupe rates.
If this is a regular requirement (restoring backups older than 2 weeks), you may want to revisit your retention levels, or else have a 'next working day' restore policy for older backups...
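A small Python sketch of the retention point: with images kept on the DD for a fixed period, any restore of an older backup has to come from tape. The function name, the dates and the exact expiry boundary are assumptions for illustration; actual expiry depends on the retention level configured in NetBackup.

```python
from datetime import date, timedelta


def restore_source(backup_date, today, dd_retention_days=14):
    """Where a restore would read from under a simple fixed-retention model."""
    # Assumption: the Data Domain copy still exists while the image is no
    # older than the retention period (the boundary day is approximate).
    if today - backup_date <= timedelta(days=dd_retention_days):
        return "Data Domain"
    # Only the duplicated tape copy remains, possibly stored offsite.
    return "tape"


today = date(2014, 3, 20)  # hypothetical date
print(restore_source(date(2014, 3, 1), today))                        # tape (19 days old)
print(restore_source(date(2014, 3, 1), today, dd_retention_days=30))  # Data Domain
```

Raising the DD retention widens the window in which restores never need a tape drive at all.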
Purchasing another tape drive for restore purposes only may be an option if restores become a daily task.

You may want to upgrade to NBU 7.6, where the entire SLP processing can be scheduled and (if I remember correctly) suspended.

Apologies - I realize that I am not addressing your priority issue - just wanted to give you additional options...

Nithin_S1
Level 3

Thanks Andy!

We don't have a dedicated drive for restores. I would need at least 4 if going with that option, as not all media servers and clients can communicate with each other due to firewall/security reasons.

OK, that clarifies why the restore was not getting priority: we had low-priority duplication jobs that could make use of the already mounted media.

For now, I can adjust the storage unit to use 0 drives during the restore and cancel the duplication job that is using the drive.

 

 

Nithin_S1
Level 3

Thanks Marianne, I was never aware that jobs using already mounted tapes get precedence even when the job has low priority.

That looks to be the case here: many of the active/queued disk-to-tape jobs are set to write to the loaded tape until it's full. When you cancel one, the tape still has plenty of free space, so the jobs that can duplicate to that same tape keep getting prioritized and coming out on top, as per Andy's notes, while the restore stays in the queue.

I temporarily suspended the duplications, changed the SU to use 0 drives, cancelled the active duplication and re-initiated the restore. It has now got the priority.

 

Nicolai
Moderator
Partner    VIP   

You can also deactivate/activate SLP processing:

# nbstlutil inactive -wait -lifecycle name

# nbstlutil active -wait -lifecycle name

http://www.symantec.com/docs/HOWTO43731
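If you do this often, the two commands can be paired so the SLP is always reactivated afterwards. A minimal Python sketch, assuming `nbstlutil` is on the PATH and using the command lines exactly as posted above (check the nbstlutil reference for your version). `slp_inactive` is a hypothetical helper, and `run` is injectable so the sequence can be tested without NetBackup installed.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def slp_inactive(lifecycle, run=subprocess.run):
    """Keep an SLP inactive for the duration of a `with` block."""
    # Deactivate the SLP (command line as quoted in this thread).
    run(["nbstlutil", "inactive", "-wait", "-lifecycle", lifecycle], check=True)
    try:
        yield
    finally:
        # Reactivate even if the work inside the block raised an exception.
        run(["nbstlutil", "active", "-wait", "-lifecycle", lifecycle], check=True)
```

Inside the block you would cancel the active duplication and run the restore; in this thread, deactivating the SLP did not stop the duplication that was already running, so that one still had to be cancelled by hand.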

In 7.6 you can define SLP windows during which SLP operations are allowed to run.