Forum Discussion

The_Director
14 years ago

Error 96 - Pause and notify before cancelling

We don't use scratch tapes because LTO5s are expensive. Just because I have 12 idle drives and 12 streams of small jobs doesn't mean I want to write to a fraction of all 12 tapes when everything could fit on 2 or 3. So I assign the number of tapes I believe the job is going to take, which isn't always right.

My question is this: when NetBackup is finishing up a tape and needs a new one but doesn't have one to grab, why can't it say, "I need a new tape, let me pause the backup for a certain amount of time and alert someone to add a tape to the right pool"? And/or have a policy you can set up to add a tape, or a certain number of tapes, if the job is headed for a 96!

This is the most frustrating thing to me about NetBackup (OK, not really, but right now it is). It seems like a simple fix, and maybe someone has figured out a script to do this. If you have, please share, because I now have to add 6 LTO5 tapes to a monthly backup set of 24 tapes after 2 NDMP jobs (5.5TB and 4.5TB) failed just before they were about to finish. That leaves me with a number of invalid images and unusable space on the current tapes.

  • You could write a script that checks a volume pool and, if there are 1 or fewer available volumes, moves a new tape into the volume pool...
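
A minimal sketch of the check-and-top-up idea from the reply above, in Python. It only shows the decision logic. `count_available` and `move_from_scratch` are hypothetical callables standing in for however your site queries the pool and reassigns media (they are not NetBackup APIs). The threshold of 1 comes from the reply; the `target` of 2 is an assumption.

```python
from typing import Callable


def tapes_to_add(available: int, threshold: int = 1, target: int = 2) -> int:
    """Return how many tapes to move into the pool.

    Tops the pool up to `target` available volumes once the count has
    dropped to `threshold` or below; otherwise returns 0.
    """
    if available < 0 or threshold < 0 or target <= threshold:
        raise ValueError("need available >= 0, threshold >= 0, target > threshold")
    return target - available if available <= threshold else 0


def top_up_pool(
    pool: str,
    count_available: Callable[[str], int],     # hypothetical: available volumes in pool
    move_from_scratch: Callable[[str], None],  # hypothetical: move one tape into pool
    threshold: int = 1,
    target: int = 2,
) -> int:
    """Check `pool` once and move tapes in if it is running low."""
    needed = tapes_to_add(count_available(pool), threshold, target)
    for _ in range(needed):
        move_from_scratch(pool)
    return needed
```

Run from a scheduler, something like this would keep a small reserve in the pool without handing NetBackup the whole scratch pool. The cost is that one tape may sit empty at the end of the cycle.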

  • Enhancement requests can be logged in the Ideas section of Connect.

    The other option is to understand how NBU selects media and how to best use existing features.

    Read up on the 'Maximum number of partially full media' property in Volume Pool properties (NBU Admin Guide I):

    Extract:

    Specifies the number of partially full media to allow in the volume pool for each
    of the unique combinations of the following in that pool:
    ■ Robot
    ■ Drive type
    ■ Retention level
    The default value is zero, which does not limit the number of full media that are
    allowed in the pool.
    NetBackup writes data only to the number of partially full media with a given
    combination of attributes in the volume pool. When the number of partially full
    media is reached for a given combination of attributes, NetBackup queues backup
    jobs until media becomes available. If a volume becomes full, NetBackup assigns
    another volume for use if one is available in the pool or in a scratch pool.

    ..........................
    ..........................
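
One way to read the extract is as a small decision rule, sketched below in Python. This is a toy model built only from the quoted text, not NetBackup's actual media selection algorithm. The `Volume` class, its state names, and the "use"/"queue"/"fail" results are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    media_id: str
    state: str            # "empty", "partial", or "full"
    in_use: bool = False  # True while another job is writing to it


def pick_media(pool, scratch, max_partial):
    """Toy model of the quoted rule, not NetBackup's real algorithm.

    Returns ("use", volume), ("queue", None) or ("fail", None).
    max_partial == 0 means no limit, as in the extract.
    """
    partial = [v for v in pool if v.state == "partial"]
    # Prefer a partially full volume that no other job is writing to.
    for v in partial:
        if not v.in_use:
            return ("use", v)
    # Limit reached: the job waits until a partial volume frees up or fills.
    if max_partial and len(partial) >= max_partial:
        return ("queue", None)
    # Otherwise assign a fresh volume from the pool, then from scratch.
    for v in list(pool) + list(scratch):
        if v.state == "empty" and not v.in_use:
            return ("use", v)
    return ("fail", None)  # nothing left to assign: the status 96 case
```

In this model, a limit of 2 with 12 drives means most jobs end up queued behind the two partial tapes, which is the trade-off discussed in the replies that follow.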

  • I looked into this, but my understanding is that I would have to set this number to at least the number of drives available in my storage unit. I would like to be able to set this to 1 or 2 and have it apply only to selecting scratch media, not to selecting media that is already in the volume pool I am writing to. Then I could have backups writing to all the tapes I assign to the pool and only grab a new tape if it gets down to 1 or 2 partially full tapes.

  • Maybe there could be a script that acts the same as diskfull_notify.sh, but for tapes. If a 96 is going to occur, pause the backup and notify someone.

     

    # diskfull_notify.sh
    #
    # this script is called by NetBackup when a disk full condition occurs.
    # The following events are currently supported:
    #   1. Disk full is encountered when writing a backup image to a
    #      disk storage unit.  The parameters in this case are:
    #       program name (bpdm or bptm)
    #       file being written 
    #
    #      The file being written is still open()'ed by the active bpdm or bptm.
    #
    #      Default action is to delay 0 minutes and then bpdm or bptm will
    #      retry the write.  This script may be modified for other desired
    #      actions, such as removing other files in the affected directory
    #      or filesystem.
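
The "delay, then retry" behaviour described in that script header could be mirrored for tapes roughly as follows. This is a sketch of the control flow the poster is asking for, not an existing NetBackup hook. `media_available` and `notify` are hypothetical callables, and the default delay and attempt count are arbitrary.

```python
import time
from typing import Callable


def wait_for_media(
    media_available: Callable[[], bool],  # hypothetical check of the volume pool
    notify: Callable[[str], None],        # hypothetical alert, e.g. send an email
    delay_seconds: float = 300.0,
    max_attempts: int = 12,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Alert once, then poll until a tape appears or the attempts run out.

    Returns True if the write can be retried, False if the job should
    give up (which is where a status 96 would be reported today).
    """
    if media_available():
        return True
    notify("Backup paused: no media available in the volume pool")
    for _ in range(max_attempts):
        sleep(delay_seconds)
        if media_available():
            return True
    return False
```

Passing `sleep` as a parameter keeps the function testable without real delays.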

  • You do NOT need to set it to the number of drives in your storage unit.

    You can have a storage unit of 10 drives and set this to 2.

    Another easy option is to set your storage unit to 2 tape drives and have your policy use that storage unit. Then, no matter what, the policy can ONLY write to 2 tape drives at a time.

    A storage unit is not tied to specific tape drives, just to a number of tape drives. So if you have 10 tape drives and you create a storage unit of 2 tape drives, the jobs will get 2 of the 10 to back up to. If one of the tape drives goes down, it will get another one, but still only a max of 2 drives.

    Say you have 5 policies and each has its own volume pool (not the best way, but some people do this). You could create 5 different storage units of 2 drives each and give each policy a different storage unit, so each policy backs up to 2 drives and can only use 2 tapes from its volume pool at a time.

    There are so many ways to limit the number of tapes that NetBackup will use that you do not need this special script.

    And if you limit the policy to 2 tape drives and the pool runs out of tapes, it can get one from the scratch pool, and you don't have to do this micromanagement every day.

  • Sorry I didn't explain the situation fully.

    I think I understand the option correctly: if I have a storage unit with 10 drives and I set the option to 2, it will only use 2 tapes at a time.

     

    My problem is this:

    I have 12 drives and need to use all 12 to meet my backup window. I assigned 23 LTO5 tapes to the pool (the number of tapes used in the previous week's set). Everything works great until backups for a certain client grow by more than a full tape in one week. That resulted in two 96s at the end of a 5.5TB and a 4.5TB NDMP backup. Once this happens, I have a number of invalid images on tapes that don't get cleaned up, and I have to add 7ish tapes so the jobs can restart from the beginning and complete.

    I would like to be able to write to all of the tapes assigned to the volume pool using all 12 drives. Once all the tapes in the volume pool are full except 1 or 2, the policy should add a scratch tape to the pool for use. This way I use all assigned tapes fully and only 1 tape ends up partially full.
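
The "7ish tapes" figure is consistent with a rough worst-case estimate, sketched below in Python. The assumptions are mine, not the poster's: 1.5 TB native capacity per LTO-5 tape, no compression, decimal units (1 TB = 1000 GB), and each restarted job beginning on its own empty tape.

```python
LTO5_NATIVE_GB = 1500  # LTO-5 native (uncompressed) capacity


def tapes_for_job(job_gb: int, tape_gb: int = LTO5_NATIVE_GB) -> int:
    """Whole tapes needed to hold one job, rounding up."""
    if job_gb < 0 or tape_gb <= 0:
        raise ValueError("job_gb must be >= 0 and tape_gb must be > 0")
    return -(-job_gb // tape_gb)  # ceiling division on integers


def tapes_for_restart(job_sizes_gb, tape_gb: int = LTO5_NATIVE_GB) -> int:
    """Worst case: every restarted job starts on its own empty tape."""
    return sum(tapes_for_job(size, tape_gb) for size in job_sizes_gb)


# The two failed NDMP jobs from the post: 5.5 TB and 4.5 TB.
print(tapes_for_restart([5500, 4500]))  # 4 + 3 = 7
```

Real usage will differ with compression and with how jobs share tapes, so treat the result as an upper-bound planning number.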

  • I hadn't thought about this, but I'm going to do it as a workaround for sure. I will always have one empty tape, but that's better than nothing. I think I'll have to tweak the selection of "usage" or "available tapes" a little, since I've seen NetBackup give me a 96 even when there are available tapes. I assumed it was because the tape that showed as available was preallocated for another backup job that was writing to it.

    Thanks for the help!

  • The problem is that NetBackup cannot predict your data size, especially, as you said, a sudden surge in data size on one of the clients.

    The diskfull_notify script cannot do much either, other than notify the admin of the incident. The retries may still fail eventually unless the admin can act fast enough to resolve the disk-full condition. If you are looking for notification, bpend_notify would do that, but you will still get error 96 before reaching that point.

    I haven't seen anyone implement an automated script to manage this kind of "if current tapes are almost used up, add more into the scratch pool" logic. How do you predict it? There has to be another process to monitor this, which I think is too much of a hassle. NetBackup is designed to work best with a scratch pool...

    If I were in this situation, I would probably start using checkpoints in those policies. Then, if I have "available_media" output from a cron job keeping me updated on tape usage, maybe I can suspend the job before it gets error 96. Once more tapes are replenished, the job can be resumed and wouldn't need to start from zero again.
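
The suspend-then-resume idea could be driven by a small planner like the one sketched below in Python. It assumes the tape count has already been obtained somehow (for example, from the "available_media" output mentioned above, whose format is not shown in this thread and is not parsed here). The job states, the `min_tapes` cutoff, and the action names are hypothetical, and carrying out the actions is left to the caller.

```python
def plan_actions(jobs, available_tapes, min_tapes=1):
    """Decide which checkpointed jobs to suspend or resume.

    `jobs` maps a job id to "active" or "suspended". Returns a list of
    (action, job_id) tuples; nothing is executed here.
    """
    actions = []
    low = available_tapes < min_tapes
    for job_id, state in sorted(jobs.items()):
        if low and state == "active":
            actions.append(("suspend", job_id))
        elif not low and state == "suspended":
            actions.append(("resume", job_id))
    return actions


# With no tapes left, the running job is suspended rather than left to fail.
print(plan_actions({101: "active", 102: "suspended"}, available_tapes=0))
```

Suspending every active job when the pool runs low is deliberately crude. A real monitor would probably look at which jobs are close to needing new media.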

    Or we could split the backup selection and use multi-streaming, just to make sure we don't have to back up all the data again if error 96 happens.

  • I don't expect NetBackup to predict my data size (although now that you say that...haha). Instead, I would be more than happy with a setting that lets me choose when to assign scratch tapes to a pool, or with being able to use the maximum partially full media option separately from my storage unit settings.

    Basically, I would like to write to a predefined number of tapes (using storage unit settings) and then, if I need tapes beyond that (based on the number of partially full media), have it assign scratch tapes based on my settings (i.e. just enough to bring it back up to the acceptable number of partially full media).