Backup to tape post verify fails but new verify job succeeds

Jeremy_Isaac · ‎01-03-2013

We have a BackupExec 2012 media server directly attached to two robotic libraries by SAS. When a backup job spans more than one tape, it usually stays confined to one library and everything stays happy. Sometimes (rarely), a job that spans more than one tape starts in one library and then (for some reason) continues writing to a tape in the other library. BackupExec finishes the backup fine in this case, but during the verify that follows, the job waits for the tape in the second library to be inserted into the first. It sits on the first library and keeps sending emails asking for the tape to be inserted. When this happens, I find it easiest to cancel the job and run a new verify job on that backup. The new verify job succeeds with no issues.

This happens more frequently when the job spans three tapes; it has happened only once with a job that spans two (in that instance, it happened at the same time a three-tape job jumped libraries, so the two may be related). Both libraries have plenty of tapes that can be overwritten, so I can't think of any reason for the job to start using a different library. Aside from that, the post-backup verify stage seems not to like that the job used tapes in more than one library.

Here's how one such example played out:

* Job starts and selects tape 1 from library A.

* Tape 1 fills to capacity and the job selects tape 2 from library A.

* Tape 2 fills to capacity and the job selects tape 3 from library B.

* Job finishes before filling tape 3.

* Verify starts on tape 1 in library A.

* Verify finishes tape 1 and moves on to tape 2 in library A.

* Verify finishes tape 2 and sends an alert that it needs tape 3 to be inserted into library A (even though BackupExec knows that tape 3 is in library B).

* I get annoyed with the email alerts and cancel the backup job.

* I submit a verify job for the canceled job (which, by the way, shows the correct byte count for such a job, roughly 1.7 TB).

* Verify job successfully finds all three tapes and shows the same byte count as the canceled job.
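For anyone trying to picture the difference, here is a minimal Python sketch of the two behaviors described in the steps above. It is only a toy model built from what was observed, not BackupExec's actual logic; the `Tape` class and both function names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tape:
    label: str
    library: str  # library that physically holds the tape


def post_backup_verify(tapes, start_library):
    """Toy model of the observed post-backup verify (assumption, not the
    real implementation): it stays on the library the job started in and
    stalls on the first tape held elsewhere."""
    verified = []
    for tape in tapes:
        if tape.library != start_library:
            alert = f"insert {tape.label} into library {start_library}"
            return verified, alert
        verified.append(tape.label)
    return verified, None


def standalone_verify(tapes):
    """Toy model of the observed standalone verify (assumption): each
    tape is read from whichever library holds it, so nothing stalls."""
    return [tape.label for tape in tapes], None


job_tapes = [Tape("tape 1", "A"), Tape("tape 2", "A"), Tape("tape 3", "B")]
print(post_backup_verify(job_tapes, start_library="A"))
print(standalone_verify(job_tapes))
```

In this model the post-backup verify gets through tapes 1 and 2 and then asks for tape 3 in library A, while the standalone verify reads all three tapes.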

While jumping from library A to library B isn't exactly how I think the backup should run, it's probably not important that this behavior change. What I would really like is for the logic that the new verify job used to find all three tapes to be applied to the post-backup verify as well.

Just to be complete: every backup job to these two libraries is allowed to append first, then overwrite. This has been working fine, for the most part. Sometimes append jobs somehow leave behind an extra appendable tape. This is a separate issue and only a very minor annoyance; from time to time I just end up moving out some extra tapes that still have free space on them. Each library holds 22 tapes, and it has never been so bad that I've had to move out more than two appendable tapes (leaving two behind, because they'll just be appended to anyway).

Gurvinder · ‎01-03-2013

Are the jobs directed only to Library A when one spans to Library B to pick up the 3rd tape?

Jeremy_Isaac · ‎01-03-2013

All of the jobs target the Autoloaders pool, which allows any two jobs to run concurrently against our two libraries. The fact that a job jumps isn't quite as big an issue as the fact that it fails the post-backup verify yet passes a standalone verify job. I could arbitrarily assign a job to a library if I find particular jobs that are more troublesome than others (one comes to mind), but I would rather see this potential bug fixed.

Gurvinder · ‎01-03-2013

If you think it is a bug, you will have to open a tech support ticket and get it investigated. In the meantime, you can try directing the backup to a particular library and see how it goes.

Jeremy_Isaac · ‎03-01-2013

I finished talking with support on this issue and forgot to update this post. The official response was to target one autoloader rather than the autoloaders pool. I would prefer that the 15 or so jobs target the pool so that any two jobs can run concurrently, no matter which job happens to finish when. I would also prefer that any backup job that starts on one library stay on that library until the job finishes. This all works fine if every job uses one or two tapes. As soon as a job requires a third tape, all bets are off. It might jump to another library, but it might not.

I have taken the two most troublesome jobs and targeted a different library for each. The rest of the jobs funnel in as best they can. One additional thing to note: jobs that target one autoloader appear to have a slightly higher priority than jobs that target the autoloaders pool. I guess this makes sense if there are only a few jobs to take care of, but it is kind of unusual given how it was scheduling jobs before.

The upshot: Jobs that take up three tapes may fail to verify if all of these things are true:

* The job targets a pool of libraries

* The job decided to switch libraries on the third tape

* The verify occurs immediately after the backup as part of the same operation
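Put as a quick check, the three conditions above could be written as the Python sketch below. The `JobRun` record and its field names are hypothetical, not anything BackupExec exposes. It flags any mid-job library switch rather than only a switch on the third tape, since one two-tape job hit the same problem.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class JobRun:
    # Hypothetical record of one backup run; field names are illustrative.
    targets_pool: bool          # job targets the pool, not a single library
    tape_libraries: List[str]   # library of each tape, in the order used
    verify_after_backup: bool   # verify runs as part of the backup job


def at_risk_of_verify_stall(run: JobRun) -> bool:
    """Return True when all three conditions from the list above hold."""
    switched_library = len(set(run.tape_libraries)) > 1
    return run.targets_pool and switched_library and run.verify_after_backup


run = JobRun(targets_pool=True,
             tape_libraries=["A", "A", "B"],
             verify_after_backup=True)
print(at_risk_of_verify_stall(run))
```

Targeting a single autoloader makes the first condition false, which matches the workaround support gave.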

I got the sense that somebody might bring this up to the development team, but I guess the official workaround is to target one autoloader.
