Backup Exec 2012 (patched to date).
Windows 2003 SBS SP2
Single HP Ultrium 3 internal SAS tape drive.
Last Friday I inserted a brand new tape into the backup rotation. To do this I simply put the new tape in the drive and ran a label job on it. This puts the tape, as you probably already know, into the scratch media pool. On Saturday morning I discovered I had received repeated emails from BENT asking me to remove the media. I connected to the server and noticed that the Friday night backup job was still running, with 0KB written to the tape, and a remove media alert was visible in BENT. When I checked the properties of the brand new tape in the drive, I noticed it had been allocated to one of my backup media pools that overwrite-protect their backup data for a set period of time. It appears BENT had done this to the tape before it ran the backup, thereby preventing itself from overwriting the blank tape (this mishandling of the media pool allocation is a separate issue I am still dealing with on a Symantec support case)!? So, to get the backup job running again, I thought all I had to do was cancel the backup job, reallocate the tape back to the scratch pool and rerun the job. So I ignored the media removal request and cancelled the job, only to find that BENT had ejected the tape anyway!?
Surely this is a fault in the logic of BENT. The tape should not be ejected until after the operator acknowledges the media removal alert. The backup job hasn't been set up to eject the tape on completion, and I didn't acknowledge the media removal request, yet the tape was still ejected, so in this scenario BENT has a bug (faulty logic). As a result, I had to ask my client to go into their office, on a weekend, to push the tape back into the drive. I then moved the tape back into the scratch pool and kicked the job off again, and it completed without further errors.
What should happen when a backup job tries to overwrite a tape that is overwrite protected is as follows:
1. A media removal alert is generated.
2. The operator is given the opportunity to check the backup job's settings and the state of the media in the drive, and to take action to remediate the problem. In my case, this was to cancel the backup job, reallocate the blank tape to the scratch pool and then kick the backup job off again. However, I was precluded from doing this because BENT had already chosen to eject the tape.
3. If my chosen course of action is to swap the media, then, and only then, should Backup Exec eject the tape, when I acknowledge the media removal alert.
As far as I can see, this is a fundamental flaw in the logic of the product that makes it unsuitable for sites that have a single tape backup unit and need backups to be managed unattended.
Any help or suggestions would be appreciated.
Backup Exec will eject a tape if it is either write protected or not appendable. This is by design, and it has behaved this way in virtually all versions. With a tape autoloader/library it would simply check the next slot...with a stand-alone drive, it will eject the tape if it is unusable.
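To make the by-design behaviour concrete, here is a minimal Python sketch of it. It is illustrative only: `Tape`, `can_overwrite`, `handle_unusable_tape`, the `"Scratch Media"` string and the `"Weekly Backups"` set name are hypothetical stand-ins, not Backup Exec's internal logic or API. The overwrite check assumes a tape is usable when it is not write protected and is either in the scratch media set or past the end of its overwrite protection period (OPP).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical name for the scratch media set; not taken from Backup Exec.
SCRATCH_SET = "Scratch Media"


@dataclass
class Tape:
    """Hypothetical model of the tape currently loaded in the drive."""
    label: str
    media_set: str
    write_protected: bool = False
    # When the overwrite protection period (OPP) on the tape's data ends.
    opp_expires: Optional[datetime] = None


def can_overwrite(tape: Tape, now: datetime) -> bool:
    """Assumed rule: overwritable if not write protected and either in the
    scratch set or past the end of its overwrite protection period."""
    if tape.write_protected:
        return False
    if tape.media_set == SCRATCH_SET:
        return True
    return tape.opp_expires is not None and now >= tape.opp_expires


def handle_unusable_tape(has_library: bool) -> str:
    """Current behaviour as described in the thread: a library moves on to
    the next slot, a stand-alone drive ejects the tape and raises an alert."""
    return "try_next_slot" if has_library else "eject_and_alert"


if __name__ == "__main__":
    now = datetime(2012, 6, 2)
    # A freshly labelled tape sits in the scratch set and is overwritable...
    blank = Tape("NEW001", SCRATCH_SET)
    print(can_overwrite(blank, now))  # True
    # ...but once moved into a protected set, the same blank tape is not.
    blank.media_set = "Weekly Backups"
    blank.opp_expires = now + timedelta(days=28)
    print(can_overwrite(blank, now))  # False
    print(handle_unusable_tape(has_library=False))  # eject_and_alert
```

In this model nothing about the tape itself changes between Friday afternoon and Friday night; only its media set does, and on a stand-alone drive that alone is enough to get it ejected.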
So there is no design flaw...I know of other backup vendor products that do this.
Have you tried to add that tape into the specific media set that will run for that day?
BE does not move tapes from the scratch media set to any other media set, unless it has written to it as part of a job.
When a tape cannot be overwritten, it is always ejected and a media insert alert is issued requesting an overwritable tape. This is by design and has been this way forever.
BENT is "Backup Exec for NT", the old product name. When Microsoft changed the OS name, the product was renamed BEWS, "Backup Exec for Windows Servers". I think this happened around BE 8 or 9. Old habits die hard :)
Thanks for your responses.
BENT, sorry... I've been using Backup Exec since the days of Arcada (pre Seagate/Veritas and Symantec) and, ironically, the acronym seems to fit better with 2012 (sorry, couldn't resist that one)!
I am aware of how BEWS, and BENT before it, works with tape media that is overwrite or append protected. My issue is as follows:
1. BEWS detects the media as not being suitable for the type of backup the currently running job performs. This can be caused by operator error, such as allocating a tape to the wrong media pool. Or, in my specific case, it appears to have been the job engine somehow allocating the blank (scratch media pool) tape to a media set that is overwrite protected before it started to write the data for the backup job, thereby locking itself out of using the blank tape.
2. Having detected the media issue, BEWS issues a media removal alert and ejects the tape. A little point about this alert is that it can be acknowledged or cancelled.
Hmm... If there is only one outcome then "Cancel" is pointless and this alert should only be acknowledgeable... Unless the cancel option gives us a chance to remediate the issue with the media and save the backup job (see below)!
3. My issue is that because it has ejected the tape, in an unattended backup scenario the remote operator is not able to fix the incorrect tape media set association until somebody attends the site to push the tape back in. This makes proper unattended operation impossible.
4. This situation would be better handled as follows:
4.1. BEWS detects the media as not being suitable for the current backup type of the job currently running.
4.2. BEWS issues a media removal alert, but does not eject the tape yet.
4.3. The remote operator now has a chance to look at the media properties and ensure the media is either associated with the correct media set or ejected. If the tape is to be ejected, the operator acknowledges the media removal request and the tape is ejected.
4.4. However, if the tape can simply be allocated to the correct media set, the operator can cancel the media removal request and the backup job can continue.
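The proposal in steps 4.1 to 4.4 can be sketched the same way. Again this is a hypothetical Python model, not Symantec code: `AlertResponse`, `proposed_media_alert` and the returned action strings are illustrative names. The key difference from the current behaviour is that raising the alert never ejects the tape; only the operator's acknowledgement does. The proposal does not say what happens if the alert is cancelled without the tape being fixed, so the sketch assumes the job simply keeps waiting with the tape loaded.

```python
from enum import Enum, auto


class AlertResponse(Enum):
    """Hypothetical operator responses to a media removal alert."""
    ACKNOWLEDGE = auto()  # operator agrees the tape should come out
    CANCEL = auto()       # operator wants to keep the tape and retry


def proposed_media_alert(response: AlertResponse, tape_now_usable: bool) -> str:
    """Proposed flow: the alert is raised with the tape still loaded, and the
    tape only leaves the drive if the operator acknowledges the alert."""
    if response is AlertResponse.ACKNOWLEDGE:
        return "eject"
    if tape_now_usable:
        # e.g. the operator moved the tape back to the scratch set remotely
        return "continue_job"
    # Assumption: a cancelled alert on a still-unusable tape leaves the job
    # paused with the tape loaded, rather than ejecting it.
    return "wait_for_operator"


if __name__ == "__main__":
    # The opening-post scenario: tape reassigned to scratch remotely.
    print(proposed_media_alert(AlertResponse.CANCEL, tape_now_usable=True))
    # -> continue_job
```

Under this model the weekend site visit from the opening post would not have been needed: the remote operator cancels the alert after fixing the media set, and the job carries on.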
The key thing here is that all that needs to change is that the tape eject should not be the default behaviour, as there is no benefit to this. Media set protection has done its job and paused the job awaiting input from an operator. If the tape is not ejected, the job can be recovered remotely. All that Symantec developers need to do is move the tape eject from a default action, taken regardless of what's going on, to something that only happens when the operator acknowledges the media removal request. Simple really...
PS: Just because something has always behaved the same way does not mean it is right or cannot be improved upon. If that were the case, we would all still be running around in furs clubbing each other on the head.
...that might be worth posting as an Idea. Although ejecting a tape would be by design...have you checked the settings to see if it's a simple case of deselecting a setting to prevent the tape ejecting?
My issue is not the fact that BEWS media protection pauses the job and alerts the operator. That part of the design is great. The problem is that, regardless of what's going on, the tape is immediately ejected and the opportunity for rescuing the job has been lost.
The design flaw is only in the default behaviour of the tape being ejected. Simply tweak the design to make the job more recoverable, by making the tape eject an operator-controlled event that happens when the media removal alert is acknowledged.
I know of other backup vendors that do this too...
The media was a brand new blank tape that had been labelled in BEWS that afternoon. The default behaviour, as you undoubtedly already know, is for newly labelled media to be allocated to the scratch pool. The scratch pool is exactly where this tape sat until the BEWS job engine decided to stuff it up.
When all you do is label a new tape in BEWS, the BE 2012 interface makes it impossible to allocate the tape to any pool other than the default scratch pool. To do that I would have had to select the "Online Tape" media group and then go to the properties of the media before I could allocate it to another media pool. That is something I would only have done if I was delirious or having a psychotic episode, neither of which was the case at the time. :)
Yep, gathered that much, hence suggesting you add it as an Idea. The first thing you're going to be told is to make sure the tape inserted is available for writing...but if there were a configurable delay before the tape is ejected, it would make more sense...add that in and see what happens.
Yeah... tried that during extensive testing with L1 and L2 engineers.
The current behaviour of BEWS is to have a hissy fit and spit the tape (not the dummy). Whereas changing this from a default behaviour would, as far as I can see, only improve the recoverability of otherwise lost jobs.
Come to think of it, I can't actually think of a scenario where ejecting the tape, by default, is a good idea at all... Can you think of any?
Surely the only times the tape should be ejected as the immediate default behaviour are when the tape is full, or when the job has finished and the job definition specifies it. If the remote operator wants to eject a tape, there are many ways this can be done. If somebody is on site and only has physical access to the tape drive, they can also eject the tape. Why make it the default action and only increase the chances of jobs failing? As long as your media protection is set correctly, you should never need to have the tape ejected as a default action, unless the media fills up or the job has finished, and I do not have eject on completion set for this job.
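The rule argued for here is small enough to state as a single predicate. This is a hypothetical Python sketch of the poster's proposal, not Backup Exec's actual code; `should_auto_eject` is an illustrative name and `eject_on_completion` stands in for the job's eject-at-end option mentioned in the thread. A media set or OPP mismatch is deliberately absent from the conditions.

```python
def should_auto_eject(media_full: bool, job_finished: bool,
                      eject_on_completion: bool) -> bool:
    """Hypothetical rule: eject automatically only when the tape is full, or
    when the job has finished and its definition requests an eject.
    An overwrite/append protection mismatch is not a reason on its own."""
    return media_full or (job_finished and eject_on_completion)


if __name__ == "__main__":
    # Opening-post case: job still running, tape not full, no eject option.
    print(should_auto_eject(False, False, False))  # False
    # Job finished, but eject on completion is not set for this job.
    print(should_auto_eject(False, True, False))   # False
    # Tape full: ejecting automatically is reasonable.
    print(should_auto_eject(True, False, False))   # True
```

Everything else, including a protected tape sitting in the drive, would be left to the operator to decide.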
Note to all of you who are saying this is the way it has always been done: stating this does not help us. I am not disputing how long this has been the behaviour. I am proposing what I think is, at best, a great improvement for the product and, at worst, a fix for a design flaw.
For sure...and I've actually experienced this before. I looked after sites in some real backwater places in Africa. With stand-alone tape drives, when a tape was ejected, it took a site engineer hours to drive to the site to put a tape in, or they had to rely on someone there to put the tape in. Most of the time this never happened, so you ended up with backup success rates of something like 5%. Hence I got the go-ahead to put in HP StorageWorks MSL2024 G3 libraries...
I'd definitely vote this up if it was an Idea...post back with the link when you have done so.
Why make it the default action and only increase the chances of jobs failing?
As long as your media protection is set correctly
Going by the discussions posted in the forum, there are a lot of OPP not set correctly.
Ejecting the tape saves the operator a step. Also, the operator may press the power button instead of the eject button, thus causing more problems.
Thanks Craig. I have other clients with BEWS and autoloaders and, obviously, there the default behaviour makes a lot more sense. However, just about every small/remote/unattended site using BEWS will be experiencing this problem and, in theory, will have done so for multiple revisions of BEWS/BENT. One of my points about this case is that this is a design error, with an effect that is just as real as any software coding error, and therefore it should be handled with a bug fix and not with a feature request. Currently, this behaviour essentially makes BEWS unfit for the purpose for which it was intended (in this case, remote unattended backup sites). Also, considering the screaming issues BE 2012 is currently suffering from, unless Symantec accept this as a fault in their design and issue a bug-fix number, I don't think it's going to get looked at for a good while, if ever.
I've seen plenty of other posts about this issue, and you have also said you have encountered it and were lucky enough to get the autoloaders. However, the resistance I have had from Symantec support regarding this has been significant, and the engineers have persisted with their "this is by design" mantra. So this is going to be one of those problems that most people let go because they hit the "by design" firewall from support. But I will not lie down... Oh no... I'm mad now...
As I have already stated, by making eject the default immediate action you remove ALL opportunities to even try to recover a backup job that has media overwrite/append issues.
Going by the discussions posted in the forum, there are a lot of OPP not set correctly.
I think you are not giving enough credit to the majority of people who use this product correctly and don't screw up their OPP. Also, there is always the opportunity for the operator to work out OPP and avoid issues with it. At the moment, it doesn't matter how well you understand OPP... There is no choice; the tape gets spat out... FULL STOP! Why write the software to cater for the minority of people not using it correctly, when the majority who are using it correctly would benefit from the change/fix I am proposing?
Ejecting the tape saves the operator a step.
Forcing the operator to deal with an ejected tape in a remote backup job is a far bigger problem than perhaps saving them the odd eject. Most of the reasons to eject the tape are covered by other default behaviours and job-specific settings in BEWS, e.g. tapes can be set to eject at the end of a job, as part of the job definition. Also, tapes will automatically eject if they are full. Both are very sensible automatic tape eject scenarios.
There are no other scenarios where you would want the tape to be ejected automatically!!
Not when there's a mismatch between the job and the OPP, that's for sure! Just because the tape hasn't been ejected doesn't mean it's being overwritten. Tape not ejected does not mean the same as tape being written to! Remember... My point is that the media remove alert still has to be acknowledged or cancelled before the tape is ejected or written to.
Also, the operator may press the power button instead of the eject button, thus causing more problems.
Please!!!! The operator may also decide to stick their head in a lit oven. How do you propose Backup Exec prevents this?
None of your reasons so far outweigh the fundamental failures caused by the tape being ejected as the default, immediate action, without any operator action, for backups at unattended sites.
Hehe...well, I'd say support will ALWAYS say that. They're kind of "scripted" with what they can do, but Connect is the place where you yourself can put an idea forward, and with enough votes, get it considered for some future release.
I doubt that Symantec will retrofit this into older products (as in BE 2010 R3 for instance), but going forward an approved idea would make it easier to get this put in.
IIRC, the old DLT class of tape drives could actually "suck" the tape back into the drive when it was partially ejected. Therefore, in the old days, I think that the alert options to acknowledge or cancel the eject actually caused different results.
I am simply throwing this out as an historical aside. In no way am I dismissing your suggestions for improvements with modern hardware.
I have found out how the media in my OP got misallocated in the scheduled backup job. It wasn't the backup job engine; it was this...
If a scheduled backup job has a problem, it is often immediately put on hold by BEWS. This means that the job can easily miss its next scheduled execution time before the operator is able to get to it. This is fair enough...
Once the operator has had a chance to resolve whatever caused the backup job to be put on hold, AFAIK, there is no way to simply tell BEWS to not run the missed job (if it is still within its schedule window).
What you have to do is:
1. Take the job off hold and allow it to start running.
2. Once the job is running, then, and only then, you can cancel the job.
The problem with this is that if you don't cancel the job before it assigns the tape in the drive to your backup media set, you have to remember to go back to the media and assign it to a media set that will allow the next night's scheduled job to run.
Now that I understand what happened, I can adjust my working processes to cater for it. However, could the behaviour of BEWS be enhanced to improve this situation? I haven't yet had time to apply myself to that bit... I'm keener on the bigger and currently unresolvable issue of tape ejections!