04-26-2013 06:46 AM
Hi everyone
I've been coming here for a while now for solutions, and I've finally found a situation that doesn't seem to match any others, so here goes:
1 x IBM Tape Library
1 x Netbackup Master 7.1 running on Windows 2008 R2
What happens is this:
Every day we remove/add tapes - same process, around the same time. Lately, whenever we run an inventory to insert the new tapes, the slots NetBackup chooses for them are the slots assigned to tapes that are currently in drives being written. It's as if NetBackup ignores those tapes just because they are in drives right now rather than sitting idle in their slots.
Tried:
Deleting all drives (and the robot) and re-adding them step by step in NetBackup
Updating the IBM library firmware, for both the library itself and the drives
Billions of inventories on both sides (OK, maybe just millions)
The result is the same:
I eject tapes A, B, C and D.
I insert tapes E, F, G and H.
Tapes E, F, G and H will be assigned to slots 1, 2, 3 and 4.
But wait... those slots are assigned to tapes I, J, K and L, and those tapes are being used by the drives... what gives?
Result: when those tapes are done and unloaded from the drives, NetBackup suddenly remembers that they exist and tries to put them back in their previously assigned slots. This is what happens:
TLD(0) cannot dismount drive 2, slot 164 already is full
Operator/EMM server has DOWN'ed drive IBM.DRIVE02 (device 2)
Along with a few of these sprinkled around:
TLD(0) expected barcode (XXXXXX) in slot 164, found barcode (YYYYYY)
So, the result is a downed drive with a tape stuck in it (it has to be ejected using the IBM library interface).
Any ideas whatsoever?
Thanks for reading!
10-28-2013 12:18 PM
Ladies and gentlemen, it was the IBM TS3310 (3576) firmware (which IBM had installed to support the newer drives).
If anyone else googling this issue finds this thread, upgrade to the TS3310 (3576) 640G.GS007 library firmware, as well as the LTO5.D8D4 drive firmware.
Problem solved, thanks for the NBU help!
04-27-2013 04:14 AM
It looks like your master server is a victim of media labeling.
Are the tapes you insert new tapes with barcode labels,
or old, reused tapes?
How are you managing barcode rules in your master server environment?
Show us the detailed error messages you are receiving.
04-27-2013 05:09 AM
Please tell us exactly how you are ejecting and inserting media in the robot.
What is the model name/number of your IBM robot?
Does your robot have a MAP (media access port)?
If you are using the eject facility in NBU, both the robot and NBU will be updated.
If you are inserting the new tapes via the MAP and using the NBU inventory option 'Empty MAP before updating', then NBU sends a message to the robot to move the tapes into empty slots.
It is not NBU that decides where the tapes must go - it is the robot that makes this decision.
If you are using the above method to eject and insert media and tapes are going into the wrong slots, there is something wrong with your robot. Log a call with your hardware vendor.
I have only ever seen these kinds of errors where operators were opening the robot door to insert tapes manually.
04-27-2013 02:13 PM
I'm going to guess this is an IBM 3584 series library...
I have seen similar issues with this library many, many times...
The bad news is that NetBackup is not selecting the slots to put the tapes in; the library is doing that. This is almost certainly nothing to do with NBU, and nothing can be done in NBU to resolve it.
From your wording :
"Lately, whenever we run an inventory to insert the new tapes, the slots chosen by netbackup to insert them"
I think you are adding the tapes into the MAP/CAP.
If ...
1. You are adding the tapes to the MAP
2. You have power-cycled the library and then run an inventory in NBU
and after this the issue still remains...
There is not much else you can do, apart from calling IBM.
There are some settings on the IBM libraries that can cause havoc with NBU (or any other backup software), but I've personally only seen these config issues make slots invisible.
This TN:
http://www.symantec.com/docs/TECH169477
explains that library inventory issues are outside of NBU. I appreciate that the TN does not cover your exact issue; you will have to trust me on this.
The way it works is like this, despite what anyone else may say.
When running an inventory, NBU sends SCSI commands to the library to start the inventory/empty the CAP.
What happens after that, is decided only by the library.
What is seen in NBU (eg where the tapes end up) is only sent back from the library, NBU has no control over this at all.
Hope this helps,
Martin
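For readers who want to see what "only sent back from the library" means at the protocol level: a SCSI media changer is typically asked where every tape is with the READ ELEMENT STATUS command from the SCSI Media Changer (SMC) standard, and its answer (slot, drive and MAP contents, with barcodes when the VOLTAG bit is set) is what the backup application displays. The Python sketch below only builds the 12-byte command descriptor block according to my reading of the SMC-3 layout; it does not talk to a device, it is not NetBackup code, and the starting element address in the example is made up, since real element addresses are vendor-specific.

```python
import struct

# Element type codes from the SCSI Media Changer (SMC) standard.
ELEMENT_ALL = 0x0
ELEMENT_TRANSPORT = 0x1      # robotic arm
ELEMENT_STORAGE = 0x2        # storage slots
ELEMENT_IMPORT_EXPORT = 0x3  # MAP / CAP / I/O station
ELEMENT_DATA_TRANSFER = 0x4  # tape drives

READ_ELEMENT_STATUS = 0xB8   # SMC operation code


def read_element_status_cdb(element_type, start_address, num_elements,
                            alloc_len, voltag=True):
    """Build (but do not send) a 12-byte READ ELEMENT STATUS CDB."""
    if not 0 <= element_type <= 0xF:
        raise ValueError("element type code must fit in 4 bits")
    if not 0 <= alloc_len < (1 << 24):
        raise ValueError("allocation length is a 24-bit field")
    # Byte 1: bit 4 = VOLTAG (ask for barcodes), bits 3-0 = element type code.
    byte1 = (0x10 if voltag else 0x00) | element_type
    return (
        # Bytes 0-6: opcode, byte 1, starting address, element count,
        # byte 6 flags (CURDATA/DVCID left at 0).
        struct.pack(">BBHHB", READ_ELEMENT_STATUS, byte1,
                    start_address, num_elements, 0)
        + alloc_len.to_bytes(3, "big")  # bytes 7-9: allocation length
        + bytes([0, 0])                 # byte 10 reserved, byte 11 control
    )


if __name__ == "__main__":
    # Hypothetical request: status of 164 storage slots starting at 0x1000.
    cdb = read_element_status_cdb(ELEMENT_STORAGE, 0x1000, 164, 0xFFFF)
    print(cdb.hex())  # b812100000a40000ffff0000
```

If the library firmware returns stale or wrong element status in reply to a request like this, the application can only show that wrong picture, which is the situation described in the post above.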
04-29-2013 06:53 AM
Sounds about right. I think your library isn't doing its audit correctly, or you have tape labels that don't match the internal tape label. Call IBM.
04-30-2013 11:40 AM
Great input, thanks guys. I'm reading up on some IBM issues (it's a TS3310 tape library), but what has me baffled is my way of fixing it temporarily...
1) When I run an inventory in NetBackup after removing/adding tapes, I notice that the proposed inventory changes conflict with each other:
Logically move media ID AAAAAA from slot 111 to standalone residence
Logically move media ID BBBBBB from slot 222 to standalone residence
Logically move media ID CCCCCC from standalone to slot 333
Logically move media ID DDDDDD from standalone to slot 111
Logically move media ID EEEEEE from standalone to slot 222
So the first two lines propose moving tapes from their slots to outside the library, and the next three lines are the new tapes I've added. The problem is that the 2 first tapes, A and B, are not in those slots, but are in fact in drives, currently being written. Netbackup ignores them and decides to assign their reserved slots (empty yes, but assigned to them) to new tapes. When those jobs finish, the tapes are unloaded, but their slots are now assigned to the new tapes, so NetBackup downs the drive and leaves the tape in it. I have to select the tape manually in the IBM interface, remove it, and then assign it to a whole new slot.
A bit more about my choice of words:
New Tape = a tape that was previously ejected and, 99.9% of the time, sent to an off-site storage facility for a month. Sorry, I didn't mean an actual new tape with a new label; these are tapes NetBackup already knows and has already labelled.
Barcode rules are managed by NetBackup through our policies; that doesn't seem to be the issue.
We always select "Empty MAP" when running the inventory. We eject tapes manually and send them out of the building, then insert tapes manually, and we run an inventory every time...
If it is in fact the IBM robot that chooses which slots to put them in, then again, wow. I'll have to run some tests later today/tomorrow and see how it behaves first-hand.
Thanks again for the replies, I will get back with more info soon!
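The conflict described in this post (tapes that are really in drives being moved "to standalone", and their reserved slots handed to returning tapes) can be checked mechanically before accepting an inventory. Below is a minimal Python sketch, assuming you have already collected which media IDs are mounted in drives and which slot each one came from; `Move`, `find_conflicts`, `media_in_drives` and `home_slots` are hypothetical names, and this is not an NBU utility. The data is the example inventory proposal from this post.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Move:
    media_id: str
    src_slot: Optional[int]  # None = standalone (outside the library)
    dst_slot: Optional[int]  # None = standalone (outside the library)


def find_conflicts(moves, media_in_drives, home_slots):
    """Flag proposed moves that contradict what is really mounted in drives."""
    # Slots that look empty only because their home tape is in a drive.
    reserved = {home_slots[m]: m for m in media_in_drives if m in home_slots}
    problems = []
    for mv in moves:
        if mv.dst_slot is None and mv.media_id in media_in_drives:
            problems.append(f"{mv.media_id} is in a drive, not outside the library")
        elif mv.dst_slot in reserved:
            problems.append(
                f"slot {mv.dst_slot} still belongs to {reserved[mv.dst_slot]} "
                f"(in a drive); {mv.media_id} should not go there")
    return problems


# The proposed changes quoted in this post.
PROPOSED = [
    Move("AAAAAA", 111, None),
    Move("BBBBBB", 222, None),
    Move("CCCCCC", None, 333),
    Move("DDDDDD", None, 111),
    Move("EEEEEE", None, 222),
]

if __name__ == "__main__":
    for p in find_conflicts(PROPOSED, {"AAAAAA", "BBBBBB"},
                            {"AAAAAA": 111, "BBBBBB": 222}):
        print(p)
```

Only CCCCCC's move to slot 333 passes; the other four lines are exactly the contradictions described above.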
04-30-2013 12:10 PM
>>The problem is that the 2 first tapes, A and B, are not in those slots, but are in fact in drives, currently being written. Netbackup ignores them and decides to assign their reserved slots (empty yes, but assigned to them) to new tapes.
As Martin states above "What is seen in NBU (eg where the tapes end up) is only sent back from the library, NBU has no control over this at all."
You will need to engage IBM support. The library assigns slots for the media, not NetBackup. The library seems to be forgetting that the media in the drives will return to their respective slots.
05-02-2013 02:52 PM
"If it is in fact the IBM robot that chooses which slots to put them in, then again, wow. I'll have to run some tests later today/tomorrow and see how it behaves first-hand."
It is, I promise you ...
Despite 'very' popular opinion, including that of certain hardware vendors, NetBackup actually has very, very little to do with tape drives/robots.
For a start, it does not write to or read from tape; the actual write/read is carried out by the OS.
For inventories and for loading and unloading drives, NBU sends simple SCSI commands to the library; what happens after the SCSI command is sent is 100% out of NBU's control.
Sure, if the NBU config is wrong this will cause issues, but the vast majority of tape and library issues are outside of NBU, and anything related to TAPE_ALERT or ASC/ASCQ errors is 100% outside of NBU (apart from perhaps the cleaning tape alert, if NBU is meant to be doing the cleaning but has an incorrect config for the cleaning tapes).
M
05-03-2013 04:39 AM
Just wanted to double-check how you load tapes (though it does sound like the infamous IBM library issue).
Do you only put tapes into the mail slot (load port, or whatever you want to call it), or do the operators open the doors/magazines and insert tapes into empty slots?
I have had a lot of customers who loaded tapes into empty magazine slots without realizing that the empty slot belonged to a tape in a drive; that tape then cannot be unloaded, because its slot is taken.
Just wanted to be sure of your process when loading tapes.
06-26-2013 07:01 AM
Actually, we load tapes through the I/O door; the arm then moves them into a free slot (well, free as in empty, but it looks like it's choosing the wrong slots).
Just an update: we're now using NetBackup Vault to do auto-ejects, which work fine. It's the loading part (and the inventory) that looks like it's trying to load tapes into taken slots (taken by tapes currently being written, every time).
06-26-2013 08:38 AM
07-18-2013 08:31 AM
Thanks for all the tips, everyone. The IBM library works just fine with another system (CommVault), so this is a NetBackup-only issue. It happens exactly like this:
- XX tapes in slots
- YY tapes in drives
The XX tapes are OK, but according to NetBackup the YY tapes don't even exist, even though it sees them inside the tape drives at that very moment.
Insert new tapes and run an inventory: NetBackup ignores the YY tapes and assigns the new tapes to the slots previously used by the YY tapes. Chaos ensues once the YY tapes finish writing and are unloaded.
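The failure pattern above boils down to slot bookkeeping: a slot can be physically empty yet still be the home of a tape that is mounted in a drive. The Python toy model below is a hypothetical illustration of that idea, not how the TS3310 firmware or NetBackup works internally; the `respect_home_slots` flag simply stands in for "correct" versus "faulty" behavior, and the barcodes are made up. The error text imitates the TLD(0) message quoted earlier in the thread.

```python
class ModelLibrary:
    """Toy model of tape library slot bookkeeping (hypothetical)."""

    def __init__(self, num_slots, respect_home_slots):
        self.slots = {n: None for n in range(1, num_slots + 1)}  # slot -> barcode present
        self.home = {}    # barcode -> slot the tape returns to
        self.drives = {}  # drive number -> mounted barcode
        self.respect_home_slots = respect_home_slots

    def place(self, barcode, slot):
        self.slots[slot] = barcode
        self.home[barcode] = slot

    def mount(self, barcode, drive):
        # The tape physically leaves its slot, but the slot stays its home.
        self.slots[self.home[barcode]] = None
        self.drives[drive] = barcode

    def free_slots(self):
        reserved = set()
        if self.respect_home_slots:
            reserved = {self.home[b] for b in self.drives.values()}
        return [s for s, b in self.slots.items() if b is None and s not in reserved]

    def empty_map(self, barcodes):
        free = self.free_slots()
        if len(barcodes) > len(free):
            raise RuntimeError("not enough free slots")
        for barcode, slot in zip(barcodes, free):
            self.place(barcode, slot)

    def dismount(self, drive):
        barcode = self.drives[drive]
        slot = self.home[barcode]
        if self.slots[slot] is not None:
            # The tape stays stuck in the drive.
            raise RuntimeError(f"cannot dismount drive {drive}, slot {slot} already is full")
        del self.drives[drive]
        self.slots[slot] = barcode


def scenario(respect_home_slots):
    lib = ModelLibrary(num_slots=4, respect_home_slots=respect_home_slots)
    for slot, barcode in enumerate(["III", "JJJ", "KKK"], start=1):
        lib.place(barcode, slot)  # slot 4 is genuinely empty
    lib.mount("III", drive=1)     # slot 1 is now empty but reserved for III
    lib.empty_map(["EEE"])        # one returning tape comes in through the MAP
    try:
        lib.dismount(1)
        return f"ok, EEE went to slot {lib.home['EEE']}"
    except RuntimeError as err:
        return str(err)


if __name__ == "__main__":
    print(scenario(respect_home_slots=True))
    print(scenario(respect_home_slots=False))
```

With home slots respected, EEE lands in slot 4 and the dismount succeeds; without them, EEE takes slot 1 and the later dismount of III fails, which is the "chaos" described in this post.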
07-18-2013 09:02 AM
Hmm, I will revisit this and do a bit of research, but I am very sure that we have no control over this.
However, I have to agree that the CommVault example suggests otherwise.
M
07-18-2013 09:28 AM
Can you enable the robots log:
1. Create the directory /usr/openv/volmgr/debug/robots
2. Add VERBOSE to /usr/openv/volmgr/vm.conf
3. Restart the Media Manager service: run stopltid, then ltid -v
4. Recreate the issue and post up the log
Many thanks,
Martin
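The file-system part of the robots-log steps can be scripted so it is safe to run more than once. Below is a minimal Python sketch, assuming the layout from those steps (a debug/robots folder and a vm.conf file inside the volmgr directory); `enable_robots_debug` is a hypothetical helper, not a NetBackup tool. The paths in the steps are UNIX defaults; on a Windows master such as the one in this thread, the Volmgr directory sits under the NetBackup install path instead, so pass that directory. Restarting ltid (stopltid, then ltid -v) is still done by hand.

```python
from pathlib import Path


def enable_robots_debug(volmgr_dir):
    """Create debug/robots and add a VERBOSE line to vm.conf if it is missing."""
    volmgr = Path(volmgr_dir)
    # Step 1: the robots debug log directory.
    (volmgr / "debug" / "robots").mkdir(parents=True, exist_ok=True)
    # Step 2: append VERBOSE to vm.conf, keeping any existing entries.
    conf = volmgr / "vm.conf"
    lines = conf.read_text().splitlines() if conf.exists() else []
    if not any(line.strip() == "VERBOSE" for line in lines):
        lines.append("VERBOSE")
        conf.write_text("\n".join(lines) + "\n")
    return conf


if __name__ == "__main__":
    # UNIX default from the steps; adjust for a Windows install path.
    print(enable_robots_debug("/usr/openv/volmgr"))
```

Checking for an existing VERBOSE line keeps repeated runs from stacking duplicate entries in vm.conf.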
07-18-2013 05:20 PM
I've been doing some testing and need to look through the logs from the library.
In the meantime, I spoke with a colleague who is very, very knowledgeable about libraries. He confirmed that on an 'empty MAP' inventory we leave it up to the library where it puts the tapes.
CommVault may do things differently - I have to say I have absolutely no idea.
The IM chat between me and my colleague went like this:
07-21-2013 05:07 PM
George,
I sent you an email. Can you log a call and post the case number up here?
I will post details up here, but at the moment things are 'unclear', and I don't want to post 'half of the story', as it would confuse people reading this post.
Martin
07-24-2013 10:23 AM
Thanks Martin - will do.
From the event viewer (examples):
TLD(0) expected barcode (E00063L5) in slot 73, found barcode (C00032L5 )
TLD(0) expected barcode (D00031L5) in slot 149, found barcode (A00116L5 )
TLD(0) expected barcode (C00068L5) in slot 169, found barcode (A00017L5 )
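When there are many of these Event Viewer messages, it can help to pull out the slot and barcodes so they can be lined up against which tapes were in drives at the time. Below is a minimal Python sketch that parses the two message formats quoted in this thread (the barcode mismatch above and the "cannot dismount" message from the first post); the patterns are copied from those examples only, and other NetBackup versions may word the messages differently. `parse_tld_messages` is a hypothetical helper name.

```python
import re

# Formats copied from the messages quoted in this thread.
BARCODE_RE = re.compile(
    r"TLD\((\d+)\) expected barcode \(([^\s)]+)\s*\) in slot (\d+), "
    r"found barcode \(([^\s)]+)\s*\)")
DISMOUNT_RE = re.compile(
    r"TLD\((\d+)\) cannot dismount drive (\d+), slot (\d+) already is full")


def parse_tld_messages(text):
    """Turn TLD(n) event text into a list of dicts, one per recognized line."""
    events = []
    for line in text.splitlines():
        m = BARCODE_RE.search(line)
        if m:
            events.append({"kind": "barcode_mismatch", "robot": int(m[1]),
                           "slot": int(m[3]), "expected": m[2], "found": m[4]})
            continue
        m = DISMOUNT_RE.search(line)
        if m:
            events.append({"kind": "dismount_blocked", "robot": int(m[1]),
                           "drive": int(m[2]), "slot": int(m[3])})
    return events


SAMPLE = """TLD(0) expected barcode (E00063L5) in slot 73, found barcode (C00032L5 )
TLD(0) expected barcode (D00031L5) in slot 149, found barcode (A00116L5 )
TLD(0) expected barcode (C00068L5) in slot 169, found barcode (A00017L5 )
TLD(0) cannot dismount drive 2, slot 164 already is full"""

if __name__ == "__main__":
    for event in parse_tld_messages(SAMPLE):
        print(event)
```

The `[^\s)]+` pattern stops at the trailing space that appears inside the "found barcode" parentheses in these messages, so the barcodes come out clean.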
07-24-2013 10:45 AM
Is someone opening the robot door and moving tapes manually?
07-24-2013 10:48 AM
No, just the I/O door, when tapes are ejected (to be sent out) and new tapes are inserted.
07-24-2013 11:20 AM