
Tape Mismatch (Netbackup thinks tapes in drives are not part of library)

Georges_N
Level 4

Hi everyone

 

Been coming here for a while now for solutions, finally found a situation that doesn't seem to match others so here goes:

 

1 x IBM Tape Library

1 x Netbackup Master 7.1 running on Windows 2008 R2

 

What happens is this:

Every day we remove/add tapes - same process, around the same time. Lately, whenever we run an inventory to insert the new tapes, the slots Netbackup chooses for them are already assigned to tapes that are currently being written to in drives. It's as if Netbackup ignores the slot assignments of those tapes just because they are in drives right now and not idle in their slots.

 

Tried:

Deleting all drives (and the robot) and reinstalling step-by-step in Netbackup

Updating firmware in IBM library for library itself and drives

Billions of inventories on both sides. (ok maybe just millions)

 

The result is the same:

I eject Tapes A B C and D.

I insert tapes E F G and H

EFGH tapes will be assigned to the following slots: Slot 1 2 3 and 4

But wait... those slots are assigned to tapes I J K and L and those tapes are being used by the drives... what gives?

Result: When said tapes are done being used, and ejected from the drives, netbackup suddenly remembers their existence and tries to put them back in their previously-assigned slot. The result is this:

TLD(0) cannot dismount drive 2, slot 164 already is full

Operator/EMM server has DOWN'ed drive IBM.DRIVE02 (device 2)

Along with a few of these sprinkled around:

TLD(0) expected barcode (XXXXXX) in slot 164, found barcode (YYYYYY)

So, the result is a downed drive, and a tape stuck in it (has to be ejected using the IBM Library interface)
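
For anyone sifting through a large log for these two messages, here is a minimal Python sketch that pulls out the robot number, drive, slot and barcodes. The patterns are built only from the two message formats quoted above, so other Netbackup versions may word them differently. The function name `parse_tld_line` is hypothetical, and the patterns use `search` on the assumption that real log lines carry a timestamp or other prefix.

```python
import re

# Patterns derived only from the two messages quoted in this post
# (assumption: the wording is stable across log lines).
SLOT_FULL = re.compile(
    r"TLD\((?P<robot>\d+)\) cannot dismount drive (?P<drive>\d+), "
    r"slot (?P<slot>\d+) already is full"
)
BARCODE_MISMATCH = re.compile(
    r"TLD\((?P<robot>\d+)\) expected barcode \((?P<expected>[^)]*)\) "
    r"in slot (?P<slot>\d+), found barcode \((?P<found>[^)]*)\)"
)


def parse_tld_line(line):
    """Return a dict describing the event, or None if the line is unrelated."""
    m = SLOT_FULL.search(line)
    if m:
        return {
            "event": "slot_full",
            "robot": int(m["robot"]),
            "drive": int(m["drive"]),
            "slot": int(m["slot"]),
        }
    m = BARCODE_MISMATCH.search(line)
    if m:
        return {
            "event": "barcode_mismatch",
            "robot": int(m["robot"]),
            "slot": int(m["slot"]),
            "expected": m["expected"],
            "found": m["found"],
        }
    return None
```

Grouping the parsed events by slot number should make it easy to see which slots keep colliding after each tape swap.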

Any ideas, whatsoever?

Thanks for reading!

26 REPLIES

Georges_N
Level 4

Thanks Martin - I will do everything you posted on Monday and get back to you! Very nice of you.

 

I actually did a quick test earlier today, removing tapes and adding new ones. I've attached screenshots of what Netbackup decides to do: it assigns the new tapes to certain slots while unassigning other tapes to "standalone". As you'll see in the screenshots, the tapes being tossed into standalone mode are currently in drives (2nd shot).

 

To be continued...

mph999
Level 6
Employee Accredited
Thanks George, I look forward to the details. I'll probably aim to speak with you at some point to ensure I have understood things exactly.

I tried to reproduce this on a 30 slot VTL with slots 1 - 22 full. I moved the tape in slot 1 into a drive using tpreq, then 'inserted' a brand new tape into the MAP of the VTL and ran an inventory to empty the MAP. It ignored slot 1 and instead put the tape into slot 23. So, by whatever means, NBU was aware that the empty slot 1 had a tape 'assigned' to it.

Looking in the code, we definitely check whether a slot is empty. There is a 'source valid' bit set by the library http://publib.boulder.ibm.com/infocenter/ts3500tl/v1r0/index.jsp?topic=%2Fcom.ibm.storage.ts3500.doc%2Fsref_3584_lselmd.html and I think it is this that could be messing things up, but I've not understood exactly how this gets involved (yet).

Once I have a call, I can look at this in work time (at the moment it is an evening project ...) and quite simply, if I can't justify the behavior via the methods of investigation available to me (testing, docs, beating up colleagues etc ...), I'll get engineering involved.

M
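
To make the difference between the VTL test and the reported symptom concrete, here is a toy Python model of the slot choice. It is not Netbackup code: `pick_slot`, `home_slots` and the `honor_home_slots` flag are hypothetical names, and picking the lowest-numbered free slot is an assumption. The only facts taken from the thread are the 30 slot VTL, slots 1 - 22 originally full, the tape from slot 1 sitting in a drive, and the new tape landing in slot 23.

```python
def pick_slot(total_slots, occupied, home_slots, honor_home_slots=True):
    """Pick the lowest-numbered slot for a newly inserted tape.

    occupied   -- slots that physically hold a tape right now
    home_slots -- slots reserved for tapes currently mounted in drives
    """
    for slot in range(1, total_slots + 1):
        if slot in occupied:
            continue
        if honor_home_slots and slot in home_slots:
            continue  # physically empty, but its tape is only away in a drive
        return slot
    return None  # no usable slot left


# The VTL test: 30 slots, 1-22 originally full, tape from slot 1 in a drive.
occupied = set(range(2, 23))
home_slots = {1}

expected = pick_slot(30, occupied, home_slots)  # 23, as seen on the VTL
symptom = pick_slot(30, occupied, home_slots, honor_home_slots=False)  # 1
```

With the reservation ignored, the new tape takes slot 1, and the later dismount of the original tape has nowhere to go, which would match the "slot already is full" error in the first post.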

mph999
Level 6
Employee Accredited
Hi George,
 
Aside from the details I've already said we'll need, I would also like to know the following. Apologies for all the questions, but understanding the exact detail is a vital part of troubleshooting ... (depending on your answers, some questions may be N/A)
 
OS type and NBU version on the master servers
OS type and NBU version on any media servers
OS type and NBU version on the robot control host
 
nbsu -c -t output from the master server
nbsu -c -t output from the robot control host
 
From the notes so far in the forum post, I see the library is configured as tld type.
 
Exact model of IBM library (I'm going to guess at a 3584 ...)
Firmware version on library
 
Config of library - this might be a bit tricky, so for the moment we'll go for a 'high level' description on the understanding I might come back and ask about individual settings.
 
Is ALMS enabled (if the library is partitioned, I think it has to be)
Is virtual I/O enabled
What drives are fitted in the library (make / model)
How many slots are free, how many slots in total
Is the library partitioned, how many partitions
Does issue happen across all partitions
You mentioned CommVault - is the library shared with CommVault, or any other applications (or perhaps you have a physically different IBM library for CommVault)
 
Has this ever worked correctly with NBU
Is this a new installation of NBU
If it did previously work, when did it break
Were any changes made at this time (when it started to go wrong) - eg, NBU upgrade, NBU EEB applied, library firmware change, partitioning enabled, any library settings changed
 
From my previous notes (I'll include here again so that it is all in one place)
 
Robots log at full verbose level, from the robot control host. To do this create directory on robot control host as follows:
 
\veritas\volmgr\debug\robots
 
Add VERBOSE into
\veritas\volmgr\vm.conf
 
Create an empty file called
\veritas\volmgr\ROBOT_DEBUG (make sure Windows doesn't add a suffix ...)
Stop/start NBU
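
The three file-system steps above could be scripted roughly as follows. This is a sketch only: `enable_robot_debug` is a hypothetical helper, and `volmgr_dir` has to point at your own `\veritas\volmgr` directory because the full install path is not given here. It does not stop or start NBU; that remains a manual step.

```python
from pathlib import Path


def enable_robot_debug(volmgr_dir):
    """Create the robots log directory, add VERBOSE to vm.conf, touch ROBOT_DEBUG."""
    volmgr = Path(volmgr_dir)

    # 1. Log directory: <volmgr>\debug\robots
    (volmgr / "debug" / "robots").mkdir(parents=True, exist_ok=True)

    # 2. Add a VERBOSE line to vm.conf unless one is already present.
    vm_conf = volmgr / "vm.conf"
    text = vm_conf.read_text() if vm_conf.exists() else ""
    lines = text.splitlines()
    if not any(line.strip() == "VERBOSE" for line in lines):
        lines.append("VERBOSE")
        vm_conf.write_text("\n".join(lines) + "\n")

    # 3. Empty touch file with no extension (so no ".txt" suffix).
    (volmgr / "ROBOT_DEBUG").touch()
```

Running it twice is harmless: the directory and touch file are left alone if they exist, and `VERBOSE` is only appended once.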
 
Recreate issue (on robot control host). It would be most helpful if you could make note of the barcode of the tape in the slot.
 
If possible, make a note of the tapes in the drives when you run the test, and the barcode of the tape in the MAP (or tapes if you use multiple)
The fewer jobs running the better, so if possible just have the one job running that you need to have a tape or two in drives
(The reason is that there is less activity in the log, which simply makes things a bit simpler when looking through them as there is less to filter out)
The time you run the inventory (as per the clock on the robot control host)  (just makes my life a bit easier again ...)
 
I think for the moment that will do ...
 
Kindest regards,
 
Martin

mph999
Level 6
Employee Accredited
Hi George, hope you are well. How are you getting on with this issue? Many thanks, Martin

Georges_N
Level 4

Hi everyone and thanks again for all your help

 

This looks more and more like an IBM-only problem. We've recently updated our Netbackup environment to 7.5.0.6 and it persists. I will have to open a new case with IBM and hopefully get a fresh set of eyes on the problem, because it's been quite a pain changing tapes (in-and-out) since the inventory side of Netbackup always tries to send tapes from drives to standalone...

 

 

Thanks again all

Georges_N
Level 4

Ladies and gentlemen, it was the IBM TS3310 (3576) Firmware (which was installed by IBM to support the newer drives)

If someone else is googling this issue and finds this in the near future, upgrade to the TS3310 (3576) 640G.GS007 Library firmware, as well as the LTO5.D8D4 drive firmware.

 

Problem solved, thanks for the NBU help!

mph999
Level 6
Employee Accredited
Thanks for the update - do you know if it was related to the 'source valid' bit? Just curious.