
NetBackup 6.0 MP4: tape drives fail when unloading and then reloading tapes in the library

drj003

We use NetBackup 6.0 MP4 in one of our data centers.

It is a two-server setup: one server is both a master and a media server, and the other is a media server only.

When we take a set of tapes out and then put another set in, one or more tape drives go down in the NetBackup Administration Console. When the issue started, we had to run the Device Configuration Wizard. This would sometimes take all drives down, but after running the wizard several times the drives would come back up. Lately it's getting worse: we've had to use bpdown and bpup and then run the Device Configuration Wizard, or restart the master and media servers.

Also, when this problem happens, duplicate jobs will start.

How can I narrow this problem down to its source? I don't know whether it's the tape drives, the tape library/robotic tape changer, or NetBackup.

Which logs should I look at to troubleshoot this? Please let me know if you need more information.


Marianne

How do you 'take the tapes out'? Do you use the GUI to eject them, or do you open the robot door and remove them manually? If the latter, is the robot door opened while backups are in progress? Are new tapes put into empty slots manually, or via the MAP and an Inventory operation?

To find the actual reason the drives are being DOWN'ed, add a VERBOSE entry to <install-path>\veritas\volmgr\vm.conf on both the master and the media server, then restart NetBackup (NBU).

All Media Management actions and errors, including the reason a drive was DOWN'ed, will then be logged in the Event Viewer Application log.
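
Adding the entry is a one-line edit, but if you would rather script it on both servers, a minimal Python sketch follows. It assumes vm.conf is a plain-text file with one entry per line and that the entry described above is a line consisting only of VERBOSE; `with_verbose` and `ensure_verbose` are hypothetical helper names, and the script does not restart NetBackup for you.

```python
from pathlib import Path


def with_verbose(conf_text: str) -> str:
    """Return vm.conf text that contains a standalone VERBOSE line.

    If a line consisting only of VERBOSE (ignoring surrounding spaces)
    already exists, the text is returned unchanged; otherwise VERBOSE is
    appended as a new last line.
    """
    lines = conf_text.splitlines()
    if any(line.strip() == "VERBOSE" for line in lines):
        return conf_text
    lines.append("VERBOSE")
    return "\n".join(lines) + "\n"


def ensure_verbose(vm_conf: Path) -> bool:
    """Add VERBOSE to the given vm.conf file; return True if the file changed.

    A missing file is treated as empty and created. NetBackup still has to
    be restarted afterwards for the setting to take effect.
    """
    old = vm_conf.read_text() if vm_conf.exists() else ""
    new = with_verbose(old)
    if new == old:
        return False
    vm_conf.write_text(new)
    return True
```

Call `ensure_verbose(Path(...))` with the full path to vm.conf under your own install path on each server; it is safe to run twice, because an existing VERBOSE line is left alone.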

drj003

Thank you for your response. 

I eject the tapes with the "eject volume from robot" feature of the GUI. I add the tapes with the "inventory robot/update volume configuration" feature of the GUI. After I put the tapes back in the robot, one or more drives show as down in the NetBackup GUI.

Then I run the Device Configuration Wizard, which is when all 8 drives may go down. At this point, one or more drives may show "cartridge in" status in the robot, so I open the door and take those tapes out by hand, because the robot is not unloading them from the drives. If not all drives have gone down, a drive may still be writing.

All tapes are put in through the GUI/inventory process.

VERBOSE is now in the vm.conf file on both the master and the media servers.

Where will the VERBOSE logging for the drives be written?

I will check the Event Viewer.

Thanks very much.

Marianne

There is nothing wrong with your method of removing and adding media, and it should not cause drives to go down.

The VERBOSE logging is sent to the Event Viewer Application log, but only after the Device Management service has been stopped and restarted.

Running the Device Configuration Wizard seems excessive as a fix for DOWN drives, unless you know for a fact that incorrect device mappings are the cause. It should not be run while backups or duplications are in progress.

Wait for the next DOWN drive and check the Event Viewer on the affected media server.

drj003

Hi Marianne,

I did as you suggested: I enabled VERBOSE logging and checked the Event Viewer Application log.

I have attached a file with the errors that occur on the server that is both master and media server when taking the tapes out and putting them in. There are many more errors of the same kinds, but these cover every type I found.

drj003

These errors occur at the time of the tape rotation. They appear in the Description field of the Event Viewer Application log.

These happen first (many of them): "unable to find slot 'slotnumber' for inventory".

Then come errors like the ones below. These are only samples; there are many more of the same types, possibly with different device numbers.

"Fatal open error on IBM.ULTRIUM-TD2.006 (device 2, \\.\Tape3): The device is not connected. DOWN'ing it"

"TLD(0) cannot dismount drive 1, slot 107 already"

"Operator/EMM server has DOWN'ed drive IBM.ULTRIUM-TD2.007 (device 3)"

"TLD(0) drive does not exist in robot, drive = 5"

"Operator/EMM server has DOWN'ed drive IBM.ULTRIUM-TD2.004 (device 0)"

"TLD(0) 3 is an invalid drive number"

"TLD(0) 4 is an invalid drive number"

"emmlib_UpdateDriveRuntime failed, status=258"

"TLD(0) drive does not exist in robot, drive = 4"

"TLD(0) cannot dismount drive 2, slot 29 already is full"

Many more like the above.
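
With this many messages, it may help to tally them by type and by drive before calling the vendor. The Python sketch below assumes the Application log has been exported to a text file with one message description per line (the export format is an assumption; adapt the reading step to however you export the log). The regular expressions are written only against the sample messages quoted above, and the category names are hypothetical labels, not NetBackup terms. If most messages land in "drive does not exist in robot" or "invalid drive number", that would suggest looking first at how the robot reports its drives to NetBackup; treat it as a hint, not a diagnosis.

```python
import re
from collections import Counter

# Patterns based on the sample messages quoted in this thread.
# The category labels (dictionary keys) are our own, not NetBackup terms.
PATTERNS = {
    "slot not found for inventory": re.compile(r"unable to find slot"),
    "fatal open error": re.compile(r"Fatal open error on (?P<drive>\S+)"),
    "DOWN'ed by operator/EMM": re.compile(r"has DOWN'ed drive (?P<drive>\S+)"),
    "drive does not exist in robot": re.compile(
        r"drive does not exist in robot, drive = (?P<drive>\d+)"),
    "invalid drive number": re.compile(
        r"TLD\(\d+\) (?P<drive>\d+) is an invalid drive number"),
    "cannot dismount, slot full": re.compile(r"cannot dismount drive (?P<drive>\d+)"),
    "EMM drive runtime update failed": re.compile(r"emmlib_UpdateDriveRuntime failed"),
}


def classify(messages):
    """Count messages per category and collect the drives each category names.

    Returns (counts, drives): counts is a Counter keyed by category (plus
    "unclassified"); drives maps a category to the set of drive identifiers
    seen in it. Identifiers are robot drive numbers or device names, exactly
    as they appear in the message text.
    """
    counts = Counter()
    drives = {}
    for msg in messages:
        for category, pattern in PATTERNS.items():
            match = pattern.search(msg)
            if match:
                counts[category] += 1
                drive = match.groupdict().get("drive")
                if drive is not None:
                    drives.setdefault(category, set()).add(drive)
                break  # first matching category wins
        else:
            counts["unclassified"] += 1
    return counts, drives
```

For example, `counts, drives = classify(open("events.txt").read().splitlines())`, where events.txt is a hypothetical export file; `counts.most_common()` then lists the categories from most to least frequent.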

Marianne

Please log a call with your hardware vendor; there seems to be something very wrong with the robot. Updated firmware may be available.
If you select the robot in the GUI under Devices -> Robots, you can see the current firmware version in the 'Inquiry String' column.

Eric_Zhang

My suggestions are as follows.

First, delete all the robots and tape drives defined in NetBackup and restart the NetBackup services.

Then reconfigure the robots and tape drives, and after reconfiguring, restart the NetBackup services again.

Then inventory the media.

Finally, rerun the backup jobs and see what happens.