Drive path keeps going down
Hi,
One of our tape drives was recently replaced. Because the new drive has a new serial number, I first had to delete the old tape drive in the admin console and then re-scan and re-add the drive. I did this on the master server. The firmware of the tape drive and the library was also updated. We have 4 tape drives.
Since then, the drive path of the replaced drive keeps going down, and duplication jobs keep failing on several media pools whenever the robot auto-assigns or feeds a tape into the replaced drive.
On the media server that hosts the drive path (bdnbu07), the Windows event log records event ID 14021 - Netbackup TLD Daemon (TLD(0) [7924] timed out after waiting 841 seconds for ready, drive 4) and event ID 2636 - Netbackup (Operator/EMM server has DOWN'ed drive HP.ULTRIUM5-SCSI.000 (device 0)) every time a job reads from or duplicates on the affected drive.
Is this a configuration step I missed after the drive was replaced, or is it still a hardware issue? The hardware vendor replaced the drive, updated the firmware, and said the drive would be detected automatically. It was not, so I had to delete and re-add/scan the new tape drive. For now I am leaving the drive path in a down state so that my duplication jobs are not affected and tapes do not get frozen, since the other 3 tape drives are working.
I would appreciate any known solution or expert advice for this scenario.
Thank You! :)
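The two event messages quoted above already carry the robot number, the robot drive number and the wait time. As a rough sketch (not a NetBackup tool), the Python below pulls those fields out of message text that has been exported from the event log. The regular expressions are assumptions fitted only to the two strings in this post, and `summarize` is a hypothetical helper name.

```python
import re

# Patterns fitted to the two messages quoted in the post (assumption:
# other NetBackup versions may word these messages differently).
TIMEOUT_RE = re.compile(
    r"TLD\((?P<robot>\d+)\) \[(?P<pid>\d+)\] timed out after waiting "
    r"(?P<seconds>\d+) seconds for ready, drive (?P<drive>\d+)"
)
DOWN_RE = re.compile(r"has DOWN'ed drive (?P<name>\S+) \(device (?P<device>\d+)\)")


def summarize(messages):
    """Return one dict per recognised message; other lines are skipped."""
    found = []
    for text in messages:
        m = TIMEOUT_RE.search(text)
        if m:
            found.append({"kind": "timeout", "robot": int(m["robot"]),
                          "drive": int(m["drive"]), "seconds": int(m["seconds"])})
            continue
        m = DOWN_RE.search(text)
        if m:
            found.append({"kind": "down", "name": m["name"],
                          "device": int(m["device"])})
    return found


messages = [
    "TLD(0) [7924] timed out after waiting 841 seconds for ready, drive 4",
    "Operator/EMM server has DOWN'ed drive HP.ULTRIUM5-SCSI.000 (device 0)",
]
events = summarize(messages)
for event in events:
    print(event)
```

The point of extracting the robot drive number (4 here) is that it is the number to check against the physical drive position in the library.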
I had never tested whether NBU follows the drives if they are swapped round.
Until today, and my testing shows that it doesn't seem to ... Excuse the odd serial numbers, it's a VTL.
From changer in scan
Drive 1 Serial Number : "XYZZY_C1"
Drive 2 Serial Number : "XYZZY_C2"
From tpconfig -dl
Drive Name IBM.ULT3580-TD4.001
Index 3
NonRewindDrivePath /dev/nst5
TLD(0) Definition DRIVE=1 (this is wrong, the drive is actually in 'physical' position 2)
Serial Number XYZZY_C2
Drive Name IBM.ULT3580-TD4.002
Index 5
NonRewindDrivePath /dev/nst1
TLD(0) Definition DRIVE=2 (this is wrong, the drive is actually in 'physical' position 1)
Serial Number XYZZY_C1
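The comparison between the two listings above can be sketched in a few lines of Python. This is only an illustration: the values are typed in by hand from the scan and tpconfig -dl output in this post rather than parsed from the commands, and `find_mismatches` is a hypothetical helper name.

```python
# Serial number per physical drive position, as reported by the robot
# (copied from the scan output in this post).
robot_serials = {1: "XYZZY_C1", 2: "XYZZY_C2"}

# NBU's view, copied from the tpconfig -dl output in this post:
# drive name -> (configured robot drive number, serial number).
nbu_drives = {
    "IBM.ULT3580-TD4.001": (1, "XYZZY_C2"),
    "IBM.ULT3580-TD4.002": (2, "XYZZY_C1"),
}


def find_mismatches(robot_serials, nbu_drives):
    """List (name, configured number, actual position) for each wrong drive."""
    position_of = {serial: pos for pos, serial in robot_serials.items()}
    mismatches = []
    for name, (configured, serial) in sorted(nbu_drives.items()):
        actual = position_of.get(serial)  # None if the robot does not report it
        if actual != configured:
            mismatches.append((name, configured, actual))
    return mismatches


mismatches = find_mismatches(robot_serials, nbu_drives)
for name, configured, actual in mismatches:
    print(f"{name}: configured as DRIVE={configured}, robot position is {actual}")
```

Matching on serial number rather than on drive name or device path is the design choice here, because the serial number is the one value both the robot and the drive itself report.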
So here we can see that the drives in my NBU config are 'swapped' round compared with what the robot reports.
I restarted ltid and ran a job:
Job details show
02/16/2016 03:32:04 - granted resource E03004
02/16/2016 03:32:04 - granted resource IBM.ULT3580-TD4.001 (This is in position 2 in the library, but NBU thinks it is in position 1)
From robtest we see the tape was loaded into the drive in position 1
drive 1 (addr 1) access = 1 Contains Cartridge = yes
Source address = 1027 (slot 4)
Barcode = E03004L4
vmoprcmd shows the drive in /dev/nst1
IBM.ULT3580-TD4.001 No No No hcart
nbmaster2 /dev/nst5 ACTIVE
IBM.ULT3580-TD4.002 Yes Yes E03004 Yes hcart
nbmaster2 /dev/nst1 TLD
The job hangs as the tape never mounts; in bptm we see that we're accessing the wrong drive.
03:32:06.049 [23119] <4> create_tpreq_file: symlink to path /dev/nst5 <<<<<< WRONG, tape is in drive with path /dev/nst1
03:32:06.088 [23119] <2> manage_drive_before_load: SCSI RESERVE
03:32:06.090 [23119] <2> manage_drive_before_load: report_attr, fl1 0x00000001, fl2 0x00000000
03:32:06.090 [23119] <4> expandpath: /usr/openv/netbackup/db/media/tpreq/drive_IBM.ULT3580-TD4.001
03:32:06.172 [23119] <2> tapelib: wait_for_ltid, Mount, timeout 0
(NOTE: the reason the tape is found in the correct drive in the vmoprcmd output is that NBU continuously scans the drives and will identify any tape that is inserted into a drive, providing it has an NBU media header.)
(Edited to add: The job eventually failed with Robot load error. I reconfigured the drives so they were in sync with the robot drive positions and the same job ran successfully.)
I suggest you delete and reconfigure the drives.
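The check I did by eye on the bptm log can be sketched as follows: take the path from the create_tpreq_file line and compare it with the path on which vmoprcmd shows the media ID. This is a hedged illustration only; the regular expression is fitted to the single log line in this post, the vmoprcmd result is typed in by hand, and `requested_path` is a hypothetical helper name.

```python
import re

# Pattern fitted to the create_tpreq_file line quoted in this post
# (assumption: the wording is the same in other bptm logs).
SYMLINK_RE = re.compile(r"create_tpreq_file: symlink to path (?P<path>\S+)")


def requested_path(bptm_lines):
    """Return the device path bptm linked to, or None if no such line exists."""
    for line in bptm_lines:
        m = SYMLINK_RE.search(line)
        if m:
            return m["path"]
    return None


bptm_lines = [
    "03:32:06.049 [23119] <4> create_tpreq_file: symlink to path /dev/nst5",
    "03:32:06.088 [23119] <2> manage_drive_before_load: SCSI RESERVE",
]

# Path on which the media ID was seen, copied by hand from the vmoprcmd output.
media_seen_on = {"E03004": "/dev/nst1"}

opened = requested_path(bptm_lines)
actual = media_seen_on["E03004"]
path_ok = opened == actual
print(f"bptm opened {opened}, tape E03004 is on {actual}, match: {path_ok}")
```

A mismatch between the two paths is what you would expect when the robot drive numbers in the NBU config do not line up with the physical drive positions.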
Hi,
Martin's isolation is more or less what I did. I found an incorrect robot drive number assignment by comparing the drives shown in the admin console GUI (Devices\Drives) with the tape drive numbers and serial numbers shown on the tape library console. The serial numbers did not match: for example, the serial number of robot drive number 1 in the GUI did not match the serial number assigned to that drive on the tape library. Two drives were affected. I changed the robot drive number in the GUI (Devices\Drives) for the affected drives, then restarted the services. My duplication jobs are running smoothly again, and the drive paths no longer go down as before.
Thanks and Regards