12-03-2018 03:09 AM
Hi,
We have a master server on Solaris 10 (VCS cluster) running NBU 7.7.3.
Media server: HP-UX B.11.31 U ia64, NBU 7.7.3.
Tape library: SL3000 with 4 LTO4 and 20 LTO5 tape drives.
The robot control host has only 18 of the LTO5 drives configured on it.
The 4 LTO4 drives are not shared and are configured only on a specific media server (these drives are working fine).
The other 2 LTO5 drives are configured on another media server and are not shared.
Of these 18 tape drives, 8 are not working properly.
They are permanently in DOWN-TLD mode; if we bring them up and run backups, the backups fail with tape/robot errors.
The concern now is that even robtest cannot dismount the tapes; it returns SCSI errors:
m d16 s262
Initiating MOVE_MEDIUM from address 1015 to 2261
move_medium failed
sense key = 0x5, asc = 0x3a, ascq = 0x0, MEDIUM NOT PRESENT
We have checked with the tape library vendor; according to them, there is no hardware issue with the tape drives or the library.
Also, all devices are in CLAIMED state and visible to the OS.
Please suggest what I need to look at to fix this issue.
12-03-2018 05:08 AM
The tape needs to be ejected from the drive before it can be moved back to its slot.
So, if the tape is still in the drive, the 'MEDIUM NOT PRESENT' error is correct:
there is no unloaded tape for the robot hand to pick up.
The media server that loaded the tape needs to unload it first.
The robot control host cannot do this while the drive is still reserved by the media server.
bptm on the media server sends the unload/eject to the tape drive when the job is done.
Evidence can be seen in the bptm log on the media server.
I suggest level 3 logging for troubleshooting (level 5 only upon request from Veritas Support).
To see why drives are DOWN'ed, you need to add
VERBOSE
to vm.conf on all affected media servers and restart NBU/ltid.
Try to UP the drives after this.
The next time a drive is DOWN'ed, the exact reason will be logged in the system log on the media server
(e.g. /var/adm/syslog/syslog.log on an HP-UX media server, /var/adm/messages on a Solaris media server).
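A minimal sketch of the vm.conf change, in case it helps. It assumes vm.conf is in the default location /usr/openv/volmgr/vm.conf; the function name add_verbose is only illustrative. It appends the entry only when it is not already present, so it is safe to run more than once.

```shell
#!/bin/sh
# Append the VERBOSE entry to vm.conf only if it is not already there.
# The default path is an assumption; pass a different file as $1 if needed.
add_verbose() {
    conf="${1:-/usr/openv/volmgr/vm.conf}"
    # -x matches the whole line, -q suppresses output
    if ! grep -qx 'VERBOSE' "$conf" 2>/dev/null; then
        echo 'VERBOSE' >> "$conf"
    fi
}
```

After calling add_verbose on each affected media server, NBU/ltid still has to be restarted as described above before the extra logging takes effect.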
12-04-2018 02:11 AM
I am getting the following error from the unload command:
unload d17
Opening /dev/rtape/tape73_BESTnb, on the local host, please wait...
Error - cannot open /dev/rtape/tape73_BESTnb (No such device or address)
However, this drive is there in tpconfig -d:
18 TLB5-KJC-DRIVE17 hcart2 TLD(2) DRIVE=17
/dev/rtape/tape73_BESTnb DOWN
12-04-2018 02:27 AM
If the device name was present before, then it means that 'something' happened that caused the OS to lose connectivity to the device.
NBU uses the OS for device access.
So, you need to troubleshoot at OS level.
Check output of '/usr/openv/volmgr/bin/scan' and 'ioscan -f'.
Check syslog file for errors.
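The syslog check can be narrowed with a simple filter. This is only a sketch: the function name tape_syslog_errors and the keyword list are assumptions (a guess at useful search terms, not an official list), and the default path is the HP-UX syslog location.

```shell
#!/bin/sh
# Show syslog lines that look related to tape or SCSI trouble.
# Keywords are illustrative; adjust them to what your system actually logs.
tape_syslog_errors() {
    log="${1:-/var/adm/syslog/syslog.log}"
    grep -iE 'scsi|reservation|tld|ltid' "$log"
}
```

On a Solaris media server, pass /var/adm/messages as the argument instead.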
12-04-2018 10:56 PM
Thanks, I have fixed the issue.
You pointed me in the right direction with the syslog; there were some SCSI RESERVATION CONFLICT messages, e.g.:
Operator requested SCSI Release of Drive TLB5-KJC-DRIVE11 returned RESERVATION CONFLICT
I cleared the reservations using the st command.
The procedure was as follows:
1. Found the drive path using tpconfig -d.
2. Found the equivalent sctl path using ioscan -m dsf | grep -i "path".
3. Ran st -f drivepath -r, then vmoprcmd -crawlreleasebyname.
Everything is working fine now.
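For anyone hitting the same problem, a rough sketch of the steps above. It assumes the syslog message has exactly the format quoted in this post; both function names are illustrative, and the vmoprcmd option is spelled -crawlreleasebyname. The second function only prints the commands so they can be reviewed before running them by hand.

```shell
#!/bin/sh
# Read syslog text on stdin and list drives that reported a reservation conflict.
# Expects lines such as:
#   Operator requested SCSI Release of Drive TLB5-KJC-DRIVE11 returned RESERVATION CONFLICT
conflicted_drives() {
    sed -n 's/.*Release of Drive \([^ ]*\) returned RESERVATION CONFLICT.*/\1/p' | sort -u
}

# Print (do not run) the release commands for one drive.
# $1 = NetBackup drive name
# $2 = drivepath, looked up manually via tpconfig -d and ioscan -m dsf
print_release_cmds() {
    echo "st -f $2 -r"
    echo "/usr/openv/volmgr/bin/vmoprcmd -crawlreleasebyname $1"
}
```

Example: conflicted_drives < /var/adm/syslog/syslog.log lists the affected drives; then call print_release_cmds with each drive name and its drivepath.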