LTO drives go down: 2 of 4 drives are going down
Hi
My problem is that 2 of the 4 drives installed in an SL150 library are going down.
tpconfig -d
Id DriveName Type Residence
Drive Path Status
****************************************************************************
0 LTO-6.DRIVE.1 hcart2 TLD(0) DRIVE=1
/dev/nst3 UP
1 LTO-6.DRIVE.4 hcart2 TLD(0) DRIVE=4
/dev/nst2 UP
2 LTO-6.DRIVE.2 hcart2 TLD(0) DRIVE=2
/dev/nst1 DOWN
3 LTO-6.DRIVE.3 hcart2 TLD(0) DRIVE=3
/dev/nst0 DOWN
Currently defined robotics are:
TLD(0) robotic path = /dev/sg14
EMM Server = lllll
I have already restarted the SL150 library and stopped/started the media server.
Some logs from the OS, from /var/log/messages:
Nov 30 08:45:50 ppppp tldcd[12903]: TLD(0) cannot dismount drive 2, slot 73 already is full
Nov 30 08:45:53 ppppp ltid[11722]: Operator/EMM server has DOWN'ed drive LTO-6.DRIVE.2 (device 2)
Nov 30 08:52:43 ppppp xinetd[2852]: START: nrpe pid=13101 from=::ffff:10.64.7.8
Nov 30 08:52:43 ppppp xinetd[2852]: EXIT: nrpe status=0 pid=13101 duration=0(sec)
Nov 30 08:52:52 ppppp xinetd[2852]: START: nrpe pid=13104 from=::ffff:10.64.7.8
Nov 30 08:52:52 ppppp xinetd[2852]: EXIT: nrpe status=0 pid=13104 duration=0(sec)
Nov 30 08:53:36 ppppp xinetd[2852]: START: nrpe pid=13132 from=::ffff:10.64.7.8
Nov 30 08:53:36 ppppp xinetd[2852]: EXIT: nrpe status=0 pid=13132 duration=0(sec)
Nov 30 08:53:52 ppppp xinetd[2852]: START: nrpe pid=13137 from=::ffff:10.64.7.8
Nov 30 08:53:52 ppppp xinetd[2852]: EXIT: nrpe status=0 pid=13137 duration=0(sec)
Nov 30 08:54:29 ppppp xinetd[2852]: START: nrpe pid=13166 from=::ffff:10.64.7.8
Nov 30 08:54:29 ppppp xinetd[2852]: EXIT: nrpe status=0 pid=13166 duration=0(sec)
Nov 30 08:54:30 ppppp xinetd[2852]: START: nrpe pid=13169 from=::ffff:10.64.7.8
Nov 30 08:54:30 ppppp xinetd[2852]: EXIT: nrpe status=0 pid=13169 duration=0(sec)
Nov 30 08:58:00 ppppp tldcd[13277]: TLD(0) cannot dismount drive 2, slot 73 already is full
Nov 30 08:59:25 ppppp tldd[12850]: TLD(0) [12850] timed out after waiting 855 seconds for ready, drive 3
Nov 30 09:00:06 ppppp ltid[11722]: Operator/EMM server has DOWN'ed drive LTO-6.DRIVE.3 (device 3)
Nov 30 09:02:44 ppppp xinetd[2852]: START: nrpe pid=13591 from=::ffff:10.64.7.8
Example error from the job: "1: (2009) All compatible drive paths are down but media is available", but I am not sure that is the only one.
In the library I can see that it tries to mount tapes. In the SL150 web GUI I can see that a tape is in a drive, but nothing more happens and then the drive goes down.
[root@ppppp media]# vmoprcmd
HOST STATUS
Host Name Version Host Status
========================================= ======= ===========
lllll 761100 ACTIVE-DISK
ppppp 761100 ACTIVE
sssss 750000 DEACTIVATED
PENDING REQUESTS
<NONE>
DRIVE STATUS
Drive Name Label Ready RecMID ExtMID Wr.Enbl. Type
Host DrivePath Status
=============================================================================
LTO-6.DRIVE.1 Yes Yes 0021L6 0021L6 Yes hcart2
ppppp.tpsa.pl /dev/nst3 ACTIVE
LTO-6.DRIVE.2 No No No hcart2
ppppp.tpsa.pl /dev/nst1 DOWN-TLD
LTO-6.DRIVE.3 No No No hcart2
ppppp.tpsa.pl /dev/nst0 DOWN-TLD
LTO-6.DRIVE.4 Yes Yes 1375L6 1375L6 Yes hcart2
ppppp.tpsa.pl /dev/nst2 ACTIVE
Regards
Maciej
1: The physical tape locations do not match what NBU has recorded. Run an inventory with all tapes dismounted (problem 2 is likely also the root cause of this issue).
2: The drive order is likely wrong. NetBackup thinks the order is 1 2 3 4, when in reality it is 1 2 4 3.
TLD(0) [12850] timed out after waiting 855 seconds for ready, drive 3
NetBackup expects a tape to be mounted in drive 3, but it has likely been mounted in drive 4. After 855 seconds NetBackup gives up and downs the drive. When NetBackup then mounts a tape in drive 4, it ends up in drive 3, which is already full.
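To make that sequence easier to follow, the nrpe/xinetd noise can be filtered out of /var/log/messages. The following is only a minimal sketch: it assumes the syslog line layout shown in the question, and the function name media_events and the daemon list are illustrative choices, not part of NetBackup.

```python
import re

# Sample lines copied from the /var/log/messages excerpt in the question.
LOG = """\
Nov 30 08:45:50 ppppp tldcd[12903]: TLD(0) cannot dismount drive 2, slot 73 already is full
Nov 30 08:45:53 ppppp ltid[11722]: Operator/EMM server has DOWN'ed drive LTO-6.DRIVE.2 (device 2)
Nov 30 08:52:43 ppppp xinetd[2852]: START: nrpe pid=13101 from=::ffff:10.64.7.8
Nov 30 08:58:00 ppppp tldcd[13277]: TLD(0) cannot dismount drive 2, slot 73 already is full
Nov 30 08:59:25 ppppp tldd[12850]: TLD(0) [12850] timed out after waiting 855 seconds for ready, drive 3
Nov 30 09:00:06 ppppp ltid[11722]: Operator/EMM server has DOWN'ed drive LTO-6.DRIVE.3 (device 3)
"""

# Daemon names as they appear in the excerpt; everything else is skipped.
MEDIA_DAEMONS = ("tldcd", "tldd", "ltid")

# Assumed layout: "Mon DD HH:MM:SS host daemon[pid]: message"
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s[\d:]{8})\s(\S+)\s(\w+)\[(\d+)\]:\s(.*)$")


def media_events(text):
    """Return (timestamp, daemon, message) for media manager daemons only."""
    events = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match and match.group(3) in MEDIA_DAEMONS:
            events.append((match.group(1), match.group(3), match.group(5)))
    return events


for when, daemon, message in media_events(LOG):
    print(when, daemon, message)
```

Read in order, the remaining lines show the failed dismount, the timeout waiting for ready, and the drive being DOWN'ed shortly afterwards.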
Also ensure NetBackup has a SCSI connection to all tape drives; use the lsscsi command or /usr/openv/volmgr/bin/scan.
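To know which device paths to look for in the lsscsi or scan output, the DOWN drives can be pulled out of the tpconfig -d listing. This is a rough sketch that assumes the two-lines-per-drive layout pasted in the question; parse_drives and the dictionary keys are made-up names for illustration.

```python
# Drive lines copied from the `tpconfig -d` output in the question.
TPCONFIG = """\
0 LTO-6.DRIVE.1 hcart2 TLD(0) DRIVE=1
/dev/nst3 UP
1 LTO-6.DRIVE.4 hcart2 TLD(0) DRIVE=4
/dev/nst2 UP
2 LTO-6.DRIVE.2 hcart2 TLD(0) DRIVE=2
/dev/nst1 DOWN
3 LTO-6.DRIVE.3 hcart2 TLD(0) DRIVE=3
/dev/nst0 DOWN
"""


def parse_drives(text):
    """Pair each drive line with the path/status line that follows it."""
    drives = []
    current = None
    for raw in text.splitlines():
        fields = raw.split()
        if not fields:
            continue
        if fields[0].isdigit() and len(fields) >= 5:
            # Drive line: index, name, type, robot, DRIVE=<robot drive number>
            current = {
                "index": int(fields[0]),
                "name": fields[1],
                "type": fields[2],
                "robot": fields[3],
                "robot_drive": int(fields[4].split("=")[1]),
            }
        elif fields[0].startswith("/dev/") and current is not None:
            # Path line: device path followed by status (UP or DOWN)
            current["path"] = fields[0]
            current["status"] = fields[-1]
            drives.append(current)
            current = None
    return drives


down = [d for d in parse_drives(TPCONFIG) if d["status"] != "UP"]
for d in down:
    print(f"{d['name']}: robot drive {d['robot_drive']}, path {d['path']}, status {d['status']}")
```

The paths it reports for the DOWN drives are the ones to confirm as visible to the OS, and to check against the robot drive numbers when verifying the drive order.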