Several drives going DOWN within a short period with "MEDIA LOAD OR EJECT FAILED"
We run NetBackup 7.7.3 with an SL3000 tape library attached over FC.
1. Media/master server1 (Solaris 11 SPARC) has the robot and 13 tape drives from the SL3000 attached.
2. Media server2 (SUSE ES 11) has 5 tape drives from the same SL3000 attached.
3. Media server3 (SUSE ES 11) has 4 tape drives from the same SL3000 attached; these are the same drives that media server2 uses.
This worked fine for several years. But now, sometimes, when backups start on server2, several drive paths go DOWN on server2, and different drives go DOWN on server1.
The tape library support team did not find any issues on the library.
We deleted all tape drives and added them again (this was the suggestion of the tape library support team). Strangely, the drives that are not shared were added back with their old names.
This helped for two weeks, and now the problem is back.
tpconfig -l output:
Server1:
Device Robot Drive        Robot                Drive                 Device  Second
Type   Num   Index Type   DrNum Status Comment Name                  Path    Device Path
robot  0     -     TLD    -     -      -       -                     /dev/sg/c0tw500104f000b22918l0
drive  -     0     hcart3 4     UP     -       IBM.ULTRIUM-TD6.000   /dev/rmt/7cbn
drive  -     1     hcart2 3     DOWN   -       STK.T10000D.001       /dev/rmt/8cbn
drive  -     2     hcart2 2     DOWN   -       server1_579004003808  /dev/rmt/4cbn
drive  -     3     hcart2 1     UP     -       server1_579004005455  /dev/rmt/0cbn
drive  -     4     hcart2 8     UP     -       STK.T10000D.003       /dev/rmt/3cbn
drive  -     5     hcart2 7     UP     -       server1_579004002624  /dev/rmt/1cbn
drive  -     6     hcart2 6     UP     -       server1_579004008297  /dev/rmt/5cbn
drive  -     7     hcart2 5     UP     -       server1_579004005394  /dev/rmt/6cbn
drive  -     8     hcart2 12    DOWN   -       STK.T10000D.002       /dev/rmt/2cbn
drive  -     9     hcart2 11    UP     -       STK.T10000D.000       /dev/rmt/12cbn
drive  -     10    hcart2 10    DOWN   -       server1_579004006393  /dev/rmt/11cbn
drive  -     11    hcart2 9     UP     -       server1_579004006402  /dev/rmt/10cbn
drive  -     12    hcart2 13    UP     -       server1_579004006389  /dev/rmt/9cbn
Server2:
Device Robot Drive        Robot                Drive                 Device  Second
Type   Num   Index Type   DrNum Status Comment Name                  Path    Device Path
robot  0     -     TLD    -     -      -       -                     server1
drive  -     0     hcart2 12    DOWN   -       STK.T10000D.002       /dev/nst1
drive  -     1     hcart2 8     UP     -       STK.T10000D.003       /dev/nst0
drive  -     2     hcart3 4     UP     -       IBM.ULTRIUM-TD6.000   /dev/nst3
drive  -     3     hcart2 3     DOWN   -       STK.T10000D.001       /dev/nst2
drive  -     4     hcart2 11    UP     -       STK.T10000D.000       /dev/nst4
Server3:
Device Robot Drive        Robot                Drive                 Device  Second
Type   Num   Index Type   DrNum Status Comment Name                  Path    Device Path
robot  0     -     TLD    -     -      -       -                     server1
drive  -     0     hcart2 11    UP     -       STK.T10000D.000       /dev/nst3
drive  -     1     hcart2 3     UP     -       STK.T10000D.001       /dev/nst2
drive  -     2     hcart2 12    UP     -       STK.T10000D.002       /dev/nst1
drive  -     3     hcart2 8     UP     -       STK.T10000D.003       /dev/nst0
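To compare the three listings quickly, something like the following could be used. It is only a sketch: it splits each pasted row on whitespace and assumes the comment column is always a single token (here "-"), so it may not handle every tpconfig layout. The function names are my own, and the sample holds just a few of the rows above.

```python
from collections import defaultdict

def parse_drives(tpconfig_text):
    """Return {drive_name: status} for the 'drive' rows of pasted tpconfig -l output."""
    drives = {}
    for line in tpconfig_text.splitlines():
        fields = line.split()
        # Assumed row layout: drive - <index> <type> <DrNum> <status> <comment> <name> <path>
        if len(fields) >= 9 and fields[0] == "drive":
            drives[fields[7]] = fields[5]
    return drives

def down_by_drive(outputs):
    """Map each drive name to the sorted list of servers that report it DOWN."""
    result = defaultdict(list)
    for server, text in outputs.items():
        for name, status in parse_drives(text).items():
            if status == "DOWN":
                result[name].append(server)
    return {name: sorted(servers) for name, servers in result.items()}

# A few rows copied from the listings above.
sample = {
    "server1": "drive - 1 hcart2 3 DOWN - STK.T10000D.001 /dev/rmt/8cbn\n"
               "drive - 9 hcart2 11 UP - STK.T10000D.000 /dev/rmt/12cbn",
    "server2": "drive - 3 hcart2 3 DOWN - STK.T10000D.001 /dev/nst2",
    "server3": "drive - 1 hcart2 3 UP - STK.T10000D.001 /dev/nst2",
}

print(down_by_drive(sample))
```

With the full listings, this shows that STK.T10000D.001 and STK.T10000D.002 are DOWN on both server1 and server2 but UP on server3.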
The messages log on server1 shows these errors:
Jun 2 20:20:25 server1 tldcd[18489]: [ID 702911 daemon.error] TLD(0) key = 0x4, asc = 0x53, ascq = 0x0, MEDIA LOAD OR EJECT FAILED
Jun 2 20:20:25 server1 tldcd[18489]: [ID 702911 daemon.error] TLD(0) Move_medium error
Jun 2 20:20:54 server1 ltid[2215]: [ID 702911 daemon.error] Operator/EMM server has DOWN'ed drive STK.579004003808 (device 2)
Jun 2 20:22:07 server1 avrd[2331]: [ID 702911 daemon.notice] Reservation Conflict status from STK.T10000D.000 (device 9)
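For reference, the sense data in the tldcd line can be pulled out and decoded with a small script. This is a minimal sketch: the lookup tables contain only the two codes that appear in this log (sense key 0x4 and ASC/ASCQ 0x53/0x00, as listed in the standard SCSI code tables), and the function name is my own.

```python
import re

# Only the codes seen in this log; a full decoder would need the complete SCSI tables.
SENSE_KEYS = {0x4: "HARDWARE ERROR"}
ASC_ASCQ = {(0x53, 0x00): "MEDIA LOAD OR EJECT FAILED"}

SENSE_RE = re.compile(
    r"key = (0x[0-9a-fA-F]+), asc = (0x[0-9a-fA-F]+), ascq = (0x[0-9a-fA-F]+)"
)

def decode_sense(line):
    """Return (sense key text, asc/ascq text) for a tldcd log line, or None if absent."""
    match = SENSE_RE.search(line)
    if match is None:
        return None
    key, asc, ascq = (int(value, 16) for value in match.groups())
    return (
        SENSE_KEYS.get(key, "unknown key"),
        ASC_ASCQ.get((asc, ascq), "unknown asc/ascq"),
    )

line = ("Jun 2 20:20:25 server1 tldcd[18489]: [ID 702911 daemon.error] "
        "TLD(0) key = 0x4, asc = 0x53, ascq = 0x0, MEDIA LOAD OR EJECT FAILED")
print(decode_sense(line))
```

So the robot itself reports a hardware error on the move, which matches the text tldcd already prints.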
At the same time, server2 logs these messages:
Jun 2 20:15:58 server2 kernel: st 11:0:0:0: [sg2] Warning! Received an indication that the mode parameters on this target
have changed. The Linux SCSI layer does not automatically adjust these parameters.
Jun 2 20:20:12 server2 kernel: st 12:0:2:0: [sg6] Warning! Received an indication that the mode parameters on this target
have changed. The Linux SCSI layer does not automatically adjust these parameters.
Jun 2 20:23:05 server2 kernel: st 12:0:0:0: [sg4] Warning! Received an indication that the mode parameters on this target
have changed. The Linux SCSI layer does not automatically adjust these parameters.
Jun 2 20:23:54 server2 ltid[15144]: Operator/EMM server has DOWN'ed drive STK.T10000D.001 (device 3)
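To see how the events on both servers interleave, the two logs can be merged into one timeline. The sketch below assumes syslog-style lines that start with "Mon D HH:MM:SS host", English month names, and clocks that are in sync on both servers. The year is a placeholder because the log lines carry none, wrapped lines without a leading timestamp are skipped, and the function names are my own.

```python
from datetime import datetime

ASSUMED_YEAR = 2000  # placeholder: the syslog lines above do not include a year

def parse_line(line):
    """Return (timestamp, host, message), or None if the line has no leading timestamp."""
    try:
        month, day, clock, host, message = line.split(None, 4)
        stamp = datetime.strptime(
            f"{ASSUMED_YEAR} {month} {day} {clock}", "%Y %b %d %H:%M:%S"
        )
    except ValueError:
        return None
    return stamp, host, message

def merged_timeline(*logs):
    """Merge several lists of syslog lines into one list sorted by timestamp."""
    events = [parse_line(line) for log in logs for line in log]
    return sorted((e for e in events if e is not None), key=lambda e: e[0])

# A few lines copied from the excerpts above.
server1_log = [
    "Jun 2 20:20:25 server1 tldcd[18489]: [ID 702911 daemon.error] TLD(0) Move_medium error",
    "Jun 2 20:20:54 server1 ltid[2215]: [ID 702911 daemon.error] Operator/EMM server has DOWN'ed drive STK.579004003808 (device 2)",
]
server2_log = [
    "Jun 2 20:20:12 server2 kernel: st 12:0:2:0: [sg6] Warning! Received an indication that the mode parameters on this target",
    "have changed. The Linux SCSI layer does not automatically adjust these parameters.",
    "Jun 2 20:23:54 server2 ltid[15144]: Operator/EMM server has DOWN'ed drive STK.T10000D.001 (device 3)",
]

for stamp, host, message in merged_timeline(server1_log, server2_log):
    print(stamp.strftime("%H:%M:%S"), host, message[:60])
```

In the excerpts above, the kernel warning on server2 at 20:20:12 comes just before the Move_medium error on server1 at 20:20:25.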