Several drives going DOWN in a short period with "MEDIA LOAD OR EJECT FAILED"
We run NetBackup 7.7.3 with an SL3000 tape library attached over FC. The layout is:

1. Media/master server1 (Solaris 11 SPARC): the robot and 13 tape drives from the SL3000.
2. Media server2 (SUSE ES 11): 5 tape drives from the same SL3000.
3. Media server3 (SUSE ES 11): 4 of the same SL3000 tape drives that are attached to media server2.

This worked fine for several years. But now, sometimes when backups start on server2, several drive paths go down on server2, and different drives go down on server1. The tape library support team did not find any issues on the library. At their suggestion we deleted all tape drives and added them again. Strangely, the drives that are not shared were added with their old names. This helped for two weeks, and now the problem is back.

tpconfig -l output:

Server1:

Device Robot Drive         Robot                  Drive                 Device                          Second
Type     Num Index  Type   DrNum  Status Comment  Name                  Path                            Device Path
robot      0     -  TLD        -  -      -        -                     /dev/sg/c0tw500104f000b22918l0
drive      -     0  hcart3     4  UP     -        IBM.ULTRIUM-TD6.000   /dev/rmt/7cbn
drive      -     1  hcart2     3  DOWN   -        STK.T10000D.001       /dev/rmt/8cbn
drive      -     2  hcart2     2  DOWN   -        server1_579004003808  /dev/rmt/4cbn
drive      -     3  hcart2     1  UP     -        server1_579004005455  /dev/rmt/0cbn
drive      -     4  hcart2     8  UP     -        STK.T10000D.003       /dev/rmt/3cbn
drive      -     5  hcart2     7  UP     -        server1_579004002624  /dev/rmt/1cbn
drive      -     6  hcart2     6  UP     -        server1_579004008297  /dev/rmt/5cbn
drive      -     7  hcart2     5  UP     -        server1_579004005394  /dev/rmt/6cbn
drive      -     8  hcart2    12  DOWN   -        STK.T10000D.002       /dev/rmt/2cbn
drive      -     9  hcart2    11  UP     -        STK.T10000D.000       /dev/rmt/12cbn
drive      -    10  hcart2    10  DOWN   -        server1_579004006393  /dev/rmt/11cbn
drive      -    11  hcart2     9  UP     -        server1_579004006402  /dev/rmt/10cbn
drive      -    12  hcart2    13  UP     -        server1_579004006389  /dev/rmt/9cbn

Server2:

Device Robot Drive         Robot                  Drive                 Device     Second
Type     Num Index  Type   DrNum  Status Comment  Name                  Path       Device Path
robot      0     -  TLD        -  -      -        -                     server1
drive      -     0  hcart2    12  DOWN   -        STK.T10000D.002       /dev/nst1
drive      -     1  hcart2     8  UP     -        STK.T10000D.003       /dev/nst0
drive      -     2  hcart3     4  UP     -        IBM.ULTRIUM-TD6.000   /dev/nst3
drive      -     3  hcart2     3  DOWN   -        STK.T10000D.001       /dev/nst2
drive      -     4  hcart2    11  UP     -        STK.T10000D.000       /dev/nst4

Server3:

Device Robot Drive         Robot                  Drive                 Device     Second
Type     Num Index  Type   DrNum  Status Comment  Name                  Path       Device Path
robot      0     -  TLD        -  -      -        -                     server1
drive      -     0  hcart2    11  UP     -        STK.T10000D.000       /dev/nst3
drive      -     1  hcart2     3  UP     -        STK.T10000D.001       /dev/nst2
drive      -     2  hcart2    12  UP     -        STK.T10000D.002       /dev/nst1
drive      -     3  hcart2     8  UP     -        STK.T10000D.003       /dev/nst0

In messages on server1 we have these errors:

Jun 2 20:20:25 server1 tldcd[18489]: [ID 702911 daemon.error] TLD(0) key = 0x4, asc = 0x53, ascq = 0x0, MEDIA LOAD OR EJECT FAILED
Jun 2 20:20:25 server1 tldcd[18489]: [ID 702911 daemon.error] TLD(0) Move_medium error
Jun 2 20:20:54 server1 ltid[2215]: [ID 702911 daemon.error] Operator/EMM server has DOWN'ed drive STK.579004003808 (device 2)
Jun 2 20:22:07 server1 avrd[2331]: [ID 702911 daemon.notice] Reservation Conflict status from STK.T10000D.000 (device 9)

At the same time on server2 we have these messages:

Jun 2 20:15:58 server2 kernel: st 11:0:0:0: [sg2] Warning! Received an indication that the mode parameters on this target have changed. The Linux SCSI layer does not automatically adjust these parameters.
Jun 2 20:20:12 server2 kernel: st 12:0:2:0: [sg6] Warning! Received an indication that the mode parameters on this target have changed. The Linux SCSI layer does not automatically adjust these parameters.
Jun 2 20:23:05 server2 kernel: st 12:0:0:0: [sg4] Warning! Received an indication that the mode parameters on this target have changed. The Linux SCSI layer does not automatically adjust these parameters.
Jun 2 20:23:54 server2 ltid[15144]: Operator/EMM server has DOWN'ed drive STK.T10000D.001 (device 3)
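
To see at a glance which physical drives are DOWN on more than one host, the tpconfig -l listings can be cross-checked by robot drive number (the DrNum column). Below is a minimal Python sketch of that idea, not a NetBackup tool. It assumes each drive row has the nine whitespace-separated fields seen in the output above (so a comment containing spaces would break it), and the function names and the shortened SAMPLE data are made up for illustration.

```python
from collections import defaultdict

# A few drive rows copied from the tpconfig -l listings in this post
# (shortened sample; the full output has more rows).
SAMPLE = {
    "server1": """\
drive - 1 hcart2 3 DOWN - STK.T10000D.001 /dev/rmt/8cbn
drive - 2 hcart2 2 DOWN - server1_579004003808 /dev/rmt/4cbn
drive - 8 hcart2 12 DOWN - STK.T10000D.002 /dev/rmt/2cbn
drive - 9 hcart2 11 UP - STK.T10000D.000 /dev/rmt/12cbn
""",
    "server2": """\
drive - 0 hcart2 12 DOWN - STK.T10000D.002 /dev/nst1
drive - 3 hcart2 3 DOWN - STK.T10000D.001 /dev/nst2
drive - 4 hcart2 11 UP - STK.T10000D.000 /dev/nst4
""",
}


def parse_drives(tpconfig_text):
    """Return (robot_drive_number, drive_name, status) for each drive row.

    Assumed field order: type, robot num, index, drive type, DrNum,
    status, comment, drive name, device path.
    """
    drives = []
    for line in tpconfig_text.splitlines():
        fields = line.split()
        # Header and robot rows are skipped: they do not start with "drive"
        # or have fewer than nine fields.
        if len(fields) >= 9 and fields[0] == "drive":
            drives.append((int(fields[4]), fields[7], fields[5]))
    return drives


def down_by_drive_number(outputs):
    """Map robot drive number -> list of (host, drive_name) where it is DOWN."""
    down = defaultdict(list)
    for host, text in outputs.items():
        for drnum, name, status in parse_drives(text):
            if status == "DOWN":
                down[drnum].append((host, name))
    return dict(down)


def shared_down_drives(outputs):
    """Keep only drive numbers that are DOWN on more than one host."""
    return {n: hosts for n, hosts in down_by_drive_number(outputs).items()
            if len(hosts) > 1}


if __name__ == "__main__":
    for drnum, hosts in sorted(shared_down_drives(SAMPLE).items()):
        print(drnum, hosts)
```

On the sample rows this singles out robot drive numbers 3 and 12, the two shared T10000D drives that are DOWN on both server1 and server2. Robot drive 2 is DOWN on server1 only.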