Dear All,
I have been facing a problem with my NetBackup setup for the last couple of days. The setup is an SL500 tape library (with four LTO4 drives) and two media servers, with two drives assigned to each media server. Everything was working fine until, suddenly, on one of the media servers the drives started going down after every backup completes. We have to bring the drive up manually before the next backup can proceed, and the same thing happens again once that backup completes. We have already logged cases with our vendors, SUN and Symantec, but it does not seem that the problem can be resolved. On Symantec's advice, the tape library firmware, the fibre card firmware and the drive firmware were upgraded to the latest versions, but the problem still persists. Symantec insists that the drive is faulty, while SUN is of the opinion that it is a bug in the software, as they have already replaced both drives.
The OS is Solaris 10 with NetBackup version 6.5.3.1. The messages below report the drive sense key as Media Error; both drives have been replaced, but the problem remains the same.
Can anyone please help resolve the issue? The /var/adm/messages log is given below:
Jun 26 14:58:55 prodback17 scsi: [ID 107833 kern.warning] WARNING:
/pci@500/pci@0/pci@d/SUNW,qlc@0/fp@0,0/st@w500104f000b7a5cc,0 (st4):
Jun 26 14:58:55 prodback17 Error for Command: space Error Level: Fatal
Jun 26 14:58:55 prodback17 scsi: [ID 107833 kern.notice] Requested Block: 1 Error Block: 1
Jun 26 14:58:55 prodback17 scsi: [ID 107833 kern.notice] Vendor: HP Serial Number:
Jun 26 14:58:55 prodback17 scsi: [ID 107833 kern.notice] Sense Key: Media Error
Jun 26 14:58:55 prodback17 scsi: [ID 107833 kern.notice] ASC: 0x14 (recorded entity not found), ASCQ: 0x0, FRU: 0x0
Jun 26 14:59:00 prodback17 ltid[9581]: [ID 560358 daemon.notice] LTID - Sent ROBOTIC request, Type=3, Param2=1
Jun 26 14:59:00 prodback17 tldd[9598]: [ID 433661 daemon.notice] TLD(0) DismountTape L40582 from drive 4
Jun 26 15:01:30 prodback17 tldd[9598]: [ID 858974 daemon.notice] DecodeDismount: TLD(0) drive 4, Actual status: STATUS_SUCCESS
Jun 26 15:01:35 prodback17 ltid[9581]: [ID 133575 daemon.error] Operator/EMM server has DOWN'ed drive HP.ULTRIUM4-SCSI.4 (device 1)
Jun 26 15:02:56 prodback17 ltid[9581]: [ID 656834 daemon.notice] Operator/EMM server has UP'ed drive HP.ULTRIUM4-SCSI.4 (device 1)
Jun 26 15:06:58 prodback17 ltid[9581]: [ID 527589 daemon.notice] LTID - Sent ROBOTIC request, Type=1, Param2=0
Jun 26 15:06:58 prodback17 tldd[9598]: [ID 132060 daemon.notice] TLD(0) MountTape L41459 on drive 4, from slot 55
Jun 26 15:07:48 prodback17 tldd[9598]: [ID 543809 daemon.notice] DecodeMount: TLD(0) drive 4, Actual status: STATUS_SUCCESS
Jun 26 15:07:49 prodback17 ltid[9581]: [ID 429237 daemon.notice] LTID - received ROBOT MESSAGE, Type=54, LongParam=0, Param1=1, Param2=0
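
For anyone who wants to line up the SCSI errors against the drive DOWN/UP events over a longer stretch of the log, here is a minimal Python sketch. It assumes only the message formats visible in the excerpt above; the function name extract_events and the regular expressions are illustrative, not part of NetBackup or Solaris, and other message layouts are not handled.

```python
import re

# Patterns derived only from the syslog lines quoted in this post (assumption:
# other hosts or versions may format these messages differently).
TIME_RE = re.compile(r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)")
SENSE_RE = re.compile(r"Sense Key:\s*(.+?)\s*$")
ASC_RE = re.compile(r"ASC:\s*(0x[0-9a-fA-F]+)\s*\(([^)]*)\)")
DRIVE_RE = re.compile(r"has (DOWN|UP)'ed drive (\S+)")


def extract_events(lines):
    """Return (timestamp, kind, detail) tuples for sense-key, ASC and
    drive DOWN/UP messages, in the order they appear."""
    events = []
    for line in lines:
        stamp = TIME_RE.match(line)
        when = stamp.group(1) if stamp else None

        m = SENSE_RE.search(line)
        if m:
            events.append((when, "sense", m.group(1)))
            continue

        m = ASC_RE.search(line)
        if m:
            events.append((when, "asc", f"{m.group(1)} {m.group(2)}"))
            continue

        m = DRIVE_RE.search(line)
        if m:
            events.append((when, m.group(1).lower(), m.group(2)))
    return events
```

It could be run over the whole file with something like `extract_events(open("/var/adm/messages"))` and the result printed one tuple per line, which makes it easier to see whether every DOWN event is preceded by a Media Error on the same drive.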