
Drives going down after every backup

Dnyanesh
Level 3
Dear all,

I have been facing a problem with my NetBackup setup for the last couple of days. The setup is an SL500 tape library (with four LTO4 drives) and two media servers, with two drives assigned to each media server. Everything was working fine until, suddenly, on one of the media servers the drives started going down after every backup completes. We have to bring the drive up manually before the next backup can proceed, and the same thing happens again once that backup completes. We have already logged cases with our vendors, Sun and Symantec, but it does not seem that the problem can be resolved. On Symantec's advice, the tape library firmware, the Fibre Channel card firmware, and the drive firmware were upgraded to the latest versions, but the problem persists. Symantec insists that the drive is faulty, while Sun is of the opinion that it is a software bug, since they have already replaced both drives.

The OS is Solaris 10 with NetBackup version 6.5.3.1. The messages below report the drive sense key as Media Error, yet both drives have been replaced and the problem is still the same.

Can anyone please help resolve the issue? The /var/adm/messages log is given below:

 Jun 26 14:58:55 prodback17 scsi: [ID 107833 kern.warning] WARNING: /pci@500/pci@0/pci@d/SUNW,qlc@0/fp@0,0/st@w500104f000b7a5cc,0 (st4):
Jun 26 14:58:55 prodback17      Error for Command: space                   Error Level: Fatal
Jun 26 14:58:55 prodback17 scsi: [ID 107833 kern.notice]        Requested Block: 1                         Error Block: 1
Jun 26 14:58:55 prodback17 scsi: [ID 107833 kern.notice]        Vendor: HP                                 Serial Number:
Jun 26 14:58:55 prodback17 scsi: [ID 107833 kern.notice]        Sense Key: Media Error
Jun 26 14:58:55 prodback17 scsi: [ID 107833 kern.notice]        ASC: 0x14 (recorded entity not found), ASCQ: 0x0, FRU: 0x0
Jun 26 14:59:00 prodback17 ltid[9581]: [ID 560358 daemon.notice] LTID - Sent ROBOTIC request, Type=3, Param2=1
Jun 26 14:59:00 prodback17 tldd[9598]: [ID 433661 daemon.notice] TLD(0) DismountTape L40582 from drive 4
Jun 26 15:01:30 prodback17 tldd[9598]: [ID 858974 daemon.notice] DecodeDismount: TLD(0) drive 4, Actual status: STATUS_SUCCESS
Jun 26 15:01:35 prodback17 ltid[9581]: [ID 133575 daemon.error] Operator/EMM server has DOWN'ed drive HP.ULTRIUM4-SCSI.4 (device 1)

Jun 26 15:02:56 prodback17 ltid[9581]: [ID 656834 daemon.notice] Operator/EMM server has UP'ed drive HP.ULTRIUM4-SCSI.4 (device 1)
Jun 26 15:06:58 prodback17 ltid[9581]: [ID 527589 daemon.notice] LTID - Sent ROBOTIC request, Type=1, Param2=0
Jun 26 15:06:58 prodback17 tldd[9598]: [ID 132060 daemon.notice] TLD(0) MountTape L41459 on drive 4, from slot 55
Jun 26 15:07:48 prodback17 tldd[9598]: [ID 543809 daemon.notice] DecodeMount: TLD(0) drive 4, Actual status: STATUS_SUCCESS
Jun 26 15:07:49 prodback17 ltid[9581]: [ID 429237 daemon.notice] LTID - received ROBOT MESSAGE, Type=54, LongParam=0, Param1=1, Param2=0
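
For anyone wanting to check a longer /var/adm/messages file for the same pattern, the following is a minimal Python sketch that pairs each drive DOWN event with the most recent SCSI sense data logged before it. It is not a NetBackup tool: the function name `correlate_down_events` and the regular expressions are assumptions derived only from the log lines quoted in this post, and other drivers or OS releases may format these lines differently.

```python
import re

# Patterns are assumptions based only on the log lines quoted in this thread.
TIME_RE = re.compile(r"^\s*(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})")
SENSE_RE = re.compile(r"Sense Key:\s*(.+?)\s*$")
ASC_RE = re.compile(r"ASC:\s*(0x[0-9a-fA-F]+)\s*\(([^)]*)\)")
DOWN_RE = re.compile(r"has DOWN'ed drive (\S+)")


def correlate_down_events(lines):
    """Return one dict per DOWN event, with the last sense data seen before it."""
    events = []
    sense_key = asc = asc_text = None
    for line in lines:
        sense = SENSE_RE.search(line)
        if sense:
            # A new sense key starts a new error report; forget the old ASC.
            sense_key, asc, asc_text = sense.group(1), None, None
            continue
        asc_match = ASC_RE.search(line)
        if asc_match:
            asc, asc_text = asc_match.group(1), asc_match.group(2)
            continue
        down = DOWN_RE.search(line)
        if down:
            stamp = TIME_RE.match(line)
            events.append({
                "time": stamp.group(1) if stamp else None,
                "drive": down.group(1),
                "sense_key": sense_key,
                "asc": asc,
                "asc_text": asc_text,
            })
            # Reset so a later DOWN is not blamed on this error.
            sense_key = asc = asc_text = None
    return events


# Three lines copied from the log above, used as sample input.
SAMPLE = [
    "Jun 26 14:58:55 prodback17 scsi: [ID 107833 kern.notice]        Sense Key: Media Error",
    "Jun 26 14:58:55 prodback17 scsi: [ID 107833 kern.notice]        ASC: 0x14 (recorded entity not found), ASCQ: 0x0, FRU: 0x0",
    "Jun 26 15:01:35 prodback17 ltid[9581]: [ID 133575 daemon.error] Operator/EMM server has DOWN'ed drive HP.ULTRIUM4-SCSI.4 (device 1)",
]

for event in correlate_down_events(SAMPLE):
    print(event)
```

To run it against the real log, pass an open file instead of `SAMPLE`, for example `correlate_down_events(open("/var/adm/messages"))`. If every DOWN event follows a Media Error on a `space` command regardless of which tape or drive was used, that points away from a single bad cartridge or drive.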
3 REPLIES

Marianne
Level 6
Partner    VIP    Accredited Certified

http://seer.entsupport.symantec.com/docs/320980.htm
Extract:
Fix: HP drive Firmware H4BS/012.525 is the current version of firmware.

Rajesh_s1
Level 6
Certified
Try reconfiguring the drives once again and check. Also verify whether a cleaning alert is showing on the drive; if so, clean the drive and check again.

shankar369
Level 3
Certified

If you suspect a drive problem, try running an OS-level backup of more than 20 GB with ufsdump or tar. If the ufsdump or tar backup completes successfully, it is not a tape drive problem.
Then reconfigure the drive once from NBU and try the backup again.
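
The OS-level test can be scripted if it needs to be repeated on both drives. The Python sketch below builds a plain `tar cvf` command that writes a directory straight to the tape device, bypassing NetBackup. The device path `/dev/rmt/0cbn` and the source directory `/export/home` are placeholders, not values from this thread, and the helper names are hypothetical. A tape must already be loaded in the drive, and the drive should not be in use by NetBackup while the test runs.

```python
import subprocess


def build_tar_test(device, source_dir):
    """Build a tar command that writes source_dir directly to the tape device."""
    return ["tar", "cvf", device, source_dir]


def run_tar_test(device, source_dir):
    """Run the tar write test; return (succeeded, stderr text)."""
    result = subprocess.run(
        build_tar_test(device, source_dir),
        stdout=subprocess.DEVNULL,  # the verbose file list is not needed
        stderr=subprocess.PIPE,
    )
    return result.returncode == 0, result.stderr.decode(errors="replace")


# Placeholder device and directory: substitute the real ones for the drive under test.
cmd = build_tar_test("/dev/rmt/0cbn", "/export/home")
print(" ".join(cmd))
```

Only `build_tar_test` is called here, so nothing is written to tape until `run_tar_test` is invoked deliberately. Pick a source directory larger than 20 GB so that the drive streams for long enough to expose a fault.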