Tape drives going down ...suspect SAN ???
Why I think it's the SAN
Two identical libraries (STK SL3000, 5x HP LTO5) with all tape drives shared amongst the media servers. Only one ATL has problems with drives being downed.
- All media servers have HBA static binding ...no issues there
- Nothing wrong with the Windows OS tape or robotic drivers ...if there were, both ATLs would be affected.
- No errors could be found in the library logs. Everything looked normal from a physical inspection of the ATL.
- Cleaning tape drives is automated and infrequent.... a manual clean didn't help.
- Power cycling tape drives didn't help. That would have cleared any SCSI locks.
- Wasted a lot of time in the BPTM logs ...they are barely human readable ..leave that one up to Support.
- Only one tape library affected. If the problem was on the media server then the second tape library would also be affected.
The ATL has been behaving itself for the last three years. First one tape drive became a problem, and now it's spreading to the other tape drives. With the ATL looking and reporting as OK, the media servers OK, and only one ATL affected, can I assume the problem is very likely to be the SAN?
Assumptions
- The robotic control SAN path is responsible only for tape movements, library inventories and possibly library status reporting.
- The tape drive SAN connection carries the data path, tape control (positioning) and tape status reporting.
If these assumptions are true, then errors on each of the tape drive SAN connections lead me to assume a SAN issue. If all the reporting is via the robotic SAN path and the tape drive connection is data only, then it could be a library problem, a SAN problem, or both.
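To make the elimination reasoning above concrete, here is a rough Python sketch of it. The component names (media1, hba1, san-to-ATL1 and so on) are made up for illustration and are not my real topology, and the function name is just something I picked. The idea: anything that sits on every failing path but on no healthy path stays a suspect.

```python
def common_components(failed_paths, healthy_paths):
    """Return components found on every failing path and on no healthy path."""
    if not failed_paths:
        return set()
    # Start with what all failing paths have in common.
    suspects = set.intersection(*(set(p) for p in failed_paths))
    # Anything that also sits on a working path is cleared.
    for path in healthy_paths:
        suspects -= set(path)
    return suspects


# Hypothetical paths: media server, HBA, SAN segment, library.
failed = [
    {"media1", "hba1", "san-to-ATL1", "ATL1"},
    {"media2", "hba2", "san-to-ATL1", "ATL1"},
]
healthy = [
    {"media1", "hba1", "san-to-ATL2", "ATL2"},
    {"media2", "hba2", "san-to-ATL2", "ATL2"},
]

suspects = common_components(failed, healthy)
print(sorted(suspects))  # ['ATL1', 'san-to-ATL1']
```

With this made-up layout the media servers and HBAs are cleared, but the SAN segment to the bad ATL and the ATL itself both remain, so elimination alone cannot separate the two.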
Have you checked the Event Viewer System log?
The fix "appears" to be simple......the library was rebooted. It had been running for ~4yrs without a reboot. So far it hasn't reported any issues after 12 hrs. Giving it a couple of days will confirm the solution.