02-10-2010 03:31 AM
Hi all,
NBU 6.5.5
ACSLS 7.3
OS : Solaris
I have an intermittent issue with drives going DOWN when a dismount request is submitted. I have gone through all the usual troubleshooting steps with no joy:
Upgraded the firmware on both the SL8500 and the T10K drives
Updated the EMM mappings version
Deleted and recreated the drives (rm -f /dev/rmt*, devfsadm, reconfigure the sg driver, sgscan ..)
Had the faulty drives replaced by the vendor
Set the Media unmount delay value to 60
Here is an excerpt of the messages:
Feb 10 06:01:55 admatriubu01 acsd[5746]: [ID 168411 daemon.error] ACS(2) dismount failure for volume TA0098 on drive (2,1,1,15), ACS status = 56, STATUS_LIBRARY_FAILURE
Feb 10 06:02:30 admatriubu01 acsd[5878]: [ID 168411 daemon.error] ACS(2) dismount failure for volume TA0098 on drive (2,1,1,15), ACS status = 56, STATUS_LIBRARY_FAILURE
Feb 10 06:20:50 admatriubu01 acsd[9563]: [ID 168411 daemon.error] ACS(2) dismount failure for volume TA0098 on drive (2,1,1,15), ACS status = 56, STATUS_LIBRARY_FAILURE
Feb 10 06:50:09 admatriubu01 acsd[14669]: [ID 498531 daemon.error] user scsi ioctl() failed, may be timeout, errno = 5, I/O error
Feb 10 06:54:56 admatriubu01 acsd[14669]: [ID 905004 daemon.error] ACS(2) dismount failure for volume TA0255 on drive (2,2,1,15), ACS status = 29, STATUS_DRIVE_IN_USE
Feb 10 06:54:56 admatriubu01 acsd[14669]: [ID 756643 daemon.error] ACS(2) waiting to resubmit dismount request (attempt 2) for volume TA0255 on drive (2,2,1,15)
Feb 10 07:02:13 admatriubu01 acsd[14669]: [ID 905004 daemon.error] ACS(2) dismount failure for volume TA0255 on drive (2,2,1,15), ACS status = 29, STATUS_DRIVE_IN_USE
Feb 10 07:02:13 admatriubu01 acsd[14669]: [ID 338898 daemon.error] ACS(2) waiting to resubmit dismount request (attempt 3) for volume TA0255 on drive (2,2,1,15)
Feb 10 07:09:30 admatriubu01 acsd[14669]: [ID 905004 daemon.error] ACS(2) dismount failure for volume TA0255 on drive (2,2,1,15), ACS status = 29, STATUS_DRIVE_IN_USE
Feb 10 07:09:30 admatriubu01 acsd[14669]: [ID 821134 daemon.error] ACS(2) waiting to resubmit dismount request (attempt 4) for volume TA0255 on drive (2,2,1,15)
Feb 10 07:16:47 admatriubu01 acsd[14669]: [ID 905004 daemon.error] ACS(2) dismount failure for volume TA0255 on drive (2,2,1,15), ACS status = 29, STATUS_DRIVE_IN_USE
Feb 10 07:16:47 admatriubu01 acsd[14669]: [ID 403389 daemon.error] ACS(2) waiting to resubmit dismount request (attempt 5) for volume TA0255 on drive (2,2,1,15)
Feb 10 07:20:29 admatriubu01 acsd[11585]: [ID 446325 daemon.error] ACS(2) going to DOWN state, status: Timeout waiting for robotic command
Feb 10 07:22:31 admatriubu01 acsd[11585]: [ID 964522 daemon.notice] ACS(2) going to UP state
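To see at a glance which drives fail and with which ACS status, the dismount failure lines can be tallied with a short script. This is only a rough sketch: the regular expression is modelled on the lines in the excerpt above, the function name `tally_dismount_failures` is made up for illustration, and reading from `/var/adm/messages` is an assumption about where syslog writes on this host.

```python
import re
from collections import Counter

# Modelled on the "dismount failure" lines in the excerpt above.
DISMOUNT_RE = re.compile(
    r"dismount failure for volume (?P<volume>\S+) on drive "
    r"\((?P<drive>[\d,]+)\), ACS status = (?P<code>\d+), (?P<status>\w+)"
)

def tally_dismount_failures(lines):
    """Count dismount failures per (drive, status) pair (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = DISMOUNT_RE.search(line)
        if match:
            counts[(match.group("drive"), match.group("status"))] += 1
    return counts

# Three lines copied from the excerpt; in practice the input would be the
# syslog file itself (assumed here to be /var/adm/messages).
sample = [
    "Feb 10 06:01:55 admatriubu01 acsd[5746]: [ID 168411 daemon.error] ACS(2) dismount failure for volume TA0098 on drive (2,1,1,15), ACS status = 56, STATUS_LIBRARY_FAILURE",
    "Feb 10 06:54:56 admatriubu01 acsd[14669]: [ID 905004 daemon.error] ACS(2) dismount failure for volume TA0255 on drive (2,2,1,15), ACS status = 29, STATUS_DRIVE_IN_USE",
    "Feb 10 07:02:13 admatriubu01 acsd[14669]: [ID 905004 daemon.error] ACS(2) dismount failure for volume TA0255 on drive (2,2,1,15), ACS status = 29, STATUS_DRIVE_IN_USE",
]

for (drive, status), n in sorted(tally_dismount_failures(sample).items()):
    print(f"drive ({drive})  {status}: {n}")
```

Run against the full log, a tally like this may help show whether the failures cluster on particular drive slots or are spread across the library.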
Any help will be much appreciated
Anthony
02-10-2010 04:28 AM
02-10-2010 05:50 AM
02-10-2010 07:33 AM
Marianne,
Thank you for your advice.
Here are the entries from acsss_event.log logged while I tried to dismount a volume:
2010-02-10 16:20:21 DISMOUNT[0]:
546 N cl_log_lh_er.c 1 99
dm_lh_lib_fail: LH error type = LH_ERR_TRANSPORT_FAILURE
2010-02-10 16:20:21 ACSSA[0]:
1431 N sa_demux.c 1 296
drive 2, 2, 1, 3: Library error, Transport failure.
2010-02-10 16:20:37 DISMOUNT[0]:
971 N mt_action_dm.c 1 1272
dm_lh_drive_busy: LH error type = LH_ERR_TRANSPORT_BUSY 2, 1, 1,14
2010-02-10 16:23:02 command process[0]:
1283 N cp_sm_error.c 1 492
cp_sm_error, line: 491, Invalid state machine state: CPS_LOGOFF, status
STATUS_PROCESS_FAILURE
2010-02-10 16:23:02 command process[0]:
1283 N cp_sm_error.c 1 492
cp_sm_error, line: 491, Invalid state machine state: CPS_LOGOFF, status
STATUS_PROCESS_FAILURE
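The library handler (LH) error types in these entries can be pulled out together with their timestamps, which makes it easier to line them up against the acsd messages on the NBU side. The following is a sketch only: it assumes each entry starts with a `date time COMPONENT[n]:` header line as in the excerpt above, and `lh_errors` is a hypothetical helper name.

```python
import re

# Entry header as seen in the excerpt, e.g. "2010-02-10 16:20:21 DISMOUNT[0]:"
HEADER_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (.+)\[\d+\]:$")
LH_ERR_RE = re.compile(r"LH error type = (LH_ERR_\w+)")

def lh_errors(lines):
    """Return (timestamp, component, lh_error) tuples (hypothetical helper)."""
    results = []
    stamp = component = None
    for raw in lines:
        line = raw.strip()
        header = HEADER_RE.match(line)
        if header:
            # Remember the most recent entry header for the lines that follow.
            stamp, component = header.groups()
            continue
        err = LH_ERR_RE.search(line)
        if err and stamp is not None:
            results.append((stamp, component, err.group(1)))
    return results

# Lines copied from the excerpt above.
sample_events = [
    "2010-02-10 16:20:21 DISMOUNT[0]:",
    "546 N cl_log_lh_er.c 1 99",
    "dm_lh_lib_fail: LH error type = LH_ERR_TRANSPORT_FAILURE",
    "2010-02-10 16:20:37 DISMOUNT[0]:",
    "971 N mt_action_dm.c 1 1272",
    "dm_lh_drive_busy: LH error type = LH_ERR_TRANSPORT_BUSY 2, 1, 1,14",
]

for stamp, component, error in lh_errors(sample_events):
    print(stamp, component, error)
```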
02-10-2010 07:56 AM
02-10-2010 08:01 AM
02-10-2010 08:35 AM
The SL Console is showing 2 drives whose Health and Device State are both in ERROR. I have lost count of the number of times the drives have been replaced after this was reported in the SL Console, which leads me to believe it cannot be a HW issue. I suspect a configuration issue, either in NBU or at the OS level.
Since the start of the year,
cmd_proc
Copyright 2008 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.
----------------------------------ACSLS 7.3.0----------------------------------
Identifier  State   Free Cell  Audit  Mount  Dismount  Enter  Eject
                    Count      C/P    C/P    C/P       C/P    C/P
2, 0        online  0          0/0    0/0    0/0       0/0    0/0
2, 1        online  1867       0/0    0/0    1/0       0/0    0/0
2, 2        online  1871       0/0    0/0    0/0       0/0    0/0
2, 3        online  0          0/0    0/0    0/0       0/0    0/0
3, 0        online  0          0/0    0/0    0/0       0/0    0/0
3, 1        online  1424       0/0    0/0    0/0       0/0    0/0
3, 2        online  2318       0/0    0/0    0/0       0/0    0/0
3, 3        online  0          0/0    0/0    0/0       0/0    0/0
ACSSA>
One more piece of information that may help: there are 18 media servers sharing 6 tape drives. I am currently implementing VTL to alleviate this issue.
02-10-2010 09:14 AM
02-10-2010 10:49 AM
02-10-2010 11:19 AM
02-10-2010 12:10 PM
02-11-2010 06:16 AM
An STK engineer was onsite for a health check and advised me that it could very well be a drive firmware related issue.
Further investigation of the robot dump file should confirm it.
I will keep you posted on progress.
Many thanks again for all your advice
Anthony
02-11-2010 08:14 AM