Troubleshooting a "failed" disk
Hello all,
First things first, I have looked at this article to try and solve my problem, without success.
My server is configured this way:
SunOS hunahpu 5.9 Generic_122300-45 sun4u sparc SUNW,Netra-T12
System Configuration: Sun Microsystems sun4u Sun Fire E2900
System clock frequency: 150 MHZ
Memory size: 49152 Megabytes
pkginfo -l VRTSvxvm
PKGINST: VRTSvxvm
NAME: Binaries for VERITAS Volume Manager by Symantec
CATEGORY: system
ARCH: sparc
VERSION: 5.0,REV=05.11.2006.17.55
BASEDIR: /
VENDOR: Symantec Corporation
DESC: Virtual Disk Subsystem
PSTAMP: Veritas-5.0-MP3.50:2008-08-17
INSTDATE: Nov 09 2009 18:41
HOTLINE: http://support.veritas.com/phonesup/phonesup_ddProduct_.htm
EMAIL: support@veritas.com
STATUS: completely installed
FILES: 1007 installed pathnames
37 shared pathnames
19 linked files
111 directories
443 executables
435179 blocks used (approx)
The server is connected to a NetApp FAS 3070 cluster.
On Saturday night we had to do UPS maintenance and powered off the servers. When I powered this server back up, we saw the following error messages from VxVM:
May 5 02:10:24 hunahpu vxdmp: [ID 803759 kern.notice] NOTICE: VxVM vxdmp V-5-0-34 added disk array 3090882, datype = FAS3070
May 5 02:10:24 hunahpu vxdmp: [ID 803759 kern.notice] NOTICE: VxVM vxdmp V-5-0-34 added disk array DISKS, datype = Disk
May 5 02:10:24 hunahpu vxdmp: [ID 415628 kern.notice] NOTICE: VxVM vxdmp V-5-3-1700 dmpnode 275/0x0 has migrated from enclosure FAKE_ENCLR_SNO to enclosure DISKS
May 5 02:10:24 hunahpu vxdmp: [ID 917986 kern.notice] NOTICE: VxVM vxdmp V-5-0-112 disabled path 32/0x3f58 belonging to the dmpnode 275/0x28
May 5 02:10:24 hunahpu vxdmp: [ID 917986 kern.notice] NOTICE: VxVM vxdmp V-5-0-112 disabled path 32/0x15e0 belonging to the dmpnode 275/0x28
May 5 02:10:24 hunahpu vxdmp: [ID 917986 kern.notice] NOTICE: VxVM vxdmp V-5-0-112 disabled path 32/0x1d68 belonging to the dmpnode 275/0x28
May 5 02:10:24 hunahpu vxdmp: [ID 917986 kern.notice] NOTICE: VxVM vxdmp V-5-0-112 disabled path 32/0x2638 belonging to the dmpnode 275/0x28
May 5 02:10:24 hunahpu vxdmp: [ID 824220 kern.notice] NOTICE: VxVM vxdmp V-5-0-111 disabled dmpnode 275/0x28
May 5 02:10:57 hunahpu vxdmp: [ID 736771 kern.notice] NOTICE: VxVM vxdmp V-5-0-148 enabled path 32/0x15e0 belonging to the dmpnode 275/0x28
May 5 02:10:57 hunahpu vxdmp: [ID 899070 kern.notice] NOTICE: VxVM vxdmp V-5-0-147 enabled dmpnode 275/0x28
May 5 02:10:57 hunahpu vxdmp: [ID 736771 kern.notice] NOTICE: VxVM vxdmp V-5-0-148 enabled path 32/0x2638 belonging to the dmpnode 275/0x28
May 5 02:10:57 hunahpu vxdmp: [ID 736771 kern.notice] NOTICE: VxVM vxdmp V-5-0-148 enabled path 32/0x1d68 belonging to the dmpnode 275/0x28
May 5 02:10:57 hunahpu vxdmp: [ID 736771 kern.notice] NOTICE: VxVM vxdmp V-5-0-148 enabled path 32/0x3f58 belonging to the dmpnode 275/0x28
May 5 02:10:57 hunahpu vxdmp: [ID 917986 kern.notice] NOTICE: VxVM vxdmp V-5-0-112 disabled path 32/0x3f58 belonging to the dmpnode 275/0x28
May 5 02:10:57 hunahpu vxdmp: [ID 917986 kern.notice] NOTICE: VxVM vxdmp V-5-0-112 disabled path 32/0x15e0 belonging to the dmpnode 275/0x28
May 5 02:10:57 hunahpu vxdmp: [ID 917986 kern.notice] NOTICE: VxVM vxdmp V-5-0-112 disabled path 32/0x1d68 belonging to the dmpnode 275/0x28
May 5 02:10:57 hunahpu vxdmp: [ID 917986 kern.notice] NOTICE: VxVM vxdmp V-5-0-112 disabled path 32/0x2638 belonging to the dmpnode 275/0x28
May 5 02:10:57 hunahpu vxdmp: [ID 824220 kern.notice] NOTICE: VxVM vxdmp V-5-0-111 disabled dmpnode 275/0x28
May 5 02:11:00 hunahpu vxvm:vxconfigd: [ID 702911 daemon.warning] V-5-1-546 Disk capadg01 in group capadg: Disk device not found
May 5 02:11:00 hunahpu vxesd[474]: [ID 360244 daemon.notice] Event Source daemon started
May 5 02:11:01 hunahpu vxesd[474]: [ID 342331 daemon.notice] HBA API Library Loaded
May 5 02:11:02 hunahpu vxdmp: [ID 238993 kern.notice] NOTICE: VxVM vxdmp 0 dmp_tur_temp_pgr: open failed: error = 6 dev=0x113/0x2a
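To make the sequence above easier to read, a small filter along these lines could tally the "enabled path" and "disabled path" notices per dmpnode. This is only a sketch: the function name dmp_summary is made up for illustration, and it assumes the message format shown in the excerpt above (dmpnode as the last field of the line).

```shell
#!/bin/sh
# Hypothetical helper (not part of VxVM): count vxdmp path events per dmpnode.
# Reads syslog lines on stdin and prints "<dmpnode> <state> <count>".
# Assumption: the dmpnode is the last field, as in the excerpt above.
# On Solaris, nawk or /usr/xpg4/bin/awk may be needed instead of /usr/bin/awk.
dmp_summary() {
    awk '
        / enabled path /  { n[$NF " enabled"]++ }
        / disabled path / { n[$NF " disabled"]++ }
        END { for (k in n) print k, n[k] }
    ' | sort
}
```

It could be run as `dmp_summary < /var/adm/messages`. Against the excerpt above it would count 8 disabled and 4 enabled path events for dmpnode 275/0x28, which matches what the log shows: every path came back once at 02:10:57 and was disabled again in the same second.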
And indeed, the disk "capadg01" shows up as failed:
juanca@hunahpu / # vxdisk -o alldgs list
DEVICE     TYPE          DISK      GROUP   STATUS
c1t0d0s2   auto:none     -         -       online invalid
c9t0d92s2  auto:cdsdisk  capadg08  capadg  online nohotuse
c9t0d93s2  auto:cdsdisk  capadg07  capadg  online nohotuse
c9t0d94s2  auto:cdsdisk  capadg06  capadg  online nohotuse
c9t0d95s2  auto:cdsdisk  capadg05  capadg  online nohotuse
c9t0d96s2  auto:cdsdisk  capadg03  capadg  online nohotuse
c9t0d97s2  auto:cdsdisk  capadg04  capadg  online nohotuse
c9t0d98s2  auto:cdsdisk  capadg02  capadg  online nohotuse
c9t0d99    auto          -         -       error
-          -             capadg01  capadg  failed nohotuse was:c9t0d99s2
The weird thing is that the server never lost connectivity to the SAN, and all the other LUNs are OK and without errors.
When I looked at the disk with format, it said it didn't have a label. So I labeled it, but the label appears to be incorrect (there's no partition 2):
partition> pr
Current partition table (original):
Total disk sectors available: 2201993182 + 16384 (reserved sectors)
Part  Tag         Flag  First Sector  Size      Last Sector
0     root        wm    34            128.00MB  262177
1     swap        wu    262178        128.00MB  524321
2     unassigned  wm    0             0         0
3     unassigned  wm    0             0         0
4     unassigned  wm    0             0         0
5     unassigned  wm    0             0         0
6     usr         wm    524322        1.03TB    2201993181
8     reserved    wm    2201993182    8.00MB    2202009565
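As a quick sanity check on the sizes format reports, the sector count can be converted by hand. The sketch below assumes 512-byte sectors (the output does not state the sector size) and uses a made-up helper name, sectors_to_tib. It goes through awk because these numbers overflow 32-bit shell arithmetic.

```shell
#!/bin/sh
# Hypothetical helper: convert a sector count to TiB, two decimal places.
# Assumption: 512-byte sectors.
sectors_to_tib() {
    awk -v s="$1" 'BEGIN { printf "%.2f\n", s * 512 / (1024 * 1024 * 1024 * 1024) }'
}
```

For example, `sectors_to_tib 2201993182` prints 1.03, in line with the 1.03TB that format shows for partition 6, so the LUN itself is a little over 1 TB.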
I believe this is the reason why VxVM won't reattach the disk, but I have been unable to fix the disk's label. I also find it very strange that the partition table prints differently from that of a healthy disk:
partition> pr
Current partition table (original):
Total disk cylinders available: 9803 + 2 (reserved cylinders)
Part  Tag         Flag  Cylinders  Size      Blocks
0     unassigned  wm    0          0         (0/0/0)     0
1     unassigned  wm    0          0         (0/0/0)     0
2     backup      wu    0 - 9802   899.88GB  (9803/0/0)  1887195136
3     unassigned  wm    0          0         (0/0/0)     0
4     unassigned  wm    0          0         (0/0/0)     0
5     unassigned  wm    0          0         (0/0/0)     0
6     unassigned  wm    0          0         (0/0/0)     0
7     -           wu    0 - 9802   899.88GB  (9803/0/0)  1887195136
The bad disk prints out First Sector, Size and Last Sector, while the good disk prints out Cylinders, Size and Blocks.
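For reference, format prints a sector-based table (First Sector/Last Sector, plus a reserved partition 8) for a disk carrying an EFI (GPT) label, and a cylinder-based table for a disk carrying a traditional SMI (VTOC) label, so the two printouts above suggest the two disks now carry different label types. A rough way to tell them apart from saved format output might look like this. It is a sketch only: label_kind is a made-up name, and it relies solely on the "Total disk ... available" header lines shown above.

```shell
#!/bin/sh
# Hypothetical helper: guess the label type from the text of format's
# "partition> print" output, read on stdin.
# Assumption: the header lines look like the two printouts quoted above.
label_kind() {
    awk '
        /^Total disk sectors available/   { kind = "EFI" }
        /^Total disk cylinders available/ { kind = "SMI" }
        END { print (kind == "" ? "unknown" : kind) }
    '
}
```

Feeding it the first printout would give EFI and the second SMI. Anything without one of those header lines is reported as unknown.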
Does anyone have any ideas? All help would be highly appreciated.
I've attached detailed logs and server config.
Kind Regards,
Juancarlos Chacon
jchacon@gmail.com
skype: juanca_chacon