
Troubleshooting a "failed" disk

Juancarlos_Chac
Not applicable

Hello all,

First things first, I have looked at this article to try and solve my problem, without success.

My server is configured this way:

SunOS hunahpu 5.9 Generic_122300-45 sun4u sparc SUNW,Netra-T12

System Configuration: Sun Microsystems  sun4u Sun Fire E2900
System clock frequency: 150 MHZ
Memory size: 49152 Megabytes

pkginfo  -l VRTSvxvm
   PKGINST:  VRTSvxvm
      NAME:  Binaries for VERITAS Volume Manager by Symantec
  CATEGORY:  system
      ARCH:  sparc
   VERSION:  5.0,REV=05.11.2006.17.55
   BASEDIR:  /
    VENDOR:  Symantec Corporation
      DESC:  Virtual Disk Subsystem
    PSTAMP:  Veritas-5.0-MP3.50:2008-08-17
  INSTDATE:  Nov 09 2009 18:41
   HOTLINE:  http://support.veritas.com/phonesup/phonesup_ddProduct_.htm
     EMAIL:  support@veritas.com
    STATUS:  completely installed
     FILES:     1007 installed pathnames
                  37 shared pathnames
                  19 linked files
                 111 directories
                 443 executables
              435179 blocks used (approx)
 

The server is connected to a NetApp FAS 3070 cluster.

On Saturday night we had to do UPS maintenance and powered off the servers.  When I powered this server back up, we saw the following error messages from VxVM:

May  5 02:10:24 hunahpu vxdmp: [ID 803759 kern.notice] NOTICE: VxVM vxdmp V-5-0-34 added disk array 3090882, datype = FAS3070
May  5 02:10:24 hunahpu vxdmp: [ID 803759 kern.notice] NOTICE: VxVM vxdmp V-5-0-34 added disk array DISKS, datype = Disk
May  5 02:10:24 hunahpu vxdmp: [ID 415628 kern.notice] NOTICE: VxVM vxdmp V-5-3-1700 dmpnode 275/0x0 has migrated from enclosure FAKE_ENCLR_SNO to enclosure DISKS
May  5 02:10:24 hunahpu vxdmp: [ID 917986 kern.notice] NOTICE: VxVM vxdmp V-5-0-112 disabled path 32/0x3f58 belonging to the dmpnode 275/0x28
May  5 02:10:24 hunahpu vxdmp: [ID 917986 kern.notice] NOTICE: VxVM vxdmp V-5-0-112 disabled path 32/0x15e0 belonging to the dmpnode 275/0x28
May  5 02:10:24 hunahpu vxdmp: [ID 917986 kern.notice] NOTICE: VxVM vxdmp V-5-0-112 disabled path 32/0x1d68 belonging to the dmpnode 275/0x28
May  5 02:10:24 hunahpu vxdmp: [ID 917986 kern.notice] NOTICE: VxVM vxdmp V-5-0-112 disabled path 32/0x2638 belonging to the dmpnode 275/0x28
May  5 02:10:24 hunahpu vxdmp: [ID 824220 kern.notice] NOTICE: VxVM vxdmp V-5-0-111 disabled dmpnode 275/0x28
May  5 02:10:57 hunahpu vxdmp: [ID 736771 kern.notice] NOTICE: VxVM vxdmp V-5-0-148 enabled path 32/0x15e0 belonging to the dmpnode 275/0x28
May  5 02:10:57 hunahpu vxdmp: [ID 899070 kern.notice] NOTICE: VxVM vxdmp V-5-0-147 enabled dmpnode 275/0x28
May  5 02:10:57 hunahpu vxdmp: [ID 736771 kern.notice] NOTICE: VxVM vxdmp V-5-0-148 enabled path 32/0x2638 belonging to the dmpnode 275/0x28
May  5 02:10:57 hunahpu vxdmp: [ID 736771 kern.notice] NOTICE: VxVM vxdmp V-5-0-148 enabled path 32/0x1d68 belonging to the dmpnode 275/0x28
May  5 02:10:57 hunahpu vxdmp: [ID 736771 kern.notice] NOTICE: VxVM vxdmp V-5-0-148 enabled path 32/0x3f58 belonging to the dmpnode 275/0x28
May  5 02:10:57 hunahpu vxdmp: [ID 917986 kern.notice] NOTICE: VxVM vxdmp V-5-0-112 disabled path 32/0x3f58 belonging to the dmpnode 275/0x28
May  5 02:10:57 hunahpu vxdmp: [ID 917986 kern.notice] NOTICE: VxVM vxdmp V-5-0-112 disabled path 32/0x15e0 belonging to the dmpnode 275/0x28
May  5 02:10:57 hunahpu vxdmp: [ID 917986 kern.notice] NOTICE: VxVM vxdmp V-5-0-112 disabled path 32/0x1d68 belonging to the dmpnode 275/0x28
May  5 02:10:57 hunahpu vxdmp: [ID 917986 kern.notice] NOTICE: VxVM vxdmp V-5-0-112 disabled path 32/0x2638 belonging to the dmpnode 275/0x28
May  5 02:10:57 hunahpu vxdmp: [ID 824220 kern.notice] NOTICE: VxVM vxdmp V-5-0-111 disabled dmpnode 275/0x28
May  5 02:11:00 hunahpu vxvm:vxconfigd: [ID 702911 daemon.warning] V-5-1-546 Disk capadg01 in group capadg: Disk device not found
May  5 02:11:00 hunahpu vxesd[474]: [ID 360244 daemon.notice] Event Source daemon started
May  5 02:11:01 hunahpu vxesd[474]: [ID 342331 daemon.notice] HBA API Library Loaded
May  5 02:11:02 hunahpu vxdmp: [ID 238993 kern.notice] NOTICE: VxVM vxdmp 0 dmp_tur_temp_pgr: open failed: error = 6 dev=0x113/0x2a
 

And indeed, the disk "capadg01" appears failed:

juanca@hunahpu / # vxdisk -o alldgs list
DEVICE       TYPE            DISK         GROUP        STATUS
c1t0d0s2     auto:none       -            -            online invalid
c9t0d92s2    auto:cdsdisk    capadg08     capadg       online nohotuse
c9t0d93s2    auto:cdsdisk    capadg07     capadg       online nohotuse
c9t0d94s2    auto:cdsdisk    capadg06     capadg       online nohotuse
c9t0d95s2    auto:cdsdisk    capadg05     capadg       online nohotuse
c9t0d96s2    auto:cdsdisk    capadg03     capadg       online nohotuse
c9t0d97s2    auto:cdsdisk    capadg04     capadg       online nohotuse
c9t0d98s2    auto:cdsdisk    capadg02     capadg       online nohotuse
c9t0d99      auto            -            -            error
-            -         capadg01     capadg       failed nohotuse was:c9t0d99s2
 

The weird thing is that the server never lost connectivity to the SAN; all the other LUNs are fine and show no errors.
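In case it helps anyone suggest something, here is roughly how I have been double-checking the DMP state for this LUN (a sketch using standard vxdmpadm commands; the device name is the one from the listing below):

```shell
# Inspect DMP's view of the errored device (names from this thread; adjust as needed).

# Show all enclosures DMP knows about - the FAS3070 should appear here
vxdmpadm listenclosure all

# Show every path behind the dmpnode for the failed device
vxdmpadm getsubpaths dmpnodename=c9t0d99s2

# Cross-check what Solaris itself sees for that device
ls -l /dev/rdsk/c9t0d99s2
```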

When I looked at the disk with format, it said the disk didn't have a label.  So I labeled it, but the label appears to be incorrect (there's no partition 2):

partition> pr
Current partition table (original):
Total disk sectors available: 2201993182 + 16384 (reserved sectors)

Part      Tag    Flag     First Sector          Size          Last Sector
  0       root    wm                34       128.00MB           262177
  1       swap    wu            262178       128.00MB           524321
  2 unassigned    wm                 0            0                0
  3 unassigned    wm                 0            0                0
  4 unassigned    wm                 0            0                0
  5 unassigned    wm                 0            0                0
  6        usr    wm            524322         1.03TB           2201993181
  8   reserved    wm        2201993182         8.00MB           2202009565
 

I believe this is the reason why VxVM won't reattach the disk, but I have been unable to fix the disk's label.  I also find it very strange that the partition table "prints differently" from a healthy disk's:

partition> pr
Current partition table (original):
Total disk cylinders available: 9803 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders        Size            Blocks
  0 unassigned    wm       0               0         (0/0/0)             0
  1 unassigned    wm       0               0         (0/0/0)             0
  2     backup    wu       0 - 9802      899.88GB    (9803/0/0) 1887195136
  3 unassigned    wm       0               0         (0/0/0)             0
  4 unassigned    wm       0               0         (0/0/0)             0
  5 unassigned    wm       0               0         (0/0/0)             0
  6 unassigned    wm       0               0         (0/0/0)             0
  7          -    wu       0 - 9802      899.88GB    (9803/0/0) 1887195136
 

The bad disk prints out First Sector, Size and Last Sector; while the good disk prints out Cylinders, Size and Blocks.
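For what it's worth, the sector count in the bad disk's output puts it just over 1 TiB, which would explain why format gave it an EFI label when I relabeled it (disks over 1TB can't take an SMI label, as the replies below also note). A quick sanity check of the arithmetic:

```shell
# Check that the LUN exceeds the 1 TiB SMI-label limit.
# Sector count taken from the format output above (available + reserved sectors).
SECTORS=$((2201993182 + 16384))
BYTES=$((SECTORS * 512))              # standard 512-byte sectors
ONE_TIB=$((1024 * 1024 * 1024 * 1024))
if [ "$BYTES" -gt "$ONE_TIB" ]; then
    echo "over 1 TiB: format will default to an EFI label"
else
    echo "under 1 TiB"
fi
```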

Does anyone have any ideas?  All help would be highly appreciated.

I've attached detailed logs and server config.

Kind Regards,

Juancarlos Chacon
jchacon@gmail.com
skype: juanca_chacon

3 REPLIES

rsharma1
Level 5
Employee Accredited Certified

Hi Juancarlos,

The difference in the partition output could be because the errored disk is formatted with an EFI label while the healthy disk has an SMI label.  LUNs greater than 1TB with an Extensible Firmware Interface (EFI) disk label do not work in Veritas Volume Manager (VxVM 5.0) on Solaris 9:

http://www.symantec.com/business/support/index?page=content&id=TECH194766

stinsong
Level 5

Juancarlos,

 

First of all, it's really strange that the DMP node was disabled and re-enabled while the DMP module was up and configured. Maybe something changed on the FC link or in the LUN profile.

In any case, when you formatted the disk you gave it a new, EFI label, which VxVM could not recognize. An EFI label has 9 partitions, with partition 8 reserved for the EFI area; an SMI label has 8 slices, numbered 0-7, with slice 2 covering the whole disk. So from your output we can confirm the disk now carries an EFI label - the First Sector / Size / Last Sector columns confirm it, just as the Cylinders / Size / Blocks columns confirm the healthy disk has an SMI label.

So you need to change the EFI label back to an SMI label:

partition> label

[0] SMI Label
[1] EFI Label
Specify Label type[1]: 0
Warning: This disk has an EFI label. Changing to SMI label will erase all
current partitions.
Continue? y
Auto configuration via format.dat[no]?
Auto configuration via generic SCSI-2[no]?
partition> q

Then you should be able to use it in VxVM. But I'm not sure whether the volume structure and data will still be intact...
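After the relabel, something along these lines should get VxVM to rescan and try the disk again (a sketch, only worth attempting if the VxVM private region survived the relabel):

```shell
# Make VxVM rescan its devices after the label change.
vxdctl enable                  # rebuild vxconfigd's device list
vxdisk scandisks               # pick up OS-level device changes
vxdisk -o alldgs list          # check whether c9t0d99 still shows "error"

# If the disk comes back online with its dg info, try to reattach it:
/etc/vx/bin/vxreattach c9t0d99
```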

g_lee
Level 6

Juancarlos,

Are you sure c9t0d99 is the same disk that was presented to the system / has not changed since the failure?

The fact that the disk was showing as unlabelled and you had to relabel it suggests this wasn't the same disk that was presented earlier. Even if it was (and even if you were using Solaris 10 or a configuration that supported EFI labels - see below), relabelling the disk means you have removed/overwritten the VxVM partition (s3 for sliced, s7 for cds), so it no longer sees the dg details, which is why it shows in error.

The format output shows c9t0d99 is 1.03TB (>1TB) with an EFI label (after you relabelled the disk); the example "healthy"/working disk is 899.88GB (<1TB) with an SMI label.

Disks >1TB need EFI labels on Solaris 9 & 10 - however, as rsharma1 pointed out above, EFI labels are only supported by VxVM on Solaris 10, per the following technote:

LUNs greater than 1TB size with EFI label not supported in VxVM on Solaris 9
http://www.symantec.com/business/support/index?page=content&id=TECH194766

Can you check that c9t0d99 is definitely the same disk? If it is, do you have a prtvtoc of c9t0d99 from before the failure, for comparison / to try to restore the same configuration?

If it is not the same disk, is it possible to re-present the correct disk to the system, then rescan and see if VxVM can see the dg information, so you can try to reattach the disk?
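If a pre-failure VTOC was saved somewhere, the standard Solaris pair for comparing and restoring it would be along these lines (the saved file name here is hypothetical, and fmthard only applies to SMI-labelled disks):

```shell
# Compare the current VTOC against a saved copy, and restore the saved
# one if they differ. "/var/tmp/c9t0d99.vtoc" is a hypothetical file
# captured with prtvtoc before the failure.

prtvtoc /dev/rdsk/c9t0d99s2 > /tmp/c9t0d99.now
diff /tmp/c9t0d99.now /var/tmp/c9t0d99.vtoc

# fmthard writes the saved partition table back (SMI labels only):
fmthard -s /var/tmp/c9t0d99.vtoc /dev/rdsk/c9t0d99s2
```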