Replacing/Restoring Failed Drive
Yesterday, two drives in my storage SAN failed. The SAN monitor reports that all the physical drives are fine.
bash-2.03# vxdisk list | grep AMS
AMS_WMS0_0 auto:cdsdisk remote996006 remote9960 online
AMS_WMS0_1 auto:cdsdisk remote996004 remote9960 online
AMS_WMS0_2 auto:cdsdisk remote996013 remote9960 online
AMS_WMS01_0 auto - - error
AMS_WMS01_1 auto - - error
AMS_WMS01_2 auto:cdsdisk remote996012 remote9960 online
AMS_WMS012_0 auto:cdsdisk remote996008 remote9960 online
AMS_WMS012_1 auto:cdsdisk remote996002 remote9960 online
AMS_WMS012_2 auto:cdsdisk remote996011 remote9960 online
AMS_WMS0123_0 auto:cdsdisk remote996007 remote9960 online
AMS_WMS0123_1 auto:cdsdisk remote996001 remote9960 online
AMS_WMS0123_2 auto:cdsdisk remote996010 remote9960 online
- - remote996003 remote9960 failed was:AMS_WMS01_1
- - remote996005 remote9960 failed was:AMS_WMS01_0
The devices on lines 5/6 are new, a result of my attempts to repair the two failed drives at the bottom.
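For anyone wanting to pull just the problem entries out of a long vxdisk listing, here is a minimal sketch. The function name vxdisk_problems is my own, and it assumes the status is the last field on each line, or the field before a trailing "was:" note. On older Solaris hosts you may need nawk instead of awk.

```shell
# Print every device or disk-media record whose status is not "online".
# Reads `vxdisk list` output on stdin.
# vxdisk_problems is a hypothetical helper name, not a VxVM command.
vxdisk_problems() {
    awk '
        NF == 0        { next }                       # skip blank lines
        $1 == "DEVICE" { next }                       # skip the header row
        {
            status = $NF
            if (status ~ /^was:/) status = $(NF - 1)  # e.g. "failed was:AMS_WMS01_1"
            if (status != "online") print
        }
    '
}

# Usage on the affected host:
#   vxdisk list | vxdisk_problems
```

Run against the listing above, this would leave only the two "error" devices and the two "failed" disk-media records.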
So far I've unmounted the volumes associated with the drives, but there are 11 more mounts attached to the disk group. Being new here, I'm not sure which processes and systems are using those volumes, and I'm reluctant to unmount them at the moment.
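One way to find out what is using the remaining mounts is to list the mount points backed by the disk group and ask fuser about each one. This is only a sketch: dg_mounts is a made-up helper name, and it assumes df -k prints one filesystem per line with the /dev/vx/dsk/<diskgroup>/<volume> device first and the mount point last.

```shell
# Print the mount point of every mounted filesystem whose device lives
# under /dev/vx/dsk/<diskgroup>/. Reads `df -k` output on stdin.
# dg_mounts is a hypothetical helper name; $1 is the disk group name.
dg_mounts() {
    awk -v prefix="/dev/vx/dsk/$1/" 'index($1, prefix) == 1 { print $NF }'
}

# Usage on the affected host (fuser -c reports processes using files
# on the given mount point):
#   for m in $(df -k | dg_mounts remote9960); do
#       echo "== $m"
#       fuser -c "$m"
#   done
```

Nothing here unmounts or changes anything, so it should be safe to run while deciding what can be taken offline.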
I followed several guides to try to determine the cause of the problem and to restore the two disks. Here are some of the results of my efforts so far.
> vxprint -htg remote9960
DG NAME NCONFIG NLOG MINORS GROUP-ID
ST NAME STATE DM_CNT SPARE_CNT APPVOL_CNT
DM NAME DEVICE TYPE PRIVLEN PUBLEN STATE
RV NAME RLINK_CNT KSTATE STATE PRIMARY DATAVOLS SRL
RL NAME RVG KSTATE STATE REM_HOST REM_DG REM_RLNK
CO NAME CACHEVOL KSTATE STATE
VT NAME NVOLUME KSTATE STATE
V NAME RVG/VSET/CO KSTATE STATE LENGTH READPOL PREFPLEX UTYPE
PL NAME VOLUME KSTATE STATE LENGTH LAYOUT NCOL/WID MODE
SD NAME PLEX DISK DISKOFFS LENGTH [COL/]OFF DEVICE MODE
SV NAME PLEX VOLNAME NVOLLAYR LENGTH [COL/]OFF AM/NM MODE
SC NAME PLEX CACHE DISKOFFS LENGTH [COL/]OFF DEVICE MODE
DC NAME PARENTVOL LOGVOL
SP NAME SNAPVOL DCO
dg remote9960 default default 126000 1032204684.1590.aeneas
dm remote996001 AMS_WMS0123_1 auto 2048 1677616896 -
dm remote996002 AMS_WMS012_1 auto 2048 1677616896 -
dm remote996003 - - - - NODEVICE
dm remote996004 AMS_WMS0_1 auto 2048 1677616896 -
dm remote996005 - - - - NODEVICE
dm remote996006 AMS_WMS0_0 auto 2048 1677616896 -
dm remote996007 AMS_WMS0123_0 auto 2048 1258212096 -
dm remote996008 AMS_WMS012_0 auto 2048 1258212096 -
dm remote996010 AMS_WMS0123_2 auto 2048 2094367888 -
dm remote996011 AMS_WMS012_2 auto 2048 2094367888 -
dm remote996012 AMS_WMS01_2 auto 2048 2094367888 -
dm remote996013 AMS_WMS0_2 auto 2048 2094367888 -
v rem-01 - DISABLED ACTIVE 1921724416 SELECT - fsgen
pl rem-01-02 rem-01 DISABLED NODEVICE 1921843200 STRIPE 4/64 RW
sd remote996004-01 rem-01-02 remote996004 36096 480460800 0/0 AMS_WMS0_1 ENA
sd remote996003-01 rem-01-02 remote996003 36096 480460800 1/0 - NDEV
sd remote996002-01 rem-01-02 remote996002 36096 480460800 2/0 AMS_WMS012_1 ENA
sd remote996001-01 rem-01-02 remote996001 36096 480460800 3/0 AMS_WMS0123_1 ENA
v rem-02 - DISABLED ACTIVE 1921724416 SELECT - fsgen
pl rem-02-02 rem-02 DISABLED NODEVICE 1921843200 STRIPE 4/64 RW
sd remote996004-02 rem-02-02 remote996004 480496896 480460800 0/0 AMS_WMS0_1 ENA
sd remote996003-02 rem-02-02 remote996003 480496896 480460800 1/0 - NDEV
sd remote996002-02 rem-02-02 remote996002 480496896 480460800 2/0 AMS_WMS012_1 ENA
sd remote996001-02 rem-02-02 remote996001 480496896 480460800 3/0 AMS_WMS0123_1 ENA
v rem-03 - DISABLED ACTIVE 1000341504 SELECT - fsgen
pl rem-03-01 rem-03 DISABLED NODEVICE 1000396800 STRIPE 2/64 RW
sd remote996006-02 rem-03-01 remote996006 106826496 419443200 0/0 AMS_WMS0_0 ENA
sd remote996008-03 rem-03-01 remote996008 1176381696 80755200 0/419443200 AMS_WMS012_0 ENA
sd remote996005-02 rem-03-01 remote996005 106826496 419443200 1/0 - NDEV
sd remote996007-03 rem-03-01 remote996007 1176381696 80755200 1/419443200 AMS_WMS0123_0 ENA
v rem-04 - DISABLED ACTIVE 1921724416 SELECT - fsgen
pl rem-04-02 rem-04 DISABLED NODEVICE 1921843200 STRIPE 4/64 RW
sd remote996004-03 rem-04-02 remote996004 960957696 480460800 0/0 AMS_WMS0_1 ENA
sd remote996003-03 rem-04-02 remote996003 960957696 480460800 1/0 - NDEV
sd remote996002-03 rem-04-02 remote996002 960957696 480460800 2/0 AMS_WMS012_1 ENA
sd remote996001-03 rem-04-02 remote996001 960957696 480460800 3/0 AMS_WMS0123_1 ENA
v rem-08 - ENABLED ACTIVE 2097152000 SELECT rem-08-01 fsgen
pl rem-08-01 rem-08 ENABLED ACTIVE 2097177600 STRIPE 2/64 RW
sd remote996008-01 rem-08-01 remote996008 36096 1048588800 0/0 AMS_WMS012_0 ENA
sd remote996007-01 rem-08-01 remote996007 36096 1048588800 1/0 AMS_WMS0123_0 ENA
v rem-30 - DISABLED ACTIVE 1887436800 SELECT - fsgen
pl rem-30-01 rem-30 DISABLED NODEVICE 1887436800 STRIPE 2/64 RW
sd remote996006-03 rem-30-01 remote996006 526269696 943718400 0/0 AMS_WMS0_0 ENA
sd remote996005-03 rem-30-01 remote996005 526269696 943718400 1/0 - NDEV
v rem-40 - ENABLED ACTIVE 2097152000 SELECT rem-40-01 fsgen
pl rem-40-01 rem-40 ENABLED ACTIVE 2097254400 STRIPE 4/64 RW
sd remote996013-01 rem-40-01 remote996013 36096 524313600 0/0 AMS_WMS0_2 ENA
sd remote996012-01 rem-40-01 remote996012 36096 524313600 1/0 AMS_WMS01_2 ENA
sd remote996011-01 rem-40-01 remote996011 36096 524313600 2/0 AMS_WMS012_2 ENA
sd remote996010-01 rem-40-01 remote996010 36096 524313600 3/0 AMS_WMS0123_2 ENA
v rem-41 - ENABLED ACTIVE 2097152000 SELECT rem-41-01 fsgen
pl rem-41-01 rem-41 ENABLED ACTIVE 2097254400 STRIPE 4/64 RW
sd remote996013-02 rem-41-01 remote996013 524349696 524313600 0/0 AMS_WMS0_2 ENA
sd remote996012-02 rem-41-01 remote996012 524349696 524313600 1/0 AMS_WMS01_2 ENA
sd remote996011-02 rem-41-01 remote996011 524349696 524313600 2/0 AMS_WMS012_2 ENA
sd remote996010-02 rem-41-01 remote996010 524349696 524313600 3/0 AMS_WMS0123_2 ENA
v rem-42 - ENABLED ACTIVE 2097152000 SELECT rem-42-01 fsgen
pl rem-42-01 rem-42 ENABLED ACTIVE 2097254400 STRIPE 4/64 RW
sd remote996013-03 rem-42-01 remote996013 1048663296 524313600 0/0 AMS_WMS0_2 ENA
sd remote996012-03 rem-42-01 remote996012 1048663296 524313600 1/0 AMS_WMS01_2 ENA
sd remote996011-03 rem-42-01 remote996011 1048663296 524313600 2/0 AMS_WMS012_2 ENA
sd remote996010-03 rem-42-01 remote996010 1048663296 524313600 3/0 AMS_WMS0123_2 ENA
v rem-43 - ENABLED ACTIVE 2085427200 SELECT rem-43-01 fsgen
pl rem-43-01 rem-43 ENABLED ACTIVE 2085427200 STRIPE 4/64 RW
sd remote996013-04 rem-43-01 remote996013 1572976896 521356800 0/0 AMS_WMS0_2 ENA
sd remote996012-04 rem-43-01 remote996012 1572976896 521356800 1/0 AMS_WMS01_2 ENA
sd remote996011-04 rem-43-01 remote996011 1572976896 521356800 2/0 AMS_WMS012_2 ENA
sd remote996010-04 rem-43-01 remote996010 1572976896 521356800 3/0 AMS_WMS0123_2 ENA
v rimg02 - DISABLED ACTIVE 944793600 SELECT - fsgen
pl rimg02-01 rimg02 DISABLED NODEVICE 944793600 STRIPE 4/64 RW
sd remote996004-04 rimg02-01 remote996004 1441418496 236198400 0/0 AMS_WMS0_1 ENA
sd remote996003-04 rimg02-01 remote996003 1441418496 236198400 1/0 - NDEV
sd remote996002-04 rimg02-01 remote996002 1441418496 236198400 2/0 AMS_WMS012_1 ENA
sd remote996001-04 rimg02-01 remote996001 1441418496 236198400 3/0 AMS_WMS0123_1 ENA
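The vxprint output is long, so a short summary of which volumes are affected by which missing disk can help. A rough sketch follows. The helper name vxprint_missing is mine, and it assumes the record layout shown above: "v" lines carry the kernel state in the fourth field, and "sd" lines end in NDEV when the underlying device is gone.

```shell
# For each volume, print its name, kernel state, and the disk-media names
# of any subdisks flagged NDEV ("-" if none).
# Reads `vxprint -htg <diskgroup>` output on stdin.
# vxprint_missing is a hypothetical helper name, not a VxVM command.
vxprint_missing() {
    awk '
        $1 == "v" { vol = $2; kstate[vol] = $4; order[++n] = vol }
        $1 == "sd" && $NF == "NDEV" {
            # record each missing disk once per volume
            if (index(" " missing[vol] " ", " " $4 " ") == 0)
                missing[vol] = missing[vol] (missing[vol] == "" ? "" : " ") $4
        }
        END {
            for (i = 1; i <= n; i++) {
                vol = order[i]
                print vol, kstate[vol], (missing[vol] == "" ? "-" : missing[vol])
            }
        }
    '
}

# Usage on the affected host:
#   vxprint -htg remote9960 | vxprint_missing
```

Against the listing above, this would show rem-01, rem-02, rem-04 and rimg02 as DISABLED and missing remote996003, rem-03 and rem-30 as DISABLED and missing remote996005, and the remaining volumes as ENABLED with nothing missing.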
> vxdiskadm
Select an operation to perform: 5
Select a removed or failed disk [<disk>,list,q,?] remote996003
VxVM ERROR V-5-2-1985 No devices are available as replacements for remote996003.
Select a removed or failed disk [<disk>,list,q,?] remote996005
VxVM ERROR V-5-2-1985 No devices are available as replacements for remote996005.
I attempted to reattach the failed disks:
bash-2.03# /etc/vx/bin/vxreattach -c remote996003
VxVM vxdisk ERROR V-5-1-537 Device remote996003: Not in the configuration
VxVM vxdisk ERROR V-5-1-558 Disk remote996003: Disk not in the configuration
bash-2.03# /etc/vx/bin/vxreattach -c remote996005
VxVM vxdisk ERROR V-5-1-537 Device remote996005: Not in the configuration
VxVM vxdisk ERROR V-5-1-558 Disk remote996005: Disk not in the configuration
bash-2.03# vxdisk clearimport AMS_WMS01_1
VxVM vxdisk ERROR V-5-1-531 Device AMS_WMS01_1: clearimport failed:
Disk device is offline
I think this is the step where I created the two duplicate devices.
At this point, I'm going to step back and seek guidance before I cause further problems.
Thanks to the help of Shane in Symantec support, we found and resolved the problem.
#> vxdmpadm listctlr all
CTLR-NAME ENCLR-TYPE STATE ENCLR-NAME
=====================================================
c1 Disk ENABLED Disk
c8 AMS_WMS ENABLED AMS_WMS012
c6 AMS_WMS DISABLED AMS_WMS012
c8 AMS_WMS ENABLED AMS_WMS0
c6 AMS_WMS DISABLED AMS_WMS0
c8 AMS_WMS ENABLED AMS_WMS01
c6 AMS_WMS DISABLED AMS_WMS01
c8 AMS_WMS ENABLED AMS_WMS0123
c6 AMS_WMS DISABLED AMS_WMS0123
c8 GENESIS ENABLED GENESIS0
The c6 controller was disabled (it shows as DISABLED for every AMS_WMS enclosure above) because the fiber port was bad. After we replaced the port, we ran:
#> vxdctl enable
This rescanned everything and brought the c6 controller back online. From there, we just had to unmount the other filesystems that were on the device, export the entire disk group, and then reimport it.
We did have to fix the plex as well, but for others searching for solutions to similar problems, this will hopefully get them a long way toward a fast fix.
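For reference, here is a rough sketch of the check and the export/reimport sequence as I understand it. Both function names are made up, and the plan is only printed, not executed. vxdg deport and vxdg import are the VxVM commands for exporting and importing a disk group. The final vxvol startall step is my assumption about how the volumes get restarted after the import, and the plex repair is not covered here.

```shell
# Print "controller enclosure" for every path that DMP reports as DISABLED.
# Reads `vxdmpadm listctlr all` output on stdin.
# disabled_ctlrs is a hypothetical helper name.
disabled_ctlrs() {
    awk '$3 == "DISABLED" { print $1, $4 }'
}

# Print (do not run) the unmount/deport/import sequence for a disk group.
# Arguments: disk group name, then the mount points to unmount first.
# print_reimport_plan is a hypothetical helper name.
print_reimport_plan() {
    dg=$1
    shift
    for m in "$@"; do
        echo "umount $m"
    done
    echo "vxdg deport $dg"
    echo "vxdg import $dg"
    # Assumption: volumes need an explicit start after the import.
    echo "vxvol -g $dg startall"
}

# Usage on the affected host:
#   vxdmpadm listctlr all | disabled_ctlrs
#   print_reimport_plan remote9960 <mount points...>
```

Printing the plan first makes it easy to review the order of operations before running anything against a production disk group.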