VCS HA over Solaris 10

Caushiph_Unvar · 12-14-2011

Hi All,

I have a Veritas HA cluster running on the Solaris 10 x86 platform. The day before yesterday, the cluster stopped working and the file systems were gone. The admin in the office at that time rebooted both servers. Since then, the disks are visible in format, but in vxdisk list they are in the error state. I tried running vxdctl enable, but even after that they do not come out of the error state. I know this happens with a newly assigned LUN, where the only thing you have to do is format and label it, and after that vxdctl enable brings it out of the error state.

The following is the output from the cluster:

(root_sahomxapp01u)@/var/VRTSvcs/log # vxdisk -e list
DEVICE           TYPE          DISK  GROUP  STATUS          OS_NATIVE_NAME                           ATTR
c1t0d0s2         auto:none     -     -      online invalid  c1t0d0s2                                 -
emc_clariion0_0  auto:cdsdisk  -     -      error           c0t60060160CA401700584727D8D2ECE011d0s2  -
emc_clariion0_1  auto:cdsdisk  -     -      error           c0t60060160CA401700D431FBE3D2ECE011d0s2  -

vxdisk -o alldgs list
DEVICE           TYPE          DISK  GROUP  STATUS
c1t0d0s2         auto:none     -     -      online invalid
emc_clariion0_0  auto:cdsdisk  -     -      error
emc_clariion0_1  auto:cdsdisk  -     -      error

(root_sahomxapp01u)@/var/VRTSvcs/log # hastatus -summ

-- SYSTEM STATE
-- System          State      Frozen
A  sahomxapp01u    RUNNING    0
A  sahomxapp02u    RUNNING    0

-- GROUP STATE
-- Group           System        Probed  AutoDisabled  State
B  App_Group       sahomxapp01u  Y       N             OFFLINE|FAULTED
B  App_Group       sahomxapp02u  Y       N             OFFLINE|STARTING|FAULTED
B  ClusterService  sahomxapp01u  Y       N             ONLINE
B  ClusterService  sahomxapp02u  Y       N             OFFLINE

-- RESOURCES FAILED
-- Group           Type       Resource  System
D  App_Group       DiskGroup  App_dg    sahomxapp01u
D  App_Group       DiskGroup  App_dg    sahomxapp02u

-- RESOURCES ONLINING
-- Group           Type       Resource  System        IState
F  App_Group       DiskGroup  App_dg    sahomxapp02u  W_ONLINE

The disks are visible in format and I can see their partition table.

AVAILABLE DISK SELECTIONS:
0. c0t60060160CA401700D431FBE3D2ECE011d0 <DGC-RAID5-0226 cyl 4349 alt 2 hd 16 sec 3012>
   /scsi_vhci/disk@g60060160ca401700d431fbe3d2ece011
1. c0t60060160CA401700584727D8D2ECE011d0 <DGC-RAID5-0226 cyl 4349 alt 2 hd 16 sec 3012>
   /scsi_vhci/disk@g60060160ca401700584727d8d2ece011
2. c1t0d0 <DEFAULT cyl 17747 alt 2 hd 255 sec 63>
   /pci@0,0/pci8086,3a40@1c/pci1014,3b2@0/sd@0,0

luxadm -e port shows proper connectivity, and cfgadm -al shows the controllers properly configured.

No errors were found in /var/adm/messages, nor while rebooting the servers. Can anyone please suggest where I should look?

Marianne · 12-14-2011

All cluster and device level error messages are normally logged in /var/adm/messages, but syslog needs to be running. Please check/verify.

Also check whether the messages file was rotated in the meantime (the older file should then be messages.0).

Please extract and post all the info for 2 days ago (when the error occurred) from engine_A.log (/var/VRTSvcs/log/).
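As a rough sketch of that check (not a definitive procedure), the function below searches the current and rotated messages files for vxfs entries. The name scan_vx_messages and the default pattern are assumptions made for illustration; change the pattern to whatever messages you are after.

```shell
# Sketch: search messages, messages.0, messages.1, ... for a pattern.
# scan_vx_messages is a hypothetical helper name, not a Veritas or Solaris tool.
scan_vx_messages() {
    log_dir="$1"              # e.g. /var/adm on Solaris
    pattern="${2:-vxfs}"      # default pattern is an assumption; override as needed
    for f in "$log_dir"/messages "$log_dir"/messages.[0-9]*; do
        [ -f "$f" ] || continue          # skip unmatched globs and missing files
        # /dev/null as a second file makes grep prefix each hit with the file name
        grep -i "$pattern" "$f" /dev/null
    done
    return 0
}
```

It would be called as scan_vx_messages /var/adm, and each hit is prefixed with the file it came from, so you can see whether the first errors sit in a rotated file.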

TonyGriffiths · 12-15-2011

Hi

Is the issue observed on all nodes in the cluster or just one?

Can you confirm that the disk has a valid partition table?

cheers

tony

Caushiph_Unvar · 12-15-2011

Hi All,

Thank you for the replies. Logs are attached for your review.

@Tony: yes, it is observed on both nodes. I'll share the partition table shortly.

regards,

Marianne · 12-15-2011

You need to find out what led up to all the filesystem errors on system sahomxapp02u - your system log starts at 09:31 with 'message no 11' of a filesystem error.

Dec 12 09:31:07 sahomxapp02u vxfs: [ID 702911 kern.warning] WARNING: msgcnt 11 mesg 008: V-2-8: vx_direrr: vx_dirscan_2 - /var/mqm file system dir inode 2 dev/block 0/16967 dirent inode 0 error 6

'Something' happened to the disks that seems to have caused corruption.

VCS did not stop working - as you can see in the engine log, VCS was desperately trying to import the disk group, but could not find any valid diskgroup.
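To see at a glance which devices VxVM currently reports in the error state, something like the following can be used. This is only a sketch: list_error_disks is a made-up helper name, and it assumes the five-column "DEVICE TYPE DISK GROUP STATUS" layout shown earlier in this thread.

```shell
# Sketch: read vxdisk list output on stdin and print the devices whose
# STATUS column starts with "error".
# list_error_disks is a hypothetical helper name. Column 5 is assumed to be
# STATUS, as in the "DEVICE TYPE DISK GROUP STATUS" layout.
list_error_disks() {
    awk '$1 != "DEVICE" && NF >= 5 && $5 == "error" { print $1 }'
}
```

Usage would be vxdisk -o alldgs list | list_error_disks.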

The good news is that VxVM is making regular backups of diskgroup configuration to /etc/vx/cbr/bk.

See https://sort.symantec.com/public/documents/sf/5.0/aix/manpages/vxvm/vxconfigrestore_1m.html

The vxconfigrestore utility is used to restore a disk group's configuration information if this has been lost or become corrupted. The disk group whose configuration is to be restored is specified either by name or by ID.

Any disks whose private region headers have become corrupted are reinstalled when the disk group configuration is restored.
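Before anything is restored, it is worth checking what is actually in the backup area. A minimal sketch, assuming the backups are kept in one directory per disk group whose name starts with the disk group name (the exact naming is an assumption here, and list_dg_backups is a hypothetical helper):

```shell
# Sketch: list backup directories under the VxVM configuration backup area.
# list_dg_backups is a hypothetical helper; the directory naming is an
# assumption, so the match is a simple name prefix.
list_dg_backups() {
    bk_dir="$1"      # e.g. /etc/vx/cbr/bk
    dg="$2"          # disk group name, e.g. App_dg; empty lists everything
    for d in "$bk_dir"/"$dg"*; do
        [ -d "$d" ] || continue      # skip unmatched globs and plain files
        echo "$d"
        ls -l "$d"                   # timestamps show how recent the backup is
    done
    return 0
}
```

For example, list_dg_backups /etc/vx/cbr/bk App_dg. This only reads the directory and changes nothing on the disks.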

If you don't feel comfortable attempting vxconfigrestore on your own, please log a support call. A support engineer will guide you through the steps.

Caushiph_Unvar · 12-15-2011

Thank you, Marianne, for your consideration.

Now, after reading your reply, I'll first review the old /var/adm/messages and try to find the original reason for this corruption. Meanwhile, I'll do the following:

vxconfigrestore -n mydg

and then verify the configuration with vxprint -hrt. If it looks okay, then

vxconfigrestore -p mydg

vxprint -hrt [verify again :) ]

vxconfigrestore -c mydg

Please comment.

Regards,

Marianne · 12-16-2011

I will feel a lot more comfortable if the config restore is done with the assistance of a Symantec Support engineer who will first examine the contents of the diskgroup backups.

vxconfigrestore will restore the private region of the disks, but if corruption occurred at filesystem level, you will need to restore from backup...

Caushiph_Unvar · 12-19-2011

Yes, Marianne! The issue was escalated to support, and they found filesystem-level corruption, so they suggested restoring from backup. Thanks a lot for your guidance.

-Regards
