System Fault and Cluster Interconnect Testing with IO Fencing

Muhammad_Iqbal_
Level 3
Partner Accredited Certified
Hi,

I installed a two-node cluster for an HA application. Everything works fine: I can switch the SGs between the nodes, and if I reboot one node, the SGs fail over to the other node.

But when I tested a system fault (removing the power cable) or a cluster interconnect failure, in both cases the SGs did not fail over from the faulted system to the other system. Please help me out.

Regards,

Muhammad Iqbal
6 REPLIES

Gene_Henriksen
Level 6
Accredited Certified
Questions:

1 - What do you mean by "reboot"? Are you typing "reboot" or doing a shutdown -i6 or an init 6?

2 - Can you send the output of "gabconfig -a" before and after the power off?

3 - Do you have any critical resources in the service groups?

The reason for the questions:
A shutdown -i6 or init 6 will run the rc scripts, and VCS will shut down with "hastop -local -evacuate", which migrates the SGs to the other system.
If gabconfig -a doesn't show port b, then I/O fencing is not configured.
Without critical resources, an SG will not fail over after a fault or system failure.
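
As a quick check that your resources are critical (the resource name here is just an example; substitute your own):

hares -value my_dg_resource Critical

A value of 1 means the resource is critical, so its service group will fail over when the resource faults; 0 means it will not. To mark a resource critical:

haconf -makerw
hares -modify my_dg_resource Critical 1
haconf -dump -makero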

Muhammad_Iqbal_
Level 3
Partner Accredited Certified
Hi Gene,

Reboot means init 6, and that case works fine: all SGs from the rebooted system fail over to the other system. Here is the gabconfig output:

root@dmc01 # gabconfig -a
GAB Port Memberships
===============================================================
Port a gen 730d07 membership 01
Port b gen 730d09 membership 01
Port h gen 730d0b membership 01

======================================

The problem is that when we remove both interconnect cables, one system reboots but the SGs do not fail over to the other system. The same happens if we remove the power cable from one system: all the disk group resources try to come online on the other system and then fail with errors saying the disk groups cannot be imported. See the details below.

I found these errors in the /var/adm/messages file:


Feb 15 10:59:19 dmc01 vxfen: NOTICE: VCS FEN WARNING V-11-1-43 Failed to preempt and abort node 1 from disk
Feb 15 10:59:19 dmc01 with serial number 600C0FF0000000000983A8688F557302
Feb 15 10:59:19 dmc01 vxfen: NOTICE: VCS FEN NOTICE V-11-1-23 Starting to eject leaving node(s) from data disks.
Feb 15 10:59:19 dmc01 vxfen: NOTICE: VCS FEN ERROR V-11-1-19 Call into volume manager failed. Unable to fence off data disks
Feb 15 10:59:19 dmc01 vxfen: NOTICE: VCS FEN NOTICE V-11-1-22 Completed ejection of leaving node(s) from data disks.


----------------------------------------------------------

Feb 15 16:25:26 dmc01 last message repeated 2 times
Feb 15 16:25:45 dmc01 Had: VCS ERROR V-16-1-1004 (dmc02) DiskGroup:dserver_DG:online:** ERROR: vxdg import failed on Disk Group dserver_dg after vxdctl enable
Feb 15 16:26:27 dmc01 Had: VCS ERROR V-16-1-1003 (dmc02) DiskGroup:sms_DG:online:** ERROR: vxdg import (force) failed on Disk Group sms_dg
Feb 15 16:26:59 dmc01 Had: VCS ERROR V-16-1-1004 (dmc02) DiskGroup:sms_DG:online:** ERROR: vxdg import failed on Disk Group sms_dg after vxdctl enable
Feb 15 16:28:46 dmc01 Had: VCS ERROR V-16-1-13066 (dmc02) Agent is calling clean for resource(dserver_DG) because the resource is not up even after online completed.
Feb 15 16:29:29 dmc01 Had: VCS ERROR V-16-1-1003 (dmc02) DiskGroup:dserver_DG:online:** ERROR: vxdg import (force) failed on Disk Group dserver_dg
Feb 15 16:30:01 dmc01 Had: VCS ERROR V-16-1-1004 (dmc02) DiskGroup:dserver_DG:online:** ERROR: vxdg import failed on Disk Group dserver_dg after vxdctl enable
Feb 15 16:30:03 dmc01 Had: VCS ERROR V-16-1-13066 (dmc02) Agent is calling clean for resource(sms_DG) because the resource is not up even after online completed.
Feb 15 16:30:27 dmc01 vxvm:vxconfigd: V-5-1-5760 sms_dg: dg import with I/O fence enabled
Feb 15 16:30:27 dmc01 vxvm:vxconfigd:
Feb 15 16:30:27 dmc01 vxfen: NOTICE: VCS FEN INFO V-11-1-34 The ioctl VXFEN_IOC_CLUSTSTAT returned 0
Feb 15 16:30:28 dmc01 last message repeated 2 times
Feb 15 16:30:47 dmc01 Had: VCS ERROR V-16-1-1003 (dmc02) DiskGroup:sms_DG:online:** ERROR: vxdg import (force) failed on Disk Group sms_dg
Feb 15 16:31:19 dmc01 Had: VCS ERROR V-16-1-1004 (dmc02) DiskGroup:sms_DG:online:** ERROR: vxdg import failed on Disk Group sms_dg after vxdctl enable
Feb 15 16:32:09 dmc01 Had: VCS ERROR V-16-1-1003 (dmc02) DiskGroup:dserver_DG:online:** ERROR: vxdg import (force) failed on Disk Group dserver_dg
Feb 15 16:32:40 dmc01 Had: VCS ERROR V-16-1-1004 (dmc02) DiskGroup:dserver_DG:online:** ERROR: vxdg import failed on Disk Group dserver_dg after vxdctl enable

Gene_Henriksen
Level 6
Accredited Certified
Did you test both the data DGs and the fencing DG with vxfentsthdw? Be sure to run it with -r for read-only on your data DGs.

For the fencing DG:
vxfentsthdw -g <fencing_dg>

For the data DGs:
vxfentsthdw -rg <data_dg>

If it isn't in your execute path, vxfentsthdw is in /opt/VRTSvcs/vxfen/bin
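
For example, using the full path (the fencing DG name vxfencoorddg is only a guess; use whatever name is in /etc/vxfendg, and your own data DG names):

/opt/VRTSvcs/vxfen/bin/vxfentsthdw -g vxfencoorddg
/opt/VRTSvcs/vxfen/bin/vxfentsthdw -rg dserver_dg

Remember that without -r the test writes to the disks, so run the plain form only against the coordinator disks.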

It is important to verify that the disks and disk array support SCSI3 PR. Read the man pages on vxfentsthdw and vxfenadm.

See the Install Guide for VCS pages 116 and following (on 4.1) for a discussion of testing the hardware.

When we set up the VERITAS training classrooms, it took a while for us to discover all the correct settings for the Hitachi arrays.

I searched support.veritas.com for the error message numbers you have and couldn't find anything.

Robert_Bailey_3
Level 2
Looks like a few issues here, but all around the I/O fencing configuration. My guess is that I/O fencing is not properly set up and that you did not set UseFence = SCSI3, so there is no safeguard in the VCS startup sequence.

There are many ways of testing I/O fencing, but here are a few checks to see whether it's a configuration issue at your end or grounds for a support case.

Check if VCS is set to restrict startup without I/O fencing running
1. haclus -value UseFence
Proper response> SCSI3
Improper config response> NONE
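
If it comes back NONE, UseFence is set in the cluster definition in main.cf (the cluster name below is only a placeholder); stop VCS on all nodes before editing:

cluster my_cluster (
    UseFence = SCSI3
)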

Check if I/O Fencing keys are on your coordinator disks
1. Get coordinator disk group:
cat /etc/vxfendg
2. Check that the coordinator DG contains 3 disks and is deported:
vxdisk -o alldgs list | grep $your_coord_dg
3. Check that the correct type of key is on each of your coordinator disks:
vxfenadm -g /dev/charpath/disk_name
example output>

Reading SCSI Registration Keys...

Device Name: /dev/vx/rdmp/EMC0_2
Total Number Of Keys: 2
key:
Key Value : 65,45,45,45,45,45,45,45
Key Value : A-------
key:
Key Value : 66,45,45,45,45,45,45,45
Key Value : B-------

Node id 0=A, 1=B, 2=C, etc.
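
Depending on your version, you can also ask the fencing driver for its view directly. The exact output varies, but it should report the fencing mode as SCSI3 and list both nodes as cluster members:

vxfenadm -d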

Check keys on your data disks:
vxfenadm -g /dev/char/path/disk

Reading SCSI Registration Keys...

Device Name: /dev/vx/rdmp/EMC0_7
Total Number Of Keys: 2
key:
Key Value : 65,80,71,82,48,48,48,49
Key Value : APGR0001
key:
Key Value : 66,80,71,82,48,48,48,49
Key Value : BPGR0001

Node id 0=A, 1=B, 2=C, etc.

Coordinator disk keys must look like A------- and data disk keys must look like APGR000?.
The trailing number on the data disk keys reflects the order in which the disk groups were imported, so I put a "?" there.

If any of these tests fail, then I/O fencing is not configured properly and it's back to the manual. If they are successful, you are probably looking at opening a support case.

Hope this helps

Mohit_Vohra
Level 2

Hi Robert,

I agree with all your steps to test whether fencing is configured properly, except for the first one, i.e. running haclus -value UseFence. This step is valid only when you want to implement I/O fencing through VCS by editing the main.cf file; it isn't required if you don't want to use VCS to implement fencing.

By simply adding the line "vxfen_mode=scsi3" to the /etc/vxfenmode file and then restarting the vxfen driver, you should be able to configure I/O fencing without using VCS. The only place VCS comes in is when you want to restart the vxfen driver: unless VCS and CVM are shut down, the vxfen driver cannot be stopped and restarted.
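
As a rough sketch of what that looks like (your version may also expect a disk policy line; dmp here is only an example):

vxfen_mode=scsi3
scsi3_disk_policy=dmp

Then, with VCS stopped (hastop -local), restart the driver on Solaris:

/etc/init.d/vxfen stop
/etc/init.d/vxfen start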

Please let me know your views on the comments mentioned above.

Thanks and Regards,

Mohit Vohra

PROTON
Level 2
Hi Mohit,

But then why would we use fencing without VCS? What's the use of fencing then?