vxfen

ousor
Level 4

Hi, I want to verify my understanding of the fencing mechanism. Let's say we have 10 nodes in VCS and we start all of them. Once VCS is up and running, does one of those nodes write the key registration on the coordinator disk group, or do all the nodes write key registrations?

How do I verify the health of the fencing DG? (Should I check whether the DG has key registrations from one node or from all the nodes, then run vxprint -ht? What else?)

When a split brain happens, what is the mechanism by which some of the 10 nodes remain in VCS and the rest crash, e.g. 6 nodes remain in VCS and 4 nodes crash? How do I handle this split brain? Do I delete the key registrations on the 6 nodes, repair the 4 nodes, and join them to VCS? If I do not handle this scenario, will each of the 4 nodes see that there is a key registration on the fencing DG when it tries to join VCS, and crash?

Thanks a million.

1 ACCEPTED SOLUTION

Ted_Summers
Level 3
Employee Accredited Certified

>>After the vcs is up and running one of those nodes write the key registration on coordinator device group.is right?
>>Or maybe all the nodes write the key registration?
A separate key will be registered on coordinator disks for every path to the disk from every host that can reach the disks.
 
So if you had 2 HBAs on each of 2 hosts in the cluster, you would have a total of 4 keys on each coordinator disk:
2 from the paths of the first host
2 from the paths of the second host.
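
The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the "one key per path per host" rule described in this reply; the function name and the host names are hypothetical and are not part of any VCS tool.

```python
def expected_keys_per_disk(paths_per_host):
    """Return the number of keys expected on each coordinator disk.

    paths_per_host maps a (hypothetical) host name to the number of paths,
    for example HBAs, that host has to the disk. One key is registered per
    path from each host, so the total is the sum over all hosts.
    """
    return sum(paths_per_host.values())


# Two hosts with two HBAs each, as in the example above.
cluster = {"host1": 2, "host2": 2}
print(expected_keys_per_disk(cluster))  # 4
```

Comparing this expected count with the number of keys actually reported for each coordinator disk is one simple way to spot a host or path that has not registered.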

>>How I verify the health of fencing DG?
>>(Should I verify if DG has key registration from an node(all the nodes),then vxprint -ht?,what else?).
vxprint -ht wouldn't be helpful in this case.

To check it, you can run the vxfenadm command, for example:
# vxfenadm -g all -f (file)   <--- display keys of disks / paths in the file

For the file, you can use /etc/vxfentab, which holds the list of the disks / paths.

>> When is happening an split brain,what is the mechanism that some of the 10 nodes remain in vcs and the rest crash?
There is a race for the coordinator disks and the keys; this is all described in the guides.
The loser of the race has its keys removed from the disks, is ejected from the cluster, and panics (that is expected behavior).

>> i.e. 6 nodes remain in vcs and 4 nodes crash. How I handle this split brain?
Split brain usually occurs due to a loss of network connectivity or a system responsiveness issue.
If the cause is responsiveness, a reboot usually clears it, and once the system boots back up it should rejoin the cluster.
If the cause is a loss of network connectivity, the network connectivity needs to be restored first.

>> I mean on the 6 nodes I delete the key registration,i repair the 4 nodes and join them to vcs?
Definitely do not follow this course.
When the vxfen driver can join properly, it will fix the key registrations itself.
Manual changes to the registrations can also cause the WORKING nodes to panic, bringing the remaining cluster nodes down.

We can't recommend actions that would cause the whole thing to go down.
If in doubt - call support for assistance.
We fix these things on a case-by-case basis.
 

5 REPLIES 5

mikebounds
Level 6
Partner Accredited

You can use the CoordPoint agent to monitor vxfen - see extract from VCS bundled agents guide:

Use the Coordination Point (CoordPoint) agent to monitor the registrations on the
different coordination points on each node.
In addition, the CoordPoint agent monitors changes to the Coordinator Disk Group
constitution, such as when a disk is accidentally deleted from the Coordinator Disk
Group or if the VxVM private region of a disk is corrupted.

Mike

ousor
Level 4

Hi Mike,

If the paths to the fencing DG fail from all the VCS nodes, will the nodes in VCS panic?

Marianne
Level 6
Partner    VIP    Accredited Certified

Part of a new cluster setup is to check each component (especially hardware) for Single Point Of Failure.

So, you should never get to a situation where all nodes in a cluster will lose all paths to fencing DG.

If you are scared of such a situation, you can add Coordination Point Servers.

ousor
Level 4

hi,

thank you very much.