
Question about fencing and co-ordinator disks

We're running VCS 4.1 MP2 on Solaris. We have 4 nodes connected via a fibre fabric to 3 arrays. Each of these arrays has a small LUN, and we've put all three of these LUNs into a disk group called fencedg. This is the disk group named in /etc/vxfendg.
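For reference, this kind of coordinator disk group is normally built like the following sketch (based on the standard VCS fencing setup steps; the device names c1t1d0 etc. are placeholders for your actual LUNs):

```shell
# Initialize the three small LUNs for VxVM use, one per array
# (device names are placeholders - substitute your own).
vxdisksetup -i c1t1d0
vxdisksetup -i c2t1d0
vxdisksetup -i c3t1d0

# Create the coordinator disk group, then deport it -
# the fencing driver imports it itself at startup.
vxdg init fencedg disk01=c1t1d0 disk02=c2t1d0 disk03=c3t1d0
vxdg deport fencedg

# Tell the fencing driver which disk group holds the coordinator disks.
echo "fencedg" > /etc/vxfendg
```

When vxfen starts on each node, it reads /etc/vxfendg and generates /etc/vxfentab with the actual device paths.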

 

This all works well when there are no failures. Today one of our arrays failed, so we were down to only 2 disks in our fencing DG. When one of our nodes was rebooted it couldn't start its fencing driver, which I believe was because only 2 disks were found in the fencing DG.

 

So I'm wondering how best to solve this. We'll shortly be adding another array so we could then have 4 disks in the DG, but 4 isn't an odd number, so that's out.

 

If we put 5 disks in the fencing DG (so put 2 on one array, and 1 on the other three), that'd seem to work. Unless an array fails and we get a split brain at the same time, at which point we only have 4 disks and the fight over the co-ordinator disks could tie.

 

So I'm not really sure what the best course of action is. Any advice?

 

Thanks guys.

 

Tim. 

6 Replies

Re: Question about fencing and co-ordinator disks

Hi Tim,

 

Replies are inline to your queries.

 

When one of our nodes was rebooted it couldn't start its fencing driver, which I believe was because only 2 disks were found in the fencing DG.

 

=> That's correct. Three disks are always required for the fencing DG. Three disks provide protection against a single disk failure while still allowing a node to grab 2 of the 3 disks on a membership change.

We'll shortly be adding another array so we could then have 4 disks in the DG, but 4 isn't an odd number, so that's out.

 

=> That's correct. The number should always be odd, but the key point is that adding disks beyond three does not add any additional capability or resiliency.

 

If we put 5 disks in the fencing DG (so put 2 on one array, and 1 on the other three), that'd seem to work. Unless an array fails and we get a split brain at the same time, at which point we only have 4 disks and the fight over the co-ordinator disks could tie.

=> It's not necessary to have coordinator disks split across all arrays. Another point: mirroring the LUNs at the hardware level will prevent a physical disk failure from affecting the cluster. At the same time, you cannot mirror the coordinator disks using VxVM, as this would present an even number of disks.

 

Online replacement of coordinator disks is also planned for an upcoming VCS release, which should eliminate these issues.

 

Regards,
Amit Chauhan
Veritas Certified HA Designer

 

 

Re: Question about fencing and co-ordinator disks

Hi all,

 

my question is about dmp and coordinator disks.

We have a 2-node cluster with coordinator disks on 3 LUNs from EMC DMX3 storage (vxfen_mode=scsi3, scsi3_disk_policy=dmp).

We need to change our SAN configuration. Afterwards I'll still see the same coordinator LUNs, but on different EMC ports (different WWNs). Can we then remove the previous vxdmp paths online? The paths in /etc/vxfentab, which is generated dynamically, will change.

 

Is it necessary for me to do anything, or will it all be transparent to vxfen thanks to vxdmp?

 

Thanx,

peter 

Re: Question about fencing and co-ordinator disks

Peter, 

 

The change should be transparent because of Volume Manager. 

Since the private region on the disks contains the disk media name, the disk access name can change, but Volume Manager will recognize that a named disk (like coordg1) moved from /dev/dsk/blahblahblah to /dev/blahblahblue and adjust accordingly. You should not have to do anything.
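If you'd rather verify than trust it blindly, something along these lines (all standard VxVM/VCS tools; the DMP node name is a placeholder) lets you compare the paths before and after the SAN change:

```shell
# Show the DMP paths currently sitting behind one coordinator LUN
# (emc_dmx0_0 is a placeholder DMP node name - take yours from vxdisk list).
vxdmpadm getsubpaths dmpnodename=emc_dmx0_0

# /etc/vxfentab is regenerated from the DG named in /etc/vxfendg
# when vxfen starts, so check its contents after the change.
cat /etc/vxfentab

# Confirm the fencing driver is up and shows all nodes in the membership.
vxfenadm -d
```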

 

 

Re: Question about fencing and co-ordinator disks

I need to clarify something here...

 

=> That's correct. Three disks are always required for the fencing DG. Three disks provide protection against a single disk failure while still allowing a node to grab 2 of the 3 disks on a membership change.

 

This isn't quite cricket. The purpose of 3 disks in the coord_dg is so that you have an odd number to break the tie. 1 disk won't do it; you want to have a "best of" race. An even number won't work; you may get a tie. Would 5 work? Maybe... but this is a race. You want to determine who stays and who goes quickly, and move on. So, as they sing in Schoolhouse Rock, three is the magic number.
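The odd-number rule is easy to sanity-check in plain shell. Here's a small sketch (the disk lists are hard-coded for illustration; on a live node you'd feed it the contents of the DG named in /etc/vxfendg):

```shell
# Report whether a list of coordinator disks can break a tie:
# the count must be odd and at least 3.
check_coord_disks() {
    n=$(echo "$1" | wc -w)
    n=$((n + 0))   # strip wc's whitespace padding
    if [ "$n" -ge 3 ] && [ $((n % 2)) -eq 1 ]; then
        echo "OK: $n coordinator disks (odd, tie-proof)"
    else
        echo "WARNING: $n coordinator disks - need an odd count of 3 or more"
    fi
}

check_coord_disks "disk01 disk02 disk03"         # 3 disks: OK
check_coord_disks "disk01 disk02 disk03 disk04"  # 4 disks: WARNING
check_coord_disks "disk01 disk02"                # array lost: WARNING
```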

 

To say that a node gains 2 of the 3 disks, indicates to me an assumption of a quorum implementation.

This is not really the case.  I/O Fencing is the Veritas approach to a quorum-like solution.

But there are no votes in I/O Fencing.   Just wanted to clear that up.

 

 

Re: Question about fencing and co-ordinator disks

Forgot to add: if the coordinator disks are not redundant on the disk array (and they should always be redundant on the disk array), you need to fix a failed disk ASAP if one goes south.

The worst case scenario is where you are down to 2 disks and you have to run a race.

Yeah, the winner will likely get both, but if a hiccup occurs, and each one gets one, that's all sorts of no good. This is why "they should always be redundant on the disk array".
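For what it's worth, on releases without online replacement the manual swap goes roughly like this (a sketch of the offline procedure; the disk group and disk names are placeholders, and note it means a full cluster outage):

```shell
# 1. Stop VCS on all nodes, then stop the fencing driver on each node.
hastop -all
/etc/init.d/vxfen stop        # run on every node

# 2. Import the coordinator DG on one node (it's normally deported;
#    -t = temporary import, -f = force, -C = clear stale import locks).
vxdg -tfC import fencedg

# 3. Swap the failed disk for a freshly initialized one
#    (disk02 / c4t1d0 are placeholder names).
vxdg -g fencedg rmdisk disk02
vxdisksetup -i c4t1d0
vxdg -g fencedg adddisk disk02=c4t1d0

# 4. Deport the DG again, restart fencing on every node, then VCS.
vxdg deport fencedg
/etc/init.d/vxfen start       # run on every node
hastart                       # run on every node
```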
