Best practice for a two-node campus cluster

Balthier35 · ‎05-03-2014

I have set up a two-node campus cluster with the following specifications:

SiteA
Node1 is a virtual machine residing on an ESXi 5.1 host in this site
Disk1 is a LUN in an enclosure in this site

SiteB
Node2 is a virtual machine residing on an ESXi 5.1 host in this site
Disk2 is a LUN in an enclosure in this site

I installed the Microsoft failover cluster and Veritas SFW, then created a dynamic disk group that contains Disk1 and Disk2. On this VMDG, I created a mirrored dynamic volume and set DRL on both disks. The VMDG is then presented to the failover cluster. The quorum type on the failover cluster is a file share witness located at Site C.

So the question is: what happens to the disk group if one of the sites goes down? The failover cluster will stay online, because the file share witness is at Site C. But as I read in the SFW Admin Guide, the VMDG cannot be brought online unless it is forced online, since the remaining cluster node cannot reserve a majority of the disks in the VMDG.

Is there any way I can circumvent this behaviour, so the remaining node brings the VMDG online automatically, in case Site A or B fails?

What are the recommended best practices for such scenarios? How does setting DRL on both disks in the VMDG play into this? Will it prevent data corruption?

Wally_Heim · ‎05-04-2014

Hi Balthier35,

By default, a clustered diskgroup needs more than 50% of its disks to import. However, once it is online, it will stay online with a minimum of 50% of the disks in the diskgroup. In other words, if you lose 1 array (50% of the disks) while the group is online, it will stay online as long as you have a complete plex.
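To make the two thresholds concrete, here is a minimal Python sketch of the rule as described above. It only models the arithmetic and is not SFW code; the function names `can_import` and `can_stay_online` are made up for illustration.

```python
def can_import(visible_disks, total_disks):
    """Importing needs a strict majority: more than 50% of the disks."""
    return 2 * visible_disks > total_disks


def can_stay_online(visible_disks, total_disks, has_complete_plex):
    """An already-imported group survives with at least 50% of its disks,
    provided one complete plex of the volume is still reachable."""
    return 2 * visible_disks >= total_disks and has_complete_plex


# Two-disk group (Disk1 at Site A, Disk2 at Site B) after losing one site:
print(can_import(1, 2))             # False: 1 of 2 is not a majority
print(can_stay_online(1, 2, True))  # True: exactly 50% plus a complete plex
```

This is why the surviving node keeps the group online if it already owned it, but cannot import it after a failover without being forced.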

It is not recommended to alter the settings to allow the diskgroup to be imported with 50% of the disks or less. Doing so creates the potential for a split-brain situation, where the group is online on both nodes at the same time.

DRL is used to resync the mirror quickly after a node crash. It only comes into play if the site that is online is lost and then comes back online; the DRL is then used to resync the mirror. Basically, the DRL tracks writes that are in progress and not yet completed.
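As a rough illustration of the idea of tracking in-flight writes, here is a toy Python model. It is a simplified sketch of dirty region logging in general, not how SFW implements it; the class `MirroredVolume` and its methods are hypothetical.

```python
class MirroredVolume:
    """Toy two-plex mirror with a dirty region log (illustrative only)."""

    def __init__(self, regions):
        self.plex_a = [0] * regions
        self.plex_b = [0] * regions
        self.dirty = set()  # the "log": regions with a write in flight

    def write(self, region, value, crash_midway=False):
        self.dirty.add(region)      # mark the region before touching data
        self.plex_a[region] = value
        if crash_midway:
            return                  # plex_b never got the write; region stays dirty
        self.plex_b[region] = value
        self.dirty.discard(region)  # both copies agree again

    def resync(self):
        """Copy only the logged regions instead of the whole volume."""
        copied = sorted(self.dirty)
        for region in copied:
            self.plex_b[region] = self.plex_a[region]
        self.dirty.clear()
        return copied


vol = MirroredVolume(regions=8)
vol.write(2, 11)
vol.write(5, 42, crash_midway=True)
print(vol.resync())              # [5]: only the interrupted region is copied
print(vol.plex_a == vol.plex_b)  # True
```

In this model the log only shortens the resync; it has no influence on which node is allowed to import the disk group.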

Thank you,

Wally

Wally_Heim · ‎05-04-2014

Hi Balthier35,

Since you mentioned that you have 3 sites, it might be better to configure a 3-way mirror with Sites A, B and C. Then if any one of the sites goes down, a node that can still see 2 of the 3 arrays will be able to bring the disk group online, with no setting changes that could expose you to split brain.

The disk at Site C does not actually have to be large. You don't have to have a mirror plex there to use that site as a tie-breaker type of disk. It just has to be part of the disk group.
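The effect of the third disk can be checked with a small Python sketch that applies the "more than 50% to import" rule to each failure case. The disk name `Disk3` and the helper `can_import` are assumptions made for the example, not SFW objects.

```python
# Hypothetical layout: one disk of the disk group per site.
# Disk3 stands for the small tie-breaker disk at Site C.
DISK_SITE = {"Disk1": "A", "Disk2": "B", "Disk3": "C"}


def can_import(reachable_sites):
    """True if a node that can reach these sites sees a strict majority of disks."""
    visible = [disk for disk, site in DISK_SITE.items() if site in reachable_sites]
    return 2 * len(visible) > len(DISK_SITE)


# Any single site failure still leaves 2 of 3 disks visible.
for failed in "ABC":
    reachable = set("ABC") - {failed}
    print(f"Site {failed} down: import possible = {can_import(reachable)}")

# A node cut off from both other sites sees only its local disk.
print(can_import({"A"}))  # False: 1 of 3 is not a majority
```

Because an isolated node can never reach a majority on its own, two nodes cannot both import the group at once, which is the split-brain case the default setting protects against.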

Thank you,

Wally
