VxVM and VCS Stretch Cluster - Data Loss / Split-Brain query

bob2_skew
Not applicable

I posted a question here on the HA forum (https://forums.symantec.com/syment/board/message?board.id=1&thread.id=3173) and have been advised that my question may be better posted here.


In a VCS stretched cluster using VxVM to mirror data between the sites (v5.0), will VxVM provide a message, warning, or error when the ISLs between the sites' SAN fabrics are broken? How long should this notification take? The volumes have been presented as Dynamic, Mirrored, Concatenated, and in VCS the VMDg ForceImport setting is set to true. When this VMDg attribute is set to true (or 1), it does warn that split-brain and data loss are possible.

 

Is my understanding of an ISL link failure correct with VCS/VxVM? For example, a failure occurs where all ISL communication is lost between sites while the application is active at the Production site. In this case, all resource groups will continue to function normally because the storage at the production site is alive; no data is being mirrored to the DR site at this point, as all ISL links are down.

 

An hour after this ISL link failure occurs, the production node (or indeed the whole site) experiences a failure which causes its groups to fail over to the DR node. At this point, what does the DR node do? Will it bring the group online and import the disks from the DR array, and thus be missing the last hour's transactions, since no ISLs were available to update the remote array?

3 REPLIES

rhanley
Level 4
In a VCS stretched cluster using VxVM to mirror data between the sites (v5.0), will VxVM provide a message, warning, or error when the ISLs between the sites' SAN fabrics are broken? How long should this notification take? The volumes have been presented as Dynamic, Mirrored, Concatenated, and in VCS the VMDg ForceImport setting is set to true. When this VMDg attribute is set to true (or 1), it does warn that split-brain and data loss are possible.

There are really two different products at work here, so let's try to separate them out so this makes a little more sense.

From an SFW perspective (or Volume Manager), you create a stretched mirror for all disks/volumes in your Disk Group and all is well.

If the ISL link to the secondary array fails, you will see vxio write errors in the System Event Log on the server, and those disks will enter a degraded state (a yellow exclamation point on the disk/volume in VEA). When this occurs, the mirror operation is stopped and data is only written to the locally available array. This remains the case until manual intervention is performed.

This behavior is seen as soon as a single write fails to the mirrored disk. The disk enters a degraded state and the mirror is broken. Once the issue is resolved, the disk can be reactivated, which starts the mirror resync process to get the mirrored plexes back in sync.

So, notification is immediate and can be seen in the System Event Log or in VEA. VCS should not see any issues from this ISL failure, assuming SFW still has access to 50% of the disks in the disk group (which will be the case if you have the same number of disks from each array configured in the Disk Group).
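If you prefer the command line to VEA, here is a minimal sketch of how you could confirm that state on a Windows SFW host. The disk group name AppDG is hypothetical and the exact option syntax can vary by release, so treat this as illustrative only:

    # List disk status; disks on the unreachable array should show as failing/missing after the ISL drop
    vxdisk list

    # Show disk group details; a broken mirror leaves the affected volumes/plexes degraded
    vxdg -gAppDG dginfo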
 
=============================================================

Is my understanding of an ISL link failure correct with VCS/VxVM? For example, a failure occurs where all ISL communication is lost between sites while the application is active at the Production site. In this case, all resource groups will continue to function normally because the storage at the production site is alive; no data is being mirrored to the DR site at this point, as all ISL links are down.

This is correct, and the scenario you describe fits this situation perfectly. What needs to be understood from a VCS (Cluster) perspective is that in order for a Disk Group to remain imported and available, SFW must maintain a reservation on at least 50% of the disks. If the primary array had 10 disks mirrored to another 10 disks on the secondary array, all would be well, because SFW would still maintain a disk reservation on 50% of the disks.

This is where the issue with force importing comes into play: a Disk Group can remain online with a 50% reservation, but if a Disk Group is offline and you attempt to bring it online, it needs more than 50% of the disks in order to import. To get around this, you have to force import it, and the risk is that since we can't confirm a majority reservation, the Disk Group could be partially online on another node, and you enter a split-brain scenario. In a stretched mirror configuration with a down ISL, this could certainly occur if you force imported the Disk Group on each side, though most likely another cluster resource would fail (e.g. IP, Lanman), since those can only be online on one node in the cluster at a time.
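To make that trade-off concrete, here is a minimal sketch of how the ForceImport attribute is typically inspected and set from the VCS command line. The resource name vmdg_app is hypothetical; I'm assuming the standard VMDg agent attribute name:

    # Check the current ForceImport setting on the VMDg resource (hypothetical resource name)
    hares -value vmdg_app ForceImport

    # Enable force import so a fail-over can proceed with only 50% of the disks reachable
    # (this is exactly the setting that opens the door to split-brain if both sides import)
    haconf -makerw
    hares -modify vmdg_app ForceImport 1
    haconf -dump -makero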

=============================================================

An hour after this ISL link failure occurs, the production node (or indeed the whole site) experiences a failure which causes its groups to fail over to the DR node. At this point, what does the DR node do? Will it bring the group online and import the disks from the DR array, and thus be missing the last hour's transactions, since no ISLs were available to update the remote array?

This is an excellent question, and I'll need to research this a bit more. From my explanation above, it's clear that if a fail-over did occur, the Disk Group would fail to import unless your DR side had more disks than the primary and was able to reserve 51% or more of the disks. I suspect you have an even number of disks, so the import would fail and you would need to force import it.

Upon force importing, the production LUNs would show as missing and the surviving, stale mirrored plexes would be in a degraded state, but that doesn't mean that writes to them will fail. I'll have to run a test to determine how this would be handled.

I'll provide my findings to you ASAP (hopefully tomorrow).


On a side note, having a single LUN from a third array could help avoid such split-brain issues, since only one of the servers would be able to access that single LUN from the third array, and that would be the node able to successfully bring the Disk Group online. Otherwise, you will always run into issues if one of your arrays goes down and a failover needs to occur.
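As a rough illustration of the majority math (using hypothetical counts): with 10 disks per array, the DR site can only ever reserve 10 of 20 disks, exactly 50%, which is not a majority, so a normal import fails. Add one quorum LUN from a third array and the disk group has 21 disks; whichever site can still reach that LUN reserves 11 of 21 (about 52%), holds the majority, and can import the disk group without forcing it.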

TomerG
Level 6
Partner Employee Accredited Certified
Just FYI, in this vein, there is a utility that has been updated and better documented in the SF 5.0MP3 release: vxsplitlines. It is there to help in situations where a disk group is un-importable due to Serial Split Brain (SSB), which is the situation caused by a disk group getting imported simultaneously on multiple hosts when it shouldn't be, therefore ending up with two (usually equal) sets of disks that disagree on disk group metadata, such as timestamps.
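For anyone hitting SSB on the UNIX/Linux side, a minimal sketch of how it is typically used (the disk group name mydg is hypothetical; check the 5.0MP3 documentation for the exact options in your release):

    # Show the conflicting configuration copies and which disks hold each copy
    vxsplitlines -g mydg

    # Once you have decided which copy to trust, one documented way to resolve SSB
    # is to import with the override option (use with care)
    vxdg -o overridessb import mydg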

rhanley
Level 4
Hi Tomer,
     I believe that tool is only available on the Unix side, but I just realized Bob didn't mention whether this was on the Windows or Unix side; I just assumed Windows :)

As for the situation where the ISL goes down and the mirror breaks: if a fail-over were to occur and the DG was brought online on the secondary (by forcing it, since it wouldn't have a majority), the volume would be accessible, but your application would come up with stale data (stale since the point the ISL went down).

To make matters worse, when the ISL is brought back up, the stale volume will have the later TID (transaction ID), so the resync will occur in the opposite direction, and the changes that occurred between the ISL going down and the failover will be lost.

Unfortunately, the best option is to allow the DG to fail because it doesn't have a majority, and then require manual intervention: the user brings the DG online by force importing it from the command line (vxdg -g<DG_Name> -f import).
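For what it's worth, a rough sketch of that manual intervention on the DR node (the group name AppSG and disk group name AppDG are hypothetical, and the SFW syntax should be double-checked against your release):

    # Freeze the service group so VCS does not fight the manual steps
    hagrp -freeze AppSG

    # Force import the disk group on the DR node, as described above
    vxdg -gAppDG -f import

    # Once the application and data have been verified, unfreeze and let VCS manage the resources again
    hagrp -unfreeze AppSG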

I hope this helps.

- Robert