Solved: Split brain occurs when all

Zahid_Haseeb · ‎04-10-2012

As a norm, we have heard that during split brain (suppose a two-node cluster) both nodes mount the "shared disk resource" and write simultaneously, which causes data corruption. For reference, see below:

The VCS engine on a service group that will prevent that group from becoming on-line. Auto-disable is used as a way to prevent data corruption from occurring by avoiding a situation called split-brain. Split-brain is where data is being updated by two or more hosts simultaneously

http://www.symantec.com/business/support/index?page=content&id=TECH8436

Question: I have never been able to forcibly mount the DiskGroup on both nodes. So how can VCS mount the Disk resource on both nodes (which leads to data corruption)? For example, I tried to import the DiskGroup on the partner/idle node of the cluster, on which the ServiceGroup is not online.

Eric_Hennessey1 · ‎04-10-2012

Split brain occurs when all cluster communications links between cluster nodes are lost. In such a case, each node thinks the other is dead and the idle node attempts to bring the service group online while the active node already has it online.

When the cluster is running normally and at least one link is active, the cluster won't allow you to use cluster services to start the service group on the idle node, for example by issuing hagrp -online. You can attempt to forcibly bring resources online via their native commands (vxdg import, mount, etc.), but VCS will detect the concurrency violation and bring the resources offline on the idle node.
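The reasoning above can be illustrated with a toy model. This is only a sketch of the decision logic as described in this post, not VCS code: the `Node` class, `wants_failover`, and `simulate` are hypothetical names invented for the illustration, and real clusters use heartbeat protocols and fencing that are not modelled here.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical cluster node; not a VCS data structure."""
    name: str
    group_online: bool = False


def wants_failover(idle: Node, links_up: int) -> bool:
    # With zero working heartbeat links, the idle node cannot tell
    # "peer is dead" apart from "all links are cut", so it assumes the worst.
    return (not idle.group_online) and links_up == 0


def simulate(link_counts):
    """Return, per step, whether the service group is online on both nodes."""
    active = Node("node_a", group_online=True)
    idle = Node("node_b")
    results = []
    for links_up in link_counts:
        if wants_failover(idle, links_up):
            idle.group_online = True  # idle node brings the group online
        results.append(active.group_online and idle.group_online)
    return results


# Two links up, then one, then none: only the last step is split brain.
print(simulate([2, 1, 0]))
```

As long as at least one link survives, the idle node never tries to take over; the group ends up online on both nodes only once every link is gone.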

Wally_Heim · ‎04-10-2012

Hi Zahid,

The SCSI reservations put on the disk in the disk group should prevent the disk group from being imported on more than one server at a time. However, if there is a problem with the scsi reservation process (typically hardware or driver related) the disk group might be accessible by more than one node at a time.
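A minimal sketch of this idea, under stated assumptions: the `SharedDisk` class and its `reservations_work` flag are hypothetical and only model the behaviour described above (one holder at a time when reservations work, no protection when they do not). It does not reproduce the actual SCSI reservation commands or how SFW-HA or SF-HA issue them.

```python
class SharedDisk:
    """Toy model of a shared disk guarded by a single reservation."""

    def __init__(self, reservations_work: bool = True):
        # reservations_work=False stands in for a hardware or driver
        # problem in the reservation process (an assumption of this sketch).
        self.reservations_work = reservations_work
        self.holder = None       # node currently holding the reservation
        self.importers = set()   # nodes that have the disk group imported

    def try_import(self, node: str) -> bool:
        if self.reservations_work:
            if self.holder not in (None, node):
                return False     # another node holds the reservation
            self.holder = node
        self.importers.add(node)
        return True


healthy = SharedDisk()
print(healthy.try_import("node_a"), healthy.try_import("node_b"))

broken = SharedDisk(reservations_work=False)
print(broken.try_import("node_a"), broken.try_import("node_b"))
```

In the healthy case the second import is refused; in the broken case both nodes end up with the disk group imported, which is the situation that puts data at risk.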

Keep in mind that you are pointing to an SF-HA Unix article. I know you have SFW-HA Windows clusters, and I'm not sure whether you have SF-HA Unix clusters. SF-HA Unix and SFW-HA clusters perform disk operations in very different ways. The disk access concerns described for the SF-HA Unix product will not translate into concerns on the SFW-HA Windows product.

Thank you,

Wally

Zahid_Haseeb · ‎04-10-2012

Thanks, all, for the kind words.

@ Wally

I know that the article is Unix-related. I just wanted to point out (and I have also read this in multiple places) that split brain can cause data corruption. I tried to mount the Disk resource on multiple nodes, but it always failed, so I am confused about how split brain could cause data loss.

Is there any way to run a test in which we can mount the Disk resource on both cluster nodes?

Wally_Heim · ‎04-11-2012

Hi Zahid,

If everything is working correctly you will not be able to import the cluster disk group on more than one server at a time. However, when you are in a split brain situation, things are not working correctly.

Why do you want to test importing a disk group on more than one node and put your data at high risk of corruption? I don't have a test process for this, and I would advise against putting your data at risk of corruption.

Thanks,

Wally

Zahid_Haseeb · ‎04-11-2012

Wally, of course I would never want to put my data at risk. I want to test this situation in a test environment because I felt that this (mounting the Disk resource on both nodes) is not going to happen when one node locks the Disk resource.

Anyway thanks for your kind words Wally :)

AlexTomasson · ‎07-17-2012

Are there any other ways to prevent split brain? I have been wondering why split brain occurs in the first place. We need to find a solution to the problem of data being updated twice.

Zahid_Haseeb · ‎07-17-2012

See the link below; it may help you:

https://sort.symantec.com/public/documents/sf/5.0/solaris/html/sf_rac_install/sfrac_intro13.html

Zahid_Haseeb · ‎09-17-2012

Thanks Wally

The SCSI reservations put on the disk in the disk group should prevent the disk group from being imported on more than one server at a time.

Is there any way to keep the disk safe if the client only has HA/VCS and doesn't have VxVM?

How can a single shared drive be mounted on both nodes during split brain?