automatic resolution of split-brain

hraju
Level 4
Partner Accredited Certified
Hello all,

I have been wondering whether it would be possible to resolve split-brain automatically.
My configuration is two servers with no shared storage. I use replication between the servers.
Idea 1: the network between the servers goes down and split-brain occurs. When the network comes back up, one of the servers automatically takes its service groups offline, so the service groups run on only one server.
Idea 2: use ping as a kind of "steward". Before a server brings its service groups online, it tries to ping some addresses, and if it cannot reach them, it does not bring the service groups online.
The ideal solution would be to use the standard steward, of course.
regards

Juraj
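
The ping check in the second idea could look roughly like the sketch below. It is only an illustration of the decision logic, not something VCS provides: the names `ping_once` and `may_bring_online`, the list of witness addresses, and the `min_reachable` threshold are all hypothetical, and the `ping` flags assume Windows or Linux. How such a check would be hooked into bringing service groups online is not shown.

```python
import platform
import subprocess
from typing import Callable, Iterable


def ping_once(address: str, timeout_s: int = 2) -> bool:
    """Return True if a single ICMP echo to `address` succeeds.

    Assumption: the system `ping` is on PATH and accepts the usual
    Windows (-n/-w) or Linux (-c/-W) options.
    """
    if platform.system() == "Windows":
        cmd = ["ping", "-n", "1", "-w", str(timeout_s * 1000), address]
    else:
        cmd = ["ping", "-c", "1", "-W", str(timeout_s), address]
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s + 1,
        )
    except (subprocess.TimeoutExpired, OSError):
        # A hung or missing ping binary counts as "not reachable".
        return False
    return result.returncode == 0


def may_bring_online(
    witnesses: Iterable[str],
    min_reachable: int,
    probe: Callable[[str], bool] = ping_once,
) -> bool:
    """Decide whether this node may bring its service groups online.

    The node is allowed to proceed only if at least `min_reachable`
    of the witness addresses answer the probe.
    """
    reachable = sum(1 for address in witnesses if probe(address))
    return reachable >= min_reachable
```

One limitation of this approach: if both servers can still reach the witness addresses but not each other, both checks pass and both servers would bring the groups online, so the check only protects a node that is itself cut off from the network.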
1 ACCEPTED SOLUTION


Wally_Heim
Level 6
Employee

Hi Juraj,
 

If all of your networks are going down at the same time, how is adding a steward that is accessed via the network going to help? It sounds like you don't have completely independent heartbeats, and you might want to isolate what is causing all of your network connections to go down at the same time.


VCS already resolves split brain when the network links are restored by running the concurrency violation trigger. This will take one of the service groups offline. The problem is that the choice is somewhat random, and it might not take down the group on the node that you expect.

You can also add a third node to your replicated data cluster (with the Windows product; I'm not sure about the Unix/Linux versions of the product).

VCS also requires dual independent heartbeats and can support up to 8. If you are concerned about multiple heartbeat failures at one time, then you should add more heartbeats and think about adding a low-priority heartbeat on the public network.

With our GCO configuration we do have the ability to use a steward in a manner similar to what you describe, but it does not work with a single cluster.

Thanks,
Wally
