Forum Discussion

tanislavm
Level 6
10 years ago

split brain

Hi,

I have 7 nodes in VCS: nodes a to e serve 4 failover service groups, and nodes f and g serve an orac. If I suddenly have a split brain, where all the nodes lose the LLT connection, what is the fastest and safest way to deal with it?

I know I should shut down nodes, but I need the sequence.

I would start by shutting down node f or g. Then I would shut down 4 of the nodes a to e, let's say nodes b to e. I then start node b and see which service group is online on it, then the next node, and so on up to node e. I make sure that on every node a to e there is only a single group online.

Thanks a lot.

  • Shutting down will not help in a split-brain scenario.

    Split brain means that data corruption has probably already happened, because 2 different nodes believed they had sole access to the data and wrote to the volume/filesystem.

    To recover from split brain, you need to find backup tapes and start restoring.

    It is best to PREVENT split brain by ensuring you have a minimum of 2 heartbeats on separate physical infrastructure and by implementing I/O Fencing.

  • If you have fencing configured, then fencing will deal with split brain by panicking servers.

    If you do not have fencing configured, then as Marianne says, you will likely have data corruption. If any 2 servers have the same failover SG online and you then fix LLT so that all nodes see each other, VCS will bring the group down so that it is online on only one node. It is better to offline the group before fixing LLT, so that you choose which node the group goes offline on. You may also want to ungracefully kill the server rather than offline the group: a clean offline will likely cause further writes to the storage, which could cause further corruption.

    Mike
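
To make the "same failover SG online on 2 servers" check concrete, here is a minimal Python sketch. It assumes you have already collected, by hand, the list of groups each isolated node reports as online (for example by running `hagrp -state` on each node). The node names, group names and the `find_concurrency_violations` helper are all hypothetical, and parsing of real command output is not shown.

```python
from collections import defaultdict


def find_concurrency_violations(online_by_node):
    """Return {group: sorted list of nodes} for groups online on more than one node."""
    nodes_by_group = defaultdict(set)
    for node, groups in online_by_node.items():
        for group in groups:
            nodes_by_group[group].add(node)
    # A failover group should be online on at most one node at a time.
    return {
        group: sorted(nodes)
        for group, nodes in nodes_by_group.items()
        if len(nodes) > 1
    }


# Hypothetical state, noted down from each isolated node before LLT is fixed.
state = {
    "a": ["sg1"],
    "b": ["sg1", "sg2"],
    "c": ["sg3"],
    "d": ["sg2"],
    "e": ["sg4"],
}

violations = find_concurrency_violations(state)
for group, nodes in sorted(violations.items()):
    print(f"{group} is online on {', '.join(nodes)}")
```

Any group it lists is one where you would decide which node to take down before reconnecting the heartbeats. It only reports where groups overlap and says nothing about whether the data is still consistent.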
