Forum Discussion

ousor
Level 4
10 years ago

VCS with fewer nodes

Hi,

Can a VCS cluster run with fewer nodes than in its normal configuration? For example, gabconfig seeds the cluster with 6 nodes, but over time some of the nodes leave the cluster, so that only 2 nodes remain. I am also wondering why Symantec decided to build VCS without a disk quorum, as used in Solaris Cluster, Safeguard, and so on?

4 Replies

  • Hi,

    VCS can very well run with fewer nodes. Service groups will either need to be distributed across the remaining nodes manually, or they will move between nodes based on the system list.

    Symantec decided to go with I/O fencing, which is a successful split-brain prevention mechanism. The same thing is achieved by quorum votes in Sun clusters.

     

    G

  • VCS cluster membership behaves differently depending on the situation:

    Initial startup:
    Here all cluster nodes are started up at more or less the same time (e.g. after initial installation, maintenance, complete power-down, etc). 
    The number of nodes specified in /etc/gabtab must all join the cluster before VCS can be started and SGs can be brought online.
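
    As a rough illustration of the seeding rule above, here is a minimal Python sketch. It assumes /etc/gabtab holds a line of the usual form `/sbin/gabconfig -c -n6`; the helper names `seed_count` and `can_seed` are hypothetical and only model the rule, they are not part of GAB.

```python
import re


def seed_count(gabtab_line):
    """Extract the -n value from a gabconfig line such as '/sbin/gabconfig -c -n6'."""
    match = re.search(r"-n\s*(\d+)", gabtab_line)
    if match is None:
        raise ValueError("no -n option found in gabtab line")
    return int(match.group(1))


def can_seed(gabtab_line, joined_nodes):
    """Model of the rule: seeding needs at least as many joined nodes as -n specifies."""
    return len(joined_nodes) >= seed_count(gabtab_line)


line = "/sbin/gabconfig -c -n6"  # assumed example content of /etc/gabtab
print(can_seed(line, {"node1", "node2"}))                 # only 2 of 6 nodes joined
print(can_seed(line, {f"node{i}" for i in range(1, 7)}))  # all 6 nodes joined
```

    This check only applies at initial startup. A cluster that has already seeded keeps running when nodes leave later, which is the case asked about in the original question.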

    Running cluster:
    All nodes in the cluster were up and running with Service Groups online across nodes (based on rules defined in main.cf).
    If some of the nodes now fail, SGs will fail over to the running nodes based on the rules defined in main.cf.
    If SG1 is defined with specific nodes where it can run, e.g. node1, node2 and node3, then SG1 can only fail over to one of these nodes.
    So, in your example, if 4 nodes fail (e.g. node1, node2, node3, node4), SG1 cannot fail over to either of the remaining nodes (node5 and node6). SG1 will remain offline until one of the nodes where it is allowed to run comes back online.
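
    The failover rule described above can be sketched in a few lines of Python. This is a simplified model, assuming a priority-based policy in which the node with the lowest number in the group's system list is preferred; the function `failover_target` and the priority values are hypothetical, and real VCS evaluates more attributes than this.

```python
def failover_target(system_list, running_nodes):
    """Pick the running node with the lowest priority number, or None.

    system_list maps node name -> priority (lower is preferred); this
    mirrors the idea of a per-group system list, not the real main.cf syntax.
    """
    candidates = {
        node: prio for node, prio in system_list.items() if node in running_nodes
    }
    if not candidates:
        return None  # no eligible node is up, so the group stays offline
    return min(candidates, key=candidates.get)


# SG1 may only run on node1, node2 and node3 (example from the post).
sg1_system_list = {"node1": 0, "node2": 1, "node3": 2}

# node1..node4 failed: only node5 and node6 are left, neither is eligible.
print(failover_target(sg1_system_list, {"node5", "node6"}))

# node1 failed but node2 and node3 are still up: node2 is preferred.
print(failover_target(sg1_system_list, {"node2", "node3", "node5", "node6"}))
```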

    I believe that Gaurav has answered your question about disk-based protection.
    Old versions of VCS had disk-heartbeat. IO fencing has replaced this and is a lot more effective.
    Coordination Point server is another level of cluster protection.

  • Hi,

    I only want to wrap up the I/O fencing point. If we have a VCS cluster with I/O fencing, then in a split-brain case the node that is taken out of the cluster will panic automatically, and after the reboot it will not rejoin the VCS cluster. True?

    If we do not have I/O fencing, then in a split-brain case we need to keep only one node and shut down the rest of the nodes manually. True?

    Thanks a million, Marianne and Gaurav. With this I will close this discussion.

  • Hi,

    Yes, once I/O fencing kicks in, the node that loses the race is removed from the cluster. It will not rejoin until the heartbeat is fixed and the nodes can communicate with each other again.
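
    To make the "race" concrete, here is a toy Python model. It assumes an odd number of coordination points (three here) and that the sub-cluster winning the majority of them survives while the other side is ejected; the names `cp1`..`cp3`, the sub-cluster labels and the function `fencing_survivor` are hypothetical and are not taken from the fencing driver.

```python
from collections import Counter


def fencing_survivor(race_results):
    """Return the sub-cluster that won a majority of coordination points.

    race_results maps each coordination point to the sub-cluster that
    grabbed it first. Returns None if nobody holds a majority.
    """
    wins = Counter(race_results.values())
    needed = len(race_results) // 2 + 1  # strict majority
    for sub_cluster, count in wins.items():
        if count >= needed:
            return sub_cluster
    return None


# Heartbeat links are lost; sub-clusters "A" and "B" race for three points.
race = {"cp1": "A", "cp2": "A", "cp3": "B"}
winner = fencing_survivor(race)
print("survivor:", winner)  # the nodes on the losing side are taken out
```

    Using an odd number of coordination points in this model guarantees that two competing sub-clusters cannot tie.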

     

    If you don't have fencing and a split brain happens, you will end up with data corruption, because the nodes will forcibly take over the disk groups. If you react lightning fast when the split brain happens and shut down all nodes except the one you want to keep before they bring the groups online, you may be able to save the data, but in practice this is close to impossible to achieve.

     

    G