Forum Discussion

lelu
Level 3
6 years ago

jeopardy

hi,

In a jeopardy case, will VCS automatically set the node as a jeopardy member?

In a 2-node cluster, what must I do in a jeopardy case in order to resolve the issue? I mean, must I shut down the node with the higher LLT number and then fix the broken private connection?

tnx

  • CliffordB
    Level 4

    Hi.

    First, determine if you still have cluster communications. The output of “gabconfig -a” provides the answer.
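
    As a rough illustration of reading that output, here is a minimal Python sketch that scans "gabconfig -a" text for ports reporting a jeopardy membership. The sample text is made up (the generation numbers and node IDs are illustrative only) and assumes the usual "Port <x> gen <id> membership|jeopardy <nodes>" line layout; the function names are hypothetical, not part of any VCS tooling.

```python
import re

# Illustrative sample only: the layout mimics typical "gabconfig -a" output,
# but the generation numbers and node IDs below are invented.
SAMPLE_OUTPUT = """\
GAB Port Memberships
===============================================================
Port a gen a36e0003 membership 01
Port a gen a36e0003 jeopardy ;1
Port h gen fd570002 membership 01
Port h gen fd570002 jeopardy ;1
"""

# Assumed line layout: "Port <letter> gen <hex> membership|jeopardy <nodes>"
LINE_RE = re.compile(r"^Port\s+(\w)\s+gen\s+(\w+)\s+(membership|jeopardy)\s+(.*)$")


def parse_gab_ports(text):
    """Return {port: {"membership": str, "jeopardy": str}} for each GAB port."""
    ports = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip the header, separator, and blank lines
        port, _gen, kind, nodes = match.groups()
        entry = ports.setdefault(port, {"membership": "", "jeopardy": ""})
        entry[kind] = nodes.strip()
    return ports


def ports_in_jeopardy(text):
    """Return the sorted list of ports that report a jeopardy membership."""
    return sorted(p for p, info in parse_gab_ports(text).items() if info["jeopardy"])


if __name__ == "__main__":
    print(ports_in_jeopardy(SAMPLE_OUTPUT))
```

    On a real node the text would come from running gabconfig -a yourself (for example via subprocess.run) instead of the sample string. If the port a membership line still lists both nodes, cluster communications are still up.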

    If you have jeopardy, but the last link is still running, then you can just fix the network. The cluster will then fix itself.

    If the last link failed (no links working), then shut down the node with the highest LLT node ID, repair the link, then restart the node you just shut down. Once the host joins the cluster, you may have to clear the AutoDisable flag on any service groups that were running on that node. Then, manually bring them online.

    When all links fail, NEVER fix the links while the hosts are alive. Always shut them down first. If IO fencing is running, then the losing nodes would have already crashed. Without IO fencing, they may or may not have crashed, so perform a shutdown.
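
    The decision described above can be summarised as a small Python sketch. It only encodes the advice in this reply for a 2-node cluster; the function name, its parameters, and the step wording are hypothetical, and it does not run any VCS command.

```python
def recovery_steps(working_links, io_fencing):
    """Return the ordered recovery steps suggested in this thread.

    working_links: number of LLT links still carrying heartbeats (assumed
                   to be known already, e.g. from checking the cluster).
    io_fencing:    True if IO fencing is configured on the cluster.
    """
    if working_links >= 1:
        # Jeopardy with the last link still alive: the nodes can stay up.
        return [
            "fix the failed network link",
            "let the cluster clear the jeopardy membership on its own",
        ]

    # No links left: never repair links while both hosts are alive.
    steps = []
    if io_fencing:
        steps.append("confirm the losing node has already crashed")
    steps += [
        "shut down the node with the highest LLT node ID",
        "repair the links while that node is down",
        "restart the node that was shut down",
        "clear the AutoDisable flag on affected service groups if it is set",
        "bring those service groups online manually",
    ]
    return steps


if __name__ == "__main__":
    for number, step in enumerate(recovery_steps(0, io_fencing=False), start=1):
        print(number, step)
```

    The split on working_links mirrors the two cases in the reply: one surviving link means only a network repair, while zero links means a node must be down before anything is fixed.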


    All clusters should be running IO fencing with coordinator disks (or CPS). I highly recommend that all production clusters and any cluster running CFS use IO fencing. IO fencing with SCSI-3 PGR will prevent data corruption when communications are down.

    The geek diary link is a good resource. I highly recommend you also study the Admin Guide section on “About communications, membership, and data protection in the cluster”. It provides greater detail on exactly how to recover when these situations occur. This is one of the hardest sections in the manual for most users, so take your time. I find the table on IO Fencing scenarios to be most helpful.

    Cheers
  • lelu
    6 years ago

    hi,

    just to clarify.

    In the link:

    Recovery
    To recover from jeopardy, just fix the link and GAB automatically detects the new link and the jeopardy membership is removed from node03.

    It means that node03 will become a normal member.

    VCS will automatically set node03 as a jeopardy member or a normal member, right?

    tnx

  • Good day.

    For details, check your Admin Guide. There is a section on various jeopardy scenarios.

    In general, one must fix the broken communications link. Once fixed, the cluster will return to normal on its own.

    Cheers

