Forum Discussion

Chotchki
13 years ago

Need Help with VCS monitoring a process as a resource

Hey Everyone,

I have a cluster main.cf file as follows:

group BATCH-SG (
        SystemList = { foo = 0, bar = 1 }
        AutoStartList = { foo }
        OnlineRetryLimit = 3
        )

        IP ip3-RES (
                Device = bond0
                Address = "0.0.0.1"
                NetMask = "255.255.254.0"
                )

        NIC nic1-RES (
                Device = bond0
                )

        ProcessOnOnly DB_Watchdog-RES (
                PathName = "/app/watchdog/bin/db-watchdog-wrapper"
                UserName = ezbatch
                )

        ProcessOnOnly NAS_Watchdog-RES (
                PathName = "/app/watchdog/bin/nas-watchdog-wrapper"
                UserName = ezbatch
                )

        requires group Cluster_Storage-SG online local firm
        ip3-RES requires nic1-RES


        // resource dependency tree
        //
        //      group BATCH-SG
        //      {
        //      ProcessOnOnly DB_Watchdog-RES
        //      ProcessOnOnly NAS_Watchdog-RES
        //      IP ip3-RES
        //          {
        //          NIC nic1-RES
        //          }
        //      }
 

I'm trying to set it up so that when either of the ProcessOnOnly resources dies, the group dies and VCS initiates a failover. However, when either of the watchdogs exits with status 100, VCS just restarts it. Any ideas on what I'm doing wrong?

3 Replies

  • Check the value of RestartLimit:

    hatype -display ProcessOnOnly -attribute RestartLimit

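    If this is non-zero, VCS restarts the resource that many times before faulting it. Set it to zero if you want VCS to skip the restart and fail over straight away (the configuration must be writable first, so run haconf -makerw beforehand and haconf -dump -makero afterwards):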

    hatype -modify ProcessOnOnly RestartLimit 0

    Mike
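
    Note that changing RestartLimit with hatype applies to every ProcessOnOnly resource in the cluster. If only the two watchdogs should fail over immediately, one option is to override the attribute per resource instead. The following is a minimal sketch, assuming the resource names from the main.cf above. The `run` helper and the `DRY_RUN` variable are hypothetical conveniences (not part of VCS): by default they only print each command, and the commands are executed only when DRY_RUN=0.

```shell
#!/bin/sh
# Hypothetical helper (not part of VCS): print the command by default,
# execute it only when DRY_RUN=0 is set in the environment.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

run haconf -makerw                            # open the configuration for writing
for res in DB_Watchdog-RES NAS_Watchdog-RES; do
    run hares -override "$res" RestartLimit   # allow a per-resource value
    run hares -modify "$res" RestartLimit 0   # no local restart before faulting
done
run haconf -dump -makero                      # save and close the configuration
```

    Run it once as-is to review the printed commands, then again with DRY_RUN=0 on a cluster node to apply them.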

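  • Ah, I've just seen that you have OnlineRetryLimit set on the group, so you need to set this to zero:

    haconf -makerw
    hagrp -modify BATCH-SG OnlineRetryLimit 0
    haconf -dump -makero

    With OnlineRetryLimit at 3, VCS retries bringing the group online on the same system up to three times before failing over. Once it is zero, VCS will fail over the group.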

    Mike
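
    To confirm the change took effect and to watch what happens when a watchdog exits, something like the following could be used. This is only a sketch, assuming the group, resource, and system names from the main.cf above. The `run` helper and `DRY_RUN` variable are hypothetical (not part of VCS) and print the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Hypothetical helper (not part of VCS): print the command by default,
# execute it only when DRY_RUN=0 is set in the environment.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Check the group-level retry setting after the change.
run hagrp -display BATCH-SG -attribute OnlineRetryLimit

# Check where the group and the watchdog resource are online or faulted.
run hagrp -state BATCH-SG
run hares -state DB_Watchdog-RES

# After a test failover, the fault on the original system (assumed here to be
# "foo") has to be cleared before the group can come online there again.
run hagrp -clear BATCH-SG -sys foo
```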