Solved: cluster behavior needed, which cfg vars to modify

pb227 · 03-01-2013

Hello,

I would like the following behavior from a Veritas cluster that monitors a resource (app):

When the resource fails, the cluster should first attempt to restart it on the same node; if that does not work, it should migrate the resource to the second node.

However, is there also a setting that forces the resource to migrate directly if it fails too many times within a given timeframe, instead of starting it again on the same node?

When testing, I see different behavior depending on how long I wait between manually killing the app, and I do not know exactly which settings I have to edit. Basically, the question is: how long do I have to wait between manual failures of the resource so that the cluster restarts it on the _same_ node again?

Config so far: ToleranceLimit = 0, RestartLimit = 1, OnlineTimeout = 300.

mikebounds · 03-01-2013

The attribute you are missing is

ConfInterval

When a resource has remained online for the specified time (in seconds), previous faults and restart attempts are ignored by the agent. (See ToleranceLimit and RestartLimit attributes for details.)

■ Type and dimension: integer-scalar

■ Default: 600 seconds

So with the default ConfInterval of 600 seconds (10 minutes):

With RestartLimit=1, a resource is restarted once. If it fails again within 10 minutes, it fails over; if it fails after more than 10 minutes, it is restarted again.

With ToleranceLimit=1, a failure is ignored the first time. If the resource fails again within 10 minutes, it fails over; if it fails after more than 10 minutes, the failure is ignored again.
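To make the interplay concrete, here is a rough Python sketch of the counting logic described above. It is only a simplified model under stated assumptions, not the actual VCS agent framework: the function name `simulate` and its parameters are made up for illustration, restarts are treated as instantaneous, and "time online" is approximated as the time since the previous fault.

```python
def simulate(fault_times, restart_limit=1, tolerance_limit=0, conf_interval=600):
    """Return the action taken for each fault (times in seconds, ascending).

    Hypothetical helper for illustration only; it mimics the behavior
    described in this post, not the real agent implementation.
    """
    actions = []
    tolerated = 0      # faults ignored so far (counts against ToleranceLimit)
    restarts = 0       # restarts done so far (counts against RestartLimit)
    last_fault = None  # time of the previous fault

    for t in fault_times:
        # Assumption: the resource was online since the previous fault.
        # If that was at least ConfInterval ago, earlier faults are forgotten.
        if last_fault is not None and t - last_fault >= conf_interval:
            tolerated = 0
            restarts = 0

        if tolerated < tolerance_limit:
            tolerated += 1
            actions.append("ignore")
        elif restarts < restart_limit:
            restarts += 1
            actions.append("restart")
        else:
            actions.append("failover")
            break  # the resource has left this node

        last_fault = t

    return actions


if __name__ == "__main__":
    # Settings from the question: ToleranceLimit=0, RestartLimit=1.
    print(simulate([0, 300]))  # second kill 5 minutes after the first
    print(simulate([0, 700]))  # second kill more than 10 minutes later
```

In this model, with the settings from the question, a second kill inside the 600-second window leads to a failover, while a second kill after the window is handled by another local restart.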

Mike


mikebounds · ‎03-01-2013