Solved: Delaying the time before a resource is marked as faulted

Epi4Sym · ‎11-17-2010

Hi, I have a resource that is an NFS client whose targets are 2 NFS servers (one active, one standby).

If the active NFS server faults, I don't want my resource on the NFS client to fault immediately; I want it to stay up and wait until the standby NFS server becomes active.

Currently my NFS client resource takes 80 seconds to fault after I stop the active NFS server.

I have modified FaultOnMonitorTimeouts, but nothing changed: the resource faulted after the same amount of time.

I need to extend this time because the switch between the NFS servers takes longer than 80 seconds.

Is this the right attribute to change to delay declaring the resource faulted?

Or are there other attributes to consider?

thanks for your help

epi

g_lee · ‎11-17-2010

epi,

FaultOnMonitorTimeouts doesn't change the time taken before a resource is declared faulted; it "defines whether VCS interprets a Monitor function timeout as a resource fault." [...] "By default, the FaultOnMonitorTimeouts attribute is set to 4. This means that the Monitor function must time out four times in a row before the resource is marked faulted."

From your description, it sounds as though the NFS server is an external resource (i.e. not part of the same cluster), and the NFS client resource that you're trying to modify is a Mount resource. Is this correct?

If so, then you're saying that if the active NFS server stops, it takes some time to switch to the standby server. In the meantime, the NFS Mount (client) resource is faulting. You don't want this to cause a failover; you want the resource to wait and retry until the standby server comes online.

Look at the ToleranceLimit attribute (from VCS User's Guide -> Controlling VCS behaviour at the resource level -> About resource type attributes that control resource behaviour):

----------
About the ToleranceLimit attribute
The ToleranceLimit attribute defines the number of times the Monitor routine should return an offline status before declaring a resource offline. This attribute is typically used when a resource is busy and appears to be offline. Setting the attribute to a non-zero value instructs VCS to allow multiple failing monitor cycles with the expectation that the resource will eventually respond. Setting a non-zero ToleranceLimit also extends the time required to respond to an actual fault.
----------

So, if the NFS server switch takes ~80 seconds, and assuming MonitorInterval=60, you could try setting ToleranceLimit to 2 or 3. The Monitor can then return an offline status two or three times (i.e. it will probe two or three times, roughly 120 or 180 seconds) before VCS declares the resource faulted.
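
The estimate above can be written as a small shell calculation. This is only a rough sketch that follows the same reasoning (tolerated window is about ToleranceLimit x MonitorInterval). The function name min_tolerance_limit and its arguments are hypothetical, and the real timing also depends on how long each monitor cycle takes to return.

```shell
#!/bin/sh
# Hypothetical helper: smallest ToleranceLimit whose tolerated window
# (ToleranceLimit * MonitorInterval, per the estimate above) is longer
# than the time the NFS server switch takes.
# usage: min_tolerance_limit <switch_seconds> <monitor_interval_seconds>
min_tolerance_limit() {
    switch_time=$1
    interval=$2
    # Integer division, then +1 so the window strictly exceeds switch_time.
    echo $(( switch_time / interval + 1 ))
}

min_tolerance_limit 80 60    # prints 2 (2 * 60 = 120s > 80s)
min_tolerance_limit 150 60   # prints 3 (3 * 60 = 180s > 150s)
```

Picking one step higher than the calculated value leaves some margin if the switch is slower than usual, at the cost of a slower reaction to a real fault.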

Also, see the section "How VCS handles resource faults" in the VCS User's Guide for more details about how the attributes work together.

If this isn't the scenario you're describing, please provide further details to explain what you're trying to achieve.

Hope that helps,

Grace

Epi4Sym · ‎11-18-2010

Hi Grace, you are right, it is a Mount resource.

I have modified ToleranceLimit and the behaviour has changed. Thanks!

With the attribute set to 4, for example, the fault is delayed.

Now I have another issue.

I have 2 service groups, and both contain resources of the Mount type.

I want to change the attribute on only one of them, but I have seen that the parameter was changed for the Mount type in both service groups.

Let me know if my question is clear or if you need more information.

PS. I'm running the tests in a simulated environment.

regards

epi

g_lee · ‎11-18-2010

epi,

If you only want to modify some resources of that type, you can override the attribute for those particular resources.

See VCS User's Guide -> Administering the cluster from the command line -> Administering resource types -> Overriding resource type static attributes

==========
You can override some resource type static attributes and assign them resource-specific values. When a static attribute is overridden and the configuration is saved, the main.cf file includes a line in the resource definition for the static attribute and its overridden value.

To override a type's static attribute:
# hares -override resource static_attribute

To restore default settings to a type's static attribute:
# hares -undo_override resource static_attribute
==========

For example, say you have 3 local mounts (mountA, mountB, mountC) and 1 mount that relies on NFS (mountD). You want ToleranceLimit to be the default (0) for the local mounts, but you want to set ToleranceLimit=4 for mountD only.

Set ToleranceLimit back to default for the Mount resource type:

# hatype -modify Mount ToleranceLimit 0

Override ToleranceLimit for mountD resource:

# hares -override mountD ToleranceLimit

Set ToleranceLimit to 4 for mountD only:

# hares -modify mountD ToleranceLimit 4
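
Put together, the whole sequence might look like the sketch below. It only prints the commands so they can be reviewed first; nothing is run against the cluster. The function name print_override_cmds is hypothetical, and mountD and 4 are just the example values from above. The haconf steps (opening the configuration read-write and saving it afterwards) and the hares -display check are additions that assume the configuration is currently read-only.

```shell
#!/bin/sh
# Hypothetical helper: print (not execute) the commands to override a
# static attribute for a single resource.
# usage: print_override_cmds <resource> <attribute> <value>
print_override_cmds() {
    res=$1; attr=$2; val=$3
    echo "haconf -makerw"                        # assumption: config is currently read-only
    echo "hares -override $res $attr"            # make the attribute resource-specific
    echo "hares -modify $res $attr $val"         # set the value for this resource only
    echo "hares -display $res -attribute $attr"  # check the value now in effect
    echo "haconf -dump -makero"                  # save main.cf and close the configuration
}

print_override_cmds mountD ToleranceLimit 4
```

Once the printed commands look right, they can be run by hand on a cluster node. To go back to the type-level value later, use hares -undo_override mountD ToleranceLimit.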

Hope that helps,

Grace

Epi4Sym · ‎11-19-2010

Hi Grace, it works, thanks a lot.

bye

epi
