Cheat sheet: VCS Event Triggers

Sunil_Yadav · ‎06-26-2015

VCS event triggers let you invoke user-defined scripts for specified events in a cluster. Triggers fall broadly into two categories:

1. Internal triggers: Internal triggers are non-configurable and always enabled. These triggers reside in the $VCS_HOME/bin/internal_triggers directory. By default, $VCS_HOME = /opt/VRTSvcs/

# ls -l /opt/VRTSvcs/bin/internal_triggers/
-rwxr-x--- 1 root root 2301 Oct 17  2014 cpuusage
-rwxr-x--- 1 root root 2343 Oct 17  2014 dump_tunables
-rwxr-x--- 1 root root 2349 Oct 17  2014 globalcounter_not_updated
-rwxr-x--- 1 root root 7551 Oct 17  2014 violation

2. Custom triggers: Custom triggers are configurable at different levels. On installation, VCS provides a sample Perl script for each event trigger in the $VCS_HOME/bin/sample_triggers/VRTSvcs directory.

# ls -l /opt/VRTSvcs/bin/sample_triggers/VRTSvcs/
-rwxr--r-- 1 root root  2471 Oct 17  2014 cpuusage
-rwxr--r-- 1 root root  3026 Oct 17  2014 injeopardy
-rwxr--r-- 1 root root  2836 Oct 17  2014 loadwarning
-rwxr--r-- 1 root root  2458 Oct 17  2014 nofailover
-rwxr--r-- 1 root root  2496 Oct 17  2014 postoffline
-rwxr--r-- 1 root root  2483 Oct 17  2014 postonline
-rwxr--r-- 1 root root  3324 Oct 17  2014 postonline_rhev
-rwxr--r-- 1 root root  5109 Oct 17  2014 preonline
-rwxr----- 1 root root 10099 Oct 17  2014 preonline_ipc
-rwxr--r-- 1 root root  2841 Oct 17  2014 preonline_rhev
-rwxr----- 1 root root  5377 Oct 17  2014 preonline_vvr
-rwxr--r-- 1 root root  2865 Oct 17  2014 resadminwait
-rwxr--r-- 1 root root  2605 Oct 17  2014 resfault
-rwxr--r-- 1 root root  2744 Oct 17  2014 resnotoff
-rwxr--r-- 1 root root  3264 Oct 17  2014 resrestart
-rwxr--r-- 1 root root  3226 Oct 17  2014 resstatechange
-rwxr--r-- 1 root root  2605 Oct 17  2014 sysjoin
-rwxr--r-- 1 root root  2846 Oct 17  2014 sysoffline
-rwxr--r-- 1 root root  2592 Oct 17  2014 sysup
-rwxr--r-- 1 root root  2690 Oct 17  2014 unable_to_restart_agent
-rwxr--r-- 1 root root  4037 Oct 17  2014 unable_to_restart_had

You can tailor these sample triggers to your requirements, or write your own Perl scripts. Some custom triggers are configurable (e.g. preonline), while others are non-configurable (e.g. injeopardy). Place the modified trigger script in $VCS_HOME/bin/triggers on each node. To enable a non-configurable custom trigger, place its script in the $VCS_HOME/bin/triggers directory; to disable it, remove the files associated with the trigger from that directory. For configurable custom triggers, also configure any other attributes (e.g. TriggersEnabled) that are required to enable the trigger.
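The copy step described above can be sketched in shell roughly as follows. This is only an illustration: install_trigger is a hypothetical helper name, the 744 mode is chosen only because it matches the permissions shown in the sample listing, and the destination is a parameter so the function can be tried against a scratch directory first.

```shell
#!/bin/sh
# Sketch: copy a customized trigger script into the active triggers directory.
# VCS_HOME falls back to /opt/VRTSvcs, the default mentioned earlier.
VCS_HOME="${VCS_HOME:-/opt/VRTSvcs}"

# install_trigger (hypothetical helper)
#   $1 = path to your customized script
#   $2 = trigger name to install it as (e.g. injeopardy)
#   $3 = destination directory (optional, defaults to $VCS_HOME/bin/triggers)
install_trigger() {
    src=$1
    name=$2
    dest=${3:-"$VCS_HOME/bin/triggers"}

    if [ ! -f "$src" ]; then
        echo "no such script: $src" >&2
        return 1
    fi
    mkdir -p "$dest" || return 1
    cp "$src" "$dest/$name" || return 1
    # Assumption: same mode as the shipped samples (rwxr--r--).
    chmod 744 "$dest/$name"
}
```

You would run something like install_trigger ./my_injeopardy injeopardy on each node. Removing the installed file is what disables a non-configurable trigger again.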

Do not put customized trigger scripts in the $VCS_HOME/bin/sample_triggers/VRTSvcs directory or in the $VCS_HOME/bin/internal_triggers directory. If you install customized triggers in these directories, you might face issues while upgrading VCS.

How are triggers enabled?

The TriggersEnabled attribute is used to enable or disable triggers. Triggers are disabled by default. You can enable specific triggers on all nodes or only on selected nodes. The attribute is available at both the resource level and the service group level. At the resource level, valid values are RESFAULT, RESNOTOFF, RESSTATECHANGE, RESRESTART, and RESADMINWAIT. At the service group level, valid values are VIOLATION, NOFAILOVER, PREONLINE, POSTONLINE, POSTOFFLINE, RESFAULT, RESSTATECHANGE, and RESRESTART. The attribute is a string keylist. Because the same attribute is used at both levels, the steps for enabling and disabling triggers are similar for resources and service groups.

Enabling triggers using CLI

# hares -modify test_res TriggersEnabled RESFAULT RESNOTOFF RESSTATECHANGE RESADMINWAIT

# hares -display test_res -attribute TriggersEnabled
#Resource    Attribute              System                 Value
test_res     TriggersEnabled        localclus              RESFAULT     RESNOTOFF       RESSTATECHANGE  RESADMINWAIT

# hagrp -modify test_sg TriggersEnabled PREONLINE POSTONLINE POSTOFFLINE

# hagrp -display test_sg -attribute TriggersEnabled
#Group       Attribute             System                 Value
test_sg      TriggersEnabled       localclus              PREONLINE     POSTONLINE      POSTOFFLINE
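If you script these changes, it can help to check the keys against the valid values listed earlier before touching the cluster. The sketch below is one way to do that. The helper names is_valid_trigger and build_enable_cmd are made up for illustration, and the second function only prints the hares/hagrp command for review instead of running it.

```shell
#!/bin/sh
# Valid TriggersEnabled keys, copied from the lists in this post.
RES_TRIGGERS="RESFAULT RESNOTOFF RESSTATECHANGE RESRESTART RESADMINWAIT"
GRP_TRIGGERS="VIOLATION NOFAILOVER PREONLINE POSTONLINE POSTOFFLINE RESFAULT RESSTATECHANGE RESRESTART"

# is_valid_trigger (hypothetical): $1 = "resource" or "group", $2 = key to check
is_valid_trigger() {
    case $1 in
        resource) list=$RES_TRIGGERS ;;
        group)    list=$GRP_TRIGGERS ;;
        *)        return 2 ;;
    esac
    for t in $list; do
        if [ "$t" = "$2" ]; then
            return 0
        fi
    done
    return 1
}

# build_enable_cmd (hypothetical): $1 = level, $2 = object name, rest = keys.
# Prints the command line; it does not execute anything.
build_enable_cmd() {
    level=$1
    object=$2
    shift 2
    for key in "$@"; do
        if ! is_valid_trigger "$level" "$key"; then
            echo "invalid $level trigger: $key" >&2
            return 1
        fi
    done
    case $level in
        resource) echo "hares -modify $object TriggersEnabled $*" ;;
        group)    echo "hagrp -modify $object TriggersEnabled $*" ;;
    esac
}
```

For example, build_enable_cmd group test_sg PREONLINE POSTONLINE POSTOFFLINE prints the same hagrp line used above, while asking for PREONLINE at the resource level is rejected.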

Enabling triggers using main.cf

Application test_res (
	.
	TriggersEnabled = { RESFAULT, RESNOTOFF, RESSTATECHANGE, RESADMINWAIT }
	.
	)

group test_sg (
	.
	TriggersEnabled = { PREONLINE, POSTONLINE, POSTOFFLINE }
	.
	)

Custom trigger location

If a trigger is enabled but the trigger path (the TriggerPath attribute) is "" (the default), VCS invokes the trigger from the $VCS_HOME/bin/triggers directory. You can also relocate these triggers and update TriggerPath accordingly. If you specify an alternate directory, VCS invokes the trigger from that path.

How are triggers invoked?

Triggers are executed by the hatrigger script, located at $VCS_HOME/bin/hatrigger. When an event occurs, VCS determines whether the corresponding trigger is enabled, invokes the hatrigger script, and passes it the name of the event trigger and the associated parameters.

E.g. the preonline trigger is invoked before bringing a service group online. To execute the preonline trigger, VCS invokes the following command:

hatrigger preonline system service_group whyonlining [system_where_group_faulted]

Details of the arguments are also available in the sample trigger. E.g. this snippet from the sample preonline script:

# Usage:
# preonline <system> <group> <whyonlining> <systemwheregroupfaulted>
#
# <system>: is the name of the system where group is to be onlined.
# <group>: is the name of the group that is to be onlined.
# <whyonlining>: is "SYSFAULT" or "FAULT" or "MANUAL".
#               "SYSFAULT" corresponds to failover when system is faulted;
#               "FAULT" corresponds to failover;
#               "MANUAL" corresponds to manual online and switch;
# <systemwheregroupfaulted>: When preonline is invoked due to failover
#               this argument is the name of the system where group
#               was online before.
#               When preonline is invoked due to group online
#               command issued with -checkpartial option,
#               this argument is the name of system specified
#               for this option.
#

You can use the arguments passed to the trigger to customize its actions. VCS does not wait for the trigger to complete execution. VCS calls the trigger and continues normal operation.
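As a rough illustration of using those arguments, here is a shell sketch that turns the four preonline arguments into a readable message. The argument order is taken from the usage comment in the sample script; the function name describe_preonline and the message wording are made up. It covers argument handling only, so check the shipped sample preonline script for what a real trigger has to do to let the online proceed.

```shell
#!/bin/sh
# Sketch: interpret the arguments a preonline trigger receives.
# describe_preonline is a hypothetical helper, not part of VCS.
#   $1 = system, $2 = group, $3 = whyonlining, $4 = system where group faulted (optional)
describe_preonline() {
    system=$1
    group=$2
    why=$3
    faulted_on=${4:-}

    case $why in
        SYSFAULT) reason="failover after a system fault" ;;
        FAULT)    reason="failover" ;;
        MANUAL)   reason="manual online or switch" ;;
        *)
            echo "unexpected whyonlining value: $why" >&2
            return 1
            ;;
    esac

    msg="preonline: $group on $system ($reason)"
    if [ -n "$faulted_on" ]; then
        msg="$msg, previously on $faulted_on"
    fi
    echo "$msg"
}
```

A trigger script built on this would call describe_preonline "$@" and send the result to a log or a notification tool of your choice.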

List of Internal event triggers

violation trigger

This trigger is invoked only on the system that caused the concurrency violation. The default trigger takes the service group offline on that system. Note that this trigger applies to failover groups only.

Arguments:

system — represents the name of the system.

service_group — represents the name of the service group that was fully or partially online.

dumptunables trigger

The dumptunables trigger is invoked when HAD goes into the RUNNING state. When this trigger is invoked, it uses the HAD environment variables that it inherited, and other environment variables to process the event. Depending on the value of the to_log parameter, the trigger then redirects the environment variables to either stdout or the engine log. This trigger is not invoked when HAD is restarted by hashadow.

Arguments:

system—represents the name of the system on which the trigger is invoked.

to_log—represents whether the output is redirected to engine log (to_log=1) or stdout (to_log=0).

globalcounter_not_updated trigger

On the system that has the lowest NodeId in the cluster, VCS periodically broadcasts an update of GlobalCounter. If a node does not receive the broadcast for an interval greater than CounterMissTolerance, it invokes the globalcounter_not_updated trigger if CounterMissAction is set to Trigger. This event is considered critical since it indicates a problem with underlying cluster communications or cluster interconnects. Use this trigger to notify administrators of the critical events.

Arguments:

system—represents the system which did not receive the update of GlobalCounter.

global_counter—represents the value of GlobalCounter.

cpuusage trigger

Invoked when the CPU usage of a system exceeds ActionThreshold continuously for ActionTimeLimit. The trigger is invoked on the node whose CPU usage exceeded the threshold. To turn this trigger off, specify Action = "NONE". Refer to the system-level attribute CPUUsageMonitoring for details of ActionThreshold, ActionTimeLimit, and Action.

Arguments:

system - is the name of the system where CPU Usage exceeded.

cpuusage - is the CPU percentage utilization of the system.

List of Custom event triggers

injeopardy trigger

Invoked when a system is in jeopardy. Specifically, this trigger is invoked when a system has only one remaining link to the cluster, and that link is a network link (LLT). This event is considered critical because if the system loses the remaining network link, VCS does not fail over the service groups that were online on the system. Use this trigger to notify the administrator of the critical event. The administrator can then take appropriate action to ensure that the system has at least two links to the cluster. This event trigger is non-configurable.

Arguments:

system — represents the name of the system.

system_state — represents the value of the State attribute.

loadwarning trigger

Invoked when a system becomes overloaded because the load of the system’s online groups exceeds the system’s LoadWarningLevel attribute for an interval exceeding the LoadTimeThreshold attribute. Use this trigger to notify the administrator of the critical event. The administrator can then switch some service groups to another system, ensuring that no one system is overloaded.

Arguments:

system — represents the name of the system.

available_capacity — represents the system’s AvailableCapacity attribute. (AvailableCapacity = Capacity - sum of Load for system’s online groups.)
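The AvailableCapacity formula above is simple arithmetic. A small shell sketch (the function name and the sample numbers are invented for illustration):

```shell
#!/bin/sh
# Sketch: AvailableCapacity = Capacity - sum of Load for the system's online groups.
#   $1 = system Capacity, remaining arguments = Load of each online group
available_capacity() {
    capacity=$1
    shift
    total=0
    for load in "$@"; do
        total=$((total + load))
    done
    echo $((capacity - total))
}

# Example: a system with Capacity 200 running groups with Load 50, 75 and 40
available_capacity 200 50 75 40   # prints 35
```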

nofailover trigger

Invoked from the lowest-numbered system in RUNNING state when a service group cannot fail over.

Arguments:

system — represents the name of the last system on which an attempt was made to bring the service group online.

service_group — represents the name of the service group.

postoffline trigger

Invoked on the system where the group went offline from a partial or fully online state. This trigger is invoked when the group faults, or is taken offline manually.

Arguments:

system — represents the name of the system.

service_group — represents the name of the service group that went offline.

preonline trigger

Invoked before bringing a service group online. If the trigger script does not exist, or the script returns 0, VCS continues to bring the group online. To enable the trigger, set the PreOnline attribute in the service group definition to 1 (set it to 0 to disable the trigger). You can set a local (per-system) value for the attribute to control behavior on each node in the cluster.

Arguments:

system - represents the name of the system.

service_group - represents the name of the service group that VCS is attempting to bring online.

whyonlining - takes one of three values:

FAULT: Indicates that the group was brought online in response to a group failover.

MANUAL: Indicates that the group was brought online or switched manually on the system that is represented by the variable system.

SYSFAULT: Indicates that the group was brought online in response to a system fault.

system_where_group_faulted - represents the name of the system on which the group has faulted or switched. This variable is optional and is set when the engine invokes the trigger during a failover or switch.

resadminwait trigger

Invoked when a resource enters ADMIN_WAIT state. When VCS sets a resource in the ADMIN_WAIT state, it invokes the resadminwait trigger according to the reason the resource entered the state.

Arguments:

system—represents the name of the system.

resource—represents the name of the faulted resource.

adminwait_reason—represents the reason the resource entered the ADMIN_WAIT state. Values range from 0-5:

0 = The offline function did not complete within the expected time.

1 = The offline function was ineffective.

2 = The online function did not complete within the expected time.

3 = The online function was ineffective.

4 = The resource was taken offline unexpectedly.

5 = The monitor function consistently failed to complete within the expected time.
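In a custom resadminwait script you will probably want the reason as text instead of a number. A minimal shell sketch of that lookup, using the table above (the function name adminwait_reason_text is made up):

```shell
#!/bin/sh
# Sketch: translate the adminwait_reason code (0-5) into the text listed above.
adminwait_reason_text() {
    case $1 in
        0) echo "offline function did not complete within the expected time" ;;
        1) echo "offline function was ineffective" ;;
        2) echo "online function did not complete within the expected time" ;;
        3) echo "online function was ineffective" ;;
        4) echo "resource was taken offline unexpectedly" ;;
        5) echo "monitor function consistently failed to complete within the expected time" ;;
        *)
            echo "unknown reason code: $1"
            return 1
            ;;
    esac
}
```

Since adminwait_reason is the third argument to the trigger, a script could call adminwait_reason_text "$3" when composing its notification.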

resfault trigger

Invoked on the system where a resource has faulted. Note that when a resource is faulted, resources within the upward path of the faulted resource are also brought down.

Arguments:

system — represents the name of the system.

resource — represents the name of the faulted resource.

previous_state — represents the resource’s previous state.

resnotoff trigger

Invoked on the system if a resource in a service group does not go offline even after issuing the offline command to the resource.

Arguments:

system — represents the system on which the resource is not going offline.

resource — represents the name of the resource.

resrestart trigger

This trigger is invoked when a resource is restarted by an agent because the resource faulted and RestartLimit was greater than 0.

Arguments:

system—represents the name of the system.

resource—represents the name of the resource.

resstatechange trigger

This trigger is invoked under the following conditions:

Resource goes from OFFLINE to ONLINE.

Resource goes from ONLINE to OFFLINE.

Resource goes from ONLINE to FAULTED.

Resource goes from FAULTED to OFFLINE. (When fault is cleared on non-persistent resource.)

Resource goes from FAULTED to ONLINE. (When faulted persistent resource goes online or faulted non-persistent resource is brought online outside VCS control.)

Arguments:

system — represents the name of the system.

resource — represents the name of the resource.

previous_state — represents the resource’s previous state.

new_state — represents the resource’s new state.
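If your resstatechange script should react only to some of these transitions, a case statement on the two state arguments is usually enough. The sketch below assumes the states arrive as the uppercase names used in the list above, which you should confirm against the sample script. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: check whether a previous_state/new_state pair is one of the
# transitions listed above. Assumes uppercase state names.
#   $1 = previous_state, $2 = new_state
is_listed_transition() {
    case "$1:$2" in
        OFFLINE:ONLINE|ONLINE:OFFLINE|ONLINE:FAULTED|FAULTED:OFFLINE|FAULTED:ONLINE)
            return 0
            ;;
        *)
            return 1
            ;;
    esac
}

# Example: act only when a resource goes from ONLINE to FAULTED
if is_listed_transition ONLINE FAULTED; then
    echo "ONLINE -> FAULTED is a listed transition"
fi
```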

sysoffline trigger

Invoked from the lowest-numbered system in RUNNING state when a system leaves the cluster.

Arguments:

system — represents the name of the system.

system_state — represents the value of the State attribute.

sysup trigger

The sysup trigger is invoked when the first node joins the cluster.

Arguments:

system — represents the system name.

systemstate — represents the state of the system.

sysjoin trigger

The sysjoin trigger is invoked when a peer node joins the cluster.

Arguments:

system — represents the system name.

systemstate — represents the state of the system.

unable_to_restart_agent trigger

This trigger is invoked when an agent faults more than a predetermined number of times within an hour. When this occurs, VCS gives up trying to restart the agent. VCS invokes this trigger on the node where the agent faults. You can use this trigger to notify the administrators that an agent has faulted, and that VCS is unable to restart the agent. The administrator can then take corrective action.

Arguments:

system — represents the name of the system.

resource_type — represents the resource type associated with the agent.

unable_to_restart_had trigger

Invoked by hashadow when hashadow cannot restart HAD on a system. If HAD fails to restart after six attempts, hashadow invokes the trigger on the system. The default behavior of the trigger is to reboot the system. However, you can customize it as per your requirement. This event trigger is non-configurable and has no arguments.
