VCS action entry point gets killed by scheduled monitor timer
Here is the setup:
-$ hatype -display foodo -attribute MonitorInterval ActionTimeout SupportedActions
#Type    Attribute         Value
foodo    ActionTimeout     30
foodo    MonitorInterval   10
foodo    SupportedActions  touch rmSleepAndRecreate
The 'rmSleepAndRecreate' action sleeps for 15 seconds, so that it runs longer than the MonitorInterval period (10 seconds).
This is on purpose to see what would happen in the event a Supported-Action was invoked just before the MonitorInterval timer fired. It seems to me that this is something that can quite easily happen in the real world, and therefore should be well understood.
To my great disappointment, the VCS engine (or Agent) kills the action entry point thread when the monitor entry point is invoked:
-$ hares -action kjb_foodo rmSleepAndRecreate -sys centos57x64-01
VCS WARNING V-16-13343 Action entrypoint action timed out for resource(kjb_foodo)

...and in the VCS log:

2012/02/20 13:44:31 VCS INFO V-16-2-13003 (centos57x64-01) Resource(kjb_foodo): Output of the timed out operation (actions)
<<<this is just stdout from the 'rmSleepAndRecreate' action entry point>>>
****DEBUG****
allArgs=[kjb_foodo PathName 1 /tmp/kjb_foodo ACTION_ARGS 0]
resource_name=[kjb_foodo]
PathName=[/tmp/kjb_foodo]
<<<END STDOUT>>>

...and in the /var/VRTSvcs/log/foodo_A.log:

2012/02/20 13:44:30 VCS WARNING V-16-2-13139 Thread(4154313616) Canceling thread (4153260944)
2012/02/20 13:44:31 VCS WARNING V-16-2-13343 Thread(4150262672) Action entrypoint action timed out for resource(kjb_foodo)
This is NOT a good implementation; what if the action procedure was in the middle of performing something important or particularly sensitive?
I could find no documentation warning about this behaviour. If this rather dodgy implementation is "as designed", then the documentation describing the action entry point needs to have big danger signs all over it, warning that you must not use an action to perform any procedure that cannot tolerate being killed in mid-flight!
I believe a much better implementation would be one where a scheduled monitor entry point does not get invoked until an action procedure has completed (or has been timed out via the ActionTimeout timer).
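To make that proposal concrete, here is a minimal Python sketch of the idea. It is not VCS code and says nothing about how the agent framework is actually built: the `Resource` class, its lock, and the `demo` helper are all hypothetical names made up for illustration, and the ActionTimeout cut-off is left out for brevity. The monitor simply waits on a per-resource lock until a running action has released it.

```python
import threading
import time


class Resource:
    """Toy resource (hypothetical) whose monitor waits for a running action."""

    def __init__(self):
        self._busy = threading.Lock()   # held while an entry point runs
        self.events = []                # record of what ran, in order

    def action(self, duration, started):
        with self._busy:
            self.events.append("action start")
            started.set()               # signal that the action is in flight
            time.sleep(duration)        # stands in for the real work
            self.events.append("action end")

    def monitor(self):
        with self._busy:                # blocks until the action releases the lock
            self.events.append("monitor")


def demo(duration=0.2):
    res = Resource()
    started = threading.Event()
    worker = threading.Thread(target=res.action, args=(duration, started))
    worker.start()
    started.wait()      # the monitor timer "fires" while the action is running
    res.monitor()       # deferred rather than cancelling the action
    worker.join()
    return res.events


print(demo())  # ['action start', 'action end', 'monitor']
```

In this sketch the monitor is delayed by at most the remaining run time of the action, which is the trade-off the proposal accepts in exchange for never interrupting an action part-way through.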
But hey, maybe this is particular to my Linux setup? Here are the particulars on that:
centos57x64-01.localdomain(root) /root: -$ printVRTSreleaseLevels
Name : VRTSperl      Release : RHEL5.3        Source RPM : VRTSperl-5.10.0.7-RHEL5.3.src.rpm
Name : VRTSatClient  Release : 0              Source RPM : VRTSatClient-5.0.32.0-0.src.rpm
Name : VRTSvxfs      Release : SP1RP2_RHEL5   Source RPM : VRTSvxfs-5.1.132.000-SP1RP2_RHEL5.src.rpm
Name : VRTSvcsag     Release : SP1RP2_RHEL5   Source RPM : VRTSvcsag-5.1.132.000-SP1RP2_RHEL5.src.rpm
Name : VRTSaslapm    Release : SP1_RHEL5      Source RPM : VRTSaslapm-5.1.100.000-SP1_RHEL5.src.rpm
Name : VRTSgab       Release : SP1RP2_RHEL5   Source RPM : VRTSgab-5.1.132.000-SP1RP2_RHEL5.src.rpm
Name : VRTSdbed      Release : SP1RP2_RHEL5   Source RPM : VRTSdbed-5.1.132.000-SP1RP2_RHEL5.src.rpm
Name : VRTSlvmconv   Release : SP1RP2_RHEL5   Source RPM : VRTSlvmconv-5.1.132.000-SP1RP2_RHEL5.src.rpm
Name : VRTSllt       Release : SP1RP2_RHEL5   Source RPM : VRTSllt-5.1.132.000-SP1RP2_RHEL5.src.rpm
Name : VRTSvcs       Release : SP1RP2_RHEL5   Source RPM : VRTSvcs-5.1.132.000-SP1RP2_RHEL5.src.rpm
Name : VRTSvcsea     Release : SP1RP2_RHEL5   Source RPM : VRTSvcsea-5.1.132.000-SP1RP2_RHEL5.src.rpm
Name : VRTSobgui     Release : 0              Source RPM : VRTSobgui-3.4.15.0-0.src.rpm
Name : VRTSspt       Release : GA             Source RPM : VRTSspt-5.5.000.005-GA.src.rpm
Name : VRTSatServer  Release : 0              Source RPM : VRTSatServer-5.0.32.0-0.src.rpm
Name : VRTSob        Release : 0              Source RPM : VRTSob-3.4.312-0.src.rpm
Name : VRTSfssdk     Release : SP1RP2_RHEL5   Source RPM : VRTSfssdk-5.1.132.000-SP1RP2_RHEL5.src.rpm
Name : VRTSamf       Release : SP1RP2_RHEL5   Source RPM : VRTSamf-5.1.132.000-SP1RP2_RHEL5.src.rpm
Name : VRTSvcsdr     Release : SP1RP2_RHEL5   Source RPM : VRTSvcsdr-5.1.132.000-SP1RP2_RHEL5.src.rpm
Name : VRTScscm      Release : Linux_GENERIC  Source RPM : VRTScscm-5.1.00.20-Linux_GENERIC.src.rpm
Name : VRTSvxvm      Release : SP1RP2_RHEL5   Source RPM : VRTSvxvm-5.1.132.000-SP1RP2_RHEL5.src.rpm
Name : VRTSvxfen     Release : SP1RP2_RHEL5   Source RPM : VRTSvxfen-5.1.132.000-SP1RP2_RHEL5.src.rpm
Name : VRTSodm       Release : RP1_RHEL5      Source RPM : VRTSodm-5.1.101.000-RP1_RHEL5.src.rpm
Name : VRTSvlic      Release : 0              Source RPM : VRTSvlic-3.02.51.010-0.src.rpm
Name : VRTSsfmh      Release : 0              Source RPM : VRTSsfmh-3.1.830.0-0.src.rpm
Name : VRTScps       Release : SP1RP2_RHEL5   Source RPM : VRTScps-5.1.132.000-SP1RP2_RHEL5.src.rpm
Do others agree that this is a potentially dangerous VCS bug, or is it simply "works as designed, deal with it!"?
This behavior is documented in the Agent Developer Guide.
ActionTimeout
After the hares -action command has instructed the agent to perform a specified action, the action entry point has the time specified by the ActionTimeout attribute (scalar-integer) to perform the action. The value of ActionTimeout may be set for individual resources, if overridden.
Whether overridden or not, and no matter what value is specified for ActionTimeout, the value is internally limited to 0.5 * MonitorInterval. You can extend this limit by using VCSAgSetResEPTimeout (for C/C++ entry points) or VCSAG_SET_RES_EP_TIMEOUT (for script entry points). The default value of ActionTimeout is 30 seconds.
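Applied to the values in the question, the rule quoted above works out as follows. This is only an arithmetic sketch in Python: the function name `effective_action_timeout` is made up for illustration, and it assumes the limit has not been extended with VCSAgSetResEPTimeout or VCSAG_SET_RES_EP_TIMEOUT.

```python
def effective_action_timeout(action_timeout: float, monitor_interval: float) -> float:
    """Timeout the agent would enforce under the quoted rule (hypothetical helper)."""
    # ActionTimeout is capped at half of MonitorInterval.
    return min(action_timeout, 0.5 * monitor_interval)


# Values taken from the hatype output in the question.
ACTION_TIMEOUT = 30      # seconds
MONITOR_INTERVAL = 10    # seconds
ACTION_RUNTIME = 15      # seconds slept by rmSleepAndRecreate

limit = effective_action_timeout(ACTION_TIMEOUT, MONITOR_INTERVAL)
print(f"effective limit: {limit} s")                          # effective limit: 5.0 s
print(f"action finishes in time: {ACTION_RUNTIME <= limit}")  # action finishes in time: False
```

So with MonitorInterval at 10, the configured ActionTimeout of 30 is never reached: the cap is 5 seconds, and a 15-second action would be expected to time out under that rule.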