Trigger after failed cleanup script
- 10 years ago
Hi,
The cleanup script (also known as the clean entry point) is invoked in different scenarios, and the resulting state transition depends on whether the clean entry point succeeds or fails. Since you specifically mentioned "I have a system where the cleanup script can fail/timeout", we will cover only the failure scenarios of the clean entry point.
Scenario # 1
Resource ONLINE --> Resource attempting OFFLINE --> Resource fails to go OFFLINE --> Clean entry point invoked --> Clean entry point fails --> Resource moves to ONLINE|UNABLE TO OFFLINE
Scenario # 2
Resource ONLINE --> Resource unexpectedly went OFFLINE --> Clean entry point invoked --> Clean entry point fails --> If Type::CleanRetryLimit == 0, clean entry point is retried indefinitely --> Until clean entry point succeeds, resource remains ONLINE
Scenario # 3
Resource ONLINE --> Resource unexpectedly went OFFLINE --> Clean entry point invoked --> Clean entry point fails --> If Type::CleanRetryLimit != 0, clean entry point is retried CleanRetryLimit times --> If it still fails, resource moves to ONLINE|ADMIN_WAIT
Scenario # 4
Resource OFFLINE --> Resource attempting ONLINE --> Resource fails to go ONLINE --> Clean entry point invoked --> Clean entry point fails --> Resource moves to OFFLINE|ADMIN_WAIT
RESNOTOFF is invoked on the system if a resource in a service group does not go offline even after the offline command has been issued to the resource. This event trigger covers only scenario # 1, which you have also verified in your test environment.
As per your description, you are hitting either scenario # 2 or scenario # 3. RESNOTOFF won’t be executed in these scenarios. This is expected behavior. You needn’t worry about scenario # 2: there, the clean entry point is retried indefinitely, and the resource simply remains ONLINE until the clean entry point succeeds.
For scenarios # 3 and # 4, you can use the RESADMINWAIT trigger, which is invoked when a resource enters the ADMIN_WAIT state.
To cover all possible failure scenarios of the clean entry point, you should use both the RESNOTOFF and RESADMINWAIT triggers.
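The four scenarios and the trigger that applies to each can be summarised in a small lookup. This is only an illustrative sketch in Python, not product code: the `Event` names and the function `outcome_after_clean_failure` are made up for this example, and only the states, CleanRetryLimit, and the trigger names come from the description above. For scenario # 3 it returns the state reached once the retries are exhausted.

```python
from enum import Enum
from typing import Optional, Tuple


class Event(Enum):
    """What led to the clean entry point being invoked (hypothetical names)."""
    OFFLINE_FAILED = "resource failed to go OFFLINE"        # scenario 1
    UNEXPECTED_OFFLINE = "resource unexpectedly went OFFLINE"  # scenarios 2 and 3
    ONLINE_FAILED = "resource failed to go ONLINE"          # scenario 4


def outcome_after_clean_failure(
    event: Event, clean_retry_limit: int
) -> Tuple[str, Optional[str]]:
    """Return (resource state, trigger invoked) after the clean entry point fails.

    The trigger is None when no trigger described in this post applies.
    """
    if event is Event.OFFLINE_FAILED:
        # Scenario 1: the only case covered by RESNOTOFF.
        return "ONLINE|UNABLE TO OFFLINE", "RESNOTOFF"
    if event is Event.UNEXPECTED_OFFLINE:
        if clean_retry_limit == 0:
            # Scenario 2: clean is retried indefinitely; resource stays ONLINE.
            return "ONLINE", None
        # Scenario 3: state after clean has failed CleanRetryLimit times.
        return "ONLINE|ADMIN_WAIT", "RESADMINWAIT"
    if event is Event.ONLINE_FAILED:
        # Scenario 4.
        return "OFFLINE|ADMIN_WAIT", "RESADMINWAIT"
    raise ValueError(f"unknown event: {event!r}")


if __name__ == "__main__":
    for ev, limit in [
        (Event.OFFLINE_FAILED, 0),
        (Event.UNEXPECTED_OFFLINE, 0),
        (Event.UNEXPECTED_OFFLINE, 3),
        (Event.ONLINE_FAILED, 0),
    ]:
        print(ev.name, limit, outcome_after_clean_failure(ev, limit))
```

Reading the table this way makes it clear why RESNOTOFF alone is not enough: it appears for only one of the four paths, while RESADMINWAIT picks up scenarios # 3 and # 4.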
Thanks & Regards,
Sunil Y