Resource Fault question
When a resource goes offline unexpectedly, the agent monitors the resource and runs the clean entry point to bring the resource offline and put it into the Faulted state. My question is: before it is put into the Faulted state, can we restart the resource?
Hi,
You very well can, but would you be able to get the specific time window to restart the resource? The resource going offline and the clean will happen within minutes (depending on how MonitorInterval, MonitorTimeout and RestartLimit are set).
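To get a feel for that time window, here is a rough Python sketch. It assumes the failure is noticed at the next monitor cycle, that this cycle finishes within MonitorTimeout and reports the resource offline, and that no other attribute delays the report. The function name and the sample values are illustrative only; they are not a VCS API and not necessarily your cluster's settings.

```python
def detection_window_seconds(monitor_interval, monitor_timeout):
    """Return (best_case, rough_worst_case) seconds before the agent notices a failure.

    Assumptions (not taken from the VCS manuals):
    - the failure is reported by the first monitor cycle that runs after it
    - that monitor cycle completes within monitor_timeout
    """
    # Best case: the resource dies just before a monitor cycle that returns at once.
    best_case = 0
    # Rough worst case: the resource dies just after a monitor cycle, so a full
    # interval passes, and the next cycle takes almost the whole timeout.
    rough_worst_case = monitor_interval + monitor_timeout
    return best_case, rough_worst_case


# Illustrative values only; check the real attribute values on your cluster.
best, worst = detection_window_seconds(monitor_interval=60, monitor_timeout=60)
print(f"Agent notices the failure roughly {best} to {worst} seconds after it happens")
```

Whatever the exact numbers are, the window is short and you cannot predict where in the monitor cycle the failure lands, which is why a manual restart is hard to time.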
If a resource is in a transitioning state (onlining or offlining), you can flush the service group (hagrp -flush) so that the transitioning of resources stops, and then you can take manual action.
G
I agree with Gaurav.
Increase RestartLimit. The default for most resource types is 0.
You may want to read through this section in VCS Admin Guide:
Controlling VCS behavior at the resource level
Extract:
About the RestartLimit attribute
The RestartLimit attribute defines whether VCS attempts to restart a failed resource before informing the engine of the fault.
If the RestartLimit attribute is set to a non-zero value, the agent attempts to restart the resource before declaring the resource as faulted. When restarting a failed resource, the agent framework calls the Clean function before calling the Online function. However, setting the ManageFaults attribute to NONE prevents the Clean function from being called and prevents the Online function from being retried.
(VCS Admin Guide and other manuals can be found here: http://sort.symantec.com/documents )
I'm not sure if you are asking if "you" can start it or if "VCS" can restart it:
If "you" restart the resource before VCS detects it is down, then the resource will not be marked as faulted. But if you are intentionally restarting it, you should "freeze" the service group so VCS does not interfere with your restart.
If RestartLimit is set to greater than zero, then VCS will restart the resource and will not mark it as faulted unless all restarts fail.
Mike
First the agent will call the clean entry point to ensure that the resource is completely offline. After that the agent will call the online entry point to restart the resource as per the RestartLimit attribute.
This is also mentioned in the description of the RestartLimit attribute pasted by Marianne above.
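The Clean-then-Online sequence described above can be sketched as a small Python simulation. This is a simplified model for illustration, not agent framework code: the function name, the state strings and the online_results list are all hypothetical, and the model ignores ManageFaults and any other attribute that influences fault handling.

```python
def handle_unexpected_offline(restart_limit, online_results):
    """Simulate what happens after a resource goes offline unexpectedly.

    restart_limit  -- models the RestartLimit attribute
    online_results -- hypothetical outcome of each Online attempt (True = success)
    Returns (final_state, calls), where calls lists the entry points in order.
    """
    calls = []
    results = iter(online_results)
    attempts = 0
    while True:
        # Clean always runs first, to make sure the resource is really down.
        calls.append("clean")
        if attempts >= restart_limit:
            # No restarts left (or RestartLimit is 0): declare the fault.
            return "FAULTED", calls
        attempts += 1
        calls.append("online")
        # A missing result is treated as a failed Online attempt.
        if next(results, False):
            return "ONLINE", calls


# RestartLimit = 0: the resource faults right after Clean.
print(handle_unexpected_offline(0, []))
# RestartLimit = 2: first restart fails, second one succeeds, no fault.
print(handle_unexpected_offline(2, [False, True]))
```

With RestartLimit at 0 the model goes straight from Clean to Faulted, which matches the behaviour in the original question; with a non-zero value the resource is only marked Faulted once every restart attempt has failed.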
Thank you very much for all of your input. Sorry for asking a stupid basic question.
We don't mind basic questions - all of us were new at one stage and back then there was no Symantec Connect to ask. So, we had to read manuals.
We do hope that you will read manuals when we point out the name of a manual and the relevant section.
You will see that I quoted from the manual 2 days ago:
If the RestartLimit attribute is set to a non-zero value, the agent attempts to restart the resource before declaring the resource as faulted. When restarting a failed resource, the agent framework calls the Clean function before calling the Online function.
This means that when a resource 'goes offline unexpectedly' (normally because someone has killed or offlined the process manually outside of the cluster), the agent will run the Clean function (to be 100% sure the processes are down) and then run the Online function.
Best to educate DBAs, users, etc. to use ha commands to offline resources...