
Resource Fault question

mokkan
Level 6
Certified

When a resource goes offline unexpectedly, the agent monitoring the resource runs the clean entry point to bring the resource offline and puts it into the Faulted state. My question is: before it is put into the Faulted state, can we restart the resource?

7 REPLIES

Gaurav_S
Moderator
Moderator
   VIP    Certified

Hi,

You certainly can, but would you be able to hit the specific time window to restart the resource? The offline detection and clean will happen within a matter of minutes (depending on how MonitorInterval, MonitorTimeout, and RestartLimit are set).

If a resource is in a transitioning state (onlining or offlining), you can flush the service group (hagrp -flush) so that the transitioning of resources stops, and then you can take manual action.

 

G
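
To get a feel for how short that window can be, here is a rough back-of-the-envelope sketch in Python. It assumes the agent polls once every MonitorInterval seconds and that a single monitor run may take up to MonitorTimeout seconds. The function name and the example values are illustrative only and are not taken from any particular cluster.

```python
def worst_case_detection_seconds(monitor_interval: int, monitor_timeout: int) -> int:
    """Rough upper bound on the time between a failure and its detection.

    Assumption: the failure happens just after a monitor cycle finished, so
    the agent notices only on the next cycle, which itself may run for up to
    monitor_timeout seconds before it returns a result.
    """
    if monitor_interval <= 0 or monitor_timeout < 0:
        raise ValueError("monitor_interval must be positive and monitor_timeout non-negative")
    return monitor_interval + monitor_timeout


# Example values only (hypothetical); check the real settings on your cluster.
window = worst_case_detection_seconds(monitor_interval=60, monitor_timeout=60)
print(f"The agent notices the failure after at most about {window} seconds, possibly much sooner")
```

The real window is usually shorter than this bound, because the failure can just as easily happen right before the next monitor cycle.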

Marianne
Level 6
Partner    VIP    Accredited Certified

I agree with Gaurav.

Increase RestartLimit. The default for most resource types is 0.

You may want to read through this section in the VCS Admin Guide:

Controlling VCS behavior at the resource level

Extract:

About the RestartLimit attribute
The RestartLimit attribute defines whether VCS attempts to restart a failed
resource before informing the engine of the fault.
If the RestartLimit attribute is set to a non-zero value, the agent attempts to
restart the resource before declaring the resource as faulted. When restarting a
failed resource, the agent framework calls the Clean function before calling the
Online function. However, setting the ManageFaults attribute to NONE prevents
the Clean function from being called and prevents the Online function from being
retried.
 
(VCS Admin Guide and other manuals can be found here: http://sort.symantec.com/documents )
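
The order of calls described in the extract can be sketched as a small Python model. This is not VCS code, only a simplified illustration of the documented sequence. The function name, the `online_succeeds_on_attempt` parameter, and the `NOT_RESTARTED` label are made up for the example, and the final Clean before the fault report is an assumption based on the RestartLimit = 0 behaviour described in the question.

```python
from typing import List, Tuple


def handle_unexpected_offline(
    restart_limit: int,
    online_succeeds_on_attempt: int = 0,
    manage_faults: str = "ALL",
) -> Tuple[List[str], str]:
    """Return (entry points called in order, final state) for one failure.

    online_succeeds_on_attempt is the 1-based restart attempt that brings the
    resource back; 0 means every attempt fails.
    """
    calls: List[str] = []
    if manage_faults == "NONE":
        # Per the extract: Clean is not called and Online is not retried.
        # "NOT_RESTARTED" is a placeholder label, not a VCS state name.
        return calls, "NOT_RESTARTED"

    for attempt in range(1, restart_limit + 1):
        calls.append("clean")   # Clean always runs before Online
        calls.append("online")
        if attempt == online_succeeds_on_attempt:
            return calls, "ONLINE"  # restarted, so no fault is reported

    # No restarts configured, or all of them failed (assumed final Clean).
    calls.append("clean")
    return calls, "FAULTED"


print(handle_unexpected_offline(restart_limit=0))
print(handle_unexpected_offline(restart_limit=2, online_succeeds_on_attempt=1))
```

With `restart_limit=0` the model goes straight from Clean to FAULTED. With a non-zero limit the fault is reported only after every restart attempt has failed.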

 

mikebounds
Level 6
Partner Accredited

I'm not sure if you are asking if "you" can start it or if "VCS" can restart it:

If "you" restart the resource before VCS detects that it is down, then the resource will not be marked as faulted. But if you are intentionally restarting it, you should "freeze" the service group so that VCS does not interfere with your restart.

If RestartLimit is set to greater than zero, then VCS will restart the resource and will not mark it as faulted unless all restarts fail.

Mike
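
For the "you restart it yourself" case, the freeze can be wrapped around the manual work so that the unfreeze is never forgotten. The following is a minimal Python sketch, assuming a temporary (non-persistent) freeze via `hagrp -freeze` and `hagrp -unfreeze` is sufficient. The helper name `frozen_group`, the group name `app_sg`, and `restart_application` are hypothetical.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator, List


@contextmanager
def frozen_group(
    group: str,
    run: Callable[[List[str]], object] = lambda cmd: subprocess.run(cmd, check=True),
) -> Iterator[None]:
    """Freeze a service group for the duration of a manual restart."""
    run(["hagrp", "-freeze", group])
    try:
        yield
    finally:
        # Always unfreeze, even if the manual restart raised an error.
        run(["hagrp", "-unfreeze", group])


# Usage (not executed here; both names are hypothetical):
# with frozen_group("app_sg"):
#     restart_application()
```

The `run` parameter exists only so that the command runner can be swapped out, for example to log the commands on a machine without VCS installed.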

mokkan
Level 6
Certified

Thank you very much for all of your input. Sorry for asking a stupid basic question.

When a resource goes offline unexpectedly, the agent calls the clean function to take it offline. If we set RestartLimit to a non-zero value, which one will be called first: the clean action or the restart? What I am trying to understand is whether the agent calls the restart only after it has marked the resource as faulted.

Setu_Gupta
Level 3
Accredited

First the agent will call the clean entry point to ensure that the resource is completely offline. After that the agent will call the online entry point to restart the resource, as per the RestartLimit attribute.

This is also mentioned in the description of RestartLimit attribute pasted by Marianne above.

Marianne
Level 6
Partner    VIP    Accredited Certified

"Thank you very much for all of your input. Sorry for asking a stupid basic question."

We don't mind basic questions - all of us were new at one stage and back then there was no Symantec Connect to ask. So, we had to read manuals.

We do hope that you will read manuals when we point out the name of a manual and the relevant section.

You will see that I quoted from the manual 2 days ago:

If the RestartLimit attribute is set to a non-zero value, the agent attempts to
restart the resource before declaring the resource as faulted. When restarting a
failed resource, the agent framework calls the Clean function before calling the
Online function.
 
This means that when a resource 'goes offline unexpectedly' (normally because someone has killed or offlined the process manually outside of the cluster), the agent will run the Clean function (to be 100% sure processes are down) and then run the Online function.
 
Best to educate DBAs, users, etc. to use ha commands to offline resources.

 

mokkan
Level 6
Certified

Thank you very much, all of you.