Troubleshooting when an application hangs on a node within VCS

tanislavm
Level 6

Hi, if an application hangs on a node within VCS, I would like to verify how to troubleshoot the issue. Should I use the stop script defined in main.cf to stop the application cleanly? Or should I kill -9 the application processes? And then start the group with hagrp -online? Thanks so much.

 

<<title edited by admin to add further descriptiveness>>

1 ACCEPTED SOLUTION

mikebounds
Level 6
Partner Accredited

If VCS detects that the application is hung, VCS calls the clean entry point, which forcibly stops the application, and then takes further action depending on how you have configured VCS. For instance, if you have set RestartLimit on the resource type, VCS will restart the application; in other configurations it will fail over the group to another system.

If VCS does NOT detect that the application is hung and you have set RestartLimit on the resource type, you could kill -9 the application processes and VCS will restart the application. If RestartLimit is not set and you don't want the application to fail over to another system, you could offline the resource using VCS (hares -offline or the GUI); this will try to stop the application gracefully and, if that doesn't work, VCS will call clean. Alternatively, freeze the service group (hagrp -freeze or the GUI), kill -9 the application processes (freezing the group means VCS will not take action when it sees the processes die), and then restart the application manually or using VCS.
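
The freeze-and-kill approach could be sketched roughly as follows. This is an illustration under stated assumptions, not a tested procedure: the group, resource, system, process pattern, and start command are all hypothetical, and pkill stands in for whatever your platform uses to find and kill the processes. The script defaults to a dry run that only prints the commands. Whether a resource that was reported faulted while the group was frozen needs hares -clear afterwards is worth checking on your VCS version.

```shell
#!/bin/sh
# Sketch only. All names passed in are hypothetical placeholders.
# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Freeze the group, kill the hung processes, restart the application
# manually, check what VCS reports, then unfreeze the group.
recover_hung_app() {
    group="$1"; resource="$2"; system="$3"; pattern="$4"; start_cmd="$5"

    run hagrp -freeze "$group"                  # VCS takes no action while frozen
    run pkill -9 -f "$pattern"                  # force-kill the hung processes
    run "$start_cmd"                            # site-specific manual start (assumed)
    run hares -state "$resource" -sys "$system" # confirm the state before unfreezing
    run hagrp -unfreeze "$group"                # hand control back to VCS
}
```

For example, `recover_hung_app appsg app_res node1 myapp /opt/myapp/bin/start` prints the five commands in order; set DRY_RUN=0 only after reviewing them.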

Mike

tanislavm
Level 6

Hi Mike,

Thanks so much. Could I safely start the application using hagrp -online if the other group resources are online? Or is it better to stop the whole group with hagrp -offline and then start the group?

Gaurav_S
Moderator
VIP Certified

Hi,

You would use "hagrp -online" if the group is either offline or partially online. If the application resource had an issue and faulted, and the application resource was not critical, the rest of the resources in the service group will still be online, and the service group will be in a partially online state. If you are sure the application fault has been fixed, you can run hagrp -online to start the application resource. VCS will detect that the rest of the resources in the service group are already online and that only the application resource needs to be started.
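
As a rough sketch of that sequence (group, resource, and system names are hypothetical, and the script defaults to a dry run that only prints the commands): check the group state, clear the fault on the application resource, then bring the group online on the same system.

```shell
#!/bin/sh
# Sketch only. Group, resource, and system names are hypothetical.
# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Bring a partially online group fully online after the fault is fixed.
online_partial_group() {
    group="$1"; resource="$2"; system="$3"

    run hagrp -state "$group" -sys "$system"    # expect a partial state here
    run hares -clear "$resource" -sys "$system" # clear the fault on the resource
    run hagrp -online "$group" -sys "$system"   # starts only what is not yet online
}
```

A call such as `online_partial_group appsg app_res node1` prints the three commands for review.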

To answer your original question, I agree with Mike's approach: freeze the service group, troubleshoot the application, and then unfreeze the group.

 

G

Gaurav_singh
Level 4

@Mike, I have a small doubt. Could you please clarify it for me?

a) 1st case: What configuration is required to fail over the application to the second node if the application hangs on a node?

b) 2nd case: What RestartLimit needs to be set so that VCS restarts the application?

Kindly assist.

 

Thanks,

Gary

 

 

mikebounds
Level 6
Partner Accredited

If VCS detects the application issue and the resource has a Critical attribute of 1 (or any resource that depends on it is critical), then it will cause the group to fail over; otherwise it will not.
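
A minimal sketch of checking and, if wanted, changing the Critical attribute from the command line. The resource name is a hypothetical placeholder, and the script defaults to a dry run that only prints the commands. Changing an attribute requires opening the configuration read-write and saving it afterwards.

```shell
#!/bin/sh
# Sketch only. The resource name is hypothetical.
# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Show the Critical attribute; if a second argument (0 or 1) is given,
# set the attribute to that value and save the configuration.
critical_attr() {
    resource="$1"; value="$2"

    run hares -value "$resource" Critical
    if [ -n "$value" ]; then
        run haconf -makerw                              # open config read-write
        run hares -modify "$resource" Critical "$value"
        run haconf -dump -makero                        # save and close config
    fi
}
```

`critical_attr app_res` only shows the current value, while `critical_attr app_res 1` would also mark the resource as critical.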

If VCS does not detect the application issue and you only need to kill the application once to fix it, you could use a RestartLimit of 1; if you need to restart the application more than once, set RestartLimit accordingly. However, I would freeze the service group (see first post) and then kill the application rather than use RestartLimit, as this is what freezing a group is really designed for.
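
If you do go the RestartLimit route, setting it on the resource type could look roughly like this. The type name passed in is whatever your application resource uses (Application in the example call is an assumption), and the script defaults to a dry run that only prints the commands. Note that a type-level change affects every resource of that type.

```shell
#!/bin/sh
# Sketch only. The resource type and limit passed in are assumptions.
# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Set RestartLimit on a resource type, save the config, and read it back.
set_restart_limit() {
    restype="$1"; limit="$2"

    run haconf -makerw                                  # open config read-write
    run hatype -modify "$restype" RestartLimit "$limit"
    run haconf -dump -makero                            # save and close config
    run hatype -value "$restype" RestartLimit           # confirm the new value
}
```

For example, `set_restart_limit Application 1` prints the commands that would allow one restart attempt before VCS treats the resource as faulted.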

Mike