Forum Discussion

mkruer
13 years ago

VCS AutoStartList ungracefully failover part2

I have two systems, cms-app-49-51 and cms-app-49-52, between which I have been testing ungraceful shutdowns (hard power-off).

After failing over from cms-app-49-52 to cms-app-49-51, I brought cms-app-49-52 back up and cleared all errors (or so I thought).
I checked the system and everything looked fine, with all faults cleared.

[root@cms-app-49-52 ~]# hastatus -sum

-- SYSTEM STATE
-- System               State                Frozen

A  cms-app-49-51        FAULTED              0
A  cms-app-49-52        RUNNING              0

-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State

B  CMSApp_Cluster  cms-app-49-52        Y          N               OFFLINE
B  CMSApp_Notifier cms-app-49-52        Y          N               ONLINE
[root@cms-app-49-52 ~]# hagrp -display -attribute AutoDisabled
#Group          Attribute             System        Value
CMSApp_Cluster  AutoDisabled          cms-app-49-51 0
CMSApp_Cluster  AutoDisabled          cms-app-49-52 0
CMSApp_Notifier AutoDisabled          cms-app-49-51 0
CMSApp_Notifier AutoDisabled          cms-app-49-52 0

However, CMSApp_Cluster never started automatically on cms-app-49-52; it stayed offline until I ran "hagrp -online CMSApp_Cluster -sys cms-app-49-52".

Attached are the log files from both systems; please look around 2012/12/06 14:58:38.

  • The engine log shows:

     

    2012/12/06 11:40:42 VCS NOTICE V-16-1-10446 Group CMSApp_Cluster is offline on system cms-app-49-51
     
    and CMSApp_Cluster does not go online on system cms-app-49-51 again before that system goes down at:
     
    2012/12/06 14:58:39 VCS ERROR V-16-1-10322 System cms-app-49-51 (Node '0') changed state from RUNNING to FAULTED
     
    So, as CMSApp_Cluster is offline on cms-app-49-51 when that system goes down, it will not come online on cms-app-49-52, because only ONLINE groups fail over.
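
The same check can be scripted against the engine log. The following is a minimal sketch, not an official VCS tool: the function name `group_state_at_fault` is hypothetical, and it assumes that the "online" message uses the same "Group ... is ... on system ..." wording as the offline message quoted above.

```shell
#!/bin/sh
# Print the last state logged for a group on a system before that system
# was reported FAULTED. The engine log is read from stdin.
# Usage: group_state_at_fault <group> <system> < engine log
group_state_at_fault() {
    awk -v grp="$1" -v sys="$2" '
        # Remember the most recent "Group <grp> is <state> on system <sys>" line.
        $6 == "Group" && $7 == grp && $8 == "is" && $NF == sys { state = $9 }
        # Stop at the first line reporting that the system changed to FAULTED.
        $6 == "System" && $7 == sys && $NF == "FAULTED" { faulted = 1; exit }
        END {
            if (!faulted)         print "no-fault-seen"
            else if (state == "") print "unknown"
            else                  print state
        }
    '
}
```

Fed the two log lines quoted above, with `CMSApp_Cluster` and `cms-app-49-51` as arguments, it prints `offline`, which is why no failover happened. The log path to pass in depends on the installation; the engine log is commonly `/var/VRTSvcs/log/engine_A.log`.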
     
    Looking at your main.cf, you seem to be controlling services that are started through the RC scripts, and when VCS then sees them already running you get messages like:
      
    2012/12/06 13:02:14 VCS INFO V-16-1-10297 Resource autorsync (Owner: Unspecified, Group: CMSApp_Cluster) is online on cms-app-49-52 (First probe)
    2012/12/06 13:02:14 VCS ERROR V-16-1-10214 Concurrency Violation:CurrentCount increased above 1 for failover group CMSApp_Cluster
    ...
    2012/12/06 13:02:14 VCS WARNING V-16-6-15034 (cms-app-49-52) violation:Offlining group CMSApp_Cluster on system cms-app-49-52
    2012/12/06 13:02:14 VCS INFO V-16-1-50135 User root fired command: hagrp -offline CMSApp_Cluster  cms-app-49-52  from localhost
    2012/12/06 13:02:14 VCS NOTICE V-16-1-10167 Initiating manual offline of group CMSApp_Cluster on system cms-app-49-52
    2012/12/06 13:02:14 VCS NOTICE V-16-1-10300 Initiating Offline of Resource autorsync (Owner: Unspecified, Group: CMSApp_Cluster) on System cms-app-49-52
     
    So services that should ONLY run on ONE node should not be started at boot time by an RC script. If you have services that normally start on both nodes and you want VCS to monitor and restart them, put them in a parallel group.
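
To find which resources were already running when VCS first probed them (and so were probably started outside VCS, for example by an RC script), the engine log can be filtered for the "First probe" message shown above. This is a minimal sketch; the function name `first_probe_online` is hypothetical and the pattern is based only on the log line quoted in this thread.

```shell
#!/bin/sh
# Read an engine log on stdin and print "resource group system" for every
# resource that VCS found already online on its first probe.
first_probe_online() {
    sed -n 's/^.* Resource \([^ ]*\) (Owner: [^,]*, Group: \([^)]*\)) is online on \([^ ]*\) (First probe).*$/\1 \2 \3/p'
}
```

For the log excerpt above this prints `autorsync CMSApp_Cluster cms-app-49-52`. Any resource listed for a failover group is a candidate for removal from the boot-time startup scripts.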
     
    Mike
     
  • Mike

    I cleared the error, and VCS then showed everything as offline on 49-52 before I faulted 49-51. So I guess the next question is: do I need to clear the error on 49-52 and then restart VCS on 49-52 before trying to fail over again?

  • If you take 49-51 down, CMSApp_Cluster will only fail over if:

    1. It is online on 49-51 when 49-51 goes down
    2. It is not in a faulted state on 49-52, i.e. if the CMSApp_Cluster service group previously faulted on 49-52, you need to clear the fault.
    3. It is fully probed on 49-52

    There should never be a need to restart VCS after clearing any faults.
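
The three conditions can be checked before a failover test by feeding `hastatus -sum` output to a small filter. This is a minimal sketch under stated assumptions: the function name `failover_ready` is hypothetical, it relies on the "B" group rows having the column layout shown in the original post, and it treats any state containing FAULTED on the target as a fault.

```shell
#!/bin/sh
# Usage: hastatus -sum | failover_ready <group> <source-system> <target-system>
# Prints "ready" and returns 0 only when all three conditions hold.
failover_ready() {
    awk -v grp="$1" -v src="$2" -v dst="$3" '
        # Group rows: B <group> <system> <probed> <autodisabled> <state>
        $1 == "B" && $2 == grp && $3 == src { src_state = $6 }
        $1 == "B" && $2 == grp && $3 == dst { dst_probed = $4; dst_state = $6 }
        END {
            ok = 1
            if (src_state != "ONLINE") { print "not online on " src; ok = 0 }
            if (dst_state ~ /FAULTED/) { print "faulted on " dst; ok = 0 }
            if (dst_probed != "Y")     { print "not fully probed on " dst; ok = 0 }
            if (ok) print "ready"
            exit !ok
        }
    '
}
```

If it reports a fault on the target, `hagrp -clear CMSApp_Cluster -sys cms-app-49-52` clears it; no VCS restart is involved.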

    Mike