Forum Discussion

tgenova
10 years ago

fault of primary node in a global cluster

Hi guys. I have a global cluster made up of two single-node clusters, replicated in asynchronous mode. I wanted to simulate a fault of the primary node with the Solaris command 'halt' root@M...
  • Sunil_Yadav
    10 years ago

    Hi,

     

    We analyzed the config and log files. Our findings are below:

     

    1.    We reviewed the testing activity. For service group failures, failover appears to be working correctly.

    On MILWB02S:

    2015/05/29 11:19:15 VCS WARNING V-16-1-50911 Unable to fail over global group AppService in local cluster. Attempting to fail group over to a remote cluster [ClusterFailoverPolicy = Auto]
    2015/05/29 11:42:55 VCS WARNING V-16-1-50911 Unable to fail over global group AppService in local cluster. Attempting to fail group over to a remote cluster [ClusterFailoverPolicy = Auto]
    2015/05/29 12:46:29 VCS WARNING V-16-1-50911 Unable to fail over global group AppService in local cluster. Attempting to fail group over to a remote cluster [ClusterFailoverPolicy = Auto]
    2015/06/04 11:09:51 VCS WARNING V-16-1-50911 Unable to fail over global group AppService in local cluster. Attempting to fail group over to a remote cluster [ClusterFailoverPolicy = Auto]
    2015/06/09 10:41:39 VCS WARNING V-16-1-50911 Unable to fail over global group AppService in local cluster. Attempting to fail group over to a remote cluster [ClusterFailoverPolicy = Auto]
    2015/06/09 14:19:44 VCS WARNING V-16-1-50911 Unable to fail over global group AppService in local cluster. Attempting to fail group over to a remote cluster [ClusterFailoverPolicy = Auto]
    

    On MILWB03S:

    2015/05/29 11:19:15 VCS INFO V-16-1-50925 Proceeding to online group AppService on the best possible system in the local cluster
    2015/05/29 11:42:55 VCS INFO V-16-1-50925 Proceeding to online group AppService on the best possible system in the local cluster
    2015/05/29 12:46:29 VCS INFO V-16-1-50925 Proceeding to online group AppService on the best possible system in the local cluster
    2015/06/04 11:09:51 VCS INFO V-16-1-50925 Proceeding to online group AppService on the best possible system in the local cluster
    2015/06/09 10:41:39 VCS INFO V-16-1-50925 Proceeding to online group AppService on the best possible system in the local cluster
    2015/06/09 14:19:44 VCS INFO V-16-1-50925 Proceeding to online group AppService on the best possible system in the local cluster
    
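The pairing shown in the two excerpts above can be checked mechanically. The following is a minimal Python sketch, not a VCS tool: it assumes each engine log line has the form `YYYY/MM/DD HH:MM:SS VCS <severity> <message-id> <text>` as in the excerpts, and that the clocks on both nodes agree to within a tolerance you choose. The function names are hypothetical.

```python
import re
from datetime import datetime

# Assumed line layout, taken from the excerpts in this thread.
LINE_RE = re.compile(
    r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) VCS (\w+) (V-\d+-\d+-\d+) (.*)$"
)
HANDOFF_ID = "V-16-1-50911"         # "Attempting to fail group over to a remote cluster"
ONLINE_ATTEMPT_ID = "V-16-1-50925"  # "Proceeding to online group ..."

def parse_line(line):
    """Return (timestamp, severity, message_id, text), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m.group(1), "%Y/%m/%d %H:%M:%S")
    return ts, m.group(2), m.group(3), m.group(4)

def timestamps_for(lines, message_id):
    """Collect the timestamps of all lines carrying the given message ID."""
    parsed = (parse_line(line) for line in lines)
    return [p[0] for p in parsed if p is not None and p[2] == message_id]

def unmatched_handoffs(sender_lines, receiver_lines, tolerance_s=0):
    """Return hand-off timestamps on the sender with no online attempt on the receiver."""
    sent = timestamps_for(sender_lines, HANDOFF_ID)
    received = timestamps_for(receiver_lines, ONLINE_ATTEMPT_ID)
    return [
        ts for ts in sent
        if not any(abs((ts - r).total_seconds()) <= tolerance_s for r in received)
    ]

# Two lines from each node, copied from the excerpts above.
milwb02s = [
    "2015/05/29 11:19:15 VCS WARNING V-16-1-50911 Unable to fail over global group AppService in local cluster. Attempting to fail group over to a remote cluster [ClusterFailoverPolicy = Auto]",
    "2015/06/09 14:19:44 VCS WARNING V-16-1-50911 Unable to fail over global group AppService in local cluster. Attempting to fail group over to a remote cluster [ClusterFailoverPolicy = Auto]",
]
milwb03s = [
    "2015/05/29 11:19:15 VCS INFO V-16-1-50925 Proceeding to online group AppService on the best possible system in the local cluster",
    "2015/06/09 14:19:44 VCS INFO V-16-1-50925 Proceeding to online group AppService on the best possible system in the local cluster",
]

print(unmatched_handoffs(milwb02s, milwb03s))
```

An empty result means every hand-off logged on one node has a matching online attempt on the other, which is what the excerpts show.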

     


    2.    There is no issue with ClusterFailoverPolicy: “ClusterFailoverPolicy = Auto” is set identically on both clusters. However, ClusterList is not the same across clusters.

    On MILWB02S 
        ClusterList = { MILWB03SCluster = 0, MILWB02SCluster = 1 }

    On MILWB03S 
        ClusterList = { MILWB02SCluster = 0, MILWB03SCluster = 1 }

    This value must also be the same across clusters. Otherwise, it can cause a concurrency violation during Auto-Start. This discrepancy has not caused any problems so far, but you should correct it to avoid future issues.
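As a rough illustration of the discrepancy, here is a small Python sketch that compares the two ClusterList values quoted above. It only parses the text form shown in this reply (`ClusterList = { Name = priority, ... }`); the helper names are hypothetical and it does not talk to VCS.

```python
import re

def parse_cluster_list(text):
    """Parse 'ClusterList = { A = 0, B = 1 }' into {'A': 0, 'B': 1}."""
    body = re.search(r"\{(.*)\}", text)
    if body is None:
        raise ValueError("no '{ ... }' block found")
    result = {}
    for item in body.group(1).split(","):
        item = item.strip()
        if not item:
            continue
        name, priority = item.split("=")
        result[name.strip()] = int(priority)
    return result

def priority_mismatches(a, b):
    """Return {cluster: (priority_in_a, priority_in_b)} for every differing entry."""
    names = sorted(set(a) | set(b))
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}

# Values copied from the two nodes as quoted above.
on_milwb02s = parse_cluster_list("ClusterList = { MILWB03SCluster = 0, MILWB02SCluster = 1 }")
on_milwb03s = parse_cluster_list("ClusterList = { MILWB02SCluster = 0, MILWB03SCluster = 1 }")

print(priority_mismatches(on_milwb02s, on_milwb03s))
```

Both clusters appear in both lists, but each side assigns priority 0 to the other cluster, so every entry is reported as a mismatch.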

     

     

    3.    Why didn’t cross-cluster failover happen when MILWB02S went down?

    On MILWB03S, we observed that MILWB02SCluster has exited (not faulted).

    2015/05/29 13:31:58 VCS NOTICE V-16-1-50514 Remote cluster 'MILWB02SCluster' has exited
    2015/05/29 13:31:58 VCS INFO V-16-3-18309 (MILWB03S) Cluster MILWB02SCluster exited
    2015/05/29 13:42:09 VCS ERROR V-16-3-18211 (MILWB03S) Cluster MILWB03SCluster lost heartbeat Icmp to cluster MILWB02SCluster

    The same state transition was confirmed by “hasys -state”:

    # hasys -state
    # System                    Attribute    Value
    MILWB02SCluster:MILWB02S    SysState    EXITED
    localclus:MILWB03S          SysState    RUNNING

    Cross-cluster failover happens only when the remote cluster FAULTS. In this case, the cluster did not fault; it EXITED. When a cluster faults, the expected log message is:

    9999/99/99 23:59:59 VCS CRITICAL V-16-1-50513 Remote cluster 'Xxxx' has faulted

    Because there was no cluster fault, there was no cross-cluster failover.
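The distinction between EXITED and FAULTED can be pulled out of the engine log with a short script. This Python sketch assumes the two message formats quoted above (V-16-1-50514 for "has exited", V-16-1-50513 for "has faulted") and encodes the rule stated in this reply, that only a fault leads to cross-cluster failover. The function names are hypothetical.

```python
import re

# Matches the two remote-cluster state messages quoted in this reply.
EVENT_RE = re.compile(
    r"(V-16-1-5051[34]) Remote cluster '([^']+)' has (exited|faulted)"
)

def remote_cluster_events(lines):
    """Return a list of (cluster_name, state) for each exited/faulted message."""
    events = []
    for line in lines:
        m = EVENT_RE.search(line)
        if m:
            events.append((m.group(2), m.group(3)))
    return events

def failover_expected(lines, cluster):
    """True only if the log reports the named remote cluster as faulted."""
    return any(
        name == cluster and state == "faulted"
        for name, state in remote_cluster_events(lines)
    )

# Lines copied from the MILWB03S excerpt above.
log = [
    "2015/05/29 13:31:58 VCS NOTICE V-16-1-50514 Remote cluster 'MILWB02SCluster' has exited",
    "2015/05/29 13:31:58 VCS INFO V-16-3-18309 (MILWB03S) Cluster MILWB02SCluster exited",
    "2015/05/29 13:42:09 VCS ERROR V-16-3-18211 (MILWB03S) Cluster MILWB03SCluster lost heartbeat Icmp to cluster MILWB02SCluster",
]

print(remote_cluster_events(log))
print(failover_expected(log, "MILWB02SCluster"))
```

For the excerpt above, the only remote-cluster event is an exit, so the check reports that no failover should be expected.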

     

     

    4.    Automated switchover of AppService from MILWB03S to MILWB02S?


    We checked every occasion on which AppService went online on MILWB02S. Every time, it was a user-initiated action. AppService never switched over automatically from MILWB03S to MILWB02S.

    # 1
    2015/05/29 14:33:32 VCS INFO V-16-1-50135 User root fired command: hagrp -online AppService  MILWB02S  from localhost
    .
    .
    .
    2015/05/29 14:35:11 VCS NOTICE V-16-1-10447 Group AppService is online on system MILWB02S
    
    # 2
    2015/05/29 15:49:30 VCS INFO V-16-1-50135 User root fired command: hagrp -online AppService  MILWB02S  from localhost
    .
    .
    .
    2015/05/29 15:50:58 VCS NOTICE V-16-1-10447 Group AppService is online on system MILWB02S
    
    # 3
    2015/06/01 08:39:34 VCS INFO V-16-1-50135 User root fired command: hagrp -switch AppService  MILWB02S  MILWB02SCluster  from localhost
    2015/06/01 08:39:34 VCS INFO V-16-1-50803 Received request to switch group AppService from remote system MILWB03S to local system MILWB02S
    .
    .
    .
    2015/06/01 08:42:00 VCS NOTICE V-16-1-10447 Group AppService is online on system MILWB02S
    
    # 4
    2015/06/03 11:09:32 VCS INFO V-16-1-50135 User root fired command: hagrp -flush AppService  MILWB02S  0  from localhost
    .
    .
    .
    2015/06/03 11:10:58 VCS NOTICE V-16-1-10447 Group AppService is online on system MILWB02S
    
    # 5
    2015/06/03 16:38:36 VCS INFO V-16-1-50803 Received request to switch group AppService from remote system MILWB03S to local system MILWB02S
    .
    .
    .
    2015/06/03 16:45:10 VCS NOTICE V-16-1-10447 Group AppService is online on system MILWB02S
    
    # 6
    2015/06/04 10:44:50 VCS INFO V-16-1-50135 User root fired command: hagrp -online AppService  MILWB02S  from localhost.
    .
    .
    .
    2015/06/04 10:46:24 VCS NOTICE V-16-1-10447 Group AppService is online on system MILWB02S
    
    # 7
    2015/06/09 09:38:40 VCS INFO V-16-1-50803 Received request to switch group AppService from remote system MILWB03S to local system MILWB02S
    .
    .
    .
    2015/06/09 09:41:07 VCS NOTICE V-16-1-10447 Group AppService is online on system MILWB02S
    
    # 8
    2015/06/12 08:11:00 VCS INFO V-16-1-50803 Received request to switch group AppService from remote system MILWB03S to local system MILWB02S
    .
    .
    .
    2015/06/12 08:13:25 VCS NOTICE V-16-1-10447 Group AppService is online on system MILWB02S
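The review above can be repeated on a full engine log with a short script. This Python sketch walks the log in order and, for each "Group ... is online" notice (V-16-1-10447), reports the most recent preceding user command (V-16-1-50135) or switch request (V-16-1-50803). Treating those two message IDs as the only triggers is an assumption based on the excerpts in this reply, and the function name is hypothetical.

```python
import re

TRIGGER_IDS = {"V-16-1-50135", "V-16-1-50803"}  # user command, switch request
ONLINE_ID = "V-16-1-10447"                      # "Group ... is online on system ..."
ID_RE = re.compile(r"VCS \w+ (V-\d+-\d+-\d+) (.*)")

def triggers_for_online_events(lines):
    """Pair each online notice with the last trigger message seen before it.

    Returns a list of (online_text, trigger_text_or_None). A None trigger
    marks an online event with no preceding command or request in the log.
    """
    results = []
    last_trigger = None
    for line in lines:
        m = ID_RE.search(line)
        if not m:
            continue  # skips the '.' filler lines and anything unparseable
        msg_id, text = m.group(1), m.group(2).strip()
        if msg_id in TRIGGER_IDS:
            last_trigger = text
        elif msg_id == ONLINE_ID:
            results.append((text, last_trigger))
            last_trigger = None  # each online event needs its own trigger
    return results

# Occurrences #1 and #5 copied from the excerpts above.
log = [
    "2015/05/29 14:33:32 VCS INFO V-16-1-50135 User root fired command: hagrp -online AppService  MILWB02S  from localhost",
    ".",
    "2015/05/29 14:35:11 VCS NOTICE V-16-1-10447 Group AppService is online on system MILWB02S",
    "2015/06/03 16:38:36 VCS INFO V-16-1-50803 Received request to switch group AppService from remote system MILWB03S to local system MILWB02S",
    ".",
    "2015/06/03 16:45:10 VCS NOTICE V-16-1-10447 Group AppService is online on system MILWB02S",
]

for online, trigger in triggers_for_online_events(log):
    print(online, "<-", trigger)
```

Any online event paired with `None` would be worth a closer look, since it has no recorded command or request ahead of it.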

     

    We hope this addresses all your questions. Please let us know if you need any further assistance.

    Thanks & Regards,
    Sunil Y