fault of primary node in a global cluster
Hi,
We tried analyzing config and log files. Our findings are as below:
1. We observed testing activities. For service groups’ failures, it seems working correctly.
On MILWB02S:
2015/05/29 11:19:15 VCS WARNING V-16-1-50911 Unable to fail over global group AppService in local cluster. Attempting to fail group over to a remote cluster [ClusterFailoverPolicy = Auto] 2015/05/29 11:42:55 VCS WARNING V-16-1-50911 Unable to fail over global group AppService in local cluster. Attempting to fail group over to a remote cluster [ClusterFailoverPolicy = Auto] 2015/05/29 12:46:29 VCS WARNING V-16-1-50911 Unable to fail over global group AppService in local cluster. Attempting to fail group over to a remote cluster [ClusterFailoverPolicy = Auto] 2015/06/04 11:09:51 VCS WARNING V-16-1-50911 Unable to fail over global group AppService in local cluster. Attempting to fail group over to a remote cluster [ClusterFailoverPolicy = Auto] 2015/06/09 10:41:39 VCS WARNING V-16-1-50911 Unable to fail over global group AppService in local cluster. Attempting to fail group over to a remote cluster [ClusterFailoverPolicy = Auto] 2015/06/09 14:19:44 VCS WARNING V-16-1-50911 Unable to fail over global group AppService in local cluster. Attempting to fail group over to a remote cluster [ClusterFailoverPolicy = Auto]
On MILWB03S:
2015/05/29 11:19:15 VCS INFO V-16-1-50925 Proceeding to online group AppService on the best possible system in the local cluster 2015/05/29 11:42:55 VCS INFO V-16-1-50925 Proceeding to online group AppService on the best possible system in the local cluster 2015/05/29 12:46:29 VCS INFO V-16-1-50925 Proceeding to online group AppService on the best possible system in the local cluster 2015/06/04 11:09:51 VCS INFO V-16-1-50925 Proceeding to online group AppService on the best possible system in the local cluster 2015/06/09 10:41:39 VCS INFO V-16-1-50925 Proceeding to online group AppService on the best possible system in the local cluster 2015/06/09 14:19:44 VCS INFO V-16-1-50925 Proceeding to online group AppService on the best possible system in the local cluster
2. There is no issue with ClusterFailoverPolicy. Value of “ClusterFailoverPolicy = Auto” is same across clusters. However, ClusterList is not same across clusters.On MILWB02S
ClusterList = { MILWB03SCluster = 0, MILWB02SCluster = 1 }On MILWB03S
ClusterList = { MILWB02SCluster = 0, MILWB03SCluster = 1 }This value too must be same across clusters. Otherwise, it can create concurrency violation during Auto-Start. However, this discrepancy haven’t created any issue till now. You should rectify this to avoid any future issues.
3. Cross cluster failover didn’t happen when MILWB02S went down?
On MILWB03S, we observed that MILWB02SCluster has exited(not faulted).
2015/05/29 13:31:58 VCS NOTICE V-16-1-50514 Remote cluster 'MILWB02SCluster' has exited 2015/05/29 13:31:58 VCS INFO V-16-3-18309 (MILWB03S) Cluster MILWB02SCluster exited 2015/05/29 13:42:09 VCS ERROR V-16-3-18211 (MILWB03S) Cluster MILWB03SCluster lost heartbeat Icmp to cluster MILWB02SCluster
Same state transition was confirmed by “hasys –state”
# hasys -state # System Attribute Value MILWB02SCluster:MILWB02S SysState EXITED localclus:MILWB03S SysState RUNNING
Cross cluster failover happens only in case of cluster FAULT. In this case, cluster didn’t faulted, it EXITED. In case of cluster fault, expected log message is:
9999/99/99 23:59:59 VCS CRITICAL V-16-1-50513 Remote cluster 'Xxxx' has faulted
As there wasn’t cluster fault, there wasn’t cross cluster failover.
4. Automated switchover of AppService from MILWB03S to MILWB02S?
We verified all ocassions when AppService went online on MILWB02S. Everytime, it was user iniitated action. AppService never automatically switched-over from MILWB03S to MILWB02S# 1 2015/05/29 14:33:32 VCS INFO V-16-1-50135 User root fired command: hagrp -online AppService MILWB02S from localhost . . . 2015/05/29 14:35:11 VCS NOTICE V-16-1-10447 Group AppService is online on system MILWB02S # 2 2015/05/29 15:49:30 VCS INFO V-16-1-50135 User root fired command: hagrp -online AppService MILWB02S from localhost . . . 2015/05/29 15:50:58 VCS NOTICE V-16-1-10447 Group AppService is online on system MILWB02S # 3 2015/06/01 08:39:34 VCS INFO V-16-1-50135 User root fired command: hagrp -switch AppService MILWB02S MILWB02SCluster from localhost 2015/06/01 08:39:34 VCS INFO V-16-1-50803 Received request to switch group AppService from remote system MILWB03S to local system MILWB02S . . . 2015/06/01 08:42:00 VCS NOTICE V-16-1-10447 Group AppService is online on system MILWB02S # 4 2015/06/03 11:09:32 VCS INFO V-16-1-50135 User root fired command: hagrp -flush AppService MILWB02S 0 from localhost . . . 2015/06/03 11:10:58 VCS NOTICE V-16-1-10447 Group AppService is online on system MILWB02S # 5 2015/06/03 16:38:36 VCS INFO V-16-1-50803 Received request to switch group AppService from remote system MILWB03S to local system MILWB02S . . . 2015/06/03 16:45:10 VCS NOTICE V-16-1-10447 Group AppService is online on system MILWB02S # 6 2015/06/04 10:44:50 VCS INFO V-16-1-50135 User root fired command: hagrp -online AppService MILWB02S from localhost. . . . 2015/06/04 10:46:24 VCS NOTICE V-16-1-10447 Group AppService is online on system MILWB02S # 7 2015/06/09 09:38:40 VCS INFO V-16-1-50803 Received request to switch group AppService from remote system MILWB03S to local system MILWB02S . . . 2015/06/09 09:41:07 VCS NOTICE V-16-1-10447 Group AppService is online on system MILWB02S # 8 2015/06/12 08:11:00 VCS INFO V-16-1-50803 Received request to switch group AppService from remote system MILWB03S to local system MILWB02S . . . 2015/06/12 08:13:25 VCS NOTICE V-16-1-10447 Group AppService is online on system MILWB02S
Hopefully, we have addressed all your queries. Please let us if any further assistance needed.
Thanks & Regards,
Sunil Y