Forum Discussion

tgenova
Level 4
10 years ago

fault of primary node in a global cluster

Hi guys. I have a global cluster formed by two single-node clusters, replicating between each other in asynchronous mode. I wanted to simulate a fault of the primary node with the Solaris command 'halt'.

root@MILWB02S # hagrp -state AppService
#Group      Attribute  System                     Value
AppService  State      MILWB03SCluster:MILWB03S   |OFFLINE|
AppService  State      localclus:MILWB02S         |ONLINE|

After 'halt' on the primary node MILWB02S, we have:

root@MILWB03S # hagrp -state AppService
#Group      Attribute  System                     Value
AppService  State      MILWB02SCluster:MILWB02S   |OFFLINE|
AppService  State      localclus:MILWB03S         |OFFLINE|

root@MILWB03S # hasys -state
#System                    Attribute  Value
MILWB02SCluster:MILWB02S   SysState   EXITED
localclus:MILWB03S         SysState   RUNNING

root@MILWB03S # vradmin -g datadg repstatus datarvg
VxVM VVR vradmin INFO V-5-52-1205 Primary is unreachable or RDS has configuration error. Displayed status information is from Secondary and can be out-of-date.
Replicated Data Set: datarvg
Primary:
  Host name:          10.66.28.53
  RVG name:           datarvg
  DG name:            datadg
  RVG state:          enabled for I/O
  Data volumes:       1
  VSets:              0
  SRL name:           srl_vol
  SRL size:           1.00 G
  Total secondaries:  1
Secondary:
  Host name:          10.66.28.54
  RVG name:           datarvg
  DG name:            datadg
  Data status:        consistent, up-to-date
  Replication status: paused due to network disconnection
  Current mode:       asynchronous
  Logging to:         SRL (0 updates behind, last update ID 5730.50511)
  Timestamp Information: behind by 0h 0m 0s
  Last Update on Primary:     May 29 13:32:06
  Secondary up-to-date as of: May 29 13:32:06
Config Errors:
  10.66.28.53: Pri or Sec IP not available or vradmind not running, stale information

Is this situation correct?

I decided to manually start the service group (AppService) on the secondary node, because MILWB02S is down:

root@MILWB03S # hagrp -online -force AppService -sys MILWB03S
root@MILWB03S # hagrp -state AppService
#Group      Attribute  System                     Value
AppService  State      MILWB02SCluster:MILWB02S   |OFFLINE|
AppService  State      localclus:MILWB03S         |ONLINE|

root@MILWB03S # vradmin -g datadg repstatus datarvg
Replicated Data Set: datarvg
Primary:
  Host name:          10.66.28.54
  RVG name:           datarvg
  DG name:            datadg
  RVG state:          enabled for I/O
  Data volumes:       1
  VSets:              0
  SRL name:           srl_vol
  SRL size:           1.00 G
  Total secondaries:  1
Config Errors:
  10.66.28.53: Pri or Sec IP not available or vradmind not running

After quite some time, I booted the downed server (MILWB02S) back up, and I noticed an automatic switch of the service from MILWB03S to MILWB02S:

root@MILWB02S # hagrp -state AppService
#Group      Attribute  System                     Value
AppService  State      MILWB03SCluster:MILWB03S   |OFFLINE|
AppService  State      localclus:MILWB02S         |ONLINE|

root@MILWB02S # vradmin -g datadg repstatus datarvg
Replicated Data Set: datarvg
Primary:
  Host name:          10.66.28.53
  RVG name:           datarvg
  DG name:            datadg
  RVG state:          enabled for I/O
  Data volumes:       1
  VSets:              0
  SRL name:           srl_vol
  SRL size:           1.00 G
  Total secondaries:  1
Config Errors:
  10.66.28.54: Primary-Primary configuration

Is this situation correct? Why did the cluster switch the service?
  • Hi,

     

    We tried analyzing config and log files. Our findings are as below:

     

    1.    We observed your testing activities in the logs. For service-group failures, it appears to be working correctly.

    On MILWB02S:

    2015/05/29 11:19:15 VCS WARNING V-16-1-50911 Unable to fail over global group AppService in local cluster. Attempting to fail group over to a remote cluster [ClusterFailoverPolicy = Auto]
    2015/05/29 11:42:55 VCS WARNING V-16-1-50911 Unable to fail over global group AppService in local cluster. Attempting to fail group over to a remote cluster [ClusterFailoverPolicy = Auto]
    2015/05/29 12:46:29 VCS WARNING V-16-1-50911 Unable to fail over global group AppService in local cluster. Attempting to fail group over to a remote cluster [ClusterFailoverPolicy = Auto]
    2015/06/04 11:09:51 VCS WARNING V-16-1-50911 Unable to fail over global group AppService in local cluster. Attempting to fail group over to a remote cluster [ClusterFailoverPolicy = Auto]
    2015/06/09 10:41:39 VCS WARNING V-16-1-50911 Unable to fail over global group AppService in local cluster. Attempting to fail group over to a remote cluster [ClusterFailoverPolicy = Auto]
    2015/06/09 14:19:44 VCS WARNING V-16-1-50911 Unable to fail over global group AppService in local cluster. Attempting to fail group over to a remote cluster [ClusterFailoverPolicy = Auto]
    

    On MILWB03S:

    2015/05/29 11:19:15 VCS INFO V-16-1-50925 Proceeding to online group AppService on the best possible system in the local cluster
    2015/05/29 11:42:55 VCS INFO V-16-1-50925 Proceeding to online group AppService on the best possible system in the local cluster
    2015/05/29 12:46:29 VCS INFO V-16-1-50925 Proceeding to online group AppService on the best possible system in the local cluster
    2015/06/04 11:09:51 VCS INFO V-16-1-50925 Proceeding to online group AppService on the best possible system in the local cluster
    2015/06/09 10:41:39 VCS INFO V-16-1-50925 Proceeding to online group AppService on the best possible system in the local cluster
    2015/06/09 14:19:44 VCS INFO V-16-1-50925 Proceeding to online group AppService on the best possible system in the local cluster
    

     


    2.    There is no issue with ClusterFailoverPolicy; the value "ClusterFailoverPolicy = Auto" is the same across both clusters. However, ClusterList is not the same across the clusters.

    On MILWB02S 
        ClusterList = { MILWB03SCluster = 0, MILWB02SCluster = 1 }

    On MILWB03S 
        ClusterList = { MILWB02SCluster = 0, MILWB03SCluster = 1 }

    This value must also be the same across clusters; otherwise it can cause a concurrency violation during auto-start. The discrepancy hasn't caused any issue so far, but you should rectify it to avoid future problems. A quick way to check and align the value is sketched below.
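
    This is only a sketch; the priority ordering shown is illustrative, and the main.cf path assumes the default configuration directory. Use whatever ordering you actually intend, as long as both clusters carry the identical entry:

    # Run on a node in each cluster and compare the output:
    hagrp -value AppService ClusterList

    # Example of an aligned entry in main.cf on BOTH clusters
    # (the priorities shown here are illustrative only):
    #   ClusterList = { MILWB02SCluster = 0, MILWB03SCluster = 1 }

    # After editing main.cf, verify the configuration before restarting VCS:
    hacf -verify /etc/VRTSvcs/conf/config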

     

     

    3.    Why didn't cross-cluster failover happen when MILWB02S went down?

    On MILWB03S, we observed that MILWB02SCluster had exited (not faulted):

    2015/05/29 13:31:58 VCS NOTICE V-16-1-50514 Remote cluster 'MILWB02SCluster' has exited
    2015/05/29 13:31:58 VCS INFO V-16-3-18309 (MILWB03S) Cluster MILWB02SCluster exited
    2015/05/29 13:42:09 VCS ERROR V-16-3-18211 (MILWB03S) Cluster MILWB03SCluster lost heartbeat Icmp to cluster MILWB02SCluster

    The same state transition was confirmed by "hasys -state":

    # hasys -state
    # System                    Attribute    Value
    MILWB02SCluster:MILWB02S    SysState    EXITED
    localclus:MILWB03S          SysState    RUNNING

    Cross-cluster failover happens only in the case of a cluster FAULT. In this case the cluster didn't fault; it EXITED. In the case of a cluster fault, the expected log message is:

    9999/99/99 23:59:59 VCS CRITICAL V-16-1-50513 Remote cluster 'Xxxx' has faulted

    Since there was no cluster fault, there was no cross-cluster failover.
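
    As a quick way to see how VCS classified the remote cluster, you can query its state from the surviving node. This is a sketch; the exact output layout may differ between versions:

    # On MILWB03S:
    haclus -state

    # EXITED  -> the remote cluster announced a graceful stop (e.g. 'halt');
    #            no cross-cluster failover is triggered.
    # FAULTED -> the remote cluster was lost abruptly (heartbeats timed out);
    #            this is what triggers the failover governed by ClusterFailoverPolicy.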

     

     

    4.    Was the switchover of AppService from MILWB03S to MILWB02S automatic?


    We verified every occasion on which AppService went online on MILWB02S. Each time it was a user-initiated action; AppService never automatically switched over from MILWB03S to MILWB02S. The relevant log excerpts are listed below, and a way to extract them yourself is sketched after them.

    # 1
    2015/05/29 14:33:32 VCS INFO V-16-1-50135 User root fired command: hagrp -online AppService  MILWB02S  from localhost
    .
    .
    .
    2015/05/29 14:35:11 VCS NOTICE V-16-1-10447 Group AppService is online on system MILWB02S
    
    # 2
    2015/05/29 15:49:30 VCS INFO V-16-1-50135 User root fired command: hagrp -online AppService  MILWB02S  from localhost
    .
    .
    .
    2015/05/29 15:50:58 VCS NOTICE V-16-1-10447 Group AppService is online on system MILWB02S
    
    # 3
    2015/06/01 08:39:34 VCS INFO V-16-1-50135 User root fired command: hagrp -switch AppService  MILWB02S  MILWB02SCluster  from localhost
    2015/06/01 08:39:34 VCS INFO V-16-1-50803 Received request to switch group AppService from remote system MILWB03S to local system MILWB02S
    .
    .
    .
    2015/06/01 08:42:00 VCS NOTICE V-16-1-10447 Group AppService is online on system MILWB02S
    
    # 4
    2015/06/03 11:09:32 VCS INFO V-16-1-50135 User root fired command: hagrp -flush AppService  MILWB02S  0  from localhost
    .
    .
    .
    2015/06/03 11:10:58 VCS NOTICE V-16-1-10447 Group AppService is online on system MILWB02S
    
    # 5
    2015/06/03 16:38:36 VCS INFO V-16-1-50803 Received request to switch group AppService from remote system MILWB03S to local system MILWB02S
    .
    .
    .
    2015/06/03 16:45:10 VCS NOTICE V-16-1-10447 Group AppService is online on system MILWB02S
    
    # 6
    2015/06/04 10:44:50 VCS INFO V-16-1-50135 User root fired command: hagrp -online AppService  MILWB02S  from localhost.
    .
    .
    .
    2015/06/04 10:46:24 VCS NOTICE V-16-1-10447 Group AppService is online on system MILWB02S
    
    # 7
    2015/06/09 09:38:40 VCS INFO V-16-1-50803 Received request to switch group AppService from remote system MILWB03S to local system MILWB02S
    .
    .
    .
    2015/06/09 09:41:07 VCS NOTICE V-16-1-10447 Group AppService is online on system MILWB02S
    
    # 8
    2015/06/12 08:11:00 VCS INFO V-16-1-50803 Received request to switch group AppService from remote system MILWB03S to local system MILWB02S
    .
    .
    .
    2015/06/12 08:13:25 VCS NOTICE V-16-1-10447 Group AppService is online on system MILWB02S
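
    If you want to repeat this audit yourself, the user-fired commands and the subsequent online events can be pulled straight from the engine log. A minimal sketch, assuming the default log location /var/VRTSvcs/log/engine_A.log:

    # Every command a user fired against VCS (message ID V-16-1-50135):
    grep "V-16-1-50135" /var/VRTSvcs/log/engine_A.log

    # Every time AppService came online on a system (message ID V-16-1-10447):
    grep "V-16-1-10447" /var/VRTSvcs/log/engine_A.log | grep AppService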

     

    Hopefully, we have addressed all your queries. Please let us know if any further assistance is needed.

    Thanks & Regards,
    Sunil Y

13 Replies

  • Yes, LLT/GAB run in the kernel, so a 'halt' is seen by VCS and the other cluster shows the node as EXITED. Use "uadmin 2 1" - this is very quick and usually is not seen by VCS, so the other cluster will show it as FAULTED. Alternatively, you can power down the system from the ALOM.
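
    For the record, the two test variants would look like this (a sketch based on the behaviour described above):

    # Graceful stop: LLT/GAB announce the shutdown, the remote cluster sees EXITED
    halt

    # Abrupt stop: very quick, usually not seen by VCS, the remote cluster sees FAULTED
    uadmin 2 1

    # Alternatively, power the system off from the ALOM service processor.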

     

    Mike

  • Hi Mike.

    I'm on vacation at the moment. As soon as possible, I'll test the primary-node failure with the 'uadmin' command and give you feedback.

    thanks in advance.

    Tiziano

  • Hi Mike.

    After my vacation ... I used your hint to simulate a node fault and saw a FAULTED state instead of an EXITED state. Obvious to you, of course. I'm sorry for my misunderstanding.

    Thank you very much.

    Tiziano