Query about ICMP in GCO

Hi  Mates,

 

I want to understand how a cluster learns the status of service groups in the remote cluster.

I know it uses the ICMP heartbeat and the wac process.

However, I am confused by the output below of hastatus -summ from one site, taken while the secondary site was isolated due to a network/router issue.

 

hastatus -sum

-- SYSTEM STATE
-- System               State                Frozen

A  adm1area2             RUNNING              0
A  adm2area2             RUNNING              0

-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State

B  BkupLan         adm1area2             Y          N               ONLINE
B  BkupLan         adm2area2             Y          N               ONLINE
B  ClusterService  adm1area2             Y          N               OFFLINE|FAULTED
B  ClusterService  adm2area2             Y          N               OFFLINE|FAULTED
B  app2Mon          adm1area2             Y          N               ONLINE
B  app2Mon          adm2area2             Y          N               ONLINE
B  app1             adm1area2             Y          N               OFFLINE
B  app1             adm2area2             Y          N               OFFLINE
B  app1fs           adm1area2             Y          N               OFFLINE|FAULTED
B  app1fs           adm2area2             Y          N               OFFLINE|FAULTED
B  PrivLan         adm1area2             Y          N               ONLINE
B  PrivLan         adm2area2             Y          N               ONLINE
B  PubLan          adm1area2             Y          N               OFFLINE|FAULTED
B  PubLan          adm2area2             Y          N               OFFLINE|FAULTED
B  Site            adm1area2             Y          N               OFFLINE
B  Site            adm2area2             Y          N               OFFLINE
B  StorLan         adm1area2             Y          N               ONLINE
B  StorLan         adm2area2             Y          N               ONLINE
B  assp31         adm1area2             Y          N               OFFLINE|FAULTED
B  assp31         adm2area2             Y          N               OFFLINE|FAULTED

-- RESOURCES FAILED
-- Group           Type                 Resource             System

D  ClusterService  IPMultiNICB          wac_mip              adm1area2
D  ClusterService  IPMultiNICB          wac_mip              adm2area2
D  app1fs           Proxy                app1fs_p1             adm1area2
D  app1fs           Proxy                app1fs_p1             adm2area2
D  PubLan          MultiNICB            pub_mnic             adm1area2
D  PubLan          MultiNICB            pub_mnic             adm2area2
D  assp31         Proxy                syb1_p1              adm1area2
D  assp31         Proxy                syb1_p1              adm2area2

-- WAN HEARTBEAT STATE
-- Heartbeat       To                   State

M  Icmp            area1app1rc_cluster    DOWN

-- REMOTE CLUSTER STATE
-- Cluster         State

N  area1app1rc_cluster FAULTED

-- REMOTE SYSTEM STATE
-- cluster:system       State                Frozen

O  area1app1rc_cluster:adm1area1 FAULTED              0
O  area1app1rc_cluster:adm2area1 FAULTED              0

-- REMOTE GROUP STATE
-- Group           cluster:system       Probed     AutoDisabled    State

P  app1             area1app1rc_cluster:adm1area1 Y          N               OFFLINE
P  app1             area1app1rc_cluster:adm2area1 Y          N               OFFLINE
P  app1fs           area1app1rc_cluster:adm1area1 Y          N               ONLINE
P  app1fs           area1app1rc_cluster:adm2area1 Y          N               OFFLINE
P  Site            area1app1rc_cluster:adm1area1 Y          N               ONLINE
P  Site            area1app1rc_cluster:adm2area1 Y          N               OFFLINE
P  assp31         area1app1rc_cluster:adm1area1 Y          N               OFFLINE
P  assp31         area1app1rc_cluster:adm2area1 Y          N               ONLINE


 

When ICMP is showing DOWN, how come it still shows the status of the remote cluster and the remote service groups?

 

 

Regards

S

 


Re: Query about ICMP in GCO

Is it still showing the summary like this or has it changed to OFFLINE for the remote resources?

Re: Query about ICMP in GCO

Hi,

 

ICMP is alive and ok now.

But I need to understand: when ICMP shows DOWN here, it means the heartbeat between the two clusters should be down.

 

Then what should the status of the remote cluster and service groups be in hastatus -summ?

 

Re: Query about ICMP in GCO

Can anyone reply, please?

Re: Query about ICMP in GCO

 when ICMP shows DOWN here, it means the heartbeat between the two clusters should be down

- Correct.

then what should the status of the remote cluster and service groups be in hastatus -summ

- Unknown, or

- Exited if the remote cluster was gracefully shut down, or

- Faulted if the remote cluster went down due to a failure

Re: Query about ICMP in GCO

Correction:

These two lines:

- Exited if the remote cluster was gracefully shut down, or

- Faulted if the remote cluster went down due to a failure

 

should be corrected to:

- Exited if the remote cluster was gracefully shut down and the inter-cluster heartbeat was then lost

- Faulted if the remote cluster went down due to a failure and the inter-cluster heartbeat was then lost

Re: Query about ICMP in GCO

Well, if it shows remote resources online when the cluster is faulted, it's a bit weird. That's why I asked if it was still showing that.
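
To spot this combination without reading the whole summary by eye, here is a minimal Python sketch. It assumes the section layout seen in the hastatus -sum output posted above (a "-- TITLE" line in capitals, then a column header line, then data rows). The function names parse_hastatus and remote_online_while_hb_down are made up for illustration and are not part of VCS. It only reports remote groups shown ONLINE while a WAN heartbeat is DOWN; it does not explain why.

```python
# Trimmed excerpt of the hastatus -sum output posted earlier in this thread.
SAMPLE = """
-- WAN HEARTBEAT STATE
-- Heartbeat       To                   State

M  Icmp            area1app1rc_cluster    DOWN

-- REMOTE CLUSTER STATE
-- Cluster         State

N  area1app1rc_cluster FAULTED

-- REMOTE GROUP STATE
-- Group           cluster:system       Probed     AutoDisabled    State

P  app1             area1app1rc_cluster:adm1area1 Y          N               OFFLINE
P  app1fs           area1app1rc_cluster:adm1area1 Y          N               ONLINE
P  Site            area1app1rc_cluster:adm1area1 Y          N               ONLINE
P  assp31         area1app1rc_cluster:adm2area1 Y          N               ONLINE
"""


def parse_hastatus(text):
    """Group data rows under their '-- TITLE' section header.

    Assumption: section titles are all capitals, column header lines are not.
    """
    sections, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("--"):
            title = line[2:].strip()
            if title.isupper():          # section title, e.g. WAN HEARTBEAT STATE
                current = title
                sections[current] = []
            continue                     # column header lines are skipped
        if current is not None:
            sections[current].append(line.split())
    return sections


def remote_online_while_hb_down(text):
    """Return (group, cluster:system) pairs shown ONLINE while a WAN heartbeat is DOWN."""
    sections = parse_hastatus(text)
    hb_down = any(row[-1] == "DOWN" for row in sections.get("WAN HEARTBEAT STATE", []))
    if not hb_down:
        return []
    return [(row[1], row[2])
            for row in sections.get("REMOTE GROUP STATE", [])
            if row[-1] == "ONLINE"]


if __name__ == "__main__":
    for group, where in remote_online_while_hb_down(SAMPLE):
        print(f"{group} shown ONLINE on {where} while the heartbeat is DOWN")
```

To use it on a live node, save the output of hastatus -sum to a file and pass the file contents to remote_online_while_hb_down.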

Re: Query about ICMP in GCO

The state of a resource in the remote cluster is "handled" by the had daemon on the remote cluster and communicated to the local cluster via the inter-cluster heartbeat. Since the remote cluster is down, so are the resources in the remote cluster.

If the remote cluster is down, to check the state of the resources in the remote cluster, run the command below:

#hares -state -clus <remote_cluster>

If you suspect the ha* commands are not showing resource and service group states correctly, you can try the simple workaround below:

#hastop -all -force      <<< run this command on one node

#hastart            <<< run this command on each node

There are some known defects in some early VCS releases. Make sure your VCS is patched up to date.

Can you run the command below and post the output here?

hasys -display | grep -i vers

Re: Query about ICMP in GCO

Hello Frank,

 

I understand the situation now. Thanks for your help.

 

 

On one node I am getting the messages below when running hastatus -summ:

hastatus -sum
VCS ERROR V-16-1-10600 Cannot connect to VCS engine
VCS WARNING V-16-1-11046 Local system not available

However, on the other node this system shows as RUNNING, and the service groups show as online on both nodes.

 

So the situation is: hastatus -summ reports that the VCS engine is not running, but the had process is running and GAB port h shows membership 01 in gabconfig -a.



ps -ef | grep -i had
    root  5673     1   0   Oct 09 ?           0:00 /opt/VRTSvcs/bin/hashadow
    root  5660     1   0   Oct 09 ?         117:48 /opt/VRTSvcs/bin/had


On running hastart, the errors below are seen:

Nov 15 15:20:32 xxxxx syslog[29067]: [ID 702911 daemon.notice] VCS ERROR V-16-1-11103 VCS exited. It will restart
Nov 15 15:21:02 xxxxx Had[4808]: [ID 702911 daemon.notice] VCS NOTICE V-16-1-10619 'HAD' starting on: xxxxx
Nov 15 15:21:02 xxxxx Had[4808]: [ID 702911 daemon.notice] VCS NOTICE V-16-1-10620 Waiting for local cluster configuration status
Nov 15 15:21:02 xxxxx Had[4808]: [ID 702911 daemon.notice] VCS NOTICE V-16-1-10625 Local cluster configuration valid
Nov 15 15:21:02 xxxxx Had[4808]: [ID 702911 daemon.notice] VCS NOTICE V-16-1-11034 Registering for cluster membership
Nov 15 15:21:02 xxxxx genunix: [ID 159711 kern.notice] GAB ERROR V-15-1-20054 Port h registration failed, device busy
Nov 15 15:21:02 xxxxx Had[4808]: [ID 702911 daemon.notice] VCS ERROR V-16-1-11032 Registration failed. Exiting
Nov 15 15:21:02 xxxxx Had[4808]: [ID 702911 daemon.notice] VCS ERROR V-16-1-10116 GabHandle::open failed errno = 16
Nov 15 15:21:02 xxxxx Had[4808]: [ID 702911 daemon.notice] VCS ERROR V-16-1-11033 GAB open failed. Exiting
Nov 15 15:21:07 xxxxx syslog[29067]: [ID 702911 daemon.notice] VCS ERROR V-16-1-11103 VCS exited. It will restart
Nov 15 15:21:47 xxxxx Had[6870]: [ID 702911 daemon.notice] VCS NOTICE V-16-1-10619 'HAD' starting on: xxxxx
Nov 15 15:21:47 xxxxx Had[6870]: [ID 702911 daemon.notice] VCS NOTICE V-16-1-10620 Waiting for local cluster configuration status
Nov 15 15:21:47 xxxxx Had[6870]: [ID 702911 daemon.notice] VCS NOTICE V-16-1-10625 Local cluster configuration valid
Nov 15 15:21:47 xxxxx Had[6870]: [ID 702911 daemon.notice] VCS NOTICE V-16-1-11034 Registering for cluster membership



Is this because port h has already formed membership in gabconfig?


 

Regards

S.

Re: Query about ICMP in GCO

had is monitored by another daemon called hashadow. When had goes down abnormally, hashadow restarts it. had records this error (V-16-1-11103) if there is an issue when data is transferred between itself and GAB. Since the issue is within the daemon, a system restart is needed.
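
A side note on the log you posted: the V-16-1-10116 line ends with errno = 16. On Solaris and Linux, errno 16 is EBUSY, which lines up with the GAB "Port h registration failed, device busy" message just before it. Below is a small Python sketch that pulls such errno values out of log lines and names them. It assumes the machine running the script uses the same errno numbering as the cluster node, and the function name decode_errno_lines is made up for illustration.

```python
import errno
import re

# Two lines from the log posted above (hostname already masked as xxxxx).
SAMPLE_LOG = """\
Nov 15 15:21:02 xxxxx genunix: [ID 159711 kern.notice] GAB ERROR V-15-1-20054 Port h registration failed, device busy
Nov 15 15:21:02 xxxxx Had[4808]: [ID 702911 daemon.notice] VCS ERROR V-16-1-10116 GabHandle::open failed errno = 16
"""

# Message ID such as V-16-1-10116, followed later on the line by "errno = N".
ERRNO_RE = re.compile(r"(V-\d+-\d+-\d+)\s+.*?errno\s*=\s*(\d+)")


def decode_errno_lines(log_text):
    """Return (message_id, errno_number, errno_name) for each line carrying an errno.

    Assumption: the local errno table matches the one on the cluster node.
    """
    found = []
    for line in log_text.splitlines():
        match = ERRNO_RE.search(line)
        if match:
            number = int(match.group(2))
            found.append((match.group(1), number,
                          errno.errorcode.get(number, "UNKNOWN")))
    return found


if __name__ == "__main__":
    for msg_id, number, name in decode_errno_lines(SAMPLE_LOG):
        print(f"{msg_id}: errno {number} ({name})")
```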

To fix a GAB/HAD communication issue, you can also stop had, close all GAB ports, and restart GAB and then had. I am not sure how familiar you are with VCS, so my advice is to get a maintenance window to restart the systems (both nodes; do a cluster reboot, meaning bring down all the nodes and then start them up).
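
Before and after any such restart it helps to record which GAB ports have membership, since you mentioned port h showing 01. Here is a rough Python sketch that reads gabconfig -a style output. The sample text is illustrative only (the gen values are invented, not taken from your system), and the function name gab_ports is hypothetical.

```python
import re

# Illustrative sample in the usual gabconfig -a layout; gen values are made up.
SAMPLE_GAB = """\
GAB Port Memberships
===============================================================
Port a gen   a36e0003 membership 01
Port h gen   fd570002 membership 01
"""

# Matches lines such as: Port h gen fd570002 membership 01
PORT_RE = re.compile(r"^Port\s+(\w)\s+gen\s+(\w+)\s+membership\s+(\S+)")


def gab_ports(text):
    """Map each GAB port letter to its membership string (e.g. 'h' -> '01')."""
    ports = {}
    for line in text.splitlines():
        match = PORT_RE.match(line.strip())
        if match:
            ports[match.group(1)] = match.group(3)
    return ports


if __name__ == "__main__":
    ports = gab_ports(SAMPLE_GAB)
    print("port h membership:", ports.get("h", "not registered"))
```

If port h still shows membership for a node whose hastatus cannot connect to the engine, that matches the situation described in your post.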

Also keep a close eye on the cluster load. If the load on one system is always very high, move some load to the other node. If all the nodes in the cluster are under high load on a regular basis, a hardware upgrade is needed.