Geo clusters cannot communicate with each other? (LOST_CONN)

prupp62

Hello,

This is my first time using this forum, so wish me (and you) luck!

We are having a problem with VCS. We have two single-node clusters configured as a Global/Geo cluster pair. For some reason, each cluster reports that it has lost connection with its partner. This is visible when running 'haclus -state' and/or 'hastatus -sum' on each cluster node (see details below).

However, the 'wac' process (which the clusters use to communicate with remote clusters) appears to be running in both clusters and listening on TCP port 14155. I can telnet to port 14155 on either server from each host, and running 'strace' on the 'wac' process shows that it does receive my telnet connection. In other words, TCP communication between the two servers seems fine.
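
The telnet check can also be scripted. The sketch below uses bash's /dev/tcp redirection; check_port is a hypothetical helper, not a VCS command, and it assumes bash and the coreutils 'timeout' utility are installed. A successful result only shows that something accepts TCP connections on the port, not that the two wac processes can complete their own exchange.

```shell
# Sketch: check whether HOST accepts TCP connections on PORT (e.g. the wac
# port 14155), as a scriptable stand-in for the telnet test.
# Assumes bash (for /dev/tcp) and the coreutils `timeout` command.

# check_port HOST PORT [SECONDS]
# Returns 0 if a TCP connection succeeds within SECONDS (default 5),
# non-zero on refusal, timeout or name-resolution failure.
check_port() {
    local host="$1" port="$2" secs="${3:-5}"
    # A child bash opens the socket on fd 3; the connection is closed as
    # soon as that shell exits, so no data is sent to the remote service.
    timeout "$secs" bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}
```

For example, from CO1LP0000UTL you might run 'check_port 192.0.2.20 14155 && echo reachable', where 192.0.2.20 is a placeholder for the remote cluster's ClusterAddress. Testing the cluster's virtual address rather than the node hostname is presumably closer to what wac itself connects to.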

I read somewhere that 'wac' can be started in secure or non-secure mode, and that if one cluster runs in one mode and the other cluster in the other, this could also cause the LOST_CONN state. I have no idea what these secure modes are or how to change them in the cluster configuration.

Any advice or help is greatly appreciated.

Sincerely,

Peter

# uname -a
Linux CO1LP0000UTL 2.6.32-131.0.15.el6.x86_64 #1 SMP Tue May 10 15:42:40 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
# haclus -state
#Cluster      Attribute              Value
co1lp0000vs1  ClusState              RUNNING
IL1LP0000VS1  ClusState              LOST_CONN

and on the other cluster:

# uname -a
Linux IL1LP0000UTL 2.6.32-131.0.15.el6.x86_64 #1 SMP Tue May 10 15:42:40 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
# haclus -state
#Cluster      Attribute              Value
IL1LP0000VS1  ClusState              RUNNING
co1lp0000vs1  ClusState              LOST_CONN

Summary status on CO1LP0000UTL:

# hastatus -sum

-- SYSTEM STATE
-- System               State                Frozen

A  CO1LP0000UTL         RUNNING              0

-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State

B  ClusterService  CO1LP0000UTL         Y          N               ONLINE
B  nbu_group       CO1LP0000UTL         Y          N               ONLINE

-- WAN HEARTBEAT STATE
-- Heartbeat       To                   State

M  Icmp            IL1LP0000VS1         ALIVE

-- REMOTE CLUSTER STATE
-- Cluster         State

N  IL1LP0000VS1    LOST_CONN

-- REMOTE SYSTEM STATE
-- cluster:system       State                Frozen

O  IL1LP0000VS1:IL1LP0000UTL RUNNING              0

-- REMOTE GROUP STATE
-- Group           cluster:system       Probed     AutoDisabled    State

P  nbu_group       IL1LP0000VS1:IL1LP0000UTL Y          N               OFFLINE

# uname -a
Linux IL1LP0000UTL 2.6.32-131.0.15.el6.x86_64 #1 SMP Tue May 10 15:42:40 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
# hastatus -sum

-- SYSTEM STATE
-- System               State                Frozen

A  IL1LP0000UTL         RUNNING              0

-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State

B  ClusterService  IL1LP0000UTL         Y          N               ONLINE
B  nbu_group       IL1LP0000UTL         Y          N               OFFLINE

-- WAN HEARTBEAT STATE
-- Heartbeat       To                   State

M  Icmp            co1lp0000vs1         ALIVE

-- REMOTE CLUSTER STATE
-- Cluster         State

N  co1lp0000vs1    LOST_CONN

-- REMOTE SYSTEM STATE
-- cluster:system       State                Frozen

O  co1lp0000vs1:CO1LP0000UTL RUNNING              0

-- REMOTE GROUP STATE
-- Group           cluster:system       Probed     AutoDisabled    State

P  nbu_group       co1lp0000vs1:CO1LP0000UTL Y          N               ONLINE
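
Both summaries show the same pattern: the Icmp WAN heartbeat to the remote cluster is ALIVE while the remote cluster state is LOST_CONN. That suggests basic IP reachability between the sites is fine and that it is the wac connection itself that is not being established. The awk filter below (flag_lost_conn is an illustrative, hypothetical name) picks that pattern out of 'hastatus -sum' output; it assumes the M/N line layout shown above and one heartbeat per remote cluster.

```shell
# Sketch: read `hastatus -sum` output on stdin and report each remote
# cluster whose state is not RUNNING, together with its WAN heartbeat
# state. Relies only on the "M" (WAN heartbeat) and "N" (remote cluster)
# lines in the layout shown above; if a cluster has several heartbeats,
# the last one listed wins.
flag_lost_conn() {
    awk '
        $1 == "M" { hb[$3] = $4 }   # M  <heartbeat> <to-cluster> <state>
        $1 == "N" { cs[$2] = $3 }   # N  <cluster> <state>
        END {
            for (c in cs)
                if (cs[c] != "RUNNING")
                    printf "%s: cluster %s, heartbeat %s\n", c, cs[c], ((c in hb) ? hb[c] : "none")
        }
    '
}
```

Run it as 'hastatus -sum | flag_lost_conn'. On the CO1LP0000UTL output above it would print "IL1LP0000VS1: cluster LOST_CONN, heartbeat ALIVE".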

Accepted Solutions

mikebounds

Was this working at some point and then lost connection, or has it never worked?

A few things to check out:

  1. Make sure the cluster IP address resolves to a virtual hostname.
  2. Make sure the cluster IPs in both clusters match up correctly. They are defined in four places in /etc/VRTSvcs/conf/config/main.cf:
    ClusterAddress in the cluster definition
    ClusterAddress in the remotecluster definition
    Arguments in the heartbeat definition
    Address in the IP resource in the ClusterService service group

Mike
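
To compare the four places listed above, a rough awk scan of main.cf can print each address. This is a sketch under assumptions, not a real main.cf parser: gco_addrs and gco_check are hypothetical names, and it assumes the usual layout (cluster, remotecluster, heartbeat and group definitions starting at column 0, per-cluster heartbeat Arguments written as Arguments @<cluster> = { "<ip>" }, and an IP resource inside the ClusterService group).

```shell
# Sketch: list the GCO address settings from a VCS main.cf and check that
# they agree. Rough line-based awk scan, not a real main.cf parser.

# gco_addrs FILE: print one line per address setting found:
#   cluster <name> ClusterAddress <ip>
#   remotecluster <name> ClusterAddress <ip>
#   heartbeat <type> <remote-cluster> <ip>
#   ClusterService <ip-resource> Address <ip>
gco_addrs() {
    awk '
        # First double-quoted string on the current line.
        function qval(   q) { split($0, q, "\""); return q[2] }

        # Track which top-level definition we are in.
        /^cluster[ \t]/       { blk = "cluster";       name = $2; next }
        /^remotecluster[ \t]/ { blk = "remotecluster"; name = $2; next }
        /^heartbeat[ \t]/     { blk = "heartbeat";     name = $2; next }
        /^group[ \t]/         { blk = "group"; name = $2; rtype = ""; next }
        /^[A-Za-z]/           { blk = "other"; next }   # system, include, ...

        # An indented "Type name (" line opens a resource inside a group.
        blk == "group" && /^[ \t]+[A-Za-z]+[ \t]+[^ \t]+[ \t]*\([ \t]*$/ {
            rtype = $1; res = $2; next
        }

        blk == "cluster" && $1 == "ClusterAddress" {
            print "cluster", name, "ClusterAddress", qval()
        }
        blk == "remotecluster" && $1 == "ClusterAddress" {
            print "remotecluster", name, "ClusterAddress", qval()
        }
        # Per-cluster value: Arguments @<cluster> = { "<ip>" }
        blk == "heartbeat" && $1 == "Arguments" {
            print "heartbeat", name, ($2 ~ /^@/ ? substr($2, 2) : "-"), qval()
        }
        blk == "group" && name == "ClusterService" && rtype == "IP" && $1 == "Address" {
            print "ClusterService", res, "Address", qval()
        }
    ' "$1"
}

# gco_check FILE: compare the settings within one main.cf. Prints OK and
# returns 0 if they agree, otherwise prints each mismatch and returns 1.
gco_check() {
    gco_addrs "$1" | awk '
        $1 == "cluster"        { laddr = $4 }   # this cluster
        $1 == "ClusterService" { vip = $4 }     # its virtual IP resource
        $1 == "remotecluster"  { rc[$2] = $4 }
        $1 == "heartbeat"      { hb[$3] = $4 }
        END {
            bad = 0
            if (laddr != vip) {
                print "MISMATCH: ClusterAddress " laddr " vs ClusterService IP " vip
                bad = 1
            }
            for (c in rc)
                if (rc[c] != hb[c]) {
                    print "MISMATCH: remotecluster " c " " rc[c] " vs heartbeat Arguments " hb[c]
                    bad = 1
                }
            if (!bad) print "OK"
            exit bad
        }
    '
}
```

Source the functions, then run 'gco_check /etc/VRTSvcs/conf/config/main.cf' on each node. It only compares settings within one file; the cross-cluster half still needs a manual comparison: the cluster ClusterAddress that 'gco_addrs' prints on one node should equal the remotecluster ClusterAddress and heartbeat address it prints on the other node.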
