
My service group doesn't failover properly, what am I doing wrong?

Xendrill
Not applicable

Hi,

My setup is two RHEL5 servers running VCS 5.1.

eth0 and eth1 are bonded into bond0.

eth2 and eth3 serve as the LLT hi-pri links.

bond0 is designated as both the low-pri link and the public link.

I have a service group whose dependencies are bond0 -> floating IP -> NFS mounts. It currently runs on server1, and server2 is the failover node.

The servers are Blades.

When I go into the Enclosure management and drop both the Virtual Connects for eth0,eth1 (thereby severing server1's bond0 completely), I expect it to failover.

hastatus -summ shows it is indeed faulted, but the failover doesn't occur until the Virtual Connects are brought back up.

What have I configured wrong? There's no split brain and the servers are still communicating. I see no reason why the failover would hang like this.

Thank you very much!

3 REPLIES

TonyGriffiths
Level 6
Employee Accredited Certified

Hi,

Does the Virtual Connect disconnection event also affect the bond interface on server2 ?

Are you able to provide an extract of the VCS engine log for the time of the failure ?

joseph_dangelo
Level 6
Employee Accredited

If I am understanding you correctly, you are disabling only the low-priority link for the cluster. Are your NIC and IP resources set to critical? Also, as Tony mentioned above, please make sure that severing bond0 does not inadvertently cause the same disconnection on server2. I am pretty familiar with the CSeries OA and have in the past been forced to use passthru Ethernet to ensure that proper failure detection occurs.
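One way to check the Critical setting is sketched below. It assumes the VCS binaries are on the PATH and uses `hares -value` to read a single attribute. The function name `report_critical` and the resource names in the usage note are hypothetical, so substitute the names from your own main.cf.

```shell
#!/bin/sh
# Print the Critical attribute (1 = critical, 0 = not critical)
# for every VCS resource name passed as an argument.
report_critical() {
    for res in "$@"; do
        # hares -value prints the value of one attribute of a resource
        value=$(hares -value "$res" Critical)
        echo "$res Critical=$value"
    done
}
```

For example, `report_critical nic_bond0 ip_float` would print one line per resource. A resource reporting 0 can fault without causing the group to fail over.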

Please test failover by opening the OA remote management console on server1 and bringing bond0 down from there. This should tell you whether the problem is with the physical connection or with the resource configuration in the service group.
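A rough sketch of that test, assuming it is run as root from the remote console (not over an SSH session that rides on bond0) and that `ip` and `hastatus` are on the PATH. The function name `down_and_check` is made up for this example.

```shell
#!/bin/sh
# Take the named interface down locally, then print the cluster summary
# so the resulting resource and group states can be inspected.
down_and_check() {
    iface="$1"
    # Administratively down the interface on this node only
    ip link set dev "$iface" down
    # Show system, group and resource states as the cluster sees them
    hastatus -summ
}
```

Call it as `down_and_check bond0` on server1 and rerun `hastatus -summ` over the next few monitor cycles. If the group fails over this way but not when the Virtual Connects are dropped, the enclosure-level disconnect is probably affecting more than server1.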

Hope this helps,

Joe D

Tmy_70
Level 5
Partner Accredited Certified

Can you post /etc/hosts and the messages log from the servers?