Failover to secondary system upon ungraceful shutdown
I have configured a cluster with monitored services, and it works as intended with one exception. If I do a hard (ungraceful) power-down of the active system, the cluster does not fail over to the working system and start the services automatically. Below is the script that I use to create the cluster. I am hoping I just missed a setting.
hastatus displays the following as its last message:
App_Cluster SYSTEM_1 *FAULTED* OFFLINE
If I bring the downed system back up, the cluster then decides to fail over to the other box.
This is the script that I use to configure the cluster
haconf -makerw
haclus -modify PrintMsg 0
haclus -modify UserNames admin xxxxxxxxxxxxx
haclus -modify ClusterAddress "127.0.0.1"
haclus -modify Administrators admin
haclus -modify SourceFile "./main.cf"
haclus -modify ClusterName CMSApp
hasys -add SYSTEM_1
hasys -modify SYSTEM_1 SourceFile "./main.cf"
hasys -add SYSTEM_2
hasys -modify SYSTEM_2 SourceFile "./main.cf"
hagrp -add APP_CLUSTER
hagrp -modify APP_CLUSTER SystemList SYSTEM_1 0 SYSTEM_2 1
hagrp -modify APP_CLUSTER VCSi3Info "" "" -sys SYSTEM_1
hagrp -modify APP_CLUSTER AutoStartList SYSTEM_1 SYSTEM_2
hares -add Virtual_IP IP APP_CLUSTER
hares -local Virtual_IP Device
hares -modify Virtual_IP Device eth0 -sys SYSTEM_1
hares -modify Virtual_IP Device eth0 -sys SYSTEM_2
hares -modify Virtual_IP Address "192.168.0.3"
hares -modify Virtual_IP NetMask "255.255.255.0"
hares -modify Virtual_IP PrefixLen 1000
hares -modify Virtual_IP Enabled 1
hares -add Network_Card NIC APP_CLUSTER
hares -local Network_Card Device
hares -modify Network_Card Device eth0 -sys SYSTEM_1
hares -modify Network_Card Device eth0 -sys SYSTEM_2
hares -modify Network_Card PingOptimize 1
hares -modify Network_Card Mii 1
hares -modify Network_Card Enabled 1
thx
Your issue is that you only have 1 heartbeat. If you run gabconfig -a, you will see the output say jeopardy, which means there is only 1 heartbeat remaining. From SYS2's point of view, if it loses that heartbeat with SYS1, it doesn't know whether a single network component failed (a network card, or a switch that died) or whether the server itself died. As it can't tell the difference, VCS on SYS2 will not take the decision to fail the service groups over.
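As a rough way to confirm the jeopardy state described above, something like the following could be run on either node. This is only a sketch: in_jeopardy is a made-up helper name, and it assumes the word "jeopardy" appears in the gabconfig -a output when only one heartbeat link is left.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the GAB membership text on stdin
# contains a jeopardy line (case-insensitive match).
in_jeopardy() {
    grep -qi 'jeopardy'
}

# Only attempt the check where gabconfig is actually installed.
if command -v gabconfig >/dev/null 2>&1; then
    if gabconfig -a | in_jeopardy; then
        echo "Jeopardy reported: only one heartbeat link is up"
    else
        echo "No jeopardy reported"
    fi
fi
```

If this reports jeopardy while both systems are up, the cluster is running on a single heartbeat link.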
What you need to do is set up eth0 as a low-priority heartbeat link, so add the following line to llttab:
link-lowpri eth0 eth-00:50:56:91:03:30 - ether - -
but replace the MAC address with the one for eth0 on each system.
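To avoid typing the MAC address by hand, the line can be generated on each system. This is a minimal sketch, assuming a Linux host where the MAC address is readable from /sys/class/net/<interface>/address; lowpri_line and iface_mac are hypothetical helper names, and the output should be reviewed before it is added to llttab.

```shell
#!/bin/sh
# Hypothetical helper: build the link-lowpri line in the same layout
# as the example above. Args: interface name, MAC address.
lowpri_line() {
    printf 'link-lowpri %s eth-%s - ether - -\n' "$1" "$2"
}

# Hypothetical helper: read an interface's MAC address from sysfs
# (assumes Linux).
iface_mac() {
    cat "/sys/class/net/$1/address"
}

# Print the line for eth0 if that interface exists on this system.
# Nothing is written to llttab; copy the output in by hand.
if [ -r /sys/class/net/eth0/address ]; then
    lowpri_line eth0 "$(iface_mac eth0)"
fi
```

Run it once on each system, since each system's eth0 has a different MAC address.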
eth0 and eth1 must be independent (not two ports on a dual-port card) and must be connected independently (via different switches), so that no single physical failure can cause both to go down at the same time. Then, if VCS on SYS2 sees both heartbeats go down at the same time, it knows the server went down and will fail the service groups over.
Mike