
Failover to secondary system upon ungraceful shutdown

mkruer
Level 4

I have configured a cluster with monitored services and it works as intended, with one exception: if I do a hard/ungraceful power-down, the cluster does not fail over to the working system and start the services automatically. Below is the script that I use to create the cluster. I am hoping I just missed a setting.

hastatus displays the following as its last message:

App_Cluster     SYSTEM_1   *FAULTED* OFFLINE

If I bring the downed system back up, then the cluster decides to fail over to the other box.

This is the script that I use to configure the cluster:

haconf -makerw
haclus -modify  PrintMsg 0
haclus -modify  UserNames  admin xxxxxxxxxxxxx
haclus -modify  ClusterAddress "127.0.0.1"
haclus -modify  Administrators  admin

haclus -modify  SourceFile "./main.cf"
haclus -modify  ClusterName CMSApp

 

hasys -add SYSTEM_1
hasys -modify SYSTEM_1 SourceFile "./main.cf"
hasys -add SYSTEM_2
hasys -modify SYSTEM_2 SourceFile "./main.cf"
hagrp -add APP_CLUSTER
hagrp -modify APP_CLUSTER SystemList  SYSTEM_1 0 SYSTEM_2 1

hagrp -modify APP_CLUSTER VCSi3Info  "" "" -sys SYSTEM_1
hagrp -modify APP_CLUSTER AutoStartList  SYSTEM_1 SYSTEM_2

 

hares -add Virtual_IP IP APP_CLUSTER
hares -local Virtual_IP Device
hares -modify Virtual_IP Device eth0 -sys SYSTEM_1
hares -modify Virtual_IP Device eth0 -sys SYSTEM_2
hares -modify Virtual_IP Address "192.168.0.3"
hares -modify Virtual_IP NetMask "255.255.255.0"
hares -modify Virtual_IP PrefixLen 1000
hares -modify Virtual_IP Enabled 1
hares -add Network_Card NIC APP_CLUSTER
hares -local Network_Card Device
hares -modify Network_Card Device eth0 -sys SYSTEM_1
hares -modify Network_Card Device eth0 -sys SYSTEM_2
hares -modify Network_Card PingOptimize 1
hares -modify Network_Card Mii 1
hares -modify Network_Card Enabled 1

thx

 


8 Replies

mikebounds
Level 6
Partner Accredited

One possible problem is that the cluster nodes are the only devices on the 192.168.0 network.  By default the NIC agent pings the broadcast address and looks for responses from nodes other than itself.  So suppose the IP on sys1 is 192.168.0.1 and on sys2 it is 192.168.0.2; then when sys2 tests the NIC, it only gets a response from sys1.  When sys1 dies, the NIC on sys2 can no longer ping any addresses (other than itself) and so won't allow the service group to come online until sys1 comes back and the NIC gets a response.

If this is the issue, then you need something else on the network, like a gateway, that responds to broadcast pings, or you can (preferably) put the IP of the gateway in the NetworkHosts attribute of the NIC agent (or you could just delete the NIC resource, if this is just a test, so that you only have the IP resource).
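
For example (the haconf lines are only needed if the configuration is currently read-only, and 192.168.0.254 is just a placeholder - use your actual gateway or another always-up host on that network):

haconf -makerw
hares -modify Network_Card NetworkHosts 192.168.0.254
haconf -dump -makero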

The best way to see what is happening is to look at the engine log, so please post an extract from /var/VRTSvcs/log/engine_A.log showing what appears in the log after sys1 faults.
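
Something like this should capture the relevant window (adjust the line count as needed):

tail -100 /var/VRTSvcs/log/engine_A.log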

Mike

mkruer
Level 4

Below is the event list of what happens after I forcibly power off the unit.

I am running this in an ESXi VM environment, with 2 NICs: one for the IP and the VIP, and the other NIC for the heartbeat. But I don't see how that would affect the failover; system 2 should see that system 1 is off and start up the services.

2012/07/12 21:33:49 VCS INFO V-16-1-10077 Received new cluster membership

2012/07/12 21:33:49 VCS NOTICE V-16-1-10112 System (SYSTEM_2) - Membership: 0x2, DDNA: 0x0

2012/07/12 21:33:49 VCS ERROR V-16-1-10079 System SYSTEM_1 (Node '0') is in Down State - Membership: 0x2

2012/07/12 21:33:49 VCS ERROR V-16-1-10322 System SYSTEM_1 (Node '0') changed state from RUNNING to FAULTED

2012/07/12 21:33:49 VCS NOTICE V-16-1-10086 System SYSTEM_2 (Node '1') is in Regular Membership - Membership: 0x2

2012/07/12 21:33:49 VCS NOTICE V-16-1-10449 Group APP_CLUSTER autodisabled on node SYSTEM_1 until it is probed

2012/07/12 21:33:49 VCS NOTICE V-16-1-10449 Group VCShmg autodisabled on node SYSTEM_1 until it is probed

2012/07/12 21:33:49 VCS NOTICE V-16-1-10446 Group APP_CLUSTER is offline on system SYSTEM_1

2012/07/12 21:33:54 VCS WARNING V-16-1-11141 LLT heartbeat link status changed. Previous status = eth1 UP; Current status = eth1 DOWN.

mikebounds
Level 6
Partner Accredited

"Membership: 0x2, DDNA: 0x0" means "Daemon Down Node Alive", so VCS thinks sys1 is still up but "had" daemon is down, which is why service groups are not failed over. 

Mike

joseph_dangelo
Level 6
Employee Accredited

Can you post your /etc/llttab file? Based on your comments, you indicated there is only a single NIC assigned for LLT.  That is not supported, for a variety of reasons, the most important of which is SG failover behavior during a "jeopardy" state.  When a node loses all but one LLT link, it enters into a jeopardy membership with its corresponding nodes. At that point, if the remaining heartbeat link were to also fail, VCS cannot differentiate that from a complete system crash.  As such, the SGs will become auto-disabled and will not fail over, as a means to prevent split-brain data corruption.

That said, DDNA will also auto-disable the service groups. 

During a DDNA condition, the Cluster understands that the faulting node is up, at least at the network layer, due to the continued operation of LLT/GAB heartbeats. The Cluster has no idea of any other information regarding the node health, as in what storage is mounted and which applications are still up, or any state of the service groups on that node. The Cluster will then place the affected node in the autodisabled state.  DDNA conditions run a high risk of turning into a “split-brain” scenario. This is a situation in which two, or more, Cluster nodes are not properly communicating, and can run the risk of data corruption as both nodes can potentially attempt to perform inconsistent, non-isolated I/O on the same files.

VCS does not have the proper information to make a failover decision due to the half-frozen node’s loss of HAD. If VCS simply failed over the applications to a free node, it has no way of knowing if the DDNA node still has those applications up, with storage mounted.
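
You can see these conditions from the command line, for example:

hastatus -sum     # the group state section shows an AutoDisabled column per system
gabconfig -a      # GAB port membership; a jeopardy membership is flagged here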

Joe D

mikebounds
Level 6
Partner Accredited

Note that the reason you may be getting DDNA may be the way you "forcibly power off the unit" - I have seen users trying to simulate this by powering down the system board, and sometimes this causes the server to go down relatively slowly, meaning the processes disappear first but the network stays up for a few seconds, which causes DDNA.  You can test this by setting a ping going when you "forcibly power off the unit" to check that the ping stops responding immediately after powering off the server.  You can also compare the timings in the system/console messages with the engine log, which may show you that the LLT links were still up when the engine log said the "had" daemon was down.
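
For example, set something like this going from the other node just before you power the server off (-D makes ping print a Unix timestamp for every reply; replace the placeholder with the address of the server you are powering off):

ping -D <address-of-powered-off-server> | tee /tmp/ping.log

# afterwards, compare the timestamp of the last reply with the engine log
grep "changed state from RUNNING to FAULTED" /var/VRTSvcs/log/engine_A.log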

Mike

mkruer
Level 4

@jdangelo_symc

 SYSTEM_1

eth0      Link encap:Ethernet  HWaddr 00:50:56:91:03:2D

          inet addr:10.116.49.207  Bcast:10.116.49.255  Mask:255.255.255.0

          inet6 addr: fe80::250:56ff:fe91:32d/64 Scope:Link

          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

          RX packets:799259 errors:0 dropped:0 overruns:0 frame:0

          TX packets:1028737 errors:0 dropped:0 overruns:0 carrier:0

          collisions:0 txqueuelen:1000

          RX bytes:1169698499 (1.0 GiB)  TX bytes:724777273 (691.2 MiB)

 

eth0:0    Link encap:Ethernet  HWaddr 00:50:56:91:03:2D

          inet addr:10.116.49.208  Bcast:0.0.0.0  Mask:255.255.255.0

          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

 

eth1      Link encap:Ethernet  HWaddr 00:50:56:91:03:2E

          inet6 addr: fe80::250:56ff:fe91:32e/64 Scope:Link

          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

          RX packets:220927 errors:0 dropped:0 overruns:0 frame:0

          TX packets:193003 errors:0 dropped:0 overruns:0 carrier:0

          collisions:0 txqueuelen:1000

          RX bytes:20325936 (19.3 MiB)  TX bytes:21080598 (20.1 MiB)

 

set-node cmsapp-49-207

set-cluster 1

link eth1 eth-00:50:56:91:03:2E - ether - -

 

---------------------------------------------------

 

SYSTEM_2

eth0      Link encap:Ethernet  HWaddr 00:50:56:91:03:2F

          inet addr:10.116.49.209  Bcast:10.116.49.255  Mask:255.255.255.0

          inet6 addr: fe80::250:56ff:fe91:32f/64 Scope:Link

          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

          RX packets:769222 errors:0 dropped:0 overruns:0 frame:0

          TX packets:991209 errors:0 dropped:0 overruns:0 carrier:0

          collisions:0 txqueuelen:1000

          RX bytes:1337447721 (1.2 GiB)  TX bytes:826631384 (788.3 MiB)

 

eth1      Link encap:Ethernet  HWaddr 00:50:56:91:03:30

          inet6 addr: fe80::250:56ff:fe91:330/64 Scope:Link

          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

          RX packets:258702 errors:0 dropped:0 overruns:0 frame:0

          TX packets:182403 errors:0 dropped:0 overruns:0 carrier:0

          collisions:0 txqueuelen:1000

          RX bytes:27910461 (26.6 MiB)  TX bytes:17055765 (16.2 MiB)

 

set-node cmsapp-49-209

set-cluster 1

link eth1 eth-00:50:56:91:03:30 - ether - -

 

@mikebounds

When I power off the VM via the vSphere client, I instantly lose the connection - no waiting. This is about as close as you can get to a power failure in an ESXi environment.

Also, in this environment I am using the same VLAN for both networks.

 

mikebounds
Level 6
Partner Accredited

Your issue is that you only have 1 heartbeat.  If you run gabconfig -a, you will see the output say jeopardy - this means there is only 1 heartbeat remaining, so from SYS2's point of view, if it loses the heartbeat with SYS1 it doesn't know whether a single network card failed (or some other single network failure occurred, like a switch dying) or the server died. As it can't tell the difference, VCS on SYS2 will not take the decision to fail the service groups over.
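
For example, on either node:

gabconfig -a          # a jeopardy membership is flagged in the port membership lines
lltstat -nvv | more   # verbose per-node LLT link status (only eth1 in your current config)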

What you need to do is set up eth0 as a low-priority (lowpri) heartbeat, so add the following line to llttab:

link-lowpri eth0 eth-00:50:56:91:03:30 - ether - -

but replacing the MAC address with the one for eth0 on each system.
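
For example, with the MAC addresses from your ifconfig output above, /etc/llttab on SYSTEM_1 would then look something like:

set-node cmsapp-49-207
set-cluster 1
link eth1 eth-00:50:56:91:03:2E - ether - -
link-lowpri eth0 eth-00:50:56:91:03:2D - ether - -

and on SYSTEM_2:

set-node cmsapp-49-209
set-cluster 1
link eth1 eth-00:50:56:91:03:30 - ether - -
link-lowpri eth0 eth-00:50:56:91:03:2F - ether - -

(LLT has to be restarted on each node, with VCS and GAB stopped first, for the llttab change to take effect.)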

eth0 and eth1 must be independent (not a dual-port card) and must be connected independently (via different switches) so that no single physical failure can cause both to go down at the same time. That way, if VCS on SYS2 sees both heartbeats go down at the same time, VCS knows the server went down and will fail the service groups over.

Mike

mkruer
Level 4

@mikebounds

Adding a second set of IPs and placing them in a different VLAN on the ESXi system worked. I still need to do some more testing.