Forum Discussion

Shivam_HCL
Level 3
12 years ago

Service group does not fail over to another node on forced power-down.

VCS 6.0.1

Hi, I have configured a two-node cluster with local storage, running two service groups. Both run fine and I can switch them over to either node in the cluster. But when I forcibly power down the node where both service groups are active, only one service group fails over to the other node; the one containing the Apache resource faults and does not fail over.

The contents of the main.cf file are pasted below.

==========================================

 

 

cat /etc/VRTSvcs/conf/config/main.cf
include "OracleASMTypes.cf"
include "types.cf"
include "Db2udbTypes.cf"
include "OracleTypes.cf"
include "SybaseTypes.cf"
 
cluster mycluster (
        UserNames = { admin = IJKcJEjGKfKKiSKeJH, root = ejkEjiIhjKjeJh }
        ClusterAddress = "192.168.25.101"
        Administrators = { admin, root }
        )
 
system server3 (
        )
 
system server4 (
        )
 
group ClusterService (
        SystemList = { server3 = 0, server4 = 1 }
        AutoStartList = { server3, server4 }
        OnlineRetryLimit = 3
        OnlineRetryInterval = 120
        )
 
        IP webip (
                Device = eth0
                Address = "192.168.25.101"
                NetMask = "255.255.255.0"
                )
 
        NIC csgnic (
                Device = eth0
                )
 
        webip requires csgnic
 
 
        // resource dependency tree
        //
        //      group ClusterService
        //      {
        //      IP webip
        //          {
        //          NIC csgnic
        //          }
        //      }
 
 
group httpsg (
        SystemList = { server3 = 0, server4 = 1 }
        AutoStartList = { server3, server4 }
        OnlineRetryLimit = 3
        OnlineRetryInterval = 15
        )
 
        Apache apachenew (
                httpdDir = "/usr/sbin"
                ConfigFile = "/etc/httpd/conf/httpd.conf"
                )
 
        IP ipresource (
                Device = eth0
                Address = "192.168.25.102"
                NetMask = "255.255.255.0"
                )
 
        apachenew requires ipresource
 
 
        // resource dependency tree
        //
        //      group httpsg
        //      {
        //      Apache apachenew
        //          {
        //          IP ipresource
        //          }
        //      }
#
=====================
 

The engine log at the time of the power-down shows:

 

 

2013/03/12 16:33:02 VCS INFO V-16-1-10077 Received new cluster membership
2013/03/12 16:33:02 VCS NOTICE V-16-1-10112 System (server3) - Membership: 0x1, DDNA: 0x0
2013/03/12 16:33:02 VCS ERROR V-16-1-10079 System server4 (Node '1') is in Down State - Membership: 0x1
2013/03/12 16:33:02 VCS ERROR V-16-1-10322 System server4 (Node '1') changed state from RUNNING to FAULTED
2013/03/12 16:33:02 VCS NOTICE V-16-1-10449 Group httpsg autodisabled on node server4 until it is probed
2013/03/12 16:33:02 VCS NOTICE V-16-1-10449 Group VCShmg autodisabled on node server4 until it is probed
2013/03/12 16:33:02 VCS NOTICE V-16-1-10446 Group ClusterService is offline on system server4
2013/03/12 16:33:02 VCS NOTICE V-16-1-10446 Group httpsg is offline on system server4
2013/03/12 16:33:02 VCS ERROR V-16-1-10205 Group ClusterService is faulted on system server4
2013/03/12 16:33:02 VCS NOTICE V-16-1-10446 Group ClusterService is offline on system server4
2013/03/12 16:33:02 VCS INFO V-16-1-10493 Evaluating server3 as potential target node for group ClusterService
2013/03/12 16:33:02 VCS INFO V-16-1-10493 Evaluating server4 as potential target node for group ClusterService
2013/03/12 16:33:02 VCS INFO V-16-1-10494 System server4 not in RUNNING state
2013/03/12 16:33:02 VCS NOTICE V-16-1-10301 Initiating Online of Resource webip (Owner: Unspecified, Group: ClusterService) on System server3
2013/03/12 16:33:02 VCS WARNING V-16-1-11141 LLT heartbeat link status changed. Previous status =eth1, UP; Current status =eth1, DOWN.
2013/03/12 16:33:02 VCS INFO V-16-6-15015 (server3) hatrigger:/opt/VRTSvcs/bin/triggers/sysoffline is not a trigger scripts directory or can not be executed
2013/03/12 16:33:14 VCS INFO V-16-1-10298 Resource webip (Owner: Unspecified, Group: ClusterService) is online on server3 (VCS initiated)
2013/03/12 16:33:14 VCS NOTICE V-16-1-10447 Group ClusterService is online on system server3
 
As the logs above show, the default SG ClusterService failed over to the other node, but SG httpsg did not.
 
Please advise.
 
Thanks....
 
 
  • Just missed your email. I can now see that you have defined only one heartbeat link, which means VCS cannot distinguish between a failure of eth1 and a failure of the whole system, so it will not fail over any service groups (apart from ClusterService). You need at least 2 heartbeat links. In a live cluster these must be independent (i.e. not two ports on a dual-port card), although a dual-port card is OK for testing.

    Mike
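
To illustrate the suggestion above, here is a rough sketch of what /etc/llttab on server3 might look like with two heartbeat links. Only eth1 appears in the posted log; the second interface name (eth2) and the cluster ID are assumptions made for illustration, so substitute the values from your own setup. server4 would need a matching file with its own set-node line.

```text
# /etc/llttab on server3 (sketch only; eth2 and the cluster ID are assumed)
set-node server3
set-cluster 1
# existing heartbeat link seen in the engine log
link eth1 eth1 - ether - -
# second, independent heartbeat link (hypothetical interface name)
link eth2 eth2 - ether - -
```

After restarting LLT on both nodes, `lltstat -nvv` should list both links as UP for each node. With two links in place, losing every heartbeat at once is treated as a node failure rather than a possible link failure.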

16 Replies