IPMultiNICB with Solaris probe-based IPMP for CFS 5.1

Xentar
Level 4
The environment is as follows: CFS 5.1 on Solaris 10 u8 with the latest patches. We configured MultiNICB with Solaris probe-based IPMP and placed a VIP on it using an IPMultiNICB resource. When we pull both heartbeats of a system, that node panics and reboots. We expected the VIP to fail over to the surviving node, but instead it logs "v-16-10001-5013 IPMultiNICB:vip1:online: This IP address is configured elsewhere. Will not online" and the resource cannot be brought online. Is there any resolution for this?
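For reference, a quick way to check where the OS and VCS think the address currently is (resource and address names as in the configuration pasted further down, so adjust for your own setup):

# Is the VIP already plumbed on either node?
ifconfig -a | grep 192.168.204.135

# Resource state as VCS sees it on both systems
hares -state vip1 -sys node1
hares -state vip1 -sys node2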
3 REPLIES

Gaurav_S
Moderator
   VIP    Certified

Can you paste the main.cf and the output of ifconfig -a?

Gaurav

Xentar
Level 4

Hi Gaurav,

Please find the main.cf and the ifconfig -a output below:

include "OracleASMTypes.cf"
include "types.cf"
include "CFSTypes.cf"
include "CVMTypes.cf"
include "Db2udbTypes.cf"
include "OracleTypes.cf"
include "SybaseTypes.cf"

cluster web (
 UserNames = { admin = dlmElgLimHmmKumGlj }
 Administrators = { admin }
 UseFence = SCSI3
 HacliUserLevel = COMMANDROOT
 )
system node1 (
 )
system node2 (
 )
group cfs (
 SystemList = { node1 = 0, node2 = 1 }
 Parallel = 1
 AutoStartList = { node1, node2 }
 )
 CFSMount CFSMount_bf (
  MountPoint = "/bf"
  BlockDevice = "/dev/vx/dsk/datadg/vol01"
  MountOpt = largefiles
  )
 CVMVolDg CVMvolDg01 (
  CVMDiskGroup = datadg
  CVMVolume = { vol01 }
  CVMActivation = sw
  )
 requires group cvm online local firm
 CFSMount_bf requires CVMvolDg01

 // resource dependency tree
 //
 // group cfs
 // {
 // CFSMount CFSMount_bf
 //     {
 //     CVMVolDg CVMvolDg01
 //     }
 // }

group cvm (
 SystemList = { node1 = 0, node2 = 1 }
 AutoFailOver = 0
 Parallel = 1
 AutoStartList = { node1, node2 }
 )
 CFSfsckd vxfsckd (
  )
 CVMCluster cvm_clus (
  CVMClustName = web
  CVMNodeId = { node1 = 0, node2 = 1 }
  CVMTransport = gab
  CVMTimeout = 200
  )
 CVMVxconfigd cvm_vxconfigd (
  Critical = 0
  CVMVxconfigdArgs = { syslog }
  )
 cvm_clus requires cvm_vxconfigd
 vxfsckd requires cvm_clus

 // resource dependency tree
 //
 // group cvm
 // {
 // CFSfsckd vxfsckd
 //     {
 //     CVMCluster cvm_clus
 //         {
 //         CVMVxconfigd cvm_vxconfigd
 //         }
 //     }
 // }

group vcs_vip1 (
 SystemList = { node1 = 0, node2 = 1 }
 AutoStartList = { node1, node2 }
 )
 IPMultiNICB vip1 (
  BaseResName = MultiNICB_204
  Address = "192.168.204.135"
  NetMask = "255.255.255.0"
  )
 MultiNICB MultiNICB_204 (
  UseMpathd = 1
  MpathdCommand = "/usr/lib/inet/in.mpathd -a"
  ConfigCheck = 0
  Device = { nxge0 = "", e1000g0 = "" }
  DefaultRouter = "192.168.204.10"
  GroupName = ipmp0
  )
 requires group cfs online local firm
 vip1 requires MultiNICB_204

 // resource dependency tree
 //
 // group vcs_vip1
 // {
 // IPMultiNICB vip1
 //     {
 //     MultiNICB MultiNICB_204
 //     }
 // }
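Incidentally, the same dependency picture can be pulled from the running cluster with the standard VCS CLI, using the group and resource names above:

# group and resource dependencies as configured
hagrp -dep vcs_vip1
hares -dep vip1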


ifconfig -a for node1:
================
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
 inet 127.0.0.1 netmask ff000000
e1000g0: flags=69040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER,STANDBY,INACTIVE> mtu 1500 index 2
 inet 192.168.204.147 netmask ffffff00 broadcast 192.168.204.255
 groupname ipmp0
 ether 0:c0:dd:14:1:14
nxge0: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 5
 inet 192.168.204.69 netmask ffffff00 broadcast 192.168.204.255
 groupname ipmp0
 ether 0:21:28:84:ba:50
nxge0:1: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 5
 inet 192.168.204.146 netmask ffffff00 broadcast 192.168.204.255
nxge0:2: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 5
 inet 192.168.204.135 netmask ffffff00 broadcast 192.168.204.255

ifconfig -a for node2:
================
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
 inet 127.0.0.1 netmask ff000000
e1000g0: flags=69040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER,STANDBY,INACTIVE> mtu 1500 index 2
 inet 192.168.204.149 netmask ffffff00 broadcast 192.168.204.255
 groupname ipmp0
 ether 0:c0:dd:14:6:f0
nxge0: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 5
 inet 192.168.204.70 netmask ffffff00 broadcast 192.168.204.255
 groupname ipmp0
 ether 0:21:28:84:ba:b0
nxge0:1: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 5
 inet 192.168.204.148 netmask ffffff00 broadcast 192.168.204.255
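In the output above, the VIP 192.168.204.135 is plumbed on node1 as nxge0:2. When the agent reports "This IP address is configured elsewhere", one rough manual check of whether the address is still answering on the network (plain Solaris commands, not the agent's exact probe logic) is:

# run from the node that refuses to online the VIP
ping 192.168.204.135
arp -a | grep 192.168.204.135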

Marianne (Accepted Solution)
Level 6
Partner    VIP    Accredited Certified

Pulling out both heartbeats simultaneously is a sure way to force a concurrency violation, not a failover.

Please read this section in the VCS User's Guide (and, ideally, the whole of Chapter 10, "About communications, membership, and data protection in the cluster"):

About cluster membership and data protection without I/O fencing
Proper seeding of the cluster and the use of low priority heartbeat cluster interconnect links are best practices with or without the use of I/O fencing. Best practice also recommends multiple cluster interconnect links between systems in the cluster. This allows GAB to differentiate between:
■ A loss of all heartbeat links simultaneously, which is interpreted as a system failure. In this case, depending on failover configuration, HAD may attempt to restart the services that were running on that system on another system.
■ A loss of all heartbeat links over time, which is interpreted as an interconnect failure. In this case, the assumption is made that there is a high probability that the system is not down, and HAD does not attempt to restart the services on another system.

In order for this differentiation to have meaning, it is important to ensure the cluster interconnect links do not have a single point of failure, such as a network hub or ethernet card.
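As an illustration of the "multiple interconnect links plus a low priority link" recommendation, an /etc/llttab along these lines is typical (the private interface names and cluster ID below are placeholders, not taken from this cluster):

set-node node1
set-cluster 100
# two dedicated private heartbeat links
link nxge1 /dev/nxge:1 - ether - -
link e1000g1 /dev/e1000g:1 - ether - -
# low priority heartbeat over the public LAN
link-lowpri e1000g0 /dev/e1000g:0 - ether - -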

The system panic is a way to prevent a concurrency violation.

If you want to test a system failure, pull out the power cord on one node.
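Once the node is down, one way to watch the failover from the surviving node (standard VCS commands, group and resource names as in the main.cf above):

hastatus -sum
hagrp -state vcs_vip1
hares -state vip1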