Forum Discussion

Xentar
Level 4
14 years ago

IPMultiNICB with Solaris IPMP probe-based IPs on CFS 5.1

The environment is as follows: CFS 5.1 on Solaris 10 u8 with the latest patches. We configured MultiNICB on top of Solaris IPMP with probe-based failure detection, and placed a VIP on it with an IPMultiNICB resource. When we pull both heartbeats of a system, one node panics and reboots. We expected the VIP to fail over to the surviving node; instead, it logs "V-16-10001-5013 IPMultiNICB:vip1:online: This IP address is configured elsewhere. Will not online" and cannot come online. Is there any resolution for this?

  • Pulling out both heartbeats simultaneously is a sure way of attempting to force a concurrency violation, not a failover.

    Please read this section in the VCS User's Guide (ideally all of chapter 10, "About communications, membership, and data protection in the cluster"):

    About cluster membership and data protection without I/O fencing
    Proper seeding of the cluster and the use of low-priority heartbeat cluster interconnect links are best practices with or without the use of I/O fencing. Best practice also recommends multiple cluster interconnect links between systems in the cluster. This allows GAB to differentiate between:
    ■ A loss of all heartbeat links simultaneously, which is interpreted as a system failure. In this case, depending on failover configuration, HAD may attempt to restart the services that were running on that system on another system.
    ■ A loss of all heartbeat links over time, which is interpreted as an interconnect failure. In this case, the assumption is made that there is a high probability that the system is not down, and HAD does not attempt to restart the services on another system.

    In order for this differentiation to have meaning, it is important to ensure the cluster interconnect links do not have a single point of failure, such as a network hub or ethernet card.

    The system panic is a way to prevent a concurrency violation.

    If you want to test a system failure, pull the power cord on one node instead.
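    A less destructive way to exercise the failover path is a controlled switchover. This is a sketch, assuming the group and node names from the main.cf posted in this thread (vcs_vip1, vip1, node1, node2); run it on a live cluster only after verifying the names match your configuration:

    ```
    # Check GAB membership and overall group state first
    gabconfig -a
    hastatus -sum

    # Controlled switchover of the VIP group to the other node
    hagrp -switch vcs_vip1 -to node2

    # If the IPMultiNICB resource is left faulted after a failed online
    # attempt, clear the fault before retrying
    hares -clear vip1 -sys node2
    hagrp -online vcs_vip1 -sys node2
    ```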

3 Replies

  • Hi Gaurav,

    Please find the following main.cf and ifconfig -a:

    include "OracleASMTypes.cf"
    include "types.cf"
    include "CFSTypes.cf"
    include "CVMTypes.cf"
    include "Db2udbTypes.cf"
    include "OracleTypes.cf"
    include "SybaseTypes.cf"

    cluster web (
     UserNames = { admin = dlmElgLimHmmKumGlj }
     Administrators = { admin }
     UseFence = SCSI3
     HacliUserLevel = COMMANDROOT
     )
    system node1 (
     )
    system node2 (
     )
    group cfs (
     SystemList = { node1 = 0, node2 = 1 }
     Parallel = 1
     AutoStartList = { node1, node2 }
     )
     CFSMount CFSMount_bf (
      MountPoint = "/bf"
      BlockDevice = "/dev/vx/dsk/datadg/vol01"
      MountOpt = largefiles
      )
     CVMVolDg CVMvolDg01 (
      CVMDiskGroup = datadg
      CVMVolume = { vol01 }
      CVMActivation = sw
      )
     requires group cvm online local firm
     CFSMount_bf requires CVMvolDg01

     // resource dependency tree
     //
     // group cfs
     // {
     // CFSMount CFSMount_bf
     //     {
     //     CVMVolDg CVMvolDg01
     //     }
     // }

    group cvm (
     SystemList = { node1 = 0, node2 = 1 }
     AutoFailOver = 0
     Parallel = 1
     AutoStartList = { node1, node2 }
     )
     CFSfsckd vxfsckd (
      )
     CVMCluster cvm_clus (
      CVMClustName = web
      CVMNodeId = { node1 = 0, node2 = 1 }
      CVMTransport = gab
      CVMTimeout = 200
      )
     CVMVxconfigd cvm_vxconfigd (
      Critical = 0
      CVMVxconfigdArgs = { syslog }
      )
     cvm_clus requires cvm_vxconfigd
     vxfsckd requires cvm_clus

     // resource dependency tree
     //
     // group cvm
     // {
     // CFSfsckd vxfsckd
     //     {
     //     CVMCluster cvm_clus
     //         {
     //         CVMVxconfigd cvm_vxconfigd
     //         }
     //     }
     // }

    group vcs_vip1 (
     SystemList = { node1 = 0, node2 = 1 }
     AutoStartList = { node1, node2 }
     )
     IPMultiNICB vip1 (
      BaseResName = MultiNICB_204
      Address = "192.168.204.135"
      NetMask = "255.255.255.0"
      )
     MultiNICB MultiNICB_204 (
      UseMpathd = 1
      MpathdCommand = "/usr/lib/inet/in.mpathd -a"
      ConfigCheck = 0
      Device = { nxge0 = "", e1000g0 = "" }
      DefaultRouter = "192.168.204.10"
      GroupName = ipmp0
      )
     requires group cfs online local firm
     vip1 requires MultiNICB_204

     // resource dependency tree
     //
     // group vcs_vip1
     // {
     // IPMultiNICB vip1
     //     {
     //     MultiNICB MultiNICB_204
     //     }
     // }


    ifconfig -a for node1:
    ================
    lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
     inet 127.0.0.1 netmask ff000000
    e1000g0: flags=69040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER,STANDBY,INACTIVE> mtu 1500 index 2
     inet 192.168.204.147 netmask ffffff00 broadcast 192.168.204.255
     groupname ipmp0
     ether 0:c0:dd:14:1:14
    nxge0: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 5
     inet 192.168.204.69 netmask ffffff00 broadcast 192.168.204.255
     groupname ipmp0
     ether 0:21:28:84:ba:50
    nxge0:1: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 5
     inet 192.168.204.146 netmask ffffff00 broadcast 192.168.204.255
    nxge0:2: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 5
     inet 192.168.204.135 netmask ffffff00 broadcast 192.168.204.255

    ifconfig -a for node2:
    ================
    lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
     inet 127.0.0.1 netmask ff000000
    e1000g0: flags=69040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER,STANDBY,INACTIVE> mtu 1500 index 2
     inet 192.168.204.149 netmask ffffff00 broadcast 192.168.204.255
     groupname ipmp0
     ether 0:c0:dd:14:6:f0
    nxge0: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 5
     inet 192.168.204.70 netmask ffffff00 broadcast 192.168.204.255
     groupname ipmp0
     ether 0:21:28:84:ba:b0
    nxge0:1: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 5
     inet 192.168.204.148 netmask ffffff00 broadcast 192.168.204.255
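    From the ifconfig output above, the VIP 192.168.204.135 is still plumbed as nxge0:2 on node1 only. A quick way to check from each node which one currently holds the VIP is to try binding to it; binding only succeeds when the address is configured locally. This is an illustrative Python sketch, not anything the VCS agent does itself:

    ```python
    import errno
    import socket

    def ip_configured_locally(ip: str) -> bool:
        """Return True if ip is plumbed on an interface of this host.

        Binding a UDP socket to an address succeeds only when that
        address is configured locally, so running this on each node
        shows which one currently holds the VIP. (Illustrative only;
        the IPMultiNICB agent's own duplicate-address probe is internal
        to VCS.)
        """
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            s.bind((ip, 0))
            return True
        except OSError as e:
            if e.errno in (errno.EADDRNOTAVAIL, errno.EADDRINUSE):
                return False
            raise
        finally:
            s.close()

    # Example: the loopback address is always configured locally,
    # while a TEST-NET address normally is not.
    print(ip_configured_locally("127.0.0.1"))
    ```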
