Forum Discussion

genesisb
Level 2
13 years ago

Node Panic: VXFEN Critical

Hello, 

I have a two-node cluster configured with several parallel service groups. I'm using I/O fencing, and it works fine for all failover tests except when I reboot one server.

The problem occurs when I run a failover test by issuing an "init 6"/"reboot" on one of the servers (e.g. Server 2). During the restart, the cluster starts switching service groups to the other server (Server 1), but before it finishes, Server 1 panics and reboots as well.

Oct 25 15:35:31 mm7node2 genunix: NOTICE: VXFEN WARNING V-11-1-65 Could not eject node 0 from disk
Oct 25 15:35:31 mm7node2         with serial number 60060160F83A2D007C703F5839E4E011 since
Oct 25 15:35:31 mm7node2         keys of node 1 are not registered with it

panic[cpu0]/thread=fffffe8001bc8c60: VXFEN CRITICAL V-11-1-20 Local cluster node ejected from cluster to prevent potential data corruption.
fffffe8001bc87f0 vxfen:vxfen_plat_panic+e7 ()
fffffe8001bc88b0 vxfen:vxfen_grab_coord_disks+b46 ()


After the servers reboot, they are in a split-brain condition, and I can only bring the cluster back by running vxfenclearpre.
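
For reference, the recovery I perform is roughly the following (exact paths and service scripts can vary between Storage Foundation releases):

hastop -all                              # stop VCS on both nodes
/etc/init.d/vxfen stop                   # stop the fencing driver on each node
/opt/VRTSvcs/vxfen/bin/vxfenclearpre     # clear the stale SCSI-3 keys, then restart both nodes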

Do you have any idea what could be causing this? Any suggestions to fix the issue?


Thanks.

  • It would appear from your post that CVM and Fencing are actually performing exactly as expected.  When you use the reboot command (Solaris) with a CFS cluster, all of the RC scripts for shutdown are completely bypassed.  As a result, in a two-node cluster VCS cannot differentiate between a loss of heartbeats and a complete system crash.  Either way, the behavior of fencing is to protect the file systems from data corruption (in the event of a loss of communication with the corresponding node) by ejecting the offending node from the cluster. That being said, the remaining node should still be active after having won the "Fencing Race" for the coordinator disks.
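
    After your next test, you can sanity-check the surviving node's view of membership and fencing with a couple of standard commands (output details vary a little by version):

    gabconfig -a     # GAB port membership: ports a (GAB), b (fencing) and h (VCS) should show the survivor
    vxfenadm -d      # fencing mode and cluster membership as the vxfen driver sees it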

    Here is some further elaboration on the error codes:

    https://sort.symantec.com/ecls/umi/V-11-1-65

    https://sort.symantec.com/ecls/umi/V-11-1-20

    Can you post a copy of your main.cf? You say that during the shutdown some of the service groups attempt to fail over. I would assume then that you have both parallel and failover service groups configured.
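
    If you are not sure offhand, this should list them (assuming a standard VCS install with the ha commands in your PATH):

    hagrp -display -attribute Parallel    # shows the Parallel attribute for every service group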

    Joe D

  • Yes, during the shutdown the service groups attempt to fail over to the other node. It is a symmetric cluster.

    Here's the main.cf:


    include "types.cf"
    include "CFSTypes.cf"
    include "CVMTypes.cf"

    cluster MediatorCluster (
            UserNames = { vcsguest = cD2a90jzh1hgg, vcsop = j5rBONYy1OtL6,
                     vcsadm = "sEFEKH1CaHW5.",
                     root = dOOtOJoIPhOUnJOvOM }
            Administrators = { vcsadm, root }
            UseFence = SCSI3
            )

    system mm7node1 (
            )

    system mm7node2 (
            )

    group FMMgrp (
            SystemList = { mm7node1 = 2, mm7node2 = 1 }
            AutoStartList = { mm7node2 }
            )

            IPMultiNIC FMMgrp_IP (
                    Address = "172.26.96.15"
                    NetMask = "255.255.255.224"
                    MultiNICResName = MultiNICA
                    IfconfigTwice = 1
                    )

            ORACLE fmm (
                    )

            Proxy FMMgrp_NIC_PROXY (
                    TargetResName = MultiNICA
                    )

            Tomcat fmmweb (
                    )

            requires group SENTINELgrp online global soft
            FMMgrp_IP requires FMMgrp_NIC_PROXY
            fmm requires FMMgrp_IP
            fmmweb requires fmm


            // resource dependency tree
            //
            //      group FMMgrp
            //      {
            //      Tomcat fmmweb
            //          {
            //          ORACLE fmm
            //              {
            //              IPMultiNIC FMMgrp_IP
            //                  {
            //                  Proxy FMMgrp_NIC_PROXY
            //                  }
            //              }
            //          }
            //      }


    group Mediator1 (
            SystemList = { mm7node1 = 1, mm7node2 = 2 }
            AutoStartList = { mm7node1 }
            )

            AlarmIRP AlarmIRP (
                    )

            IPMultiNIC Mediator1_IP (
                    Address = "172.26.96.8"
                    NetMask = "255.255.255.224"
                    MultiNICResName = MultiNICA
                    IfconfigTwice = 1
                    )

            Mediator Server1 (
                    )

            Mediator Server2 (
                    )

            Mediator ServerTest (
                    )

            NameService NameService (
                    )

            OSAgent OSAgent (
                    )

            Proxy Mediator1_NIC_PROXY (
                    TargetResName = MultiNICA
                    )

            VisiNotify VisiNotify (
                    )

            requires group ServerGroup1_DG online local firm
            AlarmIRP requires Mediator1_IP
            AlarmIRP requires VisiNotify
            Mediator1_IP requires Mediator1_NIC_PROXY
            NameService requires OSAgent
            Server1 requires Mediator1_IP
            Server2 requires Mediator1_IP
            ServerTest requires Mediator1_IP
            VisiNotify requires NameService


            // resource dependency tree
            //
            //      group Mediator1
            //      {
            //      AlarmIRP AlarmIRP
            //          {
            //          IPMultiNIC Mediator1_IP
            //              {
            //              Proxy Mediator1_NIC_PROXY
            //              }
            //          VisiNotify VisiNotify
            //              {
            //              NameService NameService
            //                  {
            //                  OSAgent OSAgent
            //                  }
            //              }
            //          }
            //      Mediator Server1
            //          {
            //          IPMultiNIC Mediator1_IP
            //              {
            //              Proxy Mediator1_NIC_PROXY
            //              }
            //          }
            //      Mediator Server2
            //          {
            //          IPMultiNIC Mediator1_IP
            //              {
            //              Proxy Mediator1_NIC_PROXY
            //              }
            //          }
            //      Mediator ServerTest
            //          {
            //          IPMultiNIC Mediator1_IP
            //              {
            //              Proxy Mediator1_NIC_PROXY
            //              }
            //          }
            //      }


    group Mediator2 (
            SystemList = { mm7node1 = 2, mm7node2 = 1 }
            AutoStartList = { mm7node2 }
            )

            IPMultiNIC Mediator2_IP (
                    Address = "172.26.96.9"
                    NetMask = "255.255.255.224"
                    MultiNICResName = MultiNICA
                    IfconfigTwice = 1
                    )

            Mediator Server3 (
                    )

            Mediator Server4 (
                    )

            Mediator Server5 (
                    )

            Proxy Mediator2_NIC_PROXY (
                    TargetResName = MultiNICA
                    )

            requires group ServerGroup1_DG online local firm
            Mediator2_IP requires Mediator2_NIC_PROXY
            Server3 requires Mediator2_IP
            Server4 requires Mediator2_IP
            Server5 requires Mediator2_IP


            // resource dependency tree
            //
            //      group Mediator2
            //      {
            //      Mediator Server3
            //          {
            //          IPMultiNIC Mediator2_IP
            //              {
            //              Proxy Mediator2_NIC_PROXY
            //              }
            //          }
            //      Mediator Server4
            //          {
            //          IPMultiNIC Mediator2_IP
            //              {
            //              Proxy Mediator2_NIC_PROXY
            //              }
            //          }
            //      Mediator Server5
            //          {
            //          IPMultiNIC Mediator2_IP
            //              {
            //              Proxy Mediator2_NIC_PROXY
            //              }
            //          }
            //      }


    group Network (
            SystemList = { mm7node1 = 1, mm7node2 = 2 }
            Parallel = 1
            AutoStartList = { mm7node1, mm7node2 }
            )

            MultiNICA MultiNICA (
                    Device @mm7node1 = { igb0 = "172.26.96.2", igb1 = "172.26.96.4" }
                    Device @mm7node2 = { igb0 = "172.26.96.3", igb1 = "172.26.96.5" }
                    NetMask = "255.255.255.224"
                    RouteOptions = "172.26.96.1"
                    IfconfigTwice = 1
                    NetworkHosts = { "172.26.96.1", "172.26.96.0" }
                    )

            Phantom Phantom (
                    )



            // resource dependency tree
            //
            //      group Network
            //      {
            //      MultiNICA MultiNICA
            //      Phantom Phantom
            //      }


    group Oracle1 (
            SystemList = { mm7node1 = 1, mm7node2 = 2 }
            AutoStartList = { mm7node1 }
            )

            IPMultiNIC Oracle1_IP (
                    Address = "172.26.96.10"
                    NetMask = "255.255.255.224"
                    MultiNICResName = MultiNICA
                    IfconfigTwice = 1
                    )

            ORACLE bgw (
                    )

            Proxy Oracle1_NIC_PROXY (
                    TargetResName = MultiNICA
                    )

            requires group ora_DG online local firm
            Oracle1_IP requires Oracle1_NIC_PROXY
            bgw requires Oracle1_IP


            // resource dependency tree
            //
            //      group Oracle1
            //      {
            //      ORACLE bgw
            //          {
            //          IPMultiNIC Oracle1_IP
            //              {
            //              Proxy Oracle1_NIC_PROXY
            //              }
            //          }
            //      }


    group SENTINELgrp (
            SystemList = { mm7node1 = 2, mm7node2 = 1 }
            AutoStartList = { mm7node2 }
            )

            IPMultiNIC SENTINELgrp_IP (
                    Address = "172.26.96.16"
                    NetMask = "255.255.255.224"
                    MultiNICResName = MultiNICA
                    IfconfigTwice = 1
                    )

            Proxy SENTINELgrp_NIC_PROXY (
                    TargetResName = MultiNICA
                    )

            Sentinel licserv (
                    )

            requires group lic_DG online local firm
            SENTINELgrp_IP requires SENTINELgrp_NIC_PROXY
            SENTINELgrp_IP requires licserv


            // resource dependency tree
            //
            //      group SENTINELgrp
            //      {
            //      IPMultiNIC SENTINELgrp_IP
            //          {
            //          Proxy SENTINELgrp_NIC_PROXY
            //          Sentinel licserv
            //          }
            //      }


    group SNMPMasterAgent (
            SystemList = { mm7node1 = 0, mm7node2 = 1 }
            Parallel = 1
            AutoStartList = { mm7node1, mm7node2 }
            )

            SNMPMasterAgent SNMPMasterAgent (
                    )



            // resource dependency tree
            //
            //      group SNMPMasterAgent
            //      {
            //      SNMPMasterAgent SNMPMasterAgent
            //      }


    group ServerGroup1_DG (
            SystemList = { mm7node1 = 0, mm7node2 = 1 }
            AutoFailOver = 0
            Parallel = 1
            AutoStartList = { mm7node1, mm7node2 }
            )

            CFSMount cfsmount1 (
                    Critical = 0
                    MountPoint = "/var/opt/BGw/ServerGroup1"
                    BlockDevice = "/dev/vx/dsk/bgw1dg/vol01"
                    MountOpt @mm7node1 = "cluster"
                    MountOpt @mm7node2 = "cluster"
                    NodeList = { mm7node1, mm7node2 }
                    )

            CVMVolDg cvmvoldg1 (
                    Critical = 0
                    CVMDiskGroup = bgw1dg
                    CVMActivation @mm7node1 = sw
                    CVMActivation @mm7node2 = sw
                    )

            requires group cvm online local firm
            cfsmount1 requires cvmvoldg1


            // resource dependency tree
            //
            //      group ServerGroup1_DG
            //      {
            //      CFSMount cfsmount1
            //          {
            //          CVMVolDg cvmvoldg1
            //          }
            //      }


    group Storage_DG (
            SystemList = { mm7node1 = 0, mm7node2 = 1 }
            Parallel = 1
            AutoStartList = { mm7node1, mm7node2 }
            )

            CFSMount cfsmount11 (
                    Critical = 0
                    MountPoint = "/Storage1"
                    BlockDevice = "/dev/vx/dsk/store1dg/store1"
                    MountOpt @mm7node1 = "cluster"
                    MountOpt @mm7node2 = "cluster"
                    NodeList = { mm7node1, mm7node2 }
                    )

            CFSMount cfsmount12 (
                    Critical = 0
                    MountPoint = "/Storage2"
                    BlockDevice = "/dev/vx/dsk/store2dg/store2"
                    MountOpt @mm7node1 = "cluster"
                    MountOpt @mm7node2 = "cluster"
                    NodeList = { mm7node1, mm7node2 }
                    )

            CFSMount cfsmount13 (
                    Critical = 0
                    MountPoint = "/Storage3"
                    BlockDevice = "/dev/vx/dsk/store3dg/store3"
                    MountOpt @mm7node1 = "cluster"
                    MountOpt @mm7node2 = "cluster"
                    NodeList = { mm7node1, mm7node2 }
                    )

            CFSMount cfsmount14 (
                    Critical = 0
                    MountPoint = "/Storage4"
                    BlockDevice = "/dev/vx/dsk/store4dg/store4"
                    MountOpt @mm7node1 = "cluster"
                    MountOpt @mm7node2 = "cluster"
                    NodeList = { mm7node1, mm7node2 }
                    )

            CFSMount cfsmount15 (
                    Critical = 0
                    MountPoint = "/Storage5"
                    BlockDevice = "/dev/vx/dsk/store5dg/vol01"
                    MountOpt @mm7node1 = "cluster"
                    MountOpt @mm7node2 = "cluster"
                    NodeList = { mm7node1, mm7node2 }
                    )

            CFSMount cfsmount16 (
                    Critical = 0
                    MountPoint = "/Storage6"
                    BlockDevice = "/dev/vx/dsk/store6dg/vol01"
                    MountOpt @mm7node1 = "cluster"
                    MountOpt @mm7node2 = "cluster"
                    NodeList = { mm7node1, mm7node2 }
                    )

            CVMVolDg cvmvoldg11 (
                    Critical = 0
                    CVMDiskGroup = store1dg
                    CVMActivation @mm7node1 = sw
                    CVMActivation @mm7node2 = sw
                    )

            CVMVolDg cvmvoldg12 (
                    Critical = 0
                    CVMDiskGroup = store2dg
                    CVMActivation @mm7node1 = sw
                    CVMActivation @mm7node2 = sw
                    )

            CVMVolDg cvmvoldg13 (
                    Critical = 0
                    CVMDiskGroup = store3dg
                    CVMActivation @mm7node1 = sw
                    CVMActivation @mm7node2 = sw
                    )

            CVMVolDg cvmvoldg14 (
                    Critical = 0
                    CVMDiskGroup = store4dg
                    CVMActivation @mm7node1 = sw
                    CVMActivation @mm7node2 = sw
                    )

            CVMVolDg cvmvoldg15 (
                    Critical = 0
                    CVMDiskGroup = store5dg
                    CVMActivation @mm7node1 = sw
                    CVMActivation @mm7node2 = sw
                    )

            CVMVolDg cvmvoldg16 (
                    Critical = 0
                    CVMDiskGroup = store6dg
                    CVMActivation @mm7node1 = sw
                    CVMActivation @mm7node2 = sw
                    )

            requires group cvm online local firm
            cfsmount11 requires cvmvoldg11
            cfsmount12 requires cvmvoldg12
            cfsmount13 requires cvmvoldg13
            cfsmount14 requires cvmvoldg14
            cfsmount15 requires cvmvoldg15
            cfsmount16 requires cvmvoldg16


            // resource dependency tree
            //
            //      group Storage_DG
            //      {
            //      CFSMount cfsmount11
            //          {
            //          CVMVolDg cvmvoldg11
            //          }
            //      CFSMount cfsmount12
            //          {
            //          CVMVolDg cvmvoldg12
            //          }
            //      CFSMount cfsmount13
            //          {
            //          CVMVolDg cvmvoldg13
            //          }
            //      CFSMount cfsmount14
            //          {
            //          CVMVolDg cvmvoldg14
            //          }
            //      CFSMount cfsmount15
            //          {
            //          CVMVolDg cvmvoldg15
            //          }
            //      CFSMount cfsmount16
            //          {
            //          CVMVolDg cvmvoldg16
            //          }
            //      }


    group cvm (
            SystemList = { mm7node1 = 0, mm7node2 = 1 }
            AutoFailOver = 0
            Parallel = 1
            AutoStartList = { mm7node1, mm7node2 }
            )

            CFSfsckd vxfsckd (
                    ActivationMode @mm7node1 = { bgw1dg = sw, ora1dg = sw, lic1dg = sw,
                             fmm1dg = sw,
                             store1dg = sw,
                             store2dg = sw,
                             store3dg = sw,
                             store4dg = sw,
                             store5dg = sw,
                             store6dg = sw }
                    ActivationMode @mm7node2 = { bgw1dg = sw, ora1dg = sw, lic1dg = sw,
                             fmm1dg = sw,
                             store1dg = sw,
                             store2dg = sw,
                             store3dg = sw,
                             store4dg = sw,
                             store5dg = sw,
                             store6dg = sw }
                    )

            CVMCluster cvm_clus (
                    CVMClustName = MediatorCluster
                    CVMNodeId = { mm7node1 = 0, mm7node2 = 1 }
                    CVMTransport = gab
                    CVMTimeout = 200
                    )

            CVMVxconfigd cvm_vxconfigd (
                    Critical = 0
                    CVMVxconfigdArgs = { syslog }
                    )

            cvm_clus requires cvm_vxconfigd
            vxfsckd requires cvm_clus


            // resource dependency tree
            //
            //      group cvm
            //      {
            //      CFSfsckd vxfsckd
            //          {
            //          CVMCluster cvm_clus
            //              {
            //              CVMVxconfigd cvm_vxconfigd
            //              }
            //          }
            //      }


    group lic_DG (
            SystemList = { mm7node1 = 0, mm7node2 = 1 }
            AutoFailOver = 0
            Parallel = 1
            AutoStartList = { mm7node1, mm7node2 }
            )

            CFSMount cfsmount5 (
                    Critical = 0
                    MountPoint = "/var/opt/sentinel"
                    BlockDevice = "/dev/vx/dsk/lic1dg/vol01"
                    MountOpt @mm7node1 = "cluster"
                    MountOpt @mm7node2 = "cluster"
                    NodeList = { mm7node1, mm7node2 }
                    )

            CFSMount cfsmount6 (
                    Critical = 0
                    MountPoint = "/var/opt/mediation/fmmdb"
                    BlockDevice = "/dev/vx/dsk/fmm1dg/vol01"
                    MountOpt @mm7node1 = "cluster"
                    MountOpt @mm7node2 = "cluster"
                    NodeList = { mm7node1, mm7node2 }
                    )

            CVMVolDg cvmvoldg5 (
                    Critical = 0
                    CVMDiskGroup = lic1dg
                    CVMActivation @mm7node1 = sw
                    CVMActivation @mm7node2 = sw
                    )

            CVMVolDg cvmvoldg6 (
                    Critical = 0
                    CVMDiskGroup = fmm1dg
                    CVMActivation @mm7node1 = sw
                    CVMActivation @mm7node2 = sw
                    )

            requires group cvm online local firm
            cfsmount5 requires cvmvoldg5
            cfsmount6 requires cvmvoldg6


            // resource dependency tree
            //
            //      group lic_DG
            //      {
            //      CFSMount cfsmount5
            //          {
            //          CVMVolDg cvmvoldg5
            //          }
            //      CFSMount cfsmount6
            //          {
            //          CVMVolDg cvmvoldg6
            //          }
            //      }


    group ora_DG (
            SystemList = { mm7node1 = 0, mm7node2 = 1 }
            AutoFailOver = 0
            Parallel = 1
            AutoStartList = { mm7node1, mm7node2 }
            )

            CFSMount cfsmount4 (
                    Critical = 0
                    MountPoint = "/var/opt/mediation/ora"
                    BlockDevice = "/dev/vx/dsk/ora1dg/vol01"
                    MountOpt @mm7node1 = "cluster"
                    MountOpt @mm7node2 = "cluster"
                    NodeList = { mm7node1, mm7node2 }
                    )

            CVMVolDg cvmvoldg4 (
                    Critical = 0
                    CVMDiskGroup = ora1dg
                    CVMActivation @mm7node1 = sw
                    CVMActivation @mm7node2 = sw
                    )

            requires group cvm online local firm
            cfsmount4 requires cvmvoldg4


            // resource dependency tree
            //
            //      group ora_DG
            //      {
            //      CFSMount cfsmount4
            //          {
            //          CVMVolDg cvmvoldg4
            //          }
            //      }


    //Genesis

  • Just to explain a little further:

    If you kill Server 2 instantly (for example, by pulling power to it, shutting down the system boards, or disconnecting all power supplies), then Server 1 races for the coordinator disks and wins the race, since Server 2 is down, and so Server 1 brings the service groups online.

    If you gracefully shut down Server 2 (using the shutdown command), the cluster is shut down cleanly, so VCS deports the disk groups, the reservations are released, and Server 1 cleanly takes over.

    If you do something in between, like the "reboot" command, where the node can take a few seconds to go down but the rc scripts are not called, then I think what is happening is:

    1. Processes are killed which means Server 1 races for coordinator disks
    2. Server 2 is not quite down yet, so either Server 2 wins the race or Server 1 can't eject the reservation keys on the data disks because Server 2 is still using them.

    VCS is there to protect against hardware and software failure; it will not always protect you against administrative error. You should educate administrators not to use the "reboot" command (or to "suspend" a system); they must use a graceful shutdown, and ideally they should switch services manually before shutting a node down, along the lines of the sketch below.
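
    For example, to take mm7node2 out gracefully (group and node names taken from your main.cf, just as an illustration):

    hagrp -switch FMMgrp -to mm7node1     # move each failover group off the node first
    hastop -local -evacuate               # evacuate anything left and stop VCS on this node
    shutdown -i6 -g0 -y                   # reboot through the rc scripts instead of "reboot"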

    Reboots and "suspends" are still valid tests to do as in some configurations VCS will help you out, so it is useful to know what you are protected against.

    Mike

  • Thanks for the explanation Mike.

    I ran a test with an instant/abrupt server shutdown, but I still get the same result: the other node panics as well, which results in a split-brain. So I have to clear the VXFEN keys manually to start the cluster again.

    I also get the following messages on the console of the other node before the panic.

    Oct 26 18:54:19 mm7node2 scsi: WARNING: /pci@0,0/pci8086,340a@3/pci111d,806c@0/pci111d,806c@2/pci1077,172@0/fp@0,0/disk@w500601613ba04e3b,3 (sd13):
    Oct 26 18:54:19 mm7node2        Error for Command: write(10)               Error Level: Retryable
    Oct 26 18:54:19 mm7node2 scsi:  Requested Block: 5617511                   Error Block: 5617511
    Oct 26 18:54:19 mm7node2 scsi:  Vendor: DGC                                Serial Number: 640000AF4ECL
    Oct 26 18:54:19 mm7node2 scsi:  Sense Key: Unit Attention
    Oct 26 18:54:19 mm7node2 scsi:  ASC: 0x2a (reservations released), ASCQ: 0x4, FRU: 0x0
    Oct 26 18:54:19 mm7node2 scsi: WARNING: /pci@0,0/pci8086,340a@3/pci111d,806c@0/pci111d,806c@2/pci1077,172@0/fp@0,0/disk@w500601613ba04e3b,6 (sd10):
    Oct 26 18:54:19 mm7node2        Error for Command: write(10)               Error Level: Retryable
    Oct 26 18:54:19 mm7node2 scsi:  Requested Block: 162722                    Error Block: 162722
    Oct 26 18:54:19 mm7node2 scsi:  Vendor: DGC                                Serial Number: 670000AF7DCL
    Oct 26 18:54:19 mm7node2 scsi:  Sense Key: Unit Attention
    Oct 26 18:54:19 mm7node2 scsi:  ASC: 0x2a (reservations released), ASCQ: 0x4, FRU: 0x0
    Oct 26 18:54:20 mm7node2 genunix: GAB INFO V-15-1-20036 Port f gen   30290d membership ;1
    Oct 26 18:54:20 mm7node2 genunix: GAB INFO V-15-1-20038 Port f gen   30290d k_jeopardy 0
    Oct 26 18:54:20 mm7node2 genunix: GAB INFO V-15-1-20040 Port f gen   30290d    visible 0
    Oct 26 18:54:20 mm7node2 vxfs: I/O fencing is ON
    Oct 26 18:54:22 mm7node2 scsi: WARNING: /pci@0,0/pci8086,340a@3/pci111d,806c@0/pci111d,806c@2/pci1077,172@0/fp@0,0/disk@w500601613ba04e3b,4 (sd12):
    Oct 26 18:54:22 mm7node2        Error for Command: write(10)               Error Level: Retryable
    Oct 26 18:54:22 mm7node2 scsi:  Requested Block: 5355455                   Error Block: 5355455
    Oct 26 18:54:22 mm7node2 scsi:  Vendor: DGC                                Serial Number: 650000AF60CL

    ........

    panic[cpu4]/thread=fffffe80018f9c60: VXFEN CRITICAL V-11-1-20 Local cluster node
     ejected from cluster to prevent potential data corruption.

    fffffe80018f97f0 vxfen:vxfen_plat_panic+e7 ()
    fffffe80018f98b0 vxfen:vxfen_grab_coord_disks+b46 ()
    fffffe80018f98e0 vxfen:vxfen_grab_coord_pt+d8 ()
    fffffe80018f9920 vxfen:vxfen_msg_node_left_ack+212 ()
    fffffe80018f9970 vxfen:vxfen_process_client_msg+39a ()
    fffffe80018f9aa0 vxfen:vxfen_vrfsm_cback+cfd ()
    fffffe80018f9b40 vxfen:vrfsm_step+42e ()
    fffffe80018f9bc0 vxfen:vrfsm_msg_dispatch+2e9 ()
    fffffe80018f9c40 vxfen:vrfsm_recv_thread+193 ()
    fffffe80018f9c50 unix:thread_start+8 ()


    Thanks.


    Genesis

  • This is strange. If you kill one node, the other one should not panic as long as it can grab a majority of the coordinator disks, so you should check that this node is able to grab all the coordinator disks. Are the disk paths in the console messages above (sd13, sd10 & sd12) the 3 coordinator disks? (You can use my download at https://www-secure.symantec.com/connect/downloads/relating-devices-reported-solaris-sar-utility-vxvm-disks to find out how these disks map to VM disks.)
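
    Something along these lines will show what fencing is actually using (the key-listing flag is -s on recent vxfenadm releases, -g on older ones):

    cat /etc/vxfendg                    # name of the coordinator disk group
    cat /etc/vxfentab                   # device paths of the coordinator disks
    vxfenadm -s all -f /etc/vxfentab    # read the SCSI-3 registration keys on each coordinator disk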

    Mike

  • I would have to agree with Mike; it would seem you actually have disk connectivity issues. This very well may be the reason the surviving node cannot properly take ownership of the fencing disks, and essentially both nodes lose the fencing race.

    Do you see the same SCSI errors while both nodes are operational?
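
    On Solaris, the cumulative per-device error counters are a quick way to check; if the hard/transport counts keep climbing while the cluster is healthy, that points at the storage path rather than at fencing:

    iostat -En    # soft/hard/transport error counts per disk since boot (look at the DGC LUNs, e.g. sd10/sd12/sd13)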

    Joe D