
Node Panic: VXFEN Critical

genesisb
Level 2

Hello, 

I have a two-node cluster configured with several parallel service groups. I'm using I/O fencing, and it works fine for all failover tests except when I reboot one of the servers.

The problem occurs when I perform a failover test by running "init 6"/"reboot" on one of the servers (e.g. Server 2). During the restart, the cluster starts switching service groups to the other server (e.g. Server 1), but before it finishes, Server 1 also crashes/panics and reboots.

Oct 25 15:35:31 mm7node2 genunix: NOTICE: VXFEN WARNING V-11-1-65 Could not eject node 0 from disk
Oct 25 15:35:31 mm7node2         with serial number 60060160F83A2D007C703F5839E4E011 since
Oct 25 15:35:31 mm7node2         keys of node 1 are not registered with it

panic[cpu0]/thread=fffffe8001bc8c60: VXFEN CRITICAL V-11-1-20 Local cluster node ejected from cluster to prevent potential data corruption.
fffffe8001bc87f0 vxfen:vxfen_plat_panic+e7 ()
fffffe8001bc88b0 vxfen:vxfen_grab_coord_disks+b46 ()

 

When the servers come back up, they are in a split-brain condition, and I can only get the cluster working again by running vxfenclearpre.
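
For reference, this is roughly the sequence I use to recover after the panic (the exact paths and service names may differ depending on the SFHA and Solaris release):

# stop VCS on all nodes (run once, from either node)
hastop -all

# stop the fencing driver on each node (on Solaris 10 SMF: svcadm disable -t vxfen)
/etc/init.d/vxfen stop

# clear the stale SCSI-3 registrations left behind by the panic
/opt/VRTSvcs/vxfen/bin/vxfenclearpre

# reboot both nodes so vxfen and VCS start cleanly
shutdown -y -g0 -i6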

Do you have any idea what could be causing this? Any suggestions on how to fix it?

 

Thanks.

7 REPLIES

joseph_dangelo
Level 6
Employee Accredited

It would appear from your post that CVM and fencing are actually performing exactly as expected.  When you use the reboot command (Solaris) with a CFS cluster, all of the RC shutdown scripts are bypassed.  As a result, in a two-node cluster VCS cannot differentiate between a loss of heartbeats and a complete system crash.  Either way, the behavior of fencing is to protect the file systems from data corruption (in the event of a loss of communication with the corresponding node) by ejecting the offending node from the cluster. That being said, the remaining node should still be active after having won the "fencing race" for the coordinator disks.
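
As a quick illustration (the exact options can vary slightly by Solaris release), the difference between the two restart paths is roughly:

# graceful: runs the rc/shutdown scripts, so VCS is stopped cleanly first
shutdown -y -g0 -i6

# abrupt: kills processes and restarts immediately, bypassing the rc scripts,
# which looks like a crash to the surviving node
reboot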

Here is some further elaboration on the error codes:

https://sort.symantec.com/ecls/umi/V-11-1-65

https://sort.symantec.com/ecls/umi/V-11-1-20

Can you post a copy of your main.cf?  You say that during the shutdown some of the service groups attempt to fail over. I assume, then, that you have both parallel and failover service groups configured.

Joe D

genesisb
Level 2

Yes, during the shutdown the service groups attempt to fail over to the other node. It is a symmetric cluster.

Here's the main.cf:

 

include "types.cf"
include "CFSTypes.cf"
include "CVMTypes.cf"

cluster MediatorCluster (
        UserNames = { vcsguest = cD2a90jzh1hgg, vcsop = j5rBONYy1OtL6,
                 vcsadm = "sEFEKH1CaHW5.",
                 root = dOOtOJoIPhOUnJOvOM }
        Administrators = { vcsadm, root }
        UseFence = SCSI3
        )

system mm7node1 (
        )

system mm7node2 (
        )

group FMMgrp (
        SystemList = { mm7node1 = 2, mm7node2 = 1 }
        AutoStartList = { mm7node2 }
        )

        IPMultiNIC FMMgrp_IP (
                Address = "172.26.96.15"
                NetMask = "255.255.255.224"
                MultiNICResName = MultiNICA
                IfconfigTwice = 1
                )

        ORACLE fmm (
                )

        Proxy FMMgrp_NIC_PROXY (
                TargetResName = MultiNICA
                )

        Tomcat fmmweb (
                )

        requires group SENTINELgrp online global soft
        FMMgrp_IP requires FMMgrp_NIC_PROXY
        fmm requires FMMgrp_IP
        fmmweb requires fmm


        // resource dependency tree
        //
        //      group FMMgrp
        //      {
        //      Tomcat fmmweb
        //          {
        //          ORACLE fmm
        //              {
        //              IPMultiNIC FMMgrp_IP
        //                  {
        //                  Proxy FMMgrp_NIC_PROXY
        //                  }
        //              }
        //          }
        //      }


group Mediator1 (
        SystemList = { mm7node1 = 1, mm7node2 = 2 }
        AutoStartList = { mm7node1 }
        )

        AlarmIRP AlarmIRP (
                )

        IPMultiNIC Mediator1_IP (
                Address = "172.26.96.8"
                NetMask = "255.255.255.224"
                MultiNICResName = MultiNICA
                IfconfigTwice = 1
                )

        Mediator Server1 (
                )

        Mediator Server2 (
                )

        Mediator ServerTest (
                )

        NameService NameService (
                )

        OSAgent OSAgent (
                )

        Proxy Mediator1_NIC_PROXY (
                TargetResName = MultiNICA
                )

        VisiNotify VisiNotify (
                )

        requires group ServerGroup1_DG online local firm
        AlarmIRP requires Mediator1_IP
        AlarmIRP requires VisiNotify
        Mediator1_IP requires Mediator1_NIC_PROXY
        NameService requires OSAgent
        Server1 requires Mediator1_IP
        Server2 requires Mediator1_IP
        ServerTest requires Mediator1_IP
        VisiNotify requires NameService


        // resource dependency tree
        //
        //      group Mediator1
        //      {
        //      AlarmIRP AlarmIRP
        //          {
        //          IPMultiNIC Mediator1_IP
        //              {
        //              Proxy Mediator1_NIC_PROXY
        //              }
        //          VisiNotify VisiNotify
        //              {
        //              NameService NameService
        //                  {
        //                  OSAgent OSAgent
        //                  }
        //              }
        //          }
        //      Mediator Server1
        //          {
        //          IPMultiNIC Mediator1_IP
        //              {
        //              Proxy Mediator1_NIC_PROXY
        //              }
        //          }
        //      Mediator Server2
        //          {
        //          IPMultiNIC Mediator1_IP
        //              {
        //              Proxy Mediator1_NIC_PROXY
        //              }
        //          }
        //      Mediator ServerTest
        //          {
        //          IPMultiNIC Mediator1_IP
        //              {
        //              Proxy Mediator1_NIC_PROXY
        //              }
        //          }
        //      }


group Mediator2 (
        SystemList = { mm7node1 = 2, mm7node2 = 1 }
        AutoStartList = { mm7node2 }
        )

        IPMultiNIC Mediator2_IP (
                Address = "172.26.96.9"
                NetMask = "255.255.255.224"
                MultiNICResName = MultiNICA
                IfconfigTwice = 1
                )

        Mediator Server3 (
                )

        Mediator Server4 (
                )

        Mediator Server5 (
                )

        Proxy Mediator2_NIC_PROXY (
                TargetResName = MultiNICA
                )

        requires group ServerGroup1_DG online local firm
        Mediator2_IP requires Mediator2_NIC_PROXY
        Server3 requires Mediator2_IP
        Server4 requires Mediator2_IP
        Server5 requires Mediator2_IP


        // resource dependency tree
        //
        //      group Mediator2
        //      {
        //      Mediator Server3
        //          {
        //          IPMultiNIC Mediator2_IP
        //              {
        //              Proxy Mediator2_NIC_PROXY
        //              }
        //          }
        //      Mediator Server4
        //          {
        //          IPMultiNIC Mediator2_IP
        //              {
        //              Proxy Mediator2_NIC_PROXY
        //              }
        //          }
        //      Mediator Server5
        //          {
        //          IPMultiNIC Mediator2_IP
        //              {
        //              Proxy Mediator2_NIC_PROXY
        //              }
        //          }
        //      }


group Network (
        SystemList = { mm7node1 = 1, mm7node2 = 2 }
        Parallel = 1
        AutoStartList = { mm7node1, mm7node2 }
        )

        MultiNICA MultiNICA (
                Device @mm7node1 = { igb0 = "172.26.96.2", igb1 = "172.26.96.4" }
                Device @mm7node2 = { igb0 = "172.26.96.3", igb1 = "172.26.96.5" }
                NetMask = "255.255.255.224"
                RouteOptions = "172.26.96.1"
                IfconfigTwice = 1
                NetworkHosts = { "172.26.96.1", "172.26.96.0" }
                )

        Phantom Phantom (
                )



        // resource dependency tree
        //
        //      group Network
        //      {
        //      MultiNICA MultiNICA
        //      Phantom Phantom
        //      }


group Oracle1 (
        SystemList = { mm7node1 = 1, mm7node2 = 2 }
        AutoStartList = { mm7node1 }
        )

        IPMultiNIC Oracle1_IP (
                Address = "172.26.96.10"
                NetMask = "255.255.255.224"
                MultiNICResName = MultiNICA
                IfconfigTwice = 1
                )

        ORACLE bgw (
                )

        Proxy Oracle1_NIC_PROXY (
                TargetResName = MultiNICA
                )

        requires group ora_DG online local firm
        Oracle1_IP requires Oracle1_NIC_PROXY
        bgw requires Oracle1_IP


        // resource dependency tree
        //
        //      group Oracle1
        //      {
        //      ORACLE bgw
        //          {
        //          IPMultiNIC Oracle1_IP
        //              {
        //              Proxy Oracle1_NIC_PROXY
        //              }
        //          }
        //      }


group SENTINELgrp (
        SystemList = { mm7node1 = 2, mm7node2 = 1 }
        AutoStartList = { mm7node2 }
        )

        IPMultiNIC SENTINELgrp_IP (
                Address = "172.26.96.16"
                NetMask = "255.255.255.224"
                MultiNICResName = MultiNICA
                IfconfigTwice = 1
                )

        Proxy SENTINELgrp_NIC_PROXY (
                TargetResName = MultiNICA
                )

        Sentinel licserv (
                )

        requires group lic_DG online local firm
        SENTINELgrp_IP requires SENTINELgrp_NIC_PROXY
        SENTINELgrp_IP requires licserv


        // resource dependency tree
        //
        //      group SENTINELgrp
        //      {
        //      IPMultiNIC SENTINELgrp_IP
        //          {
        //          Proxy SENTINELgrp_NIC_PROXY
        //          Sentinel licserv
        //          }
        //      }


group SNMPMasterAgent (
        SystemList = { mm7node1 = 0, mm7node2 = 1 }
        Parallel = 1
        AutoStartList = { mm7node1, mm7node2 }
        )

        SNMPMasterAgent SNMPMasterAgent (
                )



        // resource dependency tree
        //
        //      group SNMPMasterAgent
        //      {
        //      SNMPMasterAgent SNMPMasterAgent
        //      }


group ServerGroup1_DG (
        SystemList = { mm7node1 = 0, mm7node2 = 1 }
        AutoFailOver = 0
        Parallel = 1
        AutoStartList = { mm7node1, mm7node2 }
        )

        CFSMount cfsmount1 (
                Critical = 0
                MountPoint = "/var/opt/BGw/ServerGroup1"
                BlockDevice = "/dev/vx/dsk/bgw1dg/vol01"
                MountOpt @mm7node1 = "cluster"
                MountOpt @mm7node2 = "cluster"
                NodeList = { mm7node1, mm7node2 }
                )

        CVMVolDg cvmvoldg1 (
                Critical = 0
                CVMDiskGroup = bgw1dg
                CVMActivation @mm7node1 = sw
                CVMActivation @mm7node2 = sw
                )

        requires group cvm online local firm
        cfsmount1 requires cvmvoldg1


        // resource dependency tree
        //
        //      group ServerGroup1_DG
        //      {
        //      CFSMount cfsmount1
        //          {
        //          CVMVolDg cvmvoldg1
        //          }
        //      }


group Storage_DG (
        SystemList = { mm7node1 = 0, mm7node2 = 1 }
        Parallel = 1
        AutoStartList = { mm7node1, mm7node2 }
        )

        CFSMount cfsmount11 (
                Critical = 0
                MountPoint = "/Storage1"
                BlockDevice = "/dev/vx/dsk/store1dg/store1"
                MountOpt @mm7node1 = "cluster"
                MountOpt @mm7node2 = "cluster"
                NodeList = { mm7node1, mm7node2 }
                )

        CFSMount cfsmount12 (
                Critical = 0
                MountPoint = "/Storage2"
                BlockDevice = "/dev/vx/dsk/store2dg/store2"
                MountOpt @mm7node1 = "cluster"
                MountOpt @mm7node2 = "cluster"
                NodeList = { mm7node1, mm7node2 }
                )

        CFSMount cfsmount13 (
                Critical = 0
                MountPoint = "/Storage3"
                BlockDevice = "/dev/vx/dsk/store3dg/store3"
                MountOpt @mm7node1 = "cluster"
                MountOpt @mm7node2 = "cluster"
                NodeList = { mm7node1, mm7node2 }
                )

        CFSMount cfsmount14 (
                Critical = 0
                MountPoint = "/Storage4"
                BlockDevice = "/dev/vx/dsk/store4dg/store4"
                MountOpt @mm7node1 = "cluster"
                MountOpt @mm7node2 = "cluster"
                NodeList = { mm7node1, mm7node2 }
                )

        CFSMount cfsmount15 (
                Critical = 0
                MountPoint = "/Storage5"
                BlockDevice = "/dev/vx/dsk/store5dg/vol01"
                MountOpt @mm7node1 = "cluster"
                MountOpt @mm7node2 = "cluster"
                NodeList = { mm7node1, mm7node2 }
                )

        CFSMount cfsmount16 (
                Critical = 0
                MountPoint = "/Storage6"
                BlockDevice = "/dev/vx/dsk/store6dg/vol01"
                MountOpt @mm7node1 = "cluster"
                MountOpt @mm7node2 = "cluster"
                NodeList = { mm7node1, mm7node2 }
                )

        CVMVolDg cvmvoldg11 (
                Critical = 0
                CVMDiskGroup = store1dg
                CVMActivation @mm7node1 = sw
                CVMActivation @mm7node2 = sw
                )

        CVMVolDg cvmvoldg12 (
                Critical = 0
                CVMDiskGroup = store2dg
                CVMActivation @mm7node1 = sw
                CVMActivation @mm7node2 = sw
                )

        CVMVolDg cvmvoldg13 (
                Critical = 0
                CVMDiskGroup = store3dg
                CVMActivation @mm7node1 = sw
                CVMActivation @mm7node2 = sw
                )

        CVMVolDg cvmvoldg14 (
                Critical = 0
                CVMDiskGroup = store4dg
                CVMActivation @mm7node1 = sw
                CVMActivation @mm7node2 = sw
                )

        CVMVolDg cvmvoldg15 (
                Critical = 0
                CVMDiskGroup = store5dg
                CVMActivation @mm7node1 = sw
                CVMActivation @mm7node2 = sw
                )

        CVMVolDg cvmvoldg16 (
                Critical = 0
                CVMDiskGroup = store6dg
                CVMActivation @mm7node1 = sw
                CVMActivation @mm7node2 = sw
                )

        requires group cvm online local firm
        cfsmount11 requires cvmvoldg11
        cfsmount12 requires cvmvoldg12
        cfsmount13 requires cvmvoldg13
        cfsmount14 requires cvmvoldg14
        cfsmount15 requires cvmvoldg15
        cfsmount16 requires cvmvoldg16


        // resource dependency tree
        //
        //      group Storage_DG
        //      {
        //      CFSMount cfsmount11
        //          {
        //          CVMVolDg cvmvoldg11
        //          }
        //      CFSMount cfsmount12
        //          {
        //          CVMVolDg cvmvoldg12
        //          }
        //      CFSMount cfsmount13
        //          {
        //          CVMVolDg cvmvoldg13
        //          }
        //      CFSMount cfsmount14
        //          {
        //          CVMVolDg cvmvoldg14
        //          }
        //      CFSMount cfsmount15
        //          {
        //          CVMVolDg cvmvoldg15
        //          }
        //      CFSMount cfsmount16
        //          {
        //          CVMVolDg cvmvoldg16
        //          }
        //      }


group cvm (
        SystemList = { mm7node1 = 0, mm7node2 = 1 }
        AutoFailOver = 0
        Parallel = 1
        AutoStartList = { mm7node1, mm7node2 }
        )

        CFSfsckd vxfsckd (
                ActivationMode @mm7node1 = { bgw1dg = sw, ora1dg = sw, lic1dg = sw,
                         fmm1dg = sw,
                         store1dg = sw,
                         store2dg = sw,
                         store3dg = sw,
                         store4dg = sw,
                         store5dg = sw,
                         store6dg = sw }
                ActivationMode @mm7node2 = { bgw1dg = sw, ora1dg = sw, lic1dg = sw,
                         fmm1dg = sw,
                         store1dg = sw,
                         store2dg = sw,
                         store3dg = sw,
                         store4dg = sw,
                         store5dg = sw,
                         store6dg = sw }
                )

        CVMCluster cvm_clus (
                CVMClustName = MediatorCluster
                CVMNodeId = { mm7node1 = 0, mm7node2 = 1 }
                CVMTransport = gab
                CVMTimeout = 200
                )

        CVMVxconfigd cvm_vxconfigd (
                Critical = 0
                CVMVxconfigdArgs = { syslog }
                )

        cvm_clus requires cvm_vxconfigd
        vxfsckd requires cvm_clus


        // resource dependency tree
        //
        //      group cvm
        //      {
        //      CFSfsckd vxfsckd
        //          {
        //          CVMCluster cvm_clus
        //              {
        //              CVMVxconfigd cvm_vxconfigd
        //              }
        //          }
        //      }


group lic_DG (
        SystemList = { mm7node1 = 0, mm7node2 = 1 }
        AutoFailOver = 0
        Parallel = 1
        AutoStartList = { mm7node1, mm7node2 }
        )

        CFSMount cfsmount5 (
                Critical = 0
                MountPoint = "/var/opt/sentinel"
                BlockDevice = "/dev/vx/dsk/lic1dg/vol01"
                MountOpt @mm7node1 = "cluster"
                MountOpt @mm7node2 = "cluster"
                NodeList = { mm7node1, mm7node2 }
                )

        CFSMount cfsmount6 (
                Critical = 0
                MountPoint = "/var/opt/mediation/fmmdb"
                BlockDevice = "/dev/vx/dsk/fmm1dg/vol01"
                MountOpt @mm7node1 = "cluster"
                MountOpt @mm7node2 = "cluster"
                NodeList = { mm7node1, mm7node2 }
                )

        CVMVolDg cvmvoldg5 (
                Critical = 0
                CVMDiskGroup = lic1dg
                CVMActivation @mm7node1 = sw
                CVMActivation @mm7node2 = sw
                )

        CVMVolDg cvmvoldg6 (
                Critical = 0
                CVMDiskGroup = fmm1dg
                CVMActivation @mm7node1 = sw
                CVMActivation @mm7node2 = sw
                )

        requires group cvm online local firm
        cfsmount5 requires cvmvoldg5
        cfsmount6 requires cvmvoldg6


        // resource dependency tree
        //
        //      group lic_DG
        //      {
        //      CFSMount cfsmount5
        //          {
        //          CVMVolDg cvmvoldg5
        //          }
        //      CFSMount cfsmount6
        //          {
        //          CVMVolDg cvmvoldg6
        //          }
        //      }


group ora_DG (
        SystemList = { mm7node1 = 0, mm7node2 = 1 }
        AutoFailOver = 0
        Parallel = 1
        AutoStartList = { mm7node1, mm7node2 }
        )

        CFSMount cfsmount4 (
                Critical = 0
                MountPoint = "/var/opt/mediation/ora"
                BlockDevice = "/dev/vx/dsk/ora1dg/vol01"
                MountOpt @mm7node1 = "cluster"
                MountOpt @mm7node2 = "cluster"
                NodeList = { mm7node1, mm7node2 }
                )

        CVMVolDg cvmvoldg4 (
                Critical = 0
                CVMDiskGroup = ora1dg
                CVMActivation @mm7node1 = sw
                CVMActivation @mm7node2 = sw
                )

        requires group cvm online local firm
        cfsmount4 requires cvmvoldg4


        // resource dependency tree
        //
        //      group ora_DG
        //      {
        //      CFSMount cfsmount4
        //          {
        //          CVMVolDg cvmvoldg4
        //          }
        //      }
 

 

//Genesis

mikebounds
Level 6
Partner Accredited

Just to explain a little further:

If you kill Server 2 instantly (for example by pulling its power, shutting down the system boards, or disconnecting all power supplies), then Server 1 will race for the coordinator disks and will win the race, since Server 2 is down, and so Server 1 will online the service groups.

If you gracefully shut down Server 2 (using the shutdown command), the cluster is shut down cleanly, so VCS deports the disk groups, the reservations are released, and Server 1 cleanly takes over.

If you do something in between, like the "reboot" command, where the node takes a few seconds to go down but the rc scripts are not called, then I think what is happening is:

  1. Processes are killed, which means Server 1 races for the coordinator disks.
  2. Server 2 is not quite down yet, so either Server 2 wins the race or Server 1 can't eject the reservation keys on the data disks because Server 2 is still using them.

VCS is there to protect against hardware and software failure; it will not always protect you against administrative error. So you should educate administrators not to use the "reboot" command (or to "suspend" a system); they must use a graceful shutdown and, ideally, should switch services manually before shutting a node down.
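
For example, a tidy way to take mm7node2 down (using the group and node names from your posted main.cf purely as an illustration) is roughly:

# switch a specific failover group off the node you are about to shut down
hagrp -switch Mediator2 -to mm7node1

# or simply stop VCS locally and evacuate all failover groups in one step
hastop -local -evacuate

# then perform a graceful shutdown so the rc scripts run
shutdown -y -g0 -i6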

Reboots and "suspends" are still valid tests to run, as in some configurations VCS will help you out, so it is useful to know what you are protected against.

Mike

genesisb
Level 2

Thanks for the explanation Mike.

I ran a test with an instant/abrupt server shutdown, but I still get the same result: the other node panics as well, which results in a split-brain. So I have to clear the VXFEN keys manually to start the cluster again.

I also get the following messages on the console (from the other node) before the panic.

Oct 26 18:54:19 mm7node2 scsi: WARNING: /pci@0,0/pci8086,340a@3/pci111d,806c@0/pci111d,806c@2/pci1077,172@0/fp@0,0/disk@w500601613ba04e3b,3 (sd13):
Oct 26 18:54:19 mm7node2        Error for Command: write(10)               Error Level: Retryable
Oct 26 18:54:19 mm7node2 scsi:  Requested Block: 5617511                   Error Block: 5617511
Oct 26 18:54:19 mm7node2 scsi:  Vendor: DGC                                Serial Number: 640000AF4ECL
Oct 26 18:54:19 mm7node2 scsi:  Sense Key: Unit Attention
Oct 26 18:54:19 mm7node2 scsi:  ASC: 0x2a (reservations released), ASCQ: 0x4, FRU: 0x0
Oct 26 18:54:19 mm7node2 scsi: WARNING: /pci@0,0/pci8086,340a@3/pci111d,806c@0/pci111d,806c@2/pci1077,172@0/fp@0,0/disk@w500601613ba04e3b,6 (sd10):
Oct 26 18:54:19 mm7node2        Error for Command: write(10)               Error Level: Retryable
Oct 26 18:54:19 mm7node2 scsi:  Requested Block: 162722                    Error Block: 162722
Oct 26 18:54:19 mm7node2 scsi:  Vendor: DGC                                Serial Number: 670000AF7DCL
Oct 26 18:54:19 mm7node2 scsi:  Sense Key: Unit Attention
Oct 26 18:54:19 mm7node2 scsi:  ASC: 0x2a (reservations released), ASCQ: 0x4, FRU: 0x0
Oct 26 18:54:20 mm7node2 genunix: GAB INFO V-15-1-20036 Port f gen   30290d membership ;1
Oct 26 18:54:20 mm7node2 genunix: GAB INFO V-15-1-20038 Port f gen   30290d k_jeopardy 0
Oct 26 18:54:20 mm7node2 genunix: GAB INFO V-15-1-20040 Port f gen   30290d    visible 0
Oct 26 18:54:20 mm7node2 vxfs: I/O fencing is ON
Oct 26 18:54:22 mm7node2 scsi: WARNING: /pci@0,0/pci8086,340a@3/pci111d,806c@0/pci111d,806c@2/pci1077,172@0/fp@0,0/disk@w500601613ba04e3b,4 (sd12):
Oct 26 18:54:22 mm7node2        Error for Command: write(10)               Error Level: Retryable
Oct 26 18:54:22 mm7node2 scsi:  Requested Block: 5355455                   Error Block: 5355455
Oct 26 18:54:22 mm7node2 scsi:  Vendor: DGC                                Serial Number: 650000AF60CL

........

panic[cpu4]/thread=fffffe80018f9c60: VXFEN CRITICAL V-11-1-20 Local cluster node
 ejected from cluster to prevent potential data corruption.

fffffe80018f97f0 vxfen:vxfen_plat_panic+e7 ()
fffffe80018f98b0 vxfen:vxfen_grab_coord_disks+b46 ()
fffffe80018f98e0 vxfen:vxfen_grab_coord_pt+d8 ()
fffffe80018f9920 vxfen:vxfen_msg_node_left_ack+212 ()
fffffe80018f9970 vxfen:vxfen_process_client_msg+39a ()
fffffe80018f9aa0 vxfen:vxfen_vrfsm_cback+cfd ()
fffffe80018f9b40 vxfen:vrfsm_step+42e ()
fffffe80018f9bc0 vxfen:vrfsm_msg_dispatch+2e9 ()
fffffe80018f9c40 vxfen:vrfsm_recv_thread+193 ()
fffffe80018f9c50 unix:thread_start+8 ()

 

Thanks.

 

Genesis

mikebounds
Level 6
Partner Accredited

This is strange. If you kill one node, the other one should not panic as long as it can grab a majority of the coordinator disks, so you should check that this node is able to grab all the coordinator disks.  Are the disk paths in the console messages above (sd13, sd10 & sd12) the 3 coordinator disks? (You can use my download at https://www-secure.symantec.com/connect/downloads/relating-devices-reported-solaris-sar-utility-vxvm... to find out how these disks map to VM disks.)
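
A rough way to check this from the surviving node (command options vary a little between SF versions, so treat this as a sketch):

# name of the coordinator disk group
cat /etc/vxfendg

# list disk groups and disks so you can map the coordinator DG to its devices
vxdisk -o alldgs list

# fencing driver status and cluster membership as vxfen sees it
vxfenadm -d

# show the SCSI-3 keys currently registered on the coordinator disks
# (older releases use -g instead of -s)
vxfenadm -s all -f /etc/vxfentab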

Mike

joseph_dangelo
Level 6
Employee Accredited

I would have to agree with Mike; it would seem you actually have disk connectivity issues.  This may very well be the reason the surviving node cannot properly take ownership of the fencing disks, so essentially both nodes lose the fencing race.

Do you see the same SCSI errors while both nodes are operational?
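
A couple of quick checks that might help narrow that down (purely illustrative; the controller and enclosure names will differ on your system):

# DMP view of the controllers and whether any paths are disabled
vxdmpadm listctlr all

# per-disk error counters on Solaris; rising transport errors usually point
# at a path/connectivity problem rather than at fencing itself
iostat -En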

Joe D

Tmy_70
Level 5
Partner Accredited Certified