Forum Discussion

Home_224
Level 6
6 years ago

CVMCluster:???:monitor:node - state: out of cluster

Hi ,

 

I encountered a problem when starting up the cluster. When I checked, the CVM disk group resources were faulted and would not come online on both nodes.

root@devuaedbs31 # hastatus -sum

 

-- SYSTEM STATE

-- System               State                Frozen

 

A  devuaedbs31          RUNNING              0

A  devuaedbs32          RUNNING              0

 

-- GROUP STATE

-- Group           System               Probed     AutoDisabled    State

 

B  MNICB_iUATgroup devuaedbs31          Y          N               ONLINE

B  MNICB_iUATgroup devuaedbs32          Y          N               ONLINE

B  cvm             devuaedbs31          Y          N               PARTIAL

B  cvm             devuaedbs32          Y          N               OFFLINE|FAULTED

B  dbs_rac_bkup    devuaedbs31          Y          N               ONLINE

B  dbs_rac_bkup    devuaedbs32          Y          N               OFFLINE

B  oraclerac_db1   devuaedbs31          Y          N               OFFLINE

B  oraclerac_db1   devuaedbs32          Y          N               OFFLINE

B  oraclerac_db2   devuaedbs31          Y          N               OFFLINE

B  oraclerac_db2   devuaedbs32          Y          N               OFFLINE

B  oraclerac_db3   devuaedbs31          Y          N               OFFLINE

B  oraclerac_db3   devuaedbs32          Y          N               OFFLINE

B  oraclerac_db4   devuaedbs31          Y          N               OFFLINE

B  oraclerac_db4   devuaedbs32          Y          N               OFFLINE

B  oraclerac_db5   devuaedbs31          Y          N               ONLINE

B  oraclerac_db5   devuaedbs32          Y          N               OFFLINE|FAULTED

B  oraclerac_db6   devuaedbs31          Y          N               OFFLINE

B  oraclerac_db6   devuaedbs32          Y          N               OFFLINE

 

-- RESOURCES FAILED

-- Group           Type                 Resource             System

 

C  cvm             CVMVolDg             diskgroup_crsdg      devuaedbs32

C  oraclerac_db5   CVMVolDg             diskgroup_db5        devuaedbs32

 

-- RESOURCES OFFLINING

-- Group           Type            Resource             System               IState

 

F  cvm             CVMCluster      cvm_clus             devuaedbs31          W_OFFLINE

root@devuaedbs31 # cfscluster status

  Node             :  devuaedbs31

  Cluster Manager  :  running

  CVM state        :  running

  MOUNT POINT    SHARED VOLUME  DISK GROUP        STATUS

 

  Node             :  devuaedbs32

  Cluster Manager  :  running

  CVM state        :  not-running

  MOUNT POINT    SHARED VOLUME  DISK GROUP        STATUS

  List of mount points registered with cluster-configuration

  but not associated with any node: []

Before starting up, I edited main.cf to stop VCS from bringing the service groups online automatically, then ran hacf -verify with no errors. I then started VCS, but the cvm group was not able to start up.
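
For reference, this is roughly the sequence I used (the configuration path below is the standard /etc/VRTSvcs/conf/config location; adjust if yours differs):

hacf -verify /etc/VRTSvcs/conf/config     <<< syntax check of main.cf
hastart                                   <<< start VCS on each node
hastatus -sum                             <<< check system and group states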

 

Please advise.

 

Many Thanks,

 

  • Your cluster is up (had is running on each node), but vxconfigd on each node is not in cluster mode.

     

    if you run

    vxdctl -c mode

     

    you would see output something like

    mode: enabled: cluster inactive

     

    If there is no hardware issue with the storage, you should be able to clear the faults by running:

    1. hastop -all -force

    2. vxclustadm startnode     <<< run this command on each node

    3. gabconfig -a             <<< check if ports u, w are open

    4. vxdctl -c mode           <<< check that the output shows the cluster is active

    5. if vxconfigd on each node is in cluster mode, run hastart on each node to start VCS
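
    After step 2, the mode check on each node should switch from "cluster inactive" to active; roughly like the following (output format varies by Volume Manager version, and the master node shown is just an example from this thread):

    # vxdctl -c mode
    mode: enabled: cluster active - SLAVE
    master: devuaedbs31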

     

     

    • EdwardC
      Level 3

      We can see two failed CVMVolDg resources on host devuaedbs32

       

      -- RESOURCES FAILED

       

      -- Group           Type                 Resource             System

       

       

      C  cvm             CVMVolDg             diskgroup_crsdg      devuaedbs32

      C  oraclerac_db5   CVMVolDg             diskgroup_db5        devuaedbs32

       

       

      Earlier in the engine_A.log, the diskgroups had not been imported as shared diskgroups.

       

      2019/06/20 19:44:54 VCS ERROR V-16-10001-1010 (devuaedbs31) CVMVolDg:diskgroup_crsdg:online:online_change_activation: can not change activation of dg crs_dg to shared-write                                      

      2019/06/20 19:44:54 VCS WARNING V-16-10001-1025 (devuaedbs31) CVMVolDg:diskgroup_crsdg:online:Can not set diskgroup crs_dg activation to sw                                                                       

      2019/06/20 19:44:54 VCS ERROR V-16-10001-1045 (devuaedbs31) CVMVolDg:diskgroup_crsdg:online:Initial check failed                                                                                                  

      2019/06/20 19:44:55 VCS INFO V-16-2-13001 (devuaedbs31) Resource(diskgroup_crsdg): Output of the completed operation (online)                                                                                      

      VxVM vxdg ERROR V-5-1-3268  activation failed: Disk group crs_dg: shared-write: Invalid mode for non-shared disk group                         

       

      We don't see the same invalid mode message during the failure an hour later, but the diskgroup resource is still being marked as faulted.

       

       

       

      2019/06/20 21:01:55 VCS NOTICE V-16-1-10301 Initiating Online of Resource diskgroup_crsdg (Owner: unknown, Group: cvm) on System devuaedbs32
      2019/06/20 21:01:55 VCS WARNING V-16-10001-1074 (devuaedbs32) CVMVolDg:diskgroup_crsdg:online:setup_vxnotify: old vxnotify of pid 1496 will be killed. my pid is 389

      2019/06/20 21:01:58 VCS ERROR V-16-10001-1009 (devuaedbs32) CVMVolDg:diskgroup_crsdg:online:could not find diskgroup crs_dg imported. If it was previously deported, it will have to be manually imported         

      2019/06/20 21:01:58 VCS ERROR V-16-10001-1044 (devuaedbs32) CVMVolDg:diskgroup_crsdg:online:Error saving vxprint file                                                                                              

      2019/06/20 21:01:59 VCS INFO V-16-2-13001 (devuaedbs32) Resource(diskgroup_crsdg): Output of the completed operation (online)                                                                                      

      VxVM vxprint ERROR V-5-1-582 Disk group crs_dg: No such disk group                                                                                                                                                

      2019/06/20 21:01:10 VCS ERROR V-16-2-13066 (devuaedbs32) Agent is calling clean for resource(diskgroup_crsdg) because the resource is not up even after online completed.                                         

      2019/06/20 21:01:11 VCS INFO V-16-2-13001 (devuaedbs32) Resource(diskgroup_crsdg): Output of the completed operation (clean)                                                                                      

      /var/VRTSvcs/lock/diskgroup_crsdg_crs_dg_stat: No such file or directory                                                                                                                                           

      2019/06/20 21:01:11 VCS INFO V-16-2-13068 (devuaedbs32) Resource(diskgroup_crsdg) - clean completed successfully.                                                                                                  

      2019/06/20 21:01:11 VCS INFO V-16-2-13071 (devuaedbs32) Resource(diskgroup_crsdg): reached OnlineRetryLimit(0).                                                                                                    

      2019/06/20 21:01:12 VCS ERROR V-16-1-10303 Resource diskgroup_crsdg (Owner: unknown, Group: cvm) is FAULTED (timed out) on sys devuaedbs32

       

       

      Once the resource diskgroup_crsdg is marked as faulted VCS initiates an offline of the CVM service group.

       

       

       

      2019/06/20 21:01:12 VCS NOTICE V-16-1-10300 Initiating Offline of Resource vxfsckd (Owner: unknown, Group: cvm) on System devuaedbs32                                                                             

      2019/06/20 21:01:12 VCS INFO V-16-6-15004 (devuaedbs32) hatrigger:Failed to send trigger for resfault; script doesn't exist                                                                                       

      2019/06/20 21:01:14 VCS INFO V-16-1-10305 Resource vxfsckd (Owner: unknown, Group: cvm) is offline on devuaedbs32 (VCS initiated)                                                                                 

      2019/06/20 21:01:14 VCS NOTICE V-16-1-10300 Initiating Offline of Resource qlogckd (Owner: unknown, Group: cvm) on System devuaedbs32                                                                             

      2019/06/20 21:01:15 VCS INFO V-16-2-13001 (devuaedbs32) Resource(qlogckd): Output of the completed operation (offline)                                                                                             

      UX:vxfs qlogprint: INFO: V-3-22897: There are no QuickLog devices active                                                                                                                                           

      2019/06/20 21:01:16 VCS INFO V-16-1-10305 Resource qlogckd (Owner: unknown, Group: cvm) is offline on devuaedbs32 (VCS initiated)                                                                                 

      2019/06/20 21:01:16 VCS NOTICE V-16-1-10300 Initiating Offline of Resource cvm_clus (Owner: unknown, Group: cvm) on System devuaedbs32
      2019/06/20 21:01:18 VCS ERROR V-16-10001-1005 (devuaedbs32) CVMCluster:???:monitor:node - state: out of cluster
      reason: user initiated stop
      2019/06/20 21:01:19 VCS INFO V-16-1-10305 Resource cvm_clus (Owner: unknown, Group: cvm) is offline on devuaedbs32 (VCS initiated)

      2019/06/20 21:01:19 VCS ERROR V-16-1-10205 Group cvm is faulted on system devuaedbs32
      2019/06/20 21:01:19 VCS NOTICE V-16-1-10446 Group cvm is offline on system devuaedbs32

       

       

      The same happens for the group oraclerac_db5 when the diskgroup resource diskgroup_db5 is faulted.

       

       

       

      I recommend moving the diskgroup resource diskgroup_crsdg to another service group, or marking it as non-critical to prevent VCS from taking it offline when the resource is faulted.
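
      If you take the non-critical route, a minimal sketch of the change (resource name taken from this thread; run on one node while VCS is up):

      haconf -makerw                               <<< open the configuration for writing
      hares -modify diskgroup_crsdg Critical 0     <<< a fault on this resource will no longer offline the cvm group
      haconf -dump -makero                         <<< save the configuration and make it read-only again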

       

       

      As mentioned earlier we can start CVM outside of VCS.

       

       

       

      [root@server101 ~]# gabconfig -a

      GAB Port Memberships

      ===============================================================

      Port a gen   332404 membership 01

      Port b gen   332407 membership 01

      Port d gen   332403 membership 01

       

       

      [root@server101 ~]# vxclustadm -t gab -m vcs startnode

      VxVM vxclustadm INFO V-5-2-9687 vxclustadm: Fencing driver is in disabled mode

       

      [root@server101 ~]# gabconfig -a

      GAB Port Memberships

      ===============================================================

      Port a gen   332404 membership 01

      Port b gen   332407 membership 01

      Port d gen   332403 membership 01

      Port m gen   33240c membership 01

      Port v gen   33240e membership 01

      Port y gen   33240d membership 01

       

      Verify the host has successfully joined the cluster using vxclustadm.

       

      [root@server101 ~]# vxclustadm nidmap

      Name                             CVM Nid    CM Nid     State              

      server101                        0          0          Joined: Slave      

      server102                        2          1          Joined: Master     

       

       

        Next you want to make sure the CVM diskgroups are imported as shared.

       

       

      [root@server101 ~]# vxdg list

      NAME         STATE           ID

      lockdg       enabled,shared,cds   1561033675.95.server101

      datadg       enabled,shared,cds   1561032806.93.server101

       

       

        If they are not imported, attempt to import them with the shared flag.

       

      #vxdg -s import datadg

       

       If you receive the following error, you will need to deport the diskgroup from all nodes in the cluster before it can be imported as shared.

      [root@server101 ~]# vxdg -s import datadg

      VxVM vxdg ERROR V-5-1-19179 Disk group datadg: import failed:

      Disk is in use by another host
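
      In that case, a rough sequence (the deport must be run on whichever node still holds the private import, and the shared import on the CVM master):

      vxdg deport datadg        <<< on the node that currently has the diskgroup imported
      vxdg -s import datadg     <<< on the CVM master, import it as shared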

       

      Once the diskgroups are successfully imported as shared, start VCS again.

      #hastart
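
      If any groups still show FAULTED in hastatus -sum after VCS is back up, the recorded fault can be cleared so VCS will retry them (sketch only; the group and system names are the ones from this thread):

      hagrp -clear cvm -sys devuaedbs32      <<< clear the fault on the affected node
      hagrp -online cvm -sys devuaedbs32     <<< then bring the group online there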

      Please let me know if this helps.

      ED