
CVMCluster:???:monitor:node - state: out of cluster

Home_224
Level 6

Hi,

 

I ran into a problem when starting up the cluster: the CVM disk group resources are faulted, and CVM cannot come up on both nodes.

root@devuaedbs31 # hastatus -sum

 

-- SYSTEM STATE

-- System               State                Frozen

 

A  devuaedbs31          RUNNING              0

A  devuaedbs32          RUNNING              0

 

-- GROUP STATE

-- Group           System               Probed     AutoDisabled    State

 

B  MNICB_iUATgroup devuaedbs31          Y          N               ONLINE

B  MNICB_iUATgroup devuaedbs32          Y          N               ONLINE

B  cvm             devuaedbs31          Y          N               PARTIAL

B  cvm             devuaedbs32          Y          N               OFFLINE|FAULTED

B  dbs_rac_bkup    devuaedbs31          Y          N               ONLINE

B  dbs_rac_bkup    devuaedbs32          Y          N               OFFLINE

B  oraclerac_db1   devuaedbs31          Y          N               OFFLINE

B  oraclerac_db1   devuaedbs32          Y          N               OFFLINE

B  oraclerac_db2   devuaedbs31          Y          N               OFFLINE

B  oraclerac_db2   devuaedbs32          Y          N               OFFLINE

B  oraclerac_db3   devuaedbs31          Y          N               OFFLINE

B  oraclerac_db3   devuaedbs32          Y          N               OFFLINE

B  oraclerac_db4   devuaedbs31          Y          N               OFFLINE

B  oraclerac_db4   devuaedbs32          Y          N               OFFLINE

B  oraclerac_db5   devuaedbs31          Y          N               ONLINE

B  oraclerac_db5   devuaedbs32          Y          N               OFFLINE|FAULTED

B  oraclerac_db6   devuaedbs31          Y          N               OFFLINE

B  oraclerac_db6   devuaedbs32          Y          N               OFFLINE

 

-- RESOURCES FAILED

-- Group           Type                 Resource             System

 

C  cvm             CVMVolDg             diskgroup_crsdg      devuaedbs32

C  oraclerac_db5   CVMVolDg             diskgroup_db5        devuaedbs32

 

-- RESOURCES OFFLINING

-- Group           Type            Resource             System               IState

 

F  cvm             CVMCluster      cvm_clus             devuaedbs31          W_OFFLINE

root@devuaedbs31 # cfscluster status

  Node             :  devuaedbs31

  Cluster Manager  :  running

  CVM state        :  running

  MOUNT POINT    SHARED VOLUME  DISK GROUP        STATUS

 

  Node             :  devuaedbs32

  Cluster Manager  :  running

  CVM state        :  not-running

  MOUNT POINT    SHARED VOLUME  DISK GROUP        STATUS

  List of mount points registered with cluster-configuration

  but not associated with any node: []

Before starting up, I edited main.cf to stop VCS from bringing the service groups online automatically, then ran hacf -verify with no errors. I then started VCS, but the cvm group fails to start.

 

Please advise.

 

Many Thanks,

 

3 REPLIES

Marianne
Moderator
Partner    VIP    Accredited Certified

@Home_224 

I have moved your post to the Cluster forum.
Nobody is monitoring the Documentation forum. 

frankgfan
Moderator
   VIP   

Your cluster is up (HAD on each node is running), but vxconfigd on each node is not in cluster mode.

 

if you run

vxdctl -c mode

 

you would see output something like:

mode: enabled: cluster inactive

 

If there is no hardware issue with the storage, you should be able to clear the faults by running:

1. hastop -all -force

2. vxclustadm startnode     <<< run this command on each node

3. gabconfig -a         <<< check if ports u, w are open

4. vxdctl -c mode       <<< check that the output shows the cluster is active

5. If vxconfigd on each node is in cluster mode, run

 

hastart on each node to start VCS.
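The steps above could be scripted roughly as follows. This is only a sketch; it assumes the storage has no hardware faults, and steps 2-5 must run on each node:

```shell
#!/bin/sh
# Sketch of the fault-clearing sequence above. Run steps 2-5 on EACH node.
# Assumes no hardware issue with the storage.

hastop -all -force                    # 1. stop VCS cluster-wide (apps stay up)
vxclustadm -t gab -m vcs startnode    # 2. join this node to the CVM cluster
gabconfig -a                          # 3. check that the CVM ports are open

# 4./5. only restart VCS once vxconfigd reports cluster mode
if vxdctl -c mode | grep -q 'cluster active'; then
    hastart                           # start VCS on this node
else
    echo "vxconfigd is not in cluster mode; investigate before hastart" >&2
fi
```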

 

 

We can see two failed CVMVolDg resources on host devuaedbs32

 

-- RESOURCES FAILED

 

-- Group           Type                 Resource             System

 

 

C  cvm             CVMVolDg             diskgroup_crsdg      devuaedbs32

C  oraclerac_db5   CVMVolDg             diskgroup_db5        devuaedbs32

 

 

Earlier in engine_A.log, the diskgroups were not imported as shared diskgroups.

 

2019/06/20 19:44:54 VCS ERROR V-16-10001-1010 (devuaedbs31) CVMVolDg:diskgroup_crsdg:online:online_change_activation: can not change activation of dg crs_dg to shared-write                                      

2019/06/20 19:44:54 VCS WARNING V-16-10001-1025 (devuaedbs31) CVMVolDg:diskgroup_crsdg:online:Can not set diskgroup crs_dg activation to sw                                                                       

2019/06/20 19:44:54 VCS ERROR V-16-10001-1045 (devuaedbs31) CVMVolDg:diskgroup_crsdg:online:Initial check failed                                                                                                  

2019/06/20 19:44:55 VCS INFO V-16-2-13001 (devuaedbs31) Resource(diskgroup_crsdg): Output of the completed operation (online)                                                                                      

VxVM vxdg ERROR V-5-1-3268  activation failed: Disk group crs_dg: shared-write: Invalid mode for non-shared disk group                         

 

We don't see the same invalid-mode message during the failure an hour later, but the diskgroup resource is still marked as faulted.

 

 

 

2019/06/20 21:01:55 VCS NOTICE V-16-1-10301 Initiating Online of Resource diskgroup_crsdg (Owner: unknown, Group: cvm) on System devuaedbs32
2019/06/20 21:01:55 VCS WARNING V-16-10001-1074 (devuaedbs32) CVMVolDg:diskgroup_crsdg:online:setup_vxnotify: old vxnotify of pid 1496 will be killed. my pid is 389

2019/06/20 21:01:58 VCS ERROR V-16-10001-1009 (devuaedbs32) CVMVolDg:diskgroup_crsdg:online:could not find diskgroup crs_dg imported. If it was previously deported, it will have to be manually imported         

2019/06/20 21:01:58 VCS ERROR V-16-10001-1044 (devuaedbs32) CVMVolDg:diskgroup_crsdg:online:Error saving vxprint file                                                                                              

2019/06/20 21:01:59 VCS INFO V-16-2-13001 (devuaedbs32) Resource(diskgroup_crsdg): Output of the completed operation (online)                                                                                      

VxVM vxprint ERROR V-5-1-582 Disk group crs_dg: No such disk group                                                                                                                                                

2019/06/20 21:01:10 VCS ERROR V-16-2-13066 (devuaedbs32) Agent is calling clean for resource(diskgroup_crsdg) because the resource is not up even after online completed.                                         

2019/06/20 21:01:11 VCS INFO V-16-2-13001 (devuaedbs32) Resource(diskgroup_crsdg): Output of the completed operation (clean)                                                                                      

/var/VRTSvcs/lock/diskgroup_crsdg_crs_dg_stat: No such file or directory                                                                                                                                           

2019/06/20 21:01:11 VCS INFO V-16-2-13068 (devuaedbs32) Resource(diskgroup_crsdg) - clean completed successfully.                                                                                                  

2019/06/20 21:01:11 VCS INFO V-16-2-13071 (devuaedbs32) Resource(diskgroup_crsdg): reached OnlineRetryLimit(0).                                                                                                    

2019/06/20 21:01:12 VCS ERROR V-16-1-10303 Resource diskgroup_crsdg (Owner: unknown, Group: cvm) is FAULTED (timed out) on sys devuaedbs32

 

 

Once the resource diskgroup_crsdg is marked as faulted VCS initiates an offline of the CVM service group.

 

 

 

2019/06/20 21:01:12 VCS NOTICE V-16-1-10300 Initiating Offline of Resource vxfsckd (Owner: unknown, Group: cvm) on System devuaedbs32                                                                             

2019/06/20 21:01:12 VCS INFO V-16-6-15004 (devuaedbs32) hatrigger:Failed to send trigger for resfault; script doesn't exist                                                                                       

2019/06/20 21:01:14 VCS INFO V-16-1-10305 Resource vxfsckd (Owner: unknown, Group: cvm) is offline on devuaedbs32 (VCS initiated)                                                                                 

2019/06/20 21:01:14 VCS NOTICE V-16-1-10300 Initiating Offline of Resource qlogckd (Owner: unknown, Group: cvm) on System devuaedbs32                                                                             

2019/06/20 21:01:15 VCS INFO V-16-2-13001 (devuaedbs32) Resource(qlogckd): Output of the completed operation (offline)                                                                                             

UX:vxfs qlogprint: INFO: V-3-22897: There are no QuickLog devices active                                                                                                                                           

2019/06/20 21:01:16 VCS INFO V-16-1-10305 Resource qlogckd (Owner: unknown, Group: cvm) is offline on devuaedbs32 (VCS initiated)                                                                                 

2019/06/20 21:01:16 VCS NOTICE V-16-1-10300 Initiating Offline of Resource cvm_clus (Owner: unknown, Group: cvm) on System devuaedbs32
2019/06/20 21:01:18 VCS ERROR V-16-10001-1005 (devuaedbs32) CVMCluster:???:monitor:node - state: out of cluster reason: user initiated stop
2019/06/20 21:01:19 VCS INFO V-16-1-10305 Resource cvm_clus (Owner: unknown, Group: cvm) is offline on devuaedbs32 (VCS initiated)

2019/06/20 21:01:19 VCS ERROR V-16-1-10205 Group cvm is faulted on system devuaedbs32
2019/06/20 21:01:19 VCS NOTICE V-16-1-10446 Group cvm is offline on system devuaedbs32

 

 

The same happens for the group oraclerac_db5 when the diskgroup resource diskgroup_db5 is faulted.

 

 

 

I recommend moving the diskgroup resource diskgroup_crsdg to another service group, or marking it as non-critical, to prevent VCS from taking the cvm group offline when the resource faults.
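If you go the non-critical route, a minimal sketch looks like this (the resource name is taken from the hastatus output above; Critical is a standard VCS resource attribute):

```shell
# Sketch: make the diskgroup resource non-critical so a fault on it alone
# does not pull the whole cvm service group offline. Run on one node.
haconf -makerw                            # open the cluster configuration
hares -modify diskgroup_crsdg Critical 0  # 0 = non-critical
haconf -dump -makero                      # write main.cf and close the config
```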

 

 

As mentioned earlier we can start CVM outside of VCS.

 

 

 

[root@server101 ~]# gabconfig -a

GAB Port Memberships

===============================================================

Port a gen   332404 membership 01

Port b gen   332407 membership 01

Port d gen   332403 membership 01

 

 

[root@server101 ~]# vxclustadm -t gab -m vcs startnode

VxVM vxclustadm INFO V-5-2-9687 vxclustadm: Fencing driver is in disabled mode

 

[root@server101 ~]# gabconfig -a

GAB Port Memberships

===============================================================

Port a gen   332404 membership 01

Port b gen   332407 membership 01

Port d gen   332403 membership 01

Port m gen   33240c membership 01

Port v gen   33240e membership 01

Port y gen   33240d membership 01

 

Verify the host has successfully joined the cluster using vxclustadm.

 

[root@server101 ~]# vxclustadm nidmap

Name                             CVM Nid    CM Nid     State              

server101                        0          0          Joined: Slave      

server102                        2          1          Joined: Master     

 

 

Next, you want to make sure the CVM diskgroups are imported as shared.

 

 

[root@server101 ~]# vxdg list

NAME         STATE           ID

lockdg       enabled,shared,cds   1561033675.95.server101

datadg       enabled,shared,cds   1561032806.93.server101

 

 

  If they are not imported, attempt to import them with the shared flag.

 

#vxdg -s import datadg

 

 If you receive the following error, you will need to deport the diskgroup from all nodes in the cluster before it can be imported as shared.

[root@server101 ~]# vxdg -s import datadg

VxVM vxdg ERROR V-5-1-19179 Disk group datadg: import failed:

Disk is in use by another host
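To clear that error, deport first and then retry the shared import. A sketch, using the datadg name from above:

```shell
# Sketch: release any stale ordinary import, then re-import as shared.
# Run the deport on EVERY node that might still hold the import:
vxdg deport datadg

# Then, on the CVM master node, import with the shared flag:
vxdg -s import datadg
vxdg list datadg        # STATE should now include "enabled,shared"
```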

 

Once the diskgroups are successfully imported as shared, start VCS again.

#hastart

Please let me know if this helps.

ED