
CVMCluster:???:monitor:node - state: out of cluster

Hi,

 

I encountered a problem when starting up the cluster: when I checked, the CVM disk group resources were faulted and could not come online on both nodes.

root@devuaedbs31 # hastatus -sum

 

-- SYSTEM STATE

-- System               State                Frozen

 

A  devuaedbs31          RUNNING              0

A  devuaedbs32          RUNNING              0

 

-- GROUP STATE

-- Group           System               Probed     AutoDisabled    State

 

B  MNICB_iUATgroup devuaedbs31          Y          N               ONLINE

B  MNICB_iUATgroup devuaedbs32          Y          N               ONLINE

B  cvm             devuaedbs31          Y          N               PARTIAL

B  cvm             devuaedbs32          Y          N               OFFLINE|FAULTED

B  dbs_rac_bkup    devuaedbs31          Y          N               ONLINE

B  dbs_rac_bkup    devuaedbs32          Y          N               OFFLINE

B  oraclerac_db1   devuaedbs31          Y          N               OFFLINE

B  oraclerac_db1   devuaedbs32          Y          N               OFFLINE

B  oraclerac_db2   devuaedbs31          Y          N               OFFLINE

B  oraclerac_db2   devuaedbs32          Y          N               OFFLINE

B  oraclerac_db3   devuaedbs31          Y          N               OFFLINE

B  oraclerac_db3   devuaedbs32          Y          N               OFFLINE

B  oraclerac_db4   devuaedbs31          Y          N               OFFLINE

B  oraclerac_db4   devuaedbs32          Y          N               OFFLINE

B  oraclerac_db5   devuaedbs31          Y          N               ONLINE

B  oraclerac_db5   devuaedbs32          Y          N               OFFLINE|FAULTED

B  oraclerac_db6   devuaedbs31          Y          N               OFFLINE

B  oraclerac_db6   devuaedbs32          Y          N               OFFLINE

 

-- RESOURCES FAILED

-- Group           Type                 Resource             System

 

C  cvm             CVMVolDg             diskgroup_crsdg      devuaedbs32

C  oraclerac_db5   CVMVolDg             diskgroup_db5        devuaedbs32

 

-- RESOURCES OFFLINING

-- Group           Type            Resource             System               IState

 

F  cvm             CVMCluster      cvm_clus             devuaedbs31          W_OFFLINE

root@devuaedbs31 # cfscluster status

  Node             :  devuaedbs31

  Cluster Manager  :  running

  CVM state        :  running

  MOUNT POINT    SHARED VOLUME  DISK GROUP        STATUS

 

  Node             :  devuaedbs32

  Cluster Manager  :  running

  CVM state        :  not-running

  MOUNT POINT    SHARED VOLUME  DISK GROUP        STATUS

  List of mount points registered with cluster-configuration

  but not associated with any node: []

Before starting the cluster, I edited main.cf to stop VCS from bringing the service groups online automatically, then ran hacf -verify . with no errors. I then started VCS, but the cvm group is not able to start.

 

Please advise.

 

Many Thanks,

 

3 Replies

Re: CVMCluster:???:monitor:node - state: out of cluster

@Home_224 

I have moved your post to the Cluster forum.
Nobody is monitoring the Documentation forum. 


Re: CVMCluster:???:monitor:node - state: out of cluster

Your cluster is up (HAD on each node is running), but vxconfigd on each node is not in cluster mode.

 

if you run

vxdctl -c mode

 

you would see output something like

mode: enabled: cluster inactive

 

If there is no hardware issue with the storage, you should be able to clear the faults by running:

1. hastop -all -force

2. vxclustadm startnode     <<< run this command on each node

3. gabconfig -a             <<< check that ports u and w are open

4. vxdctl -c mode           <<< check that the output shows the cluster is active

5. if vxconfigd on each node is in cluster mode, run hastart on each node to start VCS
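The steps above can be collected into a small script. This is a sketch only: the run() wrapper just prints each command so you can review the sequence; remove the echo to execute for real, and run the per-node commands on each node.

```shell
#!/bin/sh
# Sketch of the recovery sequence above. run() is a dry-run wrapper that
# only prints each command; remove the echo to actually execute them.
run() { echo "+ $*"; }

run hastop -all -force       # 1. stop VCS on all nodes, leave applications up
run vxclustadm startnode     # 2. run on EACH node to join the CVM cluster
run gabconfig -a             # 3. verify ports u and w are open
run vxdctl -c mode           # 4. should now report the cluster as active
run hastart                  # 5. run on each node once vxconfigd is in cluster mode
```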

 

 


Re: CVMCluster:???:monitor:node - state: out of cluster

We can see two failed CVMVolDg resources on host devuaedbs32

 

-- RESOURCES FAILED

 

-- Group           Type                 Resource             System

 

 

C  cvm             CVMVolDg             diskgroup_crsdg      devuaedbs32

C  oraclerac_db5   CVMVolDg             diskgroup_db5        devuaedbs32

 

 

  Earlier in the engine_A.log, the diskgroups were not imported as shared diskgroups.

 

2019/06/20 19:44:54 VCS ERROR V-16-10001-1010 (devuaedbs31) CVMVolDg:diskgroup_crsdg:online:online_change_activation: can not change activation of dg crs_dg to shared-write

2019/06/20 19:44:54 VCS WARNING V-16-10001-1025 (devuaedbs31) CVMVolDg:diskgroup_crsdg:online:Can not set diskgroup crs_dg activation to sw

2019/06/20 19:44:54 VCS ERROR V-16-10001-1045 (devuaedbs31) CVMVolDg:diskgroup_crsdg:online:Initial check failed

2019/06/20 19:44:55 VCS INFO V-16-2-13001 (devuaedbs31) Resource(diskgroup_crsdg): Output of the completed operation (online)                                                                                      

VxVM vxdg ERROR V-5-1-3268  activation failed: Disk group crs_dg: shared-write: Invalid mode for non-shared disk group                         

 

We don't see the same invalid mode message during the failure an hour later, but the diskgroup resource is still being marked as faulted.

 

 

 

2019/06/20 21:01:55 VCS NOTICE V-16-1-10301 Initiating Online of Resource diskgroup_crsdg (Owner: unknown, Group: cvm) on System devuaedbs32

2019/06/20 21:01:55 VCS WARNING V-16-10001-1074 (devuaedbs32) CVMVolDg:diskgroup_crsdg:online:setup_vxnotify: old vxnotify of pid 1496 will be killed. my pid is 389

2019/06/20 21:01:58 VCS ERROR V-16-10001-1009 (devuaedbs32) CVMVolDg:diskgroup_crsdg:online:could not find diskgroup crs_dg imported. If it was previously deported, it will have to be manually imported

2019/06/20 21:01:58 VCS ERROR V-16-10001-1044 (devuaedbs32) CVMVolDg:diskgroup_crsdg:online:Error saving vxprint file

2019/06/20 21:01:59 VCS INFO V-16-2-13001 (devuaedbs32) Resource(diskgroup_crsdg): Output of the completed operation (online)                                                                                      

VxVM vxprint ERROR V-5-1-582 Disk group crs_dg: No such disk group                                                                                                                                                

2019/06/20 21:01:10 VCS ERROR V-16-2-13066 (devuaedbs32) Agent is calling clean for resource(diskgroup_crsdg) because the resource is not up even after online completed.                                         

2019/06/20 21:01:11 VCS INFO V-16-2-13001 (devuaedbs32) Resource(diskgroup_crsdg): Output of the completed operation (clean)                                                                                      

/var/VRTSvcs/lock/diskgroup_crsdg_crs_dg_stat: No such file or directory                                                                                                                                           

2019/06/20 21:01:11 VCS INFO V-16-2-13068 (devuaedbs32) Resource(diskgroup_crsdg) - clean completed successfully.                                                                                                  

2019/06/20 21:01:11 VCS INFO V-16-2-13071 (devuaedbs32) Resource(diskgroup_crsdg): reached OnlineRetryLimit(0).                                                                                                    

2019/06/20 21:01:12 VCS ERROR V-16-1-10303 Resource diskgroup_crsdg (Owner: unknown, Group: cvm) is FAULTED (timed out) on sys devuaedbs32

 

 

Once the resource diskgroup_crsdg is marked as faulted VCS initiates an offline of the CVM service group.

 

 

 

2019/06/20 21:01:12 VCS NOTICE V-16-1-10300 Initiating Offline of Resource vxfsckd (Owner: unknown, Group: cvm) on System devuaedbs32                                                                             

2019/06/20 21:01:12 VCS INFO V-16-6-15004 (devuaedbs32) hatrigger:Failed to send trigger for resfault; script doesn't exist                                                                                       

2019/06/20 21:01:14 VCS INFO V-16-1-10305 Resource vxfsckd (Owner: unknown, Group: cvm) is offline on devuaedbs32 (VCS initiated)                                                                                 

2019/06/20 21:01:14 VCS NOTICE V-16-1-10300 Initiating Offline of Resource qlogckd (Owner: unknown, Group: cvm) on System devuaedbs32                                                                             

2019/06/20 21:01:15 VCS INFO V-16-2-13001 (devuaedbs32) Resource(qlogckd): Output of the completed operation (offline)                                                                                             

UX:vxfs qlogprint: INFO: V-3-22897: There are no QuickLog devices active                                                                                                                                           

2019/06/20 21:01:16 VCS INFO V-16-1-10305 Resource qlogckd (Owner: unknown, Group: cvm) is offline on devuaedbs32 (VCS initiated)                                                                                 

2019/06/20 21:01:16 VCS NOTICE V-16-1-10300 Initiating Offline of Resource cvm_clus (Owner: unknown, Group: cvm) on System devuaedbs32

2019/06/20 21:01:18 VCS ERROR V-16-10001-1005 (devuaedbs32) CVMCluster:???:monitor:node - state: out of cluster
reason: user initiated stop

2019/06/20 21:01:19 VCS INFO V-16-1-10305 Resource cvm_clus (Owner: unknown, Group: cvm) is offline on devuaedbs32 (VCS initiated)

2019/06/20 21:01:19 VCS ERROR V-16-1-10205 Group cvm is faulted on system devuaedbs32

2019/06/20 21:01:19 VCS NOTICE V-16-1-10446 Group cvm is offline on system devuaedbs32

 

 

The same happens for the group oraclerac_db5 when the diskgroup resource diskgroup_db5 is faulted.

 

 

 

I recommend moving the diskgroup resource diskgroup_crsdg to another service group, or marking it as non-critical, to prevent VCS from taking the whole group offline when that resource faults.
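Marking the resource non-critical uses the standard VCS sequence: open the configuration read-write, set the resource's Critical attribute to 0, then dump and close the configuration. A dry-run sketch (the run() wrapper only prints each command; remove the echo to apply, and confirm the resource name against your main.cf first):

```shell
#!/bin/sh
# Dry-run sketch: make diskgroup_crsdg non-critical so a fault on it does
# not take the whole cvm service group offline. Remove the echo to apply.
run() { echo "+ $*"; }

run haconf -makerw                            # open the VCS configuration read-write
run hares -modify diskgroup_crsdg Critical 0  # Critical=0 -> non-critical resource
run haconf -dump -makero                      # save main.cf and close it read-only
```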

 

 

As mentioned earlier we can start CVM outside of VCS.

 

 

 

[root@server101 ~]# gabconfig -a

GAB Port Memberships

===============================================================

Port a gen   332404 membership 01

Port b gen   332407 membership 01

Port d gen   332403 membership 01

 

 

[root@server101 ~]# vxclustadm -t gab -m vcs startnode

VxVM vxclustadm INFO V-5-2-9687 vxclustadm: Fencing driver is in disabled mode

 

[root@server101 ~]# gabconfig -a

GAB Port Memberships

===============================================================

Port a gen   332404 membership 01

Port b gen   332407 membership 01

Port d gen   332403 membership 01

Port m gen   33240c membership 01

Port v gen   33240e membership 01

Port y gen   33240d membership 01

 

Verify the host has successfully joined the cluster using vxclustadm.

 

[root@server101 ~]# vxclustadm nidmap

Name                             CVM Nid    CM Nid     State              

server101                        0          0          Joined: Slave      

server102                        2          1          Joined: Master     

 

 

  Next you want to make sure the CVM diskgroups are imported as shared.

 

 

[root@server101 ~]# vxdg list

NAME         STATE           ID

lockdg       enabled,shared,cds   1561033675.95.server101

datadg       enabled,shared,cds   1561032806.93.server101

 

 

  If they are not imported, attempt to import them with the shared flag.

 

#vxdg -s import datadg

 

 If you receive the following error, you will need to deport the diskgroup from all nodes in the cluster before it can be imported as shared.

[root@server101 ~]# vxdg -s import datadg

VxVM vxdg ERROR V-5-1-19179 Disk group datadg: import failed:

Disk is in use by another host
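In that case, the deport and shared re-import can be sketched as follows, using datadg from the example above. Again a dry run: run() only prints each command, and the deport must be executed on every node that still has the diskgroup imported.

```shell
#!/bin/sh
# Dry-run sketch: deport the diskgroup everywhere, then re-import it as
# shared. Remove the echo to actually execute the commands.
run() { echo "+ $*"; }

run vxdg deport datadg      # repeat on every node that still has it imported
run vxdg -s import datadg   # -s imports the diskgroup as shared
run vxdg list               # state should now include enabled,shared
```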

 

Once the diskgroups are successfully imported as shared, start VCS again.

#hastart

Please let me know if this helps.

ED