
CVMCluster:???:monitor:node - state: out of cluster

Home_224
Level 6

Hi,

 

I ran into a problem when starting up the cluster: the CVM disk group resources are faulted, and CVM cannot come up on both nodes.

root@devuaedbs31 # hastatus -sum

 

-- SYSTEM STATE

-- System               State                Frozen

 

A  devuaedbs31          RUNNING              0

A  devuaedbs32          RUNNING              0

 

-- GROUP STATE

-- Group           System               Probed     AutoDisabled    State

 

B  MNICB_iUATgroup devuaedbs31          Y          N               ONLINE

B  MNICB_iUATgroup devuaedbs32          Y          N               ONLINE

B  cvm             devuaedbs31          Y          N               PARTIAL

B  cvm             devuaedbs32          Y          N               OFFLINE|FAULTED

B  dbs_rac_bkup    devuaedbs31          Y          N               ONLINE

B  dbs_rac_bkup    devuaedbs32          Y          N               OFFLINE

B  oraclerac_db1   devuaedbs31          Y          N               OFFLINE

B  oraclerac_db1   devuaedbs32          Y          N               OFFLINE

B  oraclerac_db2   devuaedbs31          Y          N               OFFLINE

B  oraclerac_db2   devuaedbs32          Y          N               OFFLINE

B  oraclerac_db3   devuaedbs31          Y          N               OFFLINE

B  oraclerac_db3   devuaedbs32          Y          N               OFFLINE

B  oraclerac_db4   devuaedbs31          Y          N               OFFLINE

B  oraclerac_db4   devuaedbs32          Y          N               OFFLINE

B  oraclerac_db5   devuaedbs31          Y          N               ONLINE

B  oraclerac_db5   devuaedbs32          Y          N               OFFLINE|FAULTED

B  oraclerac_db6   devuaedbs31          Y          N               OFFLINE

B  oraclerac_db6   devuaedbs32          Y          N               OFFLINE

 

-- RESOURCES FAILED

-- Group           Type                 Resource             System

 

C  cvm             CVMVolDg             diskgroup_crsdg      devuaedbs32

C  oraclerac_db5   CVMVolDg             diskgroup_db5        devuaedbs32

 

-- RESOURCES OFFLINING

-- Group           Type            Resource             System               IState

 

F  cvm             CVMCluster      cvm_clus             devuaedbs31          W_OFFLINE

root@devuaedbs31 # cfscluster status

  Node             :  devuaedbs31

  Cluster Manager  :  running

  CVM state        :  running

  MOUNT POINT    SHARED VOLUME  DISK GROUP        STATUS

 

  Node             :  devuaedbs32

  Cluster Manager  :  running

  CVM state        :  not-running

  MOUNT POINT    SHARED VOLUME  DISK GROUP        STATUS

  List of mount points registered with cluster-configuration

  but not associated with any node: []

Before starting up, I edited main.cf to stop VCS from bringing the service groups online automatically, then ran hacf -verify with no errors. I then started VCS, but the cvm group fails to start.

 

Please advise.

 

Many Thanks,

 

3 REPLIES

Marianne
Moderator
Partner    VIP    Accredited Certified

@Home_224 

I have moved your post to the Cluster forum.
Nobody is monitoring the Documentation forum. 

frankgfan
Moderator
   VIP   

Your cluster is up (HAD on each node is running), but vxconfigd on each node is not in cluster mode.

 

if you run

vxdctl -c mode

 

you would see output something like:

mode: enabled: cluster inactive

 

If there is no hardware issue with the storage, you should be able to clear the faults by running:

1. hastop -all -force

2. vxclustadm startnode     <<< run this command on each node

3. gabconfig -a         <<< check if ports u, w are open

4. vxdctl -c mode       <<< check that the output shows the cluster is active

5. If vxconfigd on each node is in cluster mode, run

 

hastart on each node to start VCS.
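The steps above could be scripted roughly as follows. This is only a sketch; it assumes the storage has no hardware faults, and steps 2-5 must run on each node:

```shell
#!/bin/sh
# Sketch of the fault-clearing sequence above. Run steps 2-5 on EACH node.
# Assumes no hardware issue with the storage.

hastop -all -force                    # 1. stop VCS cluster-wide (apps stay up)
vxclustadm -t gab -m vcs startnode    # 2. join this node to the CVM cluster
gabconfig -a                          # 3. check that the CVM ports are open

# 4./5. only restart VCS once vxconfigd reports cluster mode
if vxdctl -c mode | grep -q 'cluster active'; then
    hastart                           # start VCS on this node
else
    echo "vxconfigd is not in cluster mode; investigate before hastart" >&2
fi
```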

 

 

We can see two failed CVMVolDg resources on host devuaedbs32

 

-- RESOURCES FAILED

 

-- Group           Type                 Resource             System

 

 

C  cvm             CVMVolDg             diskgroup_crsdg      devuaedbs32

C  oraclerac_db5   CVMVolDg             diskgroup_db5        devuaedbs32

 

 

Earlier in engine_A.log, the diskgroups were not imported as shared diskgroups.

 

2019/06/20 19:44:54 VCS ERROR V-16-10001-1010 (devuaedbs31) CVMVolDg:diskgroup_crsdg:online:online_change_activation: can not change activation of dg crs_dg to shared-write                                      

2019/06/20 19:44:54 VCS WARNING V-16-10001-1025 (devuaedbs31) CVMVolDg:diskgroup_crsdg:online:Can not set diskgroup crs_dg activation to sw                                                                       

2019/06/20 19:44:54 VCS ERROR V-16-10001-1045 (devuaedbs31) CVMVolDg:diskgroup_crsdg:online:Initial check failed                                                                                                  

2019/06/20 19:44:55 VCS INFO V-16-2-13001 (devuaedbs31) Resource(diskgroup_crsdg): Output of the completed operation (online)                                                                                      

VxVM vxdg ERROR V-5-1-3268  activation failed: Disk group crs_dg: shared-write: Invalid mode for non-shared disk group                         

 

We don't see the same invalid-mode message during the failure an hour later, but the diskgroup resource is still marked as faulted.

 

 

 

2019/06/20 21:01:55 VCS NOTICE V-16-1-10301 Initiating Online of Resource diskgroup_crsdg (Owner: unknown, Group: cvm) on System devuaedbs32
2019/06/20 21:01:55 VCS WARNING V-16-10001-1074 (devuaedbs32) CVMVolDg:diskgroup_crsdg:online:setup_vxnotify: old vxnotify of pid 1496 will be killed. my pid is 389

2019/06/20 21:01:58 VCS ERROR V-16-10001-1009 (devuaedbs32) CVMVolDg:diskgroup_crsdg:online:could not find diskgroup crs_dg imported. If it was previously deported, it will have to be manually imported         

2019/06/20 21:01:58 VCS ERROR V-16-10001-1044 (devuaedbs32) CVMVolDg:diskgroup_crsdg:online:Error saving vxprint file                                                                                              

2019/06/20 21:01:59 VCS INFO V-16-2-13001 (devuaedbs32) Resource(diskgroup_crsdg): Output of the completed operation (online)                                                                                      

VxVM vxprint ERROR V-5-1-582 Disk group crs_dg: No such disk group                                                                                                                                                

2019/06/20 21:01:10 VCS ERROR V-16-2-13066 (devuaedbs32) Agent is calling clean for resource(diskgroup_crsdg) because the resource is not up even after online completed.                                         

2019/06/20 21:01:11 VCS INFO V-16-2-13001 (devuaedbs32) Resource(diskgroup_crsdg): Output of the completed operation (clean)                                                                                      

/var/VRTSvcs/lock/diskgroup_crsdg_crs_dg_stat: No such file or directory                                                                                                                                           

2019/06/20 21:01:11 VCS INFO V-16-2-13068 (devuaedbs32) Resource(diskgroup_crsdg) - clean completed successfully.                                                                                                  

2019/06/20 21:01:11 VCS INFO V-16-2-13071 (devuaedbs32) Resource(diskgroup_crsdg): reached OnlineRetryLimit(0).                                                                                                    

2019/06/20 21:01:12 VCS ERROR V-16-1-10303 Resource diskgroup_crsdg (Owner: unknown, Group: cvm) is FAULTED (timed out) on sys devuaedbs32

 

 

Once the resource diskgroup_crsdg is marked as faulted VCS initiates an offline of the CVM service group.

 

 

 

2019/06/20 21:01:12 VCS NOTICE V-16-1-10300 Initiating Offline of Resource vxfsckd (Owner: unknown, Group: cvm) on System devuaedbs32                                                                             

2019/06/20 21:01:12 VCS INFO V-16-6-15004 (devuaedbs32) hatrigger:Failed to send trigger for resfault; script doesn't exist                                                                                       

2019/06/20 21:01:14 VCS INFO V-16-1-10305 Resource vxfsckd (Owner: unknown, Group: cvm) is offline on devuaedbs32 (VCS initiated)                                                                                 

2019/06/20 21:01:14 VCS NOTICE V-16-1-10300 Initiating Offline of Resource qlogckd (Owner: unknown, Group: cvm) on System devuaedbs32                                                                             

2019/06/20 21:01:15 VCS INFO V-16-2-13001 (devuaedbs32) Resource(qlogckd): Output of the completed operation (offline)                                                                                             

UX:vxfs qlogprint: INFO: V-3-22897: There are no QuickLog devices active                                                                                                                                           

2019/06/20 21:01:16 VCS INFO V-16-1-10305 Resource qlogckd (Owner: unknown, Group: cvm) is offline on devuaedbs32 (VCS initiated)                                                                                 

2019/06/20 21:01:16 VCS NOTICE V-16-1-10300 Initiating Offline of Resource cvm_clus (Owner: unknown, Group: cvm) on System devuaedbs32
2019/06/20 21:01:18 VCS ERROR V-16-10001-1005 (devuaedbs32) CVMCluster:???:monitor:node - state: out of cluster reason: user initiated stop
2019/06/20 21:01:19 VCS INFO V-16-1-10305 Resource cvm_clus (Owner: unknown, Group: cvm) is offline on devuaedbs32 (VCS initiated)

2019/06/20 21:01:19 VCS ERROR V-16-1-10205 Group cvm is faulted on system devuaedbs32
2019/06/20 21:01:19 VCS NOTICE V-16-1-10446 Group cvm is offline on system devuaedbs32

 

 

The same happens for the group oraclerac_db5 when the diskgroup resource diskgroup_db5 is faulted.

 

 

 

I recommend moving the diskgroup resource diskgroup_crsdg to another service group, or marking it as non-critical, to prevent VCS from taking the cvm group offline when the resource faults.
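If you go the non-critical route, a minimal sketch looks like this (the resource name is taken from the hastatus output above; Critical is a standard VCS resource attribute):

```shell
# Sketch: make the diskgroup resource non-critical so a fault on it alone
# does not pull the whole cvm service group offline. Run on one node.
haconf -makerw                            # open the cluster configuration
hares -modify diskgroup_crsdg Critical 0  # 0 = non-critical
haconf -dump -makero                      # write main.cf and close the config
```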

 

 

As mentioned earlier we can start CVM outside of VCS.

 

 

 

[root@server101 ~]# gabconfig -a

GAB Port Memberships

===============================================================

Port a gen   332404 membership 01

Port b gen   332407 membership 01

Port d gen   332403 membership 01

 

 

[root@server101 ~]# vxclustadm -t gab -m vcs startnode

VxVM vxclustadm INFO V-5-2-9687 vxclustadm: Fencing driver is in disabled mode

 

[root@server101 ~]# gabconfig -a

GAB Port Memberships

===============================================================

Port a gen   332404 membership 01

Port b gen   332407 membership 01

Port d gen   332403 membership 01

Port m gen   33240c membership 01

Port v gen   33240e membership 01

Port y gen   33240d membership 01

 

Verify the host has successfully joined the cluster using vxclustadm.

 

[root@server101 ~]# vxclustadm nidmap

Name                             CVM Nid    CM Nid     State              

server101                        0          0          Joined: Slave      

server102                        2          1          Joined: Master     

 

 

Next, you want to make sure the CVM diskgroups are imported as shared.

 

 

[root@server101 ~]# vxdg list

NAME         STATE           ID

lockdg       enabled,shared,cds   1561033675.95.server101

datadg       enabled,shared,cds   1561032806.93.server101

 

 

  If they are not imported, attempt to import them with the shared flag.

 

#vxdg -s import datadg

 

 If you receive the following error, you will need to deport the diskgroup from all nodes in the cluster before it can be imported as shared.

[root@server101 ~]# vxdg -s import datadg

VxVM vxdg ERROR V-5-1-19179 Disk group datadg: import failed:

Disk is in use by another host
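To clear that error, deport first and then retry the shared import. A sketch, using the datadg name from above:

```shell
# Sketch: release any stale ordinary import, then re-import as shared.
# Run the deport on EVERY node that might still hold the import:
vxdg deport datadg

# Then, on the CVM master node, import with the shared flag:
vxdg -s import datadg
vxdg list datadg        # STATE should now include "enabled,shared"
```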

 

Once the diskgroups are successfully imported as shared, start VCS again.

#hastart

Please let me know if this helps.

ED