We can see two failed CVMVolDg resources on host devuaedbs32:
-- RESOURCES FAILED
-- Group Type Resource System
C cvm CVMVolDg diskgroup_crsdg devuaedbs32
C oraclerac_db5 CVMVolDg diskgroup_db5 devuaedbs32
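If you want to pull the failed resources out of that summary quickly, something like the sketch below may help. It assumes the listing above came from `hastatus -sum` and that the failed-resource rows follow the "-- RESOURCES FAILED" header as shown; `failed_resources` is a hypothetical helper name, not a VCS command.

```shell
# Hypothetical helper: reads a cluster status summary on stdin and prints
# "<resource> <system>" for every row in the RESOURCES FAILED section.
failed_resources() {
  awk '
    /^-- RESOURCES FAILED/ { in_section = 1; next }                 # section starts
    /^-- / && in_section && !/^-- Group/ { in_section = 0 }         # next section ends it
    in_section && !/^--/ && NF >= 5 { print $4, $5 }                # data rows only
  '
}

# Example (assumes the summary comes from hastatus -sum):
#   hastatus -sum | failed_resources
```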
Earlier in engine_A.log, the online attempt on devuaedbs31 failed because the diskgroup crs_dg was not imported as a shared diskgroup:
2019/06/20 19:44:54 VCS ERROR V-16-10001-1010 (devuaedbs31) CVMVolDg:diskgroup_crsdg:online:online_change_activation: can not change activation of dg crs_dg to shared-write
2019/06/20 19:44:54 VCS WARNING V-16-10001-1025 (devuaedbs31) CVMVolDg:diskgroup_crsdg:online:Can not set diskgroup crs_dg activation to sw
2019/06/20 19:44:54 VCS ERROR V-16-10001-1045 (devuaedbs31) CVMVolDg:diskgroup_crsdg:online:Initial check failed
2019/06/20 19:44:55 VCS INFO V-16-2-13001 (devuaedbs31) Resource(diskgroup_crsdg): Output of the completed operation (online)
VxVM vxdg ERROR V-5-1-3268 activation failed: Disk group crs_dg: shared-write: Invalid mode for non-shared disk group
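To find entries like these for a given resource, a simple filter over the engine log is usually enough. This is only a sketch: `resource_errors` is a hypothetical helper name, and the path in the usage comment assumes the default VCS log location.

```shell
# Hypothetical helper: reads an engine log on stdin and prints only the
# "VCS ERROR" entries that mention the given resource name as a whole word.
resource_errors() {
  grep 'VCS ERROR' | grep -Fw -- "$1"
}

# Example (assumes the default log location):
#   resource_errors diskgroup_crsdg < /var/VRTSvcs/log/engine_A.log
```

Note that agent output lines without a timestamp (such as the VxVM vxdg message above) are not tagged "VCS ERROR", so they will not appear in the filtered output.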
We don't see the same invalid mode message during the failure on devuaedbs32 about an hour later. This time the agent cannot find crs_dg imported at all, and the diskgroup resource is still marked as faulted:
2019/06/20 21:01:55 VCS NOTICE V-16-1-10301 Initiating Online of Resource diskgroup_crsdg (Owner: unknown, Group: cvm) on System devuaedbs32
2019/06/20 21:01:55 VCS WARNING V-16-10001-1074 (devuaedbs32) CVMVolDg:diskgroup_crsdg:online:setup_vxnotify: old vxnotify of pid 1496 will be killed. my pid is 389
2019/06/20 21:01:58 VCS ERROR V-16-10001-1009 (devuaedbs32) CVMVolDg:diskgroup_crsdg:online:could not find diskgroup crs_dg imported. If it was previously deported, it will have to be manually imported
2019/06/20 21:01:58 VCS ERROR V-16-10001-1044 (devuaedbs32) CVMVolDg:diskgroup_crsdg:online:Error saving vxprint file
2019/06/20 21:01:59 VCS INFO V-16-2-13001 (devuaedbs32) Resource(diskgroup_crsdg): Output of the completed operation (online)
VxVM vxprint ERROR V-5-1-582 Disk group crs_dg: No such disk group
2019/06/20 21:01:10 VCS ERROR V-16-2-13066 (devuaedbs32) Agent is calling clean for resource(diskgroup_crsdg) because the resource is not up even after online completed.
2019/06/20 21:01:11 VCS INFO V-16-2-13001 (devuaedbs32) Resource(diskgroup_crsdg): Output of the completed operation (clean)
/var/VRTSvcs/lock/diskgroup_crsdg_crs_dg_stat: No such file or directory
2019/06/20 21:01:11 VCS INFO V-16-2-13068 (devuaedbs32) Resource(diskgroup_crsdg) - clean completed successfully.
2019/06/20 21:01:11 VCS INFO V-16-2-13071 (devuaedbs32) Resource(diskgroup_crsdg): reached OnlineRetryLimit(0).
2019/06/20 21:01:12 VCS ERROR V-16-1-10303 Resource diskgroup_crsdg (Owner: unknown, Group: cvm) is FAULTED (timed out) on sys devuaedbs32
Once the resource diskgroup_crsdg is marked as faulted, VCS initiates an offline of the cvm service group:
2019/06/20 21:01:12 VCS NOTICE V-16-1-10300 Initiating Offline of Resource vxfsckd (Owner: unknown, Group: cvm) on System devuaedbs32
2019/06/20 21:01:12 VCS INFO V-16-6-15004 (devuaedbs32) hatrigger:Failed to send trigger for resfault; script doesn't exist
2019/06/20 21:01:14 VCS INFO V-16-1-10305 Resource vxfsckd (Owner: unknown, Group: cvm) is offline on devuaedbs32 (VCS initiated)
2019/06/20 21:01:14 VCS NOTICE V-16-1-10300 Initiating Offline of Resource qlogckd (Owner: unknown, Group: cvm) on System devuaedbs32
2019/06/20 21:01:15 VCS INFO V-16-2-13001 (devuaedbs32) Resource(qlogckd): Output of the completed operation (offline)
UX:vxfs qlogprint: INFO: V-3-22897: There are no QuickLog devices active
2019/06/20 21:01:16 VCS INFO V-16-1-10305 Resource qlogckd (Owner: unknown, Group: cvm) is offline on devuaedbs32 (VCS initiated)
2019/06/20 21:01:16 VCS NOTICE V-16-1-10300 Initiating Offline of Resource cvm_clus (Owner: unknown, Group: cvm) on System devuaedbs32
2019/06/20 21:01:18 VCS ERROR V-16-10001-1005 (devuaedbs32) CVMCluster:???:monitor:node - state: out of cluster reason: user initiated stop
2019/06/20 21:01:19 VCS INFO V-16-1-10305 Resource cvm_clus (Owner: unknown, Group: cvm) is offline on devuaedbs32 (VCS initiated)
2019/06/20 21:01:19 VCS ERROR V-16-1-10205 Group cvm is faulted on system devuaedbs32
2019/06/20 21:01:19 VCS NOTICE V-16-1-10446 Group cvm is offline on system devuaedbs32
The same happens for the group oraclerac_db5 when the diskgroup resource diskgroup_db5 is faulted.
I recommend either moving the diskgroup resource diskgroup_crsdg to another service group or marking it as non-critical, so that a fault on this resource does not cause VCS to take the cvm service group offline.
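For the non-critical option, the change would look roughly like the sketch below. The `haconf` and `hares` commands are the standard VCS CLI, but `run` and `mark_noncritical` are hypothetical wrappers I added so the commands are only printed by default; please review them against your configuration before setting DRY_RUN=0.

```shell
# Dry-run wrapper (hypothetical): prints each command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "$*"; fi
}

# Hypothetical helper: sets Critical to 0 on the named resource.
# 1. open the cluster configuration for writing
# 2. set Critical = 0 so a fault on this resource alone does not fault the group
# 3. save the configuration and make it read-only again
mark_noncritical() {
  run haconf -makerw || return 1
  run hares -modify "$1" Critical 0
  rc=$?
  run haconf -dump -makero
  return "$rc"
}

# Example (prints the three commands without running them):
#   mark_noncritical diskgroup_crsdg
```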
As mentioned earlier, we can start CVM outside of VCS:
[root@server101 ~]# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen 332404 membership 01
Port b gen 332407 membership 01
Port d gen 332403 membership 01
[root@server101 ~]# vxclustadm -t gab -m vcs startnode
VxVM vxclustadm INFO V-5-2-9687 vxclustadm: Fencing driver is in disabled mode
[root@server101 ~]# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen 332404 membership 01
Port b gen 332407 membership 01
Port d gen 332403 membership 01
Port m gen 33240c membership 01
Port v gen 33240e membership 01
Port y gen 33240d membership 01
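If you want to script the check that the additional ports (m, v and y in the output above) have appeared after startnode, a minimal sketch could look like this. `gab_has_ports` is a hypothetical helper name, and which ports you expect may differ by version and configuration.

```shell
# Hypothetical helper: reads `gabconfig -a` output on stdin and succeeds only
# if every port letter passed as an argument has a membership line.
gab_has_ports() {
  out=$(cat)
  for port in "$@"; do
    printf '%s\n' "$out" | grep -q "^Port $port " || return 1
  done
}

# Example:
#   gabconfig -a | gab_has_ports m v y && echo "ports m, v, y are present"
```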
Verify the host has successfully joined the cluster using vxclustadm.
[root@server101 ~]# vxclustadm nidmap
Name CVM Nid CM Nid State
server101 0 0 Joined: Slave
server102 2 1 Joined: Master
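The same check can be scripted if you need to repeat it on several nodes. This is a sketch that assumes the column layout shown above; `node_joined` is a hypothetical helper name.

```shell
# Hypothetical helper: reads `vxclustadm nidmap` output on stdin and succeeds
# if the named node's State column begins with "Joined:".
node_joined() {
  awk -v node="$1" '
    $1 == node && $4 == "Joined:" { found = 1 }
    END { exit !found }
  '
}

# Example:
#   vxclustadm nidmap | node_joined server101 && echo "server101 has joined"
```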
Next, make sure the CVM diskgroups are imported as shared:
[root@server101 ~]# vxdg list
NAME STATE ID
lockdg enabled,shared,cds 1561033675.95.server101
datadg enabled,shared,cds 1561032806.93.server101
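To spot an imported diskgroup that is missing the shared flag, you could filter the STATE column as in this sketch. It assumes the three-column layout shown above; `nonshared_dgs` is a hypothetical helper name.

```shell
# Hypothetical helper: reads `vxdg list` output on stdin and prints the name
# of every imported diskgroup whose STATE column lacks the "shared" flag.
nonshared_dgs() {
  awk 'NR > 1 && $2 !~ /(^|,)shared(,|$)/ { print $1 }'
}

# Example (no output means every imported diskgroup is shared):
#   vxdg list | nonshared_dgs
```

A diskgroup that is not imported on this node does not appear in `vxdg list` at all, so an empty result only covers the diskgroups that are listed.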
If they are not imported, attempt to import them with the shared flag.
# vxdg -s import datadg
If you receive the following error, you will need to deport the diskgroup from all nodes in the cluster before it can be imported as shared.
[root@server101 ~]# vxdg -s import datadg
VxVM vxdg ERROR V-5-1-19179 Disk group datadg: import failed:
Disk is in use by another host
Once the diskgroups are successfully imported as shared, start VCS again.
# hastart
Please let me know if this helps.
ED