06-20-2019 06:51 AM
Hi,
I ran into a problem when starting up the cluster: the CVM disk group resources are faulted, and the cvm service group cannot come online on both nodes.
root@devuaedbs31 # hastatus -sum
-- SYSTEM STATE
-- System State Frozen
A devuaedbs31 RUNNING 0
A devuaedbs32 RUNNING 0
-- GROUP STATE
-- Group System Probed AutoDisabled State
B MNICB_iUATgroup devuaedbs31 Y N ONLINE
B MNICB_iUATgroup devuaedbs32 Y N ONLINE
B cvm devuaedbs31 Y N PARTIAL
B cvm devuaedbs32 Y N OFFLINE|FAULTED
B dbs_rac_bkup devuaedbs31 Y N ONLINE
B dbs_rac_bkup devuaedbs32 Y N OFFLINE
B oraclerac_db1 devuaedbs31 Y N OFFLINE
B oraclerac_db1 devuaedbs32 Y N OFFLINE
B oraclerac_db2 devuaedbs31 Y N OFFLINE
B oraclerac_db2 devuaedbs32 Y N OFFLINE
B oraclerac_db3 devuaedbs31 Y N OFFLINE
B oraclerac_db3 devuaedbs32 Y N OFFLINE
B oraclerac_db4 devuaedbs31 Y N OFFLINE
B oraclerac_db4 devuaedbs32 Y N OFFLINE
B oraclerac_db5 devuaedbs31 Y N ONLINE
B oraclerac_db5 devuaedbs32 Y N OFFLINE|FAULTED
B oraclerac_db6 devuaedbs31 Y N OFFLINE
B oraclerac_db6 devuaedbs32 Y N OFFLINE
-- RESOURCES FAILED
-- Group Type Resource System
C cvm CVMVolDg diskgroup_crsdg devuaedbs32
C oraclerac_db5 CVMVolDg diskgroup_db5 devuaedbs32
-- RESOURCES OFFLINING
-- Group Type Resource System IState
F cvm CVMCluster cvm_clus devuaedbs31 W_OFFLINE
root@devuaedbs31 # cfscluster status
Node : devuaedbs31
Cluster Manager : running
CVM state : running
MOUNT POINT SHARED VOLUME DISK GROUP STATUS
Node : devuaedbs32
Cluster Manager : running
CVM state : not-running
MOUNT POINT SHARED VOLUME DISK GROUP STATUS
List of mount points registered with cluster-configuration
but not associated with any node: []
Before starting up, I edited main.cf to stop VCS from bringing the service groups online automatically, then ran hacf -verify with no errors. I then started VCS, but the cvm group was still not able to start.
Please advise.
Many Thanks,
07-04-2019 12:50 AM
I have moved your post to the Cluster forum.
Nobody is monitoring the Documentation forum.
07-10-2019 05:20 AM
Your cluster is up (HAD on each node is running), but vxconfigd on each node is not in cluster mode.
If you run
vxdctl -c mode
you would see output something like
mode: enabled: cluster inactive
If there is no hardware issue with the storage, you should be able to clear the faults by running:
1. hastop -all -force
2. vxclustadm startnode <<< run this command on each node
3. gabconfig -a <<< check if ports u, w are open
4. vxdctl -c mode <<< check that the output shows the cluster is active
5. if vxconfigd on each node is in cluster mode, run hastart on each node to start VCS
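The mode check in step 4 is easy to script; below is a minimal sketch (the `cvm_active` helper name is my own invention) that tests `vxdctl -c mode` output for the cluster-active state. It reads the mode line on stdin so it can be tried without a live cluster; on a real node you would pipe `vxdctl -c mode` into it.

```shell
# Hypothetical helper: exits 0 when the `vxdctl -c mode` output on
# stdin reports "cluster active", non-zero otherwise.
cvm_active() {
    grep -q 'cluster active'
}

# Example using the inactive output quoted above:
if echo 'mode: enabled: cluster inactive' | cvm_active; then
    echo 'vxconfigd has joined the cluster'
else
    echo 'vxconfigd is NOT in cluster mode - run vxclustadm startnode'
fi
```

On a live node the check would be `vxdctl -c mode | cvm_active`.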
07-16-2019 08:13 AM
We can see two failed CVMVolDg resources on host devuaedbs32
-- RESOURCES FAILED
-- Group Type Resource System
C cvm CVMVolDg diskgroup_crsdg devuaedbs32
C oraclerac_db5 CVMVolDg diskgroup_db5 devuaedbs32
Earlier in engine_A.log, the diskgroups had failed to be imported as shared diskgroups.
2019/06/20 19:44:54 VCS ERROR V-16-10001-1010 (devuaedbs31) CVMVolDg:diskgroup_crsdg:online:online_change_activation: can not change activation of dg crs_dg to shared-write
2019/06/20 19:44:54 VCS WARNING V-16-10001-1025 (devuaedbs31) CVMVolDg:diskgroup_crsdg:online:Can not set diskgroup crs_dg activation to sw
2019/06/20 19:44:54 VCS ERROR V-16-10001-1045 (devuaedbs31) CVMVolDg:diskgroup_crsdg:online:Initial check failed
2019/06/20 19:44:55 VCS INFO V-16-2-13001 (devuaedbs31) Resource(diskgroup_crsdg): Output of the completed operation (online)
VxVM vxdg ERROR V-5-1-3268 activation failed: Disk group crs_dg: shared-write: Invalid mode for non-shared disk group
We don't see the same invalid mode message during the failure an hour later, but the diskgroup resource is still being marked as faulted.
2019/06/20 21:01:55 VCS NOTICE V-16-1-10301 Initiating Online of Resource diskgroup_crsdg (Owner: unknown, Group: cvm) on System devuaedbs32
2019/06/20 21:01:55 VCS WARNING V-16-10001-1074 (devuaedbs32) CVMVolDg:diskgroup_crsdg:online:setup_vxnotify: old vxnotify of pid 1496 will be killed. my pid is 389
2019/06/20 21:01:58 VCS ERROR V-16-10001-1009 (devuaedbs32) CVMVolDg:diskgroup_crsdg:online:could not find diskgroup crs_dg imported. If it was previously deported, it will have to be manually imported
2019/06/20 21:01:58 VCS ERROR V-16-10001-1044 (devuaedbs32) CVMVolDg:diskgroup_crsdg:online:Error saving vxprint file
2019/06/20 21:01:59 VCS INFO V-16-2-13001 (devuaedbs32) Resource(diskgroup_crsdg): Output of the completed operation (online)
VxVM vxprint ERROR V-5-1-582 Disk group crs_dg: No such disk group
2019/06/20 21:01:10 VCS ERROR V-16-2-13066 (devuaedbs32) Agent is calling clean for resource(diskgroup_crsdg) because the resource is not up even after online completed.
2019/06/20 21:01:11 VCS INFO V-16-2-13001 (devuaedbs32) Resource(diskgroup_crsdg): Output of the completed operation (clean)
/var/VRTSvcs/lock/diskgroup_crsdg_crs_dg_stat: No such file or directory
2019/06/20 21:01:11 VCS INFO V-16-2-13068 (devuaedbs32) Resource(diskgroup_crsdg) - clean completed successfully.
2019/06/20 21:01:11 VCS INFO V-16-2-13071 (devuaedbs32) Resource(diskgroup_crsdg): reached OnlineRetryLimit(0).
2019/06/20 21:01:12 VCS ERROR V-16-1-10303 Resource diskgroup_crsdg (Owner: unknown, Group: cvm) is FAULTED (timed out) on sys devuaedbs32
Once the resource diskgroup_crsdg is marked as faulted VCS initiates an offline of the CVM service group.
2019/06/20 21:01:12 VCS NOTICE V-16-1-10300 Initiating Offline of Resource vxfsckd (Owner: unknown, Group: cvm) on System devuaedbs32
2019/06/20 21:01:12 VCS INFO V-16-6-15004 (devuaedbs32) hatrigger:Failed to send trigger for resfault; script doesn't exist
2019/06/20 21:01:14 VCS INFO V-16-1-10305 Resource vxfsckd (Owner: unknown, Group: cvm) is offline on devuaedbs32 (VCS initiated)
2019/06/20 21:01:14 VCS NOTICE V-16-1-10300 Initiating Offline of Resource qlogckd (Owner: unknown, Group: cvm) on System devuaedbs32
2019/06/20 21:01:15 VCS INFO V-16-2-13001 (devuaedbs32) Resource(qlogckd): Output of the completed operation (offline)
UX:vxfs qlogprint: INFO: V-3-22897: There are no QuickLog devices active
2019/06/20 21:01:16 VCS INFO V-16-1-10305 Resource qlogckd (Owner: unknown, Group: cvm) is offline on devuaedbs32 (VCS initiated)
2019/06/20 21:01:16 VCS NOTICE V-16-1-10300 Initiating Offline of Resource cvm_clus (Owner: unknown, Group: cvm) on System devuaedbs32
2019/06/20 21:01:18 VCS ERROR V-16-10001-1005 (devuaedbs32) CVMCluster:???:monitor:node - state: out of cluster reason: user initiated stop
2019/06/20 21:01:19 VCS INFO V-16-1-10305 Resource cvm_clus (Owner: unknown, Group: cvm) is offline on devuaedbs32 (VCS initiated)
2019/06/20 21:01:19 VCS ERROR V-16-1-10205 Group cvm is faulted on system devuaedbs32
2019/06/20 21:01:19 VCS NOTICE V-16-1-10446 Group cvm is offline on system devuaedbs32
The same happens for the group oraclerac_db5 when the diskgroup resource diskgroup_db5 is faulted.
I recommend moving the resource diskgroup_crsdg to another service group, or marking it as non-critical, to prevent VCS from taking the whole cvm group offline when the resource faults.
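If you go the non-critical route, the attribute can be changed with the standard ha commands; a sketch, assuming the resource name diskgroup_crsdg from the hastatus output above:

```shell
# Open the VCS configuration for writes
haconf -makerw
# Critical = 0: a fault on this resource no longer faults and
# offlines its whole service group
hares -modify diskgroup_crsdg Critical 0
# Write the change to main.cf and make the configuration read-only again
haconf -dump -makero
```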
As mentioned earlier, we can start CVM outside of VCS.
[root@server101 ~]# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen 332404 membership 01
Port b gen 332407 membership 01
Port d gen 332403 membership 01
[root@server101 ~]# vxclustadm -t gab -m vcs startnode
VxVM vxclustadm INFO V-5-2-9687 vxclustadm: Fencing driver is in disabled mode
[root@server101 ~]# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen 332404 membership 01
Port b gen 332407 membership 01
Port d gen 332403 membership 01
Port m gen 33240c membership 01
Port v gen 33240e membership 01
Port y gen 33240d membership 01
Verify the host has successfully joined the cluster using vxclustadm.
[root@server101 ~]# vxclustadm nidmap
Name CVM Nid CM Nid State
server101 0 0 Joined: Slave
server102 2 1 Joined: Master
Next you want to make sure the CVM diskgroups are imported as shared.
[root@server101 ~]# vxdg list
NAME STATE ID
lockdg enabled,shared,cds 1561033675.95.server101
datadg enabled,shared,cds 1561032806.93.server101
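The `shared` flag in the STATE column is the thing to check. As a quick aid, a small helper (hypothetical, assuming the NAME/STATE/ID layout of `vxdg list` shown above) can print the diskgroups that are imported without it:

```shell
# Hypothetical helper: print names of imported diskgroups whose STATE
# field lacks the "shared" flag, given `vxdg list` output on stdin.
not_shared() {
    awk 'NR > 1 && $2 !~ /(^|,)shared(,|$)/ { print $1 }'
}

# Example with a listing similar to the one above, where datadg was
# imported without the shared flag:
printf '%s\n' \
    'NAME        STATE                ID' \
    'lockdg      enabled,shared,cds   1561033675.95.server101' \
    'datadg      enabled,cds          1561032806.93.server101' \
| not_shared
# prints: datadg
```

On a live node the equivalent check would be `vxdg list | not_shared`.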
If they are not imported, attempt to import them with the shared flag.
#vxdg -s import datadg
If you receive the following error, you will need to deport the diskgroup from all nodes in the cluster before it can be imported as shared.
[root@server101 ~]# vxdg -s import datadg
VxVM vxdg ERROR V-5-1-19179 Disk group datadg: import failed:
Disk is in use by another host
Once the diskgroups are successfully imported as shared, start VCS again.
#hastart
Please let me know if this helps.
ED