09-12-2019 12:27 AM
Dear all,
I have a cluster of six nodes, and CVM is online on all six nodes.
Our OS is Red Hat 7.4. When I try to reboot two servers simultaneously, one of the nodes (and sometimes both) does not go down; it stays hung until we reboot it from the management console.
I can see the following in engine_A.log:
2019/09/11 10:32:17 VCS ERROR V-16-2-13079 (CBBGEMM06SESMBV) Resource(cvm_clus): The last 10 invocations of the clean procedure have failed.
Logs from CFSfsckd_A.log:
2019/09/11 10:07:41 VCS WARNING V-16-20071-5501 CFSfsckd:vxfsckd:offline:Deinit Error : UX:vxfs fsclustadm: ERROR: V-3-23718: cannot deinit cfs: error:Device or resource busy
2019/09/11 10:07:50 VCS WARNING V-16-20071-5501 CFSfsckd:vxfsckd:offline:Deinit Error : UX:vxfs fsclustadm: ERROR: V-3-23718: cannot deinit cfs: error:Device or resource busy
2019/09/11 10:07:57 VCS WARNING V-16-20071-5501 CFSfsckd:vxfsckd:offline:Deinit Error : UX:vxfs fsclustadm: ERROR: V-3-23718: cannot deinit cfs: error:Device or resource busy
2019/09/11 10:08:02 VCS WARNING V-16-20071-5501 CFSfsckd:vxfsckd:offline:Deinit Error : UX:vxfs fsclustadm: ERROR: V-3-23718: cannot deinit cfs: error:Device or resource busy
Your urgent support is highly appreciated.
Let me know if more information is required.
Thanks a lot
09-14-2019 03:53 AM
The possible causes of the reboot issue you mention are:
1. The server was hung (load too high, or not enough resources to run the applications under a constant peak load).
2. A resource dependency issue.
3. An application (process) failed to go offline (offline timeout, or the process hung).
To troubleshoot the issue further, the next time it happens, run the command below before rebooting:
#hastop -local -force
to find out whether HAD can be taken offline gracefully.
If HAD can be taken offline gracefully, run the commands below on the same server:
#hastart
#hastatus -sum <<< make sure the node is online
#hastop -local
#hastatus <<< this command dynamically shows the resources as they are being taken offline
Press Ctrl-C to break out
#hastatus -sum
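If you want to script the "make sure the node is online" check, here is a minimal sketch. It assumes the common `hastatus -sum` layout, where each system row starts with `A` followed by the system name and its state (check this against the output on your own cluster), and that the VCS system name matches the short hostname. The function name `node_state` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the VCS state of one system from
# `hastatus -sum` text read on stdin.
# Assumes system rows look like: "A  <system>  <state>  <frozen>".
# Exits 1 if the system is not listed at all.
node_state() {
    local node="$1"
    awk -v n="$node" '$1 == "A" && $2 == n { print $3; found = 1 }
        END { if (!found) exit 1 }'
}

# Usage on a cluster node (not run here):
#   hastatus -sum | node_state "$(hostname -s)"
# Expect RUNNING before moving on to `hastop -local`.
```

Because the function returns non-zero when the system is missing, it can gate the next step of a script instead of relying on reading the output by eye.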
If an application fails to go offline, check engine_A.log to determine the cause of the issue and take measures to rectify it accordingly (for example, VCS tuning).
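To speed up the engine_A.log check, a small filter like the one below can count which resources and message IDs are producing ERROR entries. This is only a sketch: the function name `summarize_errors` is hypothetical, `/var/VRTSvcs/log/engine_A.log` is assumed as the default log location, and only lines shaped like the engine_A.log entry quoted in the question are matched.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize VCS ERROR entries in an engine_A.log-style
# file, counting how often each resource / message ID pair appears.
# Assumes lines shaped like:
#   YYYY/MM/DD HH:MM:SS VCS ERROR V-16-... (system) Resource(name): message
# ERROR lines without a Resource(...) tag are skipped.
summarize_errors() {
    local log="${1:-/var/VRTSvcs/log/engine_A.log}"   # assumed default path
    # Emit "<resource> <message-id>" for each matching ERROR line,
    # then count duplicates, most frequent first.
    sed -n 's/.* VCS ERROR \(V-[0-9-]*\) .*Resource(\([^)]*\)).*/\2 \1/p' "$log" |
        sort | uniq -c | sort -rn
}

# Usage on a cluster node (not run here):
#   summarize_errors
#   summarize_errors /path/to/copied/engine_A.log
```

Each output line is a count followed by the resource name and the VCS message ID, which points at the resource to look up first in the full log.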
10-10-2019 02:03 AM
Did you see the TN below?