
Cluster node hangs when 2 servers are rebooted simultaneously

Dear all,

I have a 6-node cluster, and CVM is online on all 6 nodes.

Our OS is Red Hat 7.4. When I reboot 2 servers simultaneously, one of the nodes (and sometimes both) does not go down; it stays hung until we reboot it from the management console.

I can see the following in engine_A.log:

2019/09/11 10:32:17 VCS ERROR V-16-2-13079 (CBBGEMM06SESMBV) Resource(cvm_clus): The last 10 invocations of the clean procedure have failed.

Logs from CFSfsckd_A.log:

2019/09/11 10:07:41 VCS WARNING V-16-20071-5501 CFSfsckd:vxfsckd:offline:Deinit Error : UX:vxfs fsclustadm: ERROR: V-3-23718: cannot deinit cfs: error:Device or resource busy
2019/09/11 10:07:50 VCS WARNING V-16-20071-5501 CFSfsckd:vxfsckd:offline:Deinit Error : UX:vxfs fsclustadm: ERROR: V-3-23718: cannot deinit cfs: error:Device or resource busy
2019/09/11 10:07:57 VCS WARNING V-16-20071-5501 CFSfsckd:vxfsckd:offline:Deinit Error : UX:vxfs fsclustadm: ERROR: V-3-23718: cannot deinit cfs: error:Device or resource busy
2019/09/11 10:08:02 VCS WARNING V-16-20071-5501 CFSfsckd:vxfsckd:offline:Deinit Error : UX:vxfs fsclustadm: ERROR: V-3-23718: cannot deinit cfs: error:Device or resource busy

Your urgent support is highly appreciated. 

Let me know if more information is required.

Thanks a lot

2 Replies

Re: Cluster node hangs when 2 servers are rebooted simultaneously

The possible causes of the reboot issue you mention are:

1. The server was hung (load too high, or not enough resources to run the applications under a constant peak load)

2. A resource dependency issue

3. An application (process) failed to go offline (offline timeout or hang)

To troubleshoot further, next time, before rebooting, run the command below

#hastop -local -force

to find out whether HAD itself can be stopped cleanly (with -force, HAD stops on the local node but the service groups are left running).

If HAD stops cleanly, run the commands below on the same server

#hastart

#hastatus -sum       <<< make sure the node is online

#hastop -local

#hastatus              <<< this output updates dynamically as the resources are taken offline

Press Ctrl-C to break out

#hastatus -sum

If an application fails to go offline, check engine_A.log to determine the cause and take measures to rectify it accordingly (for example, VCS tuning).
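When going through the logs, it can help to first see which message IDs repeat most often. Below is a minimal sketch of that, assuming the log lines follow the same layout as the excerpts quoted in the first post (date, time, VCS, severity, message ID). The function name summarize_vcs_log is made up for this example, and the log path in the usage comment is an assumption; adjust it to your installation.

```shell
#!/bin/sh
# Sketch only: count VCS ERROR/WARNING lines per message ID.
# Assumed line layout (taken from the excerpts in this thread):
#   YYYY/MM/DD HH:MM:SS VCS <SEVERITY> <MESSAGE-ID> <text...>

# summarize_vcs_log is a hypothetical helper name, not a VCS command.
summarize_vcs_log() {
    # $1 = path to a log file such as engine_A.log or CFSfsckd_A.log
    awk '$3 == "VCS" && ($4 == "ERROR" || $4 == "WARNING") {
             count[$4 " " $5]++        # key: severity + message ID
         }
         END {
             for (k in count) print count[k], k
         }' "$1" | sort -rn            # most frequent first
}

# Usage (path is an assumption, adjust as needed):
#   summarize_vcs_log /var/VRTSvcs/log/engine_A.log
```

Each output line is a count followed by the severity and message ID, so a resource that keeps failing to go offline or clean shows up at the top. This only reads the log files; it does not touch the cluster.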

 


Re: Cluster node hangs when 2 servers are rebooted simultaneously