We have several multilayer (striped-mirror) volumes. Because of a recent server crash, the mirrored plexes of a lot of sub-volumes are resyncing (112 plex att operations are running). But I'm having issues with vxconfigd. It's a 3-node cluster. I could get 2 nodes up, but I cannot get CVM online on the 3rd node; it times out. While this is happening, vxconfigd on the master node appears to hang: I cannot run any vxprint or vxtask commands. I even let the CVM online operation time out, waited 2 hours for vxconfigd on the master node to become responsive, and then tried again to bring CVM online manually, but hit the same issue: the operation timed out and vxconfigd hung.
Has anyone experienced this? Right now I'm very apprehensive about running production load on the cluster. I'm running FileStore 5.7.
Hi Sean, I'm running Linux, so I counted the file descriptors under /proc/30663/fd (the vxconfigd process): it shows 189.
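In case it helps anyone repeat that check, here is a minimal Python sketch of the same count. It assumes a Linux /proc filesystem and enough privileges to read the target process's fd directory; the function name is just an illustration, and the demo inspects the script's own process rather than vxconfigd.

```python
import os


def count_open_fds(pid: int) -> int:
    """Return the number of entries under /proc/<pid>/fd (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


if __name__ == "__main__":
    # Demo on the current process. To check vxconfigd, pass its PID
    # instead (30663 in this thread) and run as root.
    print(count_open_fds(os.getpid()))
```

Running it a few times while vxconfigd is unresponsive would show whether the count stays flat or keeps changing.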
I've had a Symantec case open for almost 2 weeks now, but all I got was "let the sync finish, which could be causing vxconfigd to get busy". I wanted to verify whether this is plausible or whether there is a known bug that might be causing this.
I believe there could be 2 different issues here:
1. CVM not joining the cluster: I agree that it could be because of heavy I/O being issued to vxconfigd, but since you mention this followed a server crash, did you check how many shared disks are visible from the 3rd node? CVM is very sensitive to the number of disks it can see. If the three servers do not all see the same number of shared disks, you may face issues joining CVM.
2. Heavy sync load on vxconfigd: I believe there is no direct solution to this, but as an interim measure you can try pausing most of the tasks, allowing only a few (say 5-10) to resync, and then try joining again (assuming the number of disks is equal). That may help you get through this.
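The two checks above could be sketched roughly as follows in Python. This is only an illustration under assumptions: the node names, disk names and task IDs are made up, the disk sets would have to be collected by hand (for example from `vxdisk list` output on each node, parsing not shown), and the script only prints `vxtask pause` commands rather than running them.

```python
from typing import Dict, List, Set, Tuple


def missing_disks(disks_by_node: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """For each node, return the disks some node sees but this node does not."""
    all_disks: Set[str] = set().union(*disks_by_node.values())
    return {node: all_disks - seen for node, seen in disks_by_node.items()}


def plan_throttle(task_ids: List[int], keep_running: int) -> Tuple[List[int], List[int]]:
    """Split task IDs into (leave running, pause), keeping the lowest IDs running."""
    if keep_running < 0:
        raise ValueError("keep_running must be >= 0")
    ordered = sorted(task_ids)
    return ordered[:keep_running], ordered[keep_running:]


if __name__ == "__main__":
    # Hypothetical data: replace with what each node actually reports.
    disks = {
        "node1": {"disk_0", "disk_1", "disk_2"},
        "node2": {"disk_0", "disk_1", "disk_2"},
        "node3": {"disk_0", "disk_2"},
    }
    for node, missing in missing_disks(disks).items():
        if missing:
            print(f"{node} does not see: {sorted(missing)}")

    # Hypothetical IDs standing in for the 112 plex attach tasks.
    running, to_pause = plan_throttle(list(range(160, 272)), keep_running=10)
    for task_id in to_pause:
        # Printed for review only; nothing is executed here.
        print(f"vxtask pause {task_id}")
```

Once the join succeeds, the paused tasks can be resumed a few at a time with `vxtask resume`.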