
vxconfigd hung under heavy mirror resync activity

sthakur
Level 3
Partner Accredited

We have several multilayer (striped-mirror) volumes. After a recent server crash, many sub-volume mirror plexes are resyncing (112 plex att operations are running), and I'm having issues with vxconfigd. It's a 3-node cluster. I could get 2 nodes up, but cannot bring cvm online on the 3rd node; the operation times out. While this is happening, vxconfigd on the master node appears to hang, and I cannot run any vxprint or vxtask commands. I even let the cvm online operation time out, waited 2 hours for vxconfigd on the master node to become responsive, and then tried again to bring cvm online manually. The result was the same: the operation timed out and vxconfigd hung.

Has anyone experienced this? Right now I'm very apprehensive about running production load on the cluster. I'm running FileStore 5.7.

 

6 REPLIES

joseph_dangelo
Level 6
Employee Accredited

Although we do enjoy challenging posts and requests, I would highly recommend that you open a support case to ensure that your issue is resolved in a timely manner.

Joe D

S_Herdejurgen
Level 4
Accredited Certified
If you run pfiles against the vxconfigd process ID, how many file descriptors are being used?

sthakur
Level 3
Partner Accredited

Hi Sean, I'm running Linux, so I counted the entries in /proc/30663/fd (30663 is the vxconfigd process). It shows 189 file descriptors.
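For anyone who wants to repeat this check on Linux, here is a minimal sketch of the same count. It assumes only that /proc is mounted; the function name is made up for illustration, and reading another user's process (such as vxconfigd) normally needs root.

```python
import os


def count_open_fds(pid):
    """Return the number of open file descriptors for a process on Linux."""
    # Each entry under /proc/<pid>/fd corresponds to one open descriptor.
    return len(os.listdir(f"/proc/{pid}/fd"))


if __name__ == "__main__":
    # os.getpid() is used only so the sketch runs anywhere; substitute the
    # vxconfigd PID (30663 in the post above) and run as root.
    print(count_open_fds(os.getpid()))
```

Running it a few times during the resync shows whether the descriptor count keeps growing or stays flat.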

Hi Joe,

I've had a Symantec case open for almost 2 weeks now, but all I got was "let the sync finish, which could be causing vxconfigd to get busy". I wanted to verify whether this is plausible or whether there is a known bug that might be causing this.

joseph_dangelo
Level 6
Employee Accredited

Can you please send me the case ID that you were given by support? Please forward it in a private message to my Connect inbox.

Thanks,

Joe D

Gaurav_S
Moderator
VIP Certified

I believe there could be two different issues here:

1. CVM not joining the cluster: I agree that it could be because of heavy I/O being issued to vxconfigd, but since you mention this followed a server crash, did you check the number of shared disks visible from the 3rd node? CVM is very sensitive to the number of disks it can see. If the three servers do not all see the same number of shared disks, you may face issues joining the CVM.

2. Heavy sync load on vxconfigd: I believe there is no direct solution to this, but as an interim measure you can try pausing most of the tasks, allowing only a few (say 5-10) to resync, and then try joining (assuming the number of disks is equal). That may help you get through this.
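Both checks above can be planned offline before touching the cluster. The following is only a rough sketch: the node names, disk names, and task IDs are hypothetical placeholders that you would collect by hand from each node, and the helper names are made up for illustration. It prints `vxtask pause` commands for review and does not run them.

```python
def missing_disks(visible):
    """Map each node to the shared disks it cannot see but another node can."""
    all_disks = set().union(*visible.values())
    return {node: sorted(all_disks - set(disks)) for node, disks in visible.items()}


def tasks_to_pause(task_ids, keep_running=5):
    """Return the task IDs to pause so that at most keep_running stay active."""
    ordered = sorted(task_ids)
    # The lowest-numbered tasks keep running; everything after them is paused.
    return ordered[keep_running:]


if __name__ == "__main__":
    # Hypothetical disk lists, one per node, gathered manually.
    visible = {
        "node1": ["disk_0", "disk_1", "disk_2"],
        "node2": ["disk_0", "disk_1", "disk_2"],
        "node3": ["disk_0", "disk_2"],
    }
    for node, missing in missing_disks(visible).items():
        if missing:
            print(f"{node} cannot see: {', '.join(missing)}")

    # Hypothetical task IDs standing in for the running plex attach tasks.
    running = range(200, 212)
    for task_id in tasks_to_pause(running, keep_running=5):
        print(f"vxtask pause {task_id}")
```

Keeping the lowest task IDs running is an arbitrary choice here; you may prefer to keep the tasks that are closest to completion. Paused tasks can later be continued with `vxtask resume`.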

Gaurav

S_Herdejurgen
Level 4
Accredited Certified

How many file descriptors are open now?  Are the resyncs still running?