09-02-2013 03:52 AM
Hi
I have a server cluster environment with VCS 6.0.2; six servers constitute the cluster, and I/O fencing is configured with three coordinator disks. If I cold-boot one of the servers, I find that CFS access on the other running servers blocks for a period.
I found that CFS access starts being blocked when the following logs appear in /var/log/messages:
LLT INFO V-14-1-10205 link 0 (eth6.109) node 0 in trouble
LLT INFO V-14-1-10205 link 1 (eth7.110) node 0 in trouble
Access is allowed again when the following logs appear in /var/log/messages:
vxfs: msgcnt 8 Phase 2 - /dev/vx/dsk/filedg/filevol - Buffer reads allowed.
vxfs: msgcnt 9 Phase 9 - /dev/vx/dsk/filedg/filevol - Set Primary nodeid to 2
vxglm INFO V-42-106 GLM recovery complete, gen f59d30, mbr 2c/0/0/0
vxglm INFO V-42-107 times: skew 2673 ms, remaster 78 ms, completion 40 ms
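These two log markers bound the blocking window, so its length can be measured directly from /var/log/messages. A minimal sketch, assuming standard syslog "Mon DD HH:MM:SS" timestamp prefixes (the sample lines and year below are illustrative, not from my actual logs):

```python
from datetime import datetime

# Markers taken from the log excerpts above: blocking starts when LLT
# reports the peer "in trouble" and ends when GLM recovery completes.
START_MARK = "LLT INFO V-14-1-10205"
END_MARK = "GLM recovery complete"

def blocking_window(lines, year=2013):
    """Return seconds between the first start marker and the next end marker.

    Assumes each line begins with a syslog timestamp like "Sep  2 03:52:01"
    (the first 15 characters). Syslog omits the year, so it is supplied.
    """
    start = end = None
    for line in lines:
        ts = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        if start is None and START_MARK in line:
            start = ts
        elif start is not None and END_MARK in line:
            end = ts
            break
    if start is not None and end is not None:
        return (end - start).total_seconds()
    return None

sample = [
    "Sep  2 03:52:01 kernel: LLT INFO V-14-1-10205 link 0 (eth6.109) node 0 in trouble",
    "Sep  2 03:52:17 kernel: vxglm INFO V-42-106 GLM recovery complete, gen f59d30",
]
print(blocking_window(sample))
```

With the illustrative sample above, the gap between the two markers comes out around the 15-second LLT timeout, matching the "10+ seconds" I observed.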
I think the CFS access blocking is for data protection, but from my observation the blocking may continue for 10+ seconds on the running servers, so my questions are:
1. Is it correct behaviour for VCS to block CFS access for 10+ seconds?
2. Why doesn't the CFS access blocking start only after the heartbeat link has expired, and before the race for the coordinator disks?
Thanks in advance!
09-02-2013 04:05 AM
Hi Kongzz,
this is expected behaviour.
The CVM and CFS master roles need to fail over if the master node is brought down or faulted.
This also includes replaying any queued I/O (intent log).
Please see excerpt from the SFCFS admin guide:
If the server on which the Cluster File System (CFS) primary node is running fails,
the remaining cluster nodes elect a new primary node. The new primary node
reads the file system intent log and completes any metadata updates that were
in process at the time of the failure. Application I/O from other nodes may block
during this process and cause a delay. When the file system is again consistent,
application processing resumes.
Because nodes using a cluster file system in secondary mode do not update file
system metadata directly, failure of a secondary node does not require metadata
repair. CFS recovery from secondary node failure is therefore faster than from a
primary node failure.
09-02-2013 04:21 AM
The default LLT peer-inactivity timeout (peerinact) is 16 seconds.
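That timeout is tunable. Current timer values can be inspected on a running node with lltconfig -T query, and the persistent setting lives in /etc/llttab. An illustrative fragment (not from any real cluster; lowering peerinact shortens the detection window but increases the risk of falsely declaring a live node dead):

```
# /etc/llttab fragment (illustrative) - peer-inactivity timer.
# Units are 1/100 second; 1600 (16 s) is the documented default.
set-timer peerinact:1600
```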
With a CFS mount, one node is designated the primary (for each mount) and this controls coordination of writes.
So I believe this is how it works:
If you lose the node where the CFS primary resides, then until another node is elected primary, the secondary nodes can't write and can only read from the buffer cache. If they read from disk, the data could be changing as they read it, because the primary might not actually be down - that would be a split-brain. No action can be taken until the heartbeat times out, because the heartbeat could still return. And a new node cannot be elected primary until the fencing race decides which nodes stay up.
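Note that the primary is per mount, and it can be checked (and, for a planned shutdown, moved in advance so the election/log-replay delay is avoided) with the fsclustadm command from the SFCFS admin guide. The mount point below is illustrative:

```
# Show which node is the CFS primary for a given cluster mount:
fsclustadm -v showprimary /mnt/filevol

# Before deliberately taking the current primary down, run this on the
# node that should take over, so the failover happens ahead of time:
fsclustadm -v setprimary /mnt/filevol
```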
Additionally, see extract from CFS admin guide:
If the server on which the SFCFS primary is running fails, the remaining cluster nodes elect a new primary. The new primary reads the file system intent log and completes any metadata updates that were in process at the time of the failure. Application I/O from other nodes may block during this process and cause a delay. When the file system is again consistent, application processing resumes.