
vxfen in replaying state after reboot

IdaWong
Level 4

Hi,

I have rebooted a cluster with CPS vxfencing. 

There are no SAN disks mapped to the cluster. However, "vxfenadm -d" showed 2 of the 4 nodes in the replaying state for a long time. I had to run

hastop -all (probably stopping only the nodes in the replaying state would have been enough)

and then hastart again.

It looks like a timing issue. Has anyone out there seen this before?

Thanks in advance.
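If you want to spot this condition from a script rather than by eye, here is a minimal Python sketch that scans "vxfenadm -d" output and lists the nodes whose RFSM state is anything other than running. It assumes the RFSM section prints lines shaped like "node   1 in state  7 (replaying)"; that line format, the state numbers in the sample, and the helper names rfsm_states and nodes_not_running are assumptions for illustration, so check the pattern against the output on your own cluster first.

```python
import re
from typing import Dict, List

# Assumed shape of a line in the RFSM section of `vxfenadm -d`, e.g.
# "node   1 in state  7 (replaying)". Verify against your own output.
RFSM_LINE = re.compile(r"node\s+(\d+)\s+in\s+state\s+(\d+)\s+\((\w+)\)")


def rfsm_states(vxfenadm_output: str) -> Dict[int, str]:
    """Map each node ID to the RFSM state name reported for it."""
    states: Dict[int, str] = {}
    for line in vxfenadm_output.splitlines():
        match = RFSM_LINE.search(line)
        if match:
            node_id, _state_number, state_name = match.groups()
            states[int(node_id)] = state_name
    return states


def nodes_not_running(vxfenadm_output: str) -> List[int]:
    """Return the sorted node IDs whose RFSM state is not 'running'."""
    return sorted(
        node for node, state in rfsm_states(vxfenadm_output).items()
        if state != "running"
    )


if __name__ == "__main__":
    # Hypothetical sample; the state numbers are placeholders.
    sample = """
 RFSM State Information:
        node   0 in state  8 (running)
        node   1 in state  7 (replaying)
        node   2 in state  8 (running)
        node   3 in state  7 (replaying)
"""
    print(nodes_not_running(sample))  # [1, 3]
```

On a cluster node you could feed it the live output with subprocess.run(["vxfenadm", "-d"], capture_output=True, text=True).stdout instead of the sample string.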

Accepted Solution

starflyfly
Level 6
Employee Accredited Certified
Hi Wong,

You can ignore that; fencing will function normally. Here is the explanation from a technote:

RFSM (Replicated Finite State Machine) is an abstraction layer over GAB. VxFen uses it to handle node join processing.

The ‘replaying’ state

When a node joins the cluster, RFSM builds its state table by requesting a snapshot from a node that is already running in the cluster. During this time, all broadcast messages from peer nodes are queued. Once the local node receives the broadcast echo of its own SNAP_REQ message, it starts replaying these messages to the RFSM client; hence the name ‘replaying’.

A node entering the ‘replaying’ state means that RFSM has acquired all the data it needs to configure itself properly. A CONFIG_DONE message is broadcast to indicate this. As soon as RFSM receives its own CONFIG_DONE message (while in the replaying state), it transitions to the ‘running’ state.

Solution

For all practical purposes, the ‘replaying’ state can be treated as equivalent to the ‘running’ state. Fencing continues to function in the same way as in the ‘running’ state (e.g. during a race or a node join/leave). Nodes remaining in the ‘replaying’ state is a known issue in RFSM; it is being tracked by engineering through an etrack and is to be fixed in a future release.
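To make the join sequence above concrete, here is a toy Python model of the transitions the technote describes: queue peer traffic, switch to replaying on the echo of our own SNAP_REQ, replay the queue, broadcast CONFIG_DONE, and switch to running when our own CONFIG_DONE comes back. It is a simplification for illustration only, not Veritas code: the Bus class is a hypothetical stand-in for GAB's ordered broadcast, and the class names, the (kind, sender) message tuples, and the "joining" state label are all assumptions. The snapshot contents themselves are not modelled.

```python
from collections import deque
from typing import Deque, List, Tuple

# Hypothetical message shape: (message kind, sender node ID).
Message = Tuple[str, int]


class Bus:
    """Stand-in for GAB: delivers every broadcast to every node, sender included,
    in one global order. The sender's own copy is the 'broadcast echo'."""

    def __init__(self) -> None:
        self.nodes: List["RfsmNode"] = []
        self.pending: Deque[Message] = deque()

    def broadcast(self, msg: Message) -> None:
        self.pending.append(msg)

    def run(self) -> None:
        # Pump messages until nothing is left in flight.
        while self.pending:
            msg = self.pending.popleft()
            for node in self.nodes:
                node.on_broadcast(msg)


class RfsmNode:
    """Toy RFSM endpoint; `delivered` records what its client (e.g. VxFen) sees."""

    def __init__(self, node_id: int, bus: Bus, state: str = "running") -> None:
        self.node_id = node_id
        self.bus = bus
        self.state = state
        self.queued: Deque[Message] = deque()
        self.delivered: List[Message] = []
        bus.nodes.append(self)

    def join(self) -> None:
        # Request a snapshot; peer traffic is queued until our own echo arrives.
        self.state = "joining"
        self.bus.broadcast(("SNAP_REQ", self.node_id))

    def on_broadcast(self, msg: Message) -> None:
        kind, sender = msg
        own = sender == self.node_id
        if self.state == "joining":
            if kind == "SNAP_REQ" and own:
                # Echo of our own SNAP_REQ: replay the queued messages to the
                # client, then announce that configuration is complete.
                self.state = "replaying"
                while self.queued:
                    self.delivered.append(self.queued.popleft())
                self.bus.broadcast(("CONFIG_DONE", self.node_id))
            else:
                self.queued.append(msg)
        elif self.state == "replaying" and kind == "CONFIG_DONE" and own:
            # Our own CONFIG_DONE came back: the join is finished.
            self.state = "running"
        else:
            # In 'replaying' and 'running' alike, peer messages reach the client.
            self.delivered.append(msg)


if __name__ == "__main__":
    bus = Bus()
    n0 = RfsmNode(0, bus)          # already running
    n1 = RfsmNode(1, bus)
    bus.broadcast(("UPDATE", 0))   # peer traffic already in flight
    n1.join()
    bus.run()
    print(n1.state, n1.delivered)  # running [('UPDATE', 0)]
```

In this model a node sitting in replaying still hands peer messages to its client exactly as a running node does, which mirrors the technote's point that fencing behaves the same in both states. The model does not reproduce whatever leaves a node stuck in replaying; the thread does not describe that cause.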


Gaurav_S
Moderator
   VIP    Certified

Hi,

Can you paste the outputs? Please also mention the OS and Linux versions.

 

G

Marianne
Level 6
Partner    VIP    Accredited Certified

I have rebooted a cluster with CPS vxfencing. 

Just for the record - does this mean that you have rebooted all nodes simultaneously?

Where are the nodes in relation to the CPS servers?  
Did CPS servers stay up during reboot of nodes?

IdaWong
Level 4

Hi Marianne,

Indeed, I rebooted them all at the same time. If I reboot one node first, the replaying state does not show up.