Heartbeat timeout value

Divakar_SK
Level 2

Hi,

Recently one of our cluster nodes rebooted because all of its heartbeat network links went down. (This was caused by some changes on the switch, which took approximately 60 seconds.)

We informed the network team about the reboot, and in turn they suggested changing the heartbeat timeout value to 60 seconds.

 

Requesting your help: is it advisable to change the heartbeat timeout value to 60 seconds?

I think the default value is 15 seconds. If we change the value from the default, what are the consequences?

Please advise.

 

Divakar

4 REPLIES

Wally_Heim
Level 6
Employee

Hi Divakar,

It is possible to set the heartbeat timeout to 60 seconds. However, on the Windows platform we don't recommend setting the heartbeat timeout above 30 seconds.

 

If you are concerned about the reboot, there are several switches in the heartbeat configuration that control whether the node reboots in certain situations. It sounds like you have one or more of these switches set. You could check whether disabling the reboot option is closer to what you are looking for.

 

Thanks,

Wally

Gaurav_S
Moderator
   VIP    Certified

I wouldn't really recommend that value, for a couple of reasons:

1. Manually increasing the timeout value increases the time the cluster takes to detect a fault, which means delayed fault detection and delayed corrective action. The business may not really permit that; if the running apps are mission critical, even 30s may matter.

2. We are already talking about 30s on the heartbeat, so in a split-brain situation you would be intentionally delaying the cluster from taking action, which could be serious (I hope you have I/O fencing in place).

LLT, or heartbeat, is a very crucial part of the cluster. In a running cluster, heartbeats are exchanged every 1 second to track the status of the other nodes. The 15s LLT timeout plus the 15s GAB timeout gives a total of 30s for failure detection, which I believe is a sound value from a stability and resilience standpoint.
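The timing arithmetic above can be sketched in a few lines of Python. This is only a simplified model using the numbers quoted in this thread (15s LLT timeout, 15s GAB timeout, 60s switch outage); the function names are hypothetical, and it does not query or configure a real cluster.

```python
def detection_time(llt_timeout_s: float, gab_timeout_s: float) -> float:
    """Total time before the cluster declares a peer dead (simplified model)."""
    return llt_timeout_s + gab_timeout_s


def outage_triggers_failover(outage_s: float,
                             llt_timeout_s: float,
                             gab_timeout_s: float) -> bool:
    """True if a heartbeat outage lasts at least as long as the detection time."""
    return outage_s >= detection_time(llt_timeout_s, gab_timeout_s)


if __name__ == "__main__":
    # Values as quoted in the thread; treat them as assumptions, not product facts.
    print(detection_time(15, 15))
    print(outage_triggers_failover(60, 15, 15))
    print(outage_triggers_failover(60, 60, 15))
```

In this model, raising the timeout does let the cluster ride out the 60s outage, but every real node failure would then also take that much longer to detect.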

In my opinion, it would not be a wise idea.

 

Gaurav

Divakar_SK
Level 2

Hi Gaurav,

Thanks for your input.

Due to a spanning tree problem, the network engineer asked us to change the value to 60 seconds to avoid a cluster failover.

The customer also does not want this failover :(

They are saying the spanning tree issue may take 45 to 60 seconds to resolve.

Could you please confirm: what is the default heartbeat timeout value, 15 seconds or 30 seconds?

Thanks,

Divakar

Anoop_Kumar1
Level 5

Agree with the above comments that increasing the timeout value is not advisable.

- To avoid failover, freezing the service group (SG) is an option.

- However, if LLT goes down completely, the node will go down.

If you are using network switches between the LLT links, there should be two switches for the two high-priority LLT links. It is also advisable to make changes on only one switch at a time.

Having a single switch for all LLT links is, again, a risk: it is a single point of failure for the LLT links.
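As a rough illustration of the single-point-of-failure check, here is a minimal Python sketch. The link and switch names are made up, and the mapping would have to be filled in by hand from your own cabling; nothing here reads the actual LLT configuration.

```python
from typing import Dict, Set


def single_points_of_failure(link_to_switch: Dict[str, str]) -> Set[str]:
    """Return the switch whose loss would take down every LLT link, if any.

    link_to_switch maps each LLT link name to the switch it is cabled to.
    An empty set means no single switch carries all of the links.
    """
    switches = set(link_to_switch.values())
    return switches if len(switches) == 1 else set()


if __name__ == "__main__":
    # Hypothetical layouts: both links on one switch, then one switch per link.
    print(single_points_of_failure({"link1": "switchA", "link2": "switchA"}))
    print(single_points_of_failure({"link1": "switchA", "link2": "switchB"}))
```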