Solved: Cluster node not starting as one node is down.

Saikishore_Vema · 05-25-2015

Hi Experts,

I need help understanding a 1+1 cluster. My master node is down due to an OS issue, and the slave node should ideally take over, but I see it is not starting.

Attached is the Engine.log, where I can see some "ShutdownTimeout" messages; please let me know how to increase it. (I am not able to use hastatus; it says Veritas is not started, so I cannot run hares to increase the ShutdownTimeout.)

1. In a 1+1 cluster, if the master node is down, I need the slave to run as normal. What should I do for this?

2. How do I set ShutdownTimeout?

3. Are there any known bugs in 5.1 SP1RP4?

Wally_Heim · 05-26-2015

Hi Saikishore,

It looks like your cluster service (HAD) is not running because the cluster is not seeded. To seed it, you need to run the following command:

gabconfig -c -x
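The seeding step can be sketched as a small script that checks GAB membership before and after. This is only an illustration: `gabconfig -a` and `hastart` are standard VCS commands, but the `DRY_RUN` variable and the `run` and `seed_and_start` helpers are hypothetical names made up for this sketch. Manual seeding should only be done once you are sure the other node is really down, otherwise you risk a split-brain.

```shell
#!/bin/sh
# Sketch only: seed GAB manually on the surviving node, then start HAD.
# DRY_RUN, run() and seed_and_start() are illustrative, not part of VCS.
DRY_RUN="${DRY_RUN:-1}"     # default: only print the commands

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

seed_and_start() {
    run gabconfig -a        # show current GAB port membership
    run gabconfig -c -x     # seed manually, skipping the wait for all nodes
    run hastart             # start HAD on this node
    run gabconfig -a        # port a (GAB) and port h (HAD) should list this node
}

seed_and_start
```

By default the script only prints the commands. Run it as root with DRY_RUN=0 on the surviving node to actually execute them.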

As for the ShutdownTimeout value:

haconf -makerw

hasys -modify <servername> ShutdownTimeout <value>

haconf -dump -makero

The hasys command needs to be run for each node in the cluster. The default value should be 300 or 600 depending on the version of the product.
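Since the hasys command has to be repeated per node, the steps above could be wrapped in a loop. This is a rough sketch under assumptions: the node names `node1` and `node2`, the value 600, and the `DRY_RUN`, `run` and `set_shutdown_timeout` names are placeholders for illustration. Note that the ha commands only work once HAD is running, so this applies after the cluster has been seeded and started.

```shell
#!/bin/sh
# Sketch only: set ShutdownTimeout on every node, then read it back.
# DRY_RUN, NODES, TIMEOUT, run() and set_shutdown_timeout() are illustrative.
DRY_RUN="${DRY_RUN:-1}"             # default: only print the commands
NODES="${NODES:-node1 node2}"       # hypothetical system names
TIMEOUT="${TIMEOUT:-600}"           # example value, in seconds

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

set_shutdown_timeout() {
    run haconf -makerw                      # open the configuration read-write
    for node in $NODES; do
        run hasys -modify "$node" ShutdownTimeout "$TIMEOUT"
    done
    run haconf -dump -makero                # save and make it read-only again
    for node in $NODES; do
        run hasys -value "$node" ShutdownTimeout    # confirm the new value
    done
}

set_shutdown_timeout
```

Replace NODES with the system names from your own cluster before running with DRY_RUN=0.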

Thank you,

Wally

starflyfly · 05-27-2015

Hi, Saik

1. Since one node is down and GAB needs to see all members before it proceeds, you need to run:

gabconfig -c -x

This skips the membership wait so that GAB can start.

2. Same procedure as Wally described.

Refer to https://iconnect-symwise.symantec.com/infocenter/index?page=content&id=TECH152750&locale=en_US

to check this value.

3. No major patch has been released for this version after 5.1 SP1RP4, so it's OK.

Sunil_Yadav · 05-27-2015

Hi Saikishore,

1. Regarding the V-16-1-11306 ("Did not receive cluster membership, manual intervention may be needed for seeding") error.

Wally and starflyfly correctly guided you on GAB's manual seeding (gabconfig -c -n <count>, where <count> is the number of nodes required for auto-seed). In your particular case, <count> should be 1. After that, VCS should start successfully on the slave node.

2. Regarding the ShutdownTimeout warning.

During startup, if VCS detects more than one CPU, it logs the following warning:

"In a multi-CPU system, configure an adequately high value for the ShutdownTimeout attribute. This ensures that when a system panics, its service groups successfully fail over to other systems. For more information, refer to the VCS Administrator's Guide". It is a routine warning.

After HAD (the VCS process) exits, if GAB exits within the time designated in the ShutdownTimeout attribute, the remaining systems recognize this as a reboot and fail over service groups from the departed system. For systems that run several applications, consider increasing the value of the ShutdownTimeout attribute.

3. Why didn't the slave node take over?

From the log snapshot alone, it is difficult to determine the root cause. ShutdownTimeout behavior could be one reason, but it is not the only possibility. Please share the config file and logs for a detailed root cause analysis.

Thanks & Regards,

Sunil Y
