
VCS cluster

Ayan1987
Level 3

Hi Team,

I have an 8+1 Veritas cluster. Node2 was removed from the cluster, but it was not removed properly. We are now trying to rejoin Node2 to the same cluster, and when it joins we run into a problem: we get a fencing key error, and we still see some entries for Node2 in the existing cluster configuration. This is urgent; any help is appreciated.

/etc/VRTSvcs/conf/config# grep MMTE02 main.cf
                NodeList = { MMTE02, MMTE03, MMTE01, MMTE04, MMTE05, MMTE06,
                NodeList = { MMTE02, MMTE03, MMTE01, MMTE04, MMTE05, MMTE06,
                NodeList = { MMTE02, MMTE03, MMTE01, MMTE04, MMTE05, MMTE06,
                NodeList = { MMTE02, MMTE03, MMTE01, MMTE04, MMTE05, MMTE06,
                NodeList = { MMTE02, MMTE03, MMTE01, MMTE04, MMTE05, MMTE06,
                NodeList = { MMTE02, MMTE03, MMTE01, MMTE04, MMTE05, MMTE06,
                NodeList = { MMTE02, MMTE03, MMTE01, MMTE04, MMTE05, MMTE06,
                NodeList = { MMTE02, MMTE03, MMTE01, MMTE04, MMTE05, MMTE06,
                CVMNodeId = { MMTE02 = 0, MMTE03 = 1, MMTE01 = 2, MMTE04 = 3,
                NodeList = { MMTE02, MMTE03, MMTE01, MMTE04, MMTE05, MMTE06,
                NodeList = { MMTE02, MMTE03, MMTE01, MMTE04, MMTE05, MMTE06,
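
Since the error is about fencing keys, one thing worth checking is what keys are actually registered on the coordinator disks. A minimal check, assuming disk-based SCSI-3 fencing with the coordinator disks listed in /etc/vxfentab (on older VCS versions the read-keys option is -g rather than -s):

# vxfenadm -d                        # fencing mode and port-b membership
# vxfenadm -s all -f /etc/vxfentab   # keys registered on the coordinator disks

A stale key left behind by the old MMTE02 registration would be consistent with the join error described above.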

11 Replies

bhoms
Level 4
Employee

Am I right in assuming that you are getting the error when adding the node using the installer script?

 

Can you attach your main.cf here? I believe that node MMTE02 needs to be removed from the main.cf on the remaining cluster nodes.

Follow the documentation for adding and removing a node from a CVM/CFS cluster.

 

However, you can do something like the following...

e.g.

Delete the departing node from the SystemList of service groups grp3 and grp4 (the group names are only examples):

# hagrp -modify grp3 SystemList -delete MMTE02
# hagrp -modify grp4 SystemList -delete MMTE02

In the same manner, remove the old node from the resources:

# hares -modify cfsmount1 NodeList -delete MMTE02

(Assuming the CFSMount resource name is cfsmount1.)

# hares -modify cvm_clus CVMNodeId -delete MMTE02

(Assuming the cvm_clus resource still has the CVMNodeId attribute with the node name in it.)
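
To confirm each change took effect before moving on, you can read the attributes back; a quick check using the example names above:

# hagrp -value grp3 SystemList
# hares -value cfsmount1 NodeList
# hares -value cvm_clus CVMNodeId

MMTE02 should no longer appear in any of the output.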

 

Hope this helps.


I am sharing my main.cf. Could you please verify it? That would be very helpful.

bhoms
Level 4
Employee

Apologies for the late response; I was out.

Here are the commands that need to be executed, on any one of the cluster nodes:

#haconf -makerw

#hares -modify cfsmount7 NodeList -delete MMTE02
#hares -modify cfsmount8 NodeList -delete MMTE02
#hares -modify cfsmount3 NodeList -delete MMTE02
#hares -modify cfsmount4 NodeList -delete MMTE02
#hares -modify cfsmount9 NodeList -delete MMTE02
#hares -modify cfsmount10 NodeList -delete MMTE02
#hares -modify cfsmount1 NodeList -delete MMTE02
#hares -modify cfsmount5 NodeList -delete MMTE02
#hares -modify cfsmount6 NodeList -delete MMTE02
#hares -modify cfsmount2 NodeList -delete MMTE02

#hares -modify cvm_clus CVMNodeId -delete MMTE02

#haconf -dump -makero

---------------------------------------------------------------------------

Attached is the main.cf after running the above commands.

You need to execute the cfsmount commands first; only then will you be able to remove the node from the CVM cluster resource. So follow the same sequence as above.
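
As a quick sanity check afterwards (a sketch assuming a bash shell and the resource names from the main.cf above):

# for r in cfsmount{1..10}; do hares -value $r NodeList; done
# hares -value cvm_clus CVMNodeId

None of the output should still contain MMTE02.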

Hope this helps.

 

I had thought of the steps below:

  1. haconf -makerw
  2. hares -modify cfsmount1 NodeList -delete MMTE02
  3. hares -modify cfsmount2 NodeList -delete MMTE02
  4. hares -modify cfsmount3 NodeList -delete MMTE02
  5. hares -modify cfsmount4 NodeList -delete MMTE02
  6. hares -modify cfsmount5 NodeList -delete MMTE02
  7. hares -modify cfsmount6 NodeList -delete MMTE02
  8. hares -modify cfsmount7 NodeList -delete MMTE02
  9. hares -modify cfsmount8 NodeList -delete MMTE02
  10. hares -modify cfsmount9 NodeList -delete MMTE02
  11. hares -modify cfsmount10 NodeList -delete MMTE02
  12. hares -modify cvm_clus CVMNodeId -delete MMTE02
  13. haconf -dump -makero
  14. /etc/init.d/gab stop
  15. vi /etc/gabtab (/sbin/gabconfig -c -n8)
  16. /etc/init.d/gab start

Please let me know if I need to make any change to GAB, because I now have 8 nodes but my gabtab shows the following:

MMTE01:~# cat /etc/gabtab
/sbin/gabconfig -c -n9
MMTE01:~# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen 245b808 membership ;12345678
Port b gen 245b81a membership ;12345678
Port d gen 245b80b membership ;12345678
Port f gen 245b826 membership ;12345678
Port h gen 245b81c membership ;12345678
Port m gen 245b81e membership ;12345678
Port u gen 245b824 membership ;12345678
Port v gen 245b820 membership ;12345678
Port w gen 245b822 membership ;12345678
Port y gen 245b81f membership ;12345678
MMTE01:~#

bhoms
Level 4
Employee

You cannot stop GAB/LLT in a CVM cluster like that. GAB will not stop directly without first stopping the other ports (h, u, v, w, f, etc.) that depend on it.
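
For reference, a hedged sketch of the order the stack would have to come down in on one node (Linux init-script paths assumed; the CVM/CFS ports stop along with VCS):

# hastop -local           # stops VCS and the CVM/CFS ports (h, u, v, w, f)
# /etc/init.d/vxfen stop  # fencing (port b)
# /etc/init.d/gab stop    # GAB (port a)
# /etc/init.d/llt stop    # LLT

But as explained below, none of that is needed just to change gabtab.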

You can, of course, change /etc/gabtab at any time on all the nodes to reflect 8 nodes, as /sbin/gabconfig -c -n8 (no need to stop anything). The file is only read when GAB starts, so editing it has no effect on the running cluster.
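
A minimal way to make the change (run on each of the eight nodes; the sed invocation assumes GNU sed, so verify the file afterwards):

# sed -i 's/-n9/-n8/' /etc/gabtab
# cat /etc/gabtab
/sbin/gabconfig -c -n8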

 

 

Okay, so I can edit it? And one more thing: when I do vxdg list, I see all the DGs under MMTE02's name, which is no longer in the cluster.

MMTE01:~# vxdg list
NAME          STATE            ID
Archive2      enabled,shared   1441526691.263.MMTE02
Archive       enabled,shared   1436696266.52.MMTE02
Archive1      enabled,shared   1441524866.181.MMTE02
Archive3      enabled,shared   1441530387.319.MMTE02
Archive4      enabled,shared   1441532169.349.MMTE02
Datastorage   enabled,shared   1436691677.39.MMTE02
mmdatadg      enabled,shared   1435841790.12.MMTE02
mmdata1dg     enabled,shared   1436954191.46.MMTE02
mmdata2dg     enabled,shared   1437387085.52.MMTE02
mmdbdg        enabled,shared   1435841795.15.MMTE02

MMTE02 is no longer in the cluster, so how do I resolve this?

bhoms
Level 4
Employee

Yes, you can edit the file /etc/gabtab at any time after the cluster is started, since the file is only referred to when the GAB service loads. If you need to renumber the nodes 0 to 7, then you need to modify /etc/llthosts on each node with the correct numbers and restart the whole cluster (so a downtime window needs to be scheduled).
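
For reference, /etc/llthosts is just a node-ID-to-name map, one entry per line, and it must be identical on every node. A sketch of what a renumbered file could look like (the ID-to-host assignments here are illustrative only):

0 MMTE01
1 MMTE03
2 MMTE04
3 MMTE05

and so on through ID 7 for the remaining nodes.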

 

The vxdg list output lists the disk groups. If disk groups are specified as arguments, a longer format provides additional status and configuration data.

The last column of this output shows the DG ID, and the DG ID contains the hostname of the node where the disk group was originally created. You can see the same DG ID in the output of "#vxdisk list diskname | grep group" (use any of the disks in the disk group).
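
For example (the disk name here is hypothetical; the ID matches your Archive group above):

# vxdisk list emc0_0012 | grep group
group:     name=Archive id=1436696266.52.MMTE02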

It has nothing to do with the host, and you should not be worried about it. Even though you have removed that host from the cluster, the DG will not look for that host while importing. It is just a DG ID that happens to have a hostname embedded in it.

Hope this helps.

 

So if the same host wants to rejoin the cluster with the same name, MMTE02, will it cause a cluster panic?

Marianne
Moderator
Partner    VIP    Accredited Certified

If you want to replace the failed node using the same node name, then please follow these steps posted by @mikebounds:

https://vox.veritas.com/t5/Cluster-Server/Cluster-fails-after-solaris-server-is-brought-online-after...

I joined MMTE02 to the cluster; however, every time the node reboots I need to run vxdctl enable to see the LUNs.

This should happen automatically. Please help.

bhoms
Level 4
Employee

I believe we need to start a separate thread for this issue, as the original issue was removing the leftover entries of the node that was taken out of the cluster.

However, it looks like the LUNs are not yet available at the OS level for VxVM to scan when VxVM starts after a reboot.
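
One hedged way to narrow it down after a reboot, assuming Linux (the prompts in your output suggest it), is to compare what the OS sees with what VxVM sees:

# lsscsi | wc -l        # devices the OS has discovered
# vxdisk list | wc -l   # devices VxVM currently knows about
# vxdisk scandisks      # rescan at the VxVM level only

If the OS count is already correct straight after boot, the problem is VxVM's scan timing rather than the storage presentation itself.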

You can enable the VxVM debug option in the sysboot file (this depends on the version of SF / InfoScale) and collect the debug messages file.

https://www.veritas.com/support/en_US/article.000004820 - refer to this technote to enable VxVM logging during boot.

Hope this helps.