Forum Discussion

vcsguest
15 years ago

Unable to start VCS on one node of a two-node cluster

I have recently installed VCS on two Blade 1500 machines, but I am unable to run VCS on both machines at the same time. If I start systemA first, VCS starts there but fails to start on systemB, and vice versa: whichever system I start first is the one that works.


2010/10/22 22:16:40 VCS INFO V-16-1-10196 Cluster logger started
2010/10/22 22:16:40 VCS NOTICE V-16-1-11022 VCS engine (had) started
2010/10/22 22:16:40 VCS NOTICE V-16-1-11050 VCS engine version=5.1
2010/10/22 22:16:40 VCS NOTICE V-16-1-11051 VCS engine join version=5.1.00.0
2010/10/22 22:16:40 VCS NOTICE V-16-1-11052 VCS engine pstamp=Veritas-5.1-10/06/10-14:37:00
2010/10/22 22:16:41 VCS NOTICE V-16-1-10114 Opening GAB library
2010/10/22 22:16:50 VCS NOTICE V-16-1-10619 'HAD' starting on: systemA
2010/10/22 22:17:10 VCS INFO V-16-1-10125 GAB timeout set to 30000 ms
2010/10/22 22:17:10 VCS NOTICE V-16-1-11057 GAB registration monitoring timeout set to 200000 ms
2010/10/22 22:17:10 VCS NOTICE V-16-1-11059 GAB registration monitoring action set to log system message
2010/10/22 22:17:17 VCS INFO V-16-1-10077 Received new cluster membership
2010/10/22 22:17:18 VCS NOTICE V-16-1-10112 System (systemA) - Membership: 0x3, DDNA: 0x0
2010/10/22 22:17:18 VCS NOTICE V-16-1-10322 System  (Node '0') changed state from UNKNOWN to INITING
2010/10/22 22:17:18 VCS NOTICE V-16-1-10086 System  (Node '0') is in Regular Membership - Membership: 0x3
2010/10/22 22:17:18 VCS NOTICE V-16-1-10086 System systemA (Node '1') is in Regular Membership - Membership: 0x3
2010/10/22 22:17:18 VCS WARNING V-16-1-50129 Operation 'haclus -modify' rejected as the node is in CURRENT_DISCOVER_WAIT state
2010/10/22 22:17:18 VCS WARNING V-16-1-50129 Operation 'haclus -modify' rejected as the node is in CURRENT_DISCOVER_WAIT state
2010/10/22 22:17:18 VCS NOTICE V-16-1-10453 Node: 0 changed name from: '' to: 'systemB'
2010/10/22 22:17:18 VCS NOTICE V-16-1-10322 System systemB (Node '0') changed state from INITING to RUNNING
2010/10/22 22:17:18 VCS NOTICE V-16-1-10322 System systemA (Node '1') changed state from CURRENT_DISCOVER_WAIT to REMOTE_BUILD
2010/10/22 22:17:19 VCS NOTICE V-16-1-10464 Requesting snapshot from node: 0
2010/10/22 22:17:19 VCS NOTICE V-16-1-10465 Getting snapshot.  snapped_membership: 0x3 current_membership: 0x3 current_jeopardy_membership: 0x0
2010/10/22 22:17:27 VCS NOTICE V-16-1-10181 Group VCShmg AutoRestart set to 1
2010/10/22 22:17:27 VCS INFO V-16-1-10466 End of snapshot received from node: 0.  snapped_membership: 0x3 current_membership: 0x3 current_jeopardy_membership: 0x0
2010/10/22 22:17:27 VCS WARNING V-16-1-10030 UseFence=NONE. Hence do not need fencing
2010/10/22 22:17:27 VCS ERROR V-16-1-10651 Cluster UUID received from snapshot is not matching with one existing on systemA.  VCS Stopping.  Manually restart VCS after configuring correct CID in the cluster.
 

Please help me.

  • I believe this is VCS 5.1.

    With this version, a new feature called "Cluster UUID" was introduced.

    Please run the following command to configure the UUID:

    /opt/VRTSvcs/bin/uuidconfig.pl [-rsh] -clus -configure systemA systemB

    Once this is done, try to start had on both nodes.
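
One way to confirm that the UUID mismatch is what stops had, before and after running the command, is to search the engine log for message ID V-16-1-10651 (the ERROR line in the log above). This is a minimal sketch, not an official tool: the function name is hypothetical, and the log path /var/VRTSvcs/log/engine_A.log is assumed to be the default location on your install.

```shell
# Hypothetical helper: succeed if the given engine log contains the
# cluster UUID mismatch error (message ID V-16-1-10651).
has_uuid_mismatch() {
    # $1: path to the engine log
    grep -q 'V-16-1-10651' "$1"
}

# Assumed default log location; override LOG if yours differs.
LOG="${LOG:-/var/VRTSvcs/log/engine_A.log}"
if [ -r "$LOG" ] && has_uuid_mismatch "$LOG"; then
    echo "Cluster UUID mismatch logged in $LOG"
fi
```

The log is appended to, so an old error line stays in the file after the fix. Check the timestamp of the most recent match rather than only whether a match exists.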

  • That is exactly how the cluster works. The startup sequence is one of the built-in protection methods to prevent split-brain and to ensure that the SAME main.cf is read into memory on all nodes.

    Each node has a copy of main.cf. The first node to start had (INITING) first checks whether another cluster node is already up and running with a valid config (CURRENT_DISCOVER_WAIT), then reads its local main.cf, confirms that it is valid, and reads the config into memory (LOCAL_BUILD):

    INITING -> CURRENT_DISCOVER_WAIT -> LOCAL_BUILD -> RUNNING

    The 2nd node goes through the same process, gets notification via GAB that another node is busy reading the config into memory, waits for the 1st node to complete (RUNNING), and then uses the config in memory to build its own config. If main.cf was different on the 2nd node, it is updated with the config that's in memory. Transition states on the 2nd node:

    INITING -> CURRENT_DISCOVER_WAIT -> REMOTE_BUILD -> RUNNING

    This behaviour is described in the VCS admin guide under Cluster and system states -> Examples of system state transitions.

    If we know your O/S and VCS version, we can provide links to the manuals. Otherwise, find all manuals here:

    https://sort.symantec.com/documents
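
The state transitions described in this reply can be read back out of the engine log, where each change is logged under message ID V-16-1-10322 (as in the original post). The sketch below is only an illustration: the function name is hypothetical, and it assumes the log lines keep the exact "System NAME (Node 'N') changed state from X to Y" wording seen in this thread.

```shell
# Hypothetical helper: print "FROM -> TO" for every state change
# logged for one system (message ID V-16-1-10322).
state_transitions() {
    # $1: system name, $2: path to the engine log
    grep 'V-16-1-10322' "$2" |
        sed -n "s/.*System $1 (Node '[0-9]*') changed state from \([A-Z_]*\) to \([A-Z_]*\).*/\1 -> \2/p"
}
```

Run against the log in the original post, `state_transitions systemA <logfile>` prints `CURRENT_DISCOVER_WAIT -> REMOTE_BUILD`. The node never reaches RUNNING because had stops on the UUID error first.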

     

     

  • Hi,

    If rsh is not set up, can I run this on each node individually? Please let me know.

    /opt/VRTSvcs/bin/uuidconfig.pl [-rsh] -clus -configure systemA systemB

  • Hi,

    Can you please give me some assistance on how to set up rsh on both systems?

  • The -rsh option is for environments where ssh isn't configured, so rsh has to be used instead (i.e. it's an optional argument for when you're using rsh instead of the default ssh). Provided you have ssh set up, you don't need to set up rsh.

  • I had a similar issue and resolved it by copying the contents of /etc/vx/.uuids/clusuuid from the running server to the other server and restarting the stack. Please try the same.

    Lineesh.NM
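
The manual copy described in the last reply might look roughly like the sketch below. It only prints the commands so they can be reviewed first and executes nothing. Several things here are assumptions: the function name is hypothetical, ssh/scp access as root between the nodes is assumed, and "restart the stack" is taken to mean starting had with /opt/VRTSvcs/bin/hastart on the node that failed to join.

```shell
# Hypothetical helper: print the commands that would copy the cluster
# UUID file from a running node to a node that refuses to join.
# Nothing is executed; review the output, then run it by hand as root.
UUID_FILE=/etc/vx/.uuids/clusuuid

plan_uuid_copy() {
    # $1: node where had is running, $2: node that failed to start
    echo "scp $1:$UUID_FILE $2:$UUID_FILE"
    # Assumption: starting had is enough to "restart the stack".
    echo "ssh $2 /opt/VRTSvcs/bin/hastart"
}

# In the original post, systemB is RUNNING and systemA stops on the error.
plan_uuid_copy systemB systemA
```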