Forum Discussion

Atul_Khanna
9 years ago

Veritas cluster is not starting

Hi,

We have a Veritas cluster set up for an Oracle database.

Engine version: 5.1.10.0

On Red Hat Linux 5 (2 servers)

The server is undergoing a domain migration.

What steps are necessary on the VCS side?

Thanks

VCS is not starting; hastart did not respond.

Below are the logs from when the server rebooted. I am not that familiar with VCS, so please advise.

$ /opt/VRTSvcs/bin/hastatus -sum
VCS WARNING V-16-1-10641 IpmHandle::open Cannot create AF_INET6 socket. errno = 97
VCS ERROR V-16-1-10600 Cannot connect to VCS engine
VCS WARNING V-16-1-11046 Local system not available
 

 

Log from /var/VRTSvcs/log/engine_A.log:

2016/01/24 19:21:15 VCS NOTICE V-16-1-11022 VCS engine (had) started
2016/01/24 19:21:15 VCS NOTICE V-16-1-11050 VCS engine version=5.1
2016/01/24 19:21:15 VCS NOTICE V-16-1-11051 VCS engine join version=5.1.10.0
2016/01/24 19:21:15 VCS NOTICE V-16-1-11052 VCS engine pstamp=5.1.100.000-5.1SP1GA-2010-09-30_23.30.00
2016/01/24 19:21:15 VCS INFO V-16-1-10196 Cluster logger started
2016/01/24 19:21:15 VCS NOTICE V-16-1-10114 Opening GAB library
2016/01/24 19:21:15 VCS NOTICE V-16-1-10619 'HAD' starting on: vdalpxorap002
2016/01/24 19:21:15 VCS INFO V-16-1-51138 Number of processors configured on this system are 32
2016/01/24 19:21:15 VCS WARNING V-16-1-51140 In a multi-CPU system, configure an adequately high value for the ShutdownTimeout attribute. This ensures that when a system panics, its service groups successfully fail over to other systems. For more information, refer to the VCS Administrator's Guide
2016/01/24 19:21:15 VCS WARNING V-16-1-10543 IpmServer::open Cannot create socket errno = 97
2016/01/24 19:21:16 VCS INFO V-16-1-10125 GAB timeout set to 30000 ms
2016/01/24 19:21:16 VCS NOTICE V-16-1-11057 GAB registration monitoring timeout set to 200000 ms
2016/01/24 19:21:16 VCS NOTICE V-16-1-11059 GAB registration monitoring action set to log system message
2016/01/24 19:21:30 VCS CRITICAL V-16-1-11306 Did not receive cluster membership, manual intervention may be needed for seeding
2016/01/24 20:23:59 VCS INFO V-16-1-10196 Cluster logger started
2016/01/24 20:23:59 VCS NOTICE V-16-1-11022 VCS engine (had) started
2016/01/24 20:23:59 VCS NOTICE V-16-1-11050 VCS engine version=5.1
2016/01/24 20:23:59 VCS NOTICE V-16-1-11051 VCS engine join version=5.1.10.0
2016/01/24 20:23:59 VCS NOTICE V-16-1-11052 VCS engine pstamp=5.1.100.000-5.1SP1GA-2010-09-30_23.30.00
2016/01/24 20:23:59 VCS NOTICE V-16-1-10114 Opening GAB library
2016/01/24 20:23:59 VCS NOTICE V-16-1-10619 'HAD' starting on: vdalpxorap002
2016/01/24 20:23:59 VCS INFO V-16-1-51138 Number of processors configured on this system are 32
2016/01/24 20:23:59 VCS WARNING V-16-1-51140 In a multi-CPU system, configure an adequately high value for the ShutdownTimeout attribute. This ensures that when a system panics, its service groups successfully fail over to other systems. For more information, refer to the VCS Administrator's Guide
2016/01/24 20:23:59 VCS WARNING V-16-1-10543 IpmServer::open Cannot create socket errno = 97
2016/01/24 20:23:59 VCS INFO V-16-1-10125 GAB timeout set to 30000 ms
2016/01/24 20:23:59 VCS NOTICE V-16-1-11057 GAB registration monitoring timeout set to 200000 ms
2016/01/24 20:23:59 VCS NOTICE V-16-1-11059 GAB registration monitoring action set to log system message
2016/01/24 20:24:14 VCS CRITICAL V-16-1-11306 Did not receive cluster membership, manual intervention may be needed for seeding
2016/01/25 00:18:45 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2016/01/25 04:18:46 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2016/01/25 08:18:49 VCS INFO V-16-1-53504 VCS Engine Alive message!!
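The line that matters in a log like this is easy to miss among the INFO and NOTICE entries. Below is a minimal Python sketch for filtering engine_A.log by severity. It assumes every line follows the "date time VCS SEVERITY message-id text" layout seen above. The function name find_problems is hypothetical and is not part of VCS.

```python
import re
from typing import Iterable, List, Tuple

# Assumed line layout, based on the engine_A.log excerpt in this thread:
# "YYYY/MM/DD HH:MM:SS VCS <SEVERITY> <message id> <text>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) VCS "
    r"(?P<sev>[A-Z]+) (?P<mid>V-\d+-\d+-\d+) (?P<text>.*)$"
)


def find_problems(
    lines: Iterable[str], severities: Tuple[str, ...] = ("CRITICAL", "ERROR")
) -> List[Tuple[str, str, str, str]]:
    """Return (timestamp, severity, message id, text) for lines at the given severities."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())  # lines without a timestamp simply do not match
        if m and m.group("sev") in severities:
            hits.append((m.group("ts"), m.group("sev"), m.group("mid"), m.group("text")))
    return hits
```

For example, iterating over an open handle for /var/VRTSvcs/log/engine_A.log and passing it to find_problems would pick out the two V-16-1-11306 CRITICAL entries in the excerpt above.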
 


5 Replies

  • Hi,

     

    Here is the reason VCS can't start:

    2016/01/24 20:24:14 VCS CRITICAL V-16-1-11306 Did not receive cluster membership, manual intervention may be needed for seeding

    This means VCS did not see all of the cluster members, so it will not start. Other members may already be running without this node knowing about them; if this node started VCS anyway, it could cause data corruption and split brain.

    If you have confirmed that VCS can safely start on this node, do the following:

    1. /opt/VRTS/bin/gabconfig -a -x

    2. hastart
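The reasoning above can be sketched as a toy Python model: a node that cannot see all expected members refuses to seed unless someone seeds it by hand. This is an illustration under stated assumptions, not how GAB is implemented. The class GabSeedModel and its fields are hypothetical. "expected" stands for the N in gabconfig -c -nN, and manual_seed stands for the manual seed command used in this thread.

```python
from dataclasses import dataclass


@dataclass
class GabSeedModel:
    """Hypothetical toy model of GAB seeding on one node (not GAB's real code).

    expected: the N in "gabconfig -c -nN"; 0 means never auto-seed.
    """

    expected: int
    seeded: bool = False

    def observe(self, visible: int, peer_seeded: bool = False) -> None:
        # visible counts this node plus every peer it can see over the LLT links.
        auto_seed = self.expected > 0 and visible >= self.expected
        # Either all systems are present, or a visible peer already belongs to a
        # seeded group that this node can join.
        if auto_seed or peer_seeded:
            self.seeded = True

    def manual_seed(self) -> None:
        # Stands for the manual seed ("gabconfig -a -x" in this thread).
        self.seeded = True

    def vcs_can_start(self) -> bool:
        # Unseeded: had reports V-16-1-11306 and waits for membership.
        return self.seeded
```

A node configured with expected=2 that only sees itself stays unseeded until manual_seed() is called. That is why the seed should only be forced after confirming that the other node is really down.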

     

  • Hi,

     

    Thanks a lot for the response.

    As I am quite new to VCS, I have a few questions:

    1. What exactly would gabconfig -a -x do? A brief description, if possible, because googling didn't give exact information about it.

    2. Which user do I need to run the commands as?

    3. When the servers rebooted and we lost VCS, the Oracle databases were started manually, outside the cluster (the IPs were enabled at OS level). What will happen now when VCS starts? Will it impact the databases? This is production.

     

    Thanks again.

  • Hi, Atul

     

    1. Refer to the gabconfig manual:

     

         -c  Configure the driver for use. Configuring the GAB driver
             enables client registrations and the joining of an
             already seeded group.

         -x  Seed control port. This option affords protection from
             pre-existing network partitions. The control port (port
             a) propagates the seed to all configured systems. GAB
             must be seeded to enable the delivery of membership on
             client ports.

         -n system_count
             Count of systems in the cluster. A non-zero system count
             auto-seeds the cluster when all systems are present. The
             default is zero, for no auto-seeding.

    ===

    Normally, when GAB starts, it runs gabconfig -c -nN, where N is the number of systems in the cluster. For example, with two nodes:

    gabconfig -c -n2

    But if one node cannot join, GAB on the other node will not seed, so VCS will not start there either.

    If you want to seed GAB manually, use:

    gabconfig -a -x    <<< tells GAB to seed without checking membership.

    After GAB is seeded, you can run hastart to start VCS.

    2. If Oracle was started outside VCS, then when VCS starts it first probes all resources on each node. If it finds Oracle already online, it marks Oracle and its related resources as online on that node. No other action is required, and the database is not impacted.
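Before running hastart, you can confirm that GAB has actually seeded by looking for a port a membership line in the gabconfig -a output. Port h appears once the VCS engine (had) has registered. Below is a minimal Python sketch. It assumes the usual "Port <letter> gen <hex> membership <nodes>" line layout, and the sample text in the check is illustrative rather than taken from this cluster. The helper names gab_ports, ready_for_hastart and engine_registered are hypothetical.

```python
import re
from typing import Set

# Assumed layout of a membership line in "gabconfig -a" output, e.g.
# "Port a gen   a36e0003 membership 01". Only lines containing the word
# "membership" are counted; jeopardy or other status lines are ignored.
PORT_RE = re.compile(r"^Port\s+(?P<port>[a-z])\s+gen\s+\S+\s+membership\b")


def gab_ports(output: str) -> Set[str]:
    """Return the GAB port letters that report a membership line."""
    ports: Set[str] = set()
    for line in output.splitlines():
        m = PORT_RE.match(line.strip())
        if m:
            ports.add(m.group("port"))
    return ports


def ready_for_hastart(output: str) -> bool:
    """Port a is GAB's own membership; it only shows up after GAB has seeded."""
    return "a" in gab_ports(output)


def engine_registered(output: str) -> bool:
    """Port h belongs to the VCS engine (had); it shows up once had has joined."""
    return "h" in gab_ports(output)
```

On an unseeded node the output contains only the "GAB Port Memberships" header, so ready_for_hastart returns False. That matches the V-16-1-11306 message in the log.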

     

     

  • gabconfig -a -x will perform a local seed and allow VCS to start.

    It is important that you check the heartbeat connections and make sure they are connected as configured in /etc/llttab on each node.

    If the nodes are disconnected from each other on the network but are both still connected to the SAN (shared) storage, and no I/O fencing is configured, it is extremely important that one of the nodes has VCS removed from automatic startup and stays shut down until the network and heartbeats are reconnected. It is best to disconnect that node from the SAN as well.

    Confirm the heartbeat connections at OS level: plumb temporary IP addresses on link 0 on both nodes and check that they can ping each other; repeat for link 1.
    Start LLT, then GAB.

    Confirm connectivity with 'lltstat -nvv' and 'gabconfig -a' before starting VCS.
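Comparing /etc/llttab on the two nodes is a quick sanity check before starting LLT and GAB. Below is a minimal Python sketch. It assumes the common set-node, set-cluster and link directives, with link lines of the form "link <tag> <device> - ether - -". The names LltConfig, parse_llttab and compare_nodes, and the node names in the check, are hypothetical. Matching files do not prove the links are physically connected; that still needs the ping test and lltstat -nvv.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class LltConfig:
    node: Optional[str] = None
    cluster: Optional[str] = None
    links: List[Tuple[str, str]] = field(default_factory=list)  # (tag, device)


def parse_llttab(text: str) -> LltConfig:
    """Extract node name, cluster ID and heartbeat links from llttab text."""
    cfg = LltConfig()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # skip blank lines and comments
            continue
        fields = line.split()
        key = fields[0]
        if key == "set-node" and len(fields) >= 2:
            cfg.node = fields[1]
        elif key == "set-cluster" and len(fields) >= 2:
            cfg.cluster = fields[1]
        elif key in ("link", "link-lowpri") and len(fields) >= 3:
            cfg.links.append((fields[1], fields[2]))
    return cfg


def compare_nodes(a: LltConfig, b: LltConfig) -> List[str]:
    """Heuristic cross-node checks; an empty list means nothing obvious is wrong."""
    issues = []
    if a.cluster != b.cluster:
        issues.append(f"cluster ID differs: {a.cluster} vs {b.cluster}")
    if a.node == b.node:
        issues.append(f"both files use node name {a.node}")
    if len(a.links) != len(b.links):
        issues.append(f"link count differs: {len(a.links)} vs {len(b.links)}")
    return issues
```

To use it, read /etc/llttab from each node (copied to one place by whatever means is convenient), pass both texts through parse_llttab, and hand the two results to compare_nodes.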