
Veritas cluster is not starting

Atul_Khanna
Level 2

Hi ,

We have a Veritas cluster set up for an Oracle database.

Engine version: 5.1.10.0

On Red Hat Linux 5 (2 servers).

The servers are undergoing a domain migration.

What steps need to be taken on the VCS side?

Thanks

 

 

VCS is not starting.

hastart did not respond.

After the server rebooted, the logs below were shown. I am not that well acquainted with VCS, please advise.

$ /opt/VRTSvcs/bin/hastatus -sum
VCS WARNING V-16-1-10641 IpmHandle::open Cannot create AF_INET6 socket. errno = 97
VCS ERROR V-16-1-10600 Cannot connect to VCS engine
VCS WARNING V-16-1-11046 Local system not available
 

 

log at /var/VRTSvcs/log/engine_A.log

2016/01/24 19:21:15 VCS NOTICE V-16-1-11022 VCS engine (had) started
2016/01/24 19:21:15 VCS NOTICE V-16-1-11050 VCS engine version=5.1
2016/01/24 19:21:15 VCS NOTICE V-16-1-11051 VCS engine join version=5.1.10.0
2016/01/24 19:21:15 VCS NOTICE V-16-1-11052 VCS engine pstamp=5.1.100.000-5.1SP1GA-2010-09-30_23.30.00
2016/01/24 19:21:15 VCS INFO V-16-1-10196 Cluster logger started
2016/01/24 19:21:15 VCS NOTICE V-16-1-10114 Opening GAB library
2016/01/24 19:21:15 VCS NOTICE V-16-1-10619 'HAD' starting on: vdalpxorap002
2016/01/24 19:21:15 VCS INFO V-16-1-51138 Number of processors configured on this system are 32
2016/01/24 19:21:15 VCS WARNING V-16-1-51140 In a multi-CPU system, configure an adequately high value for the ShutdownTimeout attribute. This ensures that when a system panics, its service groups successfully fail over to other systems. For more information, refer to the VCS Administrator's Guide
2016/01/24 19:21:15 VCS WARNING V-16-1-10543 IpmServer::open Cannot create socket errno = 97
2016/01/24 19:21:16 VCS INFO V-16-1-10125 GAB timeout set to 30000 ms
2016/01/24 19:21:16 VCS NOTICE V-16-1-11057 GAB registration monitoring timeout set to 200000 ms
2016/01/24 19:21:16 VCS NOTICE V-16-1-11059 GAB registration monitoring action set to log system message
2016/01/24 19:21:30 VCS CRITICAL V-16-1-11306 Did not receive cluster membership, manual intervention may be needed for seeding
2016/01/24 20:23:59 VCS INFO V-16-1-10196 Cluster logger started
2016/01/24 20:23:59 VCS NOTICE V-16-1-11022 VCS engine (had) started
2016/01/24 20:23:59 VCS NOTICE V-16-1-11050 VCS engine version=5.1
2016/01/24 20:23:59 VCS NOTICE V-16-1-11051 VCS engine join version=5.1.10.0
2016/01/24 20:23:59 VCS NOTICE V-16-1-11052 VCS engine pstamp=5.1.100.000-5.1SP1GA-2010-09-30_23.30.00
2016/01/24 20:23:59 VCS NOTICE V-16-1-10114 Opening GAB library
2016/01/24 20:23:59 VCS NOTICE V-16-1-10619 'HAD' starting on: vdalpxorap002
2016/01/24 20:23:59 VCS INFO V-16-1-51138 Number of processors configured on this system are 32
2016/01/24 20:23:59 VCS WARNING V-16-1-51140 In a multi-CPU system, configure an adequately high value for the ShutdownTimeout attribute. This ensures that when a system panics, its service groups successfully fail over to other systems. For more information, refer to the VCS Administrator's Guide
2016/01/24 20:23:59 VCS WARNING V-16-1-10543 IpmServer::open Cannot create socket errno = 97
2016/01/24 20:23:59 VCS INFO V-16-1-10125 GAB timeout set to 30000 ms
2016/01/24 20:23:59 VCS NOTICE V-16-1-11057 GAB registration monitoring timeout set to 200000 ms
2016/01/24 20:23:59 VCS NOTICE V-16-1-11059 GAB registration monitoring action set to log system message
2016/01/24 20:24:14 VCS CRITICAL V-16-1-11306 Did not receive cluster membership, manual intervention may be needed for seeding
2016/01/25 00:18:45 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2016/01/25 04:18:46 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2016/01/25 08:18:49 VCS INFO V-16-1-53504 VCS Engine Alive message!!
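To pick the relevant entries out of a long engine log like the one above, a small filter can help. This is only a sketch: the function name `vcs_log_problems` is made up for illustration, and it assumes every entry follows the layout seen in this log (date, time, `VCS`, severity, message ID, text).

```shell
#!/bin/sh
# Print only WARNING, ERROR and CRITICAL entries from a VCS engine log.
# Assumes the layout seen above: date, time, "VCS", severity, message ID, text.
# Reads the files given as arguments, or stdin if none are given.
vcs_log_problems() {
    awk '$3 == "VCS" && ($4 == "WARNING" || $4 == "ERROR" || $4 == "CRITICAL")' "$@"
}
```

Usage: `vcs_log_problems /var/VRTSvcs/log/engine_A.log`. On the log above this would leave the errno = 97 warnings, the ShutdownTimeout warning and the V-16-1-11306 seeding message.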
 

Accepted Solutions

starflyfly
Level 6
Employee Accredited Certified

Hi, Atul

 

1. Refer to the gabconfig manual page:

     -c Configure the driver for use. Configuring the GAB driver
        enables client registrations and the joining of an
        already seeded group.

     -x Seed control port. This option affords protection from
        pre-existing network partitions. The control port (port
        a) propagates the seed to all configured systems. GAB
        must be seeded to enable the delivery of membership on
        client ports.

     -n system_count
        Count of systems in the cluster. A non-zero system count
        auto-seeds the cluster when all systems are present. The
        default is zero, for no auto-seeding.

===

Normally, when GAB starts, it runs gabconfig -c -nX, where X is the number of nodes in the cluster. For example, with two nodes:

gabconfig -c -n2

But if one node cannot join, GAB does not seed, so it will not come up on the other node either.

If you want to start GAB manually, use:

gabconfig -a -x <<< the -x option tells GAB to seed without waiting for the full membership

After GAB has started, you can use hastart to start VCS.
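To see which system count GAB waits for before auto-seeding, you can read the -n value from the GAB startup file. A minimal sketch, assuming the startup line lives in /etc/gabtab (the usual location, but check your own install); the function name `gab_seed_count` is made up for illustration.

```shell
#!/bin/sh
# Print the system count given to gabconfig via -n in a gabtab-style file.
# Reads the files given as arguments, or stdin if none are given.
# Prints nothing if no gabconfig line with -n is found.
gab_seed_count() {
    sed -n 's/.*gabconfig.*-n[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$@"
}
```

Usage: `gab_seed_count /etc/gabtab`. If this prints 2 and only one node is up, GAB will not auto-seed, which matches the "Did not receive cluster membership" message in the log.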

 

2. When Oracle has been started outside VCS and VCS starts afterwards, VCS first probes all resources on each node. If it finds Oracle already online, it marks Oracle and the related resources as online on that node; no other action is required. It will not impact the database.

 

 


5 REPLIES 5

starflyfly
Level 6
Employee Accredited Certified

Hi,

 

Here is the reason why VCS can't start:

2016/01/24 20:24:14 VCS CRITICAL V-16-1-11306 Did not receive cluster membership, manual intervention may be needed for seeding

..

That means GAB did not see all cluster members, so VCS won't start. Another member may still be running without this node knowing about it, and starting VCS here in that situation could cause split brain and data corruption.

If you have confirmed that it is safe to start VCS from this node, do the following:

1. /opt/VRTS/bin/gabconfig -a -x

2. hastart
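Because this is a production system, it may be worth wrapping the two steps above in a dry run first. The following is only a sketch: `run`, `seed_and_start` and the `DRY_RUN` variable are names invented here, and the commands are copied as given in this reply. It prints what would be executed unless DRY_RUN=0 is set, and it cannot check whether the other node is really down; that still has to be confirmed by hand.

```shell
#!/bin/sh
# Run a command, or only print it when DRY_RUN is 1 (the default here).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Seed GAB manually, then start VCS, using the commands from the reply above.
# hastart is only attempted if the seed command succeeded.
seed_and_start() {
    run /opt/VRTS/bin/gabconfig -a -x && run hastart
}
```

Usage: call `seed_and_start` to preview, then `DRY_RUN=0 seed_and_start` as root once you are sure the other node is not running the cluster.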

 

Atul_Khanna
Level 2

Hi,

 

Thanks a lot for the response.

As I am quite new to VCS, just a few questions:

1. What exactly would gabconfig -a -x do? A brief description if possible, because googling didn't give exact info for it.

2. What user do I need to run the command as?

3. When the servers rebooted and we lost VCS, the Oracle databases were started manually, outside the cluster (the IPs were enabled at OS level). What will happen now when VCS is started? Will it impact the databases? This is on production.

 

Thanks again.

bobthesungeek76
Level 2

man -M /opt/VRTS/man gabconfig


Marianne
Moderator
Partner    VIP    Accredited Certified

gabconfig -a -x will perform a local seed and enable VCS to start.

It is important that you check the heartbeat connections and ensure that they are connected as per the config in /etc/llttab on each node.

If the nodes are disconnected via the network but both are connected to SAN (shared) storage, and no I/O fencing is configured, it is extremely important that one of the nodes has VCS removed from automatic startup and remains shut down until the network and heartbeats are reconnected.
Best to disconnect this node from the SAN as well.

Confirm the heartbeat connections at OS level (plumb temporary IP addresses on link 0 on both nodes and see if they can ping one another; repeat for link 1).
Start LLT, then GAB.

Confirm connectivity with 'lltstat -nvv' and 'gabconfig -a' before starting VCS.
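The 'gabconfig -a' check can be scripted roughly as below. This is a sketch under assumptions: the function name `gab_port_up` is made up, and it expects membership lines of the form "Port a gen a36e0003 membership 01" (the generation number here is only an example). Port a is the GAB control port; port h is the one the VCS engine (had) registers on.

```shell
#!/bin/sh
# Succeed if the given port letter has a "membership" line in
# `gabconfig -a` style output read from stdin.
# Lines such as "Port a gen ... jeopardy ..." are deliberately not counted.
gab_port_up() {
    awk -v port="$1" '
        $1 == "Port" && $2 == port && $5 == "membership" { found = 1 }
        END { exit !found }
    '
}
```

Usage: `gabconfig -a | gab_port_up a && echo "GAB is seeded"` before running hastart, and `gabconfig -a | gab_port_up h && echo "had is registered"` afterwards.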