
Cluster is not coming up after reboot

sourabhcredible
Level 2

Hi All,

I have installed Solaris with Veritas Cluster on a blade server. The cluster works fine, but when I reboot both systems it does not start on its own; I have to start it with hastart on all cluster nodes. If anybody knows a solution, that would be really helpful.

 

 

Thanks & Regards,

3 REPLIES 3

Wally_Heim
Level 6
Employee

Hi Sourabhcredible,

I'm more of a Windows guy, but it sounds like VCS is not set to start automatically at the run level you are booting to.  I know that Solaris uses startup scripts to determine which services to start and which to stop when changing run levels.  It sounds like the VCS startup scripts are not there.

 

Again, I'm a Windows guy, not a Unix guy, so I'm not sure exactly how VCS services are started on Solaris.  I'm hoping that someone else familiar with VCS on Solaris will speak up and correct me if I'm wrong here.

 

Thanks,

Wally

AlanTLR
Level 5

Sourabhcredible,

    Check that /etc/gabtab specifies the right number of nodes.  If you have two nodes, you should see "-n2".  In my example below, I have 5 nodes in my cluster.

 ~ $ cat /etc/gabtab
/sbin/gabconfig -c -n5 
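
To make the gabtab check repeatable, here is a minimal sketch that compares the -n value in gabtab with the number of nodes listed in llthosts. The function names are made up for illustration, and it assumes llthosts holds one node per line, so treat it as a starting point rather than a supported tool.

```shell
#!/bin/sh
# Sketch: compare the seed count in gabtab with the node count in llthosts.
# Both functions take a file path, so they can be tried on copies of the files.
# Function names are hypothetical, not part of VCS.

# Print the number that follows -n in a gabtab line ("-n5" or "-n 5").
gab_seed_count() {
    sed -n 's/.*-n[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | head -1
}

# Count non-empty, non-comment lines in llthosts (assumed: one line per node).
llt_node_count() {
    grep -v '^[[:space:]]*#' "$1" | grep -c '[^[:space:]]'
}

# Report whether the two numbers agree.
check_gab_seed() {
    seed=$(gab_seed_count "$1")
    nodes=$(llt_node_count "$2")
    if [ "$seed" = "$nodes" ]; then
        echo "OK: gabtab seeds at $seed nodes"
    else
        echo "MISMATCH: gabtab -n${seed:-?} but llthosts lists $nodes nodes"
    fi
}

# Usage on a cluster node (default paths assumed):
#   check_gab_seed /etc/gabtab /etc/llthosts
```

A mismatch is not always a fault, but a seed count higher than the number of nodes that actually boot will keep GAB from seeding on its own.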

   If you're running the latest VCS, you will want to check that the VCS services are enabled.  If any are disabled, you'll want to enable them with 'svcadm enable'.  In the example below, gab is disabled, which leaves vcs and vxfen offline because they depend on it:

 
~ $ svcs -xv llt gab vcs vxfen
svc:/system/llt:default (Veritas Low Latency Transport (LLT) Init service)
 State: online since Thu Sep 22 16:40:50 2011
   See: man -M /opt/VRTSllt/man/man1m/ -s 1M lltconfig
   See: /var/svc/log/system-llt:default.log
Impact: None.

svc:/system/gab:default (Veritas Group Membership and Atomic Broadcast (GAB) Init service)
 State: disabled since Thu Sep 22 16:37:38 2011
Reason: Disabled by an administrator.
   See: http://sun.com/msg/SMF-8000-05
   See: man -M /opt/VRTS/man/man1m/ -s 1M gabconfig
Impact: 2 dependent services are not running:
        svc:/system/vcs:default
        svc:/system/vxfen:default

svc:/system/vcs:default (Veritas Cluster Server (VCS) Init service)
 State: offline since Thu Sep 22 16:37:38 2011
Reason: Service svc:/system/gab:default is disabled.
   See: http://sun.com/msg/SMF-8000-GE
  Path: svc:/system/vcs:default
          svc:/system/gab:default
   See: man -M /opt/VRTS/man/man1m/ -s 1M vcsconfig
Impact: This service is not running.

svc:/system/vxfen:default (Veritas I/O Fencing (VXFEN) Init service)
 State: offline since Thu Sep 22 16:37:38 2011
Reason: Service svc:/system/gab:default is disabled.
   See: http://sun.com/msg/SMF-8000-GE
  Path: svc:/system/vxfen:default
          svc:/system/gab:default
   See: man -M /opt/VRTS/man/man1m/ -s 1M vxfenconfig
Impact: 1 dependent service is not running:
        svc:/system/vcs:default
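
If you'd rather not read through the long svcs -xv output, a small filter like the one below can list just the services that need enabling. This is only a sketch: the function name is hypothetical, it expects two columns (state, then FMRI) on stdin, and it prints the svcadm commands instead of running them.

```shell
#!/bin/sh
# Read "STATE FMRI" pairs on stdin and print an svcadm command for each
# service whose state is "disabled". Nothing is changed; this only prints.
# suggest_enable is a hypothetical helper name.
suggest_enable() {
    awk '$1 == "disabled" { print "svcadm enable " $2 }'
}

# Usage on a node that manages VCS through SMF:
#   svcs -H -o state,fmri llt gab vxfen vcs | suggest_enable
```

Services that are merely offline (vcs and vxfen in the output above) are left out on purpose, since they should come online by themselves once the disabled dependency is enabled.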

 

If you're running an older version of VCS (5.0), you will want to check your startup scripts in the /etc/rc*.d directories:

 ~ $ ls -al /etc/rc*.d/*vcs /etc/rc*.d/*llt /etc/rc*.d/*vxfen /etc/rc*.d/*gab
-rwxr--r--   3 root     sys         2414 Sep 26  2006 /etc/rc0.d/K10vcs
-rwxr--r--   3 root     sys         4731 Sep 18  2006 /etc/rc0.d/K15vxfen
-rwxr--r--   3 root     sys         1979 Nov 10  2005 /etc/rc0.d/K49gab
-rwxr--r--   2 root     sys         1539 Sep 29  2005 /etc/rc2.d/S70llt
-rwxr--r--   3 root     sys         1979 Nov 10  2005 /etc/rc2.d/S92gab
-rwxr--r--   3 root     sys         4731 Sep 18  2006 /etc/rc2.d/S97vxfen
-rwxr--r--   3 root     sys         2414 Sep 26  2006 /etc/rc3.d/S99vcs
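
The same listing can be turned into a quick present/missing report. The sketch below assumes the start scripts follow the SNNname pattern under rc2.d or rc3.d, as in my listing; the function name and the optional root argument are only there for illustration and testing.

```shell
#!/bin/sh
# Report which start scripts exist under <root>/etc/rc2.d and rc3.d.
# check_rc_scripts is a hypothetical helper; pass no argument to check /etc.
check_rc_scripts() {
    root=${1:-}
    for name in llt gab vxfen vcs; do
        # An unmatched glob makes ls fail, which leaves $found empty.
        found=$(ls "$root"/etc/rc[23].d/S[0-9][0-9]"$name" 2>/dev/null | head -1)
        if [ -n "$found" ]; then
            echo "$name: $found"
        else
            echo "$name: MISSING"
        fi
    done
}

# Usage on a cluster node:
#   check_rc_scripts
```

A missing vxfen script may be fine if you don't use I/O fencing, but a missing vcs start script would explain why hastart only runs when you type it.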

 

If all that checks OK, you will want to check your LLT and GAB statuses:

 /tmp $ lltstat -vvn | head -15
LLT node information:
    Node                 State    Link  Status  Address
     0 hsweb1            OPEN
                                  nxge2   UP      00:14:4F:6D:79:EA
                                  e1000g2   UP      00:14:4F:86:4A:68
   * 1 hsweb2            OPEN
                                  nxge2   UP      00:14:4F:6D:D6:2A
                                  e1000g2   UP      00:14:4F:81:B9:58
     2                   CONNWAIT
                                  nxge2   DOWN
                                  e1000g2   DOWN
     3                   CONNWAIT
                                  nxge2   DOWN
                                  e1000g2   DOWN
     4                   CONNWAIT
landuca@hsweb2 11:00:49
/tmp $ sudo gabconfig -a
GAB Port Memberships
===============================================================
Port a gen   8cdb05 membership 01
Port b gen   8cdb09 membership 01
Port h gen   8cdb79 membership 01
landuca@hsweb2 11:00:53
/tmp $ sudo lltconfig -a list
Link 0 (nxge2):
  Node   0 hsweb1     :   00:14:4F:6D:79:EA
  Node   1 hsweb2     :   00:14:4F:6D:D6:2A  permanent

Link 1 (e1000g2):
  Node   0 hsweb1     :   00:14:4F:86:4A:68
  Node   1 hsweb2     :   00:14:4F:81:B9:58  permanent
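
When reading the gabconfig -a output above, the two ports that matter most for this problem are port a (GAB) and port h (the VCS engine, had). Here is a rough filter that pulls out just those two; the function name is hypothetical and it assumes the line layout shown above.

```shell
#!/bin/sh
# Read `gabconfig -a` output on stdin and report the membership of
# port a (GAB) and port h (the VCS engine, had).
# gab_ports is a hypothetical helper name.
gab_ports() {
    awk '
        $1 == "Port" && $2 == "a" { a = $6 }
        $1 == "Port" && $2 == "h" { h = $6 }
        END {
            if (a == "") a = "no membership"
            if (h == "") h = "no membership"
            print "port a: " a
            print "port h: " h
        }'
}

# Usage on a cluster node:
#   gabconfig -a | gab_ports
```

If port a has a membership after boot but port h does not, GAB has seeded and the thing that is missing is had, which points back at the VCS startup configuration.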


Check console messages for any failures within the startup procedure.

mikebounds
Level 6
Partner Accredited

If you can start VCS using hastart, then I can't see this being a GAB issue, as running hastart would not resolve GAB seeding.  So the issue is probably that hastart is disabled at boot, so check the settings as AlanTLR says above.

The other thing you can check is the engine log: if hastart is being run, it will show up in the engine log regardless of whether had fails immediately or waits for fencing or GAB.
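
A simple way to look at the engine log right after a reboot is sketched below. It assumes the default log location /var/VRTSvcs/log/engine_A.log; the function name and the line-count argument are just for illustration.

```shell
#!/bin/sh
# Print the last lines of the VCS engine log.
# engine_log_tail is a hypothetical helper; the default path is assumed.
engine_log_tail() {
    log=${1:-/var/VRTSvcs/log/engine_A.log}
    count=${2:-50}
    if [ -r "$log" ]; then
        tail -n "$count" "$log"
    else
        echo "cannot read $log" >&2
        return 1
    fi
}

# Usage on a cluster node, after a reboot and before running hastart by hand:
#   who -b            # shows when the system last booted
#   engine_log_tail   # look for entries newer than that boot time
```

If there are no entries newer than the boot time, then nothing tried to start had during boot.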

Mike