04-21-2014 06:26 AM
Hi.
Solaris 10 SPARC, VCS 5.0MP3 RP5.
LLT, GAB, VxFEN and VCS are not starting at boot on Solaris 10.
During boot I see the messages below on the console:
-------------------------------------------------------------------------------------------
VxVM sysboot INFO V-5-2-3390 Starting restore daemon...
LLT INFO V-14-1-10009 LLT Protocol available
GAB INFO V-15-1-20021 GAB available
----------------------------------------------------------------------------------------------
However, the services are not up:
# /etc/init.d/llt status
LLT: is loaded but not configured.
# /etc/init.d/gab status
GAB: module not configured
But if I run the start commands explicitly, they do start:
# /etc/init.d/llt start
Starting LLT...
Starting LLT done.
# /etc/init.d/llt status
LLT: is loaded and configured.
# /etc/init.d/gab start
Starting GAB...
Starting GAB done.
# /etc/init.d/gab status
GAB: module is configured
Next, I ran into a problem with VxFEN.
In the log I see the message "VCS CRITICAL V-16-1-10037 VxFEN driver not configured. Retrying..."
I tried several times to start VxFEN, with no luck.
However, if I run "vxfenconfig -c", it comes up:
# /sbin/vxfenconfig -c
VXFEN vxfenconfig NOTICE Driver will use SCSI-3 compliant disks.
Now the cluster comes up with "hastart".
I have to do all of this on every node whenever a node or the cluster reboots.
Can someone suggest what the issue could be and why the cluster services are not started at boot?
Thanks & Regards,
Shashi Kanth.
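Until the root cause is found, the manual workaround described in this post can at least be made repeatable. This is only a sketch: the function name start_cluster_stack and its -n (dry-run) flag are hypothetical, it assumes the commands are on root's PATH (hastart normally lives in /opt/VRTSvcs/bin), and it does nothing to fix the boot-time problem itself.

```shell
# Hypothetical helper: replays, in order, the manual start-up steps from this
# thread. With -n it only prints the commands (dry run).
start_cluster_stack() {
    dry_run=0
    [ "$1" = "-n" ] && dry_run=1
    for cmd in "/etc/init.d/llt start" \
               "/etc/init.d/gab start" \
               "/sbin/vxfenconfig -c" \
               "hastart"; do
        if [ "$dry_run" -eq 1 ]; then
            echo "would run: $cmd"
        else
            echo "running: $cmd"
            # $cmd is left unquoted on purpose so it splits into program + arguments.
            $cmd || { echo "failed: $cmd" >&2; return 1; }
        fi
    done
}
```

Run `start_cluster_stack -n` first to preview the steps, then `start_cluster_stack` as root on each node. It stops at the first failing step, so a later step never runs against a half-started stack.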
04-21-2014 06:35 AM
04-21-2014 09:49 AM
Could you try increasing SLEEP_INTERVAL in /sbin/vxfen-startup as mentioned in this technote:
http://www.symantec.com/business/support/index?page=content&id=TECH186884
04-21-2014 08:17 PM
Hi Shashi,
For the fencing issue, I think it is not configured properly in main.cf, /etc/vxfendg, /etc/vxfentab and /etc/vxfenmode.
For the LLT / GAB issue, I think it may be related to the Solaris SMF service configuration. You can check a service's status with "svcs -l <service>".
For LLT, the output should look like this:
# svcs -l llt
fmri svc:/system/llt:default
name Veritas Low Latency Transport (LLT) Init service
enabled true <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< enabled status should be "true"
state online <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< current status
next_state none
state_time Mon Dec 23 22:19:46 2013
logfile /var/svc/log/system-llt:default.log
restarter svc:/system/svc/restarter:default
dependency require_all/none svc:/system/filesystem/local (online)
dependency optional_all/none svc:/network/initial (online)
dependency optional_all/none svc:/network/routing/ndp:default (disabled)
04-21-2014 08:18 PM
To enable/disable Solaris services:
svcadm enable llt
svcadm disable llt
This enables or disables automatic start/stop of the service when the OS boots.
04-21-2014 11:50 PM
There is no issue with the fencing in the cluster.
# /sbin/vxfenadm -d
I/O Fencing Cluster Information:
================================
Fencing Protocol Version: 201
Fencing Mode: SCSI3
Fencing SCSI3 Disk Policy: dmp
Cluster Members:
0 (hyi01sehost85.ind.hp.com)
* 1 (hyi01sehost83.ind.hp.com)
RFSM State Information:
node 0 in state 8 (running)
node 1 in state 8 (running)
# cat /etc/VRTSvcs/conf/config/main.cf
include "types.cf"
cluster SF50MP3Sol10 (
UserNames = { admin = gmnFmhMjnInnLvnHmk }
Administrators = { admin }
UseFence = SCSI3
)
system "hyi01sehost83.ind.hp.com" (
)
system "hyi01sehost85.ind.hp.com" (
)
I then changed the SLEEP_INTERVAL parameter from 5 to 25 in the /sbin/vxfen-startup file:
# cat /sbin/vxfen-startup | grep SLEEP_INTERVAL
SLEEP_INTERVAL=25
I then rebooted both machines, but I still see the same issue:
# /etc/init.d/llt status
LLT: is loaded but not configured.
# /etc/init.d/gab status
GAB: module not configured
If I start all the services manually, the cluster comes up.
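The SLEEP_INTERVAL edit described in this post can be made without an interactive editor. Solaris 10's /usr/bin/sed has no in-place option, so this sketch keeps a .orig backup and writes the edited text back to the original path (truncating the existing file keeps its permissions). The function name set_sleep_interval is hypothetical, and it assumes the variable is assigned at the start of a line, as the grep output above shows.

```shell
# Hypothetical helper: set SLEEP_INTERVAL in a vxfen-startup style file,
# keeping an untouched copy as <file>.orig.
set_sleep_interval() {
    file=$1
    value=$2
    cp -p "$file" "$file.orig" || return 1
    # Rewrite only lines that assign the variable, e.g. "SLEEP_INTERVAL=5".
    sed "s/^SLEEP_INTERVAL=.*/SLEEP_INTERVAL=$value/" "$file.orig" > "$file" || return 1
    # Show the resulting assignment so the change can be confirmed.
    grep '^SLEEP_INTERVAL=' "$file"
}
```

For example, `set_sleep_interval /sbin/vxfen-startup 25` should print `SLEEP_INTERVAL=25`; restoring `/sbin/vxfen-startup.orig` undoes the change.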
04-21-2014 11:51 PM
One more point: the SMF commands don't work for VCS:
# svcadm enable llt
svcadm: Pattern 'llt' doesn't match any instances
# svcs -l llt
svcs: Pattern 'llt' doesn't match any instances
04-22-2014 12:47 AM
Fencing is properly configured in the cluster.
I have increased the SLEEP_INTERVAL parameter in /sbin/vxfen-startup from 5 to 25, and I have also added "sleep 180" to /etc/init.d/vcs as per the technote http://www.symantec.com/business/support/index?page=content&id=TECH186884, but with no luck.
On one cluster node I see messages like the following:
VxFEN driver not configured. Retrying...
2014/04/22 13:12:22 VCS CRITICAL V-16-1-10031 VxFEN driver not configured. VCS Stopping. Manually restart VCS after configuring fencing
Retry limit of 12 exhausted trying for vxvm-recover to come up. Giving up.
I suspect the issue is that VxVM is not starting properly at boot. I don't see any VxVM services running after boot.
During boot, the console shows messages like these:
VxVM sysboot INFO V-5-2-3409 starting in boot mode...
VxVM sysboot INFO V-5-2-3390 Starting restore daemon...
But after boot, I don't see any VxVM-related services running.
04-22-2014 01:27 AM
Are you saying that vxconfigd is not running? Are you able to run any vx commands once the server comes up? Do you see vxconfigd start after some time?
Even though the coordinator disk group is not imported, the fencing module still verifies the coordinator disks listed in /etc/vxfentab.
Can you paste the output of the following from both nodes:
# vxdisk -o alldgs list | grep -i fen   (if your fencing disk group has a different name, show us its disks instead)
# cat /etc/vxfentab
# cat /etc/vxfenmode
# cat /etc/vxfendg
Also, I suggest removing server names when you paste output (for your own security).
G
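To gather the requested output in one place on each node, something like the following could be used. It is a sketch: the function name collect_fencing_info and the default output path /var/tmp/fencing-info.txt are hypothetical, and it simply notes any command or file that is not present instead of failing. Server names are not masked, so edit them out before posting, as suggested above.

```shell
# Hypothetical helper: collect the fencing details requested above into one file.
collect_fencing_info() {
    out=${1:-/var/tmp/fencing-info.txt}
    {
        echo "== vxdisk -o alldgs list (fencing disks) =="
        if command -v vxdisk >/dev/null 2>&1; then
            vxdisk -o alldgs list | grep -i fen
        else
            echo "vxdisk not found"
        fi
        # The three fencing configuration files asked for in this thread.
        for f in /etc/vxfentab /etc/vxfenmode /etc/vxfendg; do
            echo "== $f =="
            if [ -r "$f" ]; then cat "$f"; else echo "missing or unreadable"; fi
        done
    } > "$out" 2>&1
    # Print where the report was written.
    echo "$out"
}
```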
04-22-2014 01:46 AM
Hi,
check whether the following file exists:
ls -l /etc/vx/reconfig.d/state.d/install-db
If it does, remove it, restart the server and test again.
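A read-only version of the install-db check suggested above might look like this. It only reports and leaves removing the file to you. The function name check_install_db and its optional root-prefix argument (there only so it can be tried on a scratch directory) are hypothetical.

```shell
# Hypothetical helper: report whether the VxVM install-db flag file exists.
# Optional argument: a root prefix (defaults to "/").
check_install_db() {
    root=${1:-/}
    flag="${root%/}/etc/vx/reconfig.d/state.d/install-db"
    if [ -e "$flag" ]; then
        echo "present: $flag"
        return 1
    fi
    echo "absent: $flag"
}
```

On a live node, `check_install_db` with no argument checks `/etc/vx/reconfig.d/state.d/install-db` and returns non-zero if the file is there.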
04-22-2014 01:49 AM
As VxVM is starting in boot mode, I believe install-db is not there.
G
04-22-2014 02:33 AM
I brought the cluster up by starting all services manually.
# hastatus -sum
-- SYSTEM STATE
-- System State Frozen
A hyi01sehost83.ind.hp.com RUNNING 0
A hyi01sehost85.ind.hp.com RUNNING 0
# /sbin/vxfenadm -d
I/O Fencing Cluster Information:
================================
Fencing Protocol Version: 201
Fencing Mode: SCSI3
Fencing SCSI3 Disk Policy: dmp
Cluster Members:
* 0 (hyi01sehost85.ind.hp.com)
1 (hyi01sehost83.ind.hp.com)
RFSM State Information:
node 0 in state 8 (running)
node 1 in state 8 (running)
I then rebooted all nodes.
# reboot
Now I see the problem again:
# /etc/init.d/llt status
LLT: is loaded but not configured.
# /etc/init.d/gab status
GAB: module not configured
# /sbin/vxfenadm -d
VXFEN vxfenadm ERROR V-11-2-1115 Local node is not a member of cluster!
# ps -ef | grep vx
root 54 1 0 14:53:11 ? 0:01 vxconfigd -x syslog -m boot
root 1977 1834 0 15:02:41 pts/1 0:00 grep vx
root 674 1 0 14:53:23 ? 0:01 /sbin/vxesd
If I now start all services manually, they come up after several tries.
04-22-2014 02:45 AM
Retry limit of 12 exhausted trying for vxvm-recover to come up. Giving up.
04-22-2014 09:24 AM
It looks like your core issue is LLT not starting: LLT starts first, so even if there is a problem with fencing, LLT should still start.
As you are using an old version of VCS, it looks like it uses the legacy /etc/init.d scripts rather than SMF (svcs).
If LLT starts manually, the LLT config files must be OK, so the issue is probably that /etc/init.d/llt is either not being called by the boot process or is being called too soon (I am not sure how Solaris integrates SMF with legacy /etc/init.d scripts to make sure they start in the right order).
I would make a copy of /etc/init.d/llt and then edit it to add something like:
echo "LLT call script called at: "`date` >> /var/tmp/llt.log
and change the line that says "lltconfig -c" to:
lltconfig -c >> /var/tmp/llt.log 2>&1
and then check /var/tmp/llt.log after booting.
Mike
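The logging edits suggested above can also be applied with a script rather than by hand. This sketch copies the original to <script>.orig, inserts a timestamp line after line 1 (assumed to be the #! line), and appends the redirection to whatever line contains "lltconfig -c". The function name instrument_llt_script is hypothetical. On Solaris 10, /usr/bin/awk is the old awk, so running it with AWK=nawk (or /usr/xpg4/bin/awk) is the safer choice. Review the edited script before rebooting.

```shell
# Hypothetical helper: add boot-time logging to an LLT init script.
# Usage: instrument_llt_script <script> [logfile]
instrument_llt_script() {
    script=$1
    log=${2:-/var/tmp/llt.log}
    cp -p "$script" "$script.orig" || return 1
    "${AWK:-awk}" -v logf="$log" '
        # After line 1 (assumed to be the #! line), log when the script runs.
        NR == 1 {
            print
            print "echo \"LLT init script called at: \"`date` >> " logf
            next
        }
        # Capture lltconfig -c output and errors in the same log.
        /lltconfig -c/ { sub(/lltconfig -c/, "lltconfig -c >> " logf " 2>\\&1") }
        { print }
    ' "$script.orig" > "$script"
}
```

For example, `AWK=nawk instrument_llt_script /etc/init.d/llt` edits the script in place; after the next boot, /var/tmp/llt.log shows whether the script ran at all and what lltconfig reported.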
05-27-2014 03:07 AM
Not sure whether you have resolved this issue, but I have seen this when the private links use bonded interfaces.
If you are using bonded interfaces, the problem is that they are not available when LLT starts; as a result GAB won't start, and neither will VxFEN.
You can check /var/log/boot.log, where you will see a message saying the bonded interface is not available.
If
05-27-2014 03:09 AM
Oops, sorry, I didn't read the post properly. Bonded interfaces are only available in Linux. Please ignore my post :)
05-27-2014 03:11 AM
Please ignore my post. If I had read the post properly, I would have seen that you're running Solaris. Bonded interfaces are in Linux.
05-27-2014 07:58 AM
VCS 5.0MP3RP5 does not support SMF for service management; it uses rc scripts.
Could you check whether all the rc scripts are in place for the VRTSllt, VRTSgab, VRTSvxfen and VRTSvcs packages?
The following commands will tell you whether any packaged rc scripts are missing:
# pkgchk VRTSllt
# pkgchk VRTSgab
# pkgchk VRTSvxfen
# pkgchk VRTSvcs
Ideally, /etc/rc2.d/S70llt should be a link to /etc/init.d/llt, /etc/rc2.d/S92gab a link to /etc/init.d/gab, and /etc/rc2.d/S97vxfen a link to /etc/init.d/vxfen.
Ensure that these rc scripts are in place.
If all the above rc scripts are in place and the services still do not start, the system may not be going through run level 2 in your boot process. If that is the case, create rc script links for the run level you are booting into.
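The three links listed in this reply can be checked in one go. This sketch treats a link as correct when it resolves to a file whose contents match the init.d script (cmp -s), which works whether the rc2.d entries are symbolic or hard links. The function name check_rc_links and its optional root-prefix argument are hypothetical; `who -r` shows which run level the system actually reached.

```shell
# Hypothetical helper: check the run-level 2 start links for LLT, GAB and VxFEN.
# Optional argument: a root prefix (empty means the live system).
check_rc_links() {
    root=${1:-}
    rc=0
    for pair in S70llt:llt S92gab:gab S97vxfen:vxfen; do
        link="$root/etc/rc2.d/${pair%%:*}"
        target="$root/etc/init.d/${pair#*:}"
        if [ ! -f "$link" ]; then
            echo "missing: $link"
            rc=1
        elif ! cmp -s "$link" "$target"; then
            echo "differs from $target: $link"
            rc=1
        else
            echo "ok: $link"
        fi
    done
    return $rc
}
```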
05-27-2014 07:41 PM
Hi, check these three files in /etc/default:
bash-3.2# more llt
#
# This file is sourced :
# from /etc/init.d/llt for Solaris < 2.10
# from /lib/svc/method/llt for Solaris 2.10
#
# Set the two environment variables below as follows:
#
# 1 = start or stop llt
# 0 = do not start or stop llt
#
LLT_START=1 <<<<<<
LLT_STOP=1
bash-3.2# more gab
#
# This file is sourced :
# from /etc/init.d/gab for Solaris < 2.10
# from /lib/svc/method/gab for Solaris 2.10
#
# Set the two environment variables below as follows:
#
# 1 = start or stop gab
# 0 = do not start or stop gab
#
GAB_START=1 <<<<<<<<<<
GAB_STOP=1
bash-3.2# more vcs
# $Id: vcsconf_sun,v 1.5 2011/09/27 09:59:28 asontakk Exp $ #
#
# This file is sourced :
# from /etc/init.d/vcs for Solaris < 2.10
# from /lib/svc/method/vcs for Solaris 2.10
#
# option to vcs (i.e hastart)
# if ONENODE is set to _yes_, vcs will be started using -onenode option to
# form a single node cluster.
# possible values of ONENODE : yes/no (case sensitive)
ONENODE=no
# Set the two environment variables below as follows:
#
# 1 = start or stop VCS
# 0 = do not start or stop VCS
#
VCS_START=1 <<<<<<<<<<<<<
VCS_STOP=1
bash-3.2# pwd
/etc/default
bash-3.2#
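If these /etc/default files exist on your release (their headers mention /lib/svc/method, so they may come from a newer version than 5.0MP3), the three START flags can be checked together. In this sketch, check_start_flags is a hypothetical name; it reads the last uncommented assignment of each variable and ignores anything after the digits, such as the arrow markers pasted above.

```shell
# Hypothetical helper: confirm LLT_START, GAB_START and VCS_START are set to 1.
# Optional argument: the directory holding the llt, gab and vcs files.
check_start_flags() {
    dir=${1:-/etc/default}
    rc=0
    for pair in llt:LLT_START gab:GAB_START vcs:VCS_START; do
        file="$dir/${pair%%:*}"
        var=${pair#*:}
        # Last "VAR=<digits>" line wins; comment lines start with # and never match.
        val=$(sed -n "s/^$var=\([0-9]*\).*/\1/p" "$file" 2>/dev/null | tail -n 1)
        if [ "$val" = "1" ]; then
            echo "ok: $var=1 in $file"
        else
            echo "check: $var is '${val:-unset}' in $file"
            rc=1
        fi
    done
    return $rc
}
```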