Hi All,
Thanks for your responses. As suggested, I tried to configure VCS manually.
I performed the following steps:
On the dev node:
bash-3.00# cat /etc/llttab
set-node prod
set-cluster 101
link e1000g2 /dev/e1000g:2 - ether - -
link e1000g3 /dev/e1000g:3 - ether - -
bash-3.00# cat /etc/llthosts
1 dev
2 prod
bash-3.00# cat /etc/VRTSvcs/conf/config/main.cf
include "types.cf"
cluster rainbow (
)
system dev (
)
system prod (
)
bash-3.00# cat /etc/gabtab
/sbin/gabconfig -c -n2
bash-3.00# cat /etc/VRTSvcs/conf/sysname
prod
On the prod node:
bash-3.00# cat /etc/llttab
set-node dev
set-cluster 101
link e1000g2 /dev/e1000g:2 - ether - -
link e1000g3 /dev/e1000g:3 - ether - -
bash-3.00# cat /etc/llthosts
1 dev
2 prod
bash-3.00# cat /etc/gabtab
/sbin/gabconfig -c -n2
bash-3.00# cat /etc/VRTSvcs/conf/config/main.cf
include "types.cf"
I copied the types.cf file from /etc/VRTSvcs/conf to /etc/VRTSvcs/conf/config.
bash-3.00# cat /etc/VRTSvcs/conf/sysname
dev
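As far as I understand, the set-node name in /etc/llttab has to appear in /etc/llthosts and should match the name in /etc/VRTSvcs/conf/sysname on the same node. Below is a minimal sketch of a check for that, to be run on each node. The function name check_llt_identity is made up for illustration, and it assumes set-node holds a name rather than a numeric ID.

```shell
# check_llt_identity LLTTAB LLTHOSTS SYSNAME_FILE   (hypothetical helper)
# Prints an "ok: ..." line and returns 0 when the set-node name is listed in
# llthosts and equals the sysname; prints a "mismatch: ..." line and returns 1
# otherwise. Assumes set-node is a name, not a numeric node ID.
check_llt_identity() {
    llttab=$1; llthosts=$2; sysname_file=$3
    node=; id=; sysname=
    # Pick up the set-node value from llttab.
    while read -r key value rest; do
        if [ "$key" = "set-node" ]; then node=$value; fi
    done < "$llttab"
    # Look up the node ID that llthosts assigns to that name.
    while read -r hid hname rest; do
        if [ "$hname" = "$node" ]; then id=$hid; fi
    done < "$llthosts"
    # First word of the sysname file.
    read -r sysname rest < "$sysname_file" || :
    if [ -z "$node" ] || [ -z "$id" ]; then
        echo "mismatch: set-node '$node' is not listed in $llthosts"
        return 1
    fi
    if [ "$node" != "$sysname" ]; then
        echo "mismatch: set-node is '$node' but sysname is '$sysname'"
        return 1
    fi
    echo "ok: node '$node' has LLT ID $id"
}
```

Usage on a node would be: check_llt_identity /etc/llttab /etc/llthosts /etc/VRTSvcs/conf/sysname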
After doing this, I tried to start LLT and GAB on both nodes using the commands
lltconfig -c
and
sh /etc/gabtab
but this was not successful,
so I tried to start their SMF services (I think in VCS 6.0 they have removed /etc/rc2.d/S70llt and /etc/rc2.d/S92gab):
svcadm enable svc:/system/llt:default
svcadm enable svc:/system/gab:default
But the services still went into maintenance. After analyzing the logs, I found the following error messages:
Feb 18 19:14:15 Executing start method ("/lib/svc/method/llt start") ]
This script is not allowed to start LLT. LLT_START is not 1
To fix this, I changed the values in the following file:
bash-3.00# cat /etc/default/llt
#
# This file is sourced :
# from /etc/init.d/llt for Solaris < 2.10
# from /lib/svc/method/llt for Solaris 2.10
#
# Set the two environment variables below as follows:
#
# 1 = start or stop llt
# 0 = do not start or stop llt
#
LLT_START=1-----------> by default it was set to 0
LLT_STOP=1-----------> by default it was set to 0
The same for GAB:
bash-3.00# cat /etc/default/gab
#
# This file is sourced :
# from /etc/init.d/gab for Solaris < 2.10
# from /lib/svc/method/gab for Solaris 2.10
#
# Set the two environment variables below as follows:
#
# 1 = start or stop gab
# 0 = do not start or stop gab
#
GAB_START=1-----------> by default it was set to 0
GAB_STOP=1-----------> by default it was set to 0
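The same edit has to be repeated for each file on each node, so it can be scripted. This is only a sketch: the function name enable_start_flags is made up, and it assumes the variables sit at the start of a line as in the files shown above. It keeps a .bak copy of the original file.

```shell
# enable_start_flags FILE PREFIX   (hypothetical helper)
# Sets PREFIX_START and PREFIX_STOP to 1 in FILE and leaves every other line
# untouched. The original file is kept as FILE.bak.
enable_start_flags() {
    file=$1; prefix=$2
    cp "$file" "$file.bak" || return 1
    sed -e "s/^${prefix}_START=.*/${prefix}_START=1/" \
        -e "s/^${prefix}_STOP=.*/${prefix}_STOP=1/" \
        "$file.bak" > "$file"
}
```

For example: enable_start_flags /etc/default/llt LLT and enable_start_flags /etc/default/gab GAB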
After that, both services were up and running on both nodes:
bash-3.00# svcs -a|grep llt
online 9:07:25 svc:/system/llt:default
bash-3.00# svcs -a|grep gab
online 9:07:28 svc:/system/gab:default
Then I tried to bring the VCS service online,
but again it went into maintenance due to the following error:
Feb 18 23:05:26 dev Had[510]: [ID 702911 daemon.notice] VCS ERROR V-16-1-10614 Cluster UUID is not configured or it is empty, on system dev - VCS Stopping. Manually Restart VCS after configuring Cluster UUID.
To configure it, I ran the following command on both nodes:
/opt/VRTSvcs/bin/uuidconfig.pl -clus -configure
I also changed the following values in the /etc/default/vcs file:
VCS_START=1
VCS_STOP=1
Now all my services are running, but I am still getting the following problem:
GAB is working properly on both nodes, but LLT is not communicating with the other node.
The output of gabconfig -a after starting the cluster with hastart on both nodes is as follows:
bash-3.00# hastart
bash-3.00# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen 12f3f02 membership ;1
Port h gen 12f3f09 membership ;1
bash-3.00# uname -n
dev
bash-3.00# hastart
bash-3.00# hastatus -sum
-- SYSTEM STATE
-- System State Frozen
A dev UNKNOWN 0
A prod RUNNING 0
bash-3.00# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen 12f3f02 membership ; 2
Port h gen 12f3f0b membership ; 2
bash-3.00# uname -n
prod
But the output of LLT on the dev node is:
Port h gen 12f3f09 membership ;1
bash-3.00# uname -n
dev
bash-3.00# lltstat -nl
LLT node information:
Node State Links
* 1 dev OPEN 2
LLT link information:
link 0 e1000g2 on etherfp hipri
mtu 1500, sap 0xcafe, broadcast FF:FF:FF:FF:FF:FF, addrlen 6
txpkts 3514 txbytes 211939
rxpkts 937 rxbytes 68662
latehb 0 badcksum 0 errors 0
link 1 e1000g3 on etherfp hipri
mtu 1500, sap 0xcafe, broadcast FF:FF:FF:FF:FF:FF, addrlen 6
txpkts 347 txbytes 24504
rxpkts 281 rxbytes 19328
latehb 0 badcksum 0 errors 0
and on prod:
bash-3.00# lltstat -nl
LLT node information:
Node State Links
* 2 prod OPEN 2
LLT link information:
link 0 e1000g2 on etherfp hipri
mtu 1500, sap 0xcafe, broadcast FF:FF:FF:FF:FF:FF, addrlen 6
txpkts 3390 txbytes 180168
rxpkts 713 rxbytes 52320
latehb 0 badcksum 0 errors 0
link 1 e1000g3 on etherfp hipri
mtu 1500, sap 0xcafe, broadcast FF:FF:FF:FF:FF:FF, addrlen 6
txpkts 444 txbytes 31270
rxpkts 257 rxbytes 15827
latehb 0 badcksum 0 errors 0
whereas the two nodes should see each other.
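To compare the two nodes quickly, the node table can be pulled out of the lltstat -nl output. This is a rough sketch that assumes the layout pasted above (a "Node State Links" header, then one row per node, with "*" marking the local node). The function name llt_visible_nodes is made up.

```shell
# llt_visible_nodes   (hypothetical helper)
# Reads `lltstat -nl` text on stdin and prints one node name per line from
# the "Node State Links" table. Assumes the output layout shown in this post.
llt_visible_nodes() {
    in_table=0
    while read -r a b c rest; do
        case "$a $b $c" in
            "Node State Links") in_table=1; continue ;;
            "LLT link information:"*) in_table=0; continue ;;
        esac
        [ "$in_table" = 1 ] || continue
        [ -n "$a" ] || continue
        # The local node's row starts with "*", which shifts the columns by one.
        if [ "$a" = "*" ]; then echo "$c"; else echo "$b"; fi
    done
}
```

On a healthy two-node cluster I would expect lltstat -nl | llt_visible_nodes to print both dev and prod on each node. With the output above it prints only the local node.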
Maybe because of this, I am getting the following output of hastatus -sum on prod:
bash-3.00# hastatus -sum
-- SYSTEM STATE
-- System State Frozen
A dev UNKNOWN 0
A prod RUNNING 0
bash-3.00# uname -n
prod
and on the dev node the output is:
bash-3.00# hastatus -sum
-- SYSTEM STATE
-- System State Frozen
A dev RUNNING 0
One more observation: after starting the cluster, the main.cf file on the dev node was automatically modified to
bash-3.00# cat /etc/VRTSvcs/conf/config/main.cf
include "types.cf"
cluster vcs (
)
system dev (
)
whereas previously only the include "types.cf" line was present there, and we had added the actual configuration on the prod node.
In the messages file I can see the following messages related to the LLT interfaces for the other node:
Feb 21 09:07:17 dev e1000g: [ID 801725 kern.info] NOTICE: pci8086,100f - e1000g[3] : link up, 1000 Mbps, full duplex
Feb 21 09:07:17 dev e1000g: [ID 801725 kern.info] NOTICE: pci8086,100f - e1000g[2] : link up, 1000 Mbps, full duplex
Feb 21 09:07:27 dev genunix: [ID 644314 kern.notice] GAB INFO V-15-1-20026 Port a[GAB_Control (refcount 2)] registration waiting for seed port membership
Feb 21 09:07:41 dev syslog[542]: [ID 702911 daemon.notice] VCS INFO V-16-1-11240 Command Server: running with security OFF
Feb 21 09:07:42 dev Had[497]: [ID 702911 daemon.notice] VCS NOTICE V-16-1-10619 'HAD' starting on: dev
Feb 21 09:07:42 dev Had[497]: [ID 702911 daemon.notice] VCS NOTICE V-16-1-10620 Waiting for local cluster configuration status
Feb 21 09:07:42 dev genunix: [ID 122464 kern.notice] LLT INFO V-14-1-10499 recvarpreq link 1 for node 2 addr change from 00:00:00:00:00:00 to 00:0C:29:E2:CE:CB
Feb 21 09:07:42 dev genunix: [ID 122464 kern.notice] LLT INFO V-14-1-10499 recvarpreq link 0 for node 2 addr change from 00:00:00:00:00:00 to 00:0C:29:E2:CE:D5
Feb 21 09:07:42 dev genunix: [ID 860062 kern.notice] LLT INFO V-14-1-10024 link 0 (e1000g2) node 2 active
Feb 21 09:07:44 dev genunix: [ID 860062 kern.notice] LLT INFO V-14-1-10024 link 1 (e1000g3) node 2 active
Feb 21 09:07:49 dev syslog[542]: [ID 702911 daemon.warning] WARNING V-365-1-1 This host is not entitled to run Veritas Storage Foundation/Veritas Cluster Server.
Feb 21 09:07:49 dev As set forth in the End User License Agreement (EULA) you must complete one of the two options set forth below. To comply with this condition of the EULA and stop logging of this message, you have 56 days to either:
Feb 21 09:07:49 dev - make this host managed by a Management Server (see http://go.symantec.com/sfhakeyless for details and free download), or
Feb 21 09:07:49 dev - add a valid license key matching the functionality in use on this host using the command 'vxlicinst' and validate using the command 'vxkeyless set NONE'.
Feb 21 09:07:49 dev genunix: [ID 272960 kern.notice] GAB INFO V-15-1-20036 Port a[GAB_Control (refcount 1)] gen 12f3f01 membership ;12
Feb 21 09:08:04 dev genunix: [ID 773945 kern.info] UltraDMA mode 2 selected
Feb 21 09:08:04 dev genunix: [ID 935449 kern.info] ATA DMA off: disabled. Control with "atapi-cd-dma-enabled" property
Feb 21 09:08:04 dev genunix: [ID 882269 kern.info] PIO mode 4 selected
Feb 21 09:08:04 dev genunix: [ID 935449 kern.info] ATA DMA off: disabled. Control with "atapi-cd-dma-enabled" property
Feb 21 09:08:04 dev genunix: [ID 882269 kern.info] PIO mode 4 selected
Feb 21 09:08:04 dev genunix: [ID 935449 kern.info] ATA DMA off: disabled. Control with "atapi-cd-dma-enabled" property
Feb 21 09:08:04 dev genunix: [ID 882269 kern.info] PIO mode 4 selected
Feb 21 09:08:04 dev genunix: [ID 935449 kern.info] ATA DMA off: disabled. Control with "atapi-cd-dma-enabled" property
Feb 21 09:08:04 dev genunix: [ID 882269 kern.info] PIO mode 4 selected
Feb 21 09:08:13 dev svc.startd[7]: [ID 122153 daemon.warning] svc:/application/stosreg:default: Method or service exit timed out. Killing contract 95.
Feb 21 09:08:13 dev svc.startd[7]: [ID 636263 daemon.warning] svc:/application/stosreg:default: Method "/lib/svc/method/svc-stosreg" failed due to signal KILL.
Feb 21 09:08:14 dev sendmail[584]: [ID 702911 mail.crit] My unqualified host name (dev) unknown; sleeping for retry
Feb 21 09:08:17 dev Had[497]: [ID 702911 daemon.notice] VCS NOTICE V-16-1-10625 Local cluster configuration valid
Feb 21 09:08:17 dev Had[497]: [ID 702911 daemon.notice] VCS NOTICE V-16-1-11034 Registering for cluster membership
Feb 21 09:08:17 dev Had[497]: [ID 702911 daemon.notice] VCS NOTICE V-16-1-11035 Waiting for cluster membership
Feb 21 09:08:22 dev genunix: [ID 272960 kern.notice] GAB INFO V-15-1-20036 Port h[GAB_USER_CLIENT (refcount 0)] gen 12f3f04 membership ;12
Feb 21 09:08:22 dev Had[497]: [ID 702911 daemon.notice] VCS INFO V-16-1-10077 Received new cluster membership
Feb 21 09:08:23 dev Had[497]: [ID 702911 daemon.notice] VCS NOTICE V-16-1-10086 System dev (Node '1') is in Regular Membership - Membership: 0x6
Feb 21 09:08:23 dev Had[497]: [ID 702911 daemon.notice] VCS NOTICE V-16-1-10086 System (Node '2') is in Regular Membership - Membership: 0x6
Feb 21 09:08:26 dev Had[497]: [ID 702911 daemon.notice] VCS NOTICE V-16-1-10073 Building from local configuration
Feb 21 09:08:26 dev genunix: [ID 577146 kern.notice] NOTICE: VXFEN INFO V-11-1-VxFEN unloaded
Feb 21 09:08:27 dev genunix: [ID 140958 kern.notice] LLT INFO V-14-1-10205 link 1 (e1000g3) node 2 in trouble
Feb 21 09:08:27 dev rootnex: [ID 349649 kern.info] xsvc0 at root
Feb 21 09:08:27 dev genunix: [ID 936769 kern.info] xsvc0 is /xsvc
Feb 21 09:08:31 dev pseudo: [ID 129642 kern.info] pseudo-device: devinfo0
Feb 21 09:08:31 dev genunix: [ID 936769 kern.info] devinfo0 is /pseudo/devinfo@0
Feb 21 09:08:31 dev unix: [ID 954099 kern.info] NOTICE: IRQ19 is being shared by drivers with different interrupt levels.
Feb 21 09:08:31 dev This may result in reduced system performance.
Feb 21 09:08:31 dev pci_pci: [ID 370704 kern.info] PCI-device: pci1274,1371@1, audioens0
Feb 21 09:08:31 dev genunix: [ID 936769 kern.info] audioens0 is /pci@0,0/pci15ad,790@11/pci1274,1371@1
Feb 21 09:08:33 dev genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 1 (e1000g3) node 2 inactive 8 sec (281)
Feb 21 09:08:34 dev genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 1 (e1000g3) node 2 inactive 9 sec (281)
Feb 21 09:08:35 dev genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 1 (e1000g3) node 2 inactive 10 sec (281)
Feb 21 09:08:36 dev genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 1 (e1000g3) node 2 inactive 11 sec (281)
Feb 21 09:08:37 dev genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 1 (e1000g3) node 2 inactive 12 sec (281)
Feb 21 09:08:38 dev genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 1 (e1000g3) node 2 inactive 13 sec (281)
Feb 21 09:08:39 dev genunix: [ID 592107 kern.notice] LLT INFO V-14-1-10510 sent hbreq (NULL) on link 1 (e1000g3) node 2. 4 more to go.
Feb 21 09:08:39 dev genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 1 (e1000g3) node 2 inactive 14 sec (281)
Feb 21 09:08:39 dev genunix: [ID 592107 kern.notice] LLT INFO V-14-1-10510 sent hbreq (NULL) on link 1 (e1000g3) node 2. 3 more to go.
Feb 21 09:08:40 dev genunix: [ID 592107 kern.notice] LLT INFO V-14-1-10510 sent hbreq (NULL) on link 1 (e1000g3) node 2. 2 more to go.
Feb 21 09:08:40 dev genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 1 (e1000g3) node 2 inactive 15 sec (281)
Feb 21 09:08:40 dev genunix: [ID 592107 kern.notice] LLT INFO V-14-1-10510 sent hbreq (NULL) on link 1 (e1000g3) node 2. 1 more to go.
Feb 21 09:08:41 dev genunix: [ID 592107 kern.notice] LLT INFO V-14-1-10510 sent hbreq (NULL) on link 1 (e1000g3) node 2. 0 more to go.
Feb 21 09:08:41 dev genunix: [ID 205468 kern.notice] LLT INFO V-14-1-10509 link 1 (e1000g3) node 2 expired
Feb 21 09:08:41 dev Had[497]: [ID 702911 daemon.notice] VCS NOTICE V-16-1-10066 Entering RUNNING state
Feb 21 09:08:47 dev genunix: [ID 140958 kern.notice] LLT INFO V-14-1-10205 link 0 (e1000g2) node 2 in trouble
Feb 21 09:08:49 dev Had[497]: [ID 702911 daemon.notice] VCS NOTICE V-16-1-50311 VCS Engine: running with security OFF
Feb 21 09:08:54 dev genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (e1000g2) node 2 inactive 8 sec (410)
Feb 21 09:08:55 dev genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (e1000g2) node 2 inactive 10 sec (411)
Feb 21 09:08:56 dev genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (e1000g2) node 2 inactive 11 sec (412)
Feb 21 09:08:51 dev Had[497]: [ID 702911 daemon.alert] VCS WARNING V-16-1-40184 HAD Self Check: Excessive delay in the HAD heartbeat to GAB
Feb 21 09:08:57 dev genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (e1000g2) node 2 inactive 12 sec (412)
Feb 21 09:08:58 dev genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (e1000g2) node 2 inactive 13 sec (413)
Feb 21 09:08:59 dev genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (e1000g2) node 2 inactive 14 sec (413)
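The link state changes are easy to miss among the repeated "inactive" lines, so here is a small sketch that keeps only the LLT lines ending in "active", "in trouble" or "expired" and prints them with their time. The function name llt_link_events is made up, and the field positions assume the syslog format shown above. On Solaris, nawk may be needed instead of awk.

```shell
# llt_link_events   (hypothetical helper)
# Reads syslog text on stdin and prints "HH:MM:SS link N (ifname) node M state"
# for LLT lines whose state is active, in trouble or expired.
llt_link_events() {
    awk '/LLT INFO/ && (/ active$/ || / in trouble$/ || / expired$/) {
        # Locate the word "link"; everything from there on is the event.
        for (i = 1; i <= NF; i++) if ($i == "link") break
        out = $3    # third field is the time of day
        for (; i <= NF; i++) out = out " " $i
        print out
    }'
}
```

For example: llt_link_events < /var/adm/messages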
The same messages are observed on the other node, whereas the output of dladm show-dev is as follows:
bash-3.00# uname -n
prod
bash-3.00# dladm show-dev
e1000g0 link: up speed: 1000 Mbps duplex: full
e1000g1 link: up speed: 1000 Mbps duplex: full
e1000g2 link: up speed: 1000 Mbps duplex: full----link used for llt
e1000g3 link: up speed: 1000 Mbps duplex: full---link used for llt
bash-3.00# uname -n
dev
bash-3.00# dladm show-dev
e1000g0 link: up speed: 1000 Mbps duplex: full
e1000g1 link: up speed: 1000 Mbps duplex: full
e1000g2 link: up speed: 1000 Mbps duplex: full----link used for llt
e1000g3 link: up speed: 1000 Mbps duplex: full----link used for llt
If anybody knows the solution to the above problem, please guide me. I think I am one step away from completing my cluster configuration. Thanks for your support.
Anish