02-13-2012 09:23 PM
Hi All,
I have set up a VCS lab on my own laptop. The environment is as follows:
VMware Workstation 8.0
2 x VMs with 1.5 GB RAM each, Solaris 10 U9 64-bit OS, with all prerequisites such as 3 NICs:
1 for the public interface
2 for the private interfaces, i.e. for the LLT heartbeat links.
Authentication without root password.
For shared storage I have created one more VM with Solaris 10, and a few disks are shared over the iSCSI protocol to the two-node cluster.
I was able to install x86 VCS 5.1 on this two-node cluster successfully.
But when I try to configure the cluster using ./installvcs -configure, I get the following error:
Veritas Cluster Server 5.1 SP1 Configure Program
prod dev
1) Configure heartbeat links using LLT over Ethernet
2) Configure heartbeat links using LLT over UDP
3) Automatically detect configuration for LLT over Ethernet
b) Back to previous menu
How would you like to configure heartbeat links? [1-3,b,q,?] (3) 1
Discovering NICs on prod ............................................................................................ Discovered e1000g0 e1000g1 e1000g2 e1000g3
Enter the NIC for the first private heartbeat link on prod: [b,q,?] (e1000g2)
Would you like to configure a second private heartbeat link? [y,n,q,b,?] (n) y
Enter the NIC for the second private heartbeat link on prod: [b,q,?] (e1000g3)
Would you like to configure a third private heartbeat link? [y,n,q,b,?] (n)
Do you want to configure an additional low-priority heartbeat link? [y,n,q,b,?] (n)
Are you using the same NICs for private heartbeat links on all systems? [y,n,q,b,?] (y)
Can't use string ("VCS51") as a HASH ref while "strict refs" in use at ../scripts/CPIP/Prod/VCS51.pm line 3870, <STDIN> line 10.
I tried to google the above problem but could not find an answer. If anybody has encountered this error and resolved it, please help me resolve it.
If you need any further info, please reply.
Thanks & Regards
Anish
02-15-2012 06:19 AM
Hi Anish,
Sorry, I cannot help you directly with this issue since it is outside of my area of expertise. I would recommend that you contact Symantec Technical Support and open a support case. This should be something that we should be able to handle quickly via that support method.
Thanks,
Wally
02-15-2012 06:36 AM
Thanks for the response, Wally, but unfortunately I am using this VCS cluster without a license, i.e. under the evaluation period, and this setup is just for learning VCS concepts.
02-15-2012 08:45 AM
Can you provide a tar/zip of the install logs directory? It is in /opt/VRTS/install/logs and the directory should be called installvcs-{some-id}. If you have run the installer multiple times you will have more than one, so check the timestamps and send the one you want.
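A minimal bash sketch for picking the right log directory, assuming the installvcs-{some-id} naming described above. The function names are made up for illustration, and the log root can be overridden (for example with /var/tmp) if the installer never moved its logs:

```shell
# Print the most recently modified installvcs-* directory under a log root.
# Default root is the path from the post; pass another root as $1 if needed.
newest_install_log() {
    log_root=${1:-/opt/VRTS/install/logs}
    # ls -td lists newest first; -d keeps directories from being expanded.
    for d in $(ls -td "$log_root"/installvcs-* 2>/dev/null); do
        # Skip plain files (such as an earlier tar archive) that match the glob.
        [ -d "$d" ] && { echo "$d"; return 0; }
    done
    return 1
}

# Archive that directory into <name>.tar in the current working directory.
pack_install_log() {
    dir=$(newest_install_log "$1")
    [ -n "$dir" ] || { echo "no installvcs-* directory found" >&2; return 1; }
    tar -cf "$(basename "$dir").tar" -C "$(dirname "$dir")" "$(basename "$dir")"
}
```

Running `pack_install_log /var/tmp` would archive the newest installvcs-* directory found under /var/tmp into the current directory.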
Mike
02-15-2012 09:03 PM
Hi Mike,
Thanks for showing interest in this problem. I did not find any logs under the /opt/VRTS/install path, but the logs from installation and configuration are located in /var/tmp/; those logs are attached for your reference.
Please reply if you need any more info.
Thanks,
Anish
02-16-2012 01:40 AM
Had a look at the logs and couldn't find anything unusual about what you entered, and the log files don't give any more info about the error. Normally when the installer bails out there is more info in the logs and the logs get moved from /var to /opt, but I guess this error just stops it dead: after the message "Are you using the same NICs for private heartbeat links on all systems" there is nothing else in the logs, not even the "Can't use string" error.
If you attach the script that fails, I'll have a quick look to see if I can tell what it is complaining about. In the directory that contains the product directories ("cluster_server", "storage_foundation" etc.) you should see the scripts directory, so grab the file "scripts/CPIP/Prod/VCS51.pm".
Mike
02-16-2012 02:32 AM
Hi Mike,
Attaching the script that is causing the problem.
I found it in the following path:
/VRTS_SF_HA_Solutions_5.1_SP1_Solaris_x64/dvd2-sol_x64/scripts/CPIP/Prod
Since the .pm extension is not allowed for upload, I have converted the file to text format.
Please let me know if you need any more information.
Thanks for your support!!
Anish
02-16-2012 03:34 AM
Anish,
I think you may have hit a bug, although I don't understand why other people have not got it too, so if someone from Symantec is watching this thread perhaps they can comment on info below:
Your installation bombs out at line 3870 which is:
$cfg->{$sysi}{bonded_nics}=$cfg->{${$edr->{systems}}[0]->{sys}}{bonded_nics};
In 5.1 (without SP1) you get asked if NIC is bonded, but if you grep for "ask_bonded" in your 5.1SP1 VCS51.pm then you just get:
So in my code I have:
$hb = $prod->ask_hbnic_sys($sysi,1,$rsn,$rpn);
and you have same code, but without the "$ayn=$prod->ask_bonded_nic($hb);"
We know "ask_bonded_nic" is not called, not just by code, but because the installer did not ask you. Now it could be 5.1SP1 is more sophisticated and it doesn't need to ask you as it works it out itself (and I don't know perl, so I can't tell if this is the case) or it could be that bonded_nics variable is not set and as the line of code that fails uses this variable, this is why it fails.
Mike
02-16-2012 04:08 AM
Hi Mike,
OK, nice explanation. Just for your reference, I also tried installing VCS 6.0, which is available as trialware on the Symantec site. There I get the same problem, but on a different line, no. 4870 (not sure).
Now I am confused: is this really a bug, or am I making a mistake? :)
I do not have 5.1 (without SP1), otherwise I would have tried that also.
Thanks Mike Once again!
Regards,
Anish
02-16-2012 04:55 AM
Could you have a look at line 4870 to see what it says?
Mike
02-16-2012 09:31 PM
Hi
Here is error from VCS 6.0
1) Configure heartbeat links using LLT over Ethernet
2) Configure heartbeat links using LLT over UDP
3) Automatically detect configuration for LLT over Ethernet
b) Back to previous menu
How would you like to configure heartbeat links? [1-3,b,q,?] (1)
Discovering NICs on prod ............................................................................................ Discovered e1000g0 e1000g1 e1000g2 e1000g3
Enter the NIC for the first private heartbeat link on prod: [b,q,?] (e1000g1) e1000g2
Would you like to configure a second private heartbeat link? [y,n,q,b,?] (n) y
Enter the NIC for the second private heartbeat link on prod: [b,q,?] (e1000g1) e1000g3
Would you like to configure a third private heartbeat link? [y,n,q,b,?] (n)
Do you want to configure an additional low-priority heartbeat link? [y,n,q,b,?] (n)
Are you using the same NICs for private heartbeat links on all systems? [y,n,q,b,?] (y)
Can't use string ("VCS60") as a HASH ref while "strict refs" in use at ../scripts/CPIP/Prod/VCS60.pm line 4714, <STDIN> line 6.
bash-3.00#
Following are the lines from the script (error line highlighted). I did not find much difference between 5.1 SP1 and 6.0. I am also attaching the same script FYR:
# ask for all heartbeat links
sub ask_hbnics {
    my($cfg,$padv,$prod,$sys,$sysi,$msg,$cprod);
    my($all,$ayn,%hbn,@en,$dsn,$hb,$hb2,$hb3,$hb4,$hbl,$ip,$port,$rpn,$rsn,$udp_port,$used_port);
    return '' if (Cfg::opt('responsefile'));
    $prod=shift;
    $cfg=Obj::cfg();
    $cprod=CPIC::get('prod');
    $used_port = [];
    for my $sys (@{CPIC::get('systems')}) {
        $sysi=$sys->{sys};
        $padv=$sys->padv;
        if ($all) {
            $hbn{lltlink1}{$sysi}=$en[1];
            $hbn{lltlink2}{$sysi}=$en[2] if ($en[2]);
            $hbn{lltlink3}{$sysi}=$en[3] if ($en[3]);
            $hbn{lltlink4}{$sysi}=$en[4] if ($en[4]);
            $hbn{lltlinklowpri1}{$sysi}=$en[$prod->{max_hipri_links}+1] if ($en[$prod->{max_hipri_links}+1]);
            $cfg->{$sysi}{bonded_nics}=$cfg->{${CPIC::get('systems')}[0]->{sys}}{bonded_nics};
        } else {
            undef(@en);
            $rsn=$rpn=[];
            if (EDRu::inarr($sys,@{CPIC::get('systems')})) {
                Msg::n();
                $msg=Msg::new("Discovering NICs on $sysi", 40, 2398, "$sysi");
                $msg->left;
                $padv=$sys->padv;
                $rsn=$padv->systemnics_sys($sys,1);
                $rpn=$padv->gatewaynics_sys($sys);
                EDRu::arruniq(@$rsn);
                $dsn=join(' ',@$rsn);
                if ($#$rsn<0) {
                    $msg=Msg::new("No NICs discovered", 40, 2399);
                    $msg->right;
                } else {
                    $msg=Msg::new("Discovered $dsn", 40, 2400, "$dsn");
                    $msg->right;
                    #$msg=Msg::new("\nTo use aggregated interfaces for private heartbeat, enter the name of an aggregated interface. \nTo use a NIC for private heartbeat, enter a NIC which is not part of an aggregated interface.\n");
                    #$msg->print;
Actually I want to try a previous version such as 5.0. Does anybody know where I can download it?
Thanks & regards,
Anish
02-17-2012 08:05 AM
Anish asked me to respond to this. I would recommend manual configuration: install the product with './installvcs -installonly' and then manually create the 3 files you need to get LLT and GAB working. (It's easy :)
/etc/llthosts:
0 hosta
1 hostb
/etc/gabtab:
/sbin/gabconfig -c -n2
/etc/llttab:
set-cluster 10
set-node /etc/nodename
set-timer peertrouble:400
link nxge0 /dev/nxge:0 - ether - -
link nxge4 /dev/nxge:4 - ether - -
link-lowpri vnet0 /dev/vnet:0 - ether - -
start
You will need to change the network device names to match your configuration in the llttab file.
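The three files can also be generated rather than typed. The following is only a bash sketch under stated assumptions: `make_llt_files` is a made-up helper, it writes into a staging directory instead of /etc, it numbers nodes from 0 in the order given (as in the llthosts example above), and it emits just the set-node, set-cluster and link lines, so set-timer, link-lowpri and start from the sample would still have to be added by hand if wanted.

```shell
# usage: make_llt_files OUTDIR CLUSTER_ID THIS_NODE "NODE LIST" "NIC LIST"
make_llt_files() {
    out=$1; cluster_id=$2; this_node=$3; nodes=$4; nics=$5
    mkdir -p "$out" || return 1

    # llthosts: one "<id> <name>" line per node, ids assigned from 0.
    id=0
    : > "$out/llthosts"
    for n in $nodes; do
        echo "$id $n" >> "$out/llthosts"
        id=$((id + 1))
    done

    # gabtab: seed the cluster once all listed nodes are up (-n<count>).
    echo "/sbin/gabconfig -c -n$id" > "$out/gabtab"

    # llttab: node name, cluster id, then one link line per NIC.
    {
        echo "set-node $this_node"
        echo "set-cluster $cluster_id"
        for nic in $nics; do
            # Split e.g. e1000g2 into driver "e1000g" and instance "2".
            drv=$(echo "$nic" | sed 's/[0-9]*$//')
            inst=$(echo "$nic" | sed 's/^.*[^0-9]\([0-9][0-9]*\)$/\1/')
            echo "link $nic /dev/$drv:$inst - ether - -"
        done
    } > "$out/llttab"
}
```

For instance, `make_llt_files /tmp/stage 10 hosta "hosta hostb" "nxge0 nxge4"` writes the files under /tmp/stage so they can be reviewed before being copied into /etc on hosta.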
02-17-2012 08:38 AM
Hi,
I've escalated this to support to see if this is a known bug or if there is additional info they can provide for you. Will update when I hear back.
Best,
Kimberley
02-17-2012 08:48 AM
Thanks Seann, as suggested I will try to configure VCS manually and post the result.
Thanks Kimberley, I would like to hear more on this problem.
02-17-2012 11:50 AM
If you are configuring manually then there are a few more things you need to do:
Mike
02-20-2012 09:22 PM
Hi All,
Thanks for your responses. As suggested, I tried to configure VCS manually.
The following steps were performed:
on Dev node:
bash-3.00# cat /etc/llttab
set-node prod
set-cluster 101
link e1000g2 /dev/e1000g:2 - ether - -
link e1000g3 /dev/e1000g:3 - ether - -
bash-3.00# cat /etc/llthosts
1 dev
2 prod
bash-3.00# cat /etc/VRTSvcs/conf/config/main.cf
include "types.cf"
cluster rainbow (
)
system dev (
)
system prod (
)
bash-3.00# cat /etc/gabtab
/sbin/gabconfig -c -n2
bash-3.00# cat /etc/VRTSvcs/conf/sysname
prod
on prod node:
bash-3.00# cat /etc/llttab
set-node dev
set-cluster 101
link e1000g2 /dev/e1000g:2 - ether - -
link e1000g3 /dev/e1000g:3 - ether - -
bash-3.00# cat /etc/llthosts
1 dev
2 prod
bash-3.00# cat /etc/gabtab
/sbin/gabconfig -c -n2
bash-3.00# cat /etc/VRTSvcs/conf/config/main.cf
include "types.cf"
copied types.cf file from /etc/VRTSvcs/conf to /etc/VRTSvcs/conf/config
bash-3.00# cat /etc/VRTSvcs/conf/sysname
dev
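Each node's set-node value and sysname file should name that node itself, and the name should appear in llthosts. In the listing above, the block headed "on Dev node" carries set-node prod and sysname prod (and the reverse for the other block); that may simply be swapped headings in the post, but it is the kind of mismatch worth ruling out. A small bash sketch of such a check, where `check_llt_node` is a hypothetical helper that works on copies of the files and compares set-node literally (so it would not follow a file path such as /etc/nodename):

```shell
# check_llt_node CONF_ROOT EXPECTED_NODE
# CONF_ROOT holds copies of llttab, llthosts and sysname. Prints one line
# per problem found and returns non-zero if there was any.
check_llt_node() {
    root=$1; expected=$2; rc=0

    # Value of the set-node directive in llttab.
    set_node=
    while read -r key value rest; do
        [ "$key" = "set-node" ] && set_node=$value
    done < "$root/llttab"

    # Is the expected name present in the second column of llthosts?
    found=0
    while read -r id name; do
        [ "$name" = "$expected" ] && found=1
    done < "$root/llthosts"

    sysname=$(head -n 1 "$root/sysname")

    if [ "$set_node" != "$expected" ]; then
        echo "llttab: set-node is '$set_node', expected '$expected'"; rc=1
    fi
    if [ "$sysname" != "$expected" ]; then
        echo "sysname: contains '$sysname', expected '$expected'"; rc=1
    fi
    if [ "$found" -ne 1 ]; then
        echo "llthosts: no entry for '$expected'"; rc=1
    fi
    return $rc
}
```

On a live node the copies could be taken from /etc/llttab, /etc/llthosts and /etc/VRTSvcs/conf/sysname, with the output of `uname -n` passed as the expected name.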
After doing this I tried to start LLT and GAB on both nodes using the commands
lltconfig -c
and
sh /etc/gabtab
but this was not successful,
so I tried to start their SMF services (I think in VCS 6.0 they have removed /etc/rc2.d/S70llt and /etc/rc2.d/S92gab):
svcadm enable svc:/system/llt:default
svcadm enable svc:/system/gab:default
but the services were still going into maintenance. After analyzing the logs I found the following error messages:
Feb 18 19:14:15 Executing start method ("/lib/svc/method/llt start") ]
This script is not allowed to start LLT. LLT_START is not 1
To fix this I changed the values in the following file:
bash-3.00# cat /etc/default/llt
#
# This file is sourced :
# from /etc/init.d/llt for Solaris < 2.10
# from /lib/svc/method/llt for Solaris 2.10
#
# Set the two environment variables below as follows:
#
# 1 = start or stop llt
# 0 = do not start or stop llt
#
LLT_START=1-----------> by default it was set to 0
LLT_STOP=1-----------> by default it was set to 0
same for gab
bash-3.00# cat /etc/default/gab
#
# This file is sourced :
# from /etc/init.d/gab for Solaris < 2.10
# from /lib/svc/method/gab for Solaris 2.10
#
# Set the two environment variables below as follows:
#
# 1 = start or stop gab
# 0 = do not start or stop gab
#
GAB_START=1-----------> by default it was set to 0
GAB_STOP=1-----------> by default it was set to 0
Then both services were up and running on both nodes:
bash-3.00# svcs -a|grep llt
online 9:07:25 svc:/system/llt:default
bash-3.00# svcs -a|grep gab
online 9:07:28 svc:/system/gab:default
Then I tried to bring the VCS service online,
but again it went into maintenance due to the following error:
Feb 18 23:05:26 dev Had[510]: [ID 702911 daemon.notice] VCS ERROR V-16-1-10614 Cluster UUID is not configured or it is empty, on system dev - VCS Stopping. Manually Restart VCS after configuring Cluster UUID.
To configure it, I ran the following command on both nodes:
/opt/VRTSvcs/bin/uuidconfig.pl -clus -configure
I also changed the following values in the /etc/default/vcs file:
VCS_START=1
VCS_STOP=1
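The same two-line change was made in /etc/default/llt, /etc/default/gab and /etc/default/vcs, so it can be scripted. This is a bash sketch only; `enable_start_flags` is a made-up name, and it assumes the variables follow the PREFIX_START / PREFIX_STOP pattern shown in the files above:

```shell
# enable_start_flags FILE PREFIX
# Sets PREFIX_START and PREFIX_STOP to 1 in FILE and keeps the previous
# contents in FILE.orig. PREFIX is LLT, GAB or VCS.
enable_start_flags() {
    file=$1; prefix=$2
    cp "$file" "$file.orig" || return 1
    # Only lines that begin with the variable name are rewritten, so the
    # comment header in the file is left untouched.
    sed -e "s/^${prefix}_START=.*/${prefix}_START=1/" \
        -e "s/^${prefix}_STOP=.*/${prefix}_STOP=1/" "$file.orig" > "$file"
}
```

For example, `enable_start_flags /etc/default/llt LLT`. A service that SMF has already placed in maintenance normally also needs `svcadm clear svc:/system/llt:default` before it will be retried.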
Now all my services are running, but I still have the following problem:
GAB is working on both nodes, but LLT is not communicating with the other node.
The gabconfig output after starting the cluster with hastart on both nodes is as follows:
bash-3.00# hastart
bash-3.00# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen 12f3f02 membership ;1
Port h gen 12f3f09 membership ;1
bash-3.00# uname -n
dev
bash-3.00# hastart
bash-3.00# hastatus -sum
-- SYSTEM STATE
-- System State Frozen
A dev UNKNOWN 0
A prod RUNNING 0
bash-3.00# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen 12f3f02 membership ; 2
Port h gen 12f3f0b membership ; 2
bash-3.00# uname -n
prod
But the LLT output on the dev node is:
Port h gen 12f3f09 membership ;1
bash-3.00# uname -n
dev
bash-3.00# lltstat -nl
LLT node information:
Node State Links
* 1 dev OPEN 2
LLT link information:
link 0 e1000g2 on etherfp hipri
mtu 1500, sap 0xcafe, broadcast FF:FF:FF:FF:FF:FF, addrlen 6
txpkts 3514 txbytes 211939
rxpkts 937 rxbytes 68662
latehb 0 badcksum 0 errors 0
link 1 e1000g3 on etherfp hipri
mtu 1500, sap 0xcafe, broadcast FF:FF:FF:FF:FF:FF, addrlen 6
txpkts 347 txbytes 24504
rxpkts 281 rxbytes 19328
latehb 0 badcksum 0 errors 0
and on prod :
bash-3.00# lltstat -nl
LLT node information:
Node State Links
* 2 prod OPEN 2
LLT link information:
link 0 e1000g2 on etherfp hipri
mtu 1500, sap 0xcafe, broadcast FF:FF:FF:FF:FF:FF, addrlen 6
txpkts 3390 txbytes 180168
rxpkts 713 rxbytes 52320
latehb 0 badcksum 0 errors 0
link 1 e1000g3 on etherfp hipri
mtu 1500, sap 0xcafe, broadcast FF:FF:FF:FF:FF:FF, addrlen 6
txpkts 444 txbytes 31270
rxpkts 257 rxbytes 15827
latehb 0 badcksum 0 errors 0
whereas the nodes should see each other.
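To make that comparison less error-prone, the node table can be pulled out of the lltstat -nl output. A rough bash sketch, assuming the output layout shown in this post (a "LLT node information:" section followed by "LLT link information:"); `llt_nodes_seen` is a hypothetical helper name:

```shell
# llt_nodes_seen: read `lltstat -nl` output on stdin and print the names of
# the nodes listed in the "LLT node information" table, one per line.
llt_nodes_seen() {
    in_table=0
    while read -r line; do
        case $line in
            "LLT node information:"*) in_table=1; continue ;;
            "LLT link information:"*) in_table=0; continue ;;
            Node*State*) continue ;;
        esac
        [ "$in_table" -eq 1 ] || continue
        # Drop the leading "*" that marks the local node, then split the
        # rest into fields: id, name, state, link count.
        set -- ${line#\*}
        [ $# -ge 2 ] && echo "$2"
    done
    return 0
}
```

On a healthy two-node cluster, `lltstat -nl | llt_nodes_seen` should print both node names on each node; with the output above it prints only the local one.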
This may be why I get the following hastatus -sum output on prod:
bash-3.00# hastatus -sum
-- SYSTEM STATE
-- System State Frozen
A dev UNKNOWN 0
A prod RUNNING 0
bash-3.00# uname -n
prod
and on the dev node the output is:
bash-3.00# hastatus -sum
-- SYSTEM STATE
-- System State Frozen
A dev RUNNING 0
One more observation: after starting the cluster, the main.cf file on the dev node was automatically modified to
bash-3.00# cat /etc/VRTSvcs/conf/config/main.cf
include "types.cf"
cluster vcs (
)
system dev (
)
whereas previously only the include "types.cf" line was present there; we had added the actual configuration on the prod node.
In the messages file I can see the following messages related to the LLT interfaces for the other node:
Feb 21 09:07:17 dev e1000g: [ID 801725 kern.info] NOTICE: pci8086,100f - e1000g[3] : link up, 1000 Mbps, full duplex
Feb 21 09:07:17 dev e1000g: [ID 801725 kern.info] NOTICE: pci8086,100f - e1000g[2] : link up, 1000 Mbps, full duplex
Feb 21 09:07:27 dev genunix: [ID 644314 kern.notice] GAB INFO V-15-1-20026 Port a[GAB_Control (refcount 2)] registration waiting for seed port membership
Feb 21 09:07:41 dev syslog[542]: [ID 702911 daemon.notice] VCS INFO V-16-1-11240 Command Server: running with security OFF
Feb 21 09:07:42 dev Had[497]: [ID 702911 daemon.notice] VCS NOTICE V-16-1-10619 'HAD' starting on: dev
Feb 21 09:07:42 dev Had[497]: [ID 702911 daemon.notice] VCS NOTICE V-16-1-10620 Waiting for local cluster configuration status
Feb 21 09:07:42 dev genunix: [ID 122464 kern.notice] LLT INFO V-14-1-10499 recvarpreq link 1 for node 2 addr change from 00:00:00:00:00:00 to 00:0C:29:E2:CE:CB
Feb 21 09:07:42 dev genunix: [ID 122464 kern.notice] LLT INFO V-14-1-10499 recvarpreq link 0 for node 2 addr change from 00:00:00:00:00:00 to 00:0C:29:E2:CE:D5
Feb 21 09:07:42 dev genunix: [ID 860062 kern.notice] LLT INFO V-14-1-10024 link 0 (e1000g2) node 2 active
Feb 21 09:07:44 dev genunix: [ID 860062 kern.notice] LLT INFO V-14-1-10024 link 1 (e1000g3) node 2 active
Feb 21 09:07:49 dev syslog[542]: [ID 702911 daemon.warning] WARNING V-365-1-1 This host is not entitled to run Veritas Storage Foundation/Veritas Cluster Server.
Feb 21 09:07:49 dev As set forth in the End User License Agreement (EULA) you must complete one of the two options set forth below. To comply with this condition of the EULA and stop logging of this message, you have 56 days to either:
Feb 21 09:07:49 dev - make this host managed by a Management Server (see http://go.symantec.com/sfhakeyless for details and free download), or
Feb 21 09:07:49 dev - add a valid license key matching the functionality in use on this host using the command 'vxlicinst' and validate using the command 'vxkeyless set NONE'.
Feb 21 09:07:49 dev genunix: [ID 272960 kern.notice] GAB INFO V-15-1-20036 Port a[GAB_Control (refcount 1)] gen 12f3f01 membership ;12
Feb 21 09:08:04 dev genunix: [ID 773945 kern.info] UltraDMA mode 2 selected
Feb 21 09:08:04 dev genunix: [ID 935449 kern.info] ATA DMA off: disabled. Control with "atapi-cd-dma-enabled" property
Feb 21 09:08:04 dev genunix: [ID 882269 kern.info] PIO mode 4 selected
Feb 21 09:08:04 dev genunix: [ID 935449 kern.info] ATA DMA off: disabled. Control with "atapi-cd-dma-enabled" property
Feb 21 09:08:04 dev genunix: [ID 882269 kern.info] PIO mode 4 selected
Feb 21 09:08:04 dev genunix: [ID 935449 kern.info] ATA DMA off: disabled. Control with "atapi-cd-dma-enabled" property
Feb 21 09:08:04 dev genunix: [ID 882269 kern.info] PIO mode 4 selected
Feb 21 09:08:04 dev genunix: [ID 935449 kern.info] ATA DMA off: disabled. Control with "atapi-cd-dma-enabled" property
Feb 21 09:08:04 dev genunix: [ID 882269 kern.info] PIO mode 4 selected
Feb 21 09:08:13 dev svc.startd[7]: [ID 122153 daemon.warning] svc:/application/stosreg:default: Method or service exit timed out. Killing contract 95.
Feb 21 09:08:13 dev svc.startd[7]: [ID 636263 daemon.warning] svc:/application/stosreg:default: Method "/lib/svc/method/svc-stosreg" failed due to signal KILL.
Feb 21 09:08:14 dev sendmail[584]: [ID 702911 mail.crit] My unqualified host name (dev) unknown; sleeping for retry
Feb 21 09:08:17 dev Had[497]: [ID 702911 daemon.notice] VCS NOTICE V-16-1-10625 Local cluster configuration valid
Feb 21 09:08:17 dev Had[497]: [ID 702911 daemon.notice] VCS NOTICE V-16-1-11034 Registering for cluster membership
Feb 21 09:08:17 dev Had[497]: [ID 702911 daemon.notice] VCS NOTICE V-16-1-11035 Waiting for cluster membership
Feb 21 09:08:22 dev genunix: [ID 272960 kern.notice] GAB INFO V-15-1-20036 Port h[GAB_USER_CLIENT (refcount 0)] gen 12f3f04 membership ;12
Feb 21 09:08:22 dev Had[497]: [ID 702911 daemon.notice] VCS INFO V-16-1-10077 Received new cluster membership
Feb 21 09:08:23 dev Had[497]: [ID 702911 daemon.notice] VCS NOTICE V-16-1-10086 System dev (Node '1') is in Regular Membership - Membership: 0x6
Feb 21 09:08:23 dev Had[497]: [ID 702911 daemon.notice] VCS NOTICE V-16-1-10086 System (Node '2') is in Regular Membership - Membership: 0x6
Feb 21 09:08:26 dev Had[497]: [ID 702911 daemon.notice] VCS NOTICE V-16-1-10073 Building from local configuration
Feb 21 09:08:26 dev genunix: [ID 577146 kern.notice] NOTICE: VXFEN INFO V-11-1-VxFEN unloaded
Feb 21 09:08:27 dev genunix: [ID 140958 kern.notice] LLT INFO V-14-1-10205 link 1 (e1000g3) node 2 in trouble
Feb 21 09:08:27 dev rootnex: [ID 349649 kern.info] xsvc0 at root
Feb 21 09:08:27 dev genunix: [ID 936769 kern.info] xsvc0 is /xsvc
Feb 21 09:08:31 dev pseudo: [ID 129642 kern.info] pseudo-device: devinfo0
Feb 21 09:08:31 dev genunix: [ID 936769 kern.info] devinfo0 is /pseudo/devinfo@0
Feb 21 09:08:31 dev unix: [ID 954099 kern.info] NOTICE: IRQ19 is being shared by drivers with different interrupt levels.
Feb 21 09:08:31 dev This may result in reduced system performance.
Feb 21 09:08:31 dev pci_pci: [ID 370704 kern.info] PCI-device: pci1274,1371@1, audioens0
Feb 21 09:08:31 dev genunix: [ID 936769 kern.info] audioens0 is /pci@0,0/pci15ad,790@11/pci1274,1371@1
Feb 21 09:08:33 dev genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 1 (e1000g3) node 2 inactive 8 sec (281)
Feb 21 09:08:34 dev genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 1 (e1000g3) node 2 inactive 9 sec (281)
Feb 21 09:08:35 dev genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 1 (e1000g3) node 2 inactive 10 sec (281)
Feb 21 09:08:36 dev genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 1 (e1000g3) node 2 inactive 11 sec (281)
Feb 21 09:08:37 dev genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 1 (e1000g3) node 2 inactive 12 sec (281)
Feb 21 09:08:38 dev genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 1 (e1000g3) node 2 inactive 13 sec (281)
Feb 21 09:08:39 dev genunix: [ID 592107 kern.notice] LLT INFO V-14-1-10510 sent hbreq (NULL) on link 1 (e1000g3) node 2. 4 more to go.
Feb 21 09:08:39 dev genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 1 (e1000g3) node 2 inactive 14 sec (281)
Feb 21 09:08:39 dev genunix: [ID 592107 kern.notice] LLT INFO V-14-1-10510 sent hbreq (NULL) on link 1 (e1000g3) node 2. 3 more to go.
Feb 21 09:08:40 dev genunix: [ID 592107 kern.notice] LLT INFO V-14-1-10510 sent hbreq (NULL) on link 1 (e1000g3) node 2. 2 more to go.
Feb 21 09:08:40 dev genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 1 (e1000g3) node 2 inactive 15 sec (281)
Feb 21 09:08:40 dev genunix: [ID 592107 kern.notice] LLT INFO V-14-1-10510 sent hbreq (NULL) on link 1 (e1000g3) node 2. 1 more to go.
Feb 21 09:08:41 dev genunix: [ID 592107 kern.notice] LLT INFO V-14-1-10510 sent hbreq (NULL) on link 1 (e1000g3) node 2. 0 more to go.
Feb 21 09:08:41 dev genunix: [ID 205468 kern.notice] LLT INFO V-14-1-10509 link 1 (e1000g3) node 2 expired
Feb 21 09:08:41 dev Had[497]: [ID 702911 daemon.notice] VCS NOTICE V-16-1-10066 Entering RUNNING state
Feb 21 09:08:47 dev genunix: [ID 140958 kern.notice] LLT INFO V-14-1-10205 link 0 (e1000g2) node 2 in trouble
Feb 21 09:08:49 dev Had[497]: [ID 702911 daemon.notice] VCS NOTICE V-16-1-50311 VCS Engine: running with security OFF
Feb 21 09:08:54 dev genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (e1000g2) node 2 inactive 8 sec (410)
Feb 21 09:08:55 dev genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (e1000g2) node 2 inactive 10 sec (411)
Feb 21 09:08:56 dev genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (e1000g2) node 2 inactive 11 sec (412)
Feb 21 09:08:51 dev Had[497]: [ID 702911 daemon.alert] VCS WARNING V-16-1-40184 HAD Self Check: Excessive delay in the HAD heartbeat to GAB
Feb 21 09:08:57 dev genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (e1000g2) node 2 inactive 12 sec (412)
Feb 21 09:08:58 dev genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (e1000g2) node 2 inactive 13 sec (413)
Feb 21 09:08:59 dev genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (e1000g2) node 2 inactive 14 sec (413)
The same messages are observed on the other node, whereas the output of dladm show-dev is as follows:
bash-3.00# uname -n
prod
bash-3.00# dladm show-dev
e1000g0 link: up speed: 1000 Mbps duplex: full
e1000g1 link: up speed: 1000 Mbps duplex: full
e1000g2 link: up speed: 1000 Mbps duplex: full----link used for llt
e1000g3 link: up speed: 1000 Mbps duplex: full---link used for llt
bash-3.00# uname -n
dev
bash-3.00# dladm show-dev
e1000g0 link: up speed: 1000 Mbps duplex: full
e1000g1 link: up speed: 1000 Mbps duplex: full
e1000g2 link: up speed: 1000 Mbps duplex: full----link used for llt
e1000g3 link: up speed: 1000 Mbps duplex: full----link used for llt
If anybody knows a solution to the above problem, please guide me. I think I am one step away from completing my cluster configuration. Thanks for your support.
Anish
02-21-2012 02:25 AM
LLT config looks OK, so the problem is probably with the configuration of the networks in VMware. Could you plumb in IPs on the heartbeat links temporarily so that you can test them using ping?
Can you also provide output of 2 commands below:
lltstat -nvv
eeprom | grep mac
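A sketch of what the temporary ping test could look like on Solaris 10. The helper below only prints the commands so they can be reviewed first; `hb_ping_plan` is a made-up name, and the addresses and the /24 netmask are placeholders to be chosen by whoever runs it:

```shell
# hb_ping_plan NIC LOCAL_IP PEER_IP
# Prints (does not run) the commands for a temporary ping test on one
# heartbeat NIC: plumb it, give it an address, ping the peer, unplumb it.
hb_ping_plan() {
    nic=$1; local_ip=$2; peer_ip=$3
    echo "ifconfig $nic plumb"
    echo "ifconfig $nic $local_ip netmask 255.255.255.0 up"
    echo "ping $peer_ip"
    echo "ifconfig $nic unplumb"
}
```

For example, `hb_ping_plan e1000g3 192.168.30.1 192.168.30.2` on one node and the mirrored addresses on the other; the peer address has to be up on the other node before the ping is run, and the unplumb step comes last on both.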
Mike
03-01-2012 07:53 AM
Hi Mike,
Sorry for the late reply. It is successful now.
Actually I had been troubleshooting only the LLT and GAB errors.
As per your post, the problem was indeed with the interfaces. I was using VMware Workstation 8.0, and there I found that when I added more than three host-only network adapters, the last interface, i.e. e1000g3, did not communicate with the other node, which is why LLT and GAB were failing. I tried many things and finally moved to VMware Server 2.0, where I did not find any such limitation, and my configuration is successful. Here is the output from both nodes:
I think this is the expected output (please correct me if I missed something):
on dev node:
bash-3.00# lltstat -nl
LLT node information:
Node State Links
* 1 dev OPEN 2
2 prod OPEN 2
LLT link information:
link 0 e1000g1 on etherfp hipri
mtu 1500, sap 0xcafe, broadcast FF:FF:FF:FF:FF:FF, addrlen 6
txpkts 15584 txbytes 1158103
rxpkts 15977 rxbytes 1377027
latehb 5 badcksum 0 errors 0
link 1 e1000g2 on etherfp hipri
mtu 1500, sap 0xcafe, broadcast FF:FF:FF:FF:FF:FF, addrlen 6
txpkts 15416 txbytes 1138179
rxpkts 15892 rxbytes 1389545
latehb 5 badcksum 0 errors 0
bash-3.00# uname -n
dev
bash-3.00# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen 15f9501 membership ;12
Port h gen 15f9509 membership ;12
bash-3.00# hastatus -sum
-- SYSTEM STATE
-- System State Frozen
A dev RUNNING 0
A prod RUNNING 0
bash-3.00#
on prod node:
bash-3.00# lltstat -nl
LLT node information:
Node State Links
1 dev OPEN 2
* 2 prod OPEN 2
LLT link information:
link 0 e1000g1 on etherfp hipri
mtu 1500, sap 0xcafe, broadcast FF:FF:FF:FF:FF:FF, addrlen 6
txpkts 15558 txbytes 1356519
rxpkts 16607 rxbytes 1226122
latehb 17 badcksum 0 errors 0
link 1 e1000g2 on etherfp hipri
mtu 1500, sap 0xcafe, broadcast FF:FF:FF:FF:FF:FF, addrlen 6
txpkts 15642 txbytes 1344401
rxpkts 16768 rxbytes 1244054
latehb 17 badcksum 0 errors 0
bash-3.00# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen 15f9501 membership ;12
Port h gen 15f9509 membership ;12
bash-3.00# uname -n
prod
bash-3.00# hastatus -sum
-- SYSTEM STATE
-- System State Frozen
A dev RUNNING 0
A prod RUNNING 0
bash-3.00#
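The final state can also be checked mechanically. A small bash sketch, assuming the hastatus -sum layout shown above in which system lines start with "A"; `all_systems_running` is a hypothetical helper name:

```shell
# all_systems_running: read `hastatus -sum` output on stdin; succeed only if
# at least one system line is present and every one is in state RUNNING.
all_systems_running() {
    seen=0
    while read -r tag name state frozen; do
        # System lines look like: A <system> <state> <frozen>
        [ "$tag" = "A" ] || continue
        seen=1
        [ "$state" = "RUNNING" ] || return 1
    done
    [ "$seen" -eq 1 ]
}
```

Used as `hastatus -sum | all_systems_running && echo "cluster OK"`, it would have failed on the earlier output where dev was reported as UNKNOWN.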
Thank you all for your support.
I know this is just the start and there is a long way ahead. I will keep you posted if I run into trouble again. Thanks once again.
Regards,
Anish