Solved: I recommend manual configuration

Anish · 02-13-2012

Hi All,

I have set up a VCS lab on my own laptop. The environment is as follows:

VMware Workstation 8.0

2 x VMs with 1.5 GB RAM each, Solaris 10 U9 64-bit OS, with all prerequisites such as 3 NICs:

1 for the public interface

2 for the private interfaces, i.e. for I/O fencing.

Authentication without root password.

For shared storage I have created one more VM with Solaris 10, and a few disks are shared with the two-node cluster over iSCSI.

I was able to install x86 VCS 5.1 on this two-node cluster successfully.

But when I try to configure the cluster using ./installvcs -configure, I get the following error:

Veritas Cluster Server 5.1 SP1 Configure Program
prod dev

     1) Configure heartbeat links using LLT over Ethernet
     2) Configure heartbeat links using LLT over UDP
     3) Automatically detect configuration for LLT over Ethernet
     b) Back to previous menu

How would you like to configure heartbeat links? [1-3,b,q,?] (3) 1

    Discovering NICs on prod ............................................................................................ Discovered e1000g0 e1000g1 e1000g2 e1000g3

Enter the NIC for the first private heartbeat link on prod: [b,q,?] (e1000g2)
Would you like to configure a second private heartbeat link? [y,n,q,b,?] (n) y
Enter the NIC for the second private heartbeat link on prod: [b,q,?] (e1000g3)
Would you like to configure a third private heartbeat link? [y,n,q,b,?] (n)
Do you want to configure an additional low-priority heartbeat link? [y,n,q,b,?] (n)
Are you using the same NICs for private heartbeat links on all systems? [y,n,q,b,?] (y)
Can't use string ("VCS51") as a HASH ref while "strict refs" in use at ../scripts/CPIP/Prod/VCS51.pm line 3870, <STDIN> line 10.

I tried googling the problem but could not find an answer. If anybody has encountered and resolved this error, please help me resolve it.

If you need any further information, please reply.

Thanks & Regards

Anish

Wally_Heim · 02-15-2012

Hi Anish,

Sorry, I cannot help you directly with this issue since it is outside of my area of expertise. I would recommend that you contact Symantec Technical Support and open a support case. This should be something that we should be able to handle quickly via that support method.

Thanks,

Wally

Anish · 02-15-2012

Thanks for the response, Wally, but unfortunately I am using this VCS cluster without a license, i.e. under the evaluation period, and this setup is just for learning VCS concepts.

mikebounds · 02-15-2012

Can you provide a tar/zip of the install logs directory? This is in /opt/VRTS/install/logs and the directory should be called installvcs-{some-id}. If you have run the installer multiple times then you will have more than one, so look at the timestamps to send the one you want.

Mike

Anish · 02-15-2012

Hi Mike,

Thanks for showing interest in this problem. I did not find any logs under /opt/VRTS/install, but the installation and configuration logs are located in /var/tmp/; those logs are attached for your reference.

Please reply if you need any more info.

Thanks,

Anish

mikebounds · 02-16-2012

Had a look at the logs and couldn't find anything unusual about what you entered, and the log files don't give any more info about the error. Normally when the installer bails out, there is more info in the logs and the logs get moved from /var to /opt, but I guess this error just stops it dead: after the message "Are you using the same NICs for private heartbeat links on all systems", there is nothing else in the logs. It doesn't even show the "Can't use string" error.

If you attach the script that fails, I'll have a quick look to see if I can tell what it is complaining about. In the directory that contains the product directories ("cluster_server", "storage_foundation" etc.) you should see the scripts directory, so grab the file "scripts/CPIP/Prod/VCS51.pm".

Mike

Anish · 02-16-2012

Hi Mike,

Attaching the script which is causing the problem.

I found it in the following path:

/VRTS_SF_HA_Solutions_5.1_SP1_Solaris_x64/dvd2-sol_x64/scripts/CPIP/Prod

Since files with a .pm extension cannot be uploaded, I have converted it to text format.

Please let me know if you need any more information.

Thanks for your support!!

Anish

mikebounds · 02-16-2012

Anish,

I think you may have hit a bug, although I don't understand why other people have not hit it too, so if someone from Symantec is watching this thread perhaps they can comment on the info below:

Your installation bombs out at line 3870, which is:

$cfg->{$sysi}{bonded_nics}=$cfg->{${$edr->{systems}}[0]->{sys}}{bonded_nics};

In 5.1 (without SP1) you get asked if the NIC is bonded, but if you grep for "ask_bonded" in your 5.1SP1 VCS51.pm then you just get:

sub ask_bonded_nic {

So the function is defined, but it is not called, whereas in my 5.1 (no SP1) VCS51.pm, if you run the same grep, you get:

sub ask_bonded_nic {
$ayn=$prod->ask_bonded_nic($hb);

So in my code I have:

$hb = $prod->ask_hbnic_sys($sysi,1,$rsn,$rpn);
return $hb if ($hb eq $edr->{msg}{back});
$hbn{lltlink1}{$sys} = $en[1] = $hb;
if ($padv->is_bonded_nic_sys($sysi,$hb)){
    $ayn=$prod->ask_bonded_nic($hb);
    push(@{$cfg->{$sys}{bonded_nics}},$hb) if ($ayn);
}

and you have the same code, but without the "$ayn=$prod->ask_bonded_nic($hb);"

We know "ask_bonded_nic" is not called, not just from the code, but because the installer did not ask you. Now it could be that 5.1SP1 is more sophisticated and doesn't need to ask you because it works it out itself (I don't know Perl, so I can't tell if this is the case), or it could be that the bonded_nics variable is not set, and since the line of code that fails uses this variable, this is why it fails.
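
The grep comparison described in this post can be repeated on any copy of the module with a small shell helper. This is only a sketch: count_sub_usage is a hypothetical name, and the sample file is a stand-in built from the lines quoted above, not the real VCS51.pm.

```shell
#!/bin/sh
# count_sub_usage FILE SUBNAME   (hypothetical helper, not part of VCS)
# Prints "<definitions> <calls>": how many lines define the Perl sub and
# how many lines call it as a method (->SUBNAME(...)).
count_sub_usage() {
    file=$1; name=$2
    defs=$(grep -c "sub $name" "$file")
    # [-] keeps grep from treating the leading "-" as an option.
    calls=$(grep -c "[-]>$name(" "$file")
    echo "$defs $calls"
}

# Stand-in for the 5.1 SP1 module: the sub is defined but never called.
sample=/tmp/VCS51-sample.$$
cat > "$sample" <<'EOF'
sub ask_bonded_nic {
}
if ($padv->is_bonded_nic_sys($sysi,$hb)){
    push(@{$cfg->{$sys}{bonded_nics}},$hb) if ($ayn);
}
EOF
count_sub_usage "$sample" ask_bonded_nic    # prints: 1 0
rm -f "$sample"
```

A result of "1 0" matches what was seen in the 5.1 SP1 file; on the 5.1 (no SP1) file the second number would be at least 1.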

Mike

Anish · 02-16-2012

Hi Mike,

OK, nice explanation. Just for your reference, I also tried installing VCS 6.0, which is available as trialware on the Symantec site. There I get the same problem too, but on a different line, no. 4870 (not sure).

Now I am confused: is this really a bug, or am I making a mistake? :)

I do not have 5.1 (without SP1), otherwise I would have tried that as well.

Thanks Mike Once again!

Regards,

Anish

mikebounds · 02-16-2012

Could you have a look at line 4870 to see what it says?

Mike

Anish · 02-16-2012

Hi,

Here is the error from VCS 6.0:

     1) Configure heartbeat links using LLT over Ethernet
     2) Configure heartbeat links using LLT over UDP
     3) Automatically detect configuration for LLT over Ethernet
     b) Back to previous menu

How would you like to configure heartbeat links? [1-3,b,q,?] (1)

    Discovering NICs on prod ............................................................................................ Discovered e1000g0 e1000g1 e1000g2 e1000g3

Enter the NIC for the first private heartbeat link on prod: [b,q,?] (e1000g1) e1000g2
Would you like to configure a second private heartbeat link? [y,n,q,b,?] (n) y
Enter the NIC for the second private heartbeat link on prod: [b,q,?] (e1000g1) e1000g3
Would you like to configure a third private heartbeat link? [y,n,q,b,?] (n)
Do you want to configure an additional low-priority heartbeat link? [y,n,q,b,?] (n)
Are you using the same NICs for private heartbeat links on all systems? [y,n,q,b,?] (y)
Can't use string ("VCS60") as a HASH ref while "strict refs" in use at ../scripts/CPIP/Prod/VCS60.pm line 4714, <STDIN> line 6.
bash-3.00#

The following are the lines from the script (error line highlighted). I did not find much difference between 5.1 SP1 and 6.0. Also attaching the same script FYR:

# ask for all heartbeat links
sub ask_hbnics {
    my($cfg,$padv,$prod,$sys,$sysi,$msg,$cprod);
    my($all,$ayn,%hbn,@en,$dsn,$hb,$hb2,$hb3,$hb4,$hbl,$ip,$port,$rpn,$rsn,$udp_port,$used_port);
    return '' if (Cfg::opt('responsefile'));
    $prod=shift;

    $cfg=Obj::cfg();
    $cprod=CPIC::get('prod');
    $used_port = [];
    for my $sys (@{CPIC::get('systems')}) {
        $sysi=$sys->{sys};
        $padv=$sys->padv;
        if ($all) {
            $hbn{lltlink1}{$sysi}=$en[1];
            $hbn{lltlink2}{$sysi}=$en[2] if ($en[2]);
            $hbn{lltlink3}{$sysi}=$en[3] if ($en[3]);
            $hbn{lltlink4}{$sysi}=$en[4] if ($en[4]);
            $hbn{lltlinklowpri1}{$sysi}=$en[$prod->{max_hipri_links}+1] if ($en[$prod->{max_hipri_links}+1]);
            $cfg->{$sysi}{bonded_nics}=$cfg->{${CPIC::get('systems')}[0]->{sys}}{bonded_nics};
        } else {
            undef(@en);
            $rsn=$rpn=[];
            if (EDRu::inarr($sys,@{CPIC::get('systems')})) {
                Msg::n();
                $msg=Msg::new("Discovering NICs on $sysi", 40, 2398, "$sysi");
                $msg->left;
                $padv=$sys->padv;
                $rsn=$padv->systemnics_sys($sys,1);
                $rpn=$padv->gatewaynics_sys($sys);
                EDRu::arruniq(@$rsn);
                $dsn=join(' ',@$rsn);
                if ($#$rsn<0) {
                    $msg=Msg::new("No NICs discovered", 40, 2399);
                    $msg->right;
                } else {
                    $msg=Msg::new("Discovered $dsn", 40, 2400, "$dsn");
                    $msg->right;
                    #$msg=Msg::new("\nTo use aggregated interfaces for private heartbeat, enter the name of an aggregated interface. \nTo use a NIC for private heartbeat, enter a NIC which is not part of an aggregated interface.\n");
                    #$msg->print;

Actually, I want to try a previous version such as 5.0. Does anybody know where I can download it?

Thanks & regards,

Anish

S_Herdejurgen · 02-17-2012

Anish asked me to respond to this. I would recommend manual configuration by installing the product with './installvcs -installonly' and then creating the 3 files you need to get LLT and GAB working manually. (It's easy :)

/etc/llthosts:
0 hosta
1 hostb

/etc/gabtab:
/sbin/gabconfig -c -n2

/etc/llttab:
set-cluster 10
set-node /etc/nodename
set-timer peertrouble:400
link nxge0 /dev/nxge:0 - ether - -
link nxge4 /dev/nxge:4 - ether - -
link-lowpri vnet0 /dev/vnet:0 - ether - -
start

You will need to change the network device names to match your configuration in the llttab file.
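
As a rough illustration of the three files, the sketch below writes them into a scratch directory so they can be reviewed before being copied to /etc by hand. The helper name write_llt_gab_files is hypothetical, the node names, IDs, cluster ID and NICs are the ones Anish reports later in this thread, and the llttab layout is the minimal set-node / set-cluster / link form he used rather than the fuller example above.

```shell
#!/bin/sh
# Sketch only: nothing under /etc is touched.
# write_llt_gab_files OUTDIR CLUSTER_ID LOCAL_NODE "ID:NAME ..." "NIC ..."
write_llt_gab_files() {
    outdir=$1; cluster_id=$2; local_node=$3; nodes=$4; nics=$5
    mkdir -p "$outdir" || return 1

    # llthosts: one "<node-id> <node-name>" line per cluster node.
    : > "$outdir/llthosts"
    count=0
    for pair in $nodes; do
        echo "${pair%%:*} ${pair#*:}" >> "$outdir/llthosts"
        count=$((count + 1))
    done

    # gabtab: seed the cluster once <count> nodes are present.
    echo "/sbin/gabconfig -c -n$count" > "$outdir/gabtab"

    # llttab: node name, cluster ID, then one "link" line per heartbeat NIC.
    {
        echo "set-node $local_node"
        echo "set-cluster $cluster_id"
        for nic in $nics; do
            # Split e.g. e1000g2 into driver "e1000g" and instance "2".
            inst=$(echo "$nic" | sed 's/.*[^0-9]\([0-9][0-9]*\)$/\1/')
            drv=${nic%"$inst"}
            echo "link $nic /dev/$drv:$inst - ether - -"
        done
    } > "$outdir/llttab"
}

# Example for the node "dev" in a two-node cluster.
write_llt_gab_files /tmp/vcs-files.$$ 101 dev "1:dev 2:prod" "e1000g2 e1000g3"
cat /tmp/vcs-files.$$/llttab
```

The same call with the local node name changed would be run for the second node; llthosts and gabtab come out identical on both.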

Kimberley · 02-17-2012

Hi,

I've escalated this to support to see if this is a known bug or if there is additional info they can provide for you. Will update when I hear back.

Best,

Kimberley

Anish · 02-17-2012

Thanks Seann. As suggested, I will try to configure VCS manually and post the result.

Thanks Kimberley, I would like to hear more on this problem.

mikebounds · 02-17-2012

If you are configuring manually, there are a few more things you need to do:

Create the /etc/VRTSvcs/conf/sysname file containing the hostname
Check /etc/sysconfig/vcs for the line that says ONENODE= and make sure it says no
You may need to copy the types.cf file from /etc/VRTSvcs/conf to /etc/VRTSvcs/conf/config
Create a main.cf file on one of the nodes in /etc/VRTSvcs/conf/config as follows:
include "types.cf"
cluster rainbow (
)
system prod ()
system dev ()
Start LLT - I think the path is /etc/rc2.d/S70llt start (both nodes)
Start GAB - I think the path is /etc/rc2.d/S92gab start
Start VCS on the node where you created main.cf - /opt/VRTSvcs/bin/hastart
Start VCS on the other node - /opt/VRTSvcs/bin/hastart
Run /opt/VRTSvcs/bin/hauser -add admin -priv Administrator

Mike

Anish · 02-20-2012

Hi All,

Thanks for your responses. As suggested, I tried to configure VCS manually.

The following steps were performed:

On the dev node:

bash-3.00# cat /etc/llttab
set-node prod
set-cluster 101
link e1000g2 /dev/e1000g:2 - ether - -
link e1000g3 /dev/e1000g:3 - ether - -

bash-3.00# cat /etc/llthosts
1 dev
2 prod

bash-3.00# cat /etc/VRTSvcs/conf/config/main.cf
include "types.cf"
cluster rainbow (
)
system dev (
)
system prod (
)

bash-3.00# cat /etc/gabtab
/sbin/gabconfig -c -n2

bash-3.00# cat /etc/VRTSvcs/conf/sysname
prod

On the prod node:

bash-3.00# cat /etc/llttab
set-node dev
set-cluster 101
link e1000g2 /dev/e1000g:2 - ether - -
link e1000g3 /dev/e1000g:3 - ether - -

bash-3.00# cat /etc/llthosts
1 dev
2 prod

bash-3.00# cat /etc/gabtab
/sbin/gabconfig -c -n2

bash-3.00# cat /etc/VRTSvcs/conf/config/main.cf
include "types.cf"

Copied the types.cf file from /etc/VRTSvcs/conf to /etc/VRTSvcs/conf/config.

bash-3.00# cat /etc/VRTSvcs/conf/sysname
dev

After doing this, I tried to start LLT and GAB on both nodes using the commands

lltconfig -c

and

sh /etc/gabtab

but this was not successful.

So I tried to start their SMF services (I think in VCS 6.0 they have removed /etc/rc2.d/S70llt and /etc/rc2.d/S92gab):

svcadm enable svc:/system/llt:default
svcadm enable svc:/system/gab:default

But the services still went into maintenance. After analyzing the logs I found the following error messages:

[ Feb 18 19:14:15 Executing start method ("/lib/svc/method/llt start") ]
This script is not allowed to start LLT. LLT_START is not 1

To fix this, I changed the values in the following file:

bash-3.00# cat /etc/default/llt
#
# This file is sourced :
# from /etc/init.d/llt for Solaris < 2.10
# from /lib/svc/method/llt for Solaris 2.10
#
# Set the two environment variables below as follows:
#
# 1 = start or stop llt
# 0 = do not start or stop llt
#
LLT_START=1    -----------> by default it was set to 0
LLT_STOP=1     -----------> by default it was set to 0

The same for GAB:

bash-3.00# cat /etc/default/gab
#
# This file is sourced :
# from /etc/init.d/gab for Solaris < 2.10
# from /lib/svc/method/gab for Solaris 2.10
#
# Set the two environment variables below as follows:
#
# 1 = start or stop gab
# 0 = do not start or stop gab
#
GAB_START=1    -----------> by default it was set to 0
GAB_STOP=1     -----------> by default it was set to 0

Then both services were up and running on both nodes:

bash-3.00# svcs -a|grep llt
online 9:07:25 svc:/system/llt:default

bash-3.00# svcs -a|grep gab
online 9:07:28 svc:/system/gab:default

Then I tried to bring the VCS services online, but again they went into maintenance due to the following error:

Feb 18 23:05:26 dev Had[510]: [ID 702911 daemon.notice] VCS ERROR V-16-1-10614 Cluster UUID is not configured or it is empty, on system dev - VCS Stopping. Manually Restart VCS after configuring Cluster UUID.

To configure it, I ran the following command on both nodes:

/opt/VRTSvcs/bin/uuidconfig.pl -clus -configure

I also changed the following values in the /etc/default/vcs file:

VCS_START=1
VCS_STOP=1
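
The same two-variable change was needed in the llt, gab and vcs defaults files, so it could be scripted along the following lines. This is a minimal sketch, assuming each file uses plain NAME=value lines as shown in this post; enable_start_stop is a hypothetical helper name, and the example works on a scratch copy rather than on /etc/default.

```shell
#!/bin/sh
# enable_start_stop FILE PREFIX   (hypothetical helper)
# Sets PREFIX_START and PREFIX_STOP to 1 in FILE and keeps the original
# as FILE.bak. Comment lines and other settings are left unchanged.
enable_start_stop() {
    file=$1; prefix=$2
    cp "$file" "$file.bak" || return 1
    sed -e "s/^${prefix}_START=.*/${prefix}_START=1/" \
        -e "s/^${prefix}_STOP=.*/${prefix}_STOP=1/" "$file.bak" > "$file"
}

# Demonstration on a scratch file that mimics /etc/default/llt.
scratch=/tmp/llt-default.$$
printf '# 1 = start or stop llt\nLLT_START=0\nLLT_STOP=0\n' > "$scratch"
enable_start_stop "$scratch" LLT
cat "$scratch"
rm -f "$scratch" "$scratch.bak"
```

On a real node the calls would presumably be enable_start_stop /etc/default/llt LLT, and likewise with GAB and VCS for the other two files. An SMF service that is already in maintenance generally also needs svcadm clear before it will start again.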

Now all my services are running, but I still have the following problem:

GAB is working properly on both nodes, but LLT is not communicating with the other node.

The output of gabconfig after starting the cluster using hastart on both nodes is as follows:

bash-3.00# hastart
bash-3.00# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen 12f3f02 membership ;1
Port h gen 12f3f09 membership ;1
bash-3.00# uname -n
dev

bash-3.00# hastart
bash-3.00# hastatus -sum

-- SYSTEM STATE
-- System State Frozen

A dev UNKNOWN 0
A prod RUNNING 0

bash-3.00# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen 12f3f02 membership ; 2
Port h gen 12f3f0b membership ; 2
bash-3.00# uname -n
prod

But the output of lltstat on the dev node is:

Port h gen 12f3f09 membership ;1
bash-3.00# uname -n
dev
bash-3.00# lltstat -nl
LLT node information:
Node State Links
* 1 dev OPEN 2
LLT link information:
link 0 e1000g2 on etherfp hipri
        mtu 1500, sap 0xcafe, broadcast FF:FF:FF:FF:FF:FF, addrlen 6
        txpkts 3514 txbytes 211939
        rxpkts 937 rxbytes 68662
        latehb 0 badcksum 0 errors 0
link 1 e1000g3 on etherfp hipri
        mtu 1500, sap 0xcafe, broadcast FF:FF:FF:FF:FF:FF, addrlen 6
        txpkts 347 txbytes 24504
        rxpkts 281 rxbytes 19328
        latehb 0 badcksum 0 errors 0

And on prod:

bash-3.00# lltstat -nl
LLT node information:
Node State Links
* 2 prod OPEN 2
LLT link information:
link 0 e1000g2 on etherfp hipri
        mtu 1500, sap 0xcafe, broadcast FF:FF:FF:FF:FF:FF, addrlen 6
        txpkts 3390 txbytes 180168
        rxpkts 713 rxbytes 52320
        latehb 0 badcksum 0 errors 0
link 1 e1000g3 on etherfp hipri
        mtu 1500, sap 0xcafe, broadcast FF:FF:FF:FF:FF:FF, addrlen 6
        txpkts 444 txbytes 31270
        rxpkts 257 rxbytes 15827
        latehb 0 badcksum 0 errors 0

whereas they should see each other.

Maybe because of this, I get the following hastatus -sum output on prod:

bash-3.00# hastatus -sum

-- SYSTEM STATE
-- System State Frozen

A dev UNKNOWN 0
A prod RUNNING 0

bash-3.00# uname -n
prod

And on the dev node the output is:

bash-3.00# hastatus -sum

-- SYSTEM STATE
-- System State Frozen

A dev RUNNING 0

One more observation: after starting the cluster, the main.cf file on the dev node was automatically modified to

bash-3.00# cat /etc/VRTSvcs/conf/config/main.cf
include "types.cf"
cluster vcs (
)
system dev (
)

whereas only the include "types.cf" line was present there before; we added the actual configuration on the prod node.

In the messages file I can see the following messages related to the LLT interfaces for the other node:

Feb 21 09:07:17 dev e1000g: [ID 801725 kern.info] NOTICE: pci8086,100f - e1000g[3] : link up, 1000 Mbps, full duplex

Feb 21 09:07:17 dev e1000g: [ID 801725 kern.info] NOTICE: pci8086,100f - e1000g[2] : link up, 1000 Mbps, full duplex

Feb 21 09:07:27 dev genunix: [ID 644314 kern.notice] GAB INFO V-15-1-20026 Port a[GAB_Control (refcount 2)] registration waiting for seed port membership

Feb 21 09:07:41 dev syslog[542]: [ID 702911 daemon.notice] VCS INFO V-16-1-11240 Command Server: running with security OFF

Feb 21 09:07:42 dev Had[497]: [ID 702911 daemon.notice] VCS NOTICE V-16-1-10619 'HAD' starting on: dev

Feb 21 09:07:42 dev Had[497]: [ID 702911 daemon.notice] VCS NOTICE V-16-1-10620 Waiting for local cluster configuration status

Feb 21 09:07:42 dev genunix: [ID 122464 kern.notice] LLT INFO V-14-1-10499 recvarpreq link 1 for node 2 addr change from 00:00:00:00:00:00 to 00:0C:29:E2:CE:CB

Feb 21 09:07:42 dev genunix: [ID 122464 kern.notice] LLT INFO V-14-1-10499 recvarpreq link 0 for node 2 addr change from 00:00:00:00:00:00 to 00:0C:29:E2:CE:D5

Feb 21 09:07:42 dev genunix: [ID 860062 kern.notice] LLT INFO V-14-1-10024 link 0 (e1000g2) node 2 active

Feb 21 09:07:44 dev genunix: [ID 860062 kern.notice] LLT INFO V-14-1-10024 link 1 (e1000g3) node 2 active

Feb 21 09:07:49 dev syslog[542]: [ID 702911 daemon.warning] WARNING V-365-1-1 This host is not entitled to run Veritas Storage Foundation/Veritas Cluster Server.

Feb 21 09:07:49 dev As set forth in the End User License Agreement (EULA) you must complete one of the two options set forth below. To comply with this condition of the EULA and stop logging of this message, you have 56 days to either:

Feb 21 09:07:49 dev - make this host managed by a Management Server (see http://go.symantec.com/sfhakeyless for details and free download), or

Feb 21 09:07:49 dev - add a valid license key matching the functionality in use on this host using the command 'vxlicinst' and validate using the command 'vxkeyless set NONE'.

Feb 21 09:07:49 dev genunix: [ID 272960 kern.notice] GAB INFO V-15-1-20036 Port a[GAB_Control (refcount 1)] gen 12f3f01 membership ;12

Feb 21 09:08:04 dev genunix: [ID 773945 kern.info] UltraDMA mode 2 selected

Feb 21 09:08:04 dev genunix: [ID 935449 kern.info] ATA DMA off: disabled. Control with "atapi-cd-dma-enabled" property

Feb 21 09:08:04 dev genunix: [ID 882269 kern.info] PIO mode 4 selected

Feb 21 09:08:04 dev genunix: [ID 935449 kern.info] ATA DMA off: disabled. Control with "atapi-cd-dma-enabled" property

Feb 21 09:08:04 dev genunix: [ID 882269 kern.info] PIO mode 4 selected

Feb 21 09:08:04 dev genunix: [ID 935449 kern.info] ATA DMA off: disabled. Control with "atapi-cd-dma-enabled" property

Feb 21 09:08:04 dev genunix: [ID 882269 kern.info] PIO mode 4 selected

Feb 21 09:08:04 dev genunix: [ID 935449 kern.info] ATA DMA off: disabled. Control with "atapi-cd-dma-enabled" property

Feb 21 09:08:04 dev genunix: [ID 882269 kern.info] PIO mode 4 selected

Feb 21 09:08:13 dev svc.startd[7]: [ID 122153 daemon.warning] svc:/application/stosreg:default: Method or service exit timed out. Killing contract 95.

Feb 21 09:08:13 dev svc.startd[7]: [ID 636263 daemon.warning] svc:/application/stosreg:default: Method "/lib/svc/method/svc-stosreg" failed due to signal KILL.

Feb 21 09:08:14 dev sendmail[584]: [ID 702911 mail.crit] My unqualified host name (dev) unknown; sleeping for retry

Feb 21 09:08:17 dev Had[497]: [ID 702911 daemon.notice] VCS NOTICE V-16-1-10625 Local cluster configuration valid

Feb 21 09:08:17 dev Had[497]: [ID 702911 daemon.notice] VCS NOTICE V-16-1-11034 Registering for cluster membership

Feb 21 09:08:17 dev Had[497]: [ID 702911 daemon.notice] VCS NOTICE V-16-1-11035 Waiting for cluster membership

Feb 21 09:08:22 dev genunix: [ID 272960 kern.notice] GAB INFO V-15-1-20036 Port h[GAB_USER_CLIENT (refcount 0)] gen 12f3f04 membership ;12

Feb 21 09:08:22 dev Had[497]: [ID 702911 daemon.notice] VCS INFO V-16-1-10077 Received new cluster membership

Feb 21 09:08:23 dev Had[497]: [ID 702911 daemon.notice] VCS NOTICE V-16-1-10086 System dev (Node '1') is in Regular Membership - Membership: 0x6

Feb 21 09:08:23 dev Had[497]: [ID 702911 daemon.notice] VCS NOTICE V-16-1-10086 System (Node '2') is in Regular Membership - Membership: 0x6

Feb 21 09:08:26 dev Had[497]: [ID 702911 daemon.notice] VCS NOTICE V-16-1-10073 Building from local configuration

Feb 21 09:08:26 dev genunix: [ID 577146 kern.notice] NOTICE: VXFEN INFO V-11-1-VxFEN unloaded

Feb 21 09:08:27 dev genunix: [ID 140958 kern.notice] LLT INFO V-14-1-10205 link 1 (e1000g3) node 2 in trouble

Feb 21 09:08:27 dev rootnex: [ID 349649 kern.info] xsvc0 at root

Feb 21 09:08:27 dev genunix: [ID 936769 kern.info] xsvc0 is /xsvc

Feb 21 09:08:31 dev pseudo: [ID 129642 kern.info] pseudo-device: devinfo0

Feb 21 09:08:31 dev genunix: [ID 936769 kern.info] devinfo0 is /pseudo/devinfo@0

Feb 21 09:08:31 dev unix: [ID 954099 kern.info] NOTICE: IRQ19 is being shared by drivers with different interrupt levels.

Feb 21 09:08:31 dev This may result in reduced system performance.

Feb 21 09:08:31 dev pci_pci: [ID 370704 kern.info] PCI-device: pci1274,1371@1, audioens0

Feb 21 09:08:31 dev genunix: [ID 936769 kern.info] audioens0 is /pci@0,0/pci15ad,790@11/pci1274,1371@1

Feb 21 09:08:33 dev genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 1 (e1000g3) node 2 inactive 8 sec (281)

Feb 21 09:08:34 dev genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 1 (e1000g3) node 2 inactive 9 sec (281)

Feb 21 09:08:35 dev genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 1 (e1000g3) node 2 inactive 10 sec (281)

Feb 21 09:08:36 dev genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 1 (e1000g3) node 2 inactive 11 sec (281)

Feb 21 09:08:37 dev genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 1 (e1000g3) node 2 inactive 12 sec (281)

Feb 21 09:08:38 dev genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 1 (e1000g3) node 2 inactive 13 sec (281)

Feb 21 09:08:39 dev genunix: [ID 592107 kern.notice] LLT INFO V-14-1-10510 sent hbreq (NULL) on link 1 (e1000g3) node 2. 4 more to go.

Feb 21 09:08:39 dev genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 1 (e1000g3) node 2 inactive 14 sec (281)

Feb 21 09:08:39 dev genunix: [ID 592107 kern.notice] LLT INFO V-14-1-10510 sent hbreq (NULL) on link 1 (e1000g3) node 2. 3 more to go.

Feb 21 09:08:40 dev genunix: [ID 592107 kern.notice] LLT INFO V-14-1-10510 sent hbreq (NULL) on link 1 (e1000g3) node 2. 2 more to go.

Feb 21 09:08:40 dev genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 1 (e1000g3) node 2 inactive 15 sec (281)

Feb 21 09:08:40 dev genunix: [ID 592107 kern.notice] LLT INFO V-14-1-10510 sent hbreq (NULL) on link 1 (e1000g3) node 2. 1 more to go.

Feb 21 09:08:41 dev genunix: [ID 592107 kern.notice] LLT INFO V-14-1-10510 sent hbreq (NULL) on link 1 (e1000g3) node 2. 0 more to go.

Feb 21 09:08:41 dev genunix: [ID 205468 kern.notice] LLT INFO V-14-1-10509 link 1 (e1000g3) node 2 expired

Feb 21 09:08:41 dev Had[497]: [ID 702911 daemon.notice] VCS NOTICE V-16-1-10066 Entering RUNNING state

Feb 21 09:08:47 dev genunix: [ID 140958 kern.notice] LLT INFO V-14-1-10205 link 0 (e1000g2) node 2 in trouble

Feb 21 09:08:49 dev Had[497]: [ID 702911 daemon.notice] VCS NOTICE V-16-1-50311 VCS Engine: running with security OFF

Feb 21 09:08:54 dev genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (e1000g2) node 2 inactive 8 sec (410)

Feb 21 09:08:55 dev genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (e1000g2) node 2 inactive 10 sec (411)

Feb 21 09:08:56 dev genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (e1000g2) node 2 inactive 11 sec (412)

Feb 21 09:08:51 dev Had[497]: [ID 702911 daemon.alert] VCS WARNING V-16-1-40184 HAD Self Check: Excessive delay in the HAD heartbeat to GAB

Feb 21 09:08:57 dev genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (e1000g2) node 2 inactive 12 sec (412)

Feb 21 09:08:58 dev genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (e1000g2) node 2 inactive 13 sec (413)

Feb 21 09:08:59 dev genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (e1000g2) node 2 inactive 14 sec (413)

The same messages are observed on the other node, whereas the output of dladm show-dev is as follows:

bash-3.00# uname -n
prod
bash-3.00# dladm show-dev
e1000g0 link: up speed: 1000 Mbps duplex: full
e1000g1 link: up speed: 1000 Mbps duplex: full
e1000g2 link: up speed: 1000 Mbps duplex: full    <---- link used for LLT
e1000g3 link: up speed: 1000 Mbps duplex: full    <---- link used for LLT

bash-3.00# uname -n
dev
bash-3.00# dladm show-dev
e1000g0 link: up speed: 1000 Mbps duplex: full
e1000g1 link: up speed: 1000 Mbps duplex: full
e1000g2 link: up speed: 1000 Mbps duplex: full    <---- link used for LLT
e1000g3 link: up speed: 1000 Mbps duplex: full    <---- link used for LLT

If anybody knows the solution to the above problem, please guide me. I think I am one step away from completing my cluster configuration. Thanks for your support.

Anish

mikebounds · 02-21-2012

The LLT config looks OK, so the problem is probably with the configuration of the networks in VMware. Could you plumb in IPs on the heartbeat links temporarily so that you can test them using ping?
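
One way to set up such a temporary ping test is sketched below. The helper only prints the Solaris commands for review and runs nothing itself; hb_ping_plan is a hypothetical name, and the 192.168.x.x addresses and /24 netmask are made-up test values that should not clash with any real subnet in the lab.

```shell
#!/bin/sh
# hb_ping_plan NIC LOCAL_IP PEER_IP   (hypothetical helper, dry run only)
# Prints the commands to plumb a temporary address on one heartbeat NIC,
# ping the peer node's temporary address with a 5 second timeout, and
# remove the address again.
hb_ping_plan() {
    nic=$1; local_ip=$2; peer_ip=$3
    echo "ifconfig $nic plumb"
    echo "ifconfig $nic $local_ip netmask 255.255.255.0 up"
    echo "ping $peer_ip 5"
    echo "ifconfig $nic unplumb"
}

# Plan for the node "dev": one test subnet per heartbeat link.
hb_ping_plan e1000g2 192.168.12.1 192.168.12.2
hb_ping_plan e1000g3 192.168.13.1 192.168.13.2
```

The peer node needs the matching address (the .2 side here) plumbed on the same NIC before the ping can succeed, so the plumb lines would be run on both nodes first. A link whose ping fails while the other link answers points at that virtual network rather than at LLT.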

Can you also provide the output of the two commands below:

lltstat -nvv
eeprom | grep mac

Mike

Anish · 03-01-2012

Hi Mike,

Sorry for the late reply. It is successful now.

Actually, I had been troubleshooting only the LLT and GAB errors.

As your post suggested, the problem was with the interfaces. I was using VMware Workstation 8.0, and there I found that when I add more than three host-only network adapters, the last interface, i.e. e1000g3, does not communicate with the other node, which is why LLT and GAB were failing. I tried many things and finally moved to VMware Server 2.0, where I did not find any such limitation, and my configuration is now successful. Here is the output from both nodes.

I think this is the expected output (please correct me if I missed something):

On the dev node:

bash-3.00# lltstat -nl
LLT node information:
    Node                 State    Links
   * 1 dev               OPEN        2
     2 prod              OPEN        2
LLT link information:
link 0 e1000g1 on etherfp hipri
        mtu 1500, sap 0xcafe, broadcast FF:FF:FF:FF:FF:FF, addrlen 6
        txpkts 15584 txbytes 1158103
        rxpkts 15977 rxbytes 1377027
        latehb 5 badcksum 0 errors 0
link 1 e1000g2 on etherfp hipri
        mtu 1500, sap 0xcafe, broadcast FF:FF:FF:FF:FF:FF, addrlen 6
        txpkts 15416 txbytes 1138179
        rxpkts 15892 rxbytes 1389545
        latehb 5 badcksum 0 errors 0

bash-3.00# uname -n
dev

bash-3.00# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen 15f9501 membership ;12
Port h gen 15f9509 membership ;12

bash-3.00# hastatus -sum

-- SYSTEM STATE
-- System State Frozen

A dev RUNNING 0
A prod RUNNING 0
bash-3.00#

on prod node:

bash-3.00# lltstat -nl
LLT node information:
    Node                 State    Links
     1 dev               OPEN        2
   * 2 prod              OPEN        2
LLT link information:
link 0 e1000g1 on etherfp hipri
        mtu 1500, sap 0xcafe, broadcast FF:FF:FF:FF:FF:FF, addrlen 6
        txpkts 15558 txbytes 1356519
        rxpkts 16607 rxbytes 1226122
        latehb 17 badcksum 0 errors 0
link 1 e1000g2 on etherfp hipri
        mtu 1500, sap 0xcafe, broadcast FF:FF:FF:FF:FF:FF, addrlen 6
        txpkts 15642 txbytes 1344401
        rxpkts 16768 rxbytes 1244054
        latehb 17 badcksum 0 errors 0

bash-3.00# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen 15f9501 membership ;12
Port h gen 15f9509 membership ;12
bash-3.00# uname -n
prod
bash-3.00# hastatus -sum

-- SYSTEM STATE
-- System State Frozen

A dev RUNNING 0
A prod RUNNING 0
bash-3.00#
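
The membership check done by eye above can also be scripted. The sketch below assumes, as the outputs in this thread suggest, that each digit after the word "membership" is a member node ID and that ';' and spaces are only placeholders for absent nodes; it is meant for small clusters with node IDs below 10. gab_members and check_gab are hypothetical helper names.

```shell
#!/bin/sh
# gab_members LINE
# Prints the node IDs found after "membership", separated by spaces.
gab_members() {
    echo "$1" | sed -n 's/.*membership//p' | tr -d '; ' \
        | sed 's/./& /g' | sed 's/ $//'
}

# check_gab "EXPECTED IDS"
# Reads `gabconfig -a` output on stdin and returns non-zero if any port
# line lists a different set of nodes than expected.
check_gab() {
    expected=$1; rc=0
    while read -r line; do
        case $line in
            Port*membership*)
                got=$(gab_members "$line")
                [ "$got" = "$expected" ] || { echo "mismatch: $line"; rc=1; }
                ;;
        esac
    done
    return $rc
}

# Output copied from the working cluster in this post.
check_gab "1 2" <<'EOF' && echo "all ports list nodes 1 2"
GAB Port Memberships
===============================================================
Port a gen 15f9501 membership ;12
Port h gen 15f9509 membership ;12
EOF
```

On a live node the input would come from gabconfig -a | check_gab "1 2". The earlier failing outputs (";1" on dev, "; 2" on prod) would be reported as mismatches.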

Thank you all for your support.

I know this is just the start and there is a long way ahead. I will keep you posted if I run into trouble again. Thanks once again.

Regards,

Anish


Error During VCS configuration