
LLT: node1 in trouble

br1
Level 3

Hello All,

Recently these messages started appearing on the server.

Oct 14 08:38:15 db1 llt: [ID 140958 kern.notice] LLT INFO V-14-1-10205 link 0 (ce1) node 1 in trouble
Oct 14 08:38:15 db1 llt: [ID 860062 kern.notice] LLT INFO V-14-1-10024 link 0 (ce1) node 1 active
Oct 14 08:38:42 db1 llt: [ID 140958 kern.notice] LLT INFO V-14-1-10205 link 0 (ce1) node 1 in trouble
Oct 14 08:38:43 db1 llt: [ID 860062 kern.notice] LLT INFO V-14-1-10024 link 0 (ce1) node 1 active
Oct 14 08:38:45 db1 llt: [ID 140958 kern.notice] LLT INFO V-14-1-10205 link 0 (ce1) node 1 in trouble
Oct 14 08:38:45 db1 llt: [ID 860062 kern.notice] LLT INFO V-14-1-10024 link 0 (ce1) node 1 active
Oct 14 08:38:55 db1 llt: [ID 140958 kern.notice] LLT INFO V-14-1-10205 link 0 (ce1) node 1 in trouble
Oct 14 08:38:56 db1 llt: [ID 860062 kern.notice] LLT INFO V-14-1-10024 link 0 (ce1) node 1 active
Oct 14 08:39:01 db1 llt: [ID 140958 kern.notice] LLT INFO V-14-1-10205 link 0 (ce1) node 1 in trouble
Oct 14 08:39:05 db1 llt: [ID 860062 kern.notice] LLT INFO V-14-1-10024 link 0 (ce1) node 1 active
Oct 14 08:39:05 db1 llt: [ID 794702 kern.notice] LLT INFO V-14-1-10019 delayed hb 600 ticks from 1 link 0 (ce1)
Oct 14 08:39:05 db1 llt: [ID 602713 kern.notice] LLT INFO V-14-1-10023 lost 11 hb seq 19344285 from 1 link 0 (ce1)

The messages date back to Sept 20 and continue through today; the ones above are from Oct 14.

bash-2.05$ lltstat -nvv|head
LLT node information:

    Node                 State    Link  Status  Address
   * 0 db1               OPEN
                                  ce1   UP      00:03:BA:93:
                                  ce6   UP      00:03:BA:85:
     1 db2               OPEN
                                  ce1   UP      00:03:BA:93:
                                  ce6   UP      00:03:BA:95:
     2                   CONNWAIT
                                  ce1   DOWN

Any advice is greatly appreciated, thank you.

4 REPLIES

g_lee
Level 6

Are either of these low-pri links?

Starting with 5.0MP3, Symantec added one-way link detection to LLT (Etrack Incident# 1031514).

See the LBN (Late Breaking News) for 5.0 (Solaris), which includes the following recommendation:

"If the following messages are seen frequently for an LLT lo-pri link, then change the LLT tunable named peertrouble to 400. Its default value is 200."

http://www.symantec.com/business/support/index?page=content&id=TECH46439
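
For reference, this is roughly how that tunable is usually changed (not taken from the TechNote above, so verify the exact syntax there before using it) - either on the fly with lltconfig, or persistently in /etc/llttab:

# lltconfig -T query                 <- show current LLT timer values on a running node
# lltconfig -T peertrouble:400       <- raise the trouble timer without a restart

set-timer peertrouble:400            <- equivalent /etc/llttab entry, applied the next time LLT starts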

If this isn't the case, please provide the following:

- OS/platform
- VCS version
- Output of # cat /etc/llttab
- any other network related error messages seen on the cluster node(s)

 

Daniel_Matheus
Level 4
Employee Accredited Certified

Do you use bonded NICs?

How are the LLT links connected, crossover or through switches/vlans?

Do both links have separate network connections?

If both use the same network, this will also confuse LLT.

Are there any public LAN IP addresses configured on the LLT links?

Please also make sure that each NIC has a unique MAC address; in your output the MAC addresses of both ce1 NICs look the same, but they are truncated, so I can't tell for sure.
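
If it helps, on Solaris SPARC a quick sanity check looks roughly like this (from memory, so verify on your systems):

# eeprom "local-mac-address?"        <- should be true so each ce port uses its own factory MAC
# lltstat -nvv | more                <- shows the full, untruncated link addresses as LLT sees them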

 

regards,
Dan

 

br1
Level 3

Hi,

The OS is Solaris 9 SPARC
VCS version 4.0

bash-2.05$ cat /etc/llttab
set-node db1
set-cluster 10
link ce1 /dev/ce:1 - ether - -
link ce6 /dev/ce:6 - ether - -

This is a two-node cluster.
This was seen on the secondary node:
Oct 25 09:46:49 db1 llt: [ID 140958 kern.notice] LLT INFO V-14-1-10205 link 0 (ce1) node 0 in trouble
Oct 25 09:46:55 db1 llt: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (ce1) node 0 inactive 8 sec (22635904)
Oct 25 09:46:56 db1 llt: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (ce1) node 0 inactive 9 sec (22635904)
Oct 25 09:46:57 db1 llt: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (ce1) node 0 inactive 10 sec (22635904)
Oct 25 09:46:58 db1 llt: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (ce1) node 0 inactive 11 sec (22635904)
Oct 25 09:46:59 db1 llt: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (ce1) node 0 inactive 12 sec (22635904)
Oct 25 09:47:00 db1 llt: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (ce1) node 0 inactive 13 sec (22635904)
Oct 25 09:47:01 db1 llt: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (ce1) node 0 inactive 14 sec (22635904)
Oct 25 09:47:02 db1 llt: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (ce1) node 0 inactive 15 sec (22635904)
Oct 25 09:47:03 db1 llt: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (ce1) node 0 inactive 16 sec (22635904)
Oct 25 09:47:03 db1 llt: [ID 911753 kern.notice] LLT INFO V-14-1-10033 link 0 (ce1) node 0 expired
Oct 25 09:53:09 db1 llt: [ID 860062 kern.notice] LLT INFO V-14-1-10024 link 0 (ce1) node 0 active
Oct 25 09:53:14 db1 gab: [ID 316943 kern.notice] GAB INFO V-15-1-20036 Port h gen   1c7510 membership 01
Oct 25 09:53:14 db1 gab: [ID 316943 kern.notice] GAB INFO V-15-1-20036 Port a gen   1c750d membership 01
Oct 25 09:53:14 db1 Had[712]: [ID 702911 daemon.notice] VCS INFO V-16-1-10077 Received new cluster membership
Oct 25 09:53:14 db1 Had[712]: [ID 702911 daemon.notice] VCS NOTICE V-16-1-10086 System db1 (Node '1') is in Regular Membership - Membership: 0x3
 
 
On the Primary Node:
Oct 25 09:46:18 db1 llt: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (ce1) node 1 inactive 8 sec (5972550)
Oct 25 09:46:19 db1 llt: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (ce1) node 1 inactive 9 sec (5972550)
Oct 25 09:46:20 db1 llt: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (ce1) node 1 inactive 10 sec (5972550)
Oct 25 09:46:21 db1 llt: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (ce1) node 1 inactive 11 sec (5972550)
Oct 25 09:46:22 db1 llt: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (ce1) node 1 inactive 12 sec (5972550)
Oct 25 09:46:23 db1 llt: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (ce1) node 1 inactive 13 sec (5972550)
Oct 25 09:46:24 db1 llt: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (ce1) node 1 inactive 14 sec (5972550)
Oct 25 09:46:25 db1 llt: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (ce1) node 1 inactive 15 sec (5972550)
Oct 25 09:46:26 db1 llt: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (ce1) node 1 inactive 16 sec (5972550)
Oct 25 09:46:26 db1 llt: [ID 911753 kern.notice] LLT INFO V-14-1-10033 link 0 (ce1) node 1 expired
Oct 25 09:46:31 db1 gab: [ID 316943 kern.notice] GAB INFO V-15-1-20036 Port a gen   1c750d membership 01
Oct 25 09:46:31 db1 gab: [ID 608499 kern.notice] GAB INFO V-15-1-20037 Port a gen   1c750d   jeopardy ;1
Oct 25 09:46:31 db1 gab: [ID 316943 kern.notice] GAB INFO V-15-1-20036 Port h gen   1c7510 membership 01
Oct 25 09:46:31 db1 gab: [ID 608499 kern.notice] GAB INFO V-15-1-20037 Port h gen   1c7510   jeopardy ;1
Oct 25 09:46:31 db1 Had[1982]: [ID 702911 daemon.notice] VCS INFO V-16-1-10077 Received new cluster membership
Oct 25 09:46:31 db1 Had[1982]: [ID 702911 daemon.notice] VCS ERROR V-16-1-10087 System db2 (Node '1') is in Jeopardy Membership - Membership: 0x3, Jeopardy: 0x2

 

jcyemm
Level 2
Accredited Certified

As asked before, how are the links connected? Crossover cables, an intermediate hub (gotta ask), an intermediate switch, or something else?

Is there anything else on the same fabric as ce1 that you can use to test each ce1 interface?

The messages say that node 1 link 0 has the problem. Note that, in the case of a two-node cluster, node 0 will *never* be reported as the problem, even if it really is.

The first thing you can do to get out of the jeopardy state would be to define a link-lowpri on another interface - use UDP if needed.
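
Something like the following /etc/llttab entry on both nodes - ce0 here is only a placeholder for whatever spare or public interface you actually have, so adjust the device and instance to your hardware:

link ce1 /dev/ce:1 - ether - -
link ce6 /dev/ce:6 - ether - -
link-lowpri ce0 /dev/ce:0 - ether - -     <- low-priority heartbeat over a third interface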

Now, for troubleshooting ...

First, I would check the messages file to see if you are losing physical link.

Second, test network connectivity between the two ce1 links (put IP addresses on them and do a ping test - rough sketch after these steps).

Third, if you've got another interface on the same ce1 fabric, try connecting to that interface ...

Fourth, now that you've got the link-lowpri in place, try new crossover cable between ce1 links.
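
A rough sketch of the first two checks - the 192.168.250.x addresses are just examples, so pick any subnet that is not in use on your network, and unplumb afterwards so the interface is left alone for LLT:

# grep ce1 /var/adm/messages | grep -i link      <- any link up/down notices from the driver?

on db1:
# ifconfig ce1 plumb
# ifconfig ce1 192.168.250.1 netmask 255.255.255.0 up

on db2:
# ifconfig ce1 plumb
# ifconfig ce1 192.168.250.2 netmask 255.255.255.0 up

then from db1:
# ping 192.168.250.2
# ifconfig ce1 unplumb                           <- repeat the unplumb on db2 when you're done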

Good luck, these are rarely easy. I have solved them more than once by buying a cheap switch at a big box store and hooking the cables up to that ... especially when the Network Group repeatedly claims "it's not us" ... until suddenly and inarguably it is ;)