01-20-2013 11:42 AM
My cluster was configured some time ago with two dedicated LLT links and one low-priority LLT link.
About a week ago I found that one of the dedicated links was down. It turned out that the ports on the switch had been disabled.
I asked for them to be enabled, and the OS on both nodes now sees the interfaces as running, but the cluster still shows one link as down.
Please find below the lltstat output and the llttab files from both nodes.
As you can see, each node sees all three of its own links but only two from the other node. I thought that restarting the cluster, and perhaps the node, would help, but I am not sure. I also do not want to restart the application if it is not necessary.
My question: will it help to stop the cluster with hastop -all -force and then restart or reconfigure the LLT links, or do I really have to stop everything, fix the LLT links, and then start the application? Only one node of the cluster is active; the other does not have any application services running.
node-ora01:root ~ # lltstat -n
LLT node information:
Node State Links
* 0 node-ora01 OPEN 3
1 node-ora02 OPEN 2
node-ora01:root ~ # cat /etc/llttab
set-node node-ora01
set-cluster 3415
link eth1 eth-9c:8e:99:fa:21:0a - ether - -
link eth3 eth-9c:8e:99:fa:21:0e - ether - -
link-lowpri bond0 bond0 - ether - -
node-ora02:root ~ # lltstat -n
LLT node information:
Node State Links
0 node-ora01 OPEN 2
* 1 node-ora02 OPEN 3
node-ora02:root ~ # cat /etc/llttab
set-node node-ora02
set-cluster 3415
link eth1 eth-9c:8e:99:f9:ec:bc - ether - -
link eth3 eth-9c:8e:99:f9:ec:c0 - ether - -
link-lowpri bond0 bond0 - ether - -
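The comparison made by eye above (3 links for the local node, 2 for the peer) can also be scripted. The following is only a minimal sketch: check_llt_links is a hypothetical helper, not part of VCS, and it assumes the `lltstat -n` column layout shown in this post (optional "*", node ID, node name, state, link count).

```shell
#!/bin/sh
# check_llt_links EXPECTED  (hypothetical helper, not a VCS command)
# Reads `lltstat -n` output on stdin and reports every node that has
# fewer links up than EXPECTED. Returns 1 if any node is short of links.
# Usage on a cluster node: lltstat -n | check_llt_links 3
check_llt_links() {
    expected="$1"
    awk -v want="$expected" '
        {
            # The local node is marked with "*", which shifts the fields by one.
            off = ($1 == "*") ? 1 : 0
            id = $(1 + off); name = $(2 + off); links = $(4 + off)
            # Skip header lines: only rows with a numeric ID and link count count.
            if (id ~ /^[0-9]+$/ && links ~ /^[0-9]+$/ && links + 0 < want + 0) {
                printf "%s: %d of %d links up\n", name, links, want
                bad = 1
            }
        }
        END { exit (bad ? 1 : 0) }
    '
}
```

Fed the node-ora01 output above with an expected count of 3, this reports node-ora02 as having 2 of 3 links up.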
01-20-2013 01:05 PM
Can you post the output of "lltstat -nvv" from each node? It sounds as though the connection is broken between one pair of interfaces. You can test this by plumbing IPs on the broken link to see whether you can ping across it.
Mike
01-20-2013 11:31 PM
Please find below lltstat -nvv output:
01-21-2013 12:25 AM
This output means the connection between eth3 is down. The output should be interpreted as follows:
01-21-2013 11:41 PM
Yes, I understand that.
The way I tried to restart it this morning did not help. Perhaps I did not follow the right sequence.
I'll try it on my test cluster first and then try again.
Leonid
01-22-2013 01:26 AM
Perhaps I wasn't clear enough in my last post: this is not an issue with the cluster or the node, it is an issue with the cables or the switch. You do not need to restart anything on the host for it to see a network connection that was previously broken. The only possible problem on the host side is the eth3 interfaces on the two machines running at different speeds, but if it was working previously and you have not changed anything, this is very unlikely. As I said earlier, to verify that the link is down, I would plumb IPs on the interfaces. For example:
First test that eth1 works with IPs (i.e. you are checking that there is no firewall that allows LLT but not ping):
plumb 1.1.1.1, mask 255.255.255.0 on eth1, node-ora01
plumb 1.1.1.2, mask 255.255.255.0 on eth1,node-ora02
Then test that you can ping 1.1.1.2 from node-ora01. If that doesn't work, you could try ssh, traceroute or "telnet 1.1.1.2 port" to see whether other ports work. You can test the connection the other way too.
Once you have verified this works - then test eth3:
plumb 1.1.3.1, mask 255.255.255.0 on eth3, node-ora01
plumb 1.1.3.2, mask 255.255.255.0 on eth3, node-ora02
Repeat the connection tests for 1.1.3.2 from node-ora01.
Mike
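For reference, the plumb-and-ping steps above could look roughly like this on Linux. This is a sketch only: it assumes iproute2 (`ip`) and iputils `ping`, test_link and run are hypothetical helpers, and the addresses are the ones from the post (any unused subnet would do). It prints the commands by default and only executes them when RUN=1 is set.

```shell
#!/bin/sh
# Dry-run by default: print each command. Set RUN=1 to execute as root.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# test_link IFACE LOCAL_IP PEER_IP  (hypothetical helper)
# The peer node must already have PEER_IP plumbed on its matching
# interface, e.g. on node-ora02: ip addr add 1.1.3.2/24 dev eth3
test_link() {
    iface="$1"; local_ip="$2"; peer_ip="$3"
    run ip addr add "$local_ip/24" dev "$iface"   # plumb a temporary address
    run ping -c 3 -I "$iface" "$peer_ip"          # ping the peer over this interface only
    run ip addr del "$local_ip/24" dev "$iface"   # remove the temporary address again
}
```

On node-ora01 this would be called as `test_link eth1 1.1.1.1 1.1.1.2` first, then `test_link eth3 1.1.3.1 1.1.3.2`. The temporary address is removed even if the ping fails, so remember to remove the one on the peer node as well.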
01-22-2013 01:32 AM
I see. No, I am not giving up. ;)
I am just taking a step back to see what I can do. I will do some testing with IP addresses and check the speed.
Then I will try it on my test cluster. I'll report in a couple of days what the solution turns out to be.
Leonid
01-30-2013 03:33 PM
Hi Leonid,
If you are only running VCS with GAB ports a and h, then yes, you can force stop VCS with the applications
running and then restart GAB / LLT for eth3 to come up. But as others mentioned, first make sure eth3
is connected between the nodes by pinging a temporary IP.
Regards
Srini
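A rough sketch of that sequence, for orientation only. It assumes a two-node cluster with only GAB ports a and h in use (no fencing or other GAB clients), and restart_llt_gab and run are hypothetical helpers. The order shown is one common way to do it, not a verified procedure for this cluster, so check it against the documentation for your VCS version. It prints the commands by default and only executes them when RUN=1 is set.

```shell
#!/bin/sh
# Dry-run by default: print each command. Set RUN=1 to execute as root.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# restart_llt_gab SEED_COUNT  (hypothetical helper)
# SEED_COUNT is the number of nodes GAB waits for before seeding.
restart_llt_gab() {
    seed="$1"
    run gabconfig -a              # check that only ports a and h have membership
    run hastop -all -force        # stop VCS on all nodes, applications keep running
    run gabconfig -U              # unconfigure GAB on this node
    run lltconfig -U              # unconfigure LLT on this node (may ask for confirmation)
    run lltconfig -c              # configure LLT again from /etc/llttab
    run gabconfig -c -n "$seed"   # configure GAB, seeding once SEED_COUNT nodes are present
    run hastart                   # start VCS again on this node
}
```

Note that `hastop -all -force` only needs to be issued once for the whole cluster, while the GAB and LLT steps and `hastart` have to be repeated on each node.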
01-31-2013 01:48 AM
I did get downtime for the cluster, but it turned out that something is wrong with the ports on the switch or with the cables.
So far only the node itself sees its NIC as up and running; the other node does not.
I also found out how to disable and re-enable an LLT link without restarting the whole stack or the cluster:
lltconfig -u eth3
lltconfig -t eth3 -d eth3
Thank you all.
Leonid
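For anyone finding this later, the two commands above can be wrapped together with a check of the result. A minimal sketch: relink and run are hypothetical helpers, and the tag and device are both eth3 as in my commands (in /etc/llttab the device is written in the eth-MAC form, so adjust if needed). As far as I know, changes made with lltconfig apply to the running configuration only, and /etc/llttab is still what gets read at the next LLT start. It prints the commands by default and only executes them when RUN=1 is set.

```shell
#!/bin/sh
# Dry-run by default: print each command. Set RUN=1 to execute as root.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# relink TAG DEVICE  (hypothetical helper)
relink() {
    tag="$1"; device="$2"
    run lltconfig -u "$tag"                # remove the link from the running LLT configuration
    run lltconfig -t "$tag" -d "$device"   # add it back with the same tag
    run lltstat -nvv                       # show the per-link state for each node
}
```

Called as `relink eth3 eth3` on the node where the link should be reset.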