Forum Discussion

Zahid_Haseeb
Moderator
12 years ago

Aggregation resource of NIC faulted abnormally

Environment
 
OS = Solaris 10
HA/VCS = 6.0
Single node in cluster
 
I have two NICs aggregated. Each NIC is connected to a separate switch. When I remove one NIC's cable, everything is fine. But when I unplugged both cables, a fault occurred. I then plugged in a single NIC cable, and the resource cleared the fault by itself. But within 2 minutes it faulted again and then cleared the fault again. I am not able to understand this behaviour.
 
2012/11/14 22:41:04 VCS INFO V-16-1-10299 Resource AGGR (Owner: Unspecified, Group: PHX-APP) is online on Node-B (Not initiated by VCS)
2012/11/14 22:42:24 VCS WARNING V-16-10001-7505 (NODE-B) NIC:AGGR:monitor:(2870) is less than or equal to (2870): Resource is offline
2012/11/14 22:42:24 VCS ERROR V-16-1-54031 Resource AGGR (Owner: Unspecified, Group: PHX-APP) is FAULTED on sys NODE-B
2012/11/14 22:42:24 VCS INFO V-16-6-15015 (NODE-B) hatrigger:/opt/VRTSvcs/bin/triggers/resfault is not a trigger scripts directory or can not be executed
2012/11/14 22:43:44 VCS WARNING V-16-10001-7505 (NODE-B) NIC:AGGR:monitor:(2870) is less than or equal to (2870): Resource is offline
2012/11/14 22:44:44 VCS WARNING V-16-10001-7505 (NODE-B) NIC:AGGR:monitor:(2870) is less than or equal to (2870): Resource is offline

2012/11/14 22:45:44 VCS WARNING V-16-10001-7505 (NODE-B) NIC:AGGR:monitor:(2870) is less than or equal to (2870): Resource is offline
2012/11/14 22:46:43 VCS WARNING V-16-10001-7505 (NODE-B) NIC:AGGR:monitor:(2870) is less than or equal to (2870): Resource is offline
2012/11/14 22:47:44 VCS WARNING V-16-10001-7505 (NODE-B) NIC:AGGR:monitor:(2870) is less than or equal to (2870): Resource is offline
2012/11/14 22:48:44 VCS WARNING V-16-10001-7505 (NODE-B) NIC:AGGR:monitor:(2870) is less than or equal to (2870): Resource is offline
2012/11/14 22:49:44 VCS WARNING V-16-10001-7505 (NODE-B) NIC:AGGR:monitor:(2870) is less than or equal to (2870): Resource is offline
2012/11/14 22:50:44 VCS WARNING V-16-10001-7505 (NODE-B) NIC:AGGR:monitor:(2870) is less than or equal to (2870): Resource is offline
2012/11/14 22:51:44 VCS WARNING V-16-10001-7505 (NODE-B) NIC:AGGR:monitor:(2870) is less than or equal to (2870): Resource is offline
2012/11/14 22:52:44 VCS WARNING V-16-10001-7505 (NODE-B) NIC:AGGR:monitor:(2870) is less than or equal to (2870): Resource is offline
2012/11/14 22:53:44 VCS WARNING V-16-10001-7505 (NODE-B) NIC:AGGR:monitor:(2870) is less than or equal to (2870): Resource is offline
2012/11/14 22:54:38 VCS INFO V-16-1-10299 Resource AGGR (Owner: Unspecified, Group: PHX-APP) is online on NODE-B (Not initiated by VCS)
 

 

  • V-16-10001-7505 indicates that the number of packets received does not grow after a ping to the broadcast address.
    Check whether the packet statistics grow by running "netstat -in -I device_name -f protocol" (protocol is inet or inet6). Also check the packets sent from and received by this NIC using snoop or a similar tool.
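
    The check described above can be sketched as a small shell helper. This is only an illustration, not part of the NIC agent: the function names ipkts_for and ipkts_grew and the device name aggr1 are made up for the example, and it assumes the Solaris 10 netstat layout in which Ipkts is the fifth column. On Solaris 10, nawk or /usr/xpg4/bin/awk is probably needed, because the old /usr/bin/awk does not accept -v.

    ```shell
    #!/bin/sh
    # Read "netstat -in -I <device> -f inet" output on stdin and print the
    # Ipkts value for the named device. Field 5 is assumed to be Ipkts;
    # adjust the field number if your netstat output is laid out differently.
    ipkts_for() {
        awk -v dev="$1" '$1 == dev { print $5; exit }'
    }

    # Return success (exit status 0) only when the second sample is larger
    # than the first, i.e. the received-packet counter actually grew.
    ipkts_grew() {
        [ "$2" -gt "$1" ]
    }

    # Example use on the cluster node (aggr1 is a placeholder device name):
    #   before=$(netstat -in -I aggr1 -f inet | ipkts_for aggr1)
    #   sleep 10
    #   after=$(netstat -in -I aggr1 -f inet | ipkts_for aggr1)
    #   ipkts_grew "$before" "$after" || echo "Ipkts did not grow"
    ```

    The log lines in the original post show the same value, (2870), on both sides of the comparison, which is the "did not grow" case that this helper reports.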

  • Thanks Yasuhisa for your kind words. I added the default gateway/next hop IP address to the NetworkHosts attribute of the NIC resource, and things seem to work 100% fine. I am watching the behaviour of the NIC now.
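
    For readers who want to try the same change from the command line, here is a rough sketch using the standard haconf and hares commands. The wrapper functions run and set_network_hosts, the DRY_RUN switch, and the address 192.168.1.1 are assumptions made for this example. AGGR is the resource name from the log above. Verify the exact attribute syntax against the VCS documentation for your release before running it.

    ```shell
    #!/bin/sh
    # Print the command instead of executing it when DRY_RUN=1, so the
    # sequence can be reviewed on a machine without VCS installed.
    run() {
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "$@"
        else
            "$@"
        fi
    }

    # Open the cluster configuration, set NetworkHosts on the given NIC
    # resource to the listed hosts, then save and close the configuration.
    # Usage: set_network_hosts <resource> <host> [<host> ...]
    set_network_hosts() {
        res=$1
        shift
        run haconf -makerw &&
        run hares -modify "$res" NetworkHosts "$@" &&
        run haconf -dump -makero
    }

    # Example (placeholder gateway address):
    #   DRY_RUN=1 set_network_hosts AGGR 192.168.1.1
    ```

    With NetworkHosts set, the agent pings the listed hosts rather than relying on the broadcast ping described in the earlier reply.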

  • Question #1

    Suppose I am not using the NetworkHosts attribute. Is applying the Solaris 10 kernel patch (118822-30, with bge driver patch 122027-04, as per the TN below) the solution?

    http://www.symantec.com/business/support/index?page=content&pmv=print&impressions=&viewlocale=&id=TECH48969

    Question #2

    The above TN talks about bgex NICs. Will it also apply to NICs named "cex"?

    Question #3

    For some reason the aggregated NIC on my passive node has the IP address 0.0.0.0 (the IP is controlled by the Application resource). Because of this, the passive node has no connectivity to the LAN. So, will this lack of network connectivity not fault the aggregated NIC resource (if I install the Solaris kernel patch and clear the NetworkHosts attribute)?

    =================

    Before applying patch as per TN

    #uname -a
    SunOS labdxb01 5.10 Generic_118822-29 sun4u sparc SUNW,A70

    After applying patch as per TN

    bash-3.00# uname -a
    SunOS labdxb01 5.10 Generic_118822-30 sun4u sparc SUNW,A70

    My Solaris 10 patch level

    #uname -a

    SunOS NODE_NAME 5.10 Generic_147440-12 sun4u sparc SUNW,Sun-Fire-V240 Solaris

  • Digging through the Solaris patch READMEs, I found that this bge driver update is accumulated into 118833-36. You must have applied 118833-36, because it is required by 147440-12, your current kernel. Also, this bug applies only to bge. There is no need to consider this TN for your case.

    "For some reason the aggregated NIC on my passive node has the IP address 0.0.0.0 (the IP is controlled by the Application resource). Because of this, the passive node has no connectivity to the LAN."

    I'm out of the office today and have no access to lab machines, so I cannot check, but I believe this is the case. Without an IP address, the agent and the OS cannot recognize or handle an ICMP target, so the NIC agent cannot work correctly. You have to configure at least one physical IP address on it. Or, consider writing your own agent that cooperates with your application resource and monitors the NIC without the NIC agent.