MultiNICB resource faulted and not clearing
Hi Team,
I am seeing a MultiNICB resource fault, as shown below:
D Ossfs Proxy ossfs_p1 et-coreg-admin2
D PubLan MultiNICB pub_mnic et-coreg-admin2
D Sybase1 Proxy syb1_p1 et-coreg-admin2
pub_mnic is faulted, and so in turn are the Proxy resources that mirror the status of the MultiNICB resource.
The errors below were logged on 3 June:
Jun 3 10:39:17 et-coreg-admin2 in.mpathd[6604]: [ID 168056 daemon.error] All Interfaces in group pub_mnic have failed
Jun 3 10:39:18 et-coreg-admin2 Had[6102]: [ID 702911 daemon.notice] VCS ERROR V-16-1-10303 Resource pub_mnic (Owner: Unspecified, Group: PubLan) is FAULTED (timed out) on sys et-coreg-admin2
As of now the interfaces and the network seem to be fine.
I want to clear the fault on this resource, but since it is a persistent resource it should have recovered on its own once the network issue was resolved.
# ifconfig -a
lo0: flags=1001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8131 index 1
inet 117.0.0.1 netmask ff000000
bnxe0: flags=19040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER,FAILED> mtu 1500 index 1
inet 10.106.111.66 netmask ffffff80 broadcast 10.106.111.117
groupname pub_mnic
ether 14:58:d0:54:18:18
bnxe0:1: flags=11000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,FAILED> mtu 1500 index 1
inet 10.106.111.70 netmask ffffff80 broadcast 10.106.111.117
bnxe1: flags=19040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER,FAILED> mtu 1500 index 3
inet 10.106.111.68 netmask ffffff80 broadcast 10.106.111.117
groupname pub_mnic
ether 14:58:d0:54:18:1c
# hares -display pub_mnic
#Resource Attribute System Value
pub_mnic Group global PubLan
pub_mnic Type global MultiNICB
pub_mnic AutoStart global 1
pub_mnic Critical global 1
pub_mnic Enabled global 1
pub_mnic LastOnline global admin1
pub_mnic MonitorOnly global 0
pub_mnic ResourceOwner global
pub_mnic TriggerEvent global 0
pub_mnic ArgListValues admin1 UseMpathd 1 1 MpathdCommand 1 /usr/lib/inet/in.mpathd ConfigCheck 1 1 MpathdRestart 1 1 Device 4 bnxe0 0 bnxe1 1 NetworkHosts 1 10.106.111.51 LinkTestRatio 1 1 IgnoreLinkStatus 1 1 NetworkTimeout 1 100 OnlineTestRepeatCount 1 3 OfflineTestRepeatCount 1 3 NoBroadcast 1 0 DefaultRouter 1 0.0.0.0 Failback 1 0 GroupName 1 "" Protocol 1 IPv4
pub_mnic ArgListValues admin1 UseMpathd 1 1 MpathdCommand 1 /usr/lib/inet/in.mpathd ConfigCheck 1 1 MpathdRestart 1 1 Device 4 bnxe0 0 bnxe1 1 NetworkHosts 1 10.106.111.51 LinkTestRatio 1 1 IgnoreLinkStatus 1 1 NetworkTimeout 1 100 OnlineTestRepeatCount 1 3 OfflineTestRepeatCount 1 3 NoBroadcast 1 0 DefaultRouter 1 0.0.0.0 Failback 1 0 GroupName 1 "" Protocol 1 IPv4
pub_mnic ConfidenceLevel admin1 0
pub_mnic ConfidenceLevel admin1 0
pub_mnic ConfidenceMsg admin1
pub_mnic ConfidenceMsg admin1
pub_mnic Flags admin1
pub_mnic Flags admin1
pub_mnic IState admin1 not waiting
pub_mnic IState admin1 not waiting
pub_mnic MonitorMethod admin1 Traditional
pub_mnic MonitorMethod admin1 Traditional
pub_mnic Probed admin1 1
pub_mnic Probed admin1 1
pub_mnic Start admin1 0
pub_mnic Start admin1 0
pub_mnic State admin1 ONLINE
pub_mnic State admin1 FAULTED
pub_mnic ComputeStats global 0
pub_mnic ConfigCheck global 1
pub_mnic DefaultRouter global 0.0.0.0
pub_mnic Failback global 0
pub_mnic GroupName global
pub_mnic IgnoreLinkStatus global 1
pub_mnic LinkTestRatio global 1
pub_mnic MpathdCommand global /usr/lib/inet/in.mpathd
pub_mnic MpathdRestart global 1
pub_mnic NetworkHosts global 10.106.111.51
pub_mnic NetworkTimeout global 100
pub_mnic NoBroadcast global 0
pub_mnic OfflineTestRepeatCount global 3
pub_mnic OnlineTestRepeatCount global 3
pub_mnic Protocol global IPv4
pub_mnic TriggerResStateChange global 0
pub_mnic UseMpathd global 1
pub_mnic ContainerInfo admin1 Type Name Enabled
pub_mnic ContainerInfo admin1 Type Name Enabled
pub_mnic Device admin1 bnxe0 0 bnxe1 1
pub_mnic Device admin1 bnxe0 0 bnxe1 1
pub_mnic MonitorTimeStats admin1 Avg 0 TS
pub_mnic MonitorTimeStats admin1 Avg 0 TS
pub_mnic ResourceInfo admin1 State Valid Msg TS
pub_mnic ResourceInfo admin1 State Stale Msg TS
Please help to resolve this as soon as possible.
I see the FAILED flag is still present on bnxe0 and bnxe1 in the ifconfig output. Clear the FAILED flag at the OS level (if_mpadm -d <interface>, then if_mpadm -r <interface>) and then try clearing the fault once again.
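To make the steps concrete, here is a rough sketch that reads ifconfig -a output, picks out the interfaces still flagged FAILED, and prints the commands to run for each one. It only prints; it does not execute anything. The function names failed_ifaces and print_recovery_plan are made up for illustration, and the resource and system names are the ones from this thread. Since pub_mnic is a persistent resource, the sketch ends with hares -probe, which asks the agent for a fresh monitor cycle. I am also assuming if_mpadm -d will accept the request; I am not certain it will while every interface in the group is marked FAILED, so treat the output as a checklist rather than a guaranteed fix.

```shell
#!/bin/sh
# Sketch only: prints a recovery plan, does not change anything.

# Read `ifconfig -a` output on stdin and print each physical interface
# that has the FAILED flag set. Logical interfaces such as bnxe0:1 are
# folded into their physical interface (bnxe0).
failed_ifaces() {
    awk '/flags=/ && /[<,]FAILED[,>]/ { split($1, part, ":"); print part[1] }' | sort -u
}

# Print the commands to run, in order. Arguments (both optional):
#   $1 = VCS resource name, $2 = system name (defaults taken from this thread).
print_recovery_plan() {
    res=${1:-pub_mnic}
    sys=${2:-et-coreg-admin2}
    for ifc in $(failed_ifaces); do
        echo "if_mpadm -d $ifc"   # offline the interface in its IPMP group
        echo "if_mpadm -r $ifc"   # reattach it
    done
    echo "hares -probe $res -sys $sys"   # request a fresh monitor cycle
    echo "hares -state $res"             # check the resulting state
}

# Usage on the affected node (prints the plan, runs nothing):
#   ifconfig -a | print_recovery_plan pub_mnic et-coreg-admin2
```

If the FAILED flag comes back after the reattach, in.mpathd is probably still not getting replies from its probe targets, and that needs fixing before the resource will report online. On Solaris you may need nawk instead of awk if the default awk rejects the pattern.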