Forum Discussion

symsonu
12 years ago

Need help with NIC resource fault

We are experiencing the faults below every now and then, and they affect our backup.

The setup: we have a BkupLan service group containing a NIC resource (bkup_nic) and a Phantom resource.

Two service groups, OSSfs and sybase1, have Proxy resources that monitor this NIC resource, and each is configured with an IP resource (Ossbak_ip1 and sybbak_ip1 respectively).

The issue is that the NIC resource fails with the message below, and in turn the Proxy and IP resources fail. Then the NIC comes back, and the Proxy and IP resources come back too.
The configuration is below as well.
Please advise why it goes offline and online so frequently.

2013/10/24 22:32:03 VCS WARNING V-16-10001-7506 (ossadm2) NIC:bkup_nic:monitor:Resource is offline. No Network Host could be reached

2013/10/24 22:32:03 VCS INFO V-16-2-13716 (ossadm2) Resource(bkup_nic): Output of the completed operation (monitor)
==============================================
Broken Pipe
==============================================

2013/10/24 22:32:03 VCS ERROR V-16-1-10303 Resource bkup_nic (Owner: Unspecified, Group: BkupLan) is FAULTED (timed out) on sys ossadm2
2013/10/24 22:32:03 VCS INFO V-16-6-0 (ossadm2) resfault:(resfault) Invoked with arg0=ossadm2, arg1=bkup_nic, arg2=ONLINE
2013/10/24 22:32:03 VCS INFO V-16-0 (ossadm2) resfault:(resfault.sh) Invoked with arg0=/ericsson/core/cluster/scripts/resfault.sh, arg1=ossadm2 ,arg2=bkup_nic
2013/10/24 22:32:03 VCS INFO V-16-6-15002 (ossadm2) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/resfault ossadm2 bkup_nic ONLINE  successfully
2013/10/24 22:32:20 VCS ERROR V-16-1-10303 Resource ossbak_p1 (Owner: Unspecified, Group: Ossfs) is FAULTED (timed out) on sys ossadm2

I changed the ToleranceLimit of the NIC and Proxy agents to 2:

root@ossadm1> hatype -display NIC

NIC          ToleranceLimit         2

root@ossadm1> hatype -display Proxy

Proxy        ToleranceLimit         2
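
For reference, a type-level change like this is normally made with hatype -modify while the cluster configuration is writable. The sequence below is a sketch of that procedure, not the exact commands used in this thread; note that it changes the value for every NIC and Proxy resource in the cluster.

```shell
#!/bin/sh
# Sketch: set ToleranceLimit at the resource-type level for NIC and Proxy.
# Run as root on one cluster node; type attributes apply cluster-wide.
haconf -makerw                          # open the configuration for writing
hatype -modify NIC ToleranceLimit 2
hatype -modify Proxy ToleranceLimit 2
haconf -dump -makero                    # write main.cf and make it read-only again

# Confirm the new values (same output format as shown above)
hatype -display NIC | grep ToleranceLimit
hatype -display Proxy | grep ToleranceLimit
```

To change a single resource rather than the whole type, the static attribute would first be overridden with hares -override (for example on bkup_nic) and then set with hares -modify.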

But it still failed, as shown below:

:57:28 ossadm1 Had[18697]: [ID 702911 daemon.notice] VCS ERROR V-16-1-10303 Resource bkup_nic (Owner: Unspecified, Group: BkupLan) is FAULTED (timed out) on sys ossadm2
Nov  6 00:58:01 ossadm1 Had[18697]: [ID 702911 daemon.notice] VCS ERROR V-16-1-10303 Resource syb1bak_p1 (Owner: Unspecified, Group: Sybase1) is FAULTED (timed out) on sys ossadm2
Nov  6 00:58:01 ossadm1 Had[18697]: [ID 702911 daemon.notice] VCS ERROR V-16-1-10303 Resource ossbak_p1 (Owner: Unspecified, Group: Ossfs) is FAULTED (timed out) on sys ossadm2

It did not log anything like "syb1bak_p1 failed, but ToleranceLimit 2 has not been reached", which would have kept the resource up for another monitor interval.

Do we need to set anything else so that the Proxy resources do not fault immediately but wait for some time?

  • Your resource looks as though it is timing out, i.e. the monitor is NOT returning "This resource is down"; the monitor times out, so the state is unknown. In this case the attribute FaultOnMonitorTimeouts applies (ToleranceLimit only counts monitor results that report the resource offline). It is set to 4 by default, so the monitor should have to time out 4 times in a row before the resource faults, but you can increase FaultOnMonitorTimeouts if you want.

    Mike

  • You can also check whether the hosts listed in the NetworkHosts attribute are reachable from your cluster nodes.

    The message "Resource is offline. No Network Host could be reached" appears when the hosts in NetworkHosts cannot be pinged or when no traffic is observed through the interface during the ping.
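
The distinction Mike draws between ToleranceLimit and FaultOnMonitorTimeouts can be illustrated with a small shell model. This is a simplified sketch based on the description in the replies, not the VCS agent framework's real logic: it assumes ToleranceLimit counts consecutive OFFLINE results, FaultOnMonitorTimeouts counts consecutive monitor timeouts (0 disabling it), and an ONLINE result resets both counters. The function name fault_cycle is hypothetical.

```shell
#!/bin/sh
# Simplified model (an assumption, not VCS agent code) of when a resource
# is declared FAULTED, based on two resource type attributes:
#   ToleranceLimit         - OFFLINE monitor results tolerated before faulting
#   FaultOnMonitorTimeouts - monitor timeouts in a row that cause a fault
#                            (0 means timeouts never cause a fault)
# Usage: fault_cycle TOLERANCE_LIMIT FAULT_ON_MONITOR_TIMEOUTS RESULT...
#   where each RESULT is one of: online | offline | timeout
# Prints the 1-based monitor cycle at which the resource faults, or "none".
fault_cycle() {
    tolerance_limit=$1
    fault_on_timeouts=$2
    shift 2
    offline_count=0
    timeout_count=0
    cycle=0
    for result in "$@"; do
        cycle=$((cycle + 1))
        case $result in
            online)
                # A healthy monitor result resets both counters in this model.
                offline_count=0
                timeout_count=0
                ;;
            offline)
                # The monitor positively reported "down": ToleranceLimit applies.
                offline_count=$((offline_count + 1))
                timeout_count=0
                if [ "$offline_count" -gt "$tolerance_limit" ]; then
                    echo "$cycle"
                    return 0
                fi
                ;;
            timeout)
                # State unknown: only FaultOnMonitorTimeouts applies.
                timeout_count=$((timeout_count + 1))
                if [ "$fault_on_timeouts" -gt 0 ] &&
                   [ "$timeout_count" -ge "$fault_on_timeouts" ]; then
                    echo "$cycle"
                    return 0
                fi
                ;;
            *)
                echo "fault_cycle: unknown result '$result'" >&2
                return 1
                ;;
        esac
    done
    echo none
}

fault_cycle 2 4 offline offline offline                  # prints 3
fault_cycle 2 4 timeout timeout timeout timeout          # prints 4
fault_cycle 2 0 timeout timeout timeout timeout timeout  # prints none
```

In this model ToleranceLimit never enters the timeout path, which is consistent with the ToleranceLimit change having no visible effect on a resource that is timing out.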
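
To act on the NetworkHosts suggestion, the hosts can be pinged by hand from each node. The script below is a minimal sketch: the function names ping_once and check_network_hosts are hypothetical, the ping options assume either Solaris or Linux iputils, and it is not the check the NIC agent itself performs.

```shell
#!/bin/sh
# Ping a host once with a short timeout; exit status 0 means it answered.
# ping syntax differs by platform; SunOS (Solaris) and Linux iputils are assumed.
ping_once() {
    case $(uname -s) in
        SunOS) ping "$1" 2 >/dev/null 2>&1 ;;          # Solaris: ping host timeout_seconds
        *)     ping -c 1 -W 2 "$1" >/dev/null 2>&1 ;;  # Linux iputils: 1 packet, 2 s wait
    esac
}

# check_network_hosts HOST...
# Prints one line per host; returns 0 if at least one host answered,
# 1 if none did, 2 if no hosts were given.
check_network_hosts() {
    if [ "$#" -eq 0 ]; then
        echo "no hosts given" >&2
        return 2
    fi
    reachable=0
    for host in "$@"; do
        if ping_once "$host"; then
            echo "$host: reachable"
            reachable=1
        else
            echo "$host: unreachable"
        fi
    done
    [ "$reachable" -eq 1 ]
}
```

On a cluster node the configured list could be passed in with something like check_network_hosts $(hares -value bkup_nic NetworkHosts). This only covers the reachability half of the message; it does not check whether traffic is seen on the interface.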