
Need help with NIC resource fault

symsonu
Level 6

 

We are intermittently seeing the faults shown below, and they affect our backups.

Our setup: the BkupLan service group contains a NIC resource (bkup_nic) and a Phantom resource.

Two service groups, OSSfs and sybase1, each have a Proxy resource that monitors this NIC resource,
and both are configured with IP resources, Ossbak_ip1 and sybbak_ip1.

The issue is that the NIC resource fails with the message below, and in turn the Proxy and IP resources fail.
Then the NIC comes back, and the Proxy and IP resources come back too.
The configuration is below as well.
Please advise why it goes offline/online so frequently.

2013/10/24 22:32:03 VCS WARNING V-16-10001-7506 (ossadm2) NIC:bkup_nic:monitor:Resource is offline. No Network Host could be reached

2013/10/24 22:32:03 VCS INFO V-16-2-13716 (ossadm2) Resource(bkup_nic): Output of the completed operation (monitor)
==============================================
Broken Pipe
==============================================

2013/10/24 22:32:03 VCS ERROR V-16-1-10303 Resource bkup_nic (Owner: Unspecified, Group: BkupLan) is FAULTED (timed out) on sys ossadm2
2013/10/24 22:32:03 VCS INFO V-16-6-0 (ossadm2) resfault:(resfault) Invoked with arg0=ossadm2, arg1=bkup_nic, arg2=ONLINE
2013/10/24 22:32:03 VCS INFO V-16-0 (ossadm2) resfault:(resfault.sh) Invoked with arg0=/ericsson/core/cluster/scripts/resfault.sh, arg1=ossadm2 ,arg2=bkup_nic
2013/10/24 22:32:03 VCS INFO V-16-6-15002 (ossadm2) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/resfault ossadm2 bkup_nic ONLINE  successfully
2013/10/24 22:32:20 VCS ERROR V-16-1-10303 Resource ossbak_p1 (Owner: Unspecified, Group: Ossfs) is FAULTED (timed out) on sys ossadm2

 

 

 

I changed the ToleranceLimit of the NIC and Proxy agents to 2:

 

root@ossadm1> hatype -display NIC

NIC          ToleranceLimit         2

root@ossadm1> hatype -display Proxy

Proxy        ToleranceLimit         2

 

But it still failed, as shown below:

 

:57:28 ossadm1 Had[18697]: [ID 702911 daemon.notice] VCS ERROR V-16-1-10303 Resource bkup_nic (Owner: Unspecified, Group: BkupLan) is FAULTED (timed out) on sys ossadm2
Nov  6 00:58:01 ossadm1 Had[18697]: [ID 702911 daemon.notice] VCS ERROR V-16-1-10303 Resource syb1bak_p1 (Owner: Unspecified, Group: Sybase1) is FAULTED (timed out) on sys ossadm2
Nov  6 00:58:01 ossadm1 Had[18697]: [ID 702911 daemon.notice] VCS ERROR V-16-1-10303 Resource ossbak_p1 (Owner: Unspecified, Group: Ossfs) is FAULTED (timed out) on sys ossadm2

 

It did not log anything to say that syb1bak_p1 had faulted but the tolerance limit of 2 had not yet been reached, i.e. that the resource would stay up for another monitor interval.

Do we need to set anything else so that the Proxy resources don't fault immediately and instead wait for some time?

 

1 ACCEPTED SOLUTION

Accepted Solutions

mikebounds
Level 6
Partner Accredited

Your resource looks as though it is timing out, i.e. the monitor is NOT returning "This resource is down". The monitor times out, so the state is unknown. ToleranceLimit only counts monitor cycles that actually report the resource as offline, so it does not come into play here. In this case the attribute that applies is FaultOnMonitorTimeouts, which is set to 4 by default, so the monitor should have to time out 4 times in a row before the resource faults, but you can increase FaultOnMonitorTimeouts if you want.

Mike
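
A minimal sketch of how the value could be checked and raised from the command line, in case it helps. The value 6 is only an illustrative assumption, not a recommendation, and `DRY_RUN` and `run` are hypothetical helpers added for this example so the commands can be reviewed before they touch the cluster.

```shell
#!/bin/sh
# Sketch only: inspect and raise FaultOnMonitorTimeouts for a VCS resource type.
# DRY_RUN and run() are hypothetical helpers for this example, not part of VCS.
DRY_RUN=${DRY_RUN:-1}

run() {
    # Print the command in dry-run mode; execute it otherwise.
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

raise_monitor_timeouts() {
    type_name=$1      # resource type, e.g. NIC
    new_value=$2      # e.g. 6 (illustrative value, choose one that suits your site)

    run hatype -value "$type_name" FaultOnMonitorTimeouts   # show the current value
    run haconf -makerw                                      # open the config read-write
    run hatype -modify "$type_name" FaultOnMonitorTimeouts "$new_value"
    run haconf -dump -makero                                # save and close the config
}

# Example: review the printed commands first, then rerun with DRY_RUN=0 on a node.
# raise_monitor_timeouts NIC 6
```

With the default `DRY_RUN=1` the function only prints the four commands, so nothing changes until it is rerun with `DRY_RUN=0`.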

2 REPLIES 2

sajith_cr
Level 4
Employee

You can also check whether the hosts listed in the NetworkHosts attribute are reachable from your cluster nodes.

The message "Resource is offline. No Network Host could be reached" appears when the hosts in NetworkHosts cannot be pinged, or when no traffic is observed on the interface during the ping.
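
One way to run that check by hand from each node is sketched below. It assumes that `hares -value` prints the NetworkHosts list as whitespace-separated addresses and that the local `ping` accepts a bare host argument. `PROBE` is a hypothetical override added for this example so that a different reachability command, or extra flags, can be substituted.

```shell
#!/bin/sh
# Sketch only: report which NetworkHosts entries answer from this node.
# PROBE is a hypothetical knob for this example. On Solaris a bare "ping host"
# normally returns after a reply or a timeout; on Linux use e.g. PROBE="ping -c 1".
PROBE=${PROBE:-ping}

check_hosts() {
    failed=0
    for host in "$@"; do
        # $PROBE is left unquoted on purpose so that it can carry extra flags.
        if $PROBE "$host" >/dev/null 2>&1; then
            echo "$host reachable"
        else
            echo "$host NOT reachable"
            failed=$((failed + 1))
        fi
    done
    # Succeed only if at least one host answered, i.e. not every probe failed.
    [ "$failed" -lt "$#" ]
}

# Example on a cluster node (resource name taken from this thread):
# check_hosts $(hares -value bkup_nic NetworkHosts)
```

Running this around the time the backups start may show whether the configured hosts stop answering under load, which would match the "No Network Host could be reached" message.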