Need help with NIC resource fault
We are intermittently seeing the faults shown below, and they affect our backups.
Our setup: the BkupLan service group contains a NIC resource (bkup_nic) and a Phantom resource.
Two service groups, OSSfs and sybase1, each have a Proxy resource that monitors this NIC resource,
and both are configured with IP resources (Ossbak_ip1 and sybbak_ip1).
The issue is that the NIC resource fails with the message below, which in turn causes the Proxy and IP resources to fail.
Then the NIC resource comes back, and the Proxy and IP resources come back too.
The configuration is below as well.
Please advise why it goes offline and online so frequently.
2013/10/24 22:32:03 VCS WARNING V-16-10001-7506 (ossadm2) NIC:bkup_nic:monitor:Resource is offline. No Network Host could be reached
2013/10/24 22:32:03 VCS INFO V-16-2-13716 (ossadm2) Resource(bkup_nic): Output of the completed operation (monitor)
==============================================
Broken Pipe
==============================================
2013/10/24 22:32:03 VCS ERROR V-16-1-10303 Resource bkup_nic (Owner: Unspecified, Group: BkupLan) is FAULTED (timed out) on sys ossadm2
2013/10/24 22:32:03 VCS INFO V-16-6-0 (ossadm2) resfault:(resfault) Invoked with arg0=ossadm2, arg1=bkup_nic, arg2=ONLINE
2013/10/24 22:32:03 VCS INFO V-16-0 (ossadm2) resfault:(resfault.sh) Invoked with arg0=/ericsson/core/cluster/scripts/resfault.sh, arg1=ossadm2 ,arg2=bkup_nic
2013/10/24 22:32:03 VCS INFO V-16-6-15002 (ossadm2) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/resfault ossadm2 bkup_nic ONLINE successfully
2013/10/24 22:32:20 VCS ERROR V-16-1-10303 Resource ossbak_p1 (Owner: Unspecified, Group: Ossfs) is FAULTED (timed out) on sys ossadm2
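To see at a glance which resources faulted and with what reason, the V-16-1-10303 lines can be filtered out of the log. The following is only a rough sketch: the function name `summarize_faults` is made up for illustration, the log path in the comment is the usual default and may differ on your system, and it only counts fault lines that carry a parenthesised reason like the ones above.

```shell
#!/bin/sh
# Summarise FAULTED events (message ID V-16-1-10303) from VCS log text on stdin.
# Output: one line per resource and reason, as "<count> <resource> <reason>".
# summarize_faults is a hypothetical helper name, not a VCS command.
summarize_faults() {
    grep 'V-16-1-10303' |
    # Capture the resource name and the text inside "FAULTED (...)".
    sed -n 's/.*Resource \([^ ]*\) (Owner.*is FAULTED (\([^)]*\)) on sys.*/\1 \2/p' |
    sort | uniq -c |
    # Normalise the leading whitespace that uniq -c adds.
    awk '{ c = $1; $1 = ""; sub(/^ /, ""); print c, $0 }'
}

# Example (assumed default engine log location; adjust as needed):
# summarize_faults < /var/VRTSvcs/log/engine_A.log
```

If every line it prints ends in "timed out", the faults are coming from monitor timeouts rather than from the monitor reporting the resource as down.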
I changed the ToleranceLimit of the NIC and Proxy agents to 2:
root@ossadm1> hatype -display NIC
NIC ToleranceLimit 2
root@ossadm1> hatype -display Proxy
Proxy ToleranceLimit 2
But it still failed, as shown below:
:57:28 ossadm1 Had[18697]: [ID 702911 daemon.notice] VCS ERROR V-16-1-10303 Resource bkup_nic (Owner: Unspecified, Group: BkupLan) is FAULTED (timed out) on sys ossadm2
Nov 6 00:58:01 ossadm1 Had[18697]: [ID 702911 daemon.notice] VCS ERROR V-16-1-10303 Resource syb1bak_p1 (Owner: Unspecified, Group: Sybase1) is FAULTED (timed out) on sys ossadm2
Nov 6 00:58:01 ossadm1 Had[18697]: [ID 702911 daemon.notice] VCS ERROR V-16-1-10303 Resource ossbak_p1 (Owner: Unspecified, Group: Ossfs) is FAULTED (timed out) on sys ossadm2
The log did not say anything like "syb1bak_p1 faulted but ToleranceLimit 2 not yet reached", which I expected if the resource were being kept up for another monitor interval.
Do we need to set anything else so that the Proxy resources do not fault immediately and instead wait for some time?
Your resource looks as though it is timing out, i.e. the monitor is NOT returning "This resource is down". The monitor times out, so the state is unknown, and in this case the applicable attribute is FaultOnMonitorTimeouts, which is set to 4 by default. So the monitor should have to time out 4 times in a row before the resource faults, but you can increase FaultOnMonitorTimeouts if you want.
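As a rough sketch of how that attribute could be raised at the type level: the snippet below uses the standard `haconf` and `hatype` commands, but the `run` and `raise_fomt` helpers and the value 8 are purely illustrative assumptions, not a recommendation. It only prints the commands unless `APPLY=1` is set, so it can be reviewed before anything is changed on the cluster.

```shell
#!/bin/sh
# Dry-run by default: each command is printed, and only executed when APPLY=1.
# run and raise_fomt are hypothetical helper names, not part of VCS.
run() {
    echo "+ $*"
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    fi
}

# Raise FaultOnMonitorTimeouts for one resource type.
#   $1 = resource type (for example NIC or Proxy)
#   $2 = new value (illustrative only; pick one that suits your environment)
raise_fomt() {
    run haconf -makerw                                   # open the configuration for writing
    run hatype -modify "$1" FaultOnMonitorTimeouts "$2"  # change the type-level attribute
    run haconf -dump -makero                             # save and make read-only again
    run hatype -display "$1" -attribute FaultOnMonitorTimeouts
}

raise_fomt NIC 8
```

Run it once as is to check the printed commands, then again with `APPLY=1` on a cluster node if they look right. The same call with `Proxy` as the first argument would cover the Proxy type.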
Mike