08-13-2012 01:05 PM
Environment
Solaris 9
Two Node Cluster
SFHA installed
Engine_A.Log
2012/08/11 07:25:59 VCS ERROR V-16-1-10303 Resource XXX (Owner: Unspecified, Group: xx-sg) is FAULTED (timed out) on sys SEC-XXX
Dmesg
Aug 11 07:24:56 SEC-XXX bge: [ID 801593 kern.notice] NOTICE: bge3: link down
Aug 11 07:25:00 SEC-XXX bge: [ID 801593 kern.notice] NOTICE: bge3: link up 100Mbps Full-Duplex
Aug 11 07:25:23 SEC-XXX bge: [ID 801593 kern.notice] NOTICE: bge3: link down
Aug 11 07:25:28 SEC-XXX bge: [ID 801593 kern.notice] NOTICE: bge3: link up 100Mbps Full-Duplex
Aug 11 07:25:59 SEC-XXX Had[3797]: [ID 702911 daemon.notice] VCS ERROR V-16-1-10303 Resource XXX (Owner: Unspecified, Group: xxx-sg) is FAULTED (timed out) on sys SEC-XXX
Aug 11 07:26:38 SEC-XXX Had[3797]: [ID 702911 daemon.notice] VCS ERROR V-16-1-10205 Group xxx-sg is faulted on system SEC-XXX
Aug 11 07:26:55 SEC-XXX bge: [ID 801593 kern.notice] NOTICE: bge3: link down
Aug 11 07:27:00 SEC-XXX bge: [ID 801593 kern.notice] NOTICE: bge3: link up 100Mbps Full-Duplex
Aug 11 07:28:00 SEC-XXX bge: [ID 801593 kern.notice] NOTICE: bge3: link down
Aug 11 07:28:05 SEC-XXX bge: [ID 801593 kern.notice] NOTICE: bge3: link up 100Mbps Full-Duplex
Aug 11 07:28:11 SEC-XXX Had[3797]: [ID 702911 daemon.notice] VCS ERROR V-16-1-10303 Resource XXX (Owner: Unspecified, Group: xxx-sg) is FAULTED (timed out) on sys SEC-XXX
Aug 11 07:29:11 SEC-XXX bge: [ID 801593 kern.notice] NOTICE: bge3: link down
Aug 11 07:29:15 SEC-XXX bge: [ID 801593 kern.notice] NOTICE: bge3: link up 100Mbps Full-Duplex
Aug 11 07:29:17 SEC-XXX bge: [ID 801593 kern.notice] NOTICE: bge3: link down
Judging by the dmesg output, bge3 appears to have faulted, which is why the service group failed over to the partner node. bge3 is not a public NIC. It is the NIC through which a hardware device that verifies application queries is connected. The device is connected to a switch, and two Ethernet cables run from the switch, one to bge3 on each node, so both nodes can reach the device via bge3. However, according to the technote below, the error code "VCS ERROR V-16-1-10303" indicates something different. Comments on the above logs would be appreciated.
https://sort.symantec.com/ecls/umi/V-16-1-10303
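One way to check whether the bge3 link flaps line up with the V-16-1-10303 faults is to count the "link down" events shortly before each fault. The following is only a sketch: the function names, the 90-second window, and the year argument are assumptions made for illustration, and the regular expressions only cover the line shapes quoted in the dmesg excerpt above.

```python
import re
from datetime import datetime, timedelta

# Patterns match the syslog line shapes quoted in the post above.
LINK_RE = re.compile(
    r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) \S+ bge: .*NOTICE: (\w+): link (down|up)"
)
FAULT_RE = re.compile(
    r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) \S+ Had\[\d+\]: .*V-16-1-10303"
)


def parse_ts(text, year):
    # syslog timestamps carry no year, so the caller has to supply one
    return datetime.strptime(f"{year} {text}", "%Y %b %d %H:%M:%S")


def link_downs_before_faults(lines, nic, year=2012, window_seconds=90):
    """For each V-16-1-10303 line, count 'link down' events on `nic`
    that fall within the preceding window_seconds (window is an assumption)."""
    downs, faults = [], []
    for line in lines:
        m = LINK_RE.match(line)
        if m:
            if m.group(2) == nic and m.group(3) == "down":
                downs.append(parse_ts(m.group(1), year))
            continue
        m = FAULT_RE.match(line)
        if m:
            faults.append(parse_ts(m.group(1), year))
    window = timedelta(seconds=window_seconds)
    return [(f, sum(1 for d in downs if f - window <= d <= f)) for f in faults]


# Lines copied from the dmesg excerpt in the post.
SAMPLE = [
    "Aug 11 07:24:56 SEC-XXX bge: [ID 801593 kern.notice] NOTICE: bge3: link down",
    "Aug 11 07:25:00 SEC-XXX bge: [ID 801593 kern.notice] NOTICE: bge3: link up 100Mbps Full-Duplex",
    "Aug 11 07:25:23 SEC-XXX bge: [ID 801593 kern.notice] NOTICE: bge3: link down",
    "Aug 11 07:25:28 SEC-XXX bge: [ID 801593 kern.notice] NOTICE: bge3: link up 100Mbps Full-Duplex",
    "Aug 11 07:25:59 SEC-XXX Had[3797]: [ID 702911 daemon.notice] VCS ERROR V-16-1-10303 Resource XXX (Owner: Unspecified, Group: xxx-sg) is FAULTED (timed out) on sys SEC-XXX",
]

for fault_time, count in link_downs_before_faults(SAMPLE, "bge3"):
    print(fault_time, "link-down events in window:", count)
```

A non-zero count next to each fault would support the idea that the flapping link, rather than the cluster software, is the thing to investigate.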
08-13-2012 08:29 PM
V-16-1-10303 is a generic message logged by the VCS engine (had) for any resource that faults due to an entry point timeout. In the technote, the V-16-1-10303 ERROR message is logged for the cvm_clus resource.
Hope this helps!
Regards,
Venkat
08-14-2012 01:21 AM
That VCS log entry is generic and can apply to different resource types.
In this instance, I assume that the problem resource was dependent on the network.
cheers
tony
08-14-2012 02:21 AM
Did you have more messages before this one? I would guess that after bge3 failed, the VCS monitor timed out for the application which uses bge3. If the monitor times out 4 times in a row (determined by the resource type attribute FaultOnMonitorTimeouts), the resource faults. If this was the case, you should have seen messages in the VCS engine log to this effect.
Mike
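The "N monitor timeouts in a row" rule described above can be sketched as a small counter. This is a simplified model for illustration only, not the VCS agent framework's code: the function name and the "ok"/"timeout" result strings are assumptions, and the threshold is a parameter standing in for FaultOnMonitorTimeouts.

```python
def first_fault_index(monitor_results, fault_on_monitor_timeouts=4):
    """Return the index of the monitor cycle at which the resource would be
    declared faulted, or None if the threshold is never reached.

    monitor_results: sequence of "ok" / "timeout" strings (hypothetical encoding).
    A threshold of 0 is modelled here as "timeouts never cause a fault".
    """
    consecutive = 0
    for index, result in enumerate(monitor_results):
        if result == "timeout":
            consecutive += 1
            if 0 < fault_on_monitor_timeouts <= consecutive:
                return index
        else:
            # Any successful monitor cycle resets the run of timeouts.
            consecutive = 0
    return None


history = ["ok", "timeout", "timeout", "timeout", "timeout"]
print(first_fault_index(history))
print(first_fault_index(history, fault_on_monitor_timeouts=3))
```

Keeping the threshold as an argument makes it easy to compare what the logs show against the value actually configured for the resource type.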
08-14-2012 02:27 AM
The resource which faulted is actually a NIC, but it is neither public nor private.
08-14-2012 07:29 AM
If the bge3 resource is marked as Critical (the default), its failure will cause a failover.
Please review these topics in VCS Admin Guide (see https://sort.symantec.com/documentation )
Controlling VCS behavior
VCS behavior on resource faults
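As a rough illustration of why the Critical setting matters, the sketch below models a fault travelling up the dependency tree until it reaches a Critical resource. It is a deliberately simplified model, not VCS code: the function name is made up, the resource names "nic" and "ip" are hypothetical, and the dependency between them is an assumption, since the thread does not show the group's actual dependency tree.

```python
def group_fails_over(faulted, parents, critical):
    """Return True if the fault reaches a Critical resource.

    faulted:  name of the resource that faulted
    parents:  dict mapping a resource to the resources that depend on it
    critical: dict mapping a resource to True/False; a missing entry is
              treated as Critical, matching the default mentioned above
    """
    seen = set()
    stack = [faulted]
    while stack:
        resource = stack.pop()
        if resource in seen:
            continue
        seen.add(resource)
        if critical.get(resource, True):
            return True
        # The fault propagates to every resource that depends on this one.
        stack.extend(parents.get(resource, []))
    return False


# Hypothetical two-resource tree: "ip" depends on "nic".
parents = {"nic": ["ip"], "ip": []}
print(group_fails_over("nic", parents, {"nic": True, "ip": True}))
print(group_fails_over("nic", parents, {"nic": False, "ip": False}))
```

In this model, marking only the faulted resource as non-critical is not enough if something that depends on it is still Critical.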
08-14-2012 11:36 AM
Thanks all for the kind words.
bge3 is configured as a resource in the service group.
====
08-16-2012 09:55 PM
Quoting the earlier reply: "Did you have more messages before this one" and "if the monitor times out 4 times in a row".
Logs are attached for reference.
Yes, I see more messages, and it is three times in a row.
08-17-2012 03:19 AM
The log shows that the NIC went offline on SEC-XXX, causing the Faulted state:
(SEC-XXX) NIC:XXX:monitor:.......: Resource is offline
Resource XXX .... is FAULTED (timed out) on sys SEC-XXX
VCS then did what it is supposed to do: offline the rest of the SG and fail over to PRI-XXX:
Initiating Offline of Resource VirtualIP.....
Initiating Offline of Resource XXX-APP ....
Initiating Offline of Resource Mount ....
Initiating Offline of Resource VMDG .....
Group xxx-sg is faulted on system SEC-XXX
Group xxx-sg is offline on system SEC-XXX
Evaluating PRI-XXX as potential target node for group xxx-sg
...
Initiating Online of Resource VMDG .... on System PRI-XXX
Initiating Online of Resource Mount ...
Initiating Online of Resource XXX-APP ... on System PRI-XXX
Initiating Online of Resource VirtualIP ...
Group xxx-sg is online on system PRI-XXX
Group xxx-sg failed over to system PRI-XXX
So, VCS did what it was supposed to do.
You need to troubleshoot bge3 on SEC-XXX.