Forum Discussion

symsonu
12 years ago

Need to find root cause for service failure in Veritas cluster

Hello guys,

I need to find the root cause of a service failure in our Veritas cluster.

The service groups did not fail over to the other node.

Below are the logs. As far as I can see, it all started with NIC failures and the IPMultiNICB resource going faulty.

Can anyone help me here?

engine logs


2013/05/17 12:41:41 VCS INFO V-16-2-13075 (pk-ercoss1) Resource(ossfs_ip) has reported unexpected OFFLINE 1 times, which is still within the Tol
2013/05/17 12:41:42 VCS ERROR V-16-1-10303 Resource pub_mnic (Owner: Unspecified, Group: PubLan) is FAULTED (timed out) on sys pk-ercoss1
2013/05/17 12:41:42 VCS NOTICE V-16-1-10300 Initiating Offline of Resource pub_p (Owner: Unspecified, Group: PubLan) on System pk-ercoss1
2013/05/17 12:41:42 VCS ERROR V-16-1-10303 Resource pub_mnic (Owner: Unspecified, Group: PubLan) is FAULTED (timed out) on sys pk-ercoss2
2013/05/17 12:41:42 VCS NOTICE V-16-1-10300 Initiating Offline of Resource pub_p (Owner: Unspecified, Group: PubLan) on System pk-ercoss2
2013/05/17 12:41:43 VCS INFO V-16-6-0 (pk-ercoss1) resfault:(resfault) Invoked with arg0=pk-ercoss1, arg1=pub_mnic, arg2=ONLINE
2013/05/17 12:41:43 VCS INFO V-16-6-0 (pk-ercoss2) resfault:(resfault) Invoked with arg0=pk-ercoss2, arg1=pub_mnic, arg2=ONLINE
2013/05/17 12:41:43 VCS INFO V-16-0 (pk-ercoss1) resfault:( Invoked with arg0=/ericsson/core/cluster/scripts/, arg1=pk-ercoss1 ,arg2=pub_mnic
2013/05/17 12:41:43 VCS INFO V-16-0 (pk-ercoss2) resfault:( Invoked with arg0=/ericsson/core/cluster/scripts/, arg1=pk-ercoss2 ,arg2=pub_mnic
2013/05/17 12:41:43 VCS INFO V-16-6-15002 (pk-ercoss1) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/resfault pk-ercoss1 pub_mnic ONLINE  successfully
2013/05/17 12:41:43 VCS INFO V-16-6-15002 (pk-ercoss2) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/resfault pk-ercoss2 pub_mnic ONLINE  successfully
2013/05/17 12:41:43 VCS INFO V-16-1-10305 Resource pub_p (Owner: Unspecified, Group: PubLan) is offline on pk-ercoss2 (VCS initiated)
2013/05/17 12:41:43 VCS ERROR V-16-1-10205 Group PubLan is faulted on system pk-ercoss2
2013/05/17 12:41:43 VCS NOTICE V-16-1-10446 Group PubLan is offline on system pk-ercoss2
2013/05/17 12:41:43 VCS INFO V-16-1-10493 Evaluating pk-ercoss2 as potential target node for group PubLan
2013/05/17 12:41:43 VCS INFO V-16-1-50010 Group PubLan is online or faulted on system pk-ercoss2
2013/05/17 12:41:43 VCS INFO V-16-1-10493 Evaluating pk-ercoss1 as potential target node for group PubLan
2013/05/17 12:41:43 VCS INFO V-16-1-50010 Group PubLan is online or faulted on system pk-ercoss1
2013/05/17 12:41:43 VCS NOTICE V-16-1-10235 Restart is set for group PubLan. Group will be brought online if fault on persistent resource clears. If group is brought online anywhere else from AutoStartList or manually, then Restart will be reset
2013/05/17 12:41:43 VCS INFO V-16-1-10305 Resource pub_p (Owner: Unspecified, Group: PubLan) is offline on pk-ercoss1 (VCS initiated)
2013/05/17 12:41:43 VCS ERROR V-16-1-10205 Group PubLan is faulted on system pk-ercoss1
2013/05/17 12:41:43 VCS NOTICE V-16-1-10446 Group PubLan is offline on system pk-ercoss1
2013/05/17 12:41:43 VCS INFO V-16-1-10493 Evaluating pk-ercoss2 as potential target node for group PubLan
2013/05/17 12:41:43 VCS INFO V-16-1-50010 Group PubLan is online or faulted on system pk-ercoss2
2013/05/17 12:41:43 VCS INFO V-16-1-10493 Evaluating pk-ercoss1 as potential target node for group PubLan
2013/05/17 12:41:43 VCS INFO V-16-1-50010 Group PubLan is online or faulted on system pk-ercoss1
2013/05/17 12:41:43 VCS NOTICE V-16-1-10235 Restart is set for group PubLan. Group will be brought online if fault on persistent resource clears. If group is brought online anywhere else from AutoStartList or manually, then Restart will be reset
2013/05/17 12:41:44 VCS INFO V-16-6-0 (pk-ercoss2) postoffline:(postoffline) Invoked with arg0=pk-ercoss2, arg1=PubLan
2013/05/17 12:41:44 VCS INFO V-16-6-0 (pk-ercoss2) postoffline:Executing /ericsson/core/cluster/scripts/ with arg0=pk-ercoss2, arg
2013/05/17 12:41:44 VCS INFO V-16-6-0 (pk-ercoss1) postoffline:(postoffline) Invoked with arg0=pk-ercoss1, arg1=PubLan
2013/05/17 12:41:44 VCS INFO V-16-6-0 (pk-ercoss1) postoffline:Executing /ericsson/core/cluster/scripts/ with arg0=pk-ercoss1, arg
2013/05/17 12:41:44 VCS INFO V-16-6-0 (pk-ercoss2) done
2013/05/17 12:41:44 VCS INFO V-16-6-0 (pk-ercoss2) postoffline:Completed execution of /ericsson/core/cluster/scripts/ for group Pu
2013/05/17 12:41:44 VCS INFO V-16-6-0 (pk-ercoss1) done
2013/05/17 12:41:44 VCS INFO V-16-6-15002 (pk-ercoss2) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/postoffline pk-ercoss2 PubLan   su
2013/05/17 12:41:44 VCS INFO V-16-6-0 (pk-ercoss1) postoffline:Completed execution of /ericsson/core/cluster/scripts/ for group Pu
2013/05/17 12:41:44 VCS INFO V-16-6-15002 (pk-ercoss1) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/postoffline pk-ercoss1 PubLan   su
2013/05/17 12:41:49 VCS INFO V-16-2-13075 (pk-ercoss1) Resource(snmp_ip) has reported unexpected OFFLINE 1 times, which is still within the Tole
2013/05/17 12:41:50 VCS ERROR V-16-1-10303 Resource syb1_p1 (Owner: Unspecified, Group: Sybase1) is FAULTED (timed out) on sys pk-ercoss2
2013/05/17 12:41:50 VCS NOTICE V-16-1-10300 Initiating Offline of Resource stop_sybase (Owner: Unspecified, Group: Sybase1) on System pk-ercoss2
2013/05/17 12:41:50 VCS ERROR V-16-1-10303 Resource ossfs_p1 (Owner: Unspecified, Group: Ossfs) is FAULTED (timed out) on sys pk-ercoss2
2013/05/17 12:41:50 VCS INFO V-16-6-0 (pk-ercoss2) resfault:(resfault) Invoked with arg0=pk-ercoss2, arg1=syb1_p1, arg2=ONLINE
2013/05/17 12:41:50 VCS INFO V-16-6-0 (pk-ercoss2) resfault:(resfault) Invoked with arg0=pk-ercoss2, arg1=ossfs_p1, arg2=ONLINE
2013/05/17 12:41:50 VCS INFO V-16-10001-88 (pk-ercoss2) Application:stop_sybase:offline:Executed [/ericsson/core/cluster/scripts/ stop] successfully.
2013/05/17 12:41:50 VCS INFO V-16-0 (pk-ercoss2) resfault:( Invoked with arg0=/ericsson/core/cluster/scripts/, arg1=pk-ercoss2 ,arg2=syb1_p1
2013/05/17 12:41:50 VCS INFO V-16-0 (pk-ercoss2) resfault:( Invoked with arg0=/ericsson/core/cluster/scripts/, arg1=pk-ercoss2 ,arg2=ossfs_p1
2013/05/17 12:41:50 VCS INFO V-16-6-15002 (pk-ercoss2) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/resfault pk-ercoss2 syb1_p1 ONLINE
2013/05/17 12:41:50 VCS INFO V-16-6-15002 (pk-ercoss2) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/resfault pk-ercoss2 ossfs_p1 ONLINE  successfully
2013/05/17 12:41:53 VCS INFO V-16-1-10305 Resource stop_sybase (Owner: Unspecified, Group: Sybase1) is offline on pk-ercoss2 (VCS initiated)
2013/05/17 12:41:53 VCS NOTICE V-16-1-10300 Initiating Offline of Resource masterdataservice_BACKUP (Owner: Unspecified, Group: Sybase1) on System pk-ercoss2
2013/05/17 12:42:00 VCS NOTICE V-16-20018-26 (pk-ercoss2) SybaseBk:masterdataservice_BACKUP:offline:Sybase Backup service masterdataservice_BACKUP has been stopped
2013/05/17 12:42:00 VCS INFO V-16-2-13716 (pk-ercoss2) Resource(masterdataservice_BACKUP): Output of the completed operation (offline)
Backup Server: The Backup Server will go down immediately.
Terminating sessions.

2013/05/17 12:42:00 VCS WARNING V-16-20018-301 (pk-ercoss2) SybaseBk:masterdataservice_BACKUP:monitor:Open for backupserver failed, setting cookie to NULL
2013/05/17 12:42:00 VCS INFO V-16-1-10305 Resource masterdataservice_BACKUP (Owner: Unspecified, Group: Sybase1) is offline on pk-ercoss2 (VCS i
2013/05/17 12:42:00 VCS NOTICE V-16-1-10300 Initiating Offline of Resource masterdataservice (Owner: Unspecified, Group: Sybase1) on System pk-e
2013/05/17 12:42:02 VCS NOTICE V-16-20018-18 (pk-ercoss2) Sybase:masterdataservice:offline:Sybase service masterdataservice has been stopped
2013/05/17 12:42:03 VCS INFO V-16-2-13716 (pk-ercoss2) Resource(masterdataservice): Output of the completed operation (offline)
Server SHUTDOWN by request.
ASE is terminating this process.
        ct_results(): network packet layer: internal net library error: Net-Library operation terminated due to disconnect

2013/05/17 12:42:03 VCS WARNING V-16-20018-301 (pk-ercoss2) Sybase:masterdataservice:monitor:Open for dataserver failed, setting cookie to NULL
2013/05/17 12:42:03 VCS INFO V-16-1-10305 Resource masterdataservice (Owner: Unspecified, Group: Sybase1) is offline on pk-ercoss2 (VCS initiate
2013/05/17 12:42:03 VCS NOTICE V-16-1-10300 Initiating Offline of Resource syb1_ip (Owner: Unspecified, Group: Sybase1) on System pk-ercoss2
2013/05/17 12:42:03 VCS INFO V-16-1-10305 Resource syb1_ip (Owner: Unspecified, Group: Sybase1) is offline on pk-ercoss2 (VCS initiated)
2013/05/17 12:42:03 VCS NOTICE V-16-1-10300 Initiating Offline of Resource sybmaster_mount (Owner: Unspecified, Group: Sybase1) on System pk-erc
2013/05/17 12:42:03 VCS NOTICE V-16-1-10300 Initiating Offline of Resource syblog_mount (Owner: Unspecified, Group: Sybase1) on System pk-ercoss
2013/05/17 12:42:03 VCS NOTICE V-16-1-10300 Initiating Offline of Resource sybdata_mount (Owner: Unspecified, Group: Sybase1) on System pk-ercos
2013/05/17 12:42:03 VCS NOTICE V-16-1-10300 Initiating Offline of Resource pmsyblog_mount (Owner: Unspecified, Group: Sybase1) on System pk-erco
2013/05/17 12:42:03 VCS NOTICE V-16-1-10300 Initiating Offline of Resource pmsybdata_mount (Owner: Unspecified, Group: Sybase1) on System pk-erc
2013/05/17 12:42:03 VCS NOTICE V-16-1-10300 Initiating Offline of Resource fmsyblog_mount (Owner: Unspecified, Group: Sybase1) on System pk-erco
2013/05/17 12:42:03 VCS NOTICE V-16-1-10300 Initiating Offline of Resource fmsybdata_mount (Owner: Unspecified, Group: Sybase1) on System pk-erc
2013/05/17 12:42:03 VCS NOTICE V-16-1-10300 Initiating Offline of Resource dbdumps_mount (Owner: Unspecified, Group: Sybase1) on System pk-ercos
2013/05/17 12:42:03 VCS NOTICE V-16-1-10300 Initiating Offline of Resource syb1bak_ip (Owner: Unspecified, Group: Sybase1) on System pk-ercoss2
2013/05/17 12:42:05 VCS INFO V-16-1-10305 Resource sybmaster_mount (Owner: Unspecified, Group: Sybase1) is offline on pk-ercoss2 (VCS initiated)
2013/05/17 12:42:05 VCS INFO V-16-1-10305 Resource syblog_mount (Owner: Unspecified, Group: Sybase1) is offline on pk-ercoss2 (VCS initiated)
2013/05/17 12:42:05 VCS INFO V-16-1-10305 Resource sybdata_mount (Owner: Unspecified, Group: Sybase1) is offline on pk-ercoss2 (VCS initiated)
2013/05/17 12:42:05 VCS INFO V-16-1-10305 Resource pmsyblog_mount (Owner: Unspecified, Group: Sybase1) is offline on pk-ercoss2 (VCS initiated)
2013/05/17 12:42:05 VCS INFO V-16-1-10305 Resource pmsybdata_mount (Owner: Unspecified, Group: Sybase1) is offline on pk-ercoss2 (VCS initiated)
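
With this much engine-log output it can help to pull out only the resource fault lines (message ID V-16-1-10303) and sort them by time. Below is a minimal Python sketch of that idea, not a Veritas tool. The regular expression is an assumption based only on the line layout seen in the log above, and `fault_events` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Pattern for "Resource ... is FAULTED" lines, modelled on the lines pasted above.
FAULT_RE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) VCS ERROR V-16-1-10303 "
    r"Resource (?P<res>\S+) \(Owner: [^,]*, Group: (?P<grp>[^)]+)\) "
    r"is FAULTED \((?P<reason>[^)]*)\) on sys (?P<sys>\S+)"
)

def fault_events(lines):
    """Return sorted (timestamp, system, group, resource, reason) tuples."""
    events = []
    for line in lines:
        m = FAULT_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m["ts"], "%Y/%m/%d %H:%M:%S")
            events.append((ts, m["sys"], m["grp"], m["res"], m["reason"]))
    return sorted(events)

# Three lines copied from the engine log above; only two are fault lines.
sample = [
    "2013/05/17 12:41:42 VCS ERROR V-16-1-10303 Resource pub_mnic (Owner: Unspecified, Group: PubLan) is FAULTED (timed out) on sys pk-ercoss1",
    "2013/05/17 12:41:42 VCS NOTICE V-16-1-10300 Initiating Offline of Resource pub_p (Owner: Unspecified, Group: PubLan) on System pk-ercoss1",
    "2013/05/17 12:41:50 VCS ERROR V-16-1-10303 Resource syb1_p1 (Owner: Unspecified, Group: Sybase1) is FAULTED (timed out) on sys pk-ercoss2",
]

for ts, system, group, resource, reason in fault_events(sample):
    print(ts.time(), system, group, resource, reason)
```

Run against the whole engine log, this would show at a glance which resource faulted first on each node.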


message file


May 17 12:41:37 pk-ercoss1 in.mpathd[6024]: [ID 594170 daemon.error] NIC failure detected on oce9 of group pub_mnic
May 17 12:41:37 pk-ercoss1 in.mpathd[6024]: [ID 832587 daemon.error] Successfully failed over from NIC oce9 to NIC oce0
May 17 12:41:38 pk-ercoss1 in.mpathd[6024]: [ID 168056 daemon.error] All Interfaces in group pub_mnic have failed
May 17 12:41:42 pk-ercoss1 Had[5742]: [ID 702911 daemon.notice] VCS ERROR V-16-1-10303 Resource pub_mnic (Owner: Unspecified, Group: PubLan) is FAULTED (timed out) on sys pk-ercoss1
May 17 12:41:42 pk-ercoss1 Had[5742]: [ID 702911 daemon.notice] VCS ERROR V-16-1-10303 Resource pub_mnic (Owner: Unspecified, Group: PubLan) is FAULTED (timed out) on sys pk-ercoss2
May 17 12:41:43 pk-ercoss1 Had[5742]: [ID 702911 daemon.notice] VCS ERROR V-16-1-10205 Group PubLan is faulted on system pk-ercoss2
May 17 12:41:43 pk-ercoss1 Had[5742]: [ID 702911 daemon.notice] VCS ERROR V-16-1-10205 Group PubLan is faulted on system pk-ercoss1
May 17 12:41:50 pk-ercoss1 Had[5742]: [ID 702911 daemon.notice] VCS ERROR V-16-1-10303 Resource syb1_p1 (Owner: Unspecified, Group: Sybase1) is FAULTED (timed out) on sys pk-ercoss2
May 17 12:41:50 pk-ercoss1 Had[5742]: [ID 702911 daemon.notice] VCS ERROR V-16-1-10303 Resource ossfs_p1 (Owner: Unspecified, Group: Ossfs) is FAULTED (timed out) on sys pk-ercoss2
May 17 12:41:56 pk-ercoss1 in.mpathd[6024]: [ID 299542 daemon.error] NIC repair detected on oce0 of group pub_mnic
May 17 12:41:56 pk-ercoss1 in.mpathd[6024]: [ID 620804 daemon.error] Successfully failed back to NIC oce0
May 17 12:41:56 pk-ercoss1 in.mpathd[6024]: [ID 237757 daemon.error] At least 1 interface (oce0) of group pub_mnic has repaired
May 17 12:41:57 pk-ercoss1 in.mpathd[6024]: [ID 299542 daemon.error] NIC repair detected on oce9 of group pub_mnic
May 17 12:41:57 pk-ercoss1 in.mpathd[6024]: [ID 620804 daemon.error] Successfully failed back to NIC oce9
May 17 12:42:10 pk-ercoss1 Had[5742]: [ID 702911 daemon.notice] VCS ERROR V-16-1-10205 Group Sybase1 is faulted on system pk-ercoss2
May 17 12:42:11 pk-ercoss1 AgentFramework[5819]: [ID 702911 daemon.notice] VCS ERROR V-16-2-13067 Thread(5) Agent is calling clean for resource(ossfs_ip) because the resource became OFFLINE unexpectedly, on its own.
May 17 12:42:11 pk-ercoss1 Had[5742]: [ID 702911 daemon.notice] VCS ERROR V-16-2-13067 (pk-ercoss1) Agent is calling clean for resource(ossfs_ip) because the resource became OFFLINE unexpectedly, on its own.
May 17 12:42:11 pk-ercoss1 AgentFramework[5819]: [ID 702911 daemon.notice] VCS ERROR V-16-2-13068 Thread(5) Resource(ossfs_ip) - clean completed
May 17 12:42:12 pk-ercoss1 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,3a40@1c/pci103c,3245@0/sd@1,0 (sd9):
May 17 12:42:12 pk-ercoss1      drive offline
May 17 12:42:12 pk-ercoss1 vxdmp: [ID 480808 kern.notice] NOTICE: VxVM vxdmp V-5-0-112 disabled path 30/0x240 belonging to the dmpnode 264/0x40 due to open failure
May 17 12:42:12 pk-ercoss1 vxdmp: [ID 824220 kern.notice] NOTICE: VxVM vxdmp V-5-0-111 disabled dmpnode 264/0x40
May 17 12:42:12 pk-ercoss1 vxdmp: [ID 238993 kern.notice] NOTICE: VxVM vxdmp 0 dmp_tur_temp_pgr: open failed: error = 6 dev=0x108/0x42
May 17 12:42:12 pk-ercoss1 vxvm:vxconfigd: [ID 702911 daemon.notice] V-5-1-11401 : dg import with I/O fence enabled
May 17 12:42:12 pk-ercoss1 vxvm:vxconfigd: [ID 702911 daemon.notice] V-5-1-11401 sybasedg: dg import with I/O fence enabled
May 17 12:42:15 pk-ercoss1 AgentFramework[5819]: [ID 702911 daemon.notice] VCS ERROR V-16-10001-5004 IPMultiNICB:syb1_ip:online:Can not online. No interfaces available
May 17 12:42:15 pk-ercoss1 Had[5742]: [ID 702911 daemon.notice] VCS ERROR V-16-10001-5004 (pk-ercoss1) IPMultiNICB:syb1_ip:online:Can not online. No interfaces available
May 17 12:42:19 pk-ercoss1 AgentFramework[5819]: [ID 702911 daemon.notice] VCS ERROR V-16-2-13067 Thread(5) Agent is calling clean for resource(snmp_ip) because the resource became OFFLINE unexpectedly, on its own.
May 17 12:42:19 pk-ercoss1 Had[5742]: [ID 702911 daemon.notice] VCS ERROR V-16-2-13067 (pk-ercoss1) Agent is calling clean for resource(snmp_ip) because the resource became OFFLINE unexpectedly, on its own.
May 17 12:42:19 pk-ercoss1 AgentFramework[5819]: [ID 702911 daemon.notice] VCS ERROR V-16-2-13068 Thread(5) Resource(snmp_ip) - clean completed
May 17 12:42:21 pk-ercoss1 Had[5742]: [ID 702911 daemon.notice] VCS ERROR V-16-1-10303 Resource ossfs_p1 (Owner: Unspecified, Group: Ossfs) is FAULTED (timed out) on sys pk-ercoss1
May 17 12:42:21 pk-ercoss1 Had[5742]: [ID 702911 daemon.notice] VCS ERROR V-16-1-10303 Resource syb1_p1 (Owner: Unspecified, Group: Sybase1) is FAULTED (timed out) on sys pk-ercoss1
May 17 12:42:50 pk-ercoss1 svc.startd[9]: [ID 652011 daemon.warning] svc:/ericsson/eric_3pp/activemq_oss_loggingbroker:default: Method "/ericsson/activemq/bin/ stopActiveMqLogger" failed with exit status 1.
May 17 12:42:50 pk-ercoss1 svc.startd[9]: [ID 652011 daemon.warning] svc:/ericsson/eric_3pp/activemq:default: Method "/ericsson/activemq/bin/act stopActiveMq" failed with exit status 1.
May 17 12:43:16 pk-ercoss1 AgentFramework[5819]: [ID 702911 daemon.notice] VCS ERROR V-16-2-13066 Thread(4) Agent is calling clean for resource(syb1_ip) because the resource is not up even after online completed.
May 17 12:43:16 pk-ercoss1 Had[5742]: [ID 702911 daemon.notice] VCS ERROR V-16-2-13066 (pk-ercoss1) Agent is calling clean for resource(syb1_ip) because the resource is not up even after online completed.
May 17 12:43:16 pk-ercoss1 AgentFramework[5819]: [ID 702911 daemon.notice] VCS ERROR V-16-2-13068 Thread(4) Resource(syb1_ip) - clean completed
May 17 12:43:16 pk-ercoss1 AgentFramework[5819]: [ID 702911 daemon.notice] VCS ERROR V-16-2-13072 Thread(4) Resource(syb1_ip): Agent is retrying online (attempt number 1 of 1).
May 17 12:43:28 pk-ercoss1 vxdmp: [ID 238993 kern.notice] NOTICE: VxVM vxdmp 0 dmp_tur_temp_pgr: open failed: error = 6 dev=0x108/0x42
May 17 12:43:28 pk-ercoss1 vxvm:vxconfigd: [ID 702911 daemon.notice] V-5-1-11401 : dg import with I/O fence enabled
May 17 12:43:28 pk-ercoss1 vxvm:vxconfigd: [ID 702911 daemon.notice] V-5-1-11401 sybasedg: dg import with I/O fence enabled
May 17 12:43:30 pk-ercoss1 svc.startd[9]: [ID 122153 daemon.warning] svc:/ericsson/eric_ep/TBS:default: Method or service exit timed out.  Killing contract 322855.
May 17 12:43:30 pk-ercoss1 svc.startd[9]: [ID 636263 daemon.warning] svc:/ericsson/eric_ep/TBS:default: Method "/etc/init.d/TBS stop" failed due to signal KILL.
May 17 12:44:31 pk-ercoss1 svc.startd[9]: [ID 122153 daemon.warning] svc:/ericsson/eric_ep/TBS:default: Method or service exit timed out.  Killing contract 322861.
May 17 12:44:31 pk-ercoss1 svc.startd[9]: [ID 636263 daemon.warning] svc:/ericsson/eric_ep/TBS:default: Method "/etc/init.d/TBS stop" failed due to signal KILL.
May 17 12:44:49 pk-ercoss1 AgentFramework[5813]: [ID 702911 daemon.notice] VCS ERROR V-16-2-13067 Thread(13) Agent is calling clean for resource(tomcat) because the resource became OFFLINE unexpectedly, on its own.
May 17 12:44:49 pk-ercoss1 Had[5742]: [ID 702911 daemon.notice] VCS ERROR V-16-2-13067 (pk-ercoss1) Agent is calling clean for resource(tomcat) because the resource became OFFLINE unexpectedly, on its own.
May 17 12:44:49 pk-ercoss1 AgentFramework[5813]: [ID 702911 daemon.notice] VCS ERROR V-16-2-13068 Thread(13) Resource(tomcat) - clean completed
May 17 12:44:49 pk-ercoss1 AgentFramework[5813]: [ID 702911 daemon.notice] VCS ERROR V-16-2-13073 Thread(13) Resource(tomcat) became OFFLINE unexpectedly on its own. Agent is restarting (attempt number 1 of 2) the resource.
May 17 12:44:49 pk-ercoss1 Had[5742]: [ID 702911 daemon.notice] VCS ERROR V-16-2-13073 (pk-ercoss1) Resource(tomcat) became OFFLINE unexpectedly on its own. Agent is restarting (attempt number 1 of 2) the resource.
May 17 12:45:32 pk-ercoss1 svc.startd[9]: [ID 122153 daemon.warning] svc:/ericsson/eric_ep/TBS:default: Method or service exit timed out.  Killing contract 322865.
May 17 12:45:32 pk-ercoss1 svc.startd[9]: [ID 636263 daemon.warning] svc:/ericsson/eric_ep/TBS:default: Method "/etc/init.d/TBS stop" failed due to signal KILL.
May 17 12:45:32 pk-ercoss1 svc.startd[9]: [ID 748625 daemon.error] ericsson/eric_ep/TBS:default failed: transitioned to maintenance (see 'svcs -xv' for details)
May 17 12:46:18 pk-ercoss1 su: [ID 810491 auth.crit] 'su sybase' failed for sybase on /dev/???
May 17 12:47:19 pk-ercoss1 svc.startd[9]: [ID 122153 daemon.warning] svc:/ericsson/eric_3pp/glassfish:default: Method or service exit timed out.  Killing contract 540.
May 17 12:47:19 pk-ercoss1 AgentFramework[5813]: [ID 702911 daemon.notice] VCS ERROR V-16-2-13011 Thread(14) Resource(glassfish): offline procedure did not complete within the expected time.
May 17 12:47:19 pk-ercoss1 AgentFramework[5813]: [ID 702911 daemon.notice] VCS ERROR V-16-2-13063 Thread(14) Agent is calling clean for resource(glassfish) because offline did not complete within the expected time.
May 17 12:47:19 pk-ercoss1 Had[5742]: [ID 702911 daemon.notice] VCS ERROR V-16-2-13063 (pk-ercoss1) Agent is calling clean for resource(glassfish) because offline did not complete within the expected time.
May 17 12:47:20 pk-ercoss1 svc.startd[9]: [ID 748625 daemon.error] ericsson/eric_3pp/glassfish:default failed: transitioned to maintenance (see 'svcs -xv' for details)
May 17 12:47:21 pk-ercoss1 AgentFramework[5813]: [ID 702911 daemon.notice] VCS ERROR V-16-2-13068 Thread(14) Resource(glassfish) - clean completed successfully.
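
The in.mpathd entries in the messages file bracket the actual network outage on pk-ercoss1. As a rough illustration, the Python sketch below measures the time from "All Interfaces ... have failed" to the first "NIC repair detected". The regular expression and the helper name `group_outage_seconds` are assumptions made for this example. Syslog lines carry no year, so the year is passed in, and `%b` assumes English month names.

```python
import re
from datetime import datetime

# Syslog prefix as it appears in the messages file above (in.mpathd entries only).
MPATHD_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ in\.mpathd\[\d+\]: "
    r"\[ID \d+ [\w.]+\] (?P<msg>.*)$"
)

def group_outage_seconds(lines, year=2013):
    """Seconds from 'All Interfaces ... have failed' to the first NIC repair."""
    down = up = None
    for line in lines:
        m = MPATHD_RE.match(line.strip())
        if not m:
            continue
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        if down is None and "have failed" in m["msg"]:
            down = ts
        elif down is not None and up is None and "NIC repair detected" in m["msg"]:
            up = ts
    if down is None or up is None:
        return None  # no complete outage window found
    return (up - down).total_seconds()

sample = [
    "May 17 12:41:37 pk-ercoss1 in.mpathd[6024]: [ID 594170 daemon.error] NIC failure detected on oce9 of group pub_mnic",
    "May 17 12:41:38 pk-ercoss1 in.mpathd[6024]: [ID 168056 daemon.error] All Interfaces in group pub_mnic have failed",
    "May 17 12:41:56 pk-ercoss1 in.mpathd[6024]: [ID 299542 daemon.error] NIC repair detected on oce0 of group pub_mnic",
]

print(group_outage_seconds(sample))  # 18.0
```

So the whole IPMP group was down for about 18 seconds on this node, which is short compared with the time the service groups took to settle afterwards.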


hastatus output at present


-- System               State                Frozen              

A  pk-ercoss1           RUNNING              0                    
A  pk-ercoss2           RUNNING              0                    

-- Group           System               Probed     AutoDisabled    State          

B  BkupLan         pk-ercoss1           Y          N               ONLINE         
B  BkupLan         pk-ercoss2           Y          N               ONLINE         
B  DDCMon          pk-ercoss1           Y          N               ONLINE         
B  DDCMon          pk-ercoss2           Y          N               PARTIAL        
B  Oss             pk-ercoss1           Y          N               ONLINE         
B  Oss             pk-ercoss2           Y          N               OFFLINE        
B  Ossfs           pk-ercoss1           Y          N               ONLINE         
B  Ossfs           pk-ercoss2           Y          N               OFFLINE        
B  PrivLan         pk-ercoss1           Y          N               ONLINE         
B  PrivLan         pk-ercoss2           Y          N               ONLINE         
B  PubLan          pk-ercoss1           Y          N               ONLINE         
B  PubLan          pk-ercoss2           Y          N               ONLINE         
B  StorLan         pk-ercoss1           Y          N               ONLINE         
B  StorLan         pk-ercoss2           Y          N               ONLINE         
B  Sybase1         pk-ercoss1           Y          N               OFFLINE        
B  Sybase1         pk-ercoss2           Y          N               ONLINE         
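
To see quickly where each group is running, the group rows of the summary above can be folded into a per-group view. This is only a sketch: `parse_group_states` is a hypothetical helper, and it assumes each group row has exactly six whitespace-separated fields starting with `B`, as in the output pasted here.

```python
def parse_group_states(summary_text):
    """Map group name -> {system: state} from the 'B' rows of a hastatus summary."""
    states = {}
    for line in summary_text.splitlines():
        fields = line.split()
        # Group rows look like: B <group> <system> <probed> <autodisabled> <state>
        if len(fields) == 6 and fields[0] == "B":
            _, group, system, _probed, _autodisabled, state = fields
            states.setdefault(group, {})[system] = state
    return states

# A few rows copied from the output above.
sample = """\
-- Group           System               Probed     AutoDisabled    State
B  Oss             pk-ercoss1           Y          N               ONLINE
B  Oss             pk-ercoss2           Y          N               OFFLINE
B  Sybase1         pk-ercoss1           Y          N               OFFLINE
B  Sybase1         pk-ercoss2           Y          N               ONLINE
B  DDCMon          pk-ercoss2           Y          N               PARTIAL
"""

states = parse_group_states(sample)
for group, per_system in states.items():
    online = [s for s, st in per_system.items() if st == "ONLINE"]
    print(group, "online on:", ", ".join(online) or "nowhere")
```

For the rows shown, this reports Sybase1 online on pk-ercoss2 and Oss online on pk-ercoss1.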


pk-ercoss1{root} # hagrp -resources PubLan
pk-ercoss1{root} # hares -display pub_mnic
#Resource    Attribute              System     Value
pub_mnic     Group                  global     PubLan
pub_mnic     Type                   global     MultiNICB
pub_mnic     AutoStart              global     1
pub_mnic     Critical               global     1
pub_mnic     Enabled                global     1
pub_mnic     LastOnline             global     pk-ercoss2
pub_mnic     MonitorOnly            global     0
pub_mnic     ResourceOwner          global     
pub_mnic     TriggerEvent           global     0
pub_mnic     ArgListValues          pk-ercoss1 UseMpathd        1       1       MpathdCommand   1       /usr/lib/inet/in.mpathd ConfigCheck     1       1       MpathdRestart   1       1       Device  4       oce0    0       oce9    1       NetworkHosts    1    LinkTestRatio   1       1       IgnoreLinkStatus        1       1       NetworkTimeout  1       100     OnlineTestRepeatCount   1       3       OfflineTestRepeatCount  1       3       NoBroadcast     1       0       DefaultRouter   1 Failback        1       0       GroupName       1       ""      Protocol        1       IPv4
pub_mnic     ArgListValues          pk-ercoss2 UseMpathd        1       1       MpathdCommand   1       /usr/lib/inet/in.mpathd ConfigCheck     1       1       MpathdRestart   1       1       Device  4       oce0    0       oce9    1       NetworkHosts    1    LinkTestRatio   1       1       IgnoreLinkStatus        1       1       NetworkTimeout  1       100     OnlineTestRepeatCount   1       3       OfflineTestRepeatCount  1       3       NoBroadcast     1       0       DefaultRouter   1 Failback        1       0       GroupName       1       ""      Protocol        1       IPv4
pub_mnic     ConfidenceLevel        pk-ercoss1 0
pub_mnic     ConfidenceLevel        pk-ercoss2 0
pub_mnic     ConfidenceMsg          pk-ercoss1
pub_mnic     ConfidenceMsg          pk-ercoss2
pub_mnic     Flags                  pk-ercoss1
pub_mnic     Flags                  pk-ercoss2
pub_mnic     IState                 pk-ercoss1 not waiting
pub_mnic     IState                 pk-ercoss2 not waiting
pub_mnic     MonitorMethod          pk-ercoss1 Traditional
pub_mnic     MonitorMethod          pk-ercoss2 Traditional
pub_mnic     Probed                 pk-ercoss1 1
pub_mnic     Probed                 pk-ercoss2 1
pub_mnic     Start                  pk-ercoss1 0
pub_mnic     Start                  pk-ercoss2 0
pub_mnic     State                  pk-ercoss1 ONLINE
pub_mnic     State                  pk-ercoss2 ONLINE
pub_mnic     ComputeStats           global     0
pub_mnic     ConfigCheck            global     1
pub_mnic     DefaultRouter          global
pub_mnic     Failback               global     0
pub_mnic     GroupName              global     
pub_mnic     IgnoreLinkStatus       global     1
pub_mnic     LinkTestRatio          global     1
pub_mnic     MpathdCommand          global     /usr/lib/inet/in.mpathd
pub_mnic     MpathdRestart          global     1
pub_mnic     NetworkHosts           global
pub_mnic     NetworkTimeout         global     100
pub_mnic     NoBroadcast            global     0
pub_mnic     OfflineTestRepeatCount global     3
pub_mnic     OnlineTestRepeatCount  global     3
pub_mnic     Protocol               global     IPv4
pub_mnic     TriggerResStateChange  global     0
pub_mnic     UseMpathd              global     1
pub_mnic     ContainerInfo          pk-ercoss1 Type             Name            Enabled
pub_mnic     ContainerInfo          pk-ercoss2 Type             Name            Enabled
pub_mnic     Device                 pk-ercoss1 oce0     0       oce9    1
pub_mnic     Device                 pk-ercoss2 oce0     0       oce9    1
pub_mnic     MonitorTimeStats       pk-ercoss1 Avg      0       TS      
pub_mnic     MonitorTimeStats       pk-ercoss2 Avg      0       TS      
pub_mnic     ResourceInfo           pk-ercoss1 State    Valid   Msg             TS      
pub_mnic     ResourceInfo           pk-ercoss2 State    Valid   Msg             TS      
pk-ercoss1{root} # hares -display pub_p
#Resource    Attribute             System     Value
pub_p        Group                 global     PubLan
pub_p        Type                  global     Phantom
pub_p        AutoStart             global     1
pub_p        Critical              global     1
pub_p        Enabled               global     1
pub_p        LastOnline            global     pk-ercoss1
pub_p        MonitorOnly           global     0
pub_p        ResourceOwner         global     
pub_p        TriggerEvent          global     0
pub_p        ArgListValues         pk-ercoss1 ""
pub_p        ArgListValues         pk-ercoss2 ""
pub_p        ConfidenceLevel       pk-ercoss1 100
pub_p        ConfidenceLevel       pk-ercoss2 100
pub_p        ConfidenceMsg         pk-ercoss1
pub_p        ConfidenceMsg         pk-ercoss2
pub_p        Flags                 pk-ercoss1
pub_p        Flags                 pk-ercoss2
pub_p        IState                pk-ercoss1 not waiting
pub_p        IState                pk-ercoss2 not waiting
pub_p        MonitorMethod         pk-ercoss1 Traditional
pub_p        MonitorMethod         pk-ercoss2 Traditional
pub_p        Probed                pk-ercoss1 1
pub_p        Probed                pk-ercoss2 1
pub_p        Start                 pk-ercoss1 1
pub_p        Start                 pk-ercoss2 1
pub_p        State                 pk-ercoss1 ONLINE
pub_p        State                 pk-ercoss2 ONLINE
pub_p        ComputeStats          global     0
pub_p        TriggerResStateChange global     0
pub_p        ContainerInfo         pk-ercoss1 Type              Name            Enabled
pub_p        ContainerInfo         pk-ercoss2 Type              Name            Enabled
pub_p        MonitorTimeStats      pk-ercoss1 Avg       0       TS      
pub_p        MonitorTimeStats      pk-ercoss2 Avg       0       TS      
pub_p        ResourceInfo          pk-ercoss1 State     Valid   Msg             TS      
pub_p        ResourceInfo          pk-ercoss2 State     Valid   Msg             TS

  • OK - I have enough for the sequence of events now:


    Solaris (mpathd) detects the network outage:
    May 17 12:41:38 pk-ercoss1 in.mpathd[6024]: [ID 168056 daemon.error] All Interfaces in group pub_mnic have failed
    VCS detects the network outage (which will be within 10 seconds of mpathd detection with the default MonitorInterval = 10 for MultiNICB):
    2013/05/17 12:41:42 VCS ERROR V-16-1-10303 Resource pub_mnic (Owner: Unspecified, Group: PubLan) is FAULTED (timed out) on sys pk-ercoss1
    2013/05/17 12:41:42 VCS ERROR V-16-1-10303 Resource pub_mnic (Owner: Unspecified, Group: PubLan) is FAULTED (timed out) on sys pk-ercoss2
    Resource syb1_p1 in group Sybase1 faults on pk-ercoss2, so VCS faults the group:
    2013/05/17 12:41:50 VCS ERROR V-16-1-10303 Resource syb1_p1 (Owner: Unspecified, Group: Sybase1) is FAULTED (timed out) on sys pk-ercoss2
    Solaris (mpathd) detects the network is fixed:
    May 17 12:41:57 pk-ercoss1 in.mpathd[6024]: [ID 620804 daemon.error] Successfully failed back to NIC oce9
    Group Sybase1 has tried to fail over to pk-ercoss1, but the IP cannot come online as VCS has not yet detected that the network is fixed:
    2013/05/17 12:42:15 VCS ERROR V-16-10001-5004 (pk-ercoss1) IPMultiNICB:syb1_ip:online:Can not online. No interfaces available
    VCS detects the network is fixed (which will be within 60 seconds of mpathd detection with the default OfflineMonitorInterval = 60 for MultiNICB):
    2013/05/17 12:42:42 VCS INFO V-16-1-10299 Resource pub_mnic (Owner: Unspecified, Group: PubLan) is online on pk-ercoss2 (Not initiated by VCS)
    2013/05/17 12:42:43 VCS INFO V-16-1-10299 Resource pub_mnic (Owner: Unspecified, Group: PubLan) is online on pk-ercoss1 (Not initiated by VCS)


    As MultiNICB is a persistent resource, its state changes to online when the NIC is repaired, so it does not require administrative intervention.
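
The timing argument above can be checked with simple arithmetic on the quoted timestamps. The short Python sketch below does that. The interval values 10 and 60 are the defaults mentioned in the reply, and `delay` is a hypothetical helper that assumes both times fall on the same day.

```python
from datetime import datetime

FMT = "%H:%M:%S"

def delay(earlier, later):
    """Whole seconds between two same-day HH:MM:SS timestamps."""
    delta = datetime.strptime(later, FMT) - datetime.strptime(earlier, FMT)
    return int(delta.total_seconds())

# mpathd reports all interfaces failed at 12:41:38; VCS faults pub_mnic at 12:41:42.
fault_delay = delay("12:41:38", "12:41:42")
# mpathd reports failback at 12:41:57; VCS sees pub_mnic online at 12:42:42.
repair_delay = delay("12:41:57", "12:42:42")
# The failed syb1_ip online attempt at 12:42:15 falls inside that repair gap.
attempt_in_gap = delay("12:41:57", "12:42:15") < repair_delay

print(fault_delay, fault_delay <= 10)    # 4 True
print(repair_delay, repair_delay <= 60)  # 45 True
print(attempt_in_gap)                    # True
```

The fault was picked up in 4 seconds, but the repair took 45 seconds to register in VCS, and the Sybase1 failover attempt landed inside that window.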


9 Replies

  • Hi Mike,


    Yes, the NICs on both nodes faulted at the same time, and the PubLan parallel service group faulted on both nodes.

    In such a scenario, how does VCS behave? Since the NIC is a persistent resource and its state changes to online once the NIC is repaired, will it require administrative intervention, or will VCS bring all the parent resources online once the NIC is corrected?


    Below are the further event logs:


    2013/05/17 12:42:05 VCS INFO V-16-1-10305 Resource syblog_mount (Owner: Unspecified, Group: Sybase1) is offline on pk-ercoss2 (VCS initiated)
    2013/05/17 12:42:05 VCS INFO V-16-1-10305 Resource sybdata_mount (Owner: Unspecified, Group: Sybase1) is offline on pk-ercoss2 (VCS initiated)
    2013/05/17 12:42:05 VCS INFO V-16-1-10305 Resource pmsyblog_mount (Owner: Unspecified, Group: Sybase1) is offline on pk-ercoss2 (VCS initiated)
    2013/05/17 12:42:05 VCS INFO V-16-1-10305 Resource pmsybdata_mount (Owner: Unspecified, Group: Sybase1) is offline on pk-ercoss2 (VCS initiated)
    2013/05/17 12:42:05 VCS INFO V-16-1-10305 Resource fmsyblog_mount (Owner: Unspecified, Group: Sybase1) is offline on pk-ercoss2 (VCS initiated)
    2013/05/17 12:42:05 VCS INFO V-16-1-10305 Resource syb1bak_ip (Owner: Unspecified, Group: Sybase1) is offline on pk-ercoss2 (VCS initiated)
    2013/05/17 12:42:06 VCS INFO V-16-1-10305 Resource dbdumps_mount (Owner: Unspecified, Group: Sybase1) is offline on pk-ercoss2 (VCS initiated)
    2013/05/17 12:42:06 VCS INFO V-16-1-10305 Resource fmsybdata_mount (Owner: Unspecified, Group: Sybase1) is offline on pk-ercoss2 (VCS initiated)
    2013/05/17 12:42:06 VCS NOTICE V-16-1-10300 Initiating Offline of Resource sybasedg (Owner: Unspecified, Group: Sybase1) on System pk-ercoss2
    2013/05/17 12:42:10 VCS INFO V-16-1-10305 Resource sybasedg (Owner: Unspecified, Group: Sybase1) is offline on pk-ercoss2 (VCS initiated)
    2013/05/17 12:42:10 VCS ERROR V-16-1-10205 Group Sybase1 is faulted on system pk-ercoss2
    2013/05/17 12:42:10 VCS NOTICE V-16-1-10446 Group Sybase1 is offline on system pk-ercoss2
    2013/05/17 12:42:10 VCS INFO V-16-1-10493 Evaluating pk-ercoss2 as potential target node for group Sybase1
    2013/05/17 12:42:10 VCS INFO V-16-1-50010 Group Sybase1 is online or faulted on system pk-ercoss2
    2013/05/17 12:42:10 VCS INFO V-16-1-10493 Evaluating pk-ercoss1 as potential target node for group Sybase1
    2013/05/17 12:42:10 VCS INFO V-16-6-15025 (pk-ercoss1) hatrigger:invoking nfs_preonline
    2013/05/17 12:42:10 VCS INFO V-16-6-15076 (pk-ercoss1) hatrigger:invoking regular preonline trigger if it exists
    2013/05/17 12:42:10 VCS INFO V-16-6-0 (pk-ercoss2) postoffline:(postoffline) Invoked with arg0=pk-ercoss2, arg1=Sybase1
    2013/05/17 12:42:10 VCS INFO V-16-6-0 (pk-ercoss2) postoffline:Executing /ericsson/core/cluster/scripts/ with arg0=pk-ercoss2, arg
    2013/05/17 12:42:10 VCS INFO V-16-6-0 (pk-ercoss1) preonline:(preonline) Invoked with arg0=pk-ercoss1, arg1=Sybase1, arg2=FAULT, arg3=pk-ercoss2
    2013/05/17 12:42:10 VCS INFO V-16-6-0 (pk-ercoss1) preonline:Executing /ericsson/core/cluster/scripts/ with arg0=pk-ercoss1, arg1=Sy
    2013/05/17 12:42:10 VCS INFO V-16-6-0 (pk-ercoss2) done
    2013/05/17 12:42:10 VCS INFO V-16-6-0 (pk-ercoss2) postoffline:Completed execution of /ericsson/core/cluster/scripts/ for group Sy
    2013/05/17 12:42:10 VCS INFO V-16-6-0 (pk-ercoss1) done
    2013/05/17 12:42:10 VCS INFO V-16-6-15002 (pk-ercoss2) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/postoffline pk-ercoss2 Sybase1   s
    2013/05/17 12:42:10 VCS INFO V-16-6-0 (pk-ercoss1) preonline:Completed execution of /ericsson/core/cluster/scripts/ for group Sybase1. Onlining group Sybase1 with -nopre option
    2013/05/17 12:42:10 VCS INFO V-16-1-50135 User root fired command: hagrp -online Sybase1  pk-ercoss1  pk-ercoss2  from localhost
    2013/05/17 12:42:10 VCS NOTICE V-16-1-10166 Initiating manual online of group Sybase1 on system pk-ercoss1
    2013/05/17 12:42:10 VCS NOTICE V-16-1-10233 Clearing Restart attribute for group Sybase1 on all nodes
    2013/05/17 12:42:10 VCS NOTICE V-16-1-10187 Received -nopre online command for group Sybase1 on system pk-ercoss1
    2013/05/17 12:42:10 VCS NOTICE V-16-1-10301 Initiating Online of Resource sybasedg (Owner: Unspecified, Group: Sybase1) on System pk-ercoss1
    2013/05/17 12:42:10 VCS NOTICE V-16-1-10301 Initiating Online of Resource syb1_ip (Owner: Unspecified, Group: Sybase1) on System pk-ercoss1
    2013/05/17 12:42:10 VCS NOTICE V-16-1-10301 Initiating Online of Resource syb1bak_ip (Owner: Unspecified, Group: Sybase1) on System pk-ercoss1
    2013/05/17 12:42:10 VCS INFO V-16-6-15002 (pk-ercoss1) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/preonline pk-ercoss1 Sybase1 FAULT pk-ercoss2 successfully
    2013/05/17 12:42:11 VCS ERROR V-16-2-13067 (pk-ercoss1) Agent is calling clean for resource(ossfs_ip) because the resource became OFFLINE unexpectedly, on its own.
    2013/05/17 12:42:11 VCS INFO V-16-2-13068 (pk-ercoss1) Resource(ossfs_ip) - clean completed successfully.
    2013/05/17 12:42:11 VCS INFO V-16-1-10307 Resource ossfs_ip (Owner: Unspecified, Group: Ossfs) is offline on pk-ercoss1 (Not initiated by VCS)
    2013/05/17 12:42:11 VCS NOTICE V-16-1-10300 Initiating Offline of Resource stop_oss (Owner: Unspecified, Group: Ossfs) on System pk-ercoss1
    2013/05/17 12:42:11 VCS INFO V-16-6-0 (pk-ercoss1) resfault:(resfault) Invoked with arg0=pk-ercoss1, arg1=ossfs_ip, arg2=ONLINE
    2013/05/17 12:42:11 VCS INFO V-16-0 (pk-ercoss1) resfault:( Invoked with arg0=/ericsson/core/cluster/scripts/, arg1=pk-ercoss1 ,arg2=ossfs_ip
    2013/05/17 12:42:11 VCS INFO V-16-6-15002 (pk-ercoss1) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/resfault pk-ercoss1 ossfs_ip ONLINE  successfully
    2013/05/17 12:42:12 VCS WARNING V-16-10001-1013 (pk-ercoss1) DiskGroup:sybasedg:online:Diskgroups will be imported with reservations
    2013/05/17 12:42:13 VCS INFO V-16-1-10298 Resource syb1bak_ip (Owner: Unspecified, Group: Sybase1) is online on pk-ercoss1 (VCS initiated)
    2013/05/17 12:42:14 VCS NOTICE V-16-10001-1009 (pk-ercoss1) DiskGroup:sybasedg:online:vxdg import succeeded on Disk Group sybasedg
    2013/05/17 12:42:14 VCS NOTICE V-16-10001-1092 (pk-ercoss1) DiskGroup:sybasedg:online:Volumes in Disk Group sybasedg are started automatically as part of import command, the system level autostartvolumes flag is set to on
    2013/05/17 12:42:15 VCS ERROR V-16-10001-5004 (pk-ercoss1) IPMultiNICB:syb1_ip:online:Can not online. No interfaces available


    Why could it not bring syb1_ip online, when we can see above that the NICs were back at 12:41:57?


    2013/05/17 12:42:15 VCS INFO V-16-1-10298 Resource sybasedg (Owner: Unspecified, Group: Sybase1) is online on pk-ercoss1 (VCS initiated)
    2013/05/17 12:42:15 VCS NOTICE V-16-1-10301 Initiating Online of Resource dbdumps_mount (Owner: Unspecified, Group: Sybase1) on System pk-ercoss
    2013/05/17 12:42:15 VCS NOTICE V-16-1-10301 Initiating Online of Resource fmsybdata_mount (Owner: Unspecified, Group: Sybase1) on System pk-erco
    2013/05/17 12:42:15 VCS NOTICE V-16-1-10301 Initiating Online of Resource fmsyblog_mount (Owner: Unspecified, Group: Sybase1) on System pk-ercos
    2013/05/17 12:42:15 VCS NOTICE V-16-1-10301 Initiating Online of Resource pmsybdata_mount (Owner: Unspecified, Group: Sybase1) on System pk-erco
    2013/05/17 12:42:15 VCS NOTICE V-16-1-10301 Initiating Online of Resource pmsyblog_mount (Owner: Unspecified, Group: Sybase1) on System pk-ercos
    2013/05/17 12:42:15 VCS NOTICE V-16-1-10301 Initiating Online of Resource sybdata_mount (Owner: Unspecified, Group: Sybase1) on System pk-ercoss
    2013/05/17 12:42:15 VCS NOTICE V-16-1-10301 Initiating Online of Resource syblog_mount (Owner: Unspecified, Group: Sybase1) on System pk-ercoss1
    2013/05/17 12:42:15 VCS NOTICE V-16-1-10301 Initiating Online of Resource sybmaster_mount (Owner: Unspecified, Group: Sybase1) on System pk-erco
    2013/05/17 12:42:16 VCS INFO V-16-1-50135 User root fired command: hagrp -switch Oss  pk-ercoss2  from localhost
    2013/05/17 12:42:16 VCS NOTICE V-16-1-10208 Initiating switch of group Oss from system pk-ercoss1 to system pk-ercoss2
    2013/05/17 12:42:16 VCS NOTICE V-16-1-10300 Initiating Offline of Resource cron (Owner: Unspecified, Group: Oss) on System pk-ercoss1
    2013/05/17 12:42:16 VCS NOTICE V-16-1-10300 Initiating Offline of Resource fmria (Owner: Unspecified, Group: Oss) on System pk-ercoss1
    2013/05/17 12:42:16 VCS NOTICE V-16-1-10300 Initiating Offline of Resource trapdist (Owner: Unspecified, Group: Oss) on System pk-ercoss1
    2013/05/17 12:42:16 VCS NOTICE V-16-1-10300 Initiating Offline of Resource restart_mc (Owner: Unspecified, Group: Oss) on System pk-ercoss1
    2013/05/17 12:42:16 VCS NOTICE V-16-1-10300 Initiating Offline of Resource activemq (Owner: Unspecified, Group: Oss) on System pk-ercoss1
    2013/05/17 12:42:16 VCS NOTICE V-16-1-10300 Initiating Offline of Resource activemq_oss_loggingbroker (Owner: Unspecified, Group: Oss) on System
    2013/05/17 12:42:16 VCS NOTICE V-16-1-10300 Initiating Offline of Resource apache (Owner: Unspecified, Group: Oss) on System pk-ercoss1
    2013/05/17 12:42:16 VCS NOTICE V-16-1-10300 Initiating Offline of Resource imgr_tomcat (Owner: Unspecified, Group: Oss) on System pk-ercoss1
    2013/05/17 12:42:16 VCS NOTICE V-16-1-10300 Initiating Offline of Resource alex (Owner: Unspecified, Group: Oss) on System pk-ercoss1
    2013/05/17 12:42:16 VCS NOTICE V-16-1-10300 Initiating Offline of Resource vrsnt_log_mon (Owner: Unspecified, Group: Oss) on System pk-ercoss1
    2013/05/17 12:42:16 VCS NOTICE V-16-1-10300 Initiating Offline of Resource syb_log_mon (Owner: Unspecified, Group: Oss) on System pk-ercoss1
    2013/05/17 12:42:16 VCS NOTICE V-16-1-10300 Initiating Offline of Resource syb_proc_mon (Owner: Unspecified, Group: Oss) on System pk-ercoss1
    2013/05/17 12:42:16 VCS NOTICE V-16-1-10300 Initiating Offline of Resource glassfish (Owner: Unspecified, Group: Oss) on System pk-ercoss1
    2013/05/17 12:42:16 VCS INFO V-16-10001-1 (pk-ercoss1) for Oss to go Offline
    2013/05/17 12:42:16 VCS INFO V-16-10001-88 (pk-ercoss1) Application:restart_mc:offline:Executed [/ericsson/core/cluster/scripts/ stop] successfully.
    2013/05/17 12:42:16 VCS INFO V-16-10001-88 (pk-ercoss1) Application:cron:offline:Executed [/ericsson/core/cluster/scripts/ stop] successf
    2013/05/17 12:42:17 VCS INFO V-16-10001-88 (pk-ercoss1) Application:fmria:offline:Executed [/ericsson/core/cluster/scripts/ stop /ericsson/eric_ep/riadaemon:default] successfully.
    2013/05/17 12:42:17 VCS INFO V-16-10001-88 (pk-ercoss1) Application:apache:offline:Executed [/ericsson/core/cluster/scripts/ stop /network/http:apache2] successfully.
    2013/05/17 12:42:17 VCS INFO V-16-10001-88 (pk-ercoss1) Application:trapdist:offline:Executed [/ericsson/core/cluster/scripts/ stop /ericsson/eric_ep/trapd:default] successfully.
    2013/05/17 12:42:17 VCS INFO V-16-10001-88 (pk-ercoss1) Application:alex:offline:Executed [/ericsson/core/cluster/scripts/ stop /ericsson/eric_3pp/alex:default] successfully.
    2013/05/17 12:42:17 VCS INFO V-16-10001-88 (pk-ercoss1) Application:vrsnt_log_mon:offline:Executed [/ericsson/core/cluster/scripts/ stop /ericsson/eric_3pp/versant_log_monitor:default] successfully.
    2013/05/17 12:42:18 VCS INFO V-16-1-10298 Resource fmsybdata_mount (Owner: Unspecified, Group: Sybase1) is online on pk-ercoss1 (VCS initiated)
    2013/05/17 12:42:18 VCS INFO V-16-1-10298 Resource fmsyblog_mount (Owner: Unspecified, Group: Sybase1) is online on pk-ercoss1 (VCS initiated)
    2013/05/17 12:42:18 VCS INFO V-16-10001-88 (pk-ercoss1) Application:syb_log_mon:offline:Executed [/ericsson/core/cluster/scripts/ stop /ericsson/eric_3pp/sybase_log_monitor:default] successfully.
    2013/05/17 12:42:18 VCS INFO V-16-1-10298 Resource dbdumps_mount (Owner: Unspecified, Group: Sybase1) is online on pk-ercoss1 (VCS initiated)
    2013/05/17 12:42:18 VCS INFO V-16-1-10298 Resource pmsybdata_mount (Owner: Unspecified, Group: Sybase1) is online on pk-ercoss1 (VCS initiated)
    2013/05/17 12:42:18 VCS INFO V-16-1-10298 Resource pmsyblog_mount (Owner: Unspecified, Group: Sybase1) is online on pk-ercoss1 (VCS initiated)
    2013/05/17 12:42:18 VCS INFO V-16-1-10298 Resource sybdata_mount (Owner: Unspecified, Group: Sybase1) is online on pk-ercoss1 (VCS initiated)
    2013/05/17 12:42:18 VCS INFO V-16-1-10298 Resource syblog_mount (Owner: Unspecified, Group: Sybase1) is online on pk-ercoss1 (VCS initiated)
    2013/05/17 12:42:18 VCS INFO V-16-1-10298 Resource sybmaster_mount (Owner: Unspecified, Group: Sybase1) is online on pk-ercoss1 (VCS initiated)
    2013/05/17 12:42:18 VCS INFO V-16-10001-88 (pk-ercoss1) Application:syb_proc_mon:offline:Executed [/ericsson/core/cluster/scripts/ stop /ericsson/eric_3pp/sybase_process_monitor:default] successfully.
    2013/05/17 12:42:19 VCS ERROR V-16-2-13067 (pk-ercoss1) Agent is calling clean for resource(snmp_ip) because the resource became OFFLINE unexpectedly, on its own.
    2013/05/17 12:42:19 VCS INFO V-16-2-13068 (pk-ercoss1) Resource(snmp_ip) - clean completed successfully.
    2013/05/17 12:42:19 VCS INFO V-16-1-10307 Resource snmp_ip (Owner: Unspecified, Group: Ossfs) is offline on pk-ercoss1 (Not initiated by VCS)
    2013/05/17 12:42:19 VCS INFO V-16-6-0 (pk-ercoss1) resfault:(resfault) Invoked with arg0=pk-ercoss1, arg1=snmp_ip, arg2=ONLINE
    2013/05/17 12:42:19 VCS INFO V-16-0 (pk-ercoss1) resfault:( Invoked with arg0=/ericsson/core/cluster/scripts/, arg1=pk-ercoss1 ,arg2=snmp_ip
    2013/05/17 12:42:19 VCS INFO V-16-6-15002 (pk-ercoss1) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/resfault pk-ercoss1 snmp_ip ONLINE
    2013/05/17 12:42:19 VCS INFO V-16-1-10305 Resource restart_mc (Owner: Unspecified, Group: Oss) is offline on pk-ercoss1 (VCS initiated)
    2013/05/17 12:42:19 VCS NOTICE V-16-1-10300 Initiating Offline of Resource smssr (Owner: Unspecified, Group: Oss) on System pk-ercoss1
    2013/05/17 12:42:20 VCS INFO V-16-10001-88 (pk-ercoss1) Application:imgr_tomcat:offline:Executed [/ericsson/core/cluster/scripts/ stop /ericsson/eric_3pp/tomcat:default] successfully.
    2013/05/17 12:42:20 VCS INFO V-16-1-10305 Resource apache (Owner: Unspecified, Group: Oss) is offline on pk-ercoss1 (VCS initiated)
    2013/05/17 12:42:20 VCS INFO V-16-1-10305 Resource cron (Owner: Unspecified, Group: Oss) is offline on pk-ercoss1 (VCS initiated)
    2013/05/17 12:42:21 VCS ERROR V-16-1-10303 Resource ossfs_p1 (Owner: Unspecified, Group: Ossfs) is FAULTED (timed out) on sys pk-ercoss1
    2013/05/17 12:42:21 VCS ERROR V-16-1-10303 Resource syb1_p1 (Owner: Unspecified, Group: Sybase1) is FAULTED (timed out) on sys pk-ercoss1
    2013/05/17 12:42:21 VCS INFO V-16-1-10305 Resource trapdist (Owner: Unspecified, Group: Oss) is offline on pk-ercoss1 (VCS initiated)
    2013/05/17 12:42:21 VCS INFO V-16-6-0 (pk-ercoss1) resfault:(resfault) Invoked with arg0=pk-ercoss1, arg1=syb1_p1, arg2=ONLINE
    2013/05/17 12:42:21 VCS INFO V-16-6-0 (pk-ercoss1) resfault:(resfault) Invoked with arg0=pk-ercoss1, arg1=ossfs_p1, arg2=ONLINE
    2013/05/17 12:42:21 VCS INFO V-16-0 (pk-ercoss1) resfault:( Invoked with arg0=/ericsson/core/cluster/scripts/, arg1=pk-ercoss1 ,arg2=syb1_p1
    2013/05/17 12:42:21 VCS INFO V-16-0 (pk-ercoss1) resfault:( Invoked with arg0=/ericsson/core/cluster/scripts/, arg1=pk-ercoss1 ,arg2=ossfs_p1
    2013/05/17 12:42:21 VCS INFO V-16-6-15002 (pk-ercoss1) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/resfault pk-ercoss1 syb1_p1 ONLINE
    2013/05/17 12:42:21 VCS INFO V-16-6-15002 (pk-ercoss1) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/resfault pk-ercoss1 ossfs_p1 ONLINE  successfully
    2013/05/17 12:42:21 VCS INFO V-16-1-10305 Resource alex (Owner: Unspecified, Group: Oss) is offline on pk-ercoss1 (VCS initiated)
    2013/05/17 12:42:21 VCS INFO V-16-1-10305 Resource fmria (Owner: Unspecified, Group: Oss) is offline on pk-ercoss1 (VCS initiated)
    2013/05/17 12:42:21 VCS INFO V-16-10001-88 (pk-ercoss1) Application:smssr:offline:Executed [/ericsson/core/cluster/scripts/ stop /ericsson/ossrc/ssr:default] successfully.
    2013/05/17 12:42:22 VCS INFO V-16-1-10305 Resource syb_log_mon (Owner: Unspecified, Group: Oss) is offline on pk-ercoss1 (VCS initiated)
    2013/05/17 12:42:22 VCS INFO V-16-1-10305 Resource vrsnt_log_mon (Owner: Unspecified, Group: Oss) is offline on pk-ercoss1 (VCS initiated)
    2013/05/17 12:42:23 VCS INFO V-16-1-10305 Resource syb_proc_mon (Owner: Unspecified, Group: Oss) is offline on pk-ercoss1 (VCS initiated)
    2013/05/17 12:42:23 VCS INFO V-16-1-10305 Resource imgr_tomcat (Owner: Unspecified, Group: Oss) is offline on pk-ercoss1 (VCS initiated)
    2013/05/17 12:42:24 VCS INFO V-16-1-10305 Resource smssr (Owner: Unspecified, Group: Oss) is offline on pk-ercoss1 (VCS initiated)
    2013/05/17 12:42:24 VCS NOTICE V-16-1-10300 Initiating Offline of Resource supervisor (Owner: Unspecified, Group: Oss) on System pk-ercoss1
    2013/05/17 12:42:27 VCS INFO V-16-10001-88 (pk-ercoss1) Application:supervisor:offline:Executed [/ericsson/core/cluster/scripts/ stop /ericsson/ossrc/ssrProcessSupervisor:default] successfully.
    2013/05/17 12:42:30 VCS INFO V-16-1-10305 Resource supervisor (Owner: Unspecified, Group: Oss) is offline on pk-ercoss1 (VCS initiated)
    2013/05/17 12:42:30 VCS NOTICE V-16-1-10300 Initiating Offline of Resource tbs (Owner: Unspecified, Group: Oss) on System pk-ercoss1
    2013/05/17 12:42:42 VCS INFO V-16-1-10299 Resource pub_mnic (Owner: Unspecified, Group: PubLan) is online on pk-ercoss2 (Not initiated by VCS)
    2013/05/17 12:42:42 VCS NOTICE V-16-1-10229 Group PubLan - Trying to online resources of group that were online prior to fault on node pk-ercoss2 . Persistent resource went online on node pk-ercoss2
    2013/05/17 12:42:42 VCS NOTICE V-16-1-10301 Initiating Online of Resource pub_p (Owner: Unspecified, Group: PubLan) on System pk-ercoss2
    2013/05/17 12:42:43 VCS INFO V-16-1-10299 Resource pub_mnic (Owner: Unspecified, Group: PubLan) is online on pk-ercoss1 (Not initiated by VCS)
    2013/05/17 12:42:43 VCS NOTICE V-16-1-10229 Group PubLan - Trying to online resources of group that were online prior to fault on node pk-ercoss1 . Persistent resource went online on node pk-ercoss1
    2013/05/17 12:42:43 VCS NOTICE V-16-1-10301 Initiating Online of Resource pub_p (Owner: Unspecified, Group: PubLan) on System pk-ercoss1
    2013/05/17 12:42:43 VCS INFO V-16-1-10298 Resource pub_p (Owner: Unspecified, Group: PubLan) is online on pk-ercoss2 (VCS initiated)
    2013/05/17 12:42:43 VCS NOTICE V-16-1-10447 Group PubLan is online on system pk-ercoss2
    2013/05/17 12:42:44 VCS INFO V-16-6-0 (pk-ercoss2) postonline:(postonline) Invoked with arg0=pk-ercoss2, arg1=PubLan
    2013/05/17 12:42:44 VCS INFO V-16-6-0 (pk-ercoss2) postonline:Executing /ericsson/core/cluster/scripts/ with arg0=pk-ercoss2, arg1=
    2013/05/17 12:42:44 VCS INFO V-16-6-0 (pk-ercoss2) postonline:Completed execution of /ericsson/core/cluster/scripts/ for group PubL
    2013/05/17 12:42:44 VCS INFO V-16-6-15002 (pk-ercoss2) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/postonline pk-ercoss2 PubLan   suc
    2013/05/17 12:42:44 VCS INFO V-16-1-10298 Resource pub_p (Owner: Unspecified, Group: PubLan) is online on pk-ercoss1 (VCS initiated)
    2013/05/17 12:42:44 VCS NOTICE V-16-1-10447 Group PubLan is online on system pk-ercoss1
    2013/05/17 12:42:45 VCS INFO V-16-6-0 (pk-ercoss1) postonline:(postonline) Invoked with arg0=pk-ercoss1, arg1=PubLan
    2013/05/17 12:42:45 VCS INFO V-16-6-0 (pk-ercoss1) postonline:Executing /ericsson/core/cluster/scripts/ with arg0=pk-ercoss1, arg1=
    2013/05/17 12:42:45 VCS INFO V-16-1-50135 User root fired command: hares -clear ossfs_ip  from localhost
    2013/05/17 12:42:45 VCS INFO V-16-1-50135 User root fired command: hares -clear syb1_ip  from localhost
    2013/05/17 12:42:45 VCS INFO V-16-1-50135 User root fired command: hares -clear snmp_ip  from localhost
    2013/05/17 12:42:45 VCS INFO V-16-6-0 (pk-ercoss1) postonline:Completed execution of /ericsson/core/cluster/scripts/ for group PubL
    2013/05/17 12:42:45 VCS INFO V-16-6-15002 (pk-ercoss1) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/postonline pk-ercoss1 PubLan   suc
    2013/05/17 12:42:45 VCS INFO V-16-1-50135 User root fired command: haconf -makerw from localhost
    2013/05/17 12:42:45 VCS INFO V-16-1-50135 User root fired command: hatype -modify Mount  RestartLimit  2  from localhost
    2013/05/17 12:42:45 VCS INFO V-16-1-50135 User root fired command: haconf -dump -makero from localhost
    2013/05/17 12:42:50 VCS INFO V-16-10001-88 (pk-ercoss1) Application:activemq_oss_loggingbroker:offline:Executed [/ericsson/core/cluster/scripts/ stop /ericsson/eric_3pp/activemq_oss_loggingbroker:default] successfully.
    2013/05/17 12:42:50 VCS INFO V-16-10001-88 (pk-ercoss1) Application:activemq:offline:Executed [/ericsson/core/cluster/scripts/ stop /ericsson/eric_3pp/activemq:default] successfully.
    2013/05/17 12:42:50 VCS INFO V-16-2-13075 (pk-ercoss1) Resource(tomcat) has reported unexpected OFFLINE 1 times, which is still within the Toler
    2013/05/17 12:42:50 VCS INFO V-16-1-10299 Resource ossfs_p1 (Owner: Unspecified, Group: Ossfs) is online on pk-ercoss2 (Not initiated by VCS)
    2013/05/17 12:42:50 VCS NOTICE V-16-1-10233 Clearing Restart attribute for group Ossfs on all nodes
    2013/05/17 12:42:50 VCS NOTICE V-16-1-51034 Failover group Ossfs is already active. Ignoring Restart
    2013/05/17 12:42:50 VCS INFO V-16-1-10299 Resource syb1_p1 (Owner: Unspecified, Group: Sybase1) is online on pk-ercoss2 (Not initiated by VCS)
    2013/05/17 12:42:50 VCS NOTICE V-16-1-10233 Clearing Restart attribute for group Sybase1 on all nodes
    2013/05/17 12:42:50 VCS NOTICE V-16-1-51034 Failover group Sybase1 is already active. Ignoring Restart
    2013/05/17 12:42:52 VCS INFO V-16-1-10305 Resource activemq (Owner: Unspecified, Group: Oss) is offline on pk-ercoss1 (VCS initiated)
    2013/05/17 12:42:52 VCS INFO V-16-1-10305 Resource activemq_oss_loggingbroker (Owner: Unspecified, Group: Oss) is offline on pk-ercoss1 (VCS ini
    2013/05/17 12:43:16 VCS INFO V-16-10001-1 (pk-ercoss1) for Oss to go Offline
    2013/05/17 12:43:16 VCS ERROR V-16-2-13066 (pk-ercoss1) Agent is calling clean for resource(syb1_ip) because the resource is not up even after online completed.
    2013/05/17 12:43:16 VCS INFO V-16-2-13068 (pk-ercoss1) Resource(syb1_ip) - clean completed successfully.
    2013/05/17 12:43:16 VCS INFO V-16-2-13072 (pk-ercoss1) Resource(syb1_ip): Agent is retrying online (attempt number 1 of 1).
    2013/05/17 12:43:21 VCS INFO V-16-1-10299 Resource syb1_p1 (Owner: Unspecified, Group: Sybase1) is online on pk-ercoss1 (Not initiated by VCS)
    2013/05/17 12:43:21 VCS NOTICE V-16-1-10233 Clearing Restart attribute for group Sybase1 on all nodes
    2013/05/17 12:43:21 VCS NOTICE V-16-1-51034 Failover group Sybase1 is already active. Ignoring Restart
    2013/05/17 12:43:21 VCS INFO V-16-1-10299 Resource ossfs_p1 (Owner: Unspecified, Group: Ossfs) is online on pk-ercoss1 (Not initiated by VCS)
    2013/05/17 12:43:21 VCS NOTICE V-16-1-10233 Clearing Restart attribute for group Ossfs on all nodes
    2013/05/17 12:43:21 VCS NOTICE V-16-1-51034 Failover group Ossfs is already active. Ignoring Restart
    2013/05/17 12:43:21 VCS INFO V-16-1-10298 Resource syb1_ip (Owner: Unspecified, Group: Sybase1) is online on pk-ercoss1 (VCS initiated)
    2013/05/17 12:43:21 VCS NOTICE V-16-1-10300 Initiating Offline of Resource syb1_ip (Owner: Unspecified, Group: Sybase1) on System pk-ercoss1
    2013/05/17 12:43:21 VCS INFO V-16-1-10305 Resource syb1_ip (Owner: Unspecified, Group: Sybase1) is offline on pk-ercoss1 (VCS initiated)
    2013/05/17 12:43:21 VCS NOTICE V-16-1-10300 Initiating Offline of Resource sybmaster_mount (Owner: Unspecified, Group: Sybase1) on System pk-erc
    2013/05/17 12:43:21 VCS NOTICE V-16-1-10300 Initiating Offline of Resource syblog_mount (Owner: Unspecified, Group: Sybase1) on System pk-ercoss
    2013/05/17 12:43:21 VCS NOTICE V-16-1-10300 Initiating Offline of Resource sybdata_mount (Owner: Unspecified, Group: Sybase1) on System pk-ercos
    2013/05/17 12:43:21 VCS NOTICE V-16-1-10300 Initiating Offline of Resource pmsyblog_mount (Owner: Unspecified, Group: Sybase1) on System pk-erco
    2013/05/17 12:43:21 VCS NOTICE V-16-1-10300 Initiating Offline of Resource pmsybdata_mount (Owner: Unspecified, Group: Sybase1) on System pk-erc
    2013/05/17 12:43:21 VCS NOTICE V-16-1-10300 Initiating Offline of Resource fmsyblog_mount (Owner: Unspecified, Group: Sybase1) on System pk-erco
    2013/05/17 12:43:21 VCS NOTICE V-16-1-10300 Initiating Offline of Resource fmsybdata_mount (Owner: Unspecified, Group: Sybase1) on System pk-erc
    2013/05/17 12:43:21 VCS NOTICE V-16-1-10300 Initiating Offline of Resource dbdumps_mount (Owner: Unspecified, Group: Sybase1) on System pk-ercos
    2013/05/17 12:43:21 VCS NOTICE V-16-1-10300 Initiating Offline of Resource syb1bak_ip (Owner: Unspecified, Group: Sybase1) on System pk-ercoss1
    2013/05/17 12:43:22 VCS INFO V-16-1-10305 Resource sybmaster_mount (Owner: Unspecified, Group: Sybase1) is offline on pk-ercoss1 (VCS initiated)
    2013/05/17 12:43:22 VCS INFO V-16-1-10305 Resource syblog_mount (Owner: Unspecified, Group: Sybase1) is offline on pk-ercoss1 (VCS initiated)
    2013/05/17 12:43:22 VCS INFO V-16-1-10305 Resource sybdata_mount (Owner: Unspecified, Group: Sybase1) is offline on pk-ercoss1 (VCS initiated)
    2013/05/17 12:43:22 VCS INFO V-16-1-10305 Resource pmsyblog_mount (Owner: Unspecified, Group: Sybase1) is offline on pk-ercoss1 (VCS initiated)
    2013/05/17 12:43:24 VCS INFO V-16-1-10305 Resource syb1bak_ip (Owner: Unspecified, Group: Sybase1) is offline on pk-ercoss1 (VCS initiated)
    2013/05/17 12:43:24 VCS INFO V-16-1-10305 Resource dbdumps_mount (Owner: Unspecified, Group: Sybase1) is offline on pk-ercoss1 (VCS initiated)
    2013/05/17 12:43:24 VCS INFO V-16-1-10305 Resource fmsyblog_mount (Owner: Unspecified, Group: Sybase1) is offline on pk-ercoss1 (VCS initiated)
    2013/05/17 12:43:24 VCS INFO V-16-1-10305 Resource fmsybdata_mount (Owner: Unspecified, Group: Sybase1) is offline on pk-ercoss1 (VCS initiated)
    2013/05/17 12:43:24 VCS INFO V-16-1-10305 Resource pmsybdata_mount (Owner: Unspecified, Group: Sybase1) is offline on pk-ercoss1 (VCS initiated)
    2013/05/17 12:43:24 VCS NOTICE V-16-1-10300 Initiating Offline of Resource sybasedg (Owner: Unspecified, Group: Sybase1) on System pk-ercoss1
    2013/05/17 12:43:27 VCS INFO V-16-1-10305 Resource sybasedg (Owner: Unspecified, Group: Sybase1) is offline on pk-ercoss1 (VCS initiated)
    2013/05/17 12:43:27 VCS NOTICE V-16-1-10446 Group Sybase1 is offline on system pk-ercoss1
    2013/05/17 12:43:27 VCS INFO V-16-1-10493 Evaluating pk-ercoss2 as potential target node for group Sybase1
    2013/05/17 12:43:27 VCS INFO V-16-1-10493 Evaluating pk-ercoss1 as potential target node for group Sybase1
    2013/05/17 12:43:27 VCS INFO V-16-1-50017 MigrateQ for group Sybase1 contains system pk-ercoss1 with ALLOWNODE set; choosing it as the best syst
    2013/05/17 12:43:27 VCS INFO V-16-6-15025 (pk-ercoss1) hatrigger:invoking nfs_preonline
    2013/05/17 12:43:27 VCS INFO V-16-6-0 (pk-ercoss1) postoffline:(postoffline) Invoked with arg0=pk-ercoss1, arg1=Sybase1
    2013/05/17 12:43:27 VCS INFO V-16-6-15076 (pk-ercoss1) hatrigger:invoking regular preonline trigger if it exists
    2013/05/17 12:43:27 VCS INFO V-16-6-0 (pk-ercoss1) postoffline:Executing /ericsson/core/cluster/scripts/ with arg0=pk-ercoss1, arg
    2013/05/17 12:43:28 VCS INFO V-16-6-0 (pk-ercoss1) preonline:(preonline) Invoked with arg0=pk-ercoss1, arg1=Sybase1
    2013/05/17 12:43:28 VCS INFO V-16-6-0 (pk-ercoss1) done
    2013/05/17 12:43:28 VCS INFO V-16-6-0 (pk-ercoss1) preonline:Executing /ericsson/core/cluster/scripts/ with arg0=pk-ercoss1, arg1=Sy
    2013/05/17 12:43:28 VCS INFO V-16-6-0 (pk-ercoss1) postoffline:Completed execution of /ericsson/core/cluster/scripts/ for group Sy
    2013/05/17 12:43:28 VCS INFO V-16-6-15002 (pk-ercoss1) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/postoffline pk-ercoss1 Sybase1   s
    2013/05/17 12:43:28 VCS INFO V-16-6-0 (pk-ercoss1) done
    2013/05/17 12:43:28 VCS INFO V-16-6-0 (pk-ercoss1) preonline:Completed execution of /ericsson/core/cluster/scripts/ for group Sybase1. Onlining group Sybase1 with -nopre option
    2013/05/17 12:43:28 VCS INFO V-16-1-50135 User root fired command: hagrp -online Sybase1  pk-ercoss1  from localhost
    2013/05/17 12:43:28 VCS NOTICE V-16-1-10166 Initiating manual online of group Sybase1 on system pk-ercoss1
    2013/05/17 12:43:28 VCS NOTICE V-16-1-10233 Clearing Restart attribute for group Sybase1 on all nodes
    2013/05/17 12:43:28 VCS NOTICE V-16-1-10187 Received -nopre online command for group Sybase1 on system pk-ercoss1
    2013/05/17 12:43:28 VCS NOTICE V-16-1-10301 Initiating Online of Resource sybasedg (Owner: Unspecified, Group: Sybase1) on System pk-ercoss1
    2013/05/17 12:43:28 VCS NOTICE V-16-1-10301 Initiating Online of Resource syb1_ip (Owner: Unspecified, Group: Sybase1) on System pk-ercoss1
    2013/05/17 12:43:28 VCS NOTICE V-16-1-10301 Initiating Online of Resource syb1bak_ip (Owner: Unspecified, Group: Sybase1) on System pk-ercoss1

    2013/05/17 12:43:31 VCS NOTICE V-16-1-10301 Initiating Online of Resource dbdumps_mount (Owner: Unspecified, Group: Sybase1) on System pk-ercoss
    2013/05/17 12:43:31 VCS NOTICE V-16-1-10301 Initiating Online of Resource fmsybdata_mount (Owner: Unspecified, Group: Sybase1) on System pk-erco
    2013/05/17 12:43:31 VCS NOTICE V-16-1-10301 Initiating Online of Resource fmsyblog_mount (Owner: Unspecified, Group: Sybase1) on System pk-ercos
    2013/05/17 12:43:31 VCS NOTICE V-16-1-10301 Initiating Online of Resource pmsybdata_mount (Owner: Unspecified, Group: Sybase1) on System pk-erco
    2013/05/17 12:43:31 VCS NOTICE V-16-1-10301 Initiating Online of Resource pmsyblog_mount (Owner: Unspecified, Group: Sybase1) on System pk-ercos
    2013/05/17 12:43:31 VCS NOTICE V-16-1-10301 Initiating Online of Resource sybdata_mount (Owner: Unspecified, Group: Sybase1) on System pk-ercoss
    2013/05/17 12:43:31 VCS NOTICE V-16-1-10301 Initiating Online of Resource syblog_mount (Owner: Unspecified, Group: Sybase1) on System pk-ercoss1
    2013/05/17 12:43:31 VCS NOTICE V-16-1-10301 Initiating Online of Resource sybmaster_mount (Owner: Unspecified, Group: Sybase1) on System pk-erco
    2013/05/17 12:43:32 VCS INFO V-16-1-10298 Resource dbdumps_mount (Owner: Unspecified, Group: Sybase1) is online on pk-ercoss1 (VCS initiated)
    2013/05/17 12:43:33 VCS INFO V-16-1-10298 Resource syb1_ip (Owner: Unspecified, Group: Sybase1) is online on pk-ercoss1 (VCS initiated)
    2013/05/17 12:43:33 VCS INFO V-16-1-10298 Resource fmsybdata_mount (Owner: Unspecified, Group: Sybase1) is online on pk-ercoss1 (VCS initiated)
    2013/05/17 12:43:33 VCS INFO V-16-1-10298 Resource fmsyblog_mount (Owner: Unspecified, Group: Sybase1) is online on pk-ercoss1 (VCS initiated)
    2013/05/17 12:43:33 VCS INFO V-16-1-10298 Resource pmsybdata_mount (Owner: Unspecified, Group: Sybase1) is online on pk-ercoss1 (VCS initiated)
    2013/05/17 12:43:33 VCS INFO V-16-1-10298 Resource pmsyblog_mount (Owner: Unspecified, Group: Sybase1) is online on pk-ercoss1 (VCS initiated)
    2013/05/17 12:43:33 VCS INFO V-16-1-10298 Resource sybdata_mount (Owner: Unspecified, Group: Sybase1) is online on pk-ercoss1 (VCS initiated)
    2013/05/17 12:43:33 VCS INFO V-16-1-10298 Resource syblog_mount (Owner: Unspecified, Group: Sybase1) is online on pk-ercoss1 (VCS initiated)
    2013/05/17 12:43:33 VCS INFO V-16-1-10298 Resource sybmaster_mount (Owner: Unspecified, Group: Sybase1) is online on pk-ercoss1 (VCS initiated)
    2013/05/17 12:43:33 VCS NOTICE V-16-1-10301 Initiating Online of Resource masterdataservice (Owner: Unspecified, Group: Sybase1) on System pk-er
    2013/05/17 12:43:44 VCS INFO V-16-20018-7 (pk-ercoss1) Sybase:masterdataservice:monitor:Setting cookie for proc = /sybase/ASE-15_0/bin/dataserve
    r -smasterdataservice -e/ossrc/sybdev/sybmaster/v, PID = /proc/7534/psinfo
    2013/05/17 12:43:44 VCS INFO V-16-1-10298 Resource masterdataservice (Owner: Unspecified, Group: Sybase1) is online on pk-ercoss1 (VCS initiated
    2013/05/17 12:43:44 VCS NOTICE V-16-1-10301 Initiating Online of Resource masterdataservice_BACKUP (Owner: Unspecified, Group: Sybase1) on Syste
    m pk-ercoss1
    2013/05/17 12:43:50 VCS INFO V-16-2-13075 (pk-ercoss1) Resource(tomcat) has reported unexpected OFFLINE 2 times, which is still within the Toler
    2013/05/17 12:43:55 VCS INFO V-16-20018-7 (pk-ercoss1) SybaseBk:masterdataservice_BACKUP:monitor:Setting cookie for proc = /sybase/ASE-15_0/bin/
    backupserver -Smasterdataservice_BACKUP -N25 -V1 -C20 -M/o, PID = /proc/7785/psinfo
    2013/05/17 12:43:56 VCS INFO V-16-1-10298 Resource masterdataservice_BACKUP (Owner: Unspecified, Group: Sybase1) is online on pk-ercoss1 (VCS in
    2013/05/17 12:43:56 VCS NOTICE V-16-1-10301 Initiating Online of Resource stop_sybase (Owner: Unspecified, Group: Sybase1) on System pk-ercoss1
    2013/05/17 12:43:56 VCS INFO V-16-10001-88 (pk-ercoss1) Application:stop_sybase:online:Executed [/ericsson/core/cluster/scripts/ s
    tart] successfully.
    2013/05/17 12:43:59 VCS INFO V-16-1-10298 Resource stop_sybase (Owner: Unspecified, Group: Sybase1) is online on pk-ercoss1 (VCS initiated)
    2013/05/17 12:43:59 VCS NOTICE V-16-1-10447 Group Sybase1 is online on system pk-ercoss1
    2013/05/17 12:43:59 VCS INFO V-16-6-0 (pk-ercoss1) postonline:(postonline) Invoked with arg0=pk-ercoss1, arg1=Sybase1
    2013/05/17 12:43:59 VCS INFO V-16-6-0 (pk-ercoss1) postonline:Executing /ericsson/core/cluster/scripts/ with arg0=pk-ercoss1, arg1=
    2013/05/17 12:43:59 VCS INFO V-16-6-0 (pk-ercoss1) UP: 7534
    2013/05/17 12:43:59 VCS INFO V-16-6-0 (pk-ercoss1) postonline:Completed execution of /ericsson/core/cluster/scripts/ for group Syba
    2013/05/17 12:43:59 VCS INFO V-16-6-15002 (pk-ercoss1) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/postonline pk-ercoss1 Sybase1   su
    2013/05/17 12:44:16 VCS INFO V-16-10001-1 (pk-ercoss1) for Oss to go Offline
    2013/05/17 12:44:49 VCS ERROR V-16-2-13067 (pk-ercoss1) Agent is calling clean for resource(tomcat) because the resource became OFFLINE unexpect
    edly, on its own.
    2013/05/17 12:44:49 VCS INFO V-16-2-13068 (pk-ercoss1) Resource(tomcat) - clean completed successfully.
    2013/05/17 12:44:49 VCS ERROR V-16-2-13073 (pk-ercoss1) Resource(tomcat) became OFFLINE unexpectedly on its own. Agent is restarting (attempt nu
    mber 1 of 2) the resource.
    2013/05/17 12:44:49 VCS INFO V-16-10001-88 (pk-ercoss1) Application:tomcat:online:Executed [/ericsson/core/cluster/scripts/ start /ericsso
    n/eric_3pp/tomcat:default] successfully.
    2013/05/17 12:44:50 VCS INFO V-16-2-13716 (pk-ercoss1) Resource(tomcat): Output of the completed operation (online)
    2013/05/17 12:44:52 VCS NOTICE V-16-2-13076 (pk-ercoss1) Agent has successfully restarted resource(tomcat).
    2013/05/17 12:44:52 VCS INFO V-16-1-55031 Resource tomcat in online state received recurring online message on system pk-ercoss1
    2013/05/17 12:45:16 VCS INFO V-16-10001-1 (pk-ercoss1) for Oss to go Offline
    2013/05/17 12:45:32 VCS INFO V-16-10001-88 (pk-ercoss1) Application:tbs:offline:Executed [/ericsson/core/cluster/scripts/ stop /ericsson/e
    ric_ep/TBS:default] successfully.
    2013/05/17 12:45:33 VCS INFO V-16-2-13716 (pk-ercoss1) Resource(tbs): Output of the completed operation (offline)
    2013/05/17 12:45:35 VCS INFO V-16-1-10305 Resource tbs (Owner: Unspecified, Group: Oss) is offline on pk-ercoss1 (VCS initiated)
    2013/05/17 12:45:35 VCS NOTICE V-16-1-10300 Initiating Offline of Resource opendj (Owner: Unspecified, Group: Oss) on System pk-ercoss1
    2013/05/17 12:45:35 VCS NOTICE V-16-1-10300 Initiating Offline of Resource nsa (Owner: Unspecified, Group: Oss) on System pk-ercoss1
    2013/05/17 12:45:35 VCS NOTICE V-16-1-10300 Initiating Offline of Resource sb_nsa (Owner: Unspecified, Group: Oss) on System pk-ercoss1
    2013/05/17 12:45:35 VCS NOTICE V-16-1-10300 Initiating Offline of Resource ext_nsa (Owner: Unspecified, Group: Oss) on System pk-ercoss1
    2013/05/17 12:45:35 VCS NOTICE V-16-1-10300 Initiating Offline of Resource oad (Owner: Unspecified, Group: Oss) on System pk-ercoss1
    2013/05/17 12:45:35 VCS NOTICE V-16-1-10300 Initiating Offline of Resource osagent (Owner: Unspecified, Group: Oss) on System pk-ercoss1
    2013/05/17 12:45:35 VCS NOTICE V-16-1-10300 Initiating Offline of Resource tomcat (Owner: Unspecified, Group: Oss) on System pk-ercoss1
    2013/05/17 12:45:35 VCS NOTICE V-16-1-10300 Initiating Offline of Resource notif (Owner: Unspecified, Group: Oss) on System pk-ercoss1
    2013/05/17 12:45:35 VCS NOTICE V-16-1-10300 Initiating Offline of Resource ext_notif (Owner: Unspecified, Group: Oss) on System pk-ercoss1
    2013/05/17 12:45:35 VCS NOTICE V-16-1-10300 Initiating Offline of Resource sentinel (Owner: Unspecified, Group: Oss) on System pk-ercoss1
    2013/05/17 12:45:35 VCS NOTICE V-16-1-10300 Initiating Offline of Resource rmi_reg (Owner: Unspecified, Group: Oss) on System pk-ercoss1
    2013/05/17 12:45:35 VCS NOTICE V-16-1-10300 Initiating Offline of Resource rmi_reg_ext (Owner: Unspecified, Group: Oss) on System pk-ercoss1
    2013/05/17 12:45:35 VCS NOTICE V-16-1-10300 Initiating Offline of Resource time_service (Owner: Unspecified, Group: Oss) on System pk-ercoss1
    2013/05/17 12:45:35 VCS NOTICE V-16-1-10300 Initiating Offline of Resource log_service (Owner: Unspecified, Group: Oss) on System pk-ercoss1
    2013/05/17 12:45:37 VCS INFO V-16-10001-88 (pk-ercoss1) Application:osagent:offline:Executed [/ericsson/core/cluster/scripts/ stop /ericss
    on/eric_3pp/borland_osagent:default] successfully.
    2013/05/17 12:45:37 VCS INFO V-16-10001-88 (pk-ercoss1) Application:sb_nsa:offline:Executed [/ericsson/core/cluster/scripts/ stop /ericsso
    n/eric_3pp/borland_sbnameservice:default] successfully.
    2013/05/17 12:45:37 VCS INFO V-16-10001-88 (pk-ercoss1) Application:ext_nsa:offline:Executed [/ericsson/core/cluster/scripts/ stop /ericss
    on/eric_3pp/borland_extnameservice:default] successfully.
    2013/05/17 12:45:37 VCS INFO V-16-10001-88 (pk-ercoss1) Application:oad:offline:Executed [/ericsson/core/cluster/scripts/ stop /ericsson/e
    ric_3pp/borland_oad:default] successfully.
    2013/05/17 12:45:38 VCS INFO V-16-10001-88 (pk-ercoss1) Application:rmi_reg_ext:offline:Executed [/ericsson/core/cluster/scripts/ stop /er
    icsson/eric_3pp/rmiregistry_external:default] successfully.
    2013/05/17 12:45:38 VCS INFO V-16-10001-88 (pk-ercoss1) Application:nsa:offline:Executed [/ericsson/core/cluster/scripts/ stop /ericsson/e
    ric_3pp/borland_nameservice:default] successfully.
    2013/05/17 12:45:39 VCS INFO V-16-10001-88 (pk-ercoss1) Application:rmi_reg:offline:Executed [/ericsson/core/cluster/scripts/ stop /ericss
    on/eric_3pp/rmiregistry:default] successfully.
    2013/05/17 12:45:39 VCS INFO V-16-10001-88 (pk-ercoss1) Application:sentinel:offline:Executed [/ericsson/core/cluster/scripts/ stop /erics
    son/eric_3pp/snlm:default] successfully.
    2013/05/17 12:45:40 VCS INFO V-16-10001-88 (pk-ercoss1) Application:tomcat:offline:Executed [/ericsson/core/cluster/scripts/ stop /ericsso
    n/eric_3pp/tomcat:default] successfully.
    2013/05/17 12:45:41 VCS INFO V-16-1-10305 Resource sb_nsa (Owner: Unspecified, Group: Oss) is offline on pk-ercoss1 (VCS initiated)
    2013/05/17 12:45:42 VCS INFO V-16-1-10305 Resource osagent (Owner: Unspecified, Group: Oss) is offline on pk-ercoss1 (VCS initiated)
    2013/05/17 12:45:43 VCS INFO V-16-1-10305 Resource oad (Owner: Unspecified, Group: Oss) is offline on pk-ercoss1 (VCS initiated)
    2013/05/17 12:45:43 VCS NOTICE V-16-1-10300 Initiating Offline of Resource gui_nsa (Owner: Unspecified, Group: Oss) on System pk-ercoss1
    2013/05/17 12:45:43 VCS INFO V-16-10001-88 (pk-ercoss1) Application:opendj:offline:Executed [/ericsson/core/cluster/scripts/ stop /ericsso
    n/eric_3pp/opendj:default] successfully.
    2013/05/17 12:45:44 VCS INFO V-16-1-10305 Resource ext_nsa (Owner: Unspecified, Group: Oss) is offline on pk-ercoss1 (VCS initiated)
    2013/05/17 12:45:44 VCS INFO V-16-2-13075 (pk-ercoss1) Resource(gui_nsa) has reported unexpected OFFLINE 1 times, which is still within the Tole
    2013/05/17 12:45:44 VCS INFO V-16-10001-88 (pk-ercoss1) Application:time_service:offline:Executed [/ericsson/core/cluster/scripts/ stop /e
    ricsson/eric_3pp/openfusion_timeservice:default] successfully.
    2013/05/17 12:45:45 VCS INFO V-16-1-10305 Resource rmi_reg_ext (Owner: Unspecified, Group: Oss) is offline on pk-ercoss1 (VCS initiated)
    2013/05/17 12:45:45 VCS INFO V-16-10001-88 (pk-ercoss1) Application:gui_nsa:offline:Executed [/ericsson/core/cluster/scripts/ stop /ericss
    on/eric_3pp/borland_guinameservice:default] successfully.
    2013/05/17 12:45:46 VCS INFO V-16-1-10305 Resource nsa (Owner: Unspecified, Group: Oss) is offline on pk-ercoss1 (VCS initiated)
    2013/05/17 12:45:46 VCS INFO V-16-10001-88 (pk-ercoss1) Application:log_service:offline:Executed [/ericsson/core/cluster/scripts/ stop /er
    icsson/eric_3pp/openfusion_logservice:default] successfully.
    2013/05/17 12:45:46 VCS INFO V-16-1-10305 Resource rmi_reg (Owner: Unspecified, Group: Oss) is offline on pk-ercoss1 (VCS initiated)
    2013/05/17 12:45:46 VCS INFO V-16-1-10305 Resource sentinel (Owner: Unspecified, Group: Oss) is offline on pk-ercoss1 (VCS initiated)
    2013/05/17 12:45:47 VCS INFO V-16-1-10305 Resource tomcat (Owner: Unspecified, Group: Oss) is offline on pk-ercoss1 (VCS initiated)
    2013/05/17 12:45:48 VCS INFO V-16-1-10305 Resource opendj (Owner: Unspecified, Group: Oss) is offline on pk-ercoss1 (VCS initiated)
    2013/05/17 12:45:48 VCS INFO V-16-1-10305 Resource time_service (Owner: Unspecified, Group: Oss) is offline on pk-ercoss1 (VCS initiated)
    2013/05/17 12:45:48 VCS INFO V-16-1-10305 Resource gui_nsa (Owner: Unspecified, Group: Oss) is offline on pk-ercoss1 (VCS initiated)
    2013/05/17 12:45:48 VCS INFO V-16-1-10305 Resource log_service (Owner: Unspecified, Group: Oss) is offline on pk-ercoss1 (VCS initiated)
    2013/05/17 12:45:55 VCS INFO V-16-10001-88 (pk-ercoss1) Application:ext_notif:offline:Executed [/ericsson/core/cluster/scripts/ stop /eric
    sson/eric_3pp/openfusion_extnotificationservice:default] successfully.
    2013/05/17 12:45:57 VCS INFO V-16-10001-88 (pk-ercoss1) Application:notif:offline:Executed [/ericsson/core/cluster/scripts/ stop /ericsson
    /eric_3pp/openfusion_notificationservice:default] successfully.
    2013/05/17 12:45:57 VCS INFO V-16-1-10305 Resource ext_notif (Owner: Unspecified, Group: Oss) is offline on pk-ercoss1 (VCS initiated)
    2013/05/17 12:46:00 VCS INFO V-16-1-10305 Resource notif (Owner: Unspecified, Group: Oss) is offline on pk-ercoss1 (VCS initiated)
    2013/05/17 12:46:00 VCS NOTICE V-16-1-10300 Initiating Offline of Resource change_versant (Owner: Unspecified, Group: Oss) on System pk-ercoss1
    2013/05/17 12:46:16 VCS INFO V-16-10001-1 (pk-ercoss1) for Oss to go Offline
    2013/05/17 12:46:37 VCS INFO V-16-10001-88 (pk-ercoss1) Application:change_versant:offline:Executed [/ericsson/core/cluster/scripts/change_versa stop] successfully.
    2013/05/17 12:46:40 VCS INFO V-16-1-10305 Resource change_versant (Owner: Unspecified, Group: Oss) is offline on pk-ercoss1 (VCS initiated)
    2013/05/17 12:47:16 VCS INFO V-16-10001-1 (pk-ercoss1) for Oss to go Offline
    2013/05/17 12:47:19 VCS WARNING V-16-2-13011 (pk-ercoss1) Resource(glassfish): offline procedure did not complete within the expected time.
    2013/05/17 12:47:19 VCS ERROR V-16-2-13063 (pk-ercoss1) Agent is calling clean for resource(glassfish) because offline did not complete within t

  • Thanks for the complete explanation.

    I fully agree with you that we should look at setting ToleranceLimit rather than increasing the MonitorInterval.



  • If you increase the MonitorInterval, the monitor runs less frequently, so it is less likely to run during a short network issue. But you could still have a network outage of 20 seconds, like the one you had, with the monitor running during that outage and failing. Likewise, if you leave MonitorInterval as it is, you could be lucky and the monitor might not run while the network is down, if the outage is short enough to fall between monitor runs. Also, if you have a real network failure and MonitorInterval is set to 2 minutes, it will take longer to detect, and the detection time can increase further because you also have to wait for the Proxy to run.

    So really ToleranceLimit is the best attribute to set, as it does not depend on the luck of whether the monitor runs at the same time as the network outage: a ToleranceLimit of 1 will ignore one failure and then react to the next.

    Note also that the default MultiNICB MonitorInterval is 10 seconds. MultiNICB is one of the few types whose MonitorInterval is reduced from the default of 60 seconds, so I would not advise increasing it by too much. But as MonitorInterval is so low, you may want to set ToleranceLimit to a higher value like 2 or 3.

    For example, if ToleranceLimit is set to 2 (with a MonitorInterval of 10), then network outages of less than 20 seconds will definitely be ignored and network outages of 30 seconds or more will definitely cause the resource to fault. Between 20 and 30 seconds, it is down to when the monitor actually runs during the network outage.
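The 20-second and 30-second bounds above can be checked with a small timing model. This is only a sketch under simplifying assumptions that are not from the thread: monitor runs are instantaneous and land exactly every `interval` seconds, an outage is a half-open window, and the resource faults once more than `tolerance` monitor runs in a row see the network down. All function names are hypothetical, and real agent behaviour (monitor timeouts in particular) is not modelled.

```python
def monitor_runs_in_outage(outage_start, outage_len, interval):
    """Count monitor runs (at 0, interval, 2*interval, ...) that fall inside
    the half-open window [outage_start, outage_start + outage_len)."""
    first = -(-outage_start // interval)                     # ceiling division
    last = -(-(outage_start + outage_len) // interval) - 1   # last run before the end
    return max(0, last - first + 1)


def faults(outage_start, outage_len, interval=10, tolerance=2):
    """Assumed rule: fault when more than `tolerance` runs see the outage."""
    return monitor_runs_in_outage(outage_start, outage_len, interval) > tolerance


def fraction_of_phases_faulting(outage_len, interval=10, tolerance=2):
    """Share of whole-second outage start offsets that end in a fault."""
    hits = sum(faults(s, outage_len, interval, tolerance) for s in range(interval))
    return hits / interval


if __name__ == "__main__":
    # Outage lengths in seconds, with MonitorInterval 10 and ToleranceLimit 2.
    for length in (19, 25, 30):
        print(length, fraction_of_phases_faulting(length))
```

In this model a 19-second outage never faults, a 30-second outage always faults, and a 25-second outage faults for only some start offsets, which matches the reasoning above. With `tolerance=1` the same model has a 19-second outage faulting for most start offsets, so a value of 2 or 3 looks like the safer choice for an outage of that length.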


  • It's hard to know exactly what happened, as I don't have details of which service groups were online on which node at the time of the failure or which service groups did not succeed in failing over, and I don't have an extract of the logs past 12:42:05, but here are a few points:

    1. If pub_mnic is in a parallel group, then failure of this resource won't directly cause failure of "failover" application groups. These will fail if there is a Proxy to the pub_mnic resource, but the Proxy may fail some time afterwards, or may not fail at all if the NIC comes back up before the Proxy probes.
    2. You had resource faults on both nodes: pub_mnic on both nodes, and syb1_p1 and ossfs_p1 on system pk-ercoss2.
    3. As NICs failed on both nodes, if the Proxies had failed on the target failover node, this would stop the service failing over.



  • Hello


    The PubLan service group, of which pub_mnic is a resource, is a parallel service group.

    So, if both interfaces on one node failed, shouldn't the IP resources and the other service groups have failed over properly to the other node?


  • Hi symsonu,

    As Mike said, the MultiNICB resource pub_mnic faulted because the NICs went down, and there is no ToleranceLimit set on it. So all parent resources/groups depending on it faulted as well.

    You could set ToleranceLimit as Mike suggested, or tune the MonitorInterval of resource type MultiNICB to something a little longer, like 120 seconds, to tolerate an unstable network.

  • You had a network failure:

    May 17 12:41:37 pk-ercoss1 in.mpathd[6024]: [ID 594170 daemon.error] NIC failure detected on oce9 of group pub_mnic
    May 17 12:41:37 pk-ercoss1 in.mpathd[6024]: [ID 832587 daemon.error] Successfully failed over from NIC oce9 to NIC oce0
    May 17 12:41:38 pk-ercoss1 in.mpathd[6024]: [ID 168056 daemon.error] All Interfaces in group pub_m have failed

    So oce9 failed, then oce0 was tried, but this failed too. The network was not down for long, less than 20 seconds:

    May 17 12:41:56 pk-ercoss1 in.mpathd[6024]: [ID 299542 daemon.error] NIC repair detected on oce0 of group pub_mnic
    May 17 12:41:56 pk-ercoss1 in.mpathd[6024]: [ID 620804 daemon.error] Successfully failed back to NIC oce0
    May 17 12:41:56 pk-ercoss1 in.mpathd[6024]: [ID 237757 daemon.error] At least 1 interface (oce0) of group pub_mnic has repaired
    May 17 12:41:57 pk-ercoss1 in.mpathd[6024]: [ID 299542 daemon.error] NIC repair detected on oce9 of group pub_mnic
    May 17 12:41:57 pk-ercoss1 in.mpathd[6024]: [ID 620804 daemon.error] Successfully failed back to NIC oce9

    You had tolerance set on ossfs_ip, so this network error was ignored by that resource, but tolerance was not set on pub_mnic, so it faulted. I would recommend setting ToleranceLimit for pub_mnic, to stop VCS failing over if you have a short network outage again, by running:

    hatype -modify MultiNICB ToleranceLimit 1
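A type attribute can normally only be changed while the cluster configuration is writable, so the command above is usually wrapped in a short sequence. The sketch below only builds and prints that sequence as a dry run; it executes nothing. The `haconf -makerw` and `haconf -dump -makero` steps are assumptions about the surrounding procedure, not something stated in this thread, and the helper name is hypothetical.

```python
def tolerance_limit_commands(resource_type="MultiNICB", limit=1):
    """Return the command sequence as strings; nothing is executed."""
    if limit < 0:
        raise ValueError("ToleranceLimit cannot be negative")
    return [
        # Assumption: the configuration is read-only and must be opened first.
        "haconf -makerw",
        # The command recommended in the reply above.
        f"hatype -modify {resource_type} ToleranceLimit {limit}",
        # Assumption: save the change and return the configuration to read-only.
        "haconf -dump -makero",
    ]


if __name__ == "__main__":
    print("\n".join(tolerance_limit_commands()))
```

Printing the list first makes it easy to review the exact commands before running them by hand on a cluster node. Note that `hatype -modify` changes the value for every resource of that type, not just pub_mnic.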
