Need to know the root cause of the service outage in a VCS cluster

symsonu
Level 6

Hello Friends,

I need your help finding out what caused the service outage and how we can overcome this situation.

 

Logs from the messages file (/var/adm/messages):

==========================================

 

Sep  2 23:35:04 duadm2 genunix: [ID 140958 kern.notice] LLT INFO V-14-1-10205 link 0 (oce1) node 0 in trouble
Sep  2 23:35:05 duadm2 genunix: [ID 140958 kern.notice] LLT INFO V-14-1-10205 link 2 (oce0) node 0 in trouble
Sep  2 23:35:09 duadm2 genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 2 (oce0) node 0 inactive 8 sec (2883648)
Sep  2 23:35:10 duadm2 genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (oce1) node 0 inactive 8 sec (6644221)
Sep  2 23:35:10 duadm2 genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 2 (oce0) node 0 inactive 9 sec (2883648)
Sep  2 23:35:11 duadm2 genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (oce1) node 0 inactive 9 sec (6644221)
Sep  2 23:35:11 duadm2 genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 2 (oce0) node 0 inactive 10 sec (2883648)
Sep  2 23:35:12 duadm2 genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (oce1) node 0 inactive 10 sec (6644221)
Sep  2 23:35:12 duadm2 genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 2 (oce0) node 0 inactive 11 sec (2883648)
Sep  2 23:35:13 duadm2 genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (oce1) node 0 inactive 11 sec (6644221)
Sep  2 23:35:13 duadm2 genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 2 (oce0) node 0 inactive 12 sec (2883648)
Sep  2 23:35:14 duadm2 genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (oce1) node 0 inactive 12 sec (6644221)
Sep  2 23:35:14 duadm2 genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 2 (oce0) node 0 inactive 13 sec (2883648)
Sep  2 23:35:15 duadm2 genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (oce1) node 0 inactive 13 sec (6644221)
Sep  2 23:35:15 duadm2 genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 2 (oce0) node 0 inactive 14 sec (2883648)
Sep  2 23:35:15 duadm2 genunix: [ID 592107 kern.notice] LLT INFO V-14-1-10510 sent hbreq (NULL) on link 2 (oce0) node 0. 4 more to go.
Sep  2 23:35:16 duadm2 genunix: [ID 592107 kern.notice] LLT INFO V-14-1-10510 sent hbreq (NULL) on link 2 (oce0) node 0. 3 more to go.
Sep  2 23:35:16 duadm2 genunix: [ID 592107 kern.notice] LLT INFO V-14-1-10510 sent hbreq (NULL) on link 0 (oce1) node 0. 4 more to go.
Sep  2 23:35:16 duadm2 genunix: [ID 592107 kern.notice] LLT INFO V-14-1-10510 sent hbreq (NULL) on link 2 (oce0) node 0. 2 more to go.
Sep  2 23:35:16 duadm2 genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (oce1) node 0 inactive 14 sec (6644221)
Sep  2 23:35:16 duadm2 genunix: [ID 592107 kern.notice] LLT INFO V-14-1-10510 sent hbreq (NULL) on link 0 (oce1) node 0. 3 more to go.
Sep  2 23:35:16 duadm2 genunix: [ID 592107 kern.notice] LLT INFO V-14-1-10510 sent hbreq (NULL) on link 2 (oce0) node 0. 1 more to go.
Sep  2 23:35:16 duadm2 genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 2 (oce0) node 0 inactive 15 sec (2883648)
Sep  2 23:35:16 duadm2 genunix: [ID 592107 kern.notice] LLT INFO V-14-1-10510 sent hbreq (NULL) on link 0 (oce1) node 0. 2 more to go.
Sep  2 23:35:16 duadm2 genunix: [ID 592107 kern.notice] LLT INFO V-14-1-10510 sent hbreq (NULL) on link 2 (oce0) node 0. 0 more to go.
Sep  2 23:35:16 duadm2 genunix: [ID 592107 kern.notice] LLT INFO V-14-1-10510 sent hbreq (NULL) on link 0 (oce1) node 0. 1 more to go.
Sep  2 23:35:16 duadm2 genunix: [ID 205468 kern.notice] LLT INFO V-14-1-10509 link 2 (oce0) node 0 expired
Sep  2 23:35:17 duadm2 genunix: [ID 592107 kern.notice] LLT INFO V-14-1-10510 sent hbreq (NULL) on link 0 (oce1) node 0. 0 more to go.
Sep  2 23:35:17 duadm2 genunix: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (oce1) node 0 inactive 15 sec (6644221)
Sep  2 23:35:17 duadm2 genunix: [ID 205468 kern.notice] LLT INFO V-14-1-10509 link 0 (oce1) node 0 expired
Sep  2 23:35:21 duadm2 genunix: [ID 316943 kern.notice] GAB INFO V-15-1-20036 Port h gen   288706 membership 01
Sep  2 23:35:21 duadm2 genunix: [ID 608499 kern.notice] GAB INFO V-15-1-20037 Port h gen   288706   jeopardy ;1
Sep  2 23:35:21 duadm2 genunix: [ID 316943 kern.notice] GAB INFO V-15-1-20036 Port a gen   288701 membership 01
Sep  2 23:35:21 duadm2 genunix: [ID 608499 kern.notice] GAB INFO V-15-1-20037 Port a gen   288701   jeopardy ;1
Sep  2 23:35:21 duadm2 genunix: [ID 316943 kern.notice] GAB INFO V-15-1-20036 Port b gen   288704 membership 01
Sep  2 23:35:21 duadm2 genunix: [ID 608499 kern.notice] GAB INFO V-15-1-20037 Port b gen   288704   jeopardy ;1
Sep  2 23:35:21 duadm2 Had[6267]: [ID 702911 daemon.notice] VCS INFO V-16-1-10077 Received new cluster membership
Sep  2 23:35:21 duadm2 Had[6267]: [ID 702911 daemon.notice] VCS ERROR V-16-1-10111 System duadm2 (Node '1') is in Regular and Jeopardy Memberships - Membership: 0x3, Jeopardy: 0x2
Sep  2 23:56:50 duadm2 in.mpathd[6542]: [ID 585766 daemon.error] Cannot meet requested failure detection time of 10000 ms on (inet oce2) new failure detection time for group "stor_mnic" is 48426 ms
Sep  2 23:56:57 duadm2 in.mpathd[6542]: [ID 594170 daemon.error] NIC failure detected on oce0 of group pub_mnic
Sep  2 23:56:57 duadm2 in.mpathd[6542]: [ID 832587 daemon.error] Successfully failed over from NIC oce0 to NIC oce9
Sep  2 23:57:01 duadm2 in.mpathd[6542]: [ID 299542 daemon.error] NIC repair detected on oce0 of group pub_mnic
Sep  2 23:57:01 duadm2 in.mpathd[6542]: [ID 620804 daemon.error] Successfully failed back to NIC oce0
Sep  2 23:57:10 duadm2 in.mpathd[6542]: [ID 594170 daemon.error] NIC failure detected on oce0 of group pub_mnic
Sep  2 23:57:10 duadm2 in.mpathd[6542]: [ID 832587 daemon.error] Successfully failed over from NIC oce0 to NIC oce9
Sep  2 23:57:10 duadm2 ip: [ID 876157 kern.warning] WARNING: node 44:1e:a1:74:e5:46 is using our IP address 010.001.014.027 on oce9
Sep  2 23:57:10 duadm2 ip: [ID 876157 kern.warning] WARNING: node 44:1e:a1:74:e5:46 is using our IP address 010.001.014.030 on oce9
Sep  2 23:57:10 duadm2 ip: [ID 876157 kern.warning] WARNING: node 44:1e:a1:74:e5:46 is using our IP address 010.001.014.027 on oce9
Sep  2 23:57:10 duadm2 ip: [ID 876157 kern.warning] WARNING: node 44:1e:a1:74:e5:46 is using our IP address 010.001.014.030 on oce9
Sep  2 23:57:10 duadm2 ip: [ID 876157 kern.warning] WARNING: node 44:1e:a1:74:e5:46 is using our IP address 010.001.014.027 on oce9
Sep  2 23:57:10 duadm2 ip: [ID 876157 kern.warning] WARNING: node 44:1e:a1:74:e5:46 is using our IP address 010.001.014.030 on oce9
Sep  2 23:57:10 duadm2 ip: [ID 567813 kern.warning] WARNING: oce9:2 has duplicate address 010.001.014.027 (claimed by 44:1e:a1:74:e5:46); disabled
Sep  2 23:57:10 duadm2 ip: [ID 567813 kern.warning] WARNING: oce9:1 has duplicate address 010.001.014.030 (claimed by 44:1e:a1:74:e5:46); disabled
Sep  2 23:57:11 duadm2 in.mpathd[6542]: [ID 299542 daemon.error] NIC repair detected on oce0 of group pub_mnic
Sep  2 23:57:11 duadm2 in.mpathd[6542]: [ID 620804 daemon.error] Successfully failed back to NIC oce0
Sep  2 23:57:20 duadm2 in.mpathd[6542]: [ID 168056 daemon.error] All Interfaces in group pub_mnic have failed
Sep  2 23:57:27 duadm2 Had[6267]: [ID 702911 daemon.notice] VCS ERROR V-16-1-10303 Resource pub_mnic (Owner: Unspecified, Group: PubLan) is FAULTED (timed out) on sys duadm2
Sep  2 23:57:28 duadm2 Had[6267]: [ID 702911 daemon.notice] VCS ERROR V-16-1-10205 Group PubLan is faulted on system duadm2
Sep  2 23:57:29 duadm2 Had[6267]: [ID 702911 daemon.notice] VCS ERROR V-16-1-10303 Resource pub_mnic (Owner: Unspecified, Group: PubLan) is FAULTED (timed out) on sys duadm1
Sep  2 23:57:30 duadm2 Had[6267]: [ID 702911 daemon.notice] VCS ERROR V-16-1-10303 Resource stor_mnic (Owner: Unspecified, Group: StorLan) is FAULTED (timed out) on sys duadm1
Sep  2 23:57:31 duadm2 Had[6267]: [ID 702911 daemon.notice] VCS ERROR V-16-1-10205 Group PubLan is faulted on system duadm1
Sep  2 23:57:53 duadm2 in.mpathd[6542]: [ID 594170 daemon.error] NIC failure detected on oce11 of group stor_mnic
Sep  2 23:57:53 duadm2 in.mpathd[6542]: [ID 832587 daemon.error] Successfully failed over from NIC oce11 to NIC oce2
Sep  2 23:57:57 duadm2 Had[6267]: [ID 702911 daemon.notice] VCS ERROR V-16-1-10303 Resource ossfs_p1 (Owner: Unspecified, Group: Ossfs) is FAULTED (timed out) on sys duadm1
Sep  2 23:57:57 duadm2 Had[6267]: [ID 702911 daemon.notice] VCS ERROR V-16-1-10303 Resource syb1_p1 (Owner: Unspecified, Group: Sybase1) is FAULTED (timed out) on sys duadm1
Sep  2 23:57:58 duadm2 Had[6267]: [ID 702911 daemon.notice] VCS ERROR V-16-2-13067 (duadm1) Agent is calling clean for resource(ossfs_ip) because the resource became OFFLINE unexpectedly, on its own.
Sep  2 23:57:59 duadm2 in.mpathd[6542]: [ID 168056 daemon.error] All Interfaces in group stor_mnic have failed
Sep  2 23:58:02 duadm2 AgentFramework[6332]: [ID 702911 daemon.notice] VCS ERROR V-16-2-13067 Thread(8) Agent is calling clean for resource(syb1_ip) because the resource became OFFLINE unexpectedly, on its own.
Sep  2 23:58:02 duadm2 Had[6267]: [ID 702911 daemon.notice] VCS ERROR V-16-2-13067 (duadm2) Agent is calling clean for resource(syb1_ip) because the resource became OFFLINE unexpectedly, on its own.
Sep  2 23:58:02 duadm2 AgentFramework[6332]: [ID 702911 daemon.notice] VCS ERROR V-16-2-13068 Thread(8) Resource(syb1_ip) - clean completed successfully.
Sep  2 23:58:05 duadm2 Had[6267]: [ID 702911 daemon.notice] VCS ERROR V-16-1-10303 Resource syb1_p1 (Owner: Unspecified, Group: Sybase1) is FAULTED (timed out) on sys duadm2
Sep  2 23:58:05 duadm2 Had[6267]: [ID 702911 daemon.notice] VCS ERROR V-16-1-10303 Resource ossfs_p1 (Owner: Unspecified, Group: Ossfs) is FAULTED (timed out) on sys duadm2
Sep  2 23:58:08 duadm2 Had[6267]: [ID 702911 daemon.notice] VCS ERROR V-16-1-10303 Resource stor_mnic (Owner: Unspecified, Group: StorLan) is FAULTED (timed out) on sys duadm2
Sep  2 23:58:20 duadm2 ldap_cachemgr[516]: [ID 293258 daemon.warning] libsldap: Status: 91  Mesg: openConnection: simple bind failed - Can't connect to the LDAP server
Sep  2 23:58:21 duadm2 Had[6267]: [ID 702911 daemon.notice] VCS ERROR V-16-2-13067 (duadm1) Agent is calling clean for resource(stor_p) because the resource became OFFLINE unexpectedly, on its own.
Sep  2 23:58:21 duadm2 Had[6267]: [ID 702911 daemon.notice] VCS ERROR V-16-2-13070 (duadm1) Resource(stor_p) - clean not implemented.
Sep  2 23:58:22 duadm2 Had[6267]: [ID 702911 daemon.notice] VCS ERROR V-16-1-10205 Group StorLan is faulted on system duadm1
Sep  2 23:58:24 duadm2 Had[6267]: [ID 702911 daemon.notice] VCS ERROR V-16-2-13027 (duadm1) Resource(cluster_maint) - monitor procedure did not complete within the expected time.
Sep  2 23:58:30 duadm2 AgentFramework[6345]: [ID 702911 daemon.notice] VCS ERROR V-16-2-13067 Thread(3) Agent is calling clean for resource(stor_p) because the resource became OFFLINE unexpectedly, on its own.
Sep  2 23:58:30 duadm2 AgentFramework[6345]: [ID 702911 daemon.notice] VCS ERROR V-16-2-13070 Thread(3) Resource(stor_p) - clean not implemented.
Sep  2 23:58:30 duadm2 Had[6267]: [ID 702911 daemon.notice] VCS ERROR V-16-2-13067 (duadm2) Agent is calling clean for resource(stor_p) because the resource became OFFLINE unexpectedly, on its own.
Sep  2 23:58:30 duadm2 Had[6267]: [ID 702911 daemon.notice] VCS ERROR V-16-2-13070 (duadm2) Resource(stor_p) - clean not implemented.
Sep  2 23:58:31 duadm2 Had[6267]: [ID 702911 daemon.notice] VCS ERROR V-16-1-10205 Group StorLan is faulted on system duadm2
Sep  2 23:59:19 duadm2 ldap_cachemgr[516]: [ID 293258 daemon.warning] libsldap: Status: 91  Mesg: openConnection: simple bind failed - Can't connect to the LDAP server
Sep  2 23:59:19 duadm2 ldap_cachemgr[516]: [ID 545954 daemon.error] libsldap: makeConnection: failed to open connection to 10.1.14.35
Sep  2 23:59:19 duadm2 ldap_cachemgr[516]: [ID 687686 daemon.warning] libsldap: Falling back to anonymous, non-SSL mode for __ns_ldap_getRootDSE. openConnection: simple bind failed - Can't connect to the LDAP server
Sep  2 23:59:20 duadm2 ldap_cachemgr[516]: [ID 293258 daemon.warning] libsldap: Status: 91  Mesg: openConnection: simple bind failed - Can't connect to the LDAP server
Sep  2 23:59:20 duadm2 ldap_cachemgr[516]: [ID 292100 daemon.warning] libsldap: could not remove 10.1.14.35 from servers list
Sep  2 23:59:20 duadm2 ldap_cachemgr[516]: [ID 292100 daemon.warning] libsldap: could not remove 10.1.14.37 from servers list
Sep  2 23:59:20 duadm2 ldap_cachemgr[516]: [ID 293258 daemon.warning] libsldap: Status: 7  Mesg: Session error no available conn.
Sep  2 23:59:20 duadm2 ldap_cachemgr[516]: [ID 186574 daemon.error] Error: Unable to refresh profile:default: Session error no available conn.
Sep  3 00:00:08 duadm2 svc.startd[9]: [ID 122153 daemon.warning] svc:/ericsson/eric_monitor/ddc:default: Method or service exit timed out.  Killing contract 145156.
Sep  3 00:00:08 duadm2 svc.startd[9]: [ID 636263 daemon.warning] svc:/ericsson/eric_monitor/ddc:default: Method "/opt/ERICddc/bin/ddc stop" failed due to signal KILL.
Sep  3 00:00:19 duadm2 ldap_cachemgr[516]: [ID 293258 daemon.warning] libsldap: Status: 91  Mesg: openConnection: simple bind failed - Can't connect to the LDAP server
Sep  3 00:00:19 duadm2 ldap_cachemgr[516]: [ID 545954 daemon.error] libsldap: makeConnection: failed to open connection to 10.1.14.35
Sep  3 00:00:19 duadm2 ldap_cachemgr[516]: [ID 293258 daemon.warning] libsldap: Status: 91  Mesg: openConnection: simple bind failed - Can't connect to the LDAP server
Sep  3 00:00:19 duadm2 ldap_cachemgr[516]: [ID 545954 daemon.error] libsldap: makeConnection: failed to open connection to 10.1.14.37
Sep  3 00:00:19 duadm2 ldap_cachemgr[516]: [ID 687686 daemon.warning] libsldap: Falling back to anonymous, non-SSL mode for __ns_ldap_getRootDSE. openConnection: simple bind failed - Can't connect to the LDAP server
Sep  3 00:00:20 duadm2 ldap_cachemgr[516]: [ID 293258 daemon.warning] libsldap: Status: 91  Mesg: openConnection: simple bind failed - Can't connect to the LDAP server
Sep  3 00:00:20 duadm2 last message repeated 1 time
Sep  3 00:00:20 duadm2 ldap_cachemgr[516]: [ID 545954 daemon.error] libsldap: makeConnection: failed to open connection to 10.1.14.35
Sep  3 00:00:20 duadm2 ldap_cachemgr[516]: [ID 545954 daemon.error] libsldap: makeConnection: failed to open connection to 10.1.14.37
Sep  3 00:00:20 duadm2 ldap_cachemgr[516]: [ID 687686 daemon.warning] libsldap: Falling back to anonymous, non-SSL mode for __ns_ldap_getRootDSE. openConnection: simple bind failed - Can't connect to the LDAP server
Sep  3 00:00:20 duadm2 last message repeated 1 time

 

============================================================
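
To make the order of events in the excerpt above easier to see, here is a minimal Python sketch that picks out the first message logged by each source (LLT, GAB, in.mpathd, ip, Had and so on). The function name `first_events` and the regular expression are my own illustration, not part of any VCS or Solaris tool, and the pattern only assumes the Solaris syslog layout visible in the lines pasted above.

```python
import re
from collections import OrderedDict

# Matches Solaris syslog lines of the form seen in /var/adm/messages, e.g.
# "Sep  2 23:35:04 duadm2 genunix: [ID 140958 kern.notice] LLT INFO V-14-1-10205 ..."
LINE_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2}\s+\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) "
    r"(?P<proc>[^:\[\s]+)(?:\[\d+\])?: \[ID \d+ \S+\] (?P<msg>.*)$"
)


def first_events(lines):
    """Return an ordered mapping {source: (timestamp, message)} holding the
    first line seen for each source, in order of first appearance."""
    seen = OrderedDict()
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m:
            # Wrapped continuation lines and "last message repeated" lines
            # do not match the pattern and are skipped.
            continue
        source = m.group("proc")
        if source == "genunix":
            # Kernel messages: use the leading product tag (LLT, GAB) instead.
            source = m.group("msg").split(" ", 1)[0]
        seen.setdefault(source, (m.group("ts"), m.group("msg")))
    return seen


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python first_events.py messages.txt
    with open(sys.argv[1]) as fh:
        for source, (ts, msg) in first_events(fh).items():
            print(ts, source, msg, sep=" | ")
```

Run against the excerpt, this should list the LLT "in trouble" message first, followed by the GAB jeopardy membership and then the in.mpathd NIC failures, which is the sequence I am trying to understand.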

 

Engine log:

 

2013/09/02 23:35:21 VCS INFO V-16-1-10077 Received new cluster membership
2013/09/02 23:35:21 VCS NOTICE V-16-1-10112 System (duadm1) - Membership: 0x3, DDNA: 0x2
2013/09/02 23:35:21 VCS ERROR V-16-1-10111 System duadm2 (Node '1') is in Regular and Jeopardy Memberships - Membership: 0x3, Jeopardy: 0x2
2013/09/02 23:35:25 VCS WARNING V-16-1-11141 LLT heartbeat link status changed. Previous status = oce1 UP oce8 UP oce0 UP; Current status = oce1 DOWN oce8 UP oce0 DOWN.
2013/09/02 23:57:27 VCS ERROR V-16-1-10303 Resource pub_mnic (Owner: Unspecified, Group: PubLan) is FAULTED (timed out) on sys duadm2
2013/09/02 23:57:27 VCS NOTICE V-16-1-10300 Initiating Offline of Resource pub_p (Owner: Unspecified, Group: PubLan) on System duadm2
2013/09/02 23:57:27 VCS INFO V-16-6-0 (duadm2) resfault:(resfault) Invoked with arg0=duadm2, arg1=pub_mnic, arg2=ONLINE
2013/09/02 23:57:27 VCS INFO V-16-0 (duadm2) resfault:(resfault.sh) Invoked with arg0=/ericsson/core/cluster/scripts/resfault.sh, arg1=duadm2 ,arg2=pub_mnic
2013/09/02 23:57:27 VCS INFO V-16-6-15002 (duadm2) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/resfault duadm2 pub_mnic ONLINE  successfully
2013/09/02 23:57:28 VCS INFO V-16-1-10305 Resource pub_p (Owner: Unspecified, Group: PubLan) is offline on duadm2 (VCS initiated)
2013/09/02 23:57:28 VCS ERROR V-16-1-10205 Group PubLan is faulted on system duadm2
2013/09/02 23:57:28 VCS NOTICE V-16-1-10446 Group PubLan is offline on system duadm2
2013/09/02 23:57:28 VCS INFO V-16-1-10493 Evaluating duadm1 as potential target node for group PubLan
2013/09/02 23:57:28 VCS INFO V-16-1-50010 Group PubLan is online or faulted on system duadm1
2013/09/02 23:57:28 VCS INFO V-16-1-10493 Evaluating duadm2 as potential target node for group PubLan
2013/09/02 23:57:28 VCS INFO V-16-1-50010 Group PubLan is online or faulted on system duadm2
2013/09/02 23:57:28 VCS NOTICE V-16-1-10235 Restart is set for group PubLan. Group will be brought online if fault on persistent resource clears. If group is brought online anywhere else from AutoStartList or manually, then Restart will be reset
2013/09/02 23:57:28 VCS INFO V-16-6-0 (duadm2) postoffline:(postoffline) Invoked with arg0=duadm2, arg1=PubLan
2013/09/02 23:57:28 VCS INFO V-16-2-13075 (duadm1) Resource(ossfs_ip) has reported unexpected OFFLINE 1 times, which is still within the ToleranceLimit(1).
2013/09/02 23:57:28 VCS INFO V-16-6-0 (duadm2) postoffline:Executing /ericsson/core/cluster/scripts/postoffline.sh with arg0=duadm2, arg1=PubLan
2013/09/02 23:57:28 VCS INFO V-16-6-0 (duadm2) postoffline.sh:PubLan:Nothing done
2013/09/02 23:57:28 VCS INFO V-16-6-0 (duadm2) postoffline:Completed execution of /ericsson/core/cluster/scripts/postoffline.sh for group PubLan
2013/09/02 23:57:29 VCS INFO V-16-6-15002 (duadm2) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/postoffline duadm2 PubLan   successfully
2013/09/02 23:57:29 VCS ERROR V-16-1-10303 Resource pub_mnic (Owner: Unspecified, Group: PubLan) is FAULTED (timed out) on sys duadm1
2013/09/02 23:57:29 VCS NOTICE V-16-1-10300 Initiating Offline of Resource pub_p (Owner: Unspecified, Group: PubLan) on System duadm1
2013/09/02 23:57:30 VCS INFO V-16-6-0 (duadm1) resfault:(resfault) Invoked with arg0=duadm1, arg1=pub_mnic, arg2=ONLINE
2013/09/02 23:57:30 VCS INFO V-16-0 (duadm1) resfault:(resfault.sh) Invoked with arg0=/ericsson/core/cluster/scripts/resfault.sh, arg1=duadm1 ,arg2=pub_mnic
2013/09/02 23:57:30 VCS INFO V-16-6-15002 (duadm1) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/resfault duadm1 pub_mnic ONLINE  successfully
2013/09/02 23:57:31 VCS ERROR V-16-1-10303 Resource stor_mnic (Owner: Unspecified, Group: StorLan) is FAULTED (timed out) on sys duadm1
2013/09/02 23:57:31 VCS WARNING V-16-1-13310 Offlining parent group DDCMon on system duadm1
2013/09/02 23:57:31 VCS NOTICE V-16-1-10300 Initiating Offline of Resource ddc_app (Owner: Unspecified, Group: DDCMon) on System duadm1
2013/09/02 23:57:31 VCS NOTICE V-16-1-10300 Initiating Offline of Resource hyperic_app (Owner: Unspecified, Group: DDCMon) on System duadm1
2013/09/02 23:57:31 VCS INFO V-16-1-10305 Resource pub_p (Owner: Unspecified, Group: PubLan) is offline on duadm1 (VCS initiated)
2013/09/02 23:57:31 VCS ERROR V-16-1-10205 Group PubLan is faulted on system duadm1
2013/09/02 23:57:31 VCS NOTICE V-16-1-10446 Group PubLan is offline on system duadm1
2013/09/02 23:57:31 VCS INFO V-16-1-10493 Evaluating duadm1 as potential target node for group PubLan
2013/09/02 23:57:31 VCS INFO V-16-1-50010 Group PubLan is online or faulted on system duadm1
2013/09/02 23:57:31 VCS INFO V-16-1-10493 Evaluating duadm2 as potential target node for group PubLan
2013/09/02 23:57:31 VCS INFO V-16-1-50010 Group PubLan is online or faulted on system duadm2
2013/09/02 23:57:31 VCS NOTICE V-16-1-10235 Restart is set for group PubLan. Group will be brought online if fault on persistent resource clears. If group is brought online anywhere else from AutoStartList or manually, then Restart will be reset
2013/09/02 23:57:31 VCS INFO V-16-6-0 (duadm1) resfault:(resfault) Invoked with arg0=duadm1, arg1=stor_mnic, arg2=ONLINE
2013/09/02 23:57:31 VCS INFO V-16-6-0 (duadm1) postoffline:(postoffline) Invoked with arg0=duadm1, arg1=PubLan
2013/09/02 23:57:31 VCS INFO V-16-0 (duadm1) resfault:(resfault.sh) Invoked with arg0=/ericsson/core/cluster/scripts/resfault.sh, arg1=duadm1 ,arg2=stor_mnic
2013/09/02 23:57:31 VCS INFO V-16-6-0 (duadm1) postoffline:Executing /ericsson/core/cluster/scripts/postoffline.sh with arg0=duadm1, arg1=PubLan
2013/09/02 23:57:31 VCS INFO V-16-6-15002 (duadm1) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/resfault duadm1 stor_mnic ONLINE  successfully
2013/09/02 23:57:31 VCS INFO V-16-6-0 (duadm1) postoffline.sh:PubLan:Nothing done
2013/09/02 23:57:31 VCS INFO V-16-6-0 (duadm1) postoffline:Completed execution of /ericsson/core/cluster/scripts/postoffline.sh for group PubLan
2013/09/02 23:57:31 VCS INFO V-16-6-15002 (duadm1) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/postoffline duadm1 PubLan   successfully
2013/09/02 23:57:32 VCS INFO V-16-2-13075 (duadm2) Resource(syb1_ip) has reported unexpected OFFLINE 1 times, which is still within the ToleranceLimit(1).
2013/09/02 23:57:41 VCS INFO V-16-2-13075 (duadm1) Resource(pms_ip) has reported unexpected OFFLINE 1 times, which is still within the ToleranceLimit(1).
2013/09/02 23:57:41 VCS INFO V-16-2-13075 (duadm1) Resource(snmp_ip) has reported unexpected OFFLINE 1 times, which is still within the ToleranceLimit(1).
2013/09/02 23:57:41 VCS INFO V-16-2-13075 (duadm1) Resource(cms_ip) has reported unexpected OFFLINE 1 times, which is still within the ToleranceLimit(1).
2013/09/02 23:57:57 VCS ERROR V-16-1-10303 Resource ossfs_p1 (Owner: Unspecified, Group: Ossfs) is FAULTED (timed out) on sys duadm1
2013/09/02 23:57:57 VCS NOTICE V-16-1-10300 Initiating Offline of Resource snmp_ip (Owner: Unspecified, Group: Ossfs) on System duadm1
2013/09/02 23:57:57 VCS NOTICE V-16-1-10300 Initiating Offline of Resource pms_ip (Owner: Unspecified, Group: Ossfs) on System duadm1
2013/09/02 23:57:57 VCS NOTICE V-16-1-10300 Initiating Offline of Resource stop_oss (Owner: Unspecified, Group: Ossfs) on System duadm1
2013/09/02 23:57:57 VCS NOTICE V-16-1-10300 Initiating Offline of Resource cms_ip (Owner: Unspecified, Group: Ossfs) on System duadm1
2013/09/02 23:57:57 VCS ERROR V-16-1-10303 Resource syb1_p1 (Owner: Unspecified, Group: Sybase1) is FAULTED (timed out) on sys duadm1
2013/09/02 23:57:58 VCS INFO V-16-1-10305 Resource snmp_ip (Owner: Unspecified, Group: Ossfs) is offline on duadm1 (VCS initiated)
2013/09/02 23:57:58 VCS INFO V-16-1-10305 Resource pms_ip (Owner: Unspecified, Group: Ossfs) is offline on duadm1 (VCS initiated)
2013/09/02 23:57:58 VCS INFO V-16-1-10305 Resource cms_ip (Owner: Unspecified, Group: Ossfs) is offline on duadm1 (VCS initiated)
2013/09/02 23:57:58 VCS INFO V-16-6-0 (duadm1) resfault:(resfault) Invoked with arg0=duadm1, arg1=syb1_p1, arg2=ONLINE
2013/09/02 23:57:58 VCS INFO V-16-6-0 (duadm1) resfault:(resfault) Invoked with arg0=duadm1, arg1=ossfs_p1, arg2=ONLINE
2013/09/02 23:57:58 VCS INFO V-16-0 (duadm1) resfault:(resfault.sh) Invoked with arg0=/ericsson/core/cluster/scripts/resfault.sh, arg1=duadm1 ,arg2=syb1_p1
2013/09/02 23:57:58 VCS INFO V-16-0 (duadm1) resfault:(resfault.sh) Invoked with arg0=/ericsson/core/cluster/scripts/resfault.sh, arg1=duadm1 ,arg2=ossfs_p1
2013/09/02 23:57:58 VCS INFO V-16-6-15002 (duadm1) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/resfault duadm1 syb1_p1 ONLINE  successfully
2013/09/02 23:57:58 VCS INFO V-16-6-15002 (duadm1) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/resfault duadm1 ossfs_p1 ONLINE  successfully
2013/09/02 23:57:58 VCS ERROR V-16-2-13067 (duadm1) Agent is calling clean for resource(ossfs_ip) because the resource became OFFLINE unexpectedly, on its own.
2013/09/02 23:57:58 VCS INFO V-16-2-13068 (duadm1) Resource(ossfs_ip) - clean completed successfully.
2013/09/02 23:57:59 VCS INFO V-16-1-10307 Resource ossfs_ip (Owner: Unspecified, Group: Ossfs) is offline on duadm1 (Not initiated by VCS)
2013/09/02 23:57:59 VCS INFO V-16-6-0 (duadm1) resfault:(resfault) Invoked with arg0=duadm1, arg1=ossfs_ip, arg2=ONLINE
2013/09/02 23:57:59 VCS INFO V-16-0 (duadm1) resfault:(resfault.sh) Invoked with arg0=/ericsson/core/cluster/scripts/resfault.sh, arg1=duadm1 ,arg2=ossfs_ip
2013/09/02 23:57:59 VCS INFO V-16-6-15002 (duadm1) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/resfault duadm1 ossfs_ip ONLINE  successfully
2013/09/02 23:58:02 VCS ERROR V-16-2-13067 (duadm2) Agent is calling clean for resource(syb1_ip) because the resource became OFFLINE unexpectedly, on its own.
2013/09/02 23:58:02 VCS INFO V-16-2-13068 (duadm2) Resource(syb1_ip) - clean completed successfully.
2013/09/02 23:58:02 VCS INFO V-16-1-10307 Resource syb1_ip (Owner: Unspecified, Group: Sybase1) is offline on duadm2 (Not initiated by VCS)
2013/09/02 23:58:02 VCS NOTICE V-16-1-10300 Initiating Offline of Resource stop_sybase (Owner: Unspecified, Group: Sybase1) on System duadm2
2013/09/02 23:58:02 VCS INFO V-16-6-0 (duadm2) resfault:(resfault) Invoked with arg0=duadm2, arg1=syb1_ip, arg2=ONLINE
2013/09/02 23:58:02 VCS INFO V-16-10001-88 (duadm2) Application:stop_sybase:offline:Executed [/ericsson/core/cluster/scripts/stop_sybase.sh stop] successfully.
2013/09/02 23:58:02 VCS INFO V-16-0 (duadm2) resfault:(resfault.sh) Invoked with arg0=/ericsson/core/cluster/scripts/resfault.sh, arg1=duadm2 ,arg2=syb1_ip
2013/09/02 23:58:02 VCS INFO V-16-6-15002 (duadm2) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/resfault duadm2 syb1_ip ONLINE  successfully
2013/09/02 23:58:03 VCS INFO V-16-1-50135 User root fired command: hagrp -switch Oss  duadm2  from localhost
2013/09/02 23:58:03 VCS NOTICE V-16-1-10208 Initiating switch of group Oss from system duadm1 to system duadm2
2013/09/02 23:58:03 VCS NOTICE V-16-1-10300 Initiating Offline of Resource activemq (Owner: Unspecified, Group: Oss) on System duadm1
2013/09/02 23:58:03 VCS NOTICE V-16-1-10300 Initiating Offline of Resource activemq_oss_loggingbroker (Owner: Unspecified, Group: Oss) on System duadm1
2013/09/02 23:58:03 VCS NOTICE V-16-1-10300 Initiating Offline of Resource alex (Owner: Unspecified, Group: Oss) on System duadm1
2013/09/02 23:58:03 VCS NOTICE V-16-1-10300 Initiating Offline of Resource apache (Owner: Unspecified, Group: Oss) on System duadm1
2013/09/02 23:58:03 VCS NOTICE V-16-1-10300 Initiating Offline of Resource cron (Owner: Unspecified, Group: Oss) on System duadm1
2013/09/02 23:58:03 VCS NOTICE V-16-1-10300 Initiating Offline of Resource fmria (Owner: Unspecified, Group: Oss) on System duadm1
2013/09/02 23:58:03 VCS NOTICE V-16-1-10300 Initiating Offline of Resource glassfish (Owner: Unspecified, Group: Oss) on System duadm1
2013/09/02 23:58:03 VCS NOTICE V-16-1-10300 Initiating Offline of Resource imgr_tomcat (Owner: Unspecified, Group: Oss) on System duadm1
2013/09/02 23:58:03 VCS NOTICE V-16-1-10300 Initiating Offline of Resource restart_mc (Owner: Unspecified, Group: Oss) on System duadm1
2013/09/02 23:58:03 VCS NOTICE V-16-1-10300 Initiating Offline of Resource syb_log_mon (Owner: Unspecified, Group: Oss) on System duadm1
2013/09/02 23:58:03 VCS NOTICE V-16-1-10300 Initiating Offline of Resource syb_proc_mon (Owner: Unspecified, Group: Oss) on System duadm1
2013/09/02 23:58:03 VCS NOTICE V-16-1-10300 Initiating Offline of Resource trapdist (Owner: Unspecified, Group: Oss) on System duadm1
2013/09/02 23:58:03 VCS NOTICE V-16-1-10300 Initiating Offline of Resource vrsnt_log_mon (Owner: Unspecified, Group: Oss) on System duadm1
2013/09/02 23:58:03 VCS INFO V-16-10001-1 (duadm1) Application:stop_oss:stop_oss.sh:Waiting for Oss to go Offline
2013/09/02 23:58:03 VCS INFO V-16-10001-88 (duadm1) Application:cron:offline:Executed [/ericsson/core/cluster/scripts/cron.sh stop] successfully.
2013/09/02 23:58:03 VCS INFO V-16-10001-88 (duadm1) Application:alex:offline:Executed [/ericsson/core/cluster/scripts/svc.sh stop /ericsson/eric_3pp/alex:default] successfully.
2013/09/02 23:58:03 VCS INFO V-16-10001-88 (duadm1) Application:apache:offline:Executed [/ericsson/core/cluster/scripts/svc.sh stop /network/http:apache2] successfully.
2013/09/02 23:58:05 VCS INFO V-16-10001-88 (duadm1) Application:restart_mc:offline:Executed [/ericsson/core/cluster/scripts/restart_mc.sh stop] successfully.
2013/09/02 23:58:05 VCS ERROR V-16-1-10303 Resource syb1_p1 (Owner: Unspecified, Group: Sybase1) is FAULTED (timed out) on sys duadm2
2013/09/02 23:58:05 VCS ERROR V-16-1-10303 Resource ossfs_p1 (Owner: Unspecified, Group: Ossfs) is FAULTED (timed out) on sys duadm2
2013/09/02 23:58:05 VCS INFO V-16-6-0 (duadm2) resfault:(resfault) Invoked with arg0=duadm2, arg1=syb1_p1, arg2=ONLINE
2013/09/02 23:58:05 VCS INFO V-16-6-0 (duadm2) resfault:(resfault) Invoked with arg0=duadm2, arg1=ossfs_p1, arg2=ONLINE
2013/09/02 23:58:05 VCS INFO V-16-0 (duadm2) resfault:(resfault.sh) Invoked with arg0=/ericsson/core/cluster/scripts/resfault.sh, arg1=duadm2 ,arg2=syb1_p1
2013/09/02 23:58:05 VCS INFO V-16-0 (duadm2) resfault:(resfault.sh) Invoked with arg0=/ericsson/core/cluster/scripts/resfault.sh, arg1=duadm2 ,arg2=ossfs_p1
2013/09/02 23:58:05 VCS INFO V-16-6-15002 (duadm2) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/resfault duadm2 syb1_p1 ONLINE  successfully
2013/09/02 23:58:05 VCS INFO V-16-6-15002 (duadm2) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/resfault duadm2 ossfs_p1 ONLINE  successfully
2013/09/02 23:58:05 VCS INFO V-16-1-10305 Resource stop_sybase (Owner: Unspecified, Group: Sybase1) is offline on duadm2 (VCS initiated)
2013/09/02 23:58:05 VCS NOTICE V-16-1-10300 Initiating Offline of Resource masterdataservice_BACKUP (Owner: Unspecified, Group: Sybase1) on System duadm2
2013/09/02 23:58:08 VCS ERROR V-16-1-10303 Resource stor_mnic (Owner: Unspecified, Group: StorLan) is FAULTED (timed out) on sys duadm2
2013/09/02 23:58:08 VCS WARNING V-16-1-13310 Offlining parent group DDCMon on system duadm2
2013/09/02 23:58:08 VCS NOTICE V-16-1-10300 Initiating Offline of Resource ddc_app (Owner: Unspecified, Group: DDCMon) on System duadm2
2013/09/02 23:58:08 VCS NOTICE V-16-1-10300 Initiating Offline of Resource hyperic_app (Owner: Unspecified, Group: DDCMon) on System duadm2
2013/09/02 23:58:08 VCS INFO V-16-6-0 (duadm2) resfault:(resfault) Invoked with arg0=duadm2, arg1=stor_mnic, arg2=ONLINE
2013/09/02 23:58:08 VCS INFO V-16-0 (duadm2) resfault:(resfault.sh) Invoked with arg0=/ericsson/core/cluster/scripts/resfault.sh, arg1=duadm2 ,arg2=stor_mnic
2013/09/02 23:58:08 VCS INFO V-16-6-15002 (duadm2) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/resfault duadm2 stor_mnic ONLINE  successfully
2013/09/02 23:58:21 VCS ERROR V-16-2-13067 (duadm1) Agent is calling clean for resource(stor_p) because the resource became OFFLINE unexpectedly, on its own.
2013/09/02 23:58:21 VCS ERROR V-16-2-13070 (duadm1) Resource(stor_p) - clean not implemented.
2013/09/02 23:58:22 VCS INFO V-16-1-10307 Resource stor_p (Owner: Unspecified, Group: StorLan) is offline on duadm1 (Not initiated by VCS)
2013/09/02 23:58:22 VCS ERROR V-16-1-10205 Group StorLan is faulted on system duadm1
2013/09/02 23:58:22 VCS NOTICE V-16-1-10446 Group StorLan is offline on system duadm1
2013/09/02 23:58:22 VCS INFO V-16-1-10493 Evaluating duadm1 as potential target node for group StorLan
2013/09/02 23:58:22 VCS INFO V-16-1-50010 Group StorLan is online or faulted on system duadm1
2013/09/02 23:58:22 VCS INFO V-16-1-10493 Evaluating duadm2 as potential target node for group StorLan
2013/09/02 23:58:22 VCS INFO V-16-1-50010 Group StorLan is online or faulted on system duadm2
2013/09/02 23:58:22 VCS NOTICE V-16-1-10235 Restart is set for group StorLan. Group will be brought online if fault on persistent resource clears. If group is brought online anywhere else from AutoStartList or manually, then Restart will be reset
2013/09/02 23:58:22 VCS INFO V-16-6-0 (duadm1) resfault:(resfault) Invoked with arg0=duadm1, arg1=stor_p, arg2=ONLINE
2013/09/02 23:58:22 VCS INFO V-16-6-0 (duadm1) postoffline:(postoffline) Invoked with arg0=duadm1, arg1=StorLan
2013/09/02 23:58:22 VCS INFO V-16-0 (duadm1) resfault:(resfault.sh) Invoked with arg0=/ericsson/core/cluster/scripts/resfault.sh, arg1=duadm1 ,arg2=stor_p
2013/09/02 23:58:22 VCS INFO V-16-6-0 (duadm1) postoffline:Executing /ericsson/core/cluster/scripts/postoffline.sh with arg0=duadm1, arg1=StorLan
2013/09/02 23:58:22 VCS INFO V-16-6-15002 (duadm1) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/resfault duadm1 stor_p ONLINE  successfully
2013/09/02 23:58:22 VCS INFO V-16-6-0 (duadm1) postoffline.sh:StorLan:Nothing done
2013/09/02 23:58:22 VCS INFO V-16-6-0 (duadm1) postoffline:Completed execution of /ericsson/core/cluster/scripts/postoffline.sh for group StorLan
2013/09/02 23:58:22 VCS INFO V-16-6-15002 (duadm1) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/postoffline duadm1 StorLan   successfully
2013/09/02 23:58:24 VCS ERROR V-16-2-13027 (duadm1) Resource(cluster_maint) - monitor procedure did not complete within the expected time.
2013/09/02 23:58:30 VCS ERROR V-16-2-13067 (duadm2) Agent is calling clean for resource(stor_p) because the resource became OFFLINE unexpectedly, on its own.
2013/09/02 23:58:30 VCS ERROR V-16-2-13070 (duadm2) Resource(stor_p) - clean not implemented.
2013/09/02 23:58:31 VCS INFO V-16-1-10307 Resource stor_p (Owner: Unspecified, Group: StorLan) is offline on duadm2 (Not initiated by VCS)
2013/09/02 23:58:31 VCS ERROR V-16-1-10205 Group StorLan is faulted on system duadm2
2013/09/02 23:58:31 VCS NOTICE V-16-1-10446 Group StorLan is offline on system duadm2
2013/09/02 23:58:31 VCS INFO V-16-1-10493 Evaluating duadm1 as potential target node for group StorLan
2013/09/02 23:58:31 VCS INFO V-16-1-50010 Group StorLan is online or faulted on system duadm1
2013/09/02 23:58:31 VCS INFO V-16-1-10493 Evaluating duadm2 as potential target node for group StorLan
2013/09/02 23:58:31 VCS INFO V-16-1-50010 Group StorLan is online or faulted on system duadm2
2013/09/02 23:58:31 VCS NOTICE V-16-1-10235 Restart is set for group StorLan. Group will be brought online if fault on persistent resource clears. If group is brought online anywhere else from AutoStartList or manually, then Restart will be reset
2013/09/02 23:58:31 VCS INFO V-16-6-0 (duadm2) resfault:(resfault) Invoked with arg0=duadm2, arg1=stor_p, arg2=ONLINE
2013/09/02 23:58:31 VCS INFO V-16-6-0 (duadm2) postoffline:(postoffline) Invoked with arg0=duadm2, arg1=StorLan
2013/09/02 23:58:31 VCS INFO V-16-0 (duadm2) resfault:(resfault.sh) Invoked with arg0=/ericsson/core/cluster/scripts/resfault.sh, arg1=duadm2 ,arg2=stor_p
2013/09/02 23:58:31 VCS INFO V-16-6-0 (duadm2) postoffline:Executing /ericsson/core/cluster/scripts/postoffline.sh with arg0=duadm2, arg1=StorLan
2013/09/02 23:58:31 VCS INFO V-16-6-15002 (duadm2) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/resfault duadm2 stor_p ONLINE  successfully
2013/09/02 23:58:31 VCS INFO V-16-6-0 (duadm2) postoffline.sh:StorLan:Nothing done
2013/09/02 23:58:31 VCS INFO V-16-6-0 (duadm2) postoffline:Completed execution of /ericsson/core/cluster/scripts/postoffline.sh for group StorLan
2013/09/02 23:58:31 VCS INFO V-16-6-15002 (duadm2) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/postoffline duadm2 StorLan   successfully
2013/09/02 23:59:03 VCS INFO V-16-10001-1 (duadm1) Application:stop_oss:stop_oss.sh:Waiting for Oss to go Offline
2013/09/02 23:59:33 VCS INFO V-16-10001-88 (duadm1) Application:syb_log_mon:offline:Executed [/ericsson/core/cluster/scripts/svc.sh stop /ericsson/eric_3pp/sybase_log_monitor:default] successfully.
2013/09/02 23:59:33 VCS INFO V-16-10001-88 (duadm1) Application:syb_proc_mon:offline:Executed [/ericsson/core/cluster/scripts/svc.sh stop /ericsson/eric_3pp/sybase_process_monitor:default] successfully.
2013/09/02 23:59:34 VCS INFO V-16-10001-88 (duadm1) Application:fmria:offline:Executed [/ericsson/core/cluster/scripts/svc.sh stop /ericsson/eric_ep/riadaemon:default] successfully.
2013/09/02 23:59:35 VCS INFO V-16-10001-88 (duadm1) Application:trapdist:offline:Executed [/ericsson/core/cluster/scripts/svc.sh stop /ericsson/eric_ep/trapd:default] successfully.
2013/09/02 23:59:35 VCS INFO V-16-10001-88 (duadm1) Application:vrsnt_log_mon:offline:Executed [/ericsson/core/cluster/scripts/svc.sh stop /ericsson/eric_3pp/versant_log_monitor:default] successfully.
2013/09/02 23:59:36 VCS INFO V-16-1-10305 Resource alex (Owner: Unspecified, Group: Oss) is offline on duadm1 (VCS initiated)
2013/09/02 23:59:37 VCS INFO V-16-10001-88 (duadm1) Application:imgr_tomcat:offline:Executed [/ericsson/core/cluster/scripts/svc.sh stop /ericsson/eric_3pp/tomcat:default] successfully.
2013/09/02 23:59:37 VCS INFO V-16-1-10305 Resource apache (Owner: Unspecified, Group: Oss) is offline on duadm1 (VCS initiated)
2013/09/02 23:59:38 VCS INFO V-16-1-10305 Resource cron (Owner: Unspecified, Group: Oss) is offline on duadm1 (VCS initiated)
2013/09/02 23:59:40 VCS INFO V-16-2-13075 (duadm1) Resource(tomcat) has reported unexpected OFFLINE 1 times, which is still within the ToleranceLimit(2).
2013/09/02 23:59:50 VCS INFO V-16-1-10305 Resource restart_mc (Owner: Unspecified, Group: Oss) is offline on duadm1 (VCS initiated)
2013/09/02 23:59:50 VCS NOTICE V-16-1-10300 Initiating Offline of Resource smssr (Owner: Unspecified, Group: Oss) on System duadm1
2013/09/02 23:59:50 VCS INFO V-16-1-10305 Resource syb_proc_mon (Owner: Unspecified, Group: Oss) is offline on duadm1 (VCS initiated)
2013/09/02 23:59:52 VCS INFO V-16-1-10305 Resource syb_log_mon (Owner: Unspecified, Group: Oss) is offline on duadm1 (VCS initiated)
2013/09/02 23:59:52 VCS INFO V-16-1-10305 Resource trapdist (Owner: Unspecified, Group: Oss) is offline on duadm1 (VCS initiated)
2013/09/02 23:59:52 VCS INFO V-16-1-10305 Resource fmria (Owner: Unspecified, Group: Oss) is offline on duadm1 (VCS initiated)
2013/09/02 23:59:54 VCS INFO V-16-1-10305 Resource vrsnt_log_mon (Owner: Unspecified, Group: Oss) is offline on duadm1 (VCS initiated)
2013/09/02 23:59:55 VCS INFO V-16-10001-88 (duadm1) Application:smssr:offline:Executed [/ericsson/core/cluster/scripts/svc.sh stop /ericsson/ossrc/ssr:default] successfully.
2013/09/02 23:59:57 VCS INFO V-16-1-10305 Resource imgr_tomcat (Owner: Unspecified, Group: Oss) is offline on duadm1 (VCS initiated)
2013/09/03 00:00:00 VCS INFO V-16-1-10305 Resource smssr (Owner: Unspecified, Group: Oss) is offline on duadm1 (VCS initiated)
2013/09/03 00:00:00 VCS NOTICE V-16-1-10300 Initiating Offline of Resource supervisor (Owner: Unspecified, Group: Oss) on System duadm1
2013/09/03 00:00:03 VCS INFO V-16-10001-88 (duadm1) Application:supervisor:offline:Executed [/ericsson/core/cluster/scripts/svc.sh stop /ericsson/ossrc/ssrProcessSupervisor:default] successfully.
2013/09/03 00:00:03 VCS INFO V-16-10001-1 (duadm1) Application:stop_oss:stop_oss.sh:Waiting for Oss to go Offline
2013/09/03 00:00:06 VCS INFO V-16-1-10305 Resource supervisor (Owner: Unspecified, Group: Oss) is offline on duadm1 (VCS initiated)
2013/09/03 00:00:06 VCS NOTICE V-16-1-10300 Initiating Offline of Resource tbs (Owner: Unspecified, Group: Oss) on System duadm1
2013/09/03 00:00:11 VCS INFO V-16-2-13075 (duadm1) Resource(tomcat) has reported unexpected OFFLINE 2 times, which is still within the ToleranceLimit(2).
2013/09/03 00:01:03 VCS INFO V-16-10001-1 (duadm1) Application:stop_oss:stop_oss.sh:Waiting for Oss to go Offline
2013/09/03 00:01:08 VCS ERROR V-16-2-13067 (duadm1) Agent is calling clean for resource(tomcat) because the resource became OFFLINE unexpectedly, on its own.
2013/09/03 00:01:08 VCS INFO V-16-2-13068 (duadm1) Resource(tomcat) - clean completed successfully.
2013/09/03 00:01:08 VCS ERROR V-16-2-13073 (duadm1) Resource(tomcat) became OFFLINE unexpectedly on its own. Agent is restarting (attempt number 1 of 2) the resource.
2013/09/03 00:01:08 VCS INFO V-16-10001-88 (duadm1) Application:tomcat:online:Executed [/ericsson/core/cluster/scripts/svc.sh start /ericsson/eric_3pp/tomcat:default] successfully.
2013/09/03 00:01:09 VCS INFO V-16-2-13716 (duadm1) Resource(tomcat): Output of the completed operation (online)
==============================================
svcadm: Instance "svc:/ericsson/eric_3pp/tomcat:default" is not in a maintenance or degraded state.
==============================================

2013/09/03 00:01:11 VCS NOTICE V-16-2-13076 (duadm1) Agent has successfully restarted resource(tomcat).
2013/09/03 00:01:11 VCS INFO V-16-1-55031 Resource tomcat in online state received recurring online message on system duadm1
2013/09/03 00:01:35 VCS INFO V-16-10001-88 (duadm1) Application:activemq_oss_loggingbroker:offline:Executed [/ericsson/core/cluster/scripts/svc.sh stop /ericsson/eric_3pp/activemq_oss_loggingbroker:default] successfully.
2013/09/03 00:01:35 VCS INFO V-16-10001-88 (duadm1) Application:activemq:offline:Executed [/ericsson/core/cluster/scripts/svc.sh stop /ericsson/eric_3pp/activemq:default] successfully.
2013/09/03 00:01:36 VCS INFO V-16-2-13716 (duadm1) Resource(activemq): Output of the completed operation (offline)
==============================================
svcadm: Instance "svc:/ericsson/eric_3pp/activemq:default" is in maintenance state.
==============================================

2013/09/03 00:01:36 VCS INFO V-16-2-13716 (duadm1) Resource(activemq_oss_loggingbroker): Output of the completed operation (offline)
==============================================
svcadm: Instance "svc:/ericsson/eric_3pp/activemq_oss_loggingbroker:default" is in maintenance state.
==============================================

2013/09/03 00:01:38 VCS INFO V-16-1-10305 Resource activemq (Owner: Unspecified, Group: Oss) is offline on duadm1 (VCS initiated)
2013/09/03 00:01:38 VCS INFO V-16-1-10305 Resource activemq_oss_loggingbroker (Owner: Unspecified, Group: Oss) is offline on duadm1 (VCS initiated)
2013/09/03 00:01:39 VCS ERROR V-16-20018-25 (duadm2) SybaseBk:masterdataservice_BACKUP:offline:Sybase Backup service masterdataservice_BACKUP cannot be stopped by isql - shutdown
2013/09/03 00:01:39 VCS ERROR V-16-20018-17 (duadm2) SybaseBk:masterdataservice_BACKUP:offline:clean scripts should do the process kill sequence
2013/09/03 00:01:39 VCS INFO V-16-2-13716 (duadm2) Resource(masterdataservice_BACKUP): Output of the completed operation (offline)
==============================================
Password:
CT-LIBRARY error:
    ct_connect(): user api layer: internal Client Library error: Read from the server has timed out.
Password:
CT-LIBRARY error:
    ct_connect(): user api layer: internal Client Library error: Read from the server has timed out.
==============================================

2013/09/03 00:01:40 VCS ERROR V-16-2-13064 (duadm2) Agent is calling clean for resource(masterdataservice_BACKUP) because the resource is up even after offline completed.
2013/09/03 00:01:40 VCS ERROR V-16-20018-20 (duadm2) SybaseBk:masterdataservice_BACKUP:clean:kill -15 of 16419
2013/09/03 00:02:01 VCS INFO V-16-2-13068 (duadm2) Resource(masterdataservice_BACKUP) - clean completed successfully.
2013/09/03 00:02:01 VCS WARNING V-16-20018-301 (duadm2) SybaseBk:masterdataservice_BACKUP:monitor:Open for backupserver failed, setting cookie to NULL
2013/09/03 00:02:01 VCS INFO V-16-1-10305 Resource masterdataservice_BACKUP (Owner: Unspecified, Group: Sybase1) is offline on duadm2 (VCS initiated)
2013/09/03 00:02:01 VCS NOTICE V-16-1-10300 Initiating Offline of Resource masterdataservice (Owner: Unspecified, Group: Sybase1) on System duadm2
2013/09/03 00:02:03 VCS INFO V-16-10001-1 (duadm1) Application:stop_oss:stop_oss.sh:Waiting for Oss to go Offline
2013/09/03 00:02:32 VCS WARNING V-16-2-13011 (duadm1) Resource(hyperic_app): offline procedure did not complete within the expected time.
2013/09/03 00:02:32 VCS ERROR V-16-2-13063 (duadm1) Agent is calling clean for resource(hyperic_app) because offline did not complete within the expected time.
2013/09/03 00:02:32 VCS WARNING V-16-2-13011 (duadm1) Resource(ddc_app): offline procedure did not complete within the expected time.
2013/09/03 00:02:32 VCS ERROR V-16-2-13063 (duadm1) Agent is calling clean for resource(ddc_app) because offline did not complete within the expected time.
2013/09/03 00:02:32 VCS INFO V-16-2-13068 (duadm1) Resource(hyperic_app) - clean completed successfully.
2013/09/03 00:02:32 VCS INFO V-16-2-13068 (duadm1) Resource(ddc_app) - clean completed successfully.
2013/09/03 00:02:34 VCS INFO V-16-1-10305 Resource hyperic_app (Owner: Unspecified, Group: DDCMon) is offline on duadm1 (VCS initiated)
2013/09/03 00:02:34 VCS INFO V-16-1-10305 Resource ddc_app (Owner: Unspecified, Group: DDCMon) is offline on duadm1 (VCS initiated)
2013/09/03 00:02:34 VCS NOTICE V-16-1-10300 Initiating Offline of Resource ddc_mount (Owner: Unspecified, Group: DDCMon) on System duadm1
2013/09/03 00:03:03 VCS INFO V-16-10001-1 (duadm1) Application:stop_oss:stop_oss.sh:Waiting for Oss to go Offline
2013/09/03 00:03:05 VCS WARNING V-16-2-13011 (duadm1) Resource(glassfish): offline procedure did not complete within the expected time.
2013/09/03 00:03:05 VCS ERROR V-16-2-13063 (duadm1) Agent is calling clean for resource(glassfish) because offline did not complete within the expected time.
2013/09/03 00:03:07 VCS INFO V-16-10001-88 (duadm1) Application:tbs:offline:Executed [/ericsson/core/cluster/scripts/svc.sh stop /ericsson/eric_ep/TBS:default] successfully.
2013/09/03 00:03:08 VCS INFO V-16-2-13716 (duadm1) Resource(tbs): Output of the completed operation (offline)
==============================================
svcadm: Instance "svc:/ericsson/eric_ep/TBS:default" is in maintenance state.
==============================================

2013/09/03 00:03:08 VCS WARNING V-16-2-13011 (duadm2) Resource(hyperic_app): offline procedure did not complete within the expected time.
2013/09/03 00:03:08 VCS WARNING V-16-2-13011 (duadm2) Resource(ddc_app): offline procedure did not complete within the expected time.
2013/09/03 00:03:08 VCS ERROR V-16-2-13063 (duadm2) Agent is calling clean for resource(hyperic_app) because offline did not complete within the expected time.
2013/09/03 00:03:08 VCS ERROR V-16-2-13063 (duadm2) Agent is calling clean for resource(ddc_app) because offline did not complete within the expected time.
2013/09/03 00:03:08 VCS INFO V-16-2-13068 (duadm2) Resource(ddc_app) - clean completed successfully.
2013/09/03 00:03:08 VCS INFO V-16-2-13068 (duadm2) Resource(hyperic_app) - clean completed successfully.
2013/09/03 00:03:10 VCS INFO V-16-1-10305 Resource tbs (Owner: Unspecified, Group: Oss) is offline on duadm1 (VCS initiated)
2013/09/03 00:03:10 VCS NOTICE V-16-1-10300 Initiating Offline of Resource ext_notif (Owner: Unspecified, Group: Oss) on System duadm1
2013/09/03 00:03:10 VCS NOTICE V-16-1-10300 Initiating Offline of Resource ext_nsa (Owner: Unspecified, Group: Oss) on System duadm1
2013/09/03 00:03:10 VCS NOTICE V-16-1-10300 Initiating Offline of Resource log_service (Owner: Unspecified, Group: Oss) on System duadm1
2013/09/03 00:03:10 VCS NOTICE V-16-1-10300 Initiating Offline of Resource notif (Owner: Unspecified, Group: Oss) on System duadm1
2013/09/03 00:03:10 VCS NOTICE V-16-1-10300 Initiating Offline of Resource nsa (Owner: Unspecified, Group: Oss) on System duadm1
2013/09/03 00:03:10 VCS NOTICE V-16-1-10300 Initiating Offline of Resource oad (Owner: Unspecified, Group: Oss) on System duadm1
2013/09/03 00:03:10 VCS NOTICE V-16-1-10300 Initiating Offline of Resource opendj (Owner: Unspecified, Group: Oss) on System duadm1
2013/09/03 00:03:10 VCS NOTICE V-16-1-10300 Initiating Offline of Resource osagent (Owner: Unspecified, Group: Oss) on System duadm1
2013/09/03 00:03:10 VCS NOTICE V-16-1-10300 Initiating Offline of Resource rmi_reg (Owner: Unspecified, Group: Oss) on System duadm1
2013/09/03 00:03:10 VCS NOTICE V-16-1-10300 Initiating Offline of Resource rmi_reg_ext (Owner: Unspecified, Group: Oss) on System duadm1
2013/09/03 00:03:10 VCS NOTICE V-16-1-10300 Initiating Offline of Resource sb_nsa (Owner: Unspecified, Group: Oss) on System duadm1
2013/09/03 00:03:10 VCS NOTICE V-16-1-10300 Initiating Offline of Resource sentinel (Owner: Unspecified, Group: Oss) on System duadm1
2013/09/03 00:03:10 VCS NOTICE V-16-1-10300 Initiating Offline of Resource time_service (Owner: Unspecified, Group: Oss) on System duadm1
2013/09/03 00:03:10 VCS NOTICE V-16-1-10300 Initiating Offline of Resource tomcat (Owner: Unspecified, Group: Oss) on System duadm1
2013/09/03 00:03:10 VCS INFO V-16-1-10305 Resource ddc_app (Owner: Unspecified, Group: DDCMon) is offline on duadm2 (VCS initiated)
2013/09/03 00:03:10 VCS INFO V-16-1-10305 Resource hyperic_app (Owner: Unspecified, Group: DDCMon) is offline on duadm2 (VCS initiated)
2013/09/03 00:03:10 VCS NOTICE V-16-1-10300 Initiating Offline of Resource ddc_mount (Owner: Unspecified, Group: DDCMon) on System duadm2
2013/09/03 00:04:03 VCS INFO V-16-10001-1 (duadm1) Application:stop_oss:stop_oss.sh:Waiting for Oss to go Offline
2013/09/03 00:04:06 VCS ERROR V-16-2-13006 (duadm1) Resource(glassfish): clean procedure did not complete within the expected time.
2013/09/03 00:05:03 VCS INFO V-16-10001-1 (duadm1) Application:stop_oss:stop_oss.sh:Waiting for Oss to go Offline
2013/09/03 00:06:03 VCS INFO V-16-10001-1 (duadm1) Application:stop_oss:stop_oss.sh:Waiting for Oss to go Offline
2013/09/03 00:07:01 VCS INFO V-16-2-13003 (duadm2) Resource(masterdataservice): Output of the timed out operation (offline)
Password:
CT-LIBRARY error:
    ct_connect(): user api layer: internal Client Library error: Read from the server has timed out.

2013/09/03 00:07:01 VCS WARNING V-16-2-13011 (duadm2) Resource(masterdataservice): offline procedure did not complete within the expected time.
2013/09/03 00:07:01 VCS ERROR V-16-2-13063 (duadm2) Agent is calling clean for resource(masterdataservice) because offline did not complete within the expected time.
2013/09/03 00:07:02 VCS ERROR V-16-20018-20 (duadm2) Sybase:masterdataservice:clean:kill -15 of 16142
2013/09/03 00:07:03 VCS INFO V-16-10001-1 (duadm1) Application:stop_oss:stop_oss.sh:Waiting for Oss to go Offline
2013/09/03 00:07:22 VCS INFO V-16-20018-51 (duadm2) Sybase:masterdataservice:clean:Could not open directory [/var/tmp/sybase_shm/masterdataservice] for read [No such file or directory]
2013/09/03 00:07:22 VCS NOTICE V-16-20018-78 (duadm2) Sybase:masterdataservice:clean:Cannot remove Shared Memory.
2013/09/03 00:07:22 VCS INFO V-16-2-13068 (duadm2) Resource(masterdataservice) - clean completed successfully.
2013/09/03 00:07:23 VCS WARNING V-16-20018-301 (duadm2) Sybase:masterdataservice:monitor:Open for dataserver failed, setting cookie to NULL
2013/09/03 00:07:23 VCS INFO V-16-1-10305 Resource masterdataservice (Owner: Unspecified, Group: Sybase1) is offline on duadm2 (VCS initiated)
2013/09/03 00:07:23 VCS NOTICE V-16-1-10300 Initiating Offline of Resource syb1bak_ip (Owner: Unspecified, Group: Sybase1) on System duadm2
2013/09/03 00:07:23 VCS NOTICE V-16-1-10300 Initiating Offline of Resource dbdumps_mount (Owner: Unspecified, Group: Sybase1) on System duadm2
2013/09/03 00:07:23 VCS NOTICE V-16-1-10300 Initiating Offline of Resource fmsybdata_mount (Owner: Unspecified, Group: Sybase1) on System duadm2
2013/09/03 00:07:23 VCS NOTICE V-16-1-10300 Initiating Offline of Resource fmsyblog_mount (Owner: Unspecified, Group: Sybase1) on System duadm2
2013/09/03 00:07:23 VCS NOTICE V-16-1-10300 Initiating Offline of Resource pmsybdata_mount (Owner: Unspecified, Group: Sybase1) on System duadm2
2013/09/03 00:07:23 VCS NOTICE V-16-1-10300 Initiating Offline of Resource pmsyblog_mount (Owner: Unspecified, Group: Sybase1) on System duadm2
2013/09/03 00:07:23 VCS NOTICE V-16-1-10300 Initiating Offline of Resource sybdata_mount (Owner: Unspecified, Group: Sybase1) on System duadm2
2013/09/03 00:07:23 VCS NOTICE V-16-1-10300 Initiating Offline of Resource syblog_mount (Owner: Unspecified, Group: Sybase1) on System duadm2
2013/09/03 00:07:23 VCS NOTICE V-16-1-10300 Initiating Offline of Resource sybmaster_mount (Owner: Unspecified, Group: Sybase1) on System duadm2
2013/09/03 00:07:25 VCS INFO V-16-1-10305 Resource dbdumps_mount (Owner: Unspecified, Group: Sybase1) is offline on duadm2 (VCS initiated)
2013/09/03 00:07:25 VCS INFO V-16-1-10305 Resource fmsyblog_mount (Owner: Unspecified, Group: Sybase1) is offline on duadm2 (VCS initiated)
2013/09/03 00:07:25 VCS INFO V-16-1-10305 Resource pmsybdata_mount (Owner: Unspecified, Group: Sybase1) is offline on duadm2 (VCS initiated)
2013/09/03 00:07:25 VCS INFO V-16-1-10305 Resource pmsyblog_mount (Owner: Unspecified, Group: Sybase1) is offline on duadm2 (VCS initiated)
2013/09/03 00:07:25 VCS INFO V-16-1-10305 Resource sybdata_mount (Owner: Unspecified, Group: Sybase1) is offline on duadm2 (VCS initiated)
2013/09/03 00:07:25 VCS INFO V-16-1-10305 Resource fmsybdata_mount (Owner: Unspecified, Group: Sybase1) is offline on duadm2 (VCS initiated)
2013/09/03 00:07:25 VCS INFO V-16-1-10305 Resource syblog_mount (Owner: Unspecified, Group: Sybase1) is offline on duadm2 (VCS initiated)
2013/09/03 00:07:25 VCS INFO V-16-1-10305 Resource syb1bak_ip (Owner: Unspecified, Group: Sybase1) is offline on duadm2 (VCS initiated)
2013/09/03 00:07:35 VCS WARNING V-16-2-13011 (duadm1) Resource(ddc_mount): offline procedure did not complete within the expected time.
2013/09/03 00:07:35 VCS ERROR V-16-2-13063 (duadm1) Agent is calling clean for resource(ddc_mount) because offline did not complete within the expected time.
2013/09/03 00:07:35 VCS INFO V-16-10001-5530 (duadm1) Mount:ddc_mount:clean:Checking for loopback mount.
2013/09/03 00:08:03 VCS INFO V-16-10001-1 (duadm1) Application:stop_oss:stop_oss.sh:Waiting for Oss to go Offline
2013/09/03 00:08:11 VCS WARNING V-16-2-13011 (duadm1) Resource(ext_nsa): offline procedure did not complete within the expected time.
2013/09/03 00:08:11 VCS WARNING V-16-2-13011 (duadm1) Resource(notif): offline procedure did not complete within the expected time.
2013/09/03 00:08:11 VCS ERROR V-16-2-13063 (duadm1) Agent is calling clean for resource(notif) because offline did not complete within the expected time.
2013/09/03 00:08:11 VCS ERROR V-16-2-13063 (duadm1) Agent is calling clean for resource(ext_nsa) because offline did not complete within the expected time.
2013/09/03 00:08:11 VCS WARNING V-16-2-13011 (duadm1) Resource(oad): offline procedure did not complete within the expected time.
2013/09/03 00:08:11 VCS ERROR V-16-2-13063 (duadm1) Agent is calling clean for resource(oad) because offline did not complete within the expected time.
2013/09/03 00:08:11 VCS INFO V-16-2-13068 (duadm1) Resource(oad) - clean completed successfully.
2013/09/03 00:08:11 VCS INFO V-16-2-13068 (duadm1) Resource(notif) - clean completed successfully.
2013/09/03 00:08:11 VCS WARNING V-16-2-13011 (duadm1) Resource(ext_notif): offline procedure did not complete within the expected time.
2013/09/03 00:08:11 VCS ERROR V-16-2-13063 (duadm1) Agent is calling clean for resource(ext_notif) because offline did not complete within the expected time.
2013/09/03 00:08:11 VCS WARNING V-16-2-13011 (duadm1) Resource(log_service): offline procedure did not complete within the expected time.
2013/09/03 00:08:11 VCS ERROR V-16-2-13063 (duadm1) Agent is calling clean for resource(log_service) because offline did not complete within the expected time.
2013/09/03 00:08:11 VCS WARNING V-16-2-13011 (duadm1) Resource(nsa): offline procedure did not complete within the expected time.
2013/09/03 00:08:11 VCS ERROR V-16-2-13063 (duadm1) Agent is calling clean for resource(nsa) because offline did not complete within the expected time.
2013/09/03 00:08:11 VCS INFO V-16-2-13068 (duadm1) Resource(nsa) - clean completed successfully.
2013/09/03 00:08:11 VCS INFO V-16-2-13068 (duadm1) Resource(ext_nsa) - clean completed successfully.
2013/09/03 00:08:12 VCS WARNING V-16-2-13011 (duadm2) Resource(ddc_mount): offline procedure did not complete within the expected time.
2013/09/03 00:08:12 VCS ERROR V-16-2-13063 (duadm2) Agent is calling clean for resource(ddc_mount) because offline did not complete within the expected time.
2013/09/03 00:08:12 VCS INFO V-16-10001-5530 (duadm2) Mount:ddc_mount:clean:Checking for loopback mount.
2013/09/03 00:08:12 VCS WARNING V-16-2-13011 (duadm1) Resource(osagent): offline procedure did not complete within the expected time.
2013/09/03 00:08:12 VCS WARNING V-16-2-13011 (duadm1) Resource(opendj): offline procedure did not complete within the expected time.
2013/09/03 00:08:12 VCS ERROR V-16-2-13063 (duadm1) Agent is calling clean for resource(osagent) because offline did not complete within the expected time.
2013/09/03 00:08:12 VCS ERROR V-16-2-13063 (duadm1) Agent is calling clean for resource(opendj) because offline did not complete within the expected time.
2013/09/03 00:08:12 VCS INFO V-16-2-13068 (duadm1) Resource(osagent) - clean completed successfully.
2013/09/03 00:08:36 VCS ERROR V-16-2-13006 (duadm1) Resource(ddc_mount): clean procedure did not complete within the expected time.
2013/09/03 00:09:03 VCS INFO V-16-10001-1 (duadm1) Application:stop_oss:stop_oss.sh:Waiting for Oss to go Offline
2013/09/03 00:09:07 VCS WARNING V-16-2-13011 (duadm1) Resource(rmi_reg_ext): offline procedure did not complete within the expected time.
2013/09/03 00:09:07 VCS ERROR V-16-2-13063 (duadm1) Agent is calling clean for resource(rmi_reg_ext) because offline did not complete within the expected time.
2013/09/03 00:09:07 VCS INFO V-16-2-13068 (duadm1) Resource(rmi_reg_ext) - clean completed successfully.
2013/09/03 00:09:11 VCS ERROR V-16-2-13006 (duadm1) Resource(ext_notif): clean procedure did not complete within the expected time.
2013/09/03 00:09:11 VCS ERROR V-16-2-13006 (duadm1) Resource(log_service): clean procedure did not complete within the expected time.
2013/09/03 00:09:13 VCS ERROR V-16-2-13006 (duadm2) Resource(ddc_mount): clean procedure did not complete within the expected time.
2013/09/03 00:09:13 VCS ERROR V-16-2-13006 (duadm1) Resource(opendj): clean procedure did not complete within the expected time.
2013/09/03 00:09:28 VCS INFO V-16-1-10305 Resource oad (Owner: Unspecified, Group: Oss) is offline on duadm1 (VCS initiated)
2013/09/03 00:09:28 VCS NOTICE V-16-1-10300 Initiating Offline of Resource gui_nsa (Owner: Unspecified, Group: Oss) on System duadm1
2013/09/03 00:09:28 VCS INFO V-16-1-10305 Resource nsa (Owner: Unspecified, Group: Oss) is offline on duadm1 (VCS initiated)
2013/09/03 00:09:28 VCS INFO V-16-1-10305 Resource notif (Owner: Unspecified, Group: Oss) is offline on duadm1 (VCS initiated)
2013/09/03 00:09:30 VCS INFO V-16-1-10305 Resource rmi_reg_ext (Owner: Unspecified, Group: Oss) is offline on duadm1 (VCS initiated)
2013/09/03 00:09:30 VCS INFO V-16-1-10305 Resource osagent (Owner: Unspecified, Group: Oss) is offline on duadm1 (VCS initiated)
2013/09/03 00:09:30 VCS INFO V-16-1-10305 Resource ext_nsa (Owner: Unspecified, Group: Oss) is offline on duadm1 (VCS initiated)
2013/09/03 00:09:32 VCS INFO V-16-1-10305 Resource glassfish (Owner: Unspecified, Group: Oss) is offline on duadm1 (VCS initiated)
2013/09/03 00:09:36 VCS ERROR V-16-2-13077 (duadm1) Agent is unable to offline resource(ddc_mount). Administrative intervention may be required.
2013/09/03 00:09:36 VCS INFO V-16-10001-5530 (duadm1) Mount:ddc_mount:clean:Checking for loopback mount.
2013/09/03 00:10:03 VCS INFO V-16-10001-1 (duadm1) Application:stop_oss:stop_oss.sh:Waiting for Oss to go Offline
2013/09/03 00:10:12 VCS ERROR V-16-2-13210 (duadm1) Agent is calling clean for resource(cluster_maint) because 4 successive invocations of the monitor procedure did not complete within the expected time.
2013/09/03 00:10:13 VCS ERROR V-16-2-13077 (duadm2) Agent is unable to offline resource(ddc_mount). Administrative intervention may be required.
2013/09/03 00:10:13 VCS INFO V-16-10001-5530 (duadm2) Mount:ddc_mount:clean:Checking for loopback mount.
2013/09/03 00:10:13 VCS INFO V-16-1-10305 Resource ext_notif (Owner: Unspecified, Group: Oss) is offline on duadm1 (VCS initiated)
2013/09/03 00:10:13 VCS INFO V-16-1-10305 Resource log_service (Owner: Unspecified, Group: Oss) is offline on duadm1 (VCS initiated)
2013/09/03 00:10:16 VCS INFO V-16-1-10305 Resource opendj (Owner: Unspecified, Group: Oss) is offline on duadm1 (VCS initiated)
2013/09/03 00:11:03 VCS INFO V-16-10001-1 (duadm1) Application:stop_oss:stop_oss.sh:Waiting for Oss to go Offline
2013/09/03 00:11:13 VCS ERROR V-16-2-13006 (duadm1) Resource(cluster_maint): clean procedure did not complete within the expected time.
2013/09/03 00:11:37 VCS INFO V-16-10001-5530 (duadm1) Mount:ddc_mount:clean:Checking for loopback mount.
2013/09/03 00:12:04 VCS INFO V-16-10001-1 (duadm1) Application:stop_oss:stop_oss.sh:Waiting for Oss to go Offline
2013/09/03 00:12:14 VCS INFO V-16-10001-5530 (duadm2) Mount:ddc_mount:clean:Checking for loopback mount.
2013/09/03 00:12:16 VCS INFO V-16-2-13026 (duadm1) Resource(cluster_maint) - monitor procedure finished successfully after failing to complete within the expected time for (4) consecutive times.
2013/09/03 00:12:16 VCS INFO V-16-2-13075 (duadm1) Resource(cluster_maint) has reported unexpected OFFLINE 1 times, which is still within the ToleranceLimit(2).
2013/09/03 00:12:24 VCS WARNING V-16-2-13011 (duadm2) Resource(sybmaster_mount): offline procedure did not complete within the expected time.
2013/09/03 00:12:24 VCS ERROR V-16-2-13063 (duadm2) Agent is calling clean for resource(sybmaster_mount) because offline did not complete within the expected time.
2013/09/03 00:12:24 VCS INFO V-16-10001-5530 (duadm2) Mount:sybmaster_mount:clean:Checking for loopback mount.
2013/09/03 00:13:04 VCS INFO V-16-10001-1 (duadm1) Application:stop_oss:stop_oss.sh:Waiting for Oss to go Offline
2013/09/03 00:13:13 VCS WARNING V-16-2-13011 (duadm1) Resource(tomcat): offline procedure did not complete within the expected time.
2013/09/03 00:13:13 VCS WARNING V-16-2-13011 (duadm1) Resource(time_service): offline procedure did not complete within the expected time.
2013/09/03 00:13:13 VCS ERROR V-16-2-13063 (duadm1) Agent is calling clean for resource(tomcat) because offline did not complete within the expected time.
2013/09/03 00:13:13 VCS WARNING V-16-2-13011 (duadm1) Resource(rmi_reg): offline procedure did not complete within the expected time.
2013/09/03 00:13:13 VCS ERROR V-16-2-13063 (duadm1) Agent is calling clean for resource(time_service) because offline did not complete within the expected time.
2013/09/03 00:13:13 VCS ERROR V-16-2-13063 (duadm1) Agent is calling clean for resource(rmi_reg) because offline did not complete within the expected time.
2013/09/03 00:13:13 VCS WARNING V-16-2-13011 (duadm1) Resource(sb_nsa): offline procedure did not complete within the expected time.
2013/09/03 00:13:13 VCS ERROR V-16-2-13063 (duadm1) Agent is calling clean for resource(sb_nsa) because offline did not complete within the expected time.
2013/09/03 00:13:13 VCS INFO V-16-2-13068 (duadm1) Resource(tomcat) - clean completed successfully.
2013/09/03 00:13:13 VCS INFO V-16-2-13068 (duadm1) Resource(rmi_reg) - clean completed successfully.
2013/09/03 00:13:14 VCS WARNING V-16-2-13011 (duadm1) Resource(sentinel): offline procedure did not complete within the expected time.
2013/09/03 00:13:14 VCS ERROR V-16-2-13063 (duadm1) Agent is calling clean for resource(sentinel) because offline did not complete within the expected time.
2013/09/03 00:13:14 VCS INFO V-16-2-13068 (duadm1) Resource(sentinel) - clean completed successfully.
2013/09/03 00:13:15 VCS INFO V-16-1-10305 Resource rmi_reg (Owner: Unspecified, Group: Oss) is offline on duadm1 (VCS initiated)
2013/09/03 00:13:15 VCS INFO V-16-1-10305 Resource tomcat (Owner: Unspecified, Group: Oss) is offline on duadm1 (VCS initiated)
2013/09/03 00:13:16 VCS INFO V-16-1-10305 Resource sentinel (Owner: Unspecified, Group: Oss) is offline on duadm1 (VCS initiated)
2013/09/03 00:13:16 VCS INFO V-16-2-13075 (duadm1) Resource(cluster_maint) has reported unexpected OFFLINE 2 times, which is still within the ToleranceLimit(2).
2013/09/03 00:13:25 VCS ERROR V-16-2-13006 (duadm2) Resource(sybmaster_mount): clean procedure did not complete within the expected time.
2013/09/03 00:13:38 VCS INFO V-16-10001-5530 (duadm1) Mount:ddc_mount:clean:Checking for loopback mount.
2013/09/03 00:14:04 VCS INFO V-16-10001-1 (duadm1) Application:stop_oss:stop_oss.sh:Waiting for Oss to go Offline
2013/09/03 00:14:14 VCS ERROR V-16-2-13006 (duadm1) Resource(sb_nsa): clean procedure did not complete within the expected time.
2013/09/03 00:14:14 VCS ERROR V-16-2-13006 (duadm1) Resource(time_service): clean procedure did not complete within the expected time.
2013/09/03 00:14:15 VCS ERROR V-16-2-13067 (duadm1) Agent is calling clean for resource(cluster_maint) because the resource became OFFLINE unexpectedly, on its own.
2013/09/03 00:14:15 VCS INFO V-16-10001-5530 (duadm2) Mount:ddc_mount:clean:Checking for loopback mount.
2013/09/03 00:14:16 VCS INFO V-16-1-10305 Resource time_service (Owner: Unspecified, Group: Oss) is offline on duadm1 (VCS initiated)
2013/09/03 00:14:16 VCS INFO V-16-1-10305 Resource sb_nsa (Owner: Unspecified, Group: Oss) is offline on duadm1 (VCS initiated)
2013/09/03 00:14:25 VCS ERROR V-16-2-13077 (duadm2) Agent is unable to offline resource(sybmaster_mount). Administrative intervention may be required.
2013/09/03 00:14:26 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2013/09/03 00:14:26 VCS INFO V-16-10001-5530 (duadm2) Mount:sybmaster_mount:clean:Checking for loopback mount.
2013/09/03 00:14:39 VCS WARNING V-16-2-13011 (duadm1) Resource(gui_nsa): offline procedure did not complete within the expected time.
2013/09/03 00:14:39 VCS ERROR V-16-2-13063 (duadm1) Agent is calling clean for resource(gui_nsa) because offline did not complete within the expected time.
2013/09/03 00:14:39 VCS INFO V-16-2-13068 (duadm1) Resource(gui_nsa) - clean completed successfully.
2013/09/03 00:14:41 VCS INFO V-16-1-10305 Resource gui_nsa (Owner: Unspecified, Group: Oss) is offline on duadm1 (VCS initiated)
2013/09/03 00:14:41 VCS NOTICE V-16-1-10300 Initiating Offline of Resource change_versant (Owner: Unspecified, Group: Oss) on System duadm1
2013/09/03 00:15:04 VCS INFO V-16-10001-1 (duadm1) Application:stop_oss:stop_oss.sh:Waiting for Oss to go Offline
2013/09/03 00:15:16 VCS ERROR V-16-2-13006 (duadm1) Resource(cluster_maint): clean procedure did not complete within the expected time.
2013/09/03 00:15:40 VCS INFO V-16-10001-5530 (duadm1) Mount:ddc_mount:clean:Checking for loopback mount.
2013/09/03 00:16:04 VCS INFO V-16-10001-1 (duadm1) Application:stop_oss:stop_oss.sh:Waiting for Oss to go Offline
2013/09/03 00:16:16 VCS INFO V-16-10001-5530 (duadm2) Mount:ddc_mount:clean:Checking for loopback mount.
2013/09/03 00:16:27 VCS INFO V-16-10001-5530 (duadm2) Mount:sybmaster_mount:clean:Checking for loopback mount.
2013/09/03 00:17:04 VCS INFO V-16-10001-1 (duadm1) Application:stop_oss:stop_oss.sh:Waiting for Oss to go Offline
2013/09/03 00:17:41 VCS INFO V-16-10001-5530 (duadm1) Mount:ddc_mount:clean:Checking for loopback mount.
2013/09/03 00:17:58 VCS WARNING V-16-2-13011 (duadm1) Resource(stop_oss): offline procedure did not complete within the expected time.
2013/09/03 00:17:58 VCS ERROR V-16-2-13063 (duadm1) Agent is calling clean for resource(stop_oss) because offline did not complete within the expected time.
2013/09/03 00:18:04 VCS INFO V-16-1-50135 User root fired command: hagrp -switch Oss  duadm2  from localhost
2013/09/03 00:18:04 VCS INFO V-16-10001-1 (duadm1) Application:stop_oss:stop_oss.sh:Waiting for Oss to go Offline
2013/09/03 00:18:18 VCS INFO V-16-10001-5530 (duadm2) Mount:ddc_mount:clean:Checking for loopback mount.
2013/09/03 00:18:28 VCS INFO V-16-10001-5530 (duadm2) Mount:sybmaster_mount:clean:Checking for loopback mount.
2013/09/03 00:18:59 VCS ERROR V-16-2-13006 (duadm1) Resource(stop_oss): clean procedure did not complete within the expected time.
2013/09/03 00:19:42 VCS INFO V-16-10001-5530 (duadm1) Mount:ddc_mount:clean:Checking for loopback mount.
2013/09/03 00:19:42 VCS WARNING V-16-2-13011 (duadm1) Resource(change_versant): offline procedure did not complete within the expected time.
2013/09/03 00:19:42 VCS ERROR V-16-2-13063 (duadm1) Agent is calling clean for resource(change_versant) because offline did not complete within the expected time.
2013/09/03 00:19:42 VCS INFO V-16-2-13068 (duadm1) Resource(change_versant) - clean completed successfully.
2013/09/03 00:19:44 VCS INFO V-16-1-10305 Resource change_versant (Owner: Unspecified, Group: Oss) is offline on duadm1 (VCS initiated)
2013/09/03 00:19:44 VCS NOTICE V-16-1-10446 Group Oss is offline on system duadm1
2013/09/03 00:19:44 VCS INFO V-16-6-15025 (duadm2) hatrigger:invoking nfs_preonline
2013/09/03 00:19:44 VCS INFO V-16-6-15076 (duadm2) hatrigger:invoking regular preonline trigger if it exists
2013/09/03 00:19:44 VCS INFO V-16-6-0 (duadm1) postoffline:(postoffline) Invoked with arg0=duadm1, arg1=Oss
2013/09/03 00:19:44 VCS INFO V-16-6-0 (duadm1) postoffline:Executing /ericsson/core/cluster/scripts/postoffline.sh with arg0=duadm1, arg1=Oss
2013/09/03 00:19:44 VCS INFO V-16-6-0 (duadm2) preonline:(preonline) Invoked with arg0=duadm2, arg1=Oss
2013/09/03 00:19:44 VCS INFO V-16-6-0 (duadm2) preonline:Executing /ericsson/core/cluster/scripts/preonline.sh with arg0=duadm2, arg1=Oss
2013/09/03 00:19:44 VCS INFO V-16-6-0 (duadm1) postoffline.sh:Oss:Stop old FM processes
2013/09/03 00:19:44 VCS INFO V-16-6-0 (duadm1) postoffline.sh:Oss:Stopping dummy SMF resources for Sybase
2013/09/03 00:19:52 VCS INFO V-16-6-0 (duadm2) preonline.sh:Oss:Waiting for Sybase1 to come online globally
2013/09/03 00:20:00 VCS INFO V-16-6-0 (duadm2) preonline.sh:Oss:Waiting for Sybase1 to come online globally
2013/09/03 00:20:01 VCS INFO V-16-1-10305 Resource stop_oss (Owner: Unspecified, Group: Ossfs) is offline on duadm1 (VCS initiated)
2013/09/03 00:20:01 VCS NOTICE V-16-1-10300 Initiating Offline of Resource cluster_maint (Owner: Unspecified, Group: Ossfs) on System duadm1
2013/09/03 00:20:01 VCS NOTICE V-16-1-10300 Initiating Offline of Resource smrs_nfs (Owner: Unspecified, Group: Ossfs) on System duadm1
2013/09/03 00:20:01 VCS NOTICE V-16-1-10300 Initiating Offline of Resource ossbak_ip (Owner: Unspecified, Group: Ossfs) on System duadm1
2013/09/03 00:20:01 VCS NOTICE V-16-1-10300 Initiating Offline of Resource eba_ebsg_mount (Owner: Unspecified, Group: Ossfs) on System duadm1
2013/09/03 00:20:01 VCS NOTICE V-16-1-10300 Initiating Offline of Resource eba_ebss_mount (Owner: Unspecified, Group: Ossfs) on System duadm1
2013/09/03 00:20:01 VCS NOTICE V-16-1-10300 Initiating Offline of Resource eba_ebsw_mount (Owner: Unspecified, Group: Ossfs) on System duadm1
2013/09/03 00:20:01 VCS NOTICE V-16-1-10300 Initiating Offline of Resource eba_rede_mount (Owner: Unspecified, Group: Ossfs) on System duadm1
2013/09/03 00:20:01 VCS NOTICE V-16-1-10300 Initiating Offline of Resource eba_rtt_mount (Owner: Unspecified, Group: Ossfs) on System duadm1
2013/09/03 00:20:01 VCS NOTICE V-16-1-10300 Initiating Offline of Resource home_mount (Owner: Unspecified, Group: Ossfs) on System duadm1
2013/09/03 00:20:01 VCS NOTICE V-16-1-10300 Initiating Offline of Resource nms_cosm_mount (Owner: Unspecified, Group: Ossfs) on System duadm1
2013/09/03 00:20:01 VCS NOTICE V-16-1-10300 Initiating Offline of Resource pmstor_mount (Owner: Unspecified, Group: Ossfs) on System duadm1
2013/09/03 00:20:01 VCS NOTICE V-16-1-10300 Initiating Offline of Resource segment1_mount (Owner: Unspecified, Group: Ossfs) on System duadm1
2013/09/03 00:20:01 VCS NOTICE V-16-1-10300 Initiating Offline of Resource sgwcg_mount (Owner: Unspecified, Group: Ossfs) on System duadm1
2013/09/03 00:20:01 VCS NOTICE V-16-1-10300 Initiating Offline of Resource versant_mount (Owner: Unspecified, Group: Ossfs) on System duadm1
2013/09/03 00:20:01 VCS NOTICE V-16-1-10300 Initiating Offline of Resource a3pp_share (Owner: Unspecified, Group: Ossfs) on System duadm1
2013/09/03 00:20:01 VCS NOTICE V-16-1-10300 Initiating Offline of Resource ericsson_share (Owner: Unspecified, Group: Ossfs) on System duadm1
2013/09/03 00:20:01 VCS NOTICE V-16-1-10300 Initiating Offline of Resource etc_share (Owner: Unspecified, Group: Ossfs) on System duadm1
2013/09/03 00:20:01 VCS NOTICE V-16-1-10300 Initiating Offline of Resource mail_share (Owner: Unspecified, Group: Ossfs) on System duadm1
2013/09/03 00:20:01 VCS NOTICE V-16-1-10300 Initiating Offline of Resource opt_share (Owner: Unspecified, Group: Ossfs) on System duadm1
2013/09/03 00:20:01 VCS NOTICE V-16-1-10300 Initiating Offline of Resource upgrade_share (Owner: Unspecified, Group: Ossfs) on System duadm1
2013/09/03 00:20:01 VCS NOTICE V-16-1-10300 Initiating Offline of Resource usr_local_share (Owner: Unspecified, Group: Ossfs) on System duadm1
2013/09/03 00:20:01 VCS NOTICE V-16-1-10300 Initiating Offline of Resource var_share (Owner: Unspecified, Group: Ossfs) on System duadm1
2013/09/03 00:20:02 VCS INFO V-16-1-10305 Resource a3pp_share (Owner: Unspecified, Group: Ossfs) is offline on duadm1 (VCS initiated)
2013/09/03 00:20:02 VCS INFO V-16-1-10305 Resource ericsson_share (Owner: Unspecified, Group: Ossfs) is offline on duadm1 (VCS initiated)
2013/09/03 00:20:02 VCS INFO V-16-1-10305 Resource etc_share (Owner: Unspecified, Group: Ossfs) is offline on duadm1 (VCS initiated)
2013/09/03 00:20:02 VCS INFO V-16-1-10305 Resource opt_share (Owner: Unspecified, Group: Ossfs) is offline on duadm1 (VCS initiated)
2013/09/03 00:20:02 VCS INFO V-16-1-10305 Resource usr_local_share (Owner: Unspecified, Group: Ossfs) is offline on duadm1 (VCS initiated)
2013/09/03 00:20:02 VCS INFO V-16-1-10305 Resource mail_share (Owner: Unspecified, Group: Ossfs) is offline on duadm1 (VCS initiated)
2013/09/03 00:20:02 VCS INFO V-16-1-10305 Resource var_share (Owner: Unspecified, Group: Ossfs) is offline on duadm1 (VCS initiated)
2013/09/03 00:20:02 VCS INFO V-16-1-10305 Resource upgrade_share (Owner: Unspecified, Group: Ossfs) is offline on duadm1 (VCS initiated)
2013/09/03 00:20:03 VCS INFO V-16-1-10305 Resource ossbak_ip (Owner: Unspecified, Group: Ossfs) is offline on duadm1 (VCS initiated)
2013/09/03 00:20:08 VCS INFO V-16-6-0 (duadm2) preonline.sh:Oss:Waiting for Sybase1 to come online globally
2013/09/03 00:20:17 VCS INFO V-16-6-0 (duadm2) preonline.sh:Oss:Waiting for Sybase1 to come online globally
2013/09/03 00:20:19 VCS INFO V-16-10001-5530 (duadm2) Mount:ddc_mount:clean:Checking for loopback mount.
2013/09/03 00:20:25 VCS INFO V-16-6-0 (duadm2) preonline.sh:Oss:Waiting for Sybase1 to come online globally
2013/09/03 00:20:29 VCS INFO V-16-10001-5530 (duadm2) Mount:sybmaster_mount:clean:Checking for loopback mount.
2013/09/03 00:20:33 VCS INFO V-16-6-0 (duadm2) preonline.sh:Oss:Waiting for Sybase1 to come online globally
2013/09/03 00:20:41 VCS INFO V-16-6-0 (duadm2) preonline.sh:Oss:Waiting for Sybase1 to come online globally
2013/09/03 00:20:49 VCS INFO V-16-6-0 (duadm2) preonline.sh:Oss:Waiting for Sybase1 to come online globally
2013/09/03 00:20:57 VCS INFO V-16-6-0 (duadm2) preonline.sh:Oss:Waiting for Sybase1 to come online globally
2013/09/03 00:21:03 VCS INFO V-16-1-10305 Resource eba_ebsw_mount (Owner: Unspecified, Group: Ossfs) is offline on duadm1 (VCS initiated)
2013/09/03 00:21:03 VCS INFO V-16-1-10305 Resource eba_rtt_mount (Owner: Unspecified, Group: Ossfs) is offline on duadm1 (VCS initiated)
2013/09/03 00:21:03 VCS INFO V-16-1-10305 Resource eba_rede_mount (Owner: Unspecified, Group: Ossfs) is offline on duadm1 (VCS initiated)
2013/09/03 00:21:03 VCS INFO V-16-1-10305 Resource pmstor_mount (Owner: Unspecified, Group: Ossfs) is offline on duadm1 (VCS initiated)
2013/09/03 00:21:03 VCS INFO V-16-1-10305 Resource eba_ebss_mount (Owner: Unspecified, Group: Ossfs) is offline on duadm1 (VCS initiated)
2013/09/03 00:21:03 VCS INFO V-16-1-10305 Resource nms_cosm_mount (Owner: Unspecified, Group: Ossfs) is offline on duadm1 (VCS initiated)
2013/09/03 00:21:03 VCS INFO V-16-1-10305 Resource eba_ebsg_mount (Owner: Unspecified, Group: Ossfs) is offline on duadm1 (VCS initiated)
2013/09/03 00:21:03 VCS INFO V-16-1-10305 Resource segment1_mount (Owner: Unspecified, Group: Ossfs) is offline on duadm1 (VCS initiated)
2013/09/03 00:21:04 VCS INFO V-16-1-10305 Resource versant_mount (Owner: Unspecified, Group: Ossfs) is offline on duadm1 (VCS initiated)
2013/09/03 00:21:05 VCS INFO V-16-6-0 (duadm2) preonline.sh:Oss:Waiting for Sybase1 to come online globally
2013/09/03 00:21:13 VCS INFO V-16-6-0 (duadm2) preonline.sh:Oss:Waiting for Sybase1 to come online globally
2013/09/03 00:21:21 VCS INFO V-16-6-0 (duadm2) preonline.sh:Oss:Waiting for Sybase1 to come online globally
2013/09/03 00:21:29 VCS INFO V-16-6-0 (duadm2) preonline.sh:Oss:Waiting for Sybase1 to come online globally

 

 

1 ACCEPTED SOLUTION

mikebounds
Level 6
Partner Accredited

As you correctly have proxies, setting ToleranceLimit on the MultiNICB type may help a little, as it would delay faulting the MultiNICB resources, which in turn MAY delay the faulting of the Proxies. However, the default MonitorInterval for MultiNICB is only 10 seconds, so even setting ToleranceLimit to 2 would only give MultiNICB an extra 20 seconds. In the incident above it would have faulted at 23:57:47 rather than 23:57:27, but as the Proxy ran its monitor at 23:57:57, this wouldn't have made any difference. You could set ToleranceLimit on the Proxy type instead.

 

Mike
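
For anyone who wants to check the arithmetic in the answer above, here is a minimal Python sketch. It assumes, as the post does, that each unit of ToleranceLimit buys one extra MonitorInterval before the resource is declared faulted. The function name `delayed_fault` is made up for illustration; the times and the 10-second interval are the ones quoted in the post.

```python
from datetime import datetime, timedelta

# Values quoted in the post; the date is only needed to build datetime objects.
MONITOR_INTERVAL = timedelta(seconds=10)            # MultiNICB MonitorInterval
observed_fault = datetime(2013, 9, 2, 23, 57, 27)   # when MultiNICB actually faulted
proxy_monitor = datetime(2013, 9, 2, 23, 57, 57)    # when the Proxy next ran its monitor


def delayed_fault(fault_time, tolerance_limit, interval=MONITOR_INTERVAL):
    """Fault time if `tolerance_limit` extra failed monitor cycles are tolerated."""
    return fault_time + tolerance_limit * interval


new_fault = delayed_fault(observed_fault, tolerance_limit=2)
print(new_fault.time())           # 23:57:47
print(new_fault < proxy_monitor)  # True: the Proxy monitor would still see the fault
```

Because the delayed fault still lands before the Proxy's next monitor cycle, the extra 20 seconds on MultiNICB alone would not have changed the outcome here.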

13 REPLIES

mikebounds
Level 6
Partner Accredited

Both your heartbeat links failed at the same time, so from a VCS perspective each node thinks the other has gone down, as it has lost all connection to it. Each node will then try to take over the other's service groups, and when the network returns you have the same IP up on both nodes, so you get further errors.

Your heartbeats should be independent, so for VCS to incorrectly interpret a network failure as a node failure, two pieces of network hardware would have to fail at the same time, which is very rare. More likely, your heartbeats are not independent and you have a single point of failure (SPOF), where the failure of one component caused both heartbeats (and, it looks like, the public network too) to fail.

Therefore you need to look at the design of your heartbeats. Each heartbeat should:

  • Use a completely different NIC (for example, you should NOT put both heartbeats on the same dual or quad card)
  • Use a separate switch
  • Use a separate VLAN if you are using VLANs

Mike
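
The checklist above can be turned into a quick audit. The following is only a sketch: the inventory (link names, card, switch and VLAN values) is entirely hypothetical and does not come from this cluster, so you would fill it in from your own cabling records.

```python
from collections import defaultdict

# Hypothetical inventory: which card, switch and VLAN each heartbeat link uses.
# None of these values come from the thread.
links = {
    "hb1": {"nic_card": "card-A", "switch": "sw-1", "vlan": 101},
    "hb2": {"nic_card": "card-B", "switch": "sw-1", "vlan": 102},
}


def shared_components(links):
    """Return {(kind, value): [link names]} for every component used by 2+ links."""
    users = defaultdict(list)
    for name, parts in links.items():
        for kind, value in parts.items():
            users[(kind, value)].append(name)
    return {key: names for key, names in users.items() if len(names) > 1}


for (kind, value), names in shared_components(links).items():
    print(f"SPOF candidate: {kind} {value} is shared by {', '.join(names)}")
```

Any line it prints is a component whose failure would take out more than one heartbeat at once.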

symsonu
Level 6

I think the node is shown as being in jeopardy:

Sep  2 23:35:21 duadm2 genunix: [ID 608499 kern.notice] GAB INFO V-15-1-20037 Port h gen   288706   jeopardy ;1
Sep  2 23:35:21 duadm2 genunix: [ID 316943 kern.notice] GAB INFO V-15-1-20036 Port a gen   288701 membership 01
Sep  2 23:35:21 duadm2 genunix: [ID 608499 kern.notice] GAB INFO V-15-1-20037 Port a gen   288701   jeopardy ;1
Sep  2 23:35:21 duadm2 genunix: [ID 316943 kern.notice] GAB INFO V-15-1-20036 Port b gen   288704 membership 01
Sep  2 23:35:21 duadm2 genunix: [ID 608499 kern.notice] GAB INFO V-15-1-20037 Port b gen   288704   jeopardy ;1
Sep  2 23:35:21 duadm2 Had[6267]: [ID 702911 daemon.notice] VCS INFO V-16-1-10077 Received new cluster membership
Sep  2 23:35:21 duadm2 Had[6267]: [ID 702911 daemon.notice] VCS ERROR V-16-1-10111 System duadm2 (Node '1') is in Regular and Jeopardy Memberships - Membership: 0x3

 

If all the heartbeats had been lost, then I/O fencing should have kicked in and taken one node down.

 

mikebounds
Level 6
Partner Accredited

Yes, you're right. I had only looked at the messages log and had assumed you had 2 heartbeats, but I have now noticed your links are named link 0 and link 2, implying you may have a 3rd link called link 1, and the engine log confirms this as it says:

Previous status = oce1 UP oce8 UP oce0 UP; Current status = oce1 DOWN oce8 UP oce0 DOWN

However, this still means you probably have 2 heartbeats sharing a common component, as oce1 and oce0 went down at the same time, which probably means they are connected to the same switch. So, with all 3 heartbeats working, if you lost NIC oce8 the cluster would NOT go into jeopardy. If, some time later and before you had fixed oce8 (this could be several hours or days later), you then lost the switch that oce1 and oce0 are connected to, you would lose the 2 remaining heartbeats at the same time, and this would cause split-brain. So you should still look at the design of your heartbeats to ensure they are all independent.

In terms of finding out what caused the service outage: it is evident that you had a network outage, and this caused network resources in VCS to fault and fail service groups.

Mike
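
The engine log line quoted above is easy to compare by eye with three links, but a small parser helps when scanning a long log. This is a rough Python sketch that only assumes the "Previous status = ...; Current status = ..." layout shown in the quoted line; the function names are made up for illustration.

```python
def parse_status(text):
    """Turn 'oce1 UP oce8 UP' into {'oce1': 'UP', 'oce8': 'UP'}."""
    tokens = text.split()
    return dict(zip(tokens[0::2], tokens[1::2]))


def changed_links(line):
    """Return {nic: (previous, current)} for interfaces whose state changed."""
    previous_part, current_part = line.split(";")
    previous = parse_status(previous_part.split("=", 1)[1])
    current = parse_status(current_part.split("=", 1)[1])
    return {nic: (state, current.get(nic))
            for nic, state in previous.items() if state != current.get(nic)}


line = ("Previous status = oce1 UP oce8 UP oce0 UP; "
        "Current status = oce1 DOWN oce8 UP oce0 DOWN")
print(changed_links(line))
```

For the quoted line this reports oce1 and oce0 going from UP to DOWN while oce8 stays up, which matches the jeopardy (rather than split-brain) state discussed above.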

symsonu
Level 6

Thanks Mike for helping me understand the situation.

You have helped me in the past and you are always there for my rescue :)

In your previous post you mentioned that both nodes tried to online the IP resource, and that is why there were duplicate IPs, as shown below.

If the node only went into jeopardy, what could be the reason for the messages below?

 

Sep  2 23:57:10 duadm2 ip: [ID 876157 kern.warning] WARNING: node 44:1e:a1:74:e5:46 is using our IP address 010.001.014.027 on oce9
Sep  2 23:57:10 duadm2 ip: [ID 876157 kern.warning] WARNING: node 44:1e:a1:74:e5:46 is using our IP address 010.001.014.030 on oce9
Sep  2 23:57:10 duadm2 ip: [ID 876157 kern.warning] WARNING: node 44:1e:a1:74:e5:46 is using our IP address 010.001.014.027 on oce9
Sep  2 23:57:10 duadm2 ip: [ID 876157 kern.warning] WARNING: node 44:1e:a1:74:e5:46 is using our IP address 010.001.014.030 on oce9
Sep  2 23:57:10 duadm2 ip: [ID 876157 kern.warning] WARNING: node 44:1e:a1:74:e5:46 is using our IP address 010.001.014.027 on oce9
Sep  2 23:57:10 duadm2 ip: [ID 876157 kern.warning] WARNING: node 44:1e:a1:74:e5:46 is using our IP address 010.001.014.030 on oce9
Sep  2 23:57:10 duadm2 ip: [ID 567813 kern.warning] WARNING: oce9:2 has duplicate address 010.001.014.027 (claimed by 44:1e:a1:74:e5:46); disabled
Sep  2 23:57:10 duadm2 ip: [ID 567813 kern.warning] WARNING: oce9:1 has duplicate address 010.001.014.030 (claimed by 44:1e:a1:74:e5:46); disabled
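
To see at a glance which addresses are in conflict, the repeated warnings can be collapsed with a short script. This is just a sketch built on two of the lines quoted above; the regular expression is fitted to that message text only, and the zero-padded octets are treated as decimal (so 010.001.014.027 is read as 10.1.14.27).

```python
import re

# Two of the warnings quoted above, copied verbatim.
messages = [
    "Sep  2 23:57:10 duadm2 ip: [ID 876157 kern.warning] WARNING: node 44:1e:a1:74:e5:46 is using our IP address 010.001.014.027 on oce9",
    "Sep  2 23:57:10 duadm2 ip: [ID 876157 kern.warning] WARNING: node 44:1e:a1:74:e5:46 is using our IP address 010.001.014.030 on oce9",
]

PATTERN = re.compile(
    r"node (?P<mac>[0-9a-f:]+) is using our IP address (?P<ip>[\d.]+) on (?P<nic>\S+)"
)


def normalise_ip(padded):
    """'010.001.014.027' -> '10.1.14.27' (octets read as decimal)."""
    return ".".join(str(int(octet)) for octet in padded.split("."))


conflicts = set()
for message in messages:
    match = PATTERN.search(message)
    if match:
        conflicts.add((match["mac"], normalise_ip(match["ip"]), match["nic"]))

for mac, ip, nic in sorted(conflicts):
    print(f"{ip} on {nic} is also claimed by {mac}")
```

The MAC address it reports can then be matched against the interfaces of the other node to confirm where the duplicate came from.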

mikebounds
Level 6
Partner Accredited

Here is my analysis:
 

At 23:35 interfaces oce0 and oce1 failed. These are both heartbeats, and oce0 also belongs to the mpathd group for PubLan (pub_mnic), where it is paired with oce9:

 

Sep  2 23:35:04 duadm2 genunix: [ID 140958 kern.notice] LLT INFO V-14-1-10205 link 0 (oce1) node 0 in trouble
Sep  2 23:35:05 duadm2 genunix: [ID 140958 kern.notice] LLT INFO V-14-1-10205 link 2 (oce0) node 0 in trouble
 

 

For some reason, Solaris mpathd does not detect the failure of oce0 until about 20 minutes later, when it fails over to oce9:

 

Sep  2 23:56:57 duadm2 in.mpathd[6542]: [ID 594170 daemon.error] NIC failure detected on oce0 of group pub_mnic
Sep  2 23:56:57 duadm2 in.mpathd[6542]: [ID 832587 daemon.error] Successfully failed over from NIC oce0 to NIC oce9
 
But then 4 seconds later mpathd says oce0 is back up:
Sep  2 23:57:01 duadm2 in.mpathd[6542]: [ID 299542 daemon.error] NIC repair detected on oce0 of group pub_mnic
Sep  2 23:57:01 duadm2 in.mpathd[6542]: [ID 620804 daemon.error] Successfully failed back to NIC oce0
 
But then a further 19 seconds later, mpathd says ALL interfaces have failed:
Sep  2 23:57:20 duadm2 in.mpathd[6542]: [ID 168056 daemon.error] All Interfaces in group pub_mnic have failed
 
I assume the VCS pub_mnic resource is set up to use mpathd, which means VCS does not actually monitor the interfaces itself; it monitors the mpathd fail flag. As mpathd only fails over the "test" address, VCS fails over the VIPs. So I guess VCS tried to fail over the IPs from oce0 to oce9, but as mpathd switched back from oce9 to oce0 just 4 seconds later, I would guess the virtual IPs in VCS somehow got plumbed onto both oce0 and oce9.
The VCS pub_mnic resource faults because mpathd reports that both oce0 and oce9 are down, not because of the duplicate IP address.
 
Some things are a bit unclear:
  1. Why did mpathd take 20 mins to detect that oce0 was down?
  2. Did oce0 really come back at 23:57:01, as there are no VCS entries saying the oce0 heartbeat is fixed?
  3. Even though mpathd did repair oce0 4 seconds after marking it down, how exactly did this cause the IP to be plumbed onto both interfaces? Is this a failure of the VCS agent, or perhaps of a Solaris API that the VCS agent uses?
  4. What caused oce9 to fail? Did the IP plumbed onto both interfaces cause mpathd to think oce9 was down, or was oce9 really down?

Mike
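
The gaps in the timeline above can be checked with a few lines of Python. This is only a sketch: the timestamps are copied from the /var/adm/messages excerpts quoted in the analysis, and the helper name `gaps` is made up for illustration.

```python
from datetime import datetime

# Timestamps copied from the log excerpts quoted in the analysis above.
events = [
    ("23:35:04", "LLT reports oce1 in trouble"),
    ("23:56:57", "mpathd detects NIC failure on oce0"),
    ("23:57:01", "mpathd detects NIC repair on oce0"),
    ("23:57:20", "mpathd reports all interfaces in pub_mnic failed"),
]


def gaps(events):
    """Yield (seconds since the previous event, description) for each later event."""
    times = [datetime.strptime(stamp, "%H:%M:%S") for stamp, _ in events]
    for earlier, later, (_, what) in zip(times, times[1:], events[1:]):
        yield int((later - earlier).total_seconds()), what


for seconds, what in gaps(events):
    print(f"+{seconds // 60:02d}m{seconds % 60:02d}s  {what}")
```

The first gap comes out at 21 minutes 53 seconds, so the "20 mins" in the analysis is a round figure; the 4-second and 19-second gaps match exactly.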

 

symsonu
Level 6

 

There was a message from the mpathd daemon before it detected the failure:

Sep  2 23:56:50 duadm2 in.mpathd[6542]: [ID 585766 daemon.error] Cannot meet requested failure detection time of 10000 ms on (inet oce2) new failure detection time for group "stor_mnic" is 48426 ms
Sep  2 23:56:57 duadm2 in.mpathd[6542]: [ID 594170 daemon.error] NIC failure detected on oce0 of group pub_mnic

 

Also,

PubLan is a parallel service group, so if it failed on one node it should have kept working on the other node and the service should have stayed available, shouldn't it?

 

 

-- SYSTEM STATE
-- System               State                Frozen
 
A  duadm1               RUNNING              0
A  duadm2               RUNNING              0
 
-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State
 

B  BkupLan         duadm1               Y          N               ONLINE
B  BkupLan         duadm2               Y          N               ONLINE
B  DDCMon          duadm1               Y          N               ONLINE
B  DDCMon          duadm2               Y          N               ONLINE
B  Oss             duadm1               Y          N               ONLINE
B  Oss             duadm2               Y          N               OFFLINE
B  Ossfs           duadm1               Y          N               ONLINE
B  Ossfs           duadm2               Y          N               OFFLINE
B  PrivLan         duadm1               Y          N               ONLINE
B  PrivLan         duadm2               Y          N               ONLINE
B  PubLan          duadm1               Y          N               ONLINE
B  PubLan          duadm2               Y          N               ONLINE
B  StorLan         duadm1               Y          N               ONLINE
B  StorLan         duadm2               Y          N               ONLINE
B  Sybase1         duadm1               Y          N               OFFLINE
B  Sybase1         duadm2               Y          N               ONLINE

 

One more question: would increasing the ToleranceLimit of the MultiNICB resources help here?

 

 

mikebounds
Level 6
Partner Accredited

PubLan was just the first to fail, and this didn't cause any other resources to go offline, but after this you also had:


2013/09/02 23:57:31 VCS ERROR V-16-1-10303 Resource stor_mnic (Owner: Unspecified, Group: StorLan) is FAULTED (timed out) on sys duadm1
2013/09/02 23:57:57 VCS ERROR V-16-1-10303 Resource ossfs_p1 (Owner: Unspecified, Group: Ossfs) is FAULTED (timed out) on sys duadm1
 
These resources faulting caused other dependent resources to be offlined.
 
Mike
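
To list every fault of this kind in a long engine log, something like the following Python sketch could be used. It is fitted only to the V-16-1-10303 message layout shown in the two lines quoted above, so treat the regular expression as an assumption about the log format rather than a general VCS log parser.

```python
import re

# The two engine log lines quoted above, copied verbatim.
log_lines = [
    "2013/09/02 23:57:31 VCS ERROR V-16-1-10303 Resource stor_mnic (Owner: Unspecified, Group: StorLan) is FAULTED (timed out) on sys duadm1",
    "2013/09/02 23:57:57 VCS ERROR V-16-1-10303 Resource ossfs_p1 (Owner: Unspecified, Group: Ossfs) is FAULTED (timed out) on sys duadm1",
]

FAULT = re.compile(
    r"^(?P<when>\S+ \S+) VCS ERROR V-16-1-10303 Resource (?P<resource>\S+) "
    r"\(Owner: [^,]*, Group: (?P<group>[^)]+)\) is FAULTED .* on sys (?P<system>\S+)$"
)


def faults(lines):
    """Return one dict per FAULTED message, in log order."""
    return [m.groupdict() for m in map(FAULT.match, lines) if m]


for fault in faults(log_lines):
    print(f"{fault['when']}  {fault['resource']} ({fault['group']}) on {fault['system']}")
```

Sorting the result by time shows the order in which groups were hit, which is what determines which dependent resources were taken offline.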

symsonu
Level 6

hares -display pub_mnic
#Resource    Attribute              System     Value
pub_mnic     Group                  global     PubLan
pub_mnic     Type                   global     MultiNICB
pub_mnic     AutoStart              global     1
pub_mnic     Critical               global     1
pub_mnic     Enabled                global     1
pub_mnic     LastOnline             global     duadm1
pub_mnic     MonitorOnly            global     0
pub_mnic     ResourceOwner          global
pub_mnic     TriggerEvent           global     0
pub_mnic     ArgListValues          duadm1     UseMpathd        1       1       MpathdCommand   1       /usr/lib/inet/in.mpathd ConfigCheck     1       1       MpathdRestart   1       1       Device  4       oce0    0       oce9    1       NetworkHosts    1       10.1.14.1       LinkTestRatio   1       1       IgnoreLinkStatus        1       1     NetworkTimeout  1       100     OnlineTestRepeatCount   1       3       OfflineTestRepeatCount  1       3       NoBroadcast     1       0       DefaultRouter   1       0.0.0.0 Failback        1       0       GroupName       1       ""      Protocol        1       IPv4
pub_mnic     ArgListValues          duadm2     UseMpathd        1       1       MpathdCommand   1       /usr/lib/inet/in.mpathd ConfigCheck     1       1       MpathdRestart   1       1       Device  4       oce0    0       oce9    1       NetworkHosts    1       10.1.14.1       LinkTestRatio   1       1       IgnoreLinkStatus        1       1     NetworkTimeout  1       100     OnlineTestRepeatCount   1       3       OfflineTestRepeatCount  1       3       NoBroadcast     1       0       DefaultRouter   1       0.0.0.0 Failback        1       0       GroupName       1       ""      Protocol        1       IPv4
pub_mnic     ConfidenceLevel        duadm1     0
pub_mnic     ConfidenceLevel        duadm2     0
pub_mnic     ConfidenceMsg          duadm1
pub_mnic     ConfidenceMsg          duadm2
pub_mnic     Flags                  duadm1
pub_mnic     Flags                  duadm2
pub_mnic     IState                 duadm1     not waiting
pub_mnic     IState                 duadm2     not waiting
pub_mnic     MonitorMethod          duadm1     Traditional
pub_mnic     MonitorMethod          duadm2     Traditional
pub_mnic     Probed                 duadm1     1
pub_mnic     Probed                 duadm2     1
pub_mnic     Start                  duadm1     0
pub_mnic     Start                  duadm2     0
pub_mnic     State                  duadm1     ONLINE
pub_mnic     State                  duadm2     ONLINE
pub_mnic     ComputeStats           global     0
pub_mnic     ConfigCheck            global     1
pub_mnic     DefaultRouter          global     0.0.0.0
pub_mnic     Failback               global     0
pub_mnic     GroupName              global
pub_mnic     IgnoreLinkStatus       global     1
pub_mnic     LinkTestRatio          global     1
pub_mnic     MpathdCommand          global     /usr/lib/inet/in.mpathd
pub_mnic     MpathdRestart          global     1
pub_mnic     NetworkHosts           global     10.1.14.1
pub_mnic     NetworkTimeout         global     100
pub_mnic     NoBroadcast            global     0
pub_mnic     OfflineTestRepeatCount global     3
pub_mnic     OnlineTestRepeatCount  global     3
pub_mnic     Protocol               global     IPv4
pub_mnic     TriggerResStateChange  global     0
pub_mnic     UseMpathd              global     1
pub_mnic     ContainerInfo          duadm1     Type             Name            Enabled
pub_mnic     ContainerInfo          duadm2     Type             Name            Enabled
pub_mnic     Device                 duadm1     oce0     0       oce9    1
pub_mnic     Device                 duadm2     oce0     0       oce9    1
pub_mnic     MonitorTimeStats       duadm1     Avg      0       TS
pub_mnic     MonitorTimeStats       duadm2     Avg      0       TS
pub_mnic     ResourceInfo           duadm1     State    Valid   Msg             TS

 

 

symsonu
Level 6

stor_mnic is also a MultiNICB resource

 

hares -display stor_mnic
#Resource    Attribute              System     Value
stor_mnic    Group                  global     StorLan
stor_mnic    Type                   global     MultiNICB
stor_mnic    AutoStart              global     1
stor_mnic    Critical               global     1
stor_mnic    Enabled                global     1
stor_mnic    LastOnline             global     duadm1
stor_mnic    MonitorOnly            global     0
stor_mnic    ResourceOwner          global
stor_mnic    TriggerEvent           global     0
stor_mnic    ArgListValues          duadm1     UseMpathd        1       1       MpathdCommand   1       /usr/lib/inet/in.mpathd ConfigCheck     1       1       MpathdRestart   1       1       Device  4       oce2    0       oce11   1       NetworkHosts    1       10.0.0.29       LinkTestRatio   1       1       IgnoreLinkStatus        1       1     NetworkTimeout  1       100     OnlineTestRepeatCount   1       3       OfflineTestRepeatCount  1       3       NoBroadcast     1       0       DefaultRouter   1       0.0.0.0 Failback        1       0       GroupName       1       ""      Protocol        1       IPv4
stor_mnic    ArgListValues          duadm2     UseMpathd        1       1       MpathdCommand   1       /usr/lib/inet/in.mpathd ConfigCheck     1       1       MpathdRestart   1       1       Device  4       oce2    0       oce11   1       NetworkHosts    1       10.0.0.29       LinkTestRatio   1       1       IgnoreLinkStatus        1       1     NetworkTimeout  1       100     OnlineTestRepeatCount   1       3       OfflineTestRepeatCount  1       3       NoBroadcast     1       0       DefaultRouter   1       0.0.0.0 Failback        1       0       GroupName       1       ""      Protocol        1       IPv4
stor_mnic    ConfidenceLevel        duadm1     0
stor_mnic    ConfidenceLevel        duadm2     0
stor_mnic    ConfidenceMsg          duadm1
stor_mnic    ConfidenceMsg          duadm2
stor_mnic    Flags                  duadm1
stor_mnic    Flags                  duadm2
stor_mnic    IState                 duadm1     not waiting
stor_mnic    IState                 duadm2     not waiting
stor_mnic    MonitorMethod          duadm1     Traditional
stor_mnic    MonitorMethod          duadm2     Traditional
stor_mnic    Probed                 duadm1     1
stor_mnic    Probed                 duadm2     1
stor_mnic    Start                  duadm1     0
stor_mnic    Start                  duadm2     0
stor_mnic    State                  duadm1     ONLINE
stor_mnic    State                  duadm2     ONLINE
stor_mnic    ComputeStats           global     0
stor_mnic    ConfigCheck            global     1
stor_mnic    DefaultRouter          global     0.0.0.0
stor_mnic    Failback               global     0
stor_mnic    GroupName              global
stor_mnic    IgnoreLinkStatus       global     1
stor_mnic    LinkTestRatio          global     1
stor_mnic    MpathdCommand          global     /usr/lib/inet/in.mpathd
stor_mnic    MpathdRestart          global     1
stor_mnic    NetworkHosts           global     10.0.0.29
stor_mnic    NetworkTimeout         global     100
stor_mnic    NoBroadcast            global     0
stor_mnic    OfflineTestRepeatCount global     3
stor_mnic    OnlineTestRepeatCount  global     3
stor_mnic    Protocol               global     IPv4
stor_mnic    TriggerResStateChange  global     0
stor_mnic    UseMpathd              global     1
stor_mnic    ContainerInfo          duadm1     Type             Name            Enabled
stor_mnic    ContainerInfo          duadm2     Type             Name            Enabled
stor_mnic    Device                 duadm1     oce2     0       oce11   1
stor_mnic    Device                 duadm2     oce2     0       oce11   1
stor_mnic    MonitorTimeStats       duadm1     Avg      0       TS
stor_mnic    MonitorTimeStats       duadm2     Avg      0       TS
stor_mnic    ResourceInfo           duadm1     State    Valid   Msg             TS
stor_mnic    ResourceInfo           duadm2     State    Valid   Msg             TS

 

symsonu
Level 6

 

ossfs_p1 is a Proxy resource, so I think it failed because pub_mnic failed.

hares -display ossfs_p1
#Resource    Attribute             System     Value
ossfs_p1     Group                 global     Ossfs
ossfs_p1     Type                  global     Proxy
ossfs_p1     AutoStart             global     1
ossfs_p1     Critical              global     1
ossfs_p1     Enabled               global     1
ossfs_p1     LastOnline            global     duadm1
ossfs_p1     MonitorOnly           global     0
ossfs_p1     ResourceOwner         global
ossfs_p1     TriggerEvent          global     0
ossfs_p1     ArgListValues         duadm1     TargetResName     1       pub_mnic        TargetSysName   1       ""      TargetResName:Probed    1       1       TargetResName:State     1       2
ossfs_p1     ArgListValues         duadm2     TargetResName     1       pub_mnic        TargetSysName   1       ""      TargetResName:Probed    1       1       TargetResName:State     1       2
ossfs_p1     ConfidenceLevel       duadm1     0
ossfs_p1     ConfidenceLevel       duadm2     0
ossfs_p1     ConfidenceMsg         duadm1
ossfs_p1     ConfidenceMsg         duadm2
ossfs_p1     Flags                 duadm1
ossfs_p1     Flags                 duadm2
ossfs_p1     IState                duadm1     not waiting
ossfs_p1     IState                duadm2     not waiting
ossfs_p1     MonitorMethod         duadm1     Traditional
ossfs_p1     MonitorMethod         duadm2     Traditional
ossfs_p1     Probed                duadm1     1
ossfs_p1     Probed                duadm2     1
ossfs_p1     Start                 duadm1     0
ossfs_p1     Start                 duadm2     0
ossfs_p1     State                 duadm1     ONLINE
ossfs_p1     State                 duadm2     ONLINE
ossfs_p1     ComputeStats          global     0
ossfs_p1     ResourceInfo          global     State     Valid   Msg             TS
ossfs_p1     TargetResName         global     pub_mnic
ossfs_p1     TargetSysName         global
ossfs_p1     TriggerResStateChange global     0
ossfs_p1     ContainerInfo         duadm1     Type              Name            Enabled
ossfs_p1     ContainerInfo         duadm2     Type              Name            Enabled
ossfs_p1     MonitorTimeStats      duadm1     Avg       0       TS
ossfs_p1     MonitorTimeStats      duadm2     Avg       0       TS
root@duadm1>

root@duadm1> hares -dep ossfs_p1
#Group       Parent     Child
Ossfs        cms_ip     ossfs_p1
Ossfs        ossfs_ip   ossfs_p1
Ossfs        pms_ip     ossfs_p1
Ossfs        snmp_ip    ossfs_p1
 

 

mikebounds
Level 6
Partner Accredited

As you correctly have proxies, setting ToleranceLimit on the MultiNICB type may help a little, as it would delay faulting the MultiNICB resources, which in turn MAY delay the faulting of the Proxies. But the default MonitorInterval for MultiNICB is only 10 seconds, so even setting ToleranceLimit to 2 would only give MultiNICB an extra 20 seconds. In the logs above, that means it would have faulted at 23:57:47 rather than 23:57:27, and as the Proxy ran its monitor at 23:57:57, this wouldn't have made any difference. So you could set ToleranceLimit on the Proxy type instead.
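
The timing argument can be checked with a rough sketch like the one below. It assumes, as the reasoning above does, that ToleranceLimit simply pushes the fault back by ToleranceLimit x MonitorInterval seconds and that the network stays down throughout; real agent timing may differ. The function and variable names are made up for illustration, and only the three timestamps and the 10-second interval come from this thread.

```python
from datetime import datetime, timedelta

# Values taken from the discussion above.
MONITOR_INTERVAL_S = 10   # MultiNICB MonitorInterval quoted as the default
TOLERANCE_LIMIT = 2       # value being considered for MultiNICB


def parse_clock(hhmmss: str) -> datetime:
    """Parse a HH:MM:SS log time (the date part is irrelevant here)."""
    return datetime.strptime(hhmmss, "%H:%M:%S")


def delayed_fault_time(fault_time: datetime, interval_s: int, tolerance: int) -> datetime:
    """Assumed model: the fault is postponed by tolerance * interval seconds."""
    return fault_time + timedelta(seconds=interval_s * tolerance)


multinicb_fault = parse_clock("23:57:27")   # when MultiNICB actually faulted
proxy_monitor = parse_clock("23:57:57")     # when the Proxy ran its monitor

delayed = delayed_fault_time(multinicb_fault, MONITOR_INTERVAL_S, TOLERANCE_LIMIT)

# The Proxy still sees a faulted target if the delayed fault happens
# at or before the Proxy's monitor cycle.
proxy_would_still_fault = delayed <= proxy_monitor

print("MultiNICB would fault at:", delayed.strftime("%H:%M:%S"))
print("Proxy would still fault:", proxy_would_still_fault)
```

Under this simple model the delayed fault still lands 10 seconds before the Proxy's monitor cycle, which is why the tolerance would have to be applied to the Proxy type to ride out an outage of this length.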

 

Mike

symsonu
Level 6

Hello Mike,

As we know, the network outage caused the MultiNICB resources and then the Proxy resources to fault, which in turn caused the other dependent resources to fail.

I need your expert advice on the recovery procedure here.

If the network is restored, will all the resources come online on their own, or is administrative action required?

In our case, they did not come up automatically after the network came back.

 

 
