07-28-2013 10:29 PM
We tried to do a failover, but it did not work.
Please find the logs below and help us find the cause.
2013/07/22 20:11:17 VCS INFO V-16-1-50859 Attempting to switch group Oss from system dukosgbs to system dukosgas
2013/07/22 20:11:17 VCS INFO V-16-1-50135 User root fired command: hagrp -switch Oss dukosgas from localhost
2013/07/22 20:11:17 VCS NOTICE V-16-1-50929 Initial tests indicate group Oss is able to switch to system dukosgas. Initiating offline of group on system dukosgbs
2013/07/22 20:11:17 VCS NOTICE V-16-1-10167 Initiating manual offline of group Oss on system dukosgbs
2013/07/22 20:11:17 VCS NOTICE V-16-1-10300 Initiating Offline of Resource activemq (Owner: unknown, Group: Oss) on System dukosgbs
2013/07/22 20:11:17 VCS NOTICE V-16-1-10300 Initiating Offline of Resource alex (Owner: unknown, Group: Oss) on System dukosgbs
2013/07/22 20:11:17 VCS NOTICE V-16-1-10300 Initiating Offline of Resource apache (Owner: unknown, Group: Oss) on System dukosgbs
2013/07/22 20:11:17 VCS NOTICE V-16-1-10300 Initiating Offline of Resource cron (Owner: unknown, Group: Oss) on System dukosgbs
2013/07/22 20:11:17 VCS NOTICE V-16-1-10300 Initiating Offline of Resource ddc (Owner: unknown, Group: Oss) on System dukosgbs
2013/07/22 20:11:17 VCS NOTICE V-16-1-10300 Initiating Offline of Resource glassfish (Owner: unknown, Group: Oss) on System dukosgbs
2013/07/22 20:11:17 VCS NOTICE V-16-1-10300 Initiating Offline of Resource imgr_httpd (Owner: unknown, Group: Oss) on System dukosgbs
2013/07/22 20:11:17 VCS NOTICE V-16-1-10300 Initiating Offline of Resource imgr_tomcat (Owner: unknown, Group: Oss) on System dukosgbs
2013/07/22 20:11:17 VCS NOTICE V-16-1-10300 Initiating Offline of Resource ldap_mon (Owner: unknown, Group: Oss) on System dukosgbs
2013/07/22 20:11:17 VCS NOTICE V-16-1-10300 Initiating Offline of Resource log_service (Owner: unknown, Group: Oss) on System dukosgbs
2013/07/22 20:11:17 VCS NOTICE V-16-1-10300 Initiating Offline of Resource netmgt_nettl (Owner: unknown, Group: Oss) on System dukosgbs
2013/07/22 20:11:17 VCS NOTICE V-16-1-10300 Initiating Offline of Resource netmgt_ov (Owner: unknown, Group: Oss) on System dukosgbs
2013/07/22 20:11:17 VCS NOTICE V-16-1-10300 Initiating Offline of Resource ovtrc (Owner: unknown, Group: Oss) on System dukosgbs
2013/07/22 20:11:17 VCS NOTICE V-16-1-10300 Initiating Offline of Resource restart_mc (Owner: unknown, Group: Oss) on System dukosgbs
2013/07/22 20:11:17 VCS NOTICE V-16-1-10300 Initiating Offline of Resource syb_log_mon (Owner: unknown, Group: Oss) on System dukosgbs
2013/07/22 20:11:17 VCS NOTICE V-16-1-10300 Initiating Offline of Resource syb_proc_mon (Owner: unknown, Group: Oss) on System dukosgbs
2013/07/22 20:11:17 VCS NOTICE V-16-1-10300 Initiating Offline of Resource time_service (Owner: unknown, Group: Oss) on System dukosgbs
2013/07/22 20:11:17 VCS NOTICE V-16-1-10300 Initiating Offline of Resource trapdist (Owner: unknown, Group: Oss) on System dukosgbs
2013/07/22 20:11:17 VCS NOTICE V-16-1-10300 Initiating Offline of Resource vrsnt_log_mon (Owner: unknown, Group: Oss) on System dukosgbs
2013/07/22 20:45:26 VCS INFO V-16-1-50135 User root fired command: hagrp -switch Oss dukosgas from localhost
2013/07/22 22:19:35 VCS INFO V-16-2-13075 (dukosgbs) Resource(activemq) has reported unexpected OFFLINE 1 times, which is still within the ToleranceLimit(2).
2013/07/22 22:20:35 VCS INFO V-16-2-13075 (dukosgbs) Resource(activemq) has reported unexpected OFFLINE 2 times, which is still within the ToleranceLimit(2).
2013/07/22 22:21:35 VCS ERROR V-16-2-13067 (dukosgbs) Agent is calling clean for resource(activemq) because the resource became OFFLINE unexpectedly, on its own.
2013/07/22 22:21:36 VCS INFO V-16-2-13068 (dukosgbs) Resource(activemq) - clean completed successfully.
2013/07/22 22:21:36 VCS ERROR V-16-2-13073 (dukosgbs) Resource(activemq) became OFFLINE unexpectedly on its own. Agent is restarting (attempt number 1 of 3) the resource.
2013/07/22 22:21:36 VCS INFO V-16-10001-3 (dukosgbs) Application:activemq:online:Executed /ericsson/hacs/scripts/svc.sh
2013/07/22 22:21:37 VCS INFO V-16-2-13001 (dukosgbs) Resource(activemq): Output of the completed operation (online)
svcadm: Instance "svc:/ericsson/eric_3pp/activemq:default" is not in a maintenance or degraded state.
2013/07/22 22:21:38 VCS NOTICE V-16-2-13076 (dukosgbs) Agent has successfully restarted resource(activemq).
2013/07/22 22:23:05 VCS INFO V-16-1-50135 User root fired command: hagrp -clear Oss dukosgbs from localhost
2013/07/22 22:27:08 VCS INFO V-16-1-50135 User root fired command: hagrp -flush Oss dukosgbs from localhost
2013/07/22 22:29:40 VCS INFO V-16-1-50135 User root fired command: hagrp -clearadminwait Oss dukosgbs from localhost
2013/07/22 22:37:21 VCS INFO V-16-1-50135 User root fired command: hagrp -flush ClusterService dukosgbs from localhost
2013/07/22 22:37:21 VCS INFO V-16-1-50135 User root fired command: hagrp -flush Oss dukosgbs from localhost
2013/07/22 22:37:21 VCS INFO V-16-1-50135 User root fired command: hagrp -flush Ossfs dukosgbs from localhost
2013/07/22 22:37:21 VCS INFO V-16-1-50135 User root fired command: hagrp -flush Sybase1 dukosgbs from localhost
2013/07/22 22:38:14 VCS INFO V-16-1-50135 User root fired command: hagrp -flush ClusterService dukosgbs from localhost
2013/07/22 22:38:14 VCS INFO V-16-1-50135 User root fired command: hagrp -flush Oss dukosgbs from localhost
2013/07/22 22:38:14 VCS INFO V-16-1-50135 User root fired command: hagrp -flush Ossfs dukosgbs from localhost
2013/07/22 22:38:14 VCS INFO V-16-1-50135 User root fired command: hagrp -flush Sybase1 dukosgbs from localhost
2013/07/22 22:39:08 VCS INFO V-16-1-50135 User root fired command: hares -refreshinfo activemq from localhost
2013/07/22 22:40:06 VCS INFO V-16-1-50135 User root fired command: hares -refreshinfo activemq localclus from localhost
2013/07/22 22:42:09 VCS INFO V-16-1-50135 User root fired command: hares -flushinfo activemq localclus from localhost
2013/07/22 22:47:41 VCS INFO V-16-1-50135 User root fired command: hagrp -switch Oss dukosgbs from localhost
2013/07/22 22:50:52 VCS INFO V-16-1-50859 Attempting to switch group Oss from system dukosgbs to system dukosgas
2013/07/22 22:50:52 VCS INFO V-16-1-50135 User root fired command: hagrp -switch Oss dukosgas from localhost
2013/07/22 22:50:52 VCS NOTICE V-16-1-50929 Initial tests indicate group Oss is able to switch to system dukosgas. Initiating offline of group on system dukosgbs
07-29-2013 05:15 AM
The resources started going offline, but they did not come down.
root@dukosgbs> hastatus -sum
-- SYSTEM STATE
-- System State Frozen
A dukosgbs RUNNING 0
-- GROUP STATE
-- Group System Probed AutoDisabled State
B ClusterService dukosgbs Y N ONLINE
B Oss dukosgbs Y N ONLINE|STOPPING
B Ossfs dukosgbs Y N ONLINE
B Sybase1 dukosgbs Y N ONLINE
-- RESOURCES OFFLINING
-- Group Type Resource System IState
F Oss Application activemq dukosgbs W_OFFLINE_PROPAGATE
F Oss Application alex dukosgbs W_OFFLINE_PROPAGATE
F Oss Application apache dukosgbs W_OFFLINE_PROPAGATE
F Oss Application cron dukosgbs W_OFFLINE_PROPAGATE
F Oss Application ddc dukosgbs W_OFFLINE_PROPAGATE
F Oss Application glassfish dukosgbs W_OFFLINE_PROPAGATE
F Oss Application imgr_httpd dukosgbs W_OFFLINE_PROPAGATE
F Oss Application imgr_tomcat dukosgbs W_OFFLINE_PROPAGATE
F Oss Application ldap_mon dukosgbs W_OFFLINE_PROPAGATE
F Oss Application log_service dukosgbs W_OFFLINE_PROPAGATE
F Oss Application netmgt_nettl dukosgbs W_OFFLINE_PROPAGATE
F Oss Application netmgt_ov dukosgbs W_OFFLINE_PROPAGATE
F Oss Application ovtrc dukosgbs W_OFFLINE_PROPAGATE
F Oss Application restart_mc dukosgbs W_OFFLINE_PROPAGATE
F Oss Application syb_log_mon dukosgbs W_OFFLINE_PROPAGATE
F Oss Application syb_proc_mon dukosgbs W_OFFLINE_PROPAGATE
F Oss Application time_service dukosgbs W_OFFLINE_PROPAGATE
F Oss Application trapdist dukosgbs W_OFFLINE_PROPAGATE
F Oss Application vrsnt_log_mon dukosgbs W_OFFLINE_PROPAGATE
-- WAN HEARTBEAT STATE
-- Heartbeat To State
L Icmp gran_cluster1 ALIVE
-- REMOTE CLUSTER STATE
-- Cluster State
M gran_cluster1 RUNNING
-- REMOTE SYSTEM STATE
-- cluster:system State Frozen
N gran_cluster1:dukosgas RUNNING 0
07-29-2013 11:57 AM
Can you please show us the main.cf for this service group?
It seems you have custom scripts that you have configured as Application type resources.
Have you tested these scripts outside of VCS?
07-29-2013 12:59 PM
The resource "activemq" seems to be having problems with its monitor script. You should check on that resource specifically to see what it is checking during its monitor cycle. You might also need to add some logging to its monitor script to find out what it is having problems checking.
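As a rough illustration of the kind of logging meant here, a minimal sketch of a monitor wrapper is shown below. It assumes the resource uses a MonitorProgram and that the SMF instance is the one named in the engine log; the log path, the RUN_MONITOR switch and the function names are made up for the example. The exit codes follow the Application agent convention as I understand it (110 for online, 100 for offline), so please check them against the Bundled Agents guide for your version.

```shell
#!/bin/sh
# Hypothetical monitor wrapper for the activemq resource (illustration only).
# FMRI is taken from the engine log; LOG is an assumed location.
FMRI="svc:/ericsson/eric_3pp/activemq:default"
LOG="${MONITOR_LOG:-/var/tmp/activemq_monitor.log}"

# Append a timestamped line to the monitor log.
log() {
    echo "$(date '+%Y/%m/%d %H:%M:%S') $*" >> "$LOG"
}

# Map an SMF state string to a monitor exit code:
# 110 = online, 100 = offline (assumed Application agent convention).
state_to_exit() {
    case "$1" in
        online) return 110 ;;
        *)      return 100 ;;
    esac
}

# Query SMF, record what was seen, and exit with the mapped code.
main() {
    state=$(svcs -H -o state "$FMRI" 2>&1)
    log "svcs reported state='$state' for $FMRI"
    rc=0
    state_to_exit "$state" || rc=$?
    log "monitor exiting with $rc"
    exit "$rc"
}

# Run only when explicitly requested, so the functions can be sourced and tested.
if [ "${RUN_MONITOR:-0}" = "1" ]; then
    main
fi
```

With something like this in place, each monitor cycle leaves a line showing what state the script saw and what it returned, which can then be lined up against the timestamps in the engine log.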
-Wally
07-30-2013 10:57 AM
Yes, these are tested scripts, and failover has worked many times before.
This time, as we can see from the logs, the switch got stuck at
"Initiating Offline of Resource activemq".
It continued only when we disabled the SMF service of activemq.
The only thing we need to know is why the stop command of activemq was not executed by VCS at 20:11.
There were no entries in the engine log after this for quite some time.
07-30-2013 10:58 AM
Can you please clarify how we can tell that there is an issue with the monitor script of activemq?
08-05-2013 07:21 AM
I think the problem is rather in your offline procedure.
You define an offline action, and VCS initiates it when the offline entry point is called.
The question is how you try to stop activemq: by running a command or a custom script?
Maybe you can share this offline script/command?
Could you stop activemq by manually executing the command or script?
If not, then the problem was with the application itself and you should implement a clean procedure as well.
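To make the stop-then-clean idea concrete, here is a minimal sketch of a helper that asks a process to stop, waits a bounded time, and force-kills it if it is still there. The function name and the 30-second default are assumptions for illustration; in a real setup the graceful part would belong in the stop script and the forced part in the clean script.

```shell
#!/bin/sh
# Hypothetical helper: graceful stop with a bounded wait, then a forced kill.
# Usage: stop_with_timeout PID [TIMEOUT_SECONDS]
# Returns 0 if the process exited on its own, 1 if it had to be killed.
stop_with_timeout() {
    pid=$1
    timeout=${2:-30}

    # If the signal cannot be delivered, the process is already gone.
    kill -TERM "$pid" 2>/dev/null || return 0

    waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            # Graceful stop did not finish in time: this is the "clean" step.
            kill -KILL "$pid" 2>/dev/null
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

Running the stop command by hand with a timeout like this shows quickly whether the application itself hangs on shutdown, independent of VCS.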
"Can you please clarify how we can tell that there is an issue with the monitor script of activemq?"
Just check the process status, for example (ps output, program -status, etc., depending on the application/process), or check the pidfile.
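A pidfile check could look roughly like the sketch below. The function name is hypothetical, and whether activemq writes a pidfile at all (and where) depends on your installation, so treat this as an outline only.

```shell
#!/bin/sh
# Hypothetical check: is the process recorded in a pidfile still alive?
# Usage: is_running_from_pidfile /path/to/pidfile
# Returns 0 if the recorded PID can be signalled, non-zero otherwise.
is_running_from_pidfile() {
    pidfile=$1

    # A missing or unreadable pidfile is treated as "not running".
    [ -r "$pidfile" ] || return 1

    pid=$(cat "$pidfile")

    # Reject empty or non-numeric content.
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;
    esac

    # Signal 0 only tests whether the process exists and can be signalled.
    # Run as root, otherwise a process owned by another user looks absent.
    kill -0 "$pid" 2>/dev/null
}
```

A stale pidfile left behind after a crash can point at a PID that has been reused by another process, so comparing the result with the ps output is a sensible cross-check.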