Root cause needed for failed failover test

symsonu
Level 6

We tried to do a failover, but it did not work.

Please find the logs below and help us find the cause.

 

2013/07/22 20:11:17 VCS INFO V-16-1-50859 Attempting to switch group Oss from system dukosgbs to system dukosgas
2013/07/22 20:11:17 VCS INFO V-16-1-50135 User root fired command: hagrp -switch Oss  dukosgas  from localhost
2013/07/22 20:11:17 VCS NOTICE V-16-1-50929 Initial tests indicate group Oss is able to switch to system dukosgas.  Initiating offline of group on system dukosgbs
2013/07/22 20:11:17 VCS NOTICE V-16-1-10167 Initiating manual offline of group Oss on system dukosgbs
2013/07/22 20:11:17 VCS NOTICE V-16-1-10300 Initiating Offline of Resource activemq (Owner: unknown, Group: Oss) on System dukosgbs
2013/07/22 20:11:17 VCS NOTICE V-16-1-10300 Initiating Offline of Resource alex (Owner: unknown, Group: Oss) on System dukosgbs
2013/07/22 20:11:17 VCS NOTICE V-16-1-10300 Initiating Offline of Resource apache (Owner: unknown, Group: Oss) on System dukosgbs
2013/07/22 20:11:17 VCS NOTICE V-16-1-10300 Initiating Offline of Resource cron (Owner: unknown, Group: Oss) on System dukosgbs
2013/07/22 20:11:17 VCS NOTICE V-16-1-10300 Initiating Offline of Resource ddc (Owner: unknown, Group: Oss) on System dukosgbs
2013/07/22 20:11:17 VCS NOTICE V-16-1-10300 Initiating Offline of Resource glassfish (Owner: unknown, Group: Oss) on System dukosgbs
2013/07/22 20:11:17 VCS NOTICE V-16-1-10300 Initiating Offline of Resource imgr_httpd (Owner: unknown, Group: Oss) on System dukosgbs
2013/07/22 20:11:17 VCS NOTICE V-16-1-10300 Initiating Offline of Resource imgr_tomcat (Owner: unknown, Group: Oss) on System dukosgbs
2013/07/22 20:11:17 VCS NOTICE V-16-1-10300 Initiating Offline of Resource ldap_mon (Owner: unknown, Group: Oss) on System dukosgbs
2013/07/22 20:11:17 VCS NOTICE V-16-1-10300 Initiating Offline of Resource log_service (Owner: unknown, Group: Oss) on System dukosgbs
2013/07/22 20:11:17 VCS NOTICE V-16-1-10300 Initiating Offline of Resource netmgt_nettl (Owner: unknown, Group: Oss) on System dukosgbs
2013/07/22 20:11:17 VCS NOTICE V-16-1-10300 Initiating Offline of Resource netmgt_ov (Owner: unknown, Group: Oss) on System dukosgbs
2013/07/22 20:11:17 VCS NOTICE V-16-1-10300 Initiating Offline of Resource ovtrc (Owner: unknown, Group: Oss) on System dukosgbs
2013/07/22 20:11:17 VCS NOTICE V-16-1-10300 Initiating Offline of Resource restart_mc (Owner: unknown, Group: Oss) on System dukosgbs
2013/07/22 20:11:17 VCS NOTICE V-16-1-10300 Initiating Offline of Resource syb_log_mon (Owner: unknown, Group: Oss) on System dukosgbs
2013/07/22 20:11:17 VCS NOTICE V-16-1-10300 Initiating Offline of Resource syb_proc_mon (Owner: unknown, Group: Oss) on System dukosgbs
2013/07/22 20:11:17 VCS NOTICE V-16-1-10300 Initiating Offline of Resource time_service (Owner: unknown, Group: Oss) on System dukosgbs
2013/07/22 20:11:17 VCS NOTICE V-16-1-10300 Initiating Offline of Resource trapdist (Owner: unknown, Group: Oss) on System dukosgbs
2013/07/22 20:11:17 VCS NOTICE V-16-1-10300 Initiating Offline of Resource vrsnt_log_mon (Owner: unknown, Group: Oss) on System dukosgbs
2013/07/22 20:45:26 VCS INFO V-16-1-50135 User root fired command: hagrp -switch Oss  dukosgas  from localhost
2013/07/22 22:19:35 VCS INFO V-16-2-13075 (dukosgbs) Resource(activemq) has reported unexpected OFFLINE 1 times, which is still within the ToleranceLimit(2).
2013/07/22 22:20:35 VCS INFO V-16-2-13075 (dukosgbs) Resource(activemq) has reported unexpected OFFLINE 2 times, which is still within the ToleranceLimit(2).
2013/07/22 22:21:35 VCS ERROR V-16-2-13067 (dukosgbs) Agent is calling clean for resource(activemq) because the resource became OFFLINE unexpectedly, on its own.
2013/07/22 22:21:36 VCS INFO V-16-2-13068 (dukosgbs) Resource(activemq) - clean completed successfully.
2013/07/22 22:21:36 VCS ERROR V-16-2-13073 (dukosgbs) Resource(activemq) became OFFLINE unexpectedly on its own. Agent is restarting (attempt number 1 of 3) the resource.
2013/07/22 22:21:36 VCS INFO V-16-10001-3 (dukosgbs) Application:activemq:online:Executed /ericsson/hacs/scripts/svc.sh
2013/07/22 22:21:37 VCS INFO V-16-2-13001 (dukosgbs) Resource(activemq): Output of the completed operation (online)
svcadm: Instance "svc:/ericsson/eric_3pp/activemq:default" is not in a maintenance or degraded state.
2013/07/22 22:21:38 VCS NOTICE V-16-2-13076 (dukosgbs) Agent has successfully restarted resource(activemq).
2013/07/22 22:23:05 VCS INFO V-16-1-50135 User root fired command: hagrp -clear Oss  dukosgbs  from localhost
2013/07/22 22:27:08 VCS INFO V-16-1-50135 User root fired command: hagrp -flush Oss  dukosgbs  from localhost
2013/07/22 22:29:40 VCS INFO V-16-1-50135 User root fired command: hagrp -clearadminwait Oss  dukosgbs  from localhost
2013/07/22 22:37:21 VCS INFO V-16-1-50135 User root fired command: hagrp -flush ClusterService  dukosgbs  from localhost
2013/07/22 22:37:21 VCS INFO V-16-1-50135 User root fired command: hagrp -flush Oss  dukosgbs  from localhost
2013/07/22 22:37:21 VCS INFO V-16-1-50135 User root fired command: hagrp -flush Ossfs  dukosgbs  from localhost
2013/07/22 22:37:21 VCS INFO V-16-1-50135 User root fired command: hagrp -flush Sybase1  dukosgbs  from localhost
2013/07/22 22:38:14 VCS INFO V-16-1-50135 User root fired command: hagrp -flush ClusterService  dukosgbs  from localhost
2013/07/22 22:38:14 VCS INFO V-16-1-50135 User root fired command: hagrp -flush Oss  dukosgbs  from localhost
2013/07/22 22:38:14 VCS INFO V-16-1-50135 User root fired command: hagrp -flush Ossfs  dukosgbs  from localhost
2013/07/22 22:38:14 VCS INFO V-16-1-50135 User root fired command: hagrp -flush Sybase1  dukosgbs  from localhost
2013/07/22 22:39:08 VCS INFO V-16-1-50135 User root fired command: hares -refreshinfo activemq  from localhost
2013/07/22 22:40:06 VCS INFO V-16-1-50135 User root fired command: hares -refreshinfo activemq  localclus  from localhost
2013/07/22 22:42:09 VCS INFO V-16-1-50135 User root fired command: hares -flushinfo activemq  localclus  from localhost
2013/07/22 22:47:41 VCS INFO V-16-1-50135 User root fired command: hagrp -switch Oss  dukosgbs  from localhost
2013/07/22 22:50:52 VCS INFO V-16-1-50859 Attempting to switch group Oss from system dukosgbs to system dukosgas
2013/07/22 22:50:52 VCS INFO V-16-1-50135 User root fired command: hagrp -switch Oss  dukosgas  from localhost
2013/07/22 22:50:52 VCS NOTICE V-16-1-50929 Initial tests indicate group Oss is able to switch to system dukosgas.  Initiating offline of group on system dukosgbs

symsonu
Level 6

 Offline was initiated for the resources, but they did not come down.
 

 

root@dukosgbs> hastatus -sum

 -- SYSTEM STATE
 -- System               State                Frozen

 A  dukosgbs             RUNNING              0

 -- GROUP STATE
 -- Group           System               Probed     AutoDisabled    State

 B  ClusterService  dukosgbs             Y          N               ONLINE
 B  Oss             dukosgbs             Y          N               ONLINE|STOPPING
 B  Ossfs           dukosgbs             Y          N               ONLINE
 B  Sybase1         dukosgbs             Y          N               ONLINE

 -- RESOURCES OFFLINING
 -- Group           Type            Resource             System               IState

 F  Oss             Application     activemq             dukosgbs             W_OFFLINE_PROPAGATE
 F  Oss             Application     alex                 dukosgbs             W_OFFLINE_PROPAGATE
 F  Oss             Application     apache               dukosgbs             W_OFFLINE_PROPAGATE
 F  Oss             Application     cron                 dukosgbs             W_OFFLINE_PROPAGATE
 F  Oss             Application     ddc                  dukosgbs             W_OFFLINE_PROPAGATE
 F  Oss             Application     glassfish            dukosgbs             W_OFFLINE_PROPAGATE
 F  Oss             Application     imgr_httpd           dukosgbs             W_OFFLINE_PROPAGATE
 F  Oss             Application     imgr_tomcat          dukosgbs             W_OFFLINE_PROPAGATE
 F  Oss             Application     ldap_mon             dukosgbs             W_OFFLINE_PROPAGATE
 F  Oss             Application     log_service          dukosgbs             W_OFFLINE_PROPAGATE
 F  Oss             Application     netmgt_nettl         dukosgbs             W_OFFLINE_PROPAGATE
 F  Oss             Application     netmgt_ov            dukosgbs             W_OFFLINE_PROPAGATE
 F  Oss             Application     ovtrc                dukosgbs             W_OFFLINE_PROPAGATE
 F  Oss             Application     restart_mc           dukosgbs             W_OFFLINE_PROPAGATE
 F  Oss             Application     syb_log_mon          dukosgbs             W_OFFLINE_PROPAGATE
 F  Oss             Application     syb_proc_mon         dukosgbs             W_OFFLINE_PROPAGATE
 F  Oss             Application     time_service         dukosgbs             W_OFFLINE_PROPAGATE
 F  Oss             Application     trapdist             dukosgbs             W_OFFLINE_PROPAGATE
 F  Oss             Application     vrsnt_log_mon        dukosgbs             W_OFFLINE_PROPAGATE

 -- WAN HEARTBEAT STATE
 -- Heartbeat       To                   State

 L  Icmp            gran_cluster1        ALIVE

 -- REMOTE CLUSTER STATE
 -- Cluster         State

 M  gran_cluster1   RUNNING

 -- REMOTE SYSTEM STATE
 -- cluster:system       State                Frozen

 N  gran_cluster1:dukosgas RUNNING              0

Marianne
Moderator
Partner    VIP    Accredited Certified

Please show us the main.cf for this service group.

It seems you have custom scripts that you have configured as Application type resources. 
Have you tested these scripts outside of VCS?

Wally_Heim
Level 6
Employee

The resource "activemq" seems to be having problems with its monitor script.  You should check on that resource specifically to see what it is checking during its monitor cycle.  You might also need to add some logging to its monitor script to find out what it is having problems checking.
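As a rough sketch of the kind of logging meant here, a monitor script could route each check through a small wrapper that records the command and its exit status. The log path and the function names are made up for illustration; only the SMF FMRI in the usage comment is taken from the engine log above. It assumes a POSIX shell (ksh or bash on Solaris, since the legacy /bin/sh there lacks $(...)).

```shell
# Hypothetical log location; point it at a path that exists on the node.
MONITOR_LOG="${MONITOR_LOG:-/tmp/activemq_monitor.log}"

# Append one timestamped line to the monitor log.
log_msg() {
    printf '%s %s\n' "$(date '+%Y/%m/%d %H:%M:%S')" "$*" >> "$MONITOR_LOG"
}

# Run a check command, log the command and its exit status,
# and hand that status back to the caller unchanged.
logged_check() {
    rc=0
    "$@" || rc=$?
    log_msg "check: $* -> exit $rc"
    return "$rc"
}

# Example use inside a monitor script (not executed here):
#   logged_check svcs -H -o state svc:/ericsson/eric_3pp/activemq:default
```

With something like this in place, the log shows which individual check failed at 22:19 and 22:20, when the agent reported the unexpected OFFLINE.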

-Wally

symsonu
Level 6

Yes, these are tested scripts, and failover has worked many times before.

This time, as the logs show, after the failover was initiated it got stuck at

Initiating Offline of Resource activemq.

It continued only when we disabled the SMF service for activemq.

The only thing we need to know is why VCS did not execute the stop command of activemq at 20:11.

There were no entries in the engine log for quite some time after this.
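One way to see exactly what the engine recorded for activemq around 20:11 is to pull every line that names the resource out of the engine log. A minimal sketch, assuming the default engine log location /var/VRTSvcs/log/engine_A.log; the function name is hypothetical:

```shell
# Print every engine-log line that mentions the given resource as a whole word.
# $1 = resource name, $2 = log file (defaults to the usual engine log path).
resource_events() {
    res=$1
    log=${2:-/var/VRTSvcs/log/engine_A.log}
    grep -w -- "$res" "$log"
}

# Example (not executed here):
#   resource_events activemq | grep '^2013/07/22 2[0-2]:'
```

If the offline entry point had run at all, the Application agent's own log (usually Application_A.log in the same directory) would be the next place to look for output from the stop command.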

symsonu
Level 6

Can you please clarify how we can tell that there is an issue with the monitor script of activemq?

 

Daniel_Matheus
Level 4
Employee Accredited Certified

I think the problem is rather in your offline procedure.

You define an offline action, and VCS initiates it when the offline entry point is called.

 

The question is how do you try to stop activemq? By running a command or a custom script?

Maybe you can share this offline script/command?

Could you stop activemq by manually executing the command or script?

If not, then the problem was with the application itself and you should implement a clean procedure as well.
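A clean procedure is usually just a forcible stop. As a sketch only (force_stop and the grace period are illustrative, not anything VCS provides), a clean script could send SIGTERM, wait briefly, and fall back to SIGKILL. It assumes the script runs as root, as VCS agents do, and that the caller already knows the PID, for example from a pidfile.

```shell
# TERM first, KILL if the process is still there after the grace period.
# $1 = PID to stop, $2 = seconds to wait after SIGTERM (default 10).
# Returns 0 if the process is gone afterwards, 1 otherwise.
force_stop() {
    pid=$1
    grace=${2:-10}

    # If the signal cannot be delivered, there is no such process to clean.
    kill -TERM "$pid" 2>/dev/null || return 0

    while [ "$grace" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0   # exited within the grace period
        sleep 1
        grace=$((grace - 1))
    done

    kill -KILL "$pid" 2>/dev/null
    sleep 1
    if kill -0 "$pid" 2>/dev/null; then
        return 1    # still visible, e.g. stuck in the kernel or not yet reaped
    fi
    return 0
}

# Example (not executed here; the pidfile path is hypothetical):
#   force_stop "$(cat /var/run/activemq.pid)" 15
```

For an SMF-managed service like this one, the clean script would probably also need to disable the SMF instance, otherwise the restarter may simply start the process again.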

 

"Can you please clarify how we can tell that there is an issue with the monitor script of activemq?"

 

Just check the process status, for example via ps output, a status command such as program -status (depending on the application or process), or a pidfile.
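For the pidfile variant, a monitor check could look roughly like the sketch below. The function name and the pidfile path are assumptions; the exit codes follow the Application agent convention for MonitorProgram, where 110 means online and 100 means offline.

```shell
# Report the state of the process whose PID is stored in a pidfile.
# Returns 110 (online) if that PID is alive, 100 (offline) otherwise.
monitor_pidfile() {
    pidfile=$1

    [ -r "$pidfile" ] || return 100          # no pidfile: treat as offline

    pid=$(cat "$pidfile")
    case $pid in
        ''|*[!0-9]*) return 100 ;;           # empty or non-numeric content
    esac

    # kill -0 sends no signal; it only tests whether the PID exists.
    if kill -0 "$pid" 2>/dev/null; then
        return 110
    fi
    return 100
}

# A monitor script would end with something like (path is hypothetical):
#   monitor_pidfile /var/run/activemq.pid; exit $?
```

Running the same check by hand while the resource sits in W_OFFLINE_PROPAGATE shows whether the process is really still up or whether the monitor is reporting a stale state.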