07-09-2014 08:37 AM
Split off from https://www-secure.symantec.com/connect/forums/help-ha-app-and-dns-resource
The application agent seems to be running on both servers and the monitor is obviously failing on the secondary host.
What attribute do I need to set so that it only runs one monitor instance?
Thanks
07-09-2014 08:40 AM
You can set OfflineMonitorInterval for the Application type to 0 (hatype -modify Application OfflineMonitorInterval 0), which turns off offline monitoring, but you shouldn't need to do this:
If you are using the MonitorProcesses or PidFiles attribute, then the resource should just report offline. If you are using the MonitorProgram attribute, then your program should be on local storage and should report offline when the application is not running (as opposed to the monitor program failing to run when the application is not running).
Mike
07-09-2014 08:53 AM
This setup is actually based on shared storage, so ideally it would be nice to link it to the host that CFS has as "active".
07-09-2014 09:57 AM
The norm for VCS agents is that the agent will run on all systems that are listed in the resource's Service Group's SystemList.
As Mike says, it would normally return OFFLINE on the host where the Service Group is offline, and ONLINE on the host where the Service Group is online.
The idea is that monitoring on the offline host makes sure the resource does not come online on a system that should not be hosting the application, which protects against concurrency violations.
In order for us to help further than what Mike has already provided, we need to hear details on exactly how the monitor is failing on the offline system.
If it is failing because the monitor program cannot be found on the offline host, then it sounds like the monitor program lives on shared storage that is only accessible on the system where the resource is online. In that case you need to move the monitor program to a location that is always accessible on each system, whether or not the service group is currently online on that system.
A common place to put such programs would be in either of the two locations:
/opt/VRTSvcs/bin/scripts/<yourApplicationName>/<yourMonitorProgram>
/opt/VRTSagents/ha/bin/<yourApplicationName>/<yourMonitorProgram>
If it makes sense in your case, a common design pattern is to check whether the shared storage is mounted locally before proceeding with the actual monitor logic. If the shared storage is mounted locally, carry on with the normal monitor procedure; otherwise, return OFFLINE.
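As a rough illustration of that pattern, here is a minimal sketch of a MonitorProgram for the Application agent. The mount point /shared/app, the process pattern myappd and all function names are hypothetical placeholders, not anything from this thread. The exit codes 110 (online) and 100 (offline) are the values I understand the Application agent to expect from a MonitorProgram; check them against the Bundled Agents guide for your VCS version. It also assumes df -P and pgrep -f are available on your platform.

```shell
#!/bin/sh
# Sketch only: MOUNT_POINT and PROC_PATTERN are hypothetical placeholders.
MOUNT_POINT="${MOUNT_POINT:-/shared/app}"
PROC_PATTERN="${PROC_PATTERN:-myappd}"

# Exit codes assumed for an Application agent MonitorProgram.
VCS_ONLINE=110
VCS_OFFLINE=100

# True if MOUNT_POINT is itself a mount point. df -P prints the mount
# point of the filesystem holding the path in the last column.
# (Mount points containing spaces are not handled in this sketch.)
is_mounted() {
    [ -d "$MOUNT_POINT" ] || return 1
    mounted_on=$(df -P "$MOUNT_POINT" 2>/dev/null | awk 'NR==2 {print $NF}')
    [ "$mounted_on" = "$MOUNT_POINT" ]
}

# True if a process whose command line matches PROC_PATTERN is running.
is_running() {
    pgrep -f "$PROC_PATTERN" >/dev/null 2>&1
}

# Returns 110 or 100. A missing mount or a stopped application is
# reported as OFFLINE rather than causing the monitor itself to fail.
monitor_state() {
    is_mounted || return "$VCS_OFFLINE"
    is_running || return "$VCS_OFFLINE"
    return "$VCS_ONLINE"
}
```

To use it as a MonitorProgram, the script would end with `monitor_state; exit $?` and be installed on local storage on every system in the SystemList. Note that on a cluster file system mounted on all nodes, the mount check passes everywhere, so the process check is what distinguishes the active node.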
07-09-2014 10:23 AM
Thanks for the response.
The agent is actually running on the shared storage and checks for a running process. Because it is shared storage the mount is online on both hosts, so not sure exactly what can be checked / "mounted locally"
What I might do (for now anyway) is to check for the presence of a file and make that the dependency ...
07-09-2014 11:03 AM
Can you clarify a few things:
Mike
07-09-2014 12:58 PM
Hi Mike,
hatype -modify Application OfflineMonitorInterval 0 seems to have fixed the issue (I think) - just keen to know the best way. So comments inline below.
07-09-2014 01:48 PM
I'm still not clear on point 4. If a monitor fails, then "hastatus -sum" should show that resource as unprobed on the system where the monitor fails, i.e. something like:
-- RESOURCES NOT PROBED
-- Group Type Resource System
E failoversg custom_agent custom_res Sys2
If the agent is reporting offline on the inactive node then this is fine - this is what it should do. I don't understand why it would call clean, as clean is normally called when the resource has been online and then goes offline outside of VCS control, or if offline fails.
Can you give an extract from the engine log of the issue you see when OfflineMonitorInterval is not zero?
Why do you need to use a custom agent - is there a reason you cannot use the Application agent? A custom agent is normally only used if the Application agent doesn't work - an example would be if an application cannot be uniquely identified by the first 80 characters shown by normal ps, so you need to use ucb ps to show more than 80 chars.
Mike
07-10-2014 09:02 AM
Hi Mike,
Firstly - the odd thing is that I haven't been able to reproduce it after changing OfflineMonitorInterval back to non-zero, i.e. it shows the process is dead but doesn't run a clean, which it was doing before. I even offlined everything and checked nothing was running. Maybe something strange was in place before, after the agents were installed/configured.
Anyway for future reference, current state below.
Regarding "custom agents" - in short they were developed by someone else, and built for a DR/GCO setup and done for various applications using a standard approach, and each application type tests for more than just the process running, so that's the main reason.
So we can probably park this for now.
Thanks again.
Mark
hastatus -sum
-- SYSTEM STATE
-- System State Frozen
A host1 RUNNING 0
A host2 RUNNING 0
-- GROUP STATE
-- Group System Probed AutoDisabled State
B MyApplication host1 Y N ONLINE
B MyApplication host2 Y N OFFLINE
B cvm host1 Y N ONLINE
B cvm host2 Y N ONLINE
B vrts_vea_cfs_int_cfsmount1 host1 Y N ONLINE
B vrts_vea_cfs_int_cfsmount1 host2 Y N ONLINE
B vxfen host1 Y N ONLINE
B vxfen host2 Y N ONLINE
==============================================
DEBUG [Thu Jul 10 11:47:34 2014] APP::monitor monitor enabled
App APP is stopped
==============================================
(where host, MyApplication and APP represent actual names)