09-12-2012 06:03 AM
I need to set an attribute, but I don't know which one.
When I do a switch, I sometimes have resources hang for up to a minute; they don't fault, they just hang. Is there an attribute that will keep polling the resource every 2 seconds or so and report "offline" or "online"? VCS does seem to do this when the resource hangs, but it may take a minute or two.
09-12-2012 11:53 PM
09-13-2012 07:45 AM
The software that I am running fails to offline. It is the first item to offline. It appears that it will try 3 times, then wait about 5 minutes, and when it tries to offline again it says "oh, you are offline" and everything else offlines as it should.
This software is set up under general services, and the AMF settings are as follows:
Mode 3
MonitorInterval 60
OfflineMonitorInterval 300
RegisterRetryLimit 3
The ToleranceLimit is set to 0
I am thinking that if I change MonitorInterval to 10 and OfflineMonitorInterval to 10, it should fix my problem by continually reporting offline.
09-16-2012 11:48 PM
Hi,
You mentioned that your software fails to offline. Why exactly does it fail to offline?
We need more details about your setup, such as:
- Which VCS resource are you using to monitor your software?
- A VCS configuration snippet
- Without VCS, how long does your software take to go completely offline?
If the VCS resource monitoring your software is registered with AMF, then AMF should provide instantaneous notification whenever the software goes offline. This is the purpose of registering any VCS resource with AMF.
If you change MonitorInterval and OfflineMonitorInterval to 10 seconds, the monitor will be invoked every 10 seconds. This will load your system with frequent monitor invocations. We can recommend a configuration change only after reviewing your existing configuration and understanding your software's behavior.
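To put a rough number on that extra load, here is a minimal sketch that counts monitor invocations per hour for a single resource at the current and proposed intervals. The function name is hypothetical, and it assumes the monitor simply runs once per interval, which ignores any AMF notifications.

```python
def monitor_calls_per_hour(interval_seconds: int) -> float:
    """Monitor invocations per hour for one resource at a given interval."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    return 3600 / interval_seconds


# Values taken from the thread: current settings vs. the proposed change.
current = {"MonitorInterval": 60, "OfflineMonitorInterval": 300}
proposed = {"MonitorInterval": 10, "OfflineMonitorInterval": 10}

for name in current:
    before = monitor_calls_per_hour(current[name])
    after = monitor_calls_per_hour(proposed[name])
    print(f"{name}: {before:.0f}/h -> {after:.0f}/h ({after / before:.0f}x)")
```

Under that assumption the online monitor would run 6 times as often and the offline monitor 30 times as often.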
Thanks and Regards,
Paresh Bafna
09-17-2012 04:41 AM
It appears that the resource signals for the offline and the application takes about 30 seconds to offline, but by then VCS doesn't see it anymore, almost as if the MonitorInterval stopped working. After about 5 minutes, or 300 seconds, VCS sees it offline and everything else goes down just fine.
It is a GenericService.
What is a snippet?
Outside of VCS, I can offline the software in about 30 seconds.
I have AMF setup:
Mode 3 / MonitorInterval 60 / OfflineMonitorInterval 300 / RegisterRetryLimit 3
I was going to change this to:
Mode 3 / MonitorInterval 10 / OfflineMonitorInterval 10 / RegisterRetryLimit ? (should I set this to 0 or 10)
09-20-2012 04:09 AM
Could you please let me know which OS version you are on and which VCS version you are using?
Thanks and Regards,
Paresh Bafna
09-20-2012 04:13 AM
Windows 2008r2
SFWHA 5.1 CP13
09-20-2012 04:26 AM
I am not aware of the exact behavior of AMF and VCS on Windows.
I will let a Windows expert comment on this.
09-20-2012 05:13 AM
This is what I believe should happen:
The offline entry point is called, and this stops the service. When the offline entry point "thinks" the service has stopped, the offline entry point completes and the monitor entry point is called. Problems can occur if the service has not actually stopped when the monitor entry point is called, and this depends on how the offline entry point determines that the service is stopped. I don't know how the code works for this particular resource type, but 3 main ways are:
I just looked at the Bundled Agents Guide, and the code does do 3. The "wait" can be set using the "DelayAfterOffline" attribute for your GenericService resource (this is set on the resource, not the resource type/agent). It is 10 seconds by default, so you need to set it to 30 seconds or more.
What you are seeing at the moment is that the offline entry point doesn't wait long enough, and then the monitor entry point returns unknown (if the monitor entry point returned offline, the resource would be cleaned if OfflineWaitLimit is set to its default of zero). I guess VCS then runs the next monitor at the time determined by OfflineMonitorInterval.
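The timing I am guessing at can be sketched as a small model. This is only an illustration of the guess above, not how VCS actually schedules monitors: it assumes the first monitor runs as soon as the DelayAfterOffline wait ends, and that every later check comes one OfflineMonitorInterval after the previous one. The function name is hypothetical.

```python
def seconds_until_seen_offline(stop_time: int,
                               delay_after_offline: int,
                               offline_monitor_interval: int) -> int:
    """Simplified model of when a monitor first sees the service as stopped.

    The first check happens when the offline wait ends; if the service is
    still stopping, the next check is one OfflineMonitorInterval later.
    """
    if offline_monitor_interval <= 0:
        raise ValueError("offline_monitor_interval must be positive")
    check_time = delay_after_offline
    while check_time < stop_time:
        check_time += offline_monitor_interval
    return check_time


# Current settings from the thread: service needs ~30 s, default wait is 10 s.
print(seconds_until_seen_offline(30, 10, 300))
# Same service with DelayAfterOffline raised to 30 s.
print(seconds_until_seen_offline(30, 30, 300))
```

In this model the current settings give 310 seconds, which is close to the 5 minutes reported, and raising DelayAfterOffline to 30 gives 30 seconds.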
Mike
09-22-2012 07:35 PM
If this is a case where the actual service stop is taking longer than the GenericService offline timeout, we would see errors or warnings like the one below in the Event Viewer:
"The service <name> did not stop within the specified timeout. Error = <err_code>"
Thanks,
-Amit