VCS 5.0 MP3 + HF1 (on Solaris 10 x86) MonitorTimeout - lowest supported value

rationalbytes
Level 2
Hello everyone,

As part of a project I'm working on, I have written a custom C++-based Veritas agent (using the Veritas Agent Framework) which polls our application every second (MonitorInterval = 1) over an established TCP/IP connection. The application responds with a pre-defined set of values indicating its health, and the agent code interprets those values and reports the resource state to the VCS engine.
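For anyone unfamiliar with this kind of agent, the interpretation step might look roughly like the sketch below. It is only an illustration under assumptions: the reply tokens `OK`, `DEGRADED` and `DOWN`, the `ResourceState` enum and the `interpretHealth` function are all hypothetical, since the post does not describe the application's actual protocol, and a real agent would return the Agent Framework's own state values rather than this enum.

```cpp
#include <string>

// Hypothetical resource states; a real agent would map these onto the
// states the Agent Framework expects from the monitor entry point.
enum class ResourceState { Online, Offline, Unknown };

// Hypothetical reply format: a single status token such as "OK",
// "DEGRADED" or "DOWN", optionally followed by a line ending.
ResourceState interpretHealth(const std::string& reply) {
    // Strip any trailing "\n" or "\r" characters.
    std::string token = reply;
    while (!token.empty() && (token.back() == '\n' || token.back() == '\r')) {
        token.pop_back();
    }
    if (token == "OK" || token == "DEGRADED") {
        return ResourceState::Online;   // application is up and still serving
    }
    if (token == "DOWN") {
        return ResourceState::Offline;  // application reports itself stopped
    }
    return ResourceState::Unknown;      // empty or unrecognised reply
}
```

Treating an unrecognised reply as `Unknown` rather than `Offline` is a deliberate choice in this sketch, so that a garbled response is not mistaken for a clean shutdown.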

The agent works well, and overall we are very happy with the performance of both the agent and Veritas. We have noticed, however, that the MonitorTimeout setting does not behave exactly as advertised with short monitoring intervals and timeouts.

We have MonitorInterval set to 1 second, MonitorTimeout set to 2 seconds, and FaultOnMonitorTimeouts set to 4 (the default). We connect over the loopback adapter (127.0.0.1), so there is no network latency to take into account. We therefore expect the monitor entry point to time out after MonitorTimeout seconds, i.e. after 2 seconds. What we actually see is that each monitor timeout takes at least 4 seconds, so rather than the resource faulting after 8 seconds (4 timeouts × 2 seconds) as expected, it takes over 16 seconds before the clean entry point is invoked.

While the numbers involved are still low, this is a very sensitive application, and we need these values to be as low as feasible. Our workaround is to increase MonitorTimeout to 4 seconds and drop FaultOnMonitorTimeouts to 2, but this is not ideal.
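The arithmetic behind these numbers can be sketched as follows. This is a lower-bound model under stated assumptions: it ignores the MonitorInterval gap between monitor cycles, and `observedFloor` stands in for the roughly 4-second minimum described above, which is an observation from this setup rather than a documented VCS limit. `faultDetectionSeconds` is a hypothetical helper.

```cpp
#include <algorithm>

// Lower bound on the time from the first hung monitor cycle until the
// resource faults and clean is invoked: FaultOnMonitorTimeouts consecutive
// timeouts are needed, and each one costs the *effective* timeout.
// MonitorInterval scheduling gaps between cycles are ignored.
int faultDetectionSeconds(int monitorTimeout, int faultOnMonitorTimeouts,
                          int observedFloor) {
    // observedFloor models the ~4 s minimum reported in this thread;
    // pass 0 to model "MonitorTimeout behaves as configured".
    const int effectiveTimeout = std::max(monitorTimeout, observedFloor);
    return effectiveTimeout * faultOnMonitorTimeouts;
}
```

Under these assumptions, `faultDetectionSeconds(2, 4, 0)` gives the expected 8 seconds, `faultDetectionSeconds(2, 4, 4)` gives the observed 16 seconds, and the workaround `faultDetectionSeconds(4, 2, 4)` brings it back to 8 seconds.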

Has anyone else used such small values for these settings, and have you seen MonitorTimeout behave like this before? On investigation, it looks like the SIGCANCEL signal that kills the monitor thread is not sent until 4 seconds have elapsed, regardless of the MonitorTimeout value, so at present it really looks to me like some sort of internal lower limit on MonitorTimeout.

Thanks for your time.


Dave H.
--


2 REPLIES

Gaurav_S
Moderator
VIP, Certified
Hello Dave,

I've never seen anyone use such small values. Thinking about it simply, I can see many possible reasons for a response arriving late. With MonitorInterval set to 1 second, the agent is probing every second, which by itself gives the agent a lot of work, and in such scenarios the agent may delay some of its responses.

This can also happen if the server is running low on memory or CPU; 1 second and 2 seconds are very small margins. Just my thinking.


Gaurav
