Managing the time interval of ERROR V-16-2-13074 (monitoring program has consistently failed to determine the resource status)
Hi everyone,
I've been running some tests with the monitor program.
After many monitor failures, the messages below came up:
2020/10/15 14:37:45 VCS ERROR V-16-2-13074 (cloud-svc-4) The monitoring program for resource(Res_App_svc_cluster_cmserv_vm_service_cmserv) has consistently failed to determine the resource status within the expected time. Agent is restarting (attempt number 1 of 3) the resource.
2020/10/15 15:13:50 VCS ERROR V-16-2-13074 (cloud-svc-4) The monitoring program for resource(Res_App_svc_cluster_cmserv_vm_service_cmserv) has consistently failed to determine the resource status within the expected time. Agent is restarting (attempt number 2 of 3) the resource.
2020/10/15 15:40:49 VCS ERROR V-16-2-13074 (cloud-svc-4) The monitoring program for resource(Res_App_svc_cluster_cmserv_vm_service_cmserv) has consistently failed to determine the resource status within the expected time. Agent is restarting (attempt number 3 of 3) the resource.
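To pull the timestamps and attempt counters out of the engine log, something like the following Python sketch could be used. The regular expression is only an assumption based on the three lines above, not an official description of the log format.

```python
import re

# The three V-16-2-13074 lines copied from the engine log above.
log = """\
2020/10/15 14:37:45 VCS ERROR V-16-2-13074 (cloud-svc-4) The monitoring program for resource(Res_App_svc_cluster_cmserv_vm_service_cmserv) has consistently failed to determine the resource status within the expected time. Agent is restarting (attempt number 1 of 3) the resource.
2020/10/15 15:13:50 VCS ERROR V-16-2-13074 (cloud-svc-4) The monitoring program for resource(Res_App_svc_cluster_cmserv_vm_service_cmserv) has consistently failed to determine the resource status within the expected time. Agent is restarting (attempt number 2 of 3) the resource.
2020/10/15 15:40:49 VCS ERROR V-16-2-13074 (cloud-svc-4) The monitoring program for resource(Res_App_svc_cluster_cmserv_vm_service_cmserv) has consistently failed to determine the resource status within the expected time. Agent is restarting (attempt number 3 of 3) the resource.
"""

# Assumed layout: timestamp, "VCS ERROR V-16-2-13074", (node), ...resource(name)...,
# "(attempt number N of LIMIT)". Derived only from the sample lines above.
pattern = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) VCS ERROR V-16-2-13074 "
    r"\((?P<node>[^)]+)\) .*?resource\((?P<res>[^)]+)\).*?"
    r"attempt number (?P<n>\d+) of (?P<limit>\d+)",
    re.MULTILINE,
)

# One tuple per restart message: (timestamp, node, resource, attempt, limit).
attempts = [
    (m["ts"], m["node"], m["res"], int(m["n"]), int(m["limit"]))
    for m in pattern.finditer(log)
]

for ts, node, res, n, limit in attempts:
    print(f"{ts} {node} {res}: restart attempt {n} of {limit}")
```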
After the third message above, there was another monitor program issue and then my service became faulted:
2020/10/15 16:39:49 VCS ERROR V-16-2-13027 (cloud-svc-4) Resource(Res_App_svc_cluster_cmserv_vm_service_cmserv) - monitor procedure did not complete within the expected time.
2020/10/15 16:40:49 VCS ERROR V-16-2-13210 (cloud-svc-4) Agent is calling clean for resource(Res_App_svc_cluster_cmserv_vm_service_cmserv) because 3 successive invocations of the monitor procedure did not complete within the expected time.
2020/10/15 16:41:47 VCS INFO V-16-10031-504 (cloud-svc-4) Application:Res_App_svc_cluster_cmserv_vm_service_cmserv:clean:Executed /sbin/service as user root
2020/10/15 16:41:58 VCS INFO V-16-2-13716 (cloud-svc-4) Resource(Res_App_svc_cluster_cmserv_vm_service_cmserv): Output of the completed operation (clean)
2020/10/15 16:41:58 VCS INFO V-16-2-13068 (cloud-svc-4) Resource(Res_App_svc_cluster_cmserv_vm_service_cmserv) - clean completed successfully.
2020/10/15 16:41:59 VCS INFO V-16-2-13026 (cloud-svc-4) Resource(Res_App_svc_cluster_cmserv_vm_service_cmserv) - monitor procedure finished successfully after failing to complete within the expected time for (3) consecutive times.
2020/10/15 16:41:59 VCS INFO V-16-1-10307 Resource Res_App_svc_cluster_cmserv_vm_service_cmserv (Owner: Unspecified, Group: Grp_CS_svc_cluster_cmserv) is offline on cloud-svc-4 (Not initiated by VCS)
2020/10/15 16:41:59 VCS ERROR V-16-1-10205 Group Grp_CS_svc_cluster_cmserv is faulted on system cloud-svc-4
My question is about the message I mentioned first:
V-16-2-13074 -> The monitoring program for resource(Res_App_svc_cluster_cmserv_vm_service_cmserv) has consistently failed to determine the resource status within the expected time. Agent is restarting (attempt number 1 of 3) the resource.
Is there anything that controls the limit of these attempts? Here the limit appears to be 3 attempts. Where can I see this configuration? I did not find anything related to it in main.cf.
The second and main question is about the time interval of this procedure.
We had the first error at 14:37:45, the second at 15:13:50, and the third at 15:40:49; then the monitor program failed again and the service became faulted at 16:41:59.
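To make the spacing between these events concrete, here is a small Python sketch that computes the gaps. The timestamps are copied from the log excerpts above; the event labels are my own shorthand.

```python
from datetime import datetime

# Timestamps copied from the engine log; labels are shorthand, not VCS terms.
events = [
    ("restart attempt 1 of 3", "2020/10/15 14:37:45"),
    ("restart attempt 2 of 3", "2020/10/15 15:13:50"),
    ("restart attempt 3 of 3", "2020/10/15 15:40:49"),
    ("resource offline, group faulted", "2020/10/15 16:41:59"),
]

fmt = "%Y/%m/%d %H:%M:%S"
times = [datetime.strptime(ts, fmt) for _, ts in events]

# Gap between each event and the one before it.
gaps = [later - earlier for earlier, later in zip(times, times[1:])]

for (label, _), gap in zip(events[1:], gaps):
    print(f"{gap} elapsed before '{label}'")

print(f"{times[-1] - times[0]} from the first restart attempt to the fault")
```

The gaps come out as 0:36:05, 0:26:59 and 1:01:10, about two hours and four minutes in total, so the events are far from evenly spaced.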
How do I manage the time interval of this procedure / error? It looks like the attempt counter keeps running no matter what time of day the error "V-16-2-13074 -> has consistently failed" occurs.
If the service has had 3 attempts between 2am and 5am, there is a risk that it becomes faulted at 11am if the monitor program does not get the status of the service within the expected time.
Regards,
Fernando Santos