03-14-2014 05:21 AM
The agents are in a failed state. Below are the messages from the engineA.log file. Please let me know the cause of this issue.
VCS WARNING V-16-1-53025 Agent Script has faulted; ipm connection was lost; restarting the agent
VCS ERROR V-16-1-10015 Cannot start /opt/VRTSvcs/bin/Script/ScriptAgent please check file
VCS WARNING V-16-1-53025 Agent NIC has faulted; ipm connection was lost; restarting the agent
VCS ERROR V-16-1-10008 Agent NIC has faulted 6 times since
VCS ERROR V-16-1-10015 Cannot start /opt/VRTSvcs/bin/NIC/NICAgent please check file
VCS WARNING V-16-10001-4028 (unix) IP:Unix-G1-IP:monitor:Empty NetMask is supplied, default netmask will be used.
VCS WARNING V-16-1-10023 Agent DiskGroup not sending alive messages since
VCS WARNING V-16-1-53025 Agent DiskGroup has faulted; ipm connection was lost; restarting the agent
VCS ERROR V-16-1-10015 Cannot start /opt/VRTSvcs/bin/DiskGroup/DiskGroupAgent please check file
03-14-2014 07:52 AM
The log above shows that all the agents are having the same problem, which points to a cause outside the agents themselves:
1. Either the HAD process is hung or unresponsive, or
2. The system itself is unresponsive, which means the HAD process is not getting enough resources to communicate with the agents, and hence all agents are complaining.
I would suggest running OS utilities like "sar", "prstat" or "top" to find out what is happening with system performance.
G
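A minimal sketch of a first check on the affected node, assuming a POSIX shell and awk. The helper name `has_process` is hypothetical; it reads a process listing on stdin and succeeds if a process with the given name is present, so it can confirm whether `had` is running at all before looking at load.

```shell
# has_process NAME
# Reads a process listing (one command name or path per line) on stdin
# and succeeds if the basename of any entry equals NAME.
has_process() {
    awk -v name="$1" '
        { n = split($1, parts, "/"); if (parts[n] == name) found = 1 }
        END { exit !found }'
}

# Usage on the affected node:
#   ps -e -o comm= | has_process had && echo "had is running" || echo "had is NOT running"
#   uptime    # load averages, as a quick complement to sar/prstat/top
```

A running `had` does not rule out a hang, so this only separates "process missing" from "process present but possibly starved of resources".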
03-28-2014 12:07 AM
03-28-2014 01:14 PM
Do you have this issue with some agents or all agents? Does this issue happen on one node in the cluster or on all nodes? Also, does it happen at certain times of the day, for example when a backup is running? These details would help in troubleshooting this issue.
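To answer the "some agents or all agents" question from the log itself, here is a rough sketch that counts the V-16-1-53025 "has faulted" warnings per agent. The function name `count_agent_faults` is hypothetical, and the log path in the usage line is an assumption; point it at wherever your engine log actually lives.

```shell
# count_agent_faults LOGFILE
# Prints "<AgentName> <count>" for every agent that logged a
# V-16-1-53025 "has faulted" warning, sorted by agent name.
count_agent_faults() {
    awk '/V-16-1-53025/ {
        # The agent name is the word right after "Agent".
        for (i = 1; i < NF; i++)
            if ($i == "Agent") { count[$(i + 1)]++; break }
    }
    END { for (a in count) print a, count[a] }' "$1" | sort
}

# Usage (path is an assumption):
#   count_agent_faults /var/VRTSvcs/log/engine_A.log
```

Running it on each node and comparing the output shows whether the faults are limited to one agent type or one system.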
03-29-2014 11:55 AM
As Gaurva said, it looks like a performance issue on the system, and none of the agents are communicating with HAD.
Have you tried stopping and starting the agent? If you want, you can freeze the SGs, then manually stop and start the agent and see how it works.
haagent -stop AgentName -force -sys NodeName
haagent -start AgentName -sys NodeName
I had a similar issue with the NIC agent, and stopping and starting it worked fine.
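The freeze, stop, start sequence suggested above could be laid out as follows. This is only a sketch: the function name `restart_agent_plan` is hypothetical, the VCS agent command is spelled `haagent`, and the function prints the commands instead of running them so they can be reviewed first. The agent, node and service group names are placeholders.

```shell
# restart_agent_plan AGENT NODE GROUP
# Prints (does not execute) the commands to freeze a service group,
# force-restart one agent on one node, check it, and unfreeze the group.
restart_agent_plan() {
    agent=$1; node=$2; group=$3
    echo "hagrp -freeze $group"
    echo "haagent -stop $agent -force -sys $node"
    echo "haagent -start $agent -sys $node"
    echo "haagent -display $agent"
    echo "hagrp -unfreeze $group"
}

# Usage (names are placeholders):
#   restart_agent_plan NIC NodeName GroupName
```

If the agents still fail to start after this, the "Cannot start ... please check file" errors in the log suggest also checking that the agent binaries exist and are executable.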