Forum Discussion

sureshpeters
11 years ago

Agent failed in VCS

The agents are in a failed state. Below are the messages from the engine_A.log file. Please let me know the cause of this issue.

 
VCS WARNING V-16-1-53025 Agent Script has faulted; ipm connection was lost; restarting the agent
VCS ERROR V-16-1-10015 Cannot start /opt/VRTSvcs/bin/Script/ScriptAgent please check file


VCS WARNING V-16-1-53025 Agent NIC has faulted; ipm connection was lost; restarting the agent
 VCS ERROR V-16-1-10008 Agent NIC has faulted 6 times since  

VCS ERROR V-16-1-10015 Cannot start /opt/VRTSvcs/bin/NIC/NICAgent please check file
 VCS WARNING V-16-10001-4028 (unix) IP:Unix-G1-IP:monitor:Empty NetMask is supplied, default netmask will be used.


VCS WARNING V-16-1-10023 Agent DiskGroup not sending alive messages since 

VCS WARNING V-16-1-53025 Agent DiskGroup has faulted; ipm connection was lost; restarting the agent
VCS ERROR V-16-1-10015 Cannot start /opt/VRTSvcs/bin/DiskGroup/DiskGroupAgent please check file

4 Replies

  • The log above shows that all the agents are having issues, which points to one of two causes:

    1. Either the HAD process is hung or unresponsive, or

    2. The system itself is unresponsive, which means the HAD process is not getting enough resources to communicate with the agents, and hence all agents are complaining.

    I would suggest running OS utilities like "sar", "prstat" or "top" to find out what is happening with system performance.

     

    G

  • There is interprocess communication between the VCS agents and the had daemon. If this communication is disrupted (for example, due to system load), the agent will fault, and had will restart the agent. Please also refer to http://www.symantec.com/docs/TECH155691
  • Do you have this issue with some agents or all agents? Does it happen on one node in the cluster or on all nodes? Also, does it happen at certain times of the day, such as when a backup is running? These details would help in troubleshooting this issue.

  • As Gaurav said, it looks like a performance issue on the system, and none of the agents are communicating with HAD.

    Have you tried stopping and starting the agent? If you want, you can freeze the service groups (SGs) first, then manually stop and start the agent and see how it behaves.

    haagent -stop AgentName -force -sys NodeName

    haagent -start AgentName -sys NodeName

     

    I had a similar issue with the NIC agent, and stopping and starting it worked fine.
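
To answer the "some agents or all agents?" question above, it can help to tally the V-16-1-53025 warnings per agent. Below is a minimal sketch, assuming the log lines look like the ones posted in the question. The function name count_agent_faults is made up for illustration, and the log path in the usage comment is the usual default location, so adjust it for your install.

```shell
#!/bin/sh
# Hypothetical helper (not part of VCS): count "has faulted; ipm connection
# was lost" warnings (message ID V-16-1-53025) per agent type.
# Reads log text from the file(s) given as arguments, or from stdin if none.
count_agent_faults() {
    grep 'V-16-1-53025' "$@" \
        | sed -n 's/.*Agent \([A-Za-z0-9_]*\) has faulted.*/\1/p' \
        | sort | uniq -c | awk '{print $2, $1}'
}

# Example usage (path is an assumption, check where your engine log lives):
# count_agent_faults /var/VRTSvcs/log/engine_A.log
```

If nearly every agent type shows up with a similar count, that fits the system-wide load or hung HAD explanation. If only one agent is listed, look at that agent's binary and its own log first.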