Agents exiting in VCS 6.1 on RHEL 6.6
Hi All,
I have a 10-node VCS cluster on RHEL 6.6 with 28 service groups running. Some are parallel, but most are failover service groups.
I am getting the errors below:
Jun 19 05:37:20 HTNDPUEDSVC01 AgentFramework[10121]: VCS ERROR V-16-2-13120 Thread(4151479088) Error receiving from the engine. Agent(LVMVolumeGroup) is exiting
Jun 19 05:36:50 HTNDPUEDSVC01 t of memory [5967]
Jun 19 05:36:50 HTNDPUEDSVC01 lloc() FAILED at file Memory.C, line 376, memory could not be allocated. Process ID 5967 dumping core deliberately!
Jun 19 05:37:12 HTNDPUEDSVC01 kernel: GAB WARNING V-15-1-20057 Port h[GAB_USER_CLIENT (refcount 0)] process 5967 inactive 22 sec
Jun 19 05:37:13 HTNDPUEDSVC01 kernel: GAB WARNING V-15-1-20057 Port h[GAB_USER_CLIENT (refcount 0)] process 5967 inactive 23 sec
Jun 19 05:37:14 HTNDPUEDSVC01 kernel: GAB WARNING V-15-1-20057 Port h[GAB_USER_CLIENT (refcount 0)] process 5967 inactive 24 sec
Jun 19 05:37:15 HTNDPUEDSVC01 kernel: GAB WARNING V-15-1-20057 Port h[GAB_USER_CLIENT (refcount 0)] process 5967 inactive 25 sec
Jun 19 05:37:16 HTNDPUEDSVC01 kernel: GAB WARNING V-15-1-20057 Port h[GAB_USER_CLIENT (refcount 0)] process 5967 inactive 26 sec
Jun 19 05:37:17 HTNDPUEDSVC01 kernel: GAB WARNING V-15-1-20057 Port h[GAB_USER_CLIENT (refcount 0)] process 5967 inactive 27 sec
Jun 19 05:37:18 HTNDPUEDSVC01 kernel: GAB WARNING V-15-1-20057 Port h[GAB_USER_CLIENT (refcount 0)] process 5967 inactive 28 sec
Jun 19 05:37:19 HTNDPUEDSVC01 abrt[14811]: Write error: No space left on device
Jun 19 05:37:19 HTNDPUEDSVC01 kernel: GAB WARNING V-15-1-20057 Port h[GAB_USER_CLIENT (refcount 0)] process 5967 inactive 29 sec
Jun 19 05:37:19 HTNDPUEDSVC01 abrt[14811]: Error writing '/var/spool/abrt/ccpp-2016-06-19-05:36:50-5967.new/coredump'
Jun 19 05:37:20 HTNDPUEDSVC01 kernel: GAB WARNING V-15-1-20058 Port h[GAB_USER_CLIENT (refcount 0)] process 5967: heartbeat failed, killing process
Jun 19 05:37:20 HTNDPUEDSVC01 kernel: GAB INFO V-15-1-20059 Port h[GAB_USER_CLIENT (refcount 0)] heartbeat interval 30000 msec. Statistics:
Jun 19 05:37:20 HTNDPUEDSVC01 kernel: GAB INFO V-15-1-20129 Port h: heartbeats in 0 ~ 6000 msec: 75483871
Jun 19 05:37:20 HTNDPUEDSVC01 kernel: GAB INFO V-15-1-20129 Port h: heartbeats in 6000 ~ 12000 msec: 0
Jun 19 05:37:20 HTNDPUEDSVC01 kernel: GAB INFO V-15-1-20129 Port h: heartbeats in 12000 ~ 18000 msec: 0
Jun 19 05:37:20 HTNDPUEDSVC01 kernel: GAB INFO V-15-1-20129 Port h: heartbeats in 18000 ~ 24000 msec: 0
Jun 19 05:37:20 HTNDPUEDSVC01 kernel: GAB INFO V-15-1-20129 Port h: heartbeats in 24000 ~ 30000 msec: 0
Jun 19 05:37:20 HTNDPUEDSVC01 kernel: GAB INFO V-15-1-20088 System information:
Jun 19 05:37:20 HTNDPUEDSVC01 kernel: GAB INFO V-15-1-20089 number of cpu: 16
Jun 19 05:37:20 HTNDPUEDSVC01 kernel: GAB INFO V-15-1-20090 physical memory: 49426504 K
Jun 19 05:37:20 HTNDPUEDSVC01 kernel: GAB INFO V-15-1-20091 free memory: 5399360 K
Jun 19 05:37:20 HTNDPUEDSVC01 kernel: GAB INFO V-15-1-20041 Port h: client process failure: killing process
Jun 19 05:37:20 HTNDPUEDSVC01 kernel: GAB WARNING V-15-1-20161 Port h client process killed, GAB will initiate regmon action syslog after 200 sec
Jun 19 05:37:20 HTNDPUEDSVC01 kernel: GAB INFO V-15-1-20032 Port h closed
Jun 19 05:37:20 HTNDPUEDSVC01 AgentFramework[10130]: VCS ERROR V-16-2-13120 Thread(4151511776) Error receiving from the engine. Agent(HostMonitor) is exiting.
Jun 19 05:37:20 HTNDPUEDSVC01 AgentFramework[10121]: VCS ERROR V-16-2-13120 Thread(4151479088) Error receiving from the engine. Agent(LVMVolumeGroup) is exiting.
Jun 19 05:37:20 HTNDPUEDSVC01 AgentFramework[10133]: VCS ERROR V-16-2-13120 Thread(4151663328) Error receiving from the engine. Agent(Proxy) is exiting.
Jun 19 05:37:20 HTNDPUEDSVC01 AgentFramework[10116]: VCS ERROR V-16-2-13120 Thread(4151875280) Error receiving from the engine. Agent(Application) is exiting.
Jun 19 05:37:20 HTNDPUEDSVC01 AgentFramework[10128]: VCS ERROR V-16-2-13120 Thread(4151543504) Error receiving from the engine. Agent(VMwareDisks) is exiting.
Jun 19 05:37:20 HTNDPUEDSVC01 AgentFramework[10120]: VCS ERROR V-16-2-13120 Thread(4151569120) Error receiving from the engine. Agent(IP) is exiting.
Jun 19 05:37:20 HTNDPUEDSVC01 AgentFramework[10122]: VCS ERROR V-16-2-13120 Thread(4151507680) Error receiving from the engine. Agent(LVMLogicalVolume) is exiting.
Jun 19 05:37:20 HTNDPUEDSVC01 AgentFramework[10124]: VCS ERROR V-16-2-13120 Thread(4151524144) Error receiving from the engine. Agent(Phantom) is exiting.
Jun 19 05:37:20 HTNDPUEDSVC01 AgentFramework[10126]: VCS ERROR V-16-2-13120 Thread(4152023856) Error receiving from the engine. Agent(NIC) is exiting.
Jun 19 05:37:20 HTNDPUEDSVC01 hashadow[4005]: VCS ERROR V-16-1-11103 VCS exited. It will restart
Please suggest what action I should take to resolve this error.
Hi,
How is the load on the system? These errors are quite common when a system is overloaded. I would recommend looking at system performance to begin with. What VCS is displaying above might be the result of poor server performance.
Also, I see a "No space left on device" error. Is the system low on space where the logs are being written?
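A rough way to take a first look at both points (load and free space) on the affected node is sketched below. This is only an illustration, not a Veritas tool: the list of paths is an assumption (the abrt spool directory comes from the log above, and /var/VRTSvcs/log is the usual VCS log location), and the helper names percent_used and load_per_cpu are made up for this example.

```python
import multiprocessing
import os


def percent_used(path):
    """Return the percentage of blocks in use on the filesystem holding path."""
    st = os.statvfs(path)
    if st.f_blocks == 0:
        # Pseudo filesystems can report zero blocks; treat them as empty.
        return 0.0
    return 100.0 * (st.f_blocks - st.f_bfree) / st.f_blocks


def load_per_cpu():
    """Return the 1-minute load average divided by the number of CPUs."""
    one_minute = os.getloadavg()[0]
    return one_minute / multiprocessing.cpu_count()


if __name__ == "__main__":
    # Assumed paths: adjust to wherever core dumps and VCS logs live on your node.
    for path in ("/var/spool/abrt", "/var/VRTSvcs/log", "/var", "/"):
        if os.path.exists(path):
            print("%-20s %5.1f%% used" % (path, percent_used(path)))
    print("1-minute load per CPU: %.2f" % load_per_cpu())
```

A filesystem close to 100% used, or a load per CPU that stays well above 1, would be worth chasing before looking any deeper at VCS itself.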
If you search for "port h client process failure", you will find many articles, for example:
https://www.veritas.com/support/en_US/article.TECH1794.html
G