Solved: Veritas Cluster -engine_A.log

Edisson · ‎06-06-2013

Getting the following error in engine_A.log:

2013/05/31 14:16:12 VCS INFO V-16-1-10307 Resource cvmvoldg5 (Owner: unknown, Group: lic_DG) is offline on dwlemm2b (Not initiated by VCS)

2013/05/31 14:16:14 VCS INFO V-16-6-15004 (dwlemm2b) hatrigger:Failed to send trigger for resfault; script doesn't exist

2013/05/31 14:16:15 VCS NOTICE V-16-10001-5510 (dwlemm2b) CFSMount:cfsmount5:offline:Attempting fuser TERM : Mount Point : /var/opt/sentinel

2013/05/31 14:16:18 VCS NOTICE V-16-10001-5510 (dwlemm2b) CFSMount:cfsmount2:offline:Attempting fuser TERM : Mount Point : /var/opt/BGw/ServerGroup1

2013/05/31 15:11:51 VCS INFO V-16-1-10307 Resource fmm (Owner: unknown, Group: FMMgrp) is offline on dwlemm2a (Not initiated by VCS)

and many more errors of a similar kind. Does anyone know the cause of any of these errors?


mikebounds · ‎06-07-2013

Assuming cvmvoldg5 is of type CVMVolDg, this message means the monitor entry point for CVMVolDg failed. I believe the monitor entry point does a "dd" read of the volumes you specify in the CVMVolumes attribute, so a "dd" read of one of these volumes probably failed, which suggests you may have had a SAN issue. You should check your system log to see if there are any I/O errors around this time.
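
The two manual checks suggested above could be sketched roughly as follows. This is not the agent's own code: the function names, the 1 MiB read size and the "i/o error" search string are assumptions made for illustration, and the actual wording of I/O errors in the system log varies by platform and driver.

```shell
#!/bin/sh
# Hypothetical helpers, not part of VCS.

# read_test PATH
# Reads the first 1 MiB of PATH and discards it; returns dd's exit status.
# For a VxVM volume, PATH would typically be a raw device such as
# /dev/vx/rdsk/<diskgroup>/<volume>.
read_test() {
    dd if="$1" of=/dev/null bs=1024 count=1024 2>/dev/null
}

# find_io_errors LOGFILE TIMESTAMP_PREFIX
# Prints log lines that start with TIMESTAMP_PREFIX and mention an I/O error.
# The search string is an assumption; adjust it to your platform's messages.
find_io_errors() {
    grep "^$2" "$1" | grep -i 'i/o error'
}
```

For example, `read_test /dev/vx/rdsk/<diskgroup>/<volume>` followed by `echo $?` shows whether a simple read succeeds, and `find_io_errors /var/log/messages 'May 31 14:1'` narrows a Linux syslog to the minutes around the first message in the question.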

Mike

stinsong · ‎06-08-2013

Hi Edisson,

CVMVolDg monitoring can fault the CVMVolDg resources on a system if any of the following checks fail during monitoring:

1) If the VCS_VOL_STAT file /var/VRTSvcs/lock/${cvmvoldg_resname}_${cvmvoldg_dgname}_stat does not exist. This file is created by the online script when the CVMVolDg resource comes online. If the file doesn't exist, the monitor returns OFFLINE.

2) The vxnotify process PID file check fails. If the PID file /var/VRTSvcs/lock/${cvmvoldg_resname}_${cvmvoldg_dgname}_pid doesn't exist then the monitor returns OFFLINE.

3) If the process found in the process table under that PID is not actually the aforementioned vxnotify process.

4) If the WHO_STAT file /var/VRTSvcs/lock/${cvmvoldg_resname}_${cvmvoldg_dgname}_who does not exist. This file is also created when the CVMVolDg resource is brought online on the node. If the file doesn't exist, the monitor returns OFFLINE.

5) If the "who -b" command output does not exactly match the contents of the /var/VRTSvcs/lock/${cvmvoldg_resname}_${cvmvoldg_dgname}_who file. The "who -b" command returns the boot time of the node. If the two differ, the monitor returns OFFLINE.

6) If the dd read test on the CVMVolumes fails with a timeout or an I/O error. This check applies only if the CVMVolumes attribute is configured.

We have noticed that "who -b" on some occasions returns an empty string. The "who -b" command uses /var/run/utmp (on Linux) and /var/adm/utmpx (on Solaris) to get the boot time. When the utmp file gets truncated, purged, or modified and the boot time record is removed, "who -b" returns an empty string, which causes the CVMVolDg monitor to return OFFLINE and fault the CVMVolDg resources on that node.
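
The file-based checks in the list above can be repeated by hand with a small sketch like the one below. It is not the agent's monitor script: the function name and output labels are made up, the lock directory and file suffixes are taken from the list, and it assumes the _who file holds the raw "who -b" output. The vxnotify process check (item 3) and the dd read test (item 6) are left out.

```shell
#!/bin/sh
# Hypothetical helper, not part of VCS.
# check_cvmvoldg_files LOCK_DIR RESOURCE DISKGROUP BOOT_NOW
#   LOCK_DIR   normally /var/VRTSvcs/lock
#   RESOURCE   CVMVolDg resource name, e.g. cvmvoldg5
#   DISKGROUP  disk group name
#   BOOT_NOW   current output of "who -b"
# Prints one line per failed check and returns non-zero if any check failed.
check_cvmvoldg_files() {
    base="$1/${2}_${3}"
    boot_now=$4
    rc=0

    # Items 1, 2 and 4: the stat, pid and who files must exist.
    for suffix in stat pid who; do
        if [ ! -f "${base}_${suffix}" ]; then
            echo "MISSING ${base}_${suffix}"
            rc=1
        fi
    done

    # Item 5: "who -b" must be non-empty and match the saved copy.
    # Assumption: the _who file stores the raw "who -b" output.
    if [ -z "$boot_now" ]; then
        echo "EMPTY who -b output"
        rc=1
    elif [ -f "${base}_who" ] && [ "$(cat "${base}_who")" != "$boot_now" ]; then
        echo "MISMATCH who -b differs from ${base}_who"
        rc=1
    fi

    return $rc
}
```

A typical call would be `check_cvmvoldg_files /var/VRTSvcs/lock cvmvoldg5 <diskgroup> "$(who -b)"`. An "EMPTY who -b output" line points at the utmp problem described above.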

stinsong · ‎06-08-2013

You can also set up debugging for the CVMVolDg resource type to find out why it faults, if the problem can be reproduced.

To turn on debug mode for the agent, edit /opt/VRTSvcs/bin/CVMVolDg/cvmvoldg.lib and uncomment one line.

from this:
# Uncomment the following to start debugging
# DEBUG="DEBUG"

to this:
# Uncomment the following to start debugging
DEBUG="DEBUG"

It is not necessary to restart the agent, since this lib file is read every time the monitor is run. With debugging enabled, the agent places more information in the engine log, which may give a better idea of why the resources are faulting.

If you want to restart the agent anyway, the commands are:

# haagent -stop CVMVolDg -force -sys <node name>
# haagent -start CVMVolDg -sys <node name>
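
If you would rather script the edit than make it by hand, a minimal sketch is shown below. The function name and the .bak backup suffix are assumptions for illustration; the pattern only matches the commented-out line exactly as quoted above.

```shell
#!/bin/sh
# Hypothetical helper, not part of VCS.
# enable_agent_debug LIBFILE
# Keeps a copy of LIBFILE as LIBFILE.bak, then uncomments the DEBUG="DEBUG" line.
enable_agent_debug() {
    lib=$1
    cp -p "$lib" "$lib.bak" || return 1
    # Write back into the original file so its ownership and mode are unchanged.
    sed 's/^# *DEBUG="DEBUG"/DEBUG="DEBUG"/' "$lib.bak" > "$lib"
}
```

It would be called as `enable_agent_debug /opt/VRTSvcs/bin/CVMVolDg/cvmvoldg.lib`. Copying the .bak file back over the lib file turns debugging off again.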

Edisson · ‎06-10-2013

Thanks for the help mike and stinsong.
