12-29-2014 01:07 PM
Hi Team,
We are getting concurrency violation alerts on a two-node cluster (mapibm625 and mapibm626).
The logs are:
2014/12/26 19:37:03 VCS INFO V-16-1-10299 Resource App_saposcol (Owner: Unspecified, Group: sapgtsprd) is online on mapibm625 (Not initiated by VCS)
2014/12/26 19:37:03 VCS ERROR V-16-1-10214 Concurrency Violation:CurrentCount increased above 1 for failover group sapgtsprd
2014/12/26 19:37:03 VCS NOTICE V-16-1-10233 Clearing Restart attribute for group sapgtsprd on all nodes
2014/12/26 19:37:04 VCS WARNING V-16-6-15034 (mapibm625) violation:Offlining group sapgtsprd on system mapibm625
2014/12/26 19:37:04 VCS INFO V-16-1-50135 User root fired command: hagrp -offline sapgtsprd mapibm625 from localhost
2014/12/26 19:37:04 VCS NOTICE V-16-1-10167 Initiating manual offline of group sapgtsprd on system mapibm625
2014/12/26 19:37:04 VCS NOTICE V-16-1-10300 Initiating Offline of Resource App_saposcol (Owner: Unspecified, Group: sapgtsprd) on System mapibm625
2014/12/26 19:37:04 VCS INFO V-16-6-15002 (mapibm625) hatrigger:hatrigger executed /opt/VRTSvcs/bin/internal_triggers/violation mapibm625 sapgtsprd successfully
2014/12/26 19:37:04 VCS INFO V-16-10011-306 (mapibm625) Application:App_saposcol:offline:Execution of Stop Program (/opt/VRTSvcs/bin/Saposcol/offline) returned (0).
2014/12/26 19:37:05 VCS INFO V-16-2-13716 (mapibm625) Resource(App_saposcol): Output of the completed operation (offline) ==============================================
2014/12/26 19:37:06 VCS INFO V-16-1-10305 Resource App_saposcol (Owner: Unspecified, Group: sapgtsprd) is offline on mapibm625 (VCS initiated)
2014/12/26 19:37:06 VCS NOTICE V-16-1-10446 Group sapgtsprd is offline on system mapibm625
========================================================================================
I asked the application team whether they were working on the servers, since the resource in question (App_saposcol) belongs to SAP.
They replied that they were not, and suggested that App_saposcol might be online on both servers, which would cause the issue.
I then checked the resource state on both servers:
[root@mapibm626]: # hares -state
#Resource Attribute System Value
App_saposcol State mapibm625 OFFLINE
App_saposcol State mapibm626 ONLINE
[root@mapibm625]: # hares -state
#Resource Attribute System Value
App_saposcol State mapibm625 OFFLINE
App_saposcol State mapibm626 ONLINE
I also checked the current logs on the server but found only:
2014/12/27 13:03:42 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/27 17:03:43 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/27 21:03:44 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/28 01:03:45 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/28 05:03:46 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/28 09:03:47 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/28 10:56:14 VCS INFO V-16-1-50086 CPU usage on mapibm625 is 61%
2014/12/28 11:26:14 VCS INFO V-16-1-50086 CPU usage on mapibm625 is 61%
2014/12/28 13:03:48 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/28 14:26:14 VCS INFO V-16-1-50086 CPU usage on mapibm625 is 60%
2014/12/28 17:03:49 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/28 21:03:50 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/29 01:03:51 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/29 05:03:52 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/29 09:03:53 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/29 13:03:55 VCS INFO V-16-1-53504 VCS Engine Alive message!!
==========================================================================
Please advise on the possible causes and on how to avoid this in future.
Thanks,
Allaboutunix
12-30-2014 08:29 AM
The message
2014/12/26 19:37:03 VCS INFO V-16-1-10299 Resource App_saposcol (Owner: Unspecified, Group: sapgtsprd) is online on mapibm625 (Not initiated by VCS)
means, as it says, that the resource App_saposcol came online outside of VCS.
This is not allowed for a failover group (Concurrency Violation), so VCS took it offline:
2014/12/26 19:37:04 VCS NOTICE V-16-1-10300 Initiating Offline of Resource App_saposcol (Owner: Unspecified, Group: sapgtsprd) on System mapibm625
So you wouldn't see it online on mapibm625 when you ran your commands, as VCS started taking it offline within a second of seeing it online.
So your issue is either:
Mike
12-30-2014 12:58 PM
When I ran hares -display, it showed:
App_saposcol ConfidenceLevel mapibm625 0
App_saposcol ConfidenceLevel mapibm626 100
App_saposcol ConfidenceMsg mapibm625
App_saposcol ConfidenceMsg mapibm626
App_saposcol Flags mapibm625
App_saposcol Flags mapibm626
App_saposcol IState mapibm625 not waiting
App_saposcol IState mapibm626 not waiting
App_saposcol MonitorMethod mapibm625 IMF
App_saposcol MonitorMethod mapibm626 Traditional
App_saposcol Probed mapibm625 1
App_saposcol Probed mapibm626 1
App_saposcol Start mapibm625 0
App_saposcol Start mapibm626 1
App_saposcol State mapibm625 OFFLINE
App_saposcol State mapibm626 ONLINE
App_saposcol CleanProgram global /opt/VRTSvcs/bin/Saposcol/clean
App_saposcol ComputeStats global 0
App_saposcol ContainerInfo global Type Name Enabled
App_saposcol EnvFile global
App_saposcol MonitorProcesses global
App_saposcol MonitorProgram global /opt/VRTSvcs/bin/Saposcol/monitor
App_saposcol PidFiles global
App_saposcol ResContainerInfo global Type Name Enabled
App_saposcol ResourceInfo global State Stale Msg TS
App_saposcol ResourceRecipients global
App_saposcol StartProgram global /opt/VRTSvcs/bin/Saposcol/online
App_saposcol StopProgram global /opt/VRTSvcs/bin/Saposcol/offline
App_saposcol TriggerPath global
App_saposcol TriggerResRestart global 0
App_saposcol TriggerResStateChange global 0
App_saposcol TriggersEnabled global
App_saposcol UseSUDash global 0
App_saposcol User global root
App_saposcol MonitorTimeStats mapibm625 Avg 0 TS
App_saposcol MonitorTimeStats mapibm626 Avg 0 TS
=========================================================
I believe a MonitorProgram is configured. What do you suggest as our next action to resolve this?
12-30-2014 01:16 PM
You need to debug /opt/VRTSvcs/bin/Saposcol/monitor (this is a custom script - not one provided by Symantec) to find out why it is reporting your application is online when you believe it is not online.
If your application is SAP, then you should see whether you can use the Symantec SAP agents, which can be downloaded from https://sort.symantec.com/agents.
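One way to start debugging is to run the monitor script by hand on each node and look at its exit status. The sketch below assumes the usual Application agent convention for MonitorProgram exit codes (100 = offline, 101 to 110 = online); the helper name `describe_monitor_rc` is made up for illustration and is not part of VCS.

```shell
#!/bin/sh
# Hypothetical helper: translate a MonitorProgram exit status into a state.
# Assumption: 100 means offline and 101..110 mean online.
describe_monitor_rc() {
    if [ "$1" -eq 100 ]; then
        echo "offline"
    elif [ "$1" -ge 101 ] && [ "$1" -le 110 ]; then
        echo "online"
    else
        echo "unknown"
    fi
}

# Run the monitor by hand, but only if it exists on this host.
MONITOR=/opt/VRTSvcs/bin/Saposcol/monitor
if [ -x "$MONITOR" ]; then
    "$MONITOR"
    monitor_rc=$?
    echo "monitor returned $monitor_rc ($(describe_monitor_rc "$monitor_rc"))"
fi
```

Running this on both nodes should show whether the script reports "online" on a node where the application is not actually running.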
Mike
01-01-2015 04:22 PM
It looks like this:
SU=/sbin/su
PS=/usr/bin/ps
SH=/usr/bin/sh
GREP=/usr/bin/grep
PGREP="/usr/bin/ps -ef | ${GREP}"
ECHO=/usr/bin/echo
SLEEP=/usr/bin/sleep
HALOG="/opt/VRTSvcs/bin/halog -add"
#
#
# This script is used by the application agent. It is done this way because
# the "MonitorProcesses" function looks for the exact string used to start it,
# unlike pgrep -f which can use an exact match of a partial string.
# It is preferred that VCS start it with the full path. If it is started
# manually without the full path, VCS will not detect that it is running.
#
CMD=`/usr/bin/ps -ef | ${GREP} "saposcol" | ${GREP} -v grep | ${GREP} -v App_saposcol`
if [ "${CMD}" ]
then
exit 110 # online (should be 110)
fi
exit 100 # offline (should be 100)
=======================================================
I am unable to interpret the 110 and 100 values.
What does this command do?
CMD=`/usr/bin/ps -ef | ${GREP} "saposcol" | ${GREP} -v grep | ${GREP} -v App_saposcol`
Is it related to the issue?
01-02-2015 01:52 AM
The custom script you have shown is not written well, so this is your issue. It matches the string saposcol, which is not specific enough: for instance, if someone ran "vi saposcol" (or opened ANY file with saposcol somewhere in its name), the script would match that process and report to VCS that the resource was online.
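To illustrate the false match, here is a small sketch that feeds one sample process line through the same filter chain as the monitor script. The `vi` line is invented for the example, and `loose_match` is just a hypothetical wrapper name; it reads from stdin rather than from live ps output, so it is safe to run anywhere.

```shell
#!/bin/sh
# Same filter chain as the custom monitor, applied to text on stdin
# instead of live "ps -ef" output.
loose_match() {
    grep "saposcol" | grep -v grep | grep -v App_saposcol
}

# Invented process line: an editor session on a file named saposcol.log.
sample='root 4242 1 0 19:36:58 pts/1 0:00 vi /tmp/saposcol.log'

if [ -n "$(printf '%s\n' "$sample" | loose_match)" ]; then
    echo "monitor would exit 110 (online)"
else
    echo "monitor would exit 100 (offline)"
fi
```

Because the editor's command line contains the string saposcol, the filter returns a non-empty result and the script would take the "online" branch even though saposcol itself is not running.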
You should use MonitorProcesses if you can:
run ps -ef | grep saposcol
Copy the whole command from the CMD column of the ps output for the saposcol process (not the App_saposcol process, if this exists) and use this for MonitorProcesses. If you are unsure how to do this, please post the output of "ps -ef | grep saposcol" from the LIVE system.
Mike
01-02-2015 12:08 PM
On the second node, mapibm626, I ran:
[root@mapibm626]: # ps -ef |grep saposcol
gtpadm 13893680 1 0 Dec 21 - 8:48 /usr/sap/hostctrl/exe/saposcol
saposcol shows as running on mapibm626, and the same command gives no output on mapibm625.
01-02-2015 01:51 PM
For a quick fix, change the CMD line to:
CMD=`/usr/bin/ps -ef | ${GREP} " /usr/sap/hostctrl/exe/saposcol$"`
This is a more specific match, so it should not cause the inactive node to incorrectly match other processes that contain the string saposcol. Before you make this change, I would freeze the group containing the resource in case you make a mistake when you edit the script. Once you have confirmed that the monitor is successful and the resource stays online, you can unfreeze the group.
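A rough sketch of that sequence follows. It first checks the anchored pattern against two sample lines (the gtpadm line is the ps output posted earlier in the thread, the `vi` line is invented), then prints the freeze and unfreeze commands. The `run` wrapper and the `DRY_RUN` variable are assumptions added for the example so that nothing is executed unless you set DRY_RUN=0 on a VCS node.

```shell
#!/bin/sh
# Anchored pattern from the quick fix, applied to text on stdin.
# Single quotes keep the trailing $ as a regex end-of-line anchor.
anchored_match() {
    grep ' /usr/sap/hostctrl/exe/saposcol$'
}

# Hypothetical wrapper: print commands by default, execute when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

real='gtpadm 13893680 1 0 Dec 21 - 8:48 /usr/sap/hostctrl/exe/saposcol'
stray='root 4242 1 0 19:36:58 pts/1 0:00 vi /tmp/saposcol.log'   # invented line

printf '%s\n' "$real"  | anchored_match >/dev/null && echo "real process matched"
printf '%s\n' "$stray" | anchored_match >/dev/null || echo "stray process ignored"

run hagrp -freeze sapgtsprd      # stop VCS acting on the group while editing
# ... edit /opt/VRTSvcs/bin/Saposcol/monitor and confirm it exits 110 on the active node ...
run hagrp -unfreeze sapgtsprd    # hand control back to VCS
```

The freeze shown here is the temporary (non-persistent) form, which is normally enough for a short edit.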
A better solution would be to set MonitorProcesses to "/usr/sap/hostctrl/exe/saposcol", but you would also have to change the "User" attribute to gtpadm, as the Application agent matches the user as well. This would mean you would probably need to change online, offline and clean in /opt/VRTSvcs/bin/Saposcol so they can be run by user gtpadm; as they currently run as root, they probably do an su to gtpadm.
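The attribute changes could look roughly like the following. This is only a sketch: the `run` wrapper and `DRY_RUN` variable are assumptions added so the commands are printed rather than executed by default, and it does not cover what to do with the existing MonitorProgram setting or the changes to the online, offline and clean scripts.

```shell
#!/bin/sh
# Hypothetical wrapper: print commands by default, execute when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

run haconf -makerw                       # open the cluster configuration for writing
run hares -modify App_saposcol MonitorProcesses /usr/sap/hostctrl/exe/saposcol
run hares -modify App_saposcol User gtpadm
run haconf -dump -makero                 # save and close the configuration
```

It would be sensible to try this with the group frozen, and to check with hares -display that both attributes show the new values before unfreezing.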
Mike