Service group concurrency violation

allaboutunix
Level 6

Hi Team,

We are getting concurrency violation alerts on a two-node cluster (mapibm625 and mapibm626).

 

The logs are:

 

2014/12/26 19:37:03 VCS INFO V-16-1-10299 Resource App_saposcol (Owner: Unspecified, Group: sapgtsprd) is online on mapibm625 (Not initiated by VCS)

2014/12/26 19:37:03 VCS ERROR V-16-1-10214 Concurrency Violation:CurrentCount increased above 1 for failover group sapgtsprd

2014/12/26 19:37:03 VCS NOTICE V-16-1-10233 Clearing Restart attribute for group sapgtsprd on all nodes

2014/12/26 19:37:04 VCS WARNING V-16-6-15034 (mapibm625) violation:Offlining group sapgtsprd on system mapibm625

2014/12/26 19:37:04 VCS INFO V-16-1-50135 User root fired command: hagrp -offline sapgtsprd  mapibm625  from localhost

2014/12/26 19:37:04 VCS NOTICE V-16-1-10167 Initiating manual offline of group sapgtsprd on system mapibm625

2014/12/26 19:37:04 VCS NOTICE V-16-1-10300 Initiating Offline of Resource App_saposcol (Owner: Unspecified, Group: sapgtsprd) on System mapibm625

2014/12/26 19:37:04 VCS INFO V-16-6-15002 (mapibm625) hatrigger:hatrigger executed /opt/VRTSvcs/bin/internal_triggers/violation mapibm625 sapgtsprd   successfully

2014/12/26 19:37:04 VCS INFO V-16-10011-306 (mapibm625) Application:App_saposcol:offline:Execution of Stop Program (/opt/VRTSvcs/bin/Saposcol/offline) returned (0).

2014/12/26 19:37:05 VCS INFO V-16-2-13716 (mapibm625) Resource(App_saposcol): Output of the completed operation (offline) ==============================================

2014/12/26 19:37:06 VCS INFO V-16-1-10305 Resource App_saposcol (Owner: Unspecified, Group: sapgtsprd) is offline on mapibm625 (VCS initiated)

2014/12/26 19:37:06 VCS NOTICE V-16-1-10446 Group sapgtsprd is offline on system mapibm625

 

========================================================================================

 

I asked the application team whether they were working on the servers, because the resource in question (App_saposcol) belongs to SAP.

However, the application team replied that they were not working on it, and suggested that App_saposcol might be online on both servers, which would cause the issue.

I then checked the status of the resource on both servers, and it shows:

 

[root@mapibm626]:  # hares -state
#Resource           Attribute             System     Value
App_saposcol        State                 mapibm625  OFFLINE
App_saposcol        State                 mapibm626  ONLINE

 

[root@mapibm625]:  # hares -state
#Resource           Attribute             System     Value
App_saposcol        State                 mapibm625  OFFLINE
App_saposcol        State                 mapibm626  ONLINE

 

I also checked the current logs on the server, but found only:

 

2014/12/27 13:03:42 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/27 17:03:43 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/27 21:03:44 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/28 01:03:45 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/28 05:03:46 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/28 09:03:47 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/28 10:56:14 VCS INFO V-16-1-50086 CPU usage on mapibm625 is 61%
2014/12/28 11:26:14 VCS INFO V-16-1-50086 CPU usage on mapibm625 is 61%
2014/12/28 13:03:48 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/28 14:26:14 VCS INFO V-16-1-50086 CPU usage on mapibm625 is 60%
2014/12/28 17:03:49 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/28 21:03:50 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/29 01:03:51 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/29 05:03:52 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/29 09:03:53 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/29 13:03:55 VCS INFO V-16-1-53504 VCS Engine Alive message!!

 

 

==========================================================================

 

Please advise: what could be the possible causes of this, and how can we avoid it in future?

 

Thanks,

Allaboutunix


mikebounds
Level 6
Partner Accredited

The message 

2014/12/26 19:37:03 VCS INFO V-16-1-10299 Resource App_saposcol (Owner: Unspecified, Group: sapgtsprd) is online on mapibm625 (Not initiated by VCS)

means, as it says, that the resource App_saposcol came online outside of VCS.

This is not allowed (Concurrency Violation), so VCS brought it offline:

2014/12/26 19:37:04 VCS NOTICE V-16-1-10300 Initiating Offline of Resource App_saposcol (Owner: Unspecified, Group: sapgtsprd) on System mapibm625

So you wouldn't have seen it online on mapibm625 when you ran your commands, because VCS started bringing it down within 1 second of seeing it online.

So your issue is either:

  1. Someone brought the app online outside of VCS control within the 5 minutes prior to 19:37:03.
  2. The App_saposcol resource is incorrectly reporting that it is online. This is unlikely if you are using the SAP agent. If you are using the Application agent with MonitorProcesses, check that the process is unique (for example, that it is not the same as a test instance the app team may be starting). If you are using MonitorProgram, check the script you are using.
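To narrow down the first possibility, it can help to pull only the relevant messages out of the engine log. The following is a minimal sketch, not a Symantec-provided tool: the function name find_violations is hypothetical, the log path in the usage comment is the usual default but should be verified on your system, and the two message IDs are the ones quoted in the logs above.

```shell
#!/bin/sh
# Sketch: filter a VCS engine log (read from stdin) down to the
# "online, not initiated by VCS" and "Concurrency Violation" messages
# for one service group. Message IDs are taken from the logs in this thread.
find_violations() {
    group="$1"
    grep -E 'V-16-1-10299|V-16-1-10214' | grep -F "$group"
}

# Typical use (the log path is an assumption; adjust for your install):
#   find_violations sapgtsprd < /var/VRTSvcs/log/engine_A.log
```

The timestamps on the matching lines give the window in which to ask who was logged in and what they were running on that node.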

Mike

allaboutunix
Level 6

When I ran hares -display, it showed:

 

App_saposcol        ConfidenceLevel       mapibm625  0
App_saposcol        ConfidenceLevel       mapibm626  100
App_saposcol        ConfidenceMsg         mapibm625
App_saposcol        ConfidenceMsg         mapibm626
App_saposcol        Flags                 mapibm625
App_saposcol        Flags                 mapibm626
App_saposcol        IState                mapibm625  not waiting
App_saposcol        IState                mapibm626  not waiting
App_saposcol        MonitorMethod         mapibm625  IMF
App_saposcol        MonitorMethod         mapibm626  Traditional
App_saposcol        Probed                mapibm625  1
App_saposcol        Probed                mapibm626  1
App_saposcol        Start                 mapibm625  0
App_saposcol        Start                 mapibm626  1
App_saposcol        State                 mapibm625  OFFLINE
App_saposcol        State                 mapibm626  ONLINE
App_saposcol        CleanProgram          global     /opt/VRTSvcs/bin/Saposcol/clean
App_saposcol        ComputeStats          global     0
App_saposcol        ContainerInfo         global     Type               Name            Enabled
App_saposcol        EnvFile               global
App_saposcol        MonitorProcesses      global
App_saposcol        MonitorProgram        global     /opt/VRTSvcs/bin/Saposcol/monitor
App_saposcol        PidFiles              global
App_saposcol        ResContainerInfo      global     Type               Name            Enabled
App_saposcol        ResourceInfo          global     State      Stale   Msg             TS
App_saposcol        ResourceRecipients    global
App_saposcol        StartProgram          global     /opt/VRTSvcs/bin/Saposcol/online
App_saposcol        StopProgram           global     /opt/VRTSvcs/bin/Saposcol/offline
App_saposcol        TriggerPath           global
App_saposcol        TriggerResRestart     global     0
App_saposcol        TriggerResStateChange global     0
App_saposcol        TriggersEnabled       global
App_saposcol        UseSUDash             global     0
App_saposcol        User                  global     root
App_saposcol        MonitorTimeStats      mapibm625  Avg        0       TS
App_saposcol        MonitorTimeStats      mapibm626  Avg        0       TS

=========================================================

 

I believe it has a MonitorProgram configured. What do you suggest as our next step to resolve this?

mikebounds
Level 6
Partner Accredited

You need to debug /opt/VRTSvcs/bin/Saposcol/monitor (this is a custom script - not one provided by Symantec) to find out why it is reporting your application is online when you believe it is not online.

If your application is SAP, then you should look to see if you can use the Symantec SAP agents you can download from https://sort.symantec.com/agents

Mike

allaboutunix
Level 6

It looks like this:

SU=/sbin/su
PS=/usr/bin/ps
SH=/usr/bin/sh
GREP=/usr/bin/grep
PGREP="/usr/bin/ps -ef | ${GREP}"
ECHO=/usr/bin/echo
SLEEP=/usr/bin/sleep
HALOG="/opt/VRTSvcs/bin/halog -add"
#

#
# This script is used by the application agent. It is done this way because
# the "MonitorProcesses" function looks for the exact string used to start it,
# unlike pgrep -f which can use an exact match of a partial string.
# It is preferred that VCS start it with the full path. If  it is started
# manually without the full path, VCS will not detect that it is running.
#

CMD=`/usr/bin/ps -ef | ${GREP} "saposcol" | ${GREP} -v grep | ${GREP} -v App_saposcol`
if [ "${CMD}" ]
then
        exit 110                # online (should be 110)
fi
exit 100                        # offline (should be 100)

=======================================================

I can't tell what the 110 and 100 values mean.

What does this command do?

CMD=`/usr/bin/ps -ef | ${GREP} "saposcol" | ${GREP} -v grep | ${GREP} -v App_saposcol`

Is it related to the issue?

mikebounds
Level 6
Partner Accredited

The custom script you have shown is not well written, and that is your issue. It matches on "saposcol", which is not specific enough: for instance, if someone ran "vi saposcol" (or opened ANY file with saposcol somewhere in the filename), the script would match that process and report to VCS that the resource was online.
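The over-match can be reproduced without touching the cluster. The sketch below runs the script's own grep pipeline against a single made-up ps line; the sample line and the function name loose_match are hypothetical, and the 110/100 values are the exit codes the posted script uses for online and offline.

```shell
#!/bin/sh
# Sketch: the monitor script's loose pipeline, applied to ps-style lines
# read from stdin instead of live `ps -ef` output.
loose_match() {
    grep "saposcol" | grep -v grep | grep -v App_saposcol
}

# Hypothetical ps -ef line: an editor session on a node where the real
# saposcol process is NOT running.
sample='    root  1234567  7654321   0 19:36:50  pts/0  0:00 vi saposcol.log'

if [ -n "$(printf '%s\n' "$sample" | loose_match)" ]; then
    echo "monitor would exit 110 (reported online)"
else
    echo "monitor would exit 100 (reported offline)"
fi
```

Because the editor's command line contains the string saposcol, the pipeline returns a non-empty result and the script takes the "online" branch, which is how a passive node can suddenly report the resource as running.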

You should use MonitorProcesses if you can:

Run ps -ef | grep saposcol

Copy the whole command from the CMD column of the output for the saposcol process (not the App_saposcol process, if one exists) and use it for MonitorProcesses. If you are unsure how to do this, please post the output of "ps -ef | grep saposcol" from the LIVE system.

Mike

allaboutunix
Level 6

On the second node, mapibm626, I ran:

[root@mapibm626]:  # ps -ef |grep saposcol
  gtpadm 13893680        1   0   Dec 21      -  8:48 /usr/sap/hostctrl/exe/saposcol

saposcol shows as running on mapibm626; the same command gives no output on mapibm625.

 

mikebounds
Level 6
Partner Accredited

For a quick fix change CMD line to:

CMD=`/usr/bin/ps -ef | ${GREP} " /usr/sap/hostctrl/exe/saposcol$"`

This is a more specific match, so the inactive node should no longer incorrectly match other processes that contain the string saposcol. Before you make this change, I would freeze the group containing the resource in case you make a mistake when editing the script. Once you have confirmed that the monitor succeeds and the resource stays online, you can unfreeze the group.
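The anchored pattern can be tried offline before the script is edited. This is a minimal sketch: the first sample line is the real process posted from mapibm626, the other two are hypothetical look-alikes (including a made-up file name), and the function name anchored_match is an assumption for illustration.

```shell
#!/bin/sh
# Sketch: the anchored pattern from the quick fix, applied to ps-style
# lines read from stdin rather than live `ps -ef` output.
anchored_match() {
    grep ' /usr/sap/hostctrl/exe/saposcol$'
}

# Real process line as posted from mapibm626.
real='  gtpadm 13893680        1   0   Dec 21      -  8:48 /usr/sap/hostctrl/exe/saposcol'
# Hypothetical look-alikes that the original loose pattern would match.
editor='    root  1234567  7654321   0 19:36:50  pts/0  0:00 vi saposcol.log'
tailer='    root  2345678  7654321   0 19:36:55  pts/1  0:00 tail -f /usr/sap/hostctrl/exe/saposcol.trc'

# Only the real process line should be printed.
printf '%s\n' "$real" "$editor" "$tailer" | anchored_match
```

For the freeze step, the usual commands are hagrp -freeze sapgtsprd before editing and hagrp -unfreeze sapgtsprd afterwards; running /opt/VRTSvcs/bin/Saposcol/monitor by hand on each node and checking $? (110 on the active node, 100 on the passive one) is one way to confirm the edit before unfreezing.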

A better solution would be to set MonitorProcesses to "/usr/sap/hostctrl/exe/saposcol". However, you would also have to change the "User" attribute to gtpadm, because the Application agent matches the user as well. That in turn means you would probably need to change the online, offline and clean scripts in /opt/VRTSvcs/bin/Saposcol so that they can be run by user gtpadm: they currently run as root and probably do an su to gtpadm.
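As a rough outline of what that attribute change might look like, the sketch below only prints the commands rather than running them. The function name is hypothetical, and the exact sequence is an assumption to check against the documentation for your VCS version. It also leaves out the changes to the online, offline and clean scripts, and the question of whether to clear MonitorProgram.

```shell
#!/bin/sh
# Sketch only: prints a candidate command sequence; it does not run anything.
# Resource, process path and user are the ones discussed in this thread.
RES=App_saposcol
PROC=/usr/sap/hostctrl/exe/saposcol
OWNER=gtpadm

plan_monitorprocesses_change() {
    echo "haconf -makerw"                              # open the config for writing
    echo "hares -modify $RES User $OWNER"              # agent matches the process owner
    echo "hares -modify $RES MonitorProcesses $PROC"   # exact command line to match
    echo "haconf -dump -makero"                        # save and close the config
}

plan_monitorprocesses_change
```

Freezing the group first, as suggested for the quick fix, applies here as well.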

Mike
