allaboutunix
10 years ago

Service group concurrency violation

Hi Team,

We are getting concurrency violation alerts. The cluster has two servers, mapibm625 and mapibm626.

 

The logs are:

 

2014/12/26 19:37:03 VCS INFO V-16-1-10299 Resource App_saposcol (Owner: Unspecified, Group: sapgtsprd) is online on mapibm625 (Not initiated by VCS)

2014/12/26 19:37:03 VCS ERROR V-16-1-10214 Concurrency Violation:CurrentCount increased above 1 for failover group sapgtsprd

2014/12/26 19:37:03 VCS NOTICE V-16-1-10233 Clearing Restart attribute for group sapgtsprd on all nodes

2014/12/26 19:37:04 VCS WARNING V-16-6-15034 (mapibm625) violation:Offlining group sapgtsprd on system mapibm625

2014/12/26 19:37:04 VCS INFO V-16-1-50135 User root fired command: hagrp -offline sapgtsprd  mapibm625  from localhost

2014/12/26 19:37:04 VCS NOTICE V-16-1-10167 Initiating manual offline of group sapgtsprd on system mapibm625

2014/12/26 19:37:04 VCS NOTICE V-16-1-10300 Initiating Offline of Resource App_saposcol (Owner: Unspecified, Group: sapgtsprd) on System mapibm625

2014/12/26 19:37:04 VCS INFO V-16-6-15002 (mapibm625) hatrigger:hatrigger executed /opt/VRTSvcs/bin/internal_triggers/violation mapibm625 sapgtsprd   successfully

2014/12/26 19:37:04 VCS INFO V-16-10011-306 (mapibm625) Application:App_saposcol:offline:Execution of Stop Program (/opt/VRTSvcs/bin/Saposcol/offline) returned (0).

2014/12/26 19:37:05 VCS INFO V-16-2-13716 (mapibm625) Resource(App_saposcol): Output of the completed operation (offline) ==============================================

2014/12/26 19:37:06 VCS INFO V-16-1-10305 Resource App_saposcol (Owner: Unspecified, Group: sapgtsprd) is offline on mapibm625 (VCS initiated)

2014/12/26 19:37:06 VCS NOTICE V-16-1-10446 Group sapgtsprd is offline on system mapibm625

 

========================================================================================

 

I asked the application team whether they were working on the servers, because the resource (App_saposcol) belongs to SAP.

However, the application team replied that they were not working on it, and suggested that App_saposcol might be online on both servers, which would cause the issue.

I then checked the status of the resources on both servers, and it shows:

 

[root@mapibm626]:  # hares -state
#Resource           Attribute             System     Value
App_saposcol        State                 mapibm625  OFFLINE
App_saposcol        State                 mapibm626  ONLINE

 

[root@mapibm625]:  # hares -state
#Resource           Attribute             System     Value
App_saposcol        State                 mapibm625  OFFLINE
App_saposcol        State                 mapibm626  ONLINE

 

I also checked the current logs on the server, but found only:

 

2014/12/27 13:03:42 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/27 17:03:43 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/27 21:03:44 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/28 01:03:45 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/28 05:03:46 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/28 09:03:47 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/28 10:56:14 VCS INFO V-16-1-50086 CPU usage on mapibm625 is 61%
2014/12/28 11:26:14 VCS INFO V-16-1-50086 CPU usage on mapibm625 is 61%
2014/12/28 13:03:48 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/28 14:26:14 VCS INFO V-16-1-50086 CPU usage on mapibm625 is 60%
2014/12/28 17:03:49 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/28 21:03:50 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/29 01:03:51 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/29 05:03:52 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/29 09:03:53 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/29 13:03:55 VCS INFO V-16-1-53504 VCS Engine Alive message!!

 

 

==========================================================================

 

Please advise what the possible reasons for this could be, and how to avoid it in future.

 

Thanks,

Allaboutunix

  • The message 

    2014/12/26 19:37:03 VCS INFO V-16-1-10299 Resource App_saposcol (Owner: Unspecified, Group: sapgtsprd) is online on mapibm625 (Not initiated by VCS)

    means, as it says, that the resource App_saposcol came online outside of VCS.

    This is not allowed for a failover group (Concurrency Violation), so VCS brought it offline:

    2014/12/26 19:37:04 VCS NOTICE V-16-1-10300 Initiating Offline of Resource App_saposcol (Owner: Unspecified, Group: sapgtsprd) on System mapibm625

    So you would not have seen it online on mapibm625 when you ran your commands, as VCS started taking it offline within 1 second of seeing it online.

    So your issue is either:

    1. Someone brought the app online outside of VCS control within the 5 mins prior to 19:37:03 (5 minutes being the default monitor interval for an offline resource).
    2. The App_saposcol resource is incorrectly reporting that it is online. This is unlikely if you are using the SAP agent. If you are using the Application agent with MonitorProcesses, you should check that the process is unique (for example, that it is not the same as a test instance the app team may be starting); if you are using MonitorProgram, you should check the script you are using.

    Mike
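
To pin down when a resource was seen online outside VCS control, the engine log can be filtered for message V-16-1-10299. The following is only a minimal sketch: the sample input is copied from the log lines posted in this thread, the function names are made up for illustration, and the field positions assume exactly the log layout shown above.

```shell
#!/bin/sh
# Sample lines copied from the engine log quoted in this thread.
# On a live node the input would be the engine log file itself
# (commonly /var/VRTSvcs/log/engine_A.log; treat that path as an assumption).
sample_log() {
cat <<'EOF'
2014/12/26 19:37:03 VCS INFO V-16-1-10299 Resource App_saposcol (Owner: Unspecified, Group: sapgtsprd) is online on mapibm625 (Not initiated by VCS)
2014/12/26 19:37:03 VCS ERROR V-16-1-10214 Concurrency Violation:CurrentCount increased above 1 for failover group sapgtsprd
2014/12/26 19:37:06 VCS INFO V-16-1-10305 Resource App_saposcol (Owner: Unspecified, Group: sapgtsprd) is offline on mapibm625 (VCS initiated)
EOF
}

# Print "date time resource node" for each online event VCS did not initiate.
# Field 7 is the resource name and field 15 the node in the layout above.
outside_online_events() {
    grep 'V-16-1-10299' | grep 'Not initiated by VCS' |
        awk '{ print $1, $2, $7, $15 }'
}

sample_log | outside_online_events
```

Run against the sample, this prints one line naming App_saposcol on mapibm625 at 19:37:03, which is the moment to compare against what was running on that node.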

  • When I ran hares -display, it showed:

     

    App_saposcol        ConfidenceLevel       mapibm625  0
    App_saposcol        ConfidenceLevel       mapibm626  100
    App_saposcol        ConfidenceMsg         mapibm625
    App_saposcol        ConfidenceMsg         mapibm626
    App_saposcol        Flags                 mapibm625
    App_saposcol        Flags                 mapibm626
    App_saposcol        IState                mapibm625  not waiting
    App_saposcol        IState                mapibm626  not waiting
    App_saposcol        MonitorMethod         mapibm625  IMF
    App_saposcol        MonitorMethod         mapibm626  Traditional
    App_saposcol        Probed                mapibm625  1
    App_saposcol        Probed                mapibm626  1
    App_saposcol        Start                 mapibm625  0
    App_saposcol        Start                 mapibm626  1
    App_saposcol        State                 mapibm625  OFFLINE
    App_saposcol        State                 mapibm626  ONLINE
    App_saposcol        CleanProgram          global     /opt/VRTSvcs/bin/Saposcol/clean
    App_saposcol        ComputeStats          global     0
    App_saposcol        ContainerInfo         global     Type               Name            Enabled
    App_saposcol        EnvFile               global
    App_saposcol        MonitorProcesses      global
    App_saposcol        MonitorProgram        global     /opt/VRTSvcs/bin/Saposcol/monitor
    App_saposcol        PidFiles              global
    App_saposcol        ResContainerInfo      global     Type               Name            Enabled
    App_saposcol        ResourceInfo          global     State      Stale   Msg             TS
    App_saposcol        ResourceRecipients    global
    App_saposcol        StartProgram          global     /opt/VRTSvcs/bin/Saposcol/online
    App_saposcol        StopProgram           global     /opt/VRTSvcs/bin/Saposcol/offline
    App_saposcol        TriggerPath           global
    App_saposcol        TriggerResRestart     global     0
    App_saposcol        TriggerResStateChange global     0
    App_saposcol        TriggersEnabled       global
    App_saposcol        UseSUDash             global     0
    App_saposcol        User                  global     root
    App_saposcol        MonitorTimeStats      mapibm625  Avg        0       TS
    App_saposcol        MonitorTimeStats      mapibm626  Avg        0       TS

    =========================================================

     

    I believe it has a MonitorProgram configured. What do you suggest as our next step to resolve this?

  • You need to debug /opt/VRTSvcs/bin/Saposcol/monitor (this is a custom script, not one provided by Symantec) to find out why it is reporting your application as online when you believe it is not online.

    If your application is SAP, then you should see whether you can use the Symantec SAP agents, which you can download from https://sort.symantec.com/agents

    Mike
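
One way to start that debugging is to run the monitor script by hand on the node that should be offline and look at its exit status. The sketch below shows the idea with a hypothetical stand-in script, since the real one only exists on the cluster nodes; `describe_status`, `check_monitor` and `make_stub` are illustrative names, not VCS tools.

```shell
#!/bin/sh
# Map a MonitorProgram exit status to the state it signals to VCS.
# Only the two values used by the posted script are interpreted here.
describe_status() {
    case "$1" in
        100) echo "offline" ;;
        110) echo "online" ;;
        *)   echo "other ($1)" ;;
    esac
}

# Run a monitor script and report what its exit status means.
# It is run through sh so the sketch does not depend on the execute bit.
check_monitor() {
    rc=0
    sh "$1" >/dev/null 2>&1 || rc=$?
    describe_status "$rc"
}

# Hypothetical stand-in for the real monitor: a script that exits with
# the given status. Prints the path of the file it creates.
make_stub() {
    stub="${TMPDIR:-/tmp}/monitor_stub.$$.$1"
    printf '#!/bin/sh\nexit %s\n' "$1" > "$stub"
    echo "$stub"
}

stub=`make_stub 110`
check_monitor "$stub"
rm -f "$stub"
```

On a cluster node the call would be `check_monitor /opt/VRTSvcs/bin/Saposcol/monitor`. Running the script under `sh -x` as well typically shows which ps lines ended up in its CMD variable.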

  • It looks like this:

    SU=/sbin/su
    PS=/usr/bin/ps
    SH=/usr/bin/sh
    GREP=/usr/bin/grep
    PGREP="/usr/bin/ps -ef | ${GREP}"
    ECHO=/usr/bin/echo
    SLEEP=/usr/bin/sleep
    HALOG="/opt/VRTSvcs/bin/halog -add"
    #

    #
    # This script is used by the application agent. It is done this way because
    # the "MonitorProcesses" function looks for the exact string used to start it,
    # unlike pgrep -f which can use an exact match of a partial string.
    # It is preferred that VCS start it with the full path. If  it is started
    # manually without the full path, VCS will not detect that it is running.
    #

    CMD=`/usr/bin/ps -ef | ${GREP} "saposcol" | ${GREP} -v grep | ${GREP} -v App_saposcol`
    if [ "${CMD}" ]
    then
            exit 110                # online (should be 110)
    fi
    exit 100                        # offline (should be 100)

    =======================================================

    I am unable to tell what the 110 and 100 values mean.

     

    What does this command do?

    CMD=`/usr/bin/ps -ef | ${GREP} "saposcol" | ${GREP} -v grep | ${GREP} -v App_saposcol`

    Is it related to the issue?

  • The custom script you have shown is not written well, so this is your issue. It matches the string saposcol, which is not specific enough: for instance, if someone ran "vi saposcol" (or opened ANY file with saposcol somewhere in the filename), the script would match that process and report to VCS that the resource was online.

    You should use MonitorProcesses if you can:

    Run ps -ef | grep saposcol

    Copy the whole command from the CMD column of the ps output for the saposcol process (not the App_saposcol process, if this exists) and use this for MonitorProcesses. If you are unsure how to do this, please post the output of "ps -ef | grep saposcol" from the LIVE system.

    Mike
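
On the 110 and 100 values: a MonitorProgram tells the Application agent the resource state through its exit status, where 100 means offline and 110 means online with full confidence. The sketch below replays the filter chain from the posted script against a hypothetical ps listing (the process lines are invented for illustration) to show how an unrelated process is enough to produce 110 on the idle node. It prints the code instead of exiting with it so the result is easy to inspect.

```shell
#!/bin/sh
GREP=grep

# Hypothetical "ps -ef" output on the node where saposcol is NOT running:
# someone is only viewing a file whose name contains "saposcol".
fake_ps_idle_node() {
cat <<'EOF'
    root  4128790  5439616   0 19:36:58  pts/1  0:00 vi /tmp/saposcol.log
    root  3408120        1   0   Dec 21      -  0:12 /usr/sbin/syslogd
EOF
}

# Same filter chain as the CMD line in the posted monitor script,
# reading from stdin instead of calling ps directly.
loose_match() {
    ${GREP} "saposcol" | ${GREP} -v grep | ${GREP} -v App_saposcol
}

# Mirror the script's decision, printing the code it would exit with.
# "|| true" only keeps a no-match grep status from aborting under "set -e".
monitor_code() {
    CMD=`loose_match` || true
    if [ "${CMD}" ]
    then
        echo 110        # the real script would report online
    else
        echo 100        # the real script would report offline
    fi
}

fake_ps_idle_node | monitor_code
```

The vi line survives all three greps, so the sketch prints 110 even though no saposcol daemon appears in the listing.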

  • On the second node, mapibm626, I ran:

    [root@mapibm626]:  # ps -ef |grep saposcol
      gtpadm 13893680        1   0   Dec 21      -  8:48 /usr/sap/hostctrl/exe/saposcol

    saposcol shows as running on mapibm626, and the same command shows no output on mapibm625.

     

  • For a quick fix, change the CMD line to:

    CMD=`/usr/bin/ps -ef | ${GREP} " /usr/sap/hostctrl/exe/saposcol$"`

    This is a more specific match, so it should not cause the inactive node to incorrectly match other processes that contain the string saposcol. Before you make this change, I would freeze the group containing the resource in case you make a mistake when you edit the script. Once you have confirmed that the monitor is successful and the resource stays online, you can unfreeze the group.

    A better solution would be to set MonitorProcesses to "/usr/sap/hostctrl/exe/saposcol". You would also have to change the "User" attribute to gtpadm, as the Application agent matches the user as well. This would mean you would probably need to change online, offline and clean in /opt/VRTSvcs/bin/Saposcol so they can be run by user gtpadm; as they currently run as root, they probably do an su to gtpadm.

    Mike
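
The effect of the anchored pattern in the quick fix can be checked offline before touching the real script. In this sketch the first sample line is the process line posted from mapibm626; the other two are hypothetical look-alikes, one of them imitating the monitor's own grep. The function names are illustrative, and the code is printed instead of used as an exit status.

```shell
#!/bin/sh
GREP=grep

# First line: real process line posted from mapibm626.
# Other lines: hypothetical look-alikes that must not count as saposcol.
sample_ps() {
cat <<'EOF'
  gtpadm 13893680        1   0   Dec 21      -  8:48 /usr/sap/hostctrl/exe/saposcol
    root  4128790  5439616   0 19:36:58  pts/1  0:00 vi /tmp/saposcol.log
    root  6357412  7012534   0 19:37:02      -  0:00 /usr/bin/grep  /usr/sap/hostctrl/exe/saposcol$
EOF
}

# Anchored pattern from the quick fix: leading space, full path, end of line.
strict_match() {
    ${GREP} " /usr/sap/hostctrl/exe/saposcol$"
}

# Print the code the monitor script would exit with (110 online, 100 offline).
monitor_code() {
    if strict_match >/dev/null
    then
        echo 110
    else
        echo 100
    fi
}

sample_ps | strict_match                        # only the gtpadm line
sample_ps | monitor_code                        # node running the daemon
sample_ps | ${GREP} -v gtpadm | monitor_code    # node with look-alikes only
```

Because the pattern ends in `$`, the grep process itself does not match (its own command line ends in a literal `$` character), which is why the quick fix no longer needs the `grep -v grep` stage.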