Service group concurrency violation
Hi Team,
We are getting concurrency violation alerts. The cluster has two servers: mapibm625 and mapibm626.
The logs are:
2014/12/26 19:37:03 VCS INFO V-16-1-10299 Resource App_saposcol (Owner: Unspecified, Group: sapgtsprd) is online on mapibm625 (Not initiated by VCS)
2014/12/26 19:37:03 VCS ERROR V-16-1-10214 Concurrency Violation:CurrentCount increased above 1 for failover group sapgtsprd
2014/12/26 19:37:03 VCS NOTICE V-16-1-10233 Clearing Restart attribute for group sapgtsprd on all nodes
2014/12/26 19:37:04 VCS WARNING V-16-6-15034 (mapibm625) violation:Offlining group sapgtsprd on system mapibm625
2014/12/26 19:37:04 VCS INFO V-16-1-50135 User root fired command: hagrp -offline sapgtsprd mapibm625 from localhost
2014/12/26 19:37:04 VCS NOTICE V-16-1-10167 Initiating manual offline of group sapgtsprd on system mapibm625
2014/12/26 19:37:04 VCS NOTICE V-16-1-10300 Initiating Offline of Resource App_saposcol (Owner: Unspecified, Group: sapgtsprd) on System mapibm625
2014/12/26 19:37:04 VCS INFO V-16-6-15002 (mapibm625) hatrigger:hatrigger executed /opt/VRTSvcs/bin/internal_triggers/violation mapibm625 sapgtsprd successfully
2014/12/26 19:37:04 VCS INFO V-16-10011-306 (mapibm625) Application:App_saposcol:offline:Execution of Stop Program (/opt/VRTSvcs/bin/Saposcol/offline) returned (0).
2014/12/26 19:37:05 VCS INFO V-16-2-13716 (mapibm625) Resource(App_saposcol): Output of the completed operation (offline) ==============================================
2014/12/26 19:37:06 VCS INFO V-16-1-10305 Resource App_saposcol (Owner: Unspecified, Group: sapgtsprd) is offline on mapibm625 (VCS initiated)
2014/12/26 19:37:06 VCS NOTICE V-16-1-10446 Group sapgtsprd is offline on system mapibm625
========================================================================================
I asked the application team whether they were working on the servers, because the resource belongs to SAP (resource App_saposcol).
However, the application team replied that they were not working on it, and suggested that App_saposcol might be online on both servers, which would cause the issue.
I then checked the state of the resource on both servers:
[root@mapibm626]: # hares -state
#Resource      Attribute   System      Value
App_saposcol   State       mapibm625   OFFLINE
App_saposcol   State       mapibm626   ONLINE
[root@mapibm625]: # hares -state
#Resource      Attribute   System      Value
App_saposcol   State       mapibm625   OFFLINE
App_saposcol   State       mapibm626   ONLINE
I also checked the current logs on the server, but found only:
2014/12/27 13:03:42 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/27 17:03:43 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/27 21:03:44 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/28 01:03:45 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/28 05:03:46 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/28 09:03:47 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/28 10:56:14 VCS INFO V-16-1-50086 CPU usage on mapibm625 is 61%
2014/12/28 11:26:14 VCS INFO V-16-1-50086 CPU usage on mapibm625 is 61%
2014/12/28 13:03:48 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/28 14:26:14 VCS INFO V-16-1-50086 CPU usage on mapibm625 is 60%
2014/12/28 17:03:49 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/28 21:03:50 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/29 01:03:51 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/29 05:03:52 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/29 09:03:53 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/12/29 13:03:55 VCS INFO V-16-1-53504 VCS Engine Alive message!!
==========================================================================
Please advise what the possible reasons for this could be, and how to avoid it in future.
Thanks,
Allaboutunix
For a quick fix, change the CMD line to:
CMD=`/usr/bin/ps -ef | ${GREP} " /usr/sap/hostctrl/exe/saposcol$"`
This is a more specific match, so it should not cause the inactive node to incorrectly match other processes that contain the string saposcol. Before you make this change, I would freeze the group containing the resource in case you make a mistake when editing the script. Once you have confirmed that the monitor succeeds and the resource stays online, you can unfreeze the group.
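To see why the anchored pattern is safer, here is a small sketch that runs both patterns over some made-up ps -ef lines. The sample lines and the loose pattern are assumptions for illustration only; the actual CMD line in your script is not shown in this thread and may differ.

```shell
#!/bin/sh
# Hypothetical ps -ef lines, invented for illustration only.
sample_ps() {
cat <<'EOF'
root 1234 1 0 19:30 ? 00:00:01 /usr/sap/hostctrl/exe/saposcol
root 2345 2000 0 19:36 pts/0 00:00:00 vi /tmp/saposcol.log
root 3456 2000 0 19:36 pts/0 00:00:00 grep saposcol
EOF
}

# Assumed loose match: any line that contains the string saposcol.
loose_count() { sample_ps | grep -c "saposcol"; }

# Anchored match from the quick fix: the line must end with the full path,
# preceded by a space.
strict_count() { sample_ps | grep -c " /usr/sap/hostctrl/exe/saposcol$"; }

echo "loose:  $(loose_count)"    # prints "loose:  3"
echo "strict: $(strict_count)"   # prints "strict: 1"
```

With the loose pattern, an editor session or a grep for saposcol on the inactive node is enough for the monitor to report the resource online there, which would trigger the concurrency violation. Note that the anchored pattern only matches when the process is started with no arguments after the path.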
A better solution would be to set MonitorProcesses to "/usr/sap/hostctrl/exe/saposcol". However, you would also have to change the "User" attribute to gtpadm, because the Application agent matches the user as well. That would probably mean changing online, offline and clean in /opt/VRTSvcs/bin/Saposcol so that they can be run by user gtpadm; as they currently run as root, they probably do an su to gtpadm.
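A rough sketch of the attribute change, using the group and resource names from your logs. It only prints the commands unless you set APPLY=1, and it does not touch the online, offline and clean scripts or any MonitorProgram setting, which you would still need to review yourself. The run and plan helpers are just names I made up for the sketch.

```shell
#!/bin/sh
# Dry run by default: prints each command. Set APPLY=1 to execute them.
GROUP=sapgtsprd
RES=App_saposcol
PROC=/usr/sap/hostctrl/exe/saposcol
RUN_USER=gtpadm

# Hypothetical helper: echo the command, or run it when APPLY=1.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

plan() {
    run hagrp -freeze "$GROUP"          # stop VCS acting on the group while editing
    run haconf -makerw                  # open the configuration for writing
    run hares -modify "$RES" MonitorProcesses "$PROC"
    run hares -modify "$RES" User "$RUN_USER"
    run haconf -dump -makero            # save and close the configuration
    run hares -state "$RES"             # check the resource state on each system
    # Unfreezing is left as a manual step on purpose.
    echo "# once the resource stays ONLINE: hagrp -unfreeze $GROUP"
}

plan
```

Run it once without APPLY to review the commands, and unfreeze the group by hand only after the monitor has reported the resource correctly for a few cycles.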
Mike