12-23-2014 06:00 AM
Hi,
We are running Veritas Cluster Server (VCS) 6.1 on Red Hat Enterprise Linux Server release 6.5 (Santiago).
We configured a resource to monitor an application, "pricemaker". When we set the MonitorProcesses parameter to the process below, hacf shows no errors. However, after running haconf -dump -makero, the parameter is reset as shown below:
- main.cf before dump :
================
MonitorProcesses={ "java -server -d64 -verbose:gc -XX:+UseCompressedOops -XX:+PrintGCTimeStamps -XX:+UseSerialGC -ms12g -mx12g -XX:NewSize=1g -XX:MaxNewSize=1g -DlocalMode=true -Djava.library.path=/home/rlx/rlmapp2/RLX/instance1/Product/lib:/home/rlx/rlmapp2/RLX/instance1/Config/lib -Dpricemaker.directory=/home/rlx/rlmapp2/RLX/instance1 -Dpricemaker.stopIfDisabledLookup=true -Ddb.Product= -Ddb.DriverClass= -Ddb.ConnectionString= -Ddb.User= -Ddb.Password= -Ddb.Schema= -classpath /home/rlx/rlmapp2/RLX/instance1/Product/properties:/home/rlx/rlmapp2/RLX/instance1/Product/lib/3rdParty.jar:/home/rlx/rlmapp2/RLX/instance1/Product/lib/pmutil.jar:/home/rlx/rlmapp2/RLX/instance1/Product/lib/pricemaker.jar:/home/rlx/rlmapp2/RLX/instance1/Product/lib/sun/jmxremote_optional.jar:/home/rlx/rlmapp2/RLX/instance1/Product/lib/sun/jmxremote.jar:/home/rlx/rlmapp2/RLX/instance1/Product/lib/sun/jmxri.jar:/home/rlx/rlmapp2/RLX/instance1/Config/lib/rlmexecution-3rdParty.jar:/home/rlx/rlmapp2/RLX/instance1/Config/lib/rlmexecution.jar:/home/rlx/rlmapp2/RLX/instance1/Config/lib/rlmpoints.jar:/home/rlx/rlmapp2/RLX/instance1/Config/lib/rlmpointsinterfaces.jar:/home/rlx/rlmapp2/RLX/data:/home/rlx/rlmapp2/RLX/lib/Utilities/riiutil.jar:/home/rlx/rlmapp2/RLX/lib/Utilities/3rdParty.jar:/home/rlx/rlmapp2/RLX/lib/Repository/embedded.jar:/home/rlx/rlmapp2/RLX/lib/Repository/3rdParty.jar:/home/rlx/rlmapp2/RLX/lib/Catalog/deployment.jar:/home/rlx/rlmapp2/RLX/instance1/Config/lib/rlmexecution.jar:/home/rlx/rlmapp2/RLX/instance1/Config/lib:/home/rlx/rlmapp2/RLX/instance1/Config/lib/support.jar:/home/rlx/rlmapp2/RLX/instance1/Config/lib/plugins.jar:/home/rlx/rlmapp2/RLX/instance1/Config/lib/rp:/home/rlx/rlmapp2/RLX/master/Product/lib/oracle/10.2.0.5/ojdbc14.jar com.rii.pricemaker.engine.RatingEngine" }
** The process name is added as a single line in the main.cf file; that is not the problem.
- main.cf after dump
==============
MonitorProcesses={ "" }
[root@rlm2 ~]# ps -aef |grep rlx
rlx 527 1 1 16:04 pts/1 00:00:41 java -server -d64 -verbose:gc -XX:+UseCompressedOops -XX:+PrintGCTimeStamps -XX:+UseSerialGC -ms12g -mx12g -XX:NewSize=1g -XX:MaxNewSize=1g -DlocalMode=true -Djava.library.path=/home/rlx/rlmapp2/RLX/instance1/Product/lib:/home/rlx/rlmapp2/RLX/instance1/Config/lib -Dpricemaker.directory=/home/rlx/rlmapp2/RLX/instance1 -Dpricemaker.stopIfDisabledLookup=true -Ddb.Product= -Ddb.DriverClass= -Ddb.ConnectionString= -Ddb.User= -Ddb.Password= -Ddb.Schema= -classpath /home/rlx/rlmapp2/RLX/instance1/Product/properties:/home/rlx/rlmapp2/RLX/instance1/Product/lib/3rdParty.jar:/home/rlx/rlmapp2/RLX/instance1/Product/lib/pmutil.jar:/home/rlx/rlmapp2/RLX/instance1/Product/lib/pricemaker.jar:/home/rlx/rlmapp2/RLX/instance1/Product/lib/sun/jmxremote_optional.jar:/home/rlx/rlmapp2/RLX/instance1/Product/lib/sun/jmxremote.jar:/home/rlx/rlmapp2/RLX/instance1/Product/lib/sun/jmxri.jar:/home/rlx/rlmapp2/RLX/instance1/Config/lib/rlmexecution-3rdParty.jar:/home/rlx/rlmapp2/RLX/instance1/Config/lib/rlmexecution.jar:/home/rlx/rlmapp2/RLX/instance1/Config/lib/rlmpoints.jar:/home/rlx/rlmapp2/RLX/instance1/Config/lib/rlmpointsinterfaces.jar:/home/rlx/rlmapp2/RLX/data:/home/rlx/rlmapp2/RLX/lib/Utilities/riiutil.jar:/home/rlx/rlmapp2/RLX/lib/Utilities/3rdParty.jar:/home/rlx/rlmapp2/RLX/lib/Repository/embedded.jar:/home/rlx/rlmapp2/RLX/lib/Repository/3rdParty.jar:/home/rlx/rlmapp2/RLX/lib/Catalog/deployment.jar:/home/rlx/rlmapp2/RLX/instance1/Config/lib/rlmexecution.jar:/home/rlx/rlmapp2/RLX/instance1/Config/lib:/home/rlx/rlmapp2/RLX/instance1/Config/lib/support.jar:/home/rlx/rlmapp2/RLX/instance1/Config/lib/plugins.jar:/home/rlx/rlmapp2/RLX/instance1/Config/lib/rp:/home/rlx/rlmapp2/RLX/master/Product/lib/oracle/10.2.0.5/ojdbc14.jar com.rii.pricemaker.engine.RatingEngine
root 4228 29319 0 16:42 pts/0 00:00:00 grep rlx
root 32596 31636 0 16:04 pts/1 00:00:00 su - rlx
rlx 32597 32596 0 16:04 pts/1 00:00:00 -bash
** The probe script is enough for our needs. The problem is that if we comment out (hash) the MonitorProcesses parameter, V-16-1-10283 is reported.
12-23-2014 01:33 PM
I thought it might be a problem with the attribute being too long, but I tried it on my 6.1 Linux cluster running on RHEL 5.5 (in Vbox) and it works fine.
My main.cf before starting VCS is the same as yours, and after starting VCS the MonitorProcesses entry in main.cf gets reformatted onto 2 lines:
MonitorProcesses = {
"java -server -d64 -verbose:gc -XX:+UseCompressedOops -XX:+PrintGCTimeStamps -XX:+UseSerialGC -ms12g -mx12g -XX:NewSize=1g -XX:MaxNewSize=1g -DlocalMode=true -Djava.library.path=/home/rlx/rlmapp2/RLX/instance1/Product/lib:/home/rlx/rlmapp2/RLX/instance1/Config/lib -Dpricemaker.directory=/home/rlx/rlmapp2/RLX/instance1 -Dpricemaker.stopIfDisabledLookup=true -Ddb.Product= -Ddb.DriverClass= -Ddb.ConnectionString= -Ddb.User= -Ddb.Password= -Ddb.Schema= -classpath /home/rlx/rlmapp2/RLX/instance1/Product/properties:/home/rlx/rlmapp2/RLX/instance1/Product/lib/3rdParty.jar:/home/rlx/rlmapp2/RLX/instance1/Product/lib/pmutil.jar:/home/rlx/rlmapp2/RLX/instance1/Product/lib/pricemaker.jar:/home/rlx/rlmapp2/RLX/instance1/Product/lib/sun/jmxremote_optional.jar:/home/rlx/rlmapp2/RLX/instance1/Product/lib/sun/jmxremote.jar:/home/rlx/rlmapp2/RLX/instance1/Product/lib/sun/jmxri.jar:/home/rlx/rlmapp2/RLX/instance1/Config/lib/rlmexecution-3rdParty.jar:/home/rlx/rlmapp2/RLX/instance1/Config/lib/rlmexecution.jar:/home/rlx/rlmapp2/RLX/instance1/Config/lib/rlmpoints.jar:/home/rlx/rlmapp2/RLX/instance1/Config/lib/rlmpointsinterfaces.jar:/home/rlx/rlmapp2/RLX/data:/home/rlx/rlmapp2/RLX/lib/Utilities/riiutil.jar:/home/rlx/rlmapp2/RLX/lib/Utilities/3rdParty.jar:/home/rlx/rlmapp2/RLX/lib/Repository/embedded.jar:/home/rlx/rlmapp2/RLX/lib/Repository/3rdParty.jar:/home/rlx/rlmapp2/RLX/lib/Catalog/deployment.jar:/home/rlx/rlmapp2/RLX/instance1/Config/lib/rlmexecution.jar:/home/rlx/rlmapp2/RLX/instance1/Config/lib:/home/rlx/rlmapp2/RLX/instance1/Config/lib/support.jar:/home/rlx/rlmapp2/RLX/instance1/Config/lib/plugins.jar:/home/rlx/rlmapp2/RLX/instance1/Config/lib/rp:/home/rlx/rlmapp2/RLX/master/Product/lib/oracle/10.2.0.5/ojdbc14.jar com.rii.pricemaker.engine.RatingEngine" }
I then tried opening and closing the config, and main.cf does not change. I also made a change to the config and dumped it, and MonitorProcesses still looks fine. Setting MonitorProcesses using hares works fine too.
What are the results if you run:
hares -display app_res_name -attribute MonitorProcesses
before and after seeing the issue?
Also, are there any errors in the engine log?
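One way to collect that before/after comparison is sketched below. It is only an illustration: app_res_name is the placeholder resource name from the command above, and the /tmp file names in the usage comments are made up for the example.

```shell
#!/bin/bash
# Sketch: record MonitorProcesses before and after a configuration dump.
# RES and the /tmp output paths are placeholders, not values from this thread.
RES="${RES:-app_res_name}"

capture_attr() {
    # $1 = output file; saves what the running cluster reports for the attribute
    hares -display "$RES" -attribute MonitorProcesses > "$1" 2>&1
}

# Usage (run as root on a cluster node):
#   capture_attr /tmp/mp_before.txt
#   haconf -dump -makero
#   capture_attr /tmp/mp_after.txt
#   diff /tmp/mp_before.txt /tmp/mp_after.txt
# Then search the engine log (default location assumed) for the reported message ID:
#   grep V-16-1-10283 /var/VRTSvcs/log/engine_A.log
```

If the two captures differ, the value is being lost in the running configuration and not only in the main.cf file written by the dump.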
If you can't get it to work, then using MonitorProgram or PidFiles instead of MonitorProcesses is a workaround.
Mike
01-07-2015 07:03 PM
Try shortening the MonitorProcesses value and test again.
01-08-2015 02:01 AM
Firstly, as you describe the behaviour of VCS somehow mangling the setting of the MonitorProcesses attribute to:
MonitorProcesses={ "" }
after you do a dump of the configuration:
haconf -dump -makero
Well, that is a big bad ugly bug that you should report and escalate to Tech Support.
Secondly, as Mike suggests, use MonitorProgram instead of MonitorProcesses to avoid the bug. Create a simple script that will allow you to reliably match the unique process via appropriate egrep regular expression-match, such as:
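#!/bin/bash
# Look up the user the application runs as; fall back to root if none is set
procUSER="$( hares -value pricemakerResourceName User )"
[[ -z $procUSER ]] && procUSER="root"

# Match the full command line of the one process being monitored
/bin/ps -u "$procUSER" -wwo args | egrep '^java .*jar com.rii.pricemaker.engine.RatingEngine$'
if [[ $? == 0 ]] ; then
    # indicate to VCS ONLINE
    exit 0
else
    # indicate to VCS OFFLINE
    exit 1
fi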
Modify the pattern-matching portion until it reliably AND uniquely matches ONLY the process you are monitoring.
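A quick way to check uniqueness while tuning the pattern is to count how many command lines it matches. The sketch below is an illustration only: count_matches is a hypothetical helper name, and the pattern is copied from the snippet above.

```shell
#!/bin/bash
# Sketch: count how many command lines a candidate pattern matches.
# PATTERN is the expression from the monitor script; adjust it as needed.
PATTERN='^java .*jar com.rii.pricemaker.engine.RatingEngine$'

count_matches() {
    # Reads command lines on stdin and prints the number that match PATTERN.
    grep -Ec -- "$PATTERN"
}

# Usage against the live processes of the application user (rlx in this thread):
#   ps -u rlx -wwo args | count_matches
```

A count of exactly 1 while the application is running, and 0 while it is stopped, suggests the pattern is specific enough. Anything higher means it also matches some other process.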
Double-check what the appropriate exit code should be for your version of VCS and Application agent documentation...
This method will both avoid the bug you are witnessing and give you a much more reasonable main.cf file, as you will avoid using such a large MonitorProcesses definition.
2015-01-08: Updated and fixed above code snippet to work properly on Linux....