Application MonitorProcess failing

dbsanders
Level 3

VCS 5.0MP3, Solaris 10

If I start the application from VCS, it works fine and the resource shows online. If the application is stopped and started from the command line, the resource fails and doesn't come back online in VCS.

I know what you're thinking: the ps output differs between the VCS start and the command-line start. That's not the case; they are the same.

Here is the current output of "ps -u pmadmin -o args":

sax88d-z1# ps -u pmadmin -o args
COMMAND
/users/pmserver8/java/bin/java -Djava.awt.headless=true -Duser.dir=/users/pmser
/users/pmserver8/server/bin/pmrepagent - Y3JtY19xYQ== Y19yZXBvX3Fh Y3JtY19xYV9o
/users/pmserver8/server/bin/pmserver Y3JtY19xYQ== YXNjaWlfcGNfc2Vydg== Y3JtY19x
/users/pmserver8/server/bin/pmrepagent - Y3JtY19xYQ== Y19yZXBvX3N0Zw== Y3JtY19x
/users/pmserver8/server/bin/pmserver Y3JtY19xYQ== dW5pY29kZV9wY19zZXJ2 Y3JtY19x

And here is the resource definition.

        Application PWRCTR-ITE1-SAX88D-PWRCTR-APP (
                Critical = 0
                User = pmadmin
                StartProgram = "/users/cgpowercenter/bin/RUNSERVER"
                StopProgram = "/users/cgpowercenter/bin/SHUTSERVER"
                MonitorProcesses = {
                         "/users/pmserver8/java/bin/java -Djava.awt.headless=true -Duser.dir=/users/pmser" }
                ContainerName = sax88d-z1
                OnlineTimeout = 600
                )

No errors in the engine log - why can't I get this app online?

Bret

5 REPLIES

dbsanders
Level 3

Update: I just tried restarting the cluster with "hastop -all -force" and then hastart, but with no success. The application still shows "offline". (The application is still running, with the same ps output as what's defined in MonitorProcesses.)

Kyle_Gleed
Not applicable
Employee

Hi,

How are you starting the app from the command line? Are you running "/users/..../RUNSERVER" as root or pmadmin? 

VCS executes the command "su - pmadmin -c /users/.../RUNSERVER". 

Turn on debugging for the Application agent:

/opt/VRTSvcs/bin/haconf -makerw
/opt/VRTSvcs/bin/hatype -modify Application LogDbg DBG_AGDEBUG
/opt/VRTSvcs/bin/haconf -makero 

With debugging enabled, look at the Application agent and engine logs in /var/VRTSvcs/log/ to see if anything points to the problem.

Be sure to turn debugging off when you are done :)

/opt/VRTSvcs/bin/haconf -makerw
/opt/VRTSvcs/bin/hatype -modify Application LogDbg -delete DBG_AGDEBUG
/opt/VRTSvcs/bin/haconf -makero 

Eric_Gao
Level 4

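From the global zone, run:

( zlogin sax88d-z1  'ps -u pmadmin -o args | grep "/users/pmserver8/java/bin/java -Djava.awt.headless=true -Duser.dir=/users/pmser"' && exit 110 )

echo $?

(The parentheses run the check in a subshell, so "exit 110" does not close your login shell.)

Check the return value. If it is 110, the MonitorProcesses string matches a running process, so the resource configuration is not the problem.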
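The check above can also be wrapped in a small function so it is easy to repeat. This is only a sketch: check_monitor_process is a hypothetical helper name, and the whole-line, fixed-string match is an assumption about how strictly the agent compares the MonitorProcesses string against the ps output. It uses POSIX grep options; on Solaris 10 you may need /usr/xpg4/bin/grep for these.

```shell
# Hypothetical helper (not part of VCS): reads "ps -o args" style output
# on stdin and returns 110 if one line equals the given string exactly,
# 100 otherwise. 110/100 mirror the online/offline codes discussed here.
check_monitor_process() {
    pattern=$1
    # -F: treat the pattern as a fixed string, not a regular expression
    # -x: the whole line must match (assumed strictness, see lead-in)
    # -q: no output, only the exit status
    # --: end of options, in case the pattern starts with "-"
    if grep -F -x -q -- "$pattern"; then
        return 110
    fi
    return 100
}
```

From the global zone it could be used like this, passing the exact string from MonitorProcesses:

zlogin sax88d-z1 'ps -u pmadmin -o args' | check_monitor_process "/users/pmserver8/java/bin/java -Djava.awt.headless=true -Duser.dir=/users/pmser"; echo $?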

 

Also check the agent itself (file sizes, dates, and so on). I had the same problem on 5.0, and it was resolved after I applied MP3. The Application agent from 5.0 does not handle an application running in a local zone very well.

Even though you have applied MP3, your Application agent might still be the 5.0 version.

 

 

slave:/opt/VRTSvcs/bin/Application #ls -ltr
total 202
-rwxr--r--   1 root     sys         1970 Jan 25  2006 online
-rwxr--r--   1 root     sys         2095 Jan 25  2006 offline
-rwxr--r--   1 root     sys         6417 Apr 21  2006 ApplicationDiscovery.pl
-rwxr--r--   1 root     sys        67240 May 31  2006 ApplicationAgent
-rwxr--r--   1 root     sys        13440 May 31  2006 ApplicationDiscover.so
-rw-r--r--   1 root     sys         7973 May 31  2006 Application.xml
drwxrwxr-x   2 root     sys          512 Jan 14 16:12 actions
slave:/opt/VRTSvcs/bin/Application #
 

Gaurav_S
Moderator

Hello,

I would suggest a simple test here, much like the one explained above:

1. Start the application manually from the command line.

2. Run the monitor check manually (using the string from MonitorProcesses) and note the exit code with "echo $?".

If 110 is returned, that is fine for VCS and the resource should report online. Anything other than 110 is a concern for VCS.

If it is anything other than 110 (I would guess 100 or 99), then I would look at the monitor program in detail and see where in it the monitor fails.

 

Gaurav

dbsanders
Level 3
Just to follow up: it seems some files get created in /opt/VRTSvcs/bin/Application when resources come online, and in my case they were not.

Each file should be named .<resname> (dot-resourcename) and should just be an empty file. Once I touched these files and probed each resource, they magically came online.

I don't know what is supposed to create these files (the Application agent? something else?), or when (the first time the resources run? every time they come online?). I also don't know why the files were missing, but touching them is a simple workaround for now.
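For anyone repeating the workaround on several resources, it can be sketched as a small function. touch_resource_markers is a hypothetical helper name, not something that ships with VCS, and the only thing it assumes is what is described above: an empty file named .<resname> in the agent directory.

```shell
# Hypothetical helper (not part of VCS): create the empty ".<resname>"
# marker file for each resource name given, skipping any that exist.
# Usage: touch_resource_markers <agent_dir> <resname> [<resname> ...]
touch_resource_markers() {
    agent_dir=$1
    shift
    for res in "$@"; do
        marker="$agent_dir/.$res"
        if [ ! -e "$marker" ]; then
            # touch creates an empty file, which is all that was needed here
            touch "$marker" && echo "created $marker"
        fi
    done
}
```

In my case that would be "touch_resource_markers /opt/VRTSvcs/bin/Application PWRCTR-ITE1-SAX88D-PWRCTR-APP", followed by a probe of the resource (hares -probe <resname> -sys <system>, with your own system name filled in).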