We have a VCS 5.1 cluster running in Solaris 10 zones (SPARC) monitoring our application. Our application uses 3 new custom types.
We are using very simple scripts with the VRTS ScriptAgent. The monitor script effectively calls "ps -ef | grep <process>" and, based on the result, returns 110 (online) or 100 (offline). For some reason, this command intermittently reports the process as down even though it is still running: we watch the process ID in another window and the PID never changes. "monitor" will return "up" 10-15 times in a row, then suddenly return "down".
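For reference, the monitor logic described above can be sketched roughly as follows. This is not the actual script: the function names status_from_ps and monitor_process, the retry count, and the use of bash are assumptions for illustration only. The retry loop is just one possible way to ride out a single bad "ps" sample, not a diagnosis of the root cause.

```shell
#!/bin/bash
# Sketch of a ScriptAgent-style monitor check. All names are hypothetical.

# VCS monitor exit codes: 110 = online, 100 = offline.
ONLINE=110
OFFLINE=100

# Read ps-style output on stdin. Print 110 if any line matches the pattern
# (ignoring lines that belong to grep itself), otherwise print 100.
status_from_ps() {
    pattern=$1
    if grep -v 'grep' | grep "$pattern" >/dev/null; then
        echo "$ONLINE"
    else
        echo "$OFFLINE"
    fi
}

# Sample `ps -ef` up to $2 times (default 3), one second apart.
# Return 110 as soon as one sample shows the process; return 100 only
# if every sample misses it.
monitor_process() {
    pattern=$1
    tries=${2:-3}
    i=0
    while [ "$i" -lt "$tries" ]; do
        if [ "$(ps -ef | status_from_ps "$pattern")" = "$ONLINE" ]; then
            return "$ONLINE"
        fi
        i=$((i + 1))
        if [ "$i" -lt "$tries" ]; then
            sleep 1
        fi
    done
    return "$OFFLINE"
}
```

A monitor entry point built on this would end with something like `monitor_process "myapp.jar"; exit $?`, where "myapp.jar" is a placeholder for whatever string identifies the process.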
I did some Googling and it appears that "ps" will hang on occasion in Solaris 10, but this is different. Our "ps" command doesn't hang; it just doesn't return the process. The same goes for ps's cousins pgrep, pkill, and /usr/ucb/ps.
Has anyone else run into this before? Is there a more accurate way to determine whether a process is up? It's a custom Java application.
I don't think VCS is related to the "ps" issues, but I wanted to reach out and see whether anyone else has run into this and found a workaround. I am going to explore testing port connectivity, as this app listens on a TCP port when it is up.
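A port-based check along those lines might look something like the sketch below. It assumes bash is available in the zone and was built with /dev/tcp support; the function name port_status and the host/port values are placeholders, not anything taken from the real application.

```shell
#!/bin/bash
# Sketch of a port-based monitor check. Host and port are placeholders.

# VCS monitor exit codes: 110 = online, 100 = offline.
ONLINE=110
OFFLINE=100

# Print 110 if a TCP connection to host:port succeeds, otherwise print 100.
# /dev/tcp/HOST/PORT is a redirection handled by bash itself, not a real file.
port_status() {
    host=$1
    port=$2
    # The subshell opens the socket on fd 3; the descriptor is closed
    # automatically when the subshell exits.
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
        echo "$ONLINE"
    else
        echo "$OFFLINE"
    fi
}
```

For example, `exit "$(port_status localhost 8080)"` as the last line of the monitor, with 8080 standing in for the real listen port. A successful connect only shows that something is listening on the port, so it complements the process check rather than proving the app is healthy.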
Thanks,
dufftime