
problem with VRTS ONG resource

bonny6
Level 3

I've got a problem. I have 2 servers in a VRTS cluster.

Although the VRTS ONG process is online, I'm getting this message very often:

 MIG_MPM_1a mpm1a (Veritas_Cluster_Server): ONG (ONG): Resource state is unknown

and also this message:

 VCS INFO V-16-2-13001 (mpm1a) Resource(ONG): Output of the completed operation (monitor)
/opt/VRTSvcs/bin/ONG/monitor: test: unknown operator 2300

I have compared both configuration files and nothing is missing.

What could be the problem here?

Thanks,

25 REPLIES

g_lee
Level 6
bonny,

Glad to have helped :)

Are you getting the "Resource state is unknown" error messages at the same time as the "test: unknown operator" messages?

If so, this would be due to the reason Gaurav mentioned in his first reply, ie: "This error means that the monitor procedure did not receive a status code of range 100-110" (which is caused by the problem with the monitor script).

If you are seeing the error at different times/separately, please provide an extract of the log from that time (ie: the messages leading up to the time you saw the error) for further investigation.
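
For reference, the status-code rule quoted above can be sketched as a small shell function. This is only an illustration: monitor_state is a hypothetical helper, not part of VCS. 100 as offline and 110 as online match the values used in this thread; treating 101-109 as online is an assumption about the agent convention.

```shell
# Hypothetical helper, not part of VCS: maps a monitor exit code to a state.
monitor_state() {
    case "$1" in
        100)          echo "offline" ;;   # resource reported offline
        10[1-9]|110)  echo "online"  ;;   # assumption: 101-110 all count as online
        *)            echo "unknown" ;;   # anything outside 100-110
    esac
}

monitor_state 110   # prints: online
monitor_state 100   # prints: offline
monitor_state 2     # prints: unknown (2 is just a sample out-of-range code)
```

Any exit code outside 100-110 falls into the last branch, which is the "Resource state is unknown" case.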

regards,
Grace

bonny6
Level 3


Hi All,

After Grace's solution I added a check to the monitor script to capture the two PIDs in real time.

I need your help here. As you can see, in the new monitor script I added a line that writes the PIDs to a file.

I also print the processes to see what each PID is; I thought the file would also show the name of the new PID.

As you can see in the output of the PID_ONG file, I captured the extra PID 25773.
I also added the VRTS log, and it does show PID 25773 as the unknown operator.

My problem is: why didn't it print the name of the process like the two others? I now have 3 PIDs but only two are shown in the file.

Why does this happen?

--------------------------------------------------------------------
Sat Aug 14 20:07:41 GMT 2010
this is ong_agent PID : 2300, this is ong_alerter PID 2389
  oracle  2300     1  0   Jul 15 ?       21:03 ./ong_agent
  oracle  2389     1  0   Jul 15 ?       10:10 ./ong_alerter
Sat Aug 14 20:07:51 GMT 2010
this is ong_agent PID : 2300, this is ong_alerter PID 2389
  oracle  2300     1  0   Jul 15 ?       21:03 ./ong_agent
  oracle  2389     1  0   Jul 15 ?       10:10 ./ong_alerter
Sat Aug 14 20:08:01 GMT 2010
this is ong_agent PID : 2300
25773
, this is ong_alerter PID 2389
  oracle  2300     1  0   Jul 15 ?       21:03 ./ong_agent
  oracle  2389     1  0   Jul 15 ?       10:10 ./ong_alerter
------------------------------------------------------------------------------

2010/08/14 20:08:02 VCS INFO V-16-2-13001 (mpm1a) Resource(ONG): Output of the completed operation
(monitor)
/opt/VRTSvcs/bin/ONG/monitor: test: unknown operator 25773
2010/08/14 20:08:12 VCS INFO V-16-2-13001 (mpm1a) Resource(ONG): Output of the completed operation
(monitor)
110
 

Gaurav_S
Moderator

Hi Again,

So one thing is for sure: it's ong_agent that is getting multiple PIDs, as they are coming into the $process1 variable.

To be honest, it's a little difficult to tell why it would be generating multiple PIDs. You will need to check from the application side. Is there any temporary process that gets generated?

As a workaround, it would be simpler to use "wc -l" along with the ps command, as I indicated previously. The benefit is that you can use an if statement saying: if the number of processes is less than 1, then exit 100, else exit 110. So even if multiple entries are returned, wc -l will count two, and that will satisfy your if condition and exit 110.
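
A minimal sketch of that workaround, under assumptions: monitor_code is a hypothetical helper name, and the grep pattern "/name" is only a guess at how the process appears in ps -ef. It reads ps-style lines on stdin so it can be tried on sample data before touching the real monitor script.

```shell
# Hypothetical helper: reads `ps -ef` style lines on stdin and prints the
# monitor exit code for process name $1 (110 = online, 100 = offline).
monitor_code() {
    # Count matching lines instead of capturing PIDs, so an extra PID can
    # never end up as a stray operand to `test`. tr strips the padding that
    # some wc implementations add around the number.
    count=$(grep -- "/$1" | grep -v grep | wc -l | tr -d ' ')
    if [ "$count" -lt 1 ]; then
        echo 100
    else
        echo 110
    fi
}

# Sample line copied from the PID_ONG output in this thread.
printf '%s\n' '  oracle  2300     1  0   Jul 15 ?       21:03 ./ong_agent' |
    monitor_code ong_agent      # prints: 110
```

In the monitor script itself this could be wired up as something like code=`ps -ef | monitor_code ong_agent` followed by exit $code, but how it fits around the vendor's existing if statement is not shown here.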

Gaurav


g_lee
Level 6
The extra PID is for ong_agent - the reason it's being formatted like that is because process1 consists of multiple lines/contains newline characters, so if you quote the variable it will span multiple lines.

Using the same httpd example:
# ps -ef |grep http
   juser 26974  6924   0 10:40:10 ?           0:00 /usr/local/apache2/bin/httpd -k start
    root  4175  4173   0 11:14:25 pts/7       0:00 grep http
   juser 22486  6924   0 10:18:14 ?           0:00 /usr/local/apache2/bin/httpd -k start
   juser 21948  6924   0 10:16:32 ?           0:00 /usr/local/apache2/bin/httpd -k start
   juser  3087  6924   0 11:07:53 ?           0:00 /usr/local/apache2/bin/httpd -k start
    root  6924  6589   0   Aug 12 ?           0:07 /usr/local/apache2/bin/httpd -k start
   juser 17484  6924   0 09:56:17 ?           0:01 /usr/local/apache2/bin/httpd -k start
   juser 29092  6924   0 10:50:17 ?           0:00 /usr/local/apache2/bin/httpd -k start
   juser 17483  6924   0 09:55:06 ?           0:01 /usr/local/apache2/bin/httpd -k start
# process1=`ps -ef | grep httpd | grep -v grep | awk '{ print $2 }'`
# echo "$process1"
26974
22486
21948
3087
6924
17484
29092
17483
# process2=`ps -ef | grep lpsched | grep -v grep | awk '{print $2 }'`
# echo "$process2"
6858
# echo "this is httpd PID : $process1, this is lpsched PID $process2"
this is httpd PID : 26974
22486
21948
3087
6924
17484
29092
17483, this is lpsched PID 6858

So from your output, the extra process being picked up (25773) is for ong_agent

If you really want all the process IDs on one line, you can remove the quotes from the variables, ie:
# echo "this is httpd PID : "${process1}", this is lpsched PID "${process2}
this is httpd PID : 26974 22486 21948 3087 6924 17484 29092 17483, this is lpsched PID 6858

Note this will still only show you the PIDs; it won't tell you what's running, ie: what the extra process named ong_agent is that the script is picking up. If you want to check this, you need to perform additional checks or rewrite the script to echo the whole line somewhere, not just the 2nd (PID) field.

Hope that helps,
Grace

bonny6
Level 3

Hi Grace,

Let me understand: if I want the PIDs printed on one line, do I need to change the script to
# echo "this is ong_agent PID : "$process1", this is ong_alerter PID "$process2"    ?


Secondly, my problem was: if there are 3 PIDs, why does it print only these two lines?
  oracle  2300     1  0   Jul 15 ?       21:03 ./ong_agent
  oracle  2389     1  0   Jul 15 ?       10:10 ./ong_alerter

Shouldn't I see 3 lines?


And Gaurav, yes, I'm still investigating why this is happening in the first place. I am working on changing the script right now.

BR,

g_lee
Level 6
It's possible that by the time the script runs the second ps -ef, the extra process has already finished/stopped (this is likely since you said the problem was intermittent).

To troubleshoot, the PID list and the full process lines need to come from the same ps output; however, this requires greater modification to the script.

You can try replacing the section above the "if" statement with the code below; however, I would recommend backing up the original (vendor) script first so you can revert easily once you resolve the problem with multiple processes.

--------------------
# Starting and Monitoring of the ODM processes

proc1=`ps -ef | grep '/'ong_agent | grep -v grep`
proc2=`ps -ef | grep '/'ong_alerter | grep -v grep`

process1=`echo "${proc1}" | awk '{print $2 }'`
process2=`echo "${proc2}" | awk '{print $2 }'`

echo `date` >> /tmp/PID_ONG.txt
echo "this is ong_agent PID : "${process1}", this is ong_alerter PID "${process2} >> /tmp/PID_ONG.txt
echo "${proc1}" >> /tmp/PID_ONG.txt
echo "${proc2}" >> /tmp/PID_ONG.txt
--------------------

Note: you MUST include the double quotes with the echo to preserve the newlines.

Also, for this line
echo "this is ong_agent PID : "${process1}", this is ong_alerter PID "${process2} >> /tmp/PID_ONG.txt
there is no double quote after ${process2} as double quotes were closed before printing this var
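
The quoting behaviour discussed in this thread can be checked in isolation with the two PIDs from the PID_ONG output. count_args is a hypothetical helper used only for this illustration. The monitor script's own test line is not shown in the thread, so the link to the "unknown operator" message is an assumption: an unquoted $process1 holding two PIDs would hand test one more word than it expects.

```shell
# Sample value from the PID_ONG output: two PIDs separated by a newline.
process1='2300
25773'

# Hypothetical helper: prints how many arguments it received.
count_args() { echo $#; }

count_args $process1       # unquoted: the shell splits the value into 2 words
count_args "$process1"     # quoted: 1 word that still contains the newline

# The same difference seen through echo.
echo $process1             # one line: 2300 25773
echo "$process1"           # two lines, as in the log file
```

This is why the quoted echo lines in the replacement section keep one process per line, while the unquoted PID line collapses to a single line.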