Application agent falsely detects NetWorker process as offline even when the process is running properly

omiot
Level 3
Partner Accredited

Hi,

I have a problem with NetWorker in a VCS cluster. VCS kills the process and restarts it on the second node. I've turned on debugging for the Application agent log, and I can see that the monitor process returns state: Offline.

 

2013/01/15 14:33:18 VCS DBG_2 V-16-50-0 Application:nw_server:monitor:Command prepared for getting pid is </bin/ps --cols=100000 --User=root -o pid,args | /bin/egrep '/usr/sbin/nsrd -k clusterFQDN\.domain\.com' | /bin/egrep -v /bin/grep | /usr/bin/tr -s " " " " | /bin/sed -e 's/^ //' | /bin/cut -f1 -d" ">.
        Application.C:processExists[583]
2013/01/15 14:33:19 VCS DBG_4 V-16-50-0 Application:nw_server:monitor:Process:/usr/sbin/nsrd -k clusterFQDN.domain.com; return state: Offline.
        Application.C:application_monitor[300]
2013/01/15 14:38:09 VCS DBG_1 V-16-50-0 Application:nw_server:monitor:UseSUDash:<0>.
        Application.C:application_monitor[163]
2013/01/15 14:38:09 VCS DBG_4 V-16-50-0 Application:nw_server:monitor:User Shell is other than csh, returning 0
        Application.C:getuserinfo[1198]
2013/01/15 14:38:19 VCS DBG_4 V-16-50-0 Application:nw_server:monitor:MonitorProgram returned state:110.
        Application.C:monitorState[920]
2013/01/15 14:38:19 VCS DBG_4 V-16-50-0 Application:nw_server:monitor:return state:STATE_TRUE
        Application.C:monitorState[974]
2013/01/15 14:38:19 VCS DBG_1 V-16-50-0 Application:nw_server:monitor:Total number of Pid Files specified:0.
        Application.C:application_monitor[231]
2013/01/15 14:38:19 VCS DBG_1 V-16-50-0 Application:nw_server:monitor:Total number of Processes specified:<1>.
        Application.C:application_monitor[272]
2013/01/15 14:38:19 VCS DBG_4 V-16-50-0 Application:nw_server:monitor:Process:</usr/sbin/nsrd -k clusterFQDN.domain.com>; User:<root>.
        Application.C:processExists[479]
2013/01/15 14:38:19 VCS DBG_2 V-16-50-0 Application:nw_server:monitor:Command prepared for getting pid is </bin/ps --cols=100000 --User=root -o pid,args | /bin/egrep '/usr/sbin/nsrd -k clusterFQDN\.domain\.com' | /bin/egrep -v /bin/grep | /usr/bin/tr -s " " " " | /bin/sed -e 's/^ //' | /bin/cut -f1 -d" ">.
        Application.C:processExists[583]
2013/01/15 14:38:20 VCS DBG_4 V-16-50-0 Application:nw_server:monitor:Process:/usr/sbin/nsrd -k clusterFQDN.domain.com; return state: Offline.
 
I'm using Storage Foundation for HA ver 5.1 SP1 RP3 on RHEL 5.5.
 
Regards
Pawel
3 REPLIES

Marianne
Level 6
Partner    VIP    Accredited Certified

Please post main.cf section for this service group.

omiot
Level 3
Partner Accredited

Hi,

Thanks for your reply. I've attached part of my main.cf.

 

Regards.

Pawel

Marianne
Level 6
Partner    VIP    Accredited Certified

Please double-check your documentation for the MonitorProcesses attribute:

MonitorProcesses = { "/usr/sbin/nsrd -k clusterFQDN" }

Should clusterFQDN possibly be the virtual hostname?

What does 'ps -ef |grep nsrd' show?
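
One way to compare the ps output against the MonitorProcesses string is to push it through the same filter chain that appears in the debug log ("Command prepared for getting pid is ..."). The following is only a minimal sketch: `match_pids` is a hypothetical helper name, and the sample process list is invented for illustration, so it does not show what is actually running on the poster's node. On a real node, the sample would be replaced by the output of the `/bin/ps --cols=100000 --User=root -o pid,args` command from the log.

```shell
#!/bin/sh
# Hypothetical helper (not part of VCS): reads "pid args" lines on stdin and
# applies the same filters as the pipeline shown in the agent debug log,
# printing the PID of every line whose command line matches the pattern.
match_pids() {
    pattern="$1"
    grep -E -- "$pattern" \
        | grep -E -v '/bin/grep' \
        | tr -s ' ' ' ' \
        | sed -e 's/^ //' \
        | cut -f1 -d' '
}

# Invented sample of `ps -o pid,args` output: here nsrd is assumed to be
# running WITHOUT the "-k <name>" argument. This is an assumption for the demo.
sample=' 4321 /usr/sbin/nsrd
 4400 /usr/sbin/nsrexecd'

# Pattern copied from the debug log. Nothing in the sample matches it, so no
# PID is printed; an empty result is consistent with "return state: Offline".
printf '%s\n' "$sample" | match_pids '/usr/sbin/nsrd -k clusterFQDN\.domain\.com'

# Pattern that matches the command line as it appears in the sample.
printf '%s\n' "$sample" | match_pids '/usr/sbin/nsrd$'
```

If the first call prints nothing while nsrd is clearly present in the process list, the string in MonitorProcesses does not match the command line as ps reports it, and the arguments after `nsrd` are the first thing to compare.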