Solved: VCS ERROR V-16-10031-9512 (inactive_nodename) Process:<myprocess> Process does not exist

sclind · ‎12-13-2011

I have a process resource on my system that is operating fine except for the fact that it logs this message in engine_A.log every five minutes:

VCS ERROR V-16-10031-9512 (inactive_nodename) Process:[myprocess]:monitor:Process[myprocess] does not exist.

Why would VCS even be looking for the process on the inactive node?

Here is the status of the resource:

                myprocess               active_node     ONLINE
                myprocess               inactive_node   OFFLINE
                myprocess               inactive_node   |STATE UNKNOWN|

mikebounds · ‎12-13-2011

By default, VCS monitors a resource every 5 minutes on nodes where it is offline (set by the OfflineMonitorInterval type attribute) and every minute on the node where it is online (set by the MonitorInterval type attribute).

But if a monitor returns offline, it should not normally log a message. Do you have the LogDbg attribute set on the Process type? I would guess you don't, as the severity of the message is ERROR, which means it would be logged even if LogDbg was not set.

As the state is UNKNOWN, this suggests more than VCS simply not finding the process; there is some error in determining the state. Could you post an extract from main.cf showing the Process resource, plus any other messages in engine_A.log AND Process_A.log relating to this resource at around the same time? If there are no other messages, it may be worth setting DBG_AGDEBUG and DBG_AGTRACE on LogDbg for the Process type:

hatype -modify Process LogDbg DBG_AGDEBUG DBG_AGTRACE

Mike

Marianne · ‎12-13-2011

"Why would VCS even be looking for the process on the inactive node?"

Because it needs to fail over to the inactive node should the need arise.

VCS needs to know the state of all resources before a node can be considered as failover target. It needs to know at all times that a failover target is available.

Please verify resource attributes on the inactive node.

sclind · ‎12-14-2011

Mike - here is the section from main.cf:

        Process CDproc (
                Critical = 0
                PathName = "/connect_direct/ndm/bin/cdpmgr"
                Arguments = "/connect_direct/ndm/bin/cdpmgr -i /connect_direct/ndm/cfg/aucdirect/initparm.cfg"
                )

The only output in engine_A.log is this every five minutes:

2011/12/14 08:40:25 VCS ERROR V-16-10031-9512 (pvpr1o12) Process:CDproc:monitor:Process(/connect_direct/ndm/bin/cdpmgr) does not exist.
2011/12/14 08:45:24 VCS ERROR V-16-10031-9512 (pvpr1o12) Process:CDproc:monitor:Process(/connect_direct/ndm/bin/cdpmgr) does not exist.
2011/12/14 08:50:25 VCS ERROR V-16-10031-9512 (pvpr1o12) Process:CDproc:monitor:Process(/connect_direct/ndm/bin/cdpmgr) does not exist.
2011/12/14 08:55:24 VCS ERROR V-16-10031-9512 (pvpr1o12) Process:CDproc:monitor:Process(/connect_direct/ndm/bin/cdpmgr) does not exist.

There are no recent messages in Process_A.log (none since Dec 2nd).

sclind · ‎12-14-2011

Marianne - it certainly makes sense that VCS knows the status of the inactive node. But why issue messages that the process resource was not found if VCS is not going to do the same for the DG, NIC, mounts and volumes?

I am pretty new to this; what would you like to see regarding this:

"Please verify resource attributes on the inactive node."

Wally_Heim · ‎12-14-2011

Hi sclind,

The error message is telling you that it cannot find the path "/connect_direct/ndm/bin/cdpmgr". It is trying to determine whether the process is running, but it cannot find the path to the script/process that you provided. As a result, it cannot validate the state of the resource.

If I had to guess, the path "/connect_direct/ndm/bin/cdpmgr" is on a file system mounted by your service group and does not exist on the passive node. VCS expects the binary path to be available on all nodes, with only the data files moving during failover.

The NIC, DG, mount and volume resources have all of the binaries needed for their online, offline and monitor entry points installed on all nodes.

The Process agent has its binaries on all nodes too, but you have configured it to run an additional binary from the "/connect_direct/ndm/bin/cdpmgr" path.

Thanks,

Wally

sclind · ‎12-14-2011

Wally - "/connect_direct/ndm/bin/cdpmgr" is the complete path and program name. But why would VCS even be looking for the process on the passive/inactive node? By definition it's not running there.

The base problem we have here is that we monitor for any line that has "ERROR" in it (as opposed to INFO, CRITICAL, WARNING, etc). With the exception of this message, all other lines I've ever seen that say "ERROR" are *real* errors. It just seems odd to me that VCS writes an "ERROR" level message to the log for what does not seem to really be an error.
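
For the log-monitoring side only, one possible stopgap is to filter out this one message ID before alerting. The sketch below assumes the log line format quoted earlier in the thread; `real_errors` is a made-up helper name, and the log path in the usage comment is the usual default location, which may differ on your install. It does not address why the monitor reports the message in the first place.

```shell
#!/bin/sh
# Sketch only: print ERROR lines from a VCS engine log, skipping one known
# message ID. The ID and line format come from the lines quoted in this thread.
IGNORED_ID="V-16-10031-9512"

# real_errors reads log lines on stdin and prints the ERROR lines that do
# not carry the ignored message ID. It exits non-zero when nothing is left.
real_errors() {
    grep ' VCS ERROR ' | grep -v -F -- "$IGNORED_ID"
}

# Usage (log path is an assumption; adjust for your install):
#   real_errors < /var/VRTSvcs/log/engine_A.log
```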

Wally_Heim · ‎12-14-2011

Hi sclind,

VCS is checking whether the process is running on the passive node or not. As Marianne mentioned, if it does not check the resource state periodically, then VCS cannot be sure of the state on the passive node; the resource would be marked as being in an "UNKNOWN" state and VCS would not allow failover to that node.

I'm a Windows guy, so I can only give you basic guidance as to what the resource is doing on non-Windows platforms. There might be a way to disable the logging of this message, but the real fix would be to determine why the agent cannot find the path that it needs.

I would recommend opening a case with Symantec Technical Support so that you can get to an expert that deals with this agent on the OS platform that you are using. They would be able to provide more assistance than I can on your configuration.

Thanks,

Wally

mikebounds · ‎12-14-2011

Wally is probably right in that the binaries do not exist on the inactive node, but from your initial post it looks as though there are 3 nodes where your resource is:

Online on node 1
Offline on node 2
Unknown on node 3

So if the Process binary is on shared storage, then I don't know why node 2 is not giving error messages as well.

I would consider this to be a bug. I had the same issue with the Informix agent just over a year ago, where I was getting errors on the inactive node because the binaries did not exist there, as they were on shared storage. I raised a call and the issue was resolved in the next release of the agent. So, as Wally says, I would log a Support call.

Alternatively, you could use the Application resource type. This would require you to write an offline script, but if there is a graceful way to shut down cdpmgr, like "cdpmgrt stop" or equivalent, then this would be better, as using the Process resource type means VCS just sends SIGTERM to the process.

Mike

sclind · ‎12-15-2011

Mike - it is true the binaries do not exist on the inactive node; they are in the mounted file system.

There are only two nodes in the cluster: node 1 is online and node 2 is offline. The error messages are issued for the offline node.

sclind · ‎12-15-2011

I see what I wrote can be misleading:

myprocess inactive_node OFFLINE
myprocess inactive_node |STATE UNKNOWN|

"inactive_node" is the one offline node. It's odd that it is both OFFLINE and STATE UNKNOWN.

mikebounds · ‎12-16-2011

As in my earlier post, if there is a graceful way to shut down cdpmgr, rather than just killing the process, then use the Application agent; otherwise I would advise logging a Support call with Symantec.

Mike

sclind · ‎12-16-2011

Mike - thanks for the info. This was all set up by a consultant; I'll have to read up on the differences between a Process agent and an Application agent. Unfortunately, now that this is in production, opportunities to modify it are extremely limited.

g_lee · ‎12-16-2011

sclind,

What is the output if you run:

# hares -display CDproc -attribute State
# ps -ef |grep cdpmgr ### from both good and bad nodes

From your earlier reply ( https://www-secure.symantec.com/connect/forums/vcs-error-v-16-10031-9512-inactivenodename-processmyprocess-process-does-not-exist#comment-6451691 )

        Process CDproc (
                Critical = 0
                PathName = "/connect_direct/ndm/bin/cdpmgr"
                Arguments = "/connect_direct/ndm/bin/cdpmgr -i /connect_direct/ndm/cfg/aucdirect/initparm.cfg"
                )

The Arguments attribute should contain only the arguments that are passed to the binary, i.e. if your full command is

/connect_direct/ndm/bin/cdpmgr -i /connect_direct/ndm/cfg/aucdirect/initparm.cfg

the value for Arguments should be "-i /connect_direct/ndm/cfg/aucdirect/initparm.cfg" (i.e. it should not include the pathname)
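
Applied to the CDproc stanza quoted above, that would look roughly like this. It is an untested fragment in which only the Arguments line changes; everything else is copied from the earlier reply:

```
        Process CDproc (
                Critical = 0
                PathName = "/connect_direct/ndm/bin/cdpmgr"
                Arguments = "-i /connect_direct/ndm/cfg/aucdirect/initparm.cfg"
                )
```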

Note: UNKNOWN means VCS cannot determine the state of the process; ie: the Monitor entry point (which "Checks to see if the process is running by scanning the process table for the name of the executable pathname and argument list") is not getting the expected output. If the Arguments attribute is not configured correctly this may be causing/contributing to the issue.

Also, is this a Solaris system?

From Process online entry point (5.0MP3, but same limitation would apply to 5.1 if Solaris)

--------------------
# If the command + argument length happens to be greater than or equal to 78
# characters we can have a problem. Current implementations of the solaris ps
# utility and proc file structures terminate the command + argument string
# at the 80 characters. This could cause a problem if two process resources
# had the same first 78 characters or more , i.e. say they had very long identical
# paths to execution and similar initial arguments. In such a case there is no
# way for the monitor script to know the true identity of the process being
# executed. Hence the warning message.
#
if ($Cmdlength >= 78){
VCSAG_LOG_MSG ("W", "Command length is greater than 78 characters. This might pose potential monitoring problems.", 9001);
}
exit;
--------------------
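
To see whether a given PathName/Arguments pair would trip this warning, the length can be checked by hand. This is a minimal sketch, assuming the agent counts the pathname, one space, and the arguments (the excerpt above does not show how $Cmdlength is actually computed); `cmd_length` and `exceeds_limit` are made-up helper names:

```shell
#!/bin/sh
# Sketch only: the agent's real $Cmdlength calculation is not shown in the
# excerpt, so "pathname + one space + arguments" is an assumption.
THRESHOLD=78

# cmd_length PATHNAME ARGUMENTS -> prints the combined string length
cmd_length() {
    full="$1 $2"
    printf '%s\n' "${#full}"
}

# exceeds_limit PATHNAME ARGUMENTS -> exit status 0 if length >= THRESHOLD
exceeds_limit() {
    [ "$(cmd_length "$1" "$2")" -ge "$THRESHOLD" ]
}

# Values from this thread, with Arguments in its corrected form.
path="/connect_direct/ndm/bin/cdpmgr"
args="-i /connect_direct/ndm/cfg/aucdirect/initparm.cfg"

if exceeds_limit "$path" "$args"; then
    echo "length $(cmd_length "$path" "$args") is at or above $THRESHOLD"
else
    echo "length $(cmd_length "$path" "$args") is below $THRESHOLD"
fi
```

With the corrected Arguments value the combined string works out to 80 characters, and the value currently in main.cf is longer still, so under this assumption the warning would apply either way.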

VCS ERROR V-16-10031-9512 (inactive_nodename) Process:<myprocess> Process does not exist