Forum Discussion

stucci
9 years ago

Veritas Cluster does not start

Hi all,

We made some changes (OS Unix support) to the cluster configuration of our NetBackup master server, and now I am not able to start the NBU services.

I am not skilled in Veritas Cluster, so I would appreciate some pointers for troubleshooting.

Some information:

nbu master HARDWARE LINUX_RH_X86
                   VERSION NetBackup

I found this clear post

From /var/VRTSvcs/log/engine_A.log:

[root@sglmop21 /var/VRTSvcs/log]# tail -f engine_A.log
2016/02/17 09:09:27 VCS INFO V-16-1-10298 Resource nbu_server (Owner: unknown, Group: nbu_group) is online on sglmop21 (VCS initiated)
2016/02/17 09:09:27 VCS NOTICE V-16-1-10447 Group nbu_group is online on system sglmop21
2016/02/17 09:10:27 VCS INFO V-16-2-13716 (sglmop21) Resource(nbu_server): Output of the completed operation (monitor) 
Some Processes are DOWN while others are UP
Following Process are found DOWN: nbemm nbrb 
Following Process are found UP: nbevtmgr nbstserv vmd bprd bpdbm nbpem nbjm nbaudit nbsl nbrmms nbdisco NB_dbsrv 

2016/02/17 09:10:27 VCS ERROR V-16-2-13067 (sglmop21) Agent is calling clean for resource(nbu_server) because the resource became OFFLINE unexpectedly, on its own.
2016/02/17 09:14:04 VCS INFO V-16-2-13716 (sglmop21) Resource(nbu_server): Output of the completed operation (clean) 

Looking for NetBackup processes that need to be terminated.
Stopping nbcssc...
Stopping nbvault...
Stopping nbars...
Stopping nbim...
Stopping nbsl...
Stopping nbrmms...
Stopping nbstserv...
Stopping nbpem...
Stopping nbjm...
Suspending or cancelling selective jobs...
Stopping bprd...
Stopping bpcompatd...
Stopping bpdbm...

The following processes are still active
root     27114     1  0 09:09 ?        00:00:00 /usr/openv/netbackup/bin/nbstserv
root     27124 27114  0 09:09 ?        00:00:00 /usr/openv/netbackup/bin/bpdbm -upgrade_images
 They will be terminated....
Killing remaining processes...
Waiting for processes to terminate...
Waiting for processes to terminate...
Waiting for processes to terminate...
Waiting for processes to terminate...
Waiting for processes to terminate...
Waiting for processes to terminate...
Waiting for processes to terminate...
Waiting for processes to terminate...
Waiting for processes to terminate...
Waiting for processes to terminate...

Looking for Media Manager processes that need to be terminated.
Stopping ltid...
Error in Receiving User Message
Stopping vmd...

Looking for more NetBackup processes that need to be terminated.

The following processes are still active
root     27114     1  0 09:09 ?        00:00:00 /usr/openv/netbackup/bin/nbstserv
root     29183 27114  0 09:11 ?        00:00:00 /usr/openv/netbackup/bin/bpdbm -upgrade_images
 They will be terminated....
Killing remaining processes...
Waiting for processes to terminate...
Waiting for processes to terminate...
Waiting for processes to terminate...
Waiting for processes to terminate...
Waiting for processes to terminate...
Waiting for processes to terminate...
Waiting for processes to terminate...
Timeout for bp.kill_all command to complete.

2016/02/17 09:14:04 VCS ERROR V-16-2-13069 (sglmop21) Resource(nbu_server) - clean failed.
2016/02/17 09:15:05 VCS INFO V-16-2-13716 (sglmop21) Resource(nbu_server): Output of the completed operation (monitor) 
Some Processes are DOWN while others are UP
Following Process are found DOWN: nbevtmgr vmd bprd bpdbm nbpem nbjm nbaudit nbsl nbrmms nbdisco nbemm nbrb NB_dbsrv 
Following Process are found UP: nbstserv 
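
To compare the two monitor results in the excerpt above without reading them by eye, a small parser can help. This is only a sketch: the regular expressions are assumptions based on the line format visible in this excerpt, not on any documented VCS log grammar, and `parse_engine_log` is a made-up helper name.

```python
import re
from datetime import datetime

# Entry header as seen in the excerpt (assumed format):
# "2016/02/17 09:10:27 VCS ERROR V-16-2-13067 (sglmop21) ..."
ENTRY_RE = re.compile(
    r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) VCS (\w+) (V-\d+-\d+-\d+) (.*)$"
)
# Monitor output lines that list process names.
STATE_RE = re.compile(r"^Following Process are found (DOWN|UP):\s*(.*)$")


def parse_engine_log(lines):
    """Return one dict per log entry, with any DOWN/UP process sets
    reported in the output lines that follow the entry header."""
    entries = []
    for raw in lines:
        line = raw.rstrip()
        header = ENTRY_RE.match(line)
        if header:
            stamp, severity, msg_id, text = header.groups()
            entries.append({
                "time": datetime.strptime(stamp, "%Y/%m/%d %H:%M:%S"),
                "severity": severity,
                "id": msg_id,
                "text": text,
                "down": set(),
                "up": set(),
            })
            continue
        state = STATE_RE.match(line)
        if state and entries:
            # Attach the process list to the most recent entry.
            entries[-1][state.group(1).lower()] = set(state.group(2).split())
    return entries


# Lines copied from the engine_A.log excerpt in this post.
sample = [
    "2016/02/17 09:10:27 VCS INFO V-16-2-13716 (sglmop21) Resource(nbu_server): Output of the completed operation (monitor) ",
    "Some Processes are DOWN while others are UP",
    "Following Process are found DOWN: nbemm nbrb ",
    "Following Process are found UP: nbevtmgr nbstserv vmd bprd bpdbm nbpem nbjm nbaudit nbsl nbrmms nbdisco NB_dbsrv ",
    "2016/02/17 09:10:27 VCS ERROR V-16-2-13067 (sglmop21) Agent is calling clean for resource(nbu_server) because the resource became OFFLINE unexpectedly, on its own.",
    "2016/02/17 09:14:04 VCS ERROR V-16-2-13069 (sglmop21) Resource(nbu_server) - clean failed.",
    "2016/02/17 09:15:05 VCS INFO V-16-2-13716 (sglmop21) Resource(nbu_server): Output of the completed operation (monitor) ",
    "Following Process are found DOWN: nbevtmgr vmd bprd bpdbm nbpem nbjm nbaudit nbsl nbrmms nbdisco nbemm nbrb NB_dbsrv ",
    "Following Process are found UP: nbstserv ",
]

entries = parse_engine_log(sample)
first_down = entries[0]["down"]          # processes missing at the first monitor
still_up = entries[-1]["up"]             # processes surviving the clean
clean_seconds = (entries[2]["time"] - entries[1]["time"]).total_seconds()

print("Down before clean:", sorted(first_down))   # ['nbemm', 'nbrb']
print("Still up after clean:", sorted(still_up))  # ['nbstserv']
print("Clean ran for", clean_seconds, "seconds")  # 217.0
```

On this excerpt, the sketch shows that nbemm and nbrb were the only processes down before VCS called clean, and that nbstserv is the one process the clean could not stop.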

From /usr/openv/netbackup/bin/cluster/AGENT_DEBUG.log:

Wed Feb 17 09:09:12 2016 vcs/ Calling: /usr/openv/netbackup/bin/cluster/util/online /usr/openv/netbackup/bin/cluster/NBU_RSP

Wed Feb 17 09:10:27 2016 Clean: Calling: /usr/openv/netbackup/bin/cluster/util/offline /usr/openv/netbackup/bin/cluster/NBU_RSP
Wed Feb 17 09:10:27 2016 Clean: /usr/openv/netbackup/bin/cluster/util/offline returned with code = 25344
Wed Feb 17 09:10:27 2016 Clean: /usr/openv/netbackup/bin/cluster/util/offline /usr/openv/netbackup/bin/cluster/NBU_RSP exited with 99
Wed Feb 17 09:15:05 2016 Clean: Calling: /usr/openv/netbackup/bin/cluster/util/offline /usr/openv/netbackup/bin/cluster/NBU_RSP
Wed Feb 17 09:15:05 2016 Clean: /usr/openv/netbackup/bin/cluster/util/offline returned with code = 0
Wed Feb 17 09:15:05 2016 Clean: /usr/openv/netbackup/bin/cluster/util/offline /usr/openv/netbackup/bin/cluster/NBU_RSP exited with 0
Wed Feb 17 09:16:42 2016 vcs/ Calling: /usr/openv/netbackup/bin/cluster/util/online /usr/openv/netbackup/bin/cluster/NBU_RSP

Wed Feb 17 09:17:58 2016 Clean: Calling: /usr/openv/netbackup/bin/cluster/util/offline /usr/openv/netbackup/bin/cluster/NBU_RSP
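
The two numbers in the AGENT_DEBUG.log lines above ("returned with code = 25344" and "exited with 99") look like the same result in two forms. Assuming the first is a raw 16-bit wait status as returned by wait() or system() on Linux (the log itself does not say so), the exit code sits in the high byte. A minimal sketch, where `decode_wait_status` is a made-up helper name:

```python
def decode_wait_status(status):
    """Split a 16-bit POSIX-style wait status into (exit_code, signal).

    Assumption: the logged value is a raw wait status, not an exit code.
    """
    exit_code = (status >> 8) & 0xFF  # high byte: value the script exited with
    signal = status & 0x7F            # low 7 bits: terminating signal, 0 if none
    return exit_code, signal


print(decode_wait_status(25344))  # (99, 0)
print(decode_wait_status(0))      # (0, 0)
```

Under that assumption, 25344 and 99 describe a single failure of the offline script, not two separate errors.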

From Veritas Cluster Manager: (screenshot not included)


3 Replies

  • It seems to me that Veritas Cluster is up and running.
    NetBackup resource has a problem, not VCS.

    What kind of changes were done at OS level? 
    We need to know this as this might be the key to the 'broken state' of NBU.

    What is the status of the rest of the resources (disk group, mount, IP, etc.)?
    Does the IP resource resolve correctly to the NBU master server hostname?

    The error says that the resource 'became OFFLINE unexpectedly, on its own'.

    Why is that? Did anyone stop NBU manually, outside of the cluster?

    Because it was not VCS that stopped NBU, you need to find out what is wrong with NBU.

    VCS attempted to restart NBU twice, but that also failed.

    Have you tried running bpps while VCS is trying to start NBU?
    Can you see which processes are starting?
    Are certain processes starting and terminating again?
    Which processes are not starting up?

    Have you tried to online ALL resources except for NBU, and then start NBU manually from the command line?
    What happens when you do that?

  • If the issue is with NBU, please post in the NetBackup forum instead.
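
The suggestion in the first reply to run bpps while VCS is bringing NBU online can be partly automated by taking repeated snapshots and reporting which daemons appear or vanish between them. The sketch below assumes bpps lives at its usual /usr/openv/netbackup/bin/bpps location and prints ps-style lines like the ones in the clean output above. `process_names`, `diff_snapshots`, and `watch` are made-up helper names, and `watch` is meant to be run on the master server itself.

```python
import re
import subprocess
import time

BPPS = "/usr/openv/netbackup/bin/bpps"  # assumed install path; adjust if different


def process_names(ps_text):
    """Extract daemon names (the basename under /usr/openv/) from ps-style output."""
    names = set()
    for line in ps_text.splitlines():
        match = re.search(r"/usr/openv/\S*/([^/\s]+)", line)
        if match:
            names.add(match.group(1))
    return names


def diff_snapshots(before, after):
    """Return (started, stopped) daemon names between two snapshots."""
    return sorted(after - before), sorted(before - after)


def watch(interval=5, rounds=12):
    """Poll bpps and print every change in the set of running daemons."""
    previous = set()
    for _ in range(rounds):
        out = subprocess.run([BPPS], capture_output=True, text=True).stdout
        current = process_names(out)
        started, stopped = diff_snapshots(previous, current)
        if started or stopped:
            print(time.strftime("%H:%M:%S"), "started:", started, "stopped:", stopped)
        previous = current
        time.sleep(interval)


# Offline illustration using the two ps lines from the clean output in this post;
# the second snapshot is hypothetical (bpdbm gone, nbstserv still running).
snap1 = (
    "root     27114     1  0 09:09 ?        00:00:00 /usr/openv/netbackup/bin/nbstserv\n"
    "root     27124 27114  0 09:09 ?        00:00:00 /usr/openv/netbackup/bin/bpdbm -upgrade_images\n"
)
snap2 = "root     27114     1  0 09:09 ?        00:00:00 /usr/openv/netbackup/bin/nbstserv\n"

started, stopped = diff_snapshots(process_names(snap1), process_names(snap2))
print("started:", started, "stopped:", stopped)  # started: [] stopped: ['bpdbm']
```

Calling `watch()` right after VCS (or a manual start) launches NBU would show whether nbemm and nbrb ever start and which daemons exit again shortly afterwards.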