Solved: Netbackup Clustered Master Server

blanco_adalbert · 04-27-2014

Hello Everybody!!

I would like to know if somebody has faced this:

We are configuring a NetBackup clustered master server, and failover itself has worked: when we rebooted the active node, the NetBackup services failed over to the second node. The first node rebooted cleanly, but when we try to move the NetBackup services back to the first node, we get the following messages in the cluster log:

13:34:10.978 [5908] <4> NBClusterApp::runCommand: Entering, with command:/usr/openv/netbackup/bin/goodies/netbackup start

13:34:11.082 [5908] <4> NBClusterApp::runCommand: NetBackup Authentication daemon started.

13:34:11.143 [5908] <4> NBClusterApp::runCommand: NetBackup network daemon started.

13:34:11.188 [5908] <4> NBClusterApp::runCommand: NetBackup client daemon started.

13:34:12.221 [5908] <4> NBClusterApp::runCommand: NetBackup SAN Client Fibre Transport daemon started.

13:34:13.975 [5908] <4> NBClusterApp::runCommand: NetBackup Database Server started.

13:34:14.011 [5908] <4> NBClusterApp::runCommand: NetBackup Authorization daemon started.

13:34:14.715 [5908] <4> NBClusterApp::runCommand: NetBackup Event Manager started.

13:34:15.526 [5908] <4> NBClusterApp::runCommand: NetBackup Audit Manager started.

13:34:20.706 [5908] <4> NBClusterApp::runCommand: NetBackup Deduplication Manager started.

13:34:30.878 [5908] <4> NBClusterApp::runCommand: NetBackup Deduplication Engine started.

13:34:31.608 [5908] <4> NBClusterApp::runCommand: NetBackup Enterprise Media Manager started.

13:34:32.171 [5908] <4> NBClusterApp::runCommand: NetBackup Resource Broker started.

13:34:32.768 [5908] <4> NBClusterApp::runCommand: Media Manager daemons started.

13:34:33.693 [5908] <4> NBClusterApp::runCommand: NetBackup request daemon started.

13:34:34.080 [5908] <4> NBClusterApp::runCommand: NetBackup compatibility daemon started.

13:34:34.588 [5908] <4> NBClusterApp::runCommand: NetBackup Job Manager started.

13:34:35.373 [5908] <4> NBClusterApp::runCommand: NetBackup Policy Execution Manager started.

13:34:36.079 [5908] <4> NBClusterApp::runCommand: NetBackup Storage Lifecycle Manager started.

13:34:36.788 [5908] <4> NBClusterApp::runCommand: NetBackup Indexing Manager started.

13:34:37.612 [5908] <4> NBClusterApp::runCommand: NetBackup Remote Monitoring Management System started.

13:34:38.263 [5908] <4> NBClusterApp::runCommand: NetBackup Key Management daemon started.

13:34:39.286 [5908] <4> NBClusterApp::runCommand: NetBackup Service Layer started.

13:34:40.215 [5908] <4> NBClusterApp::runCommand: NetBackup Agent Request Server started.

13:34:40.933 [5908] <4> NBClusterApp::runCommand: NetBackup Bare Metal Restore daemon started.

13:34:41.446 [5908] <4> NBClusterApp::runCommand: NetBackup Vault daemon started.

13:34:41.509 [5908] <4> NBClusterApp::runCommand: NetBackup CloudStore Service Container started.

13:34:41.766 [5908] <4> NBClusterApp::runCommand: VCS

13:34:41.784 [5908] <4> NBClusterApp::runCommand: NetBackup Service Monitor started.

13:34:41.963 [5908] <4> NBClusterApp::runCommand: NetBackup Bare Metal Restore Boot Server daemon started.

13:34:41.964 [5908] <4> NBClusterApp::runCommand: Exiting

13:34:41.964 [5908] <4> NBClusterApp::startApp: Exiting

13:34:41.964 [5908] <4> Online::main: Starting Application completed with status 0.

13:34:43.176 [6241] <16> monitor:processStatus: Some Processes are DOWN while others are UP

13:34:43.176 [6241] <16> monitor:processStatus: Following Process are found DOWN: nbevtmgr nbstserv nbpem nbjm nbaudit nbsl nbrmms nbemm nbrb

13:34:43.176 [6241] <16> monitor:processStatus: Following Process are found UP: vmd bprd bpdbm NB_dbsrv

13:35:43.606 [6412] <16> monitor:processStatus: Some Processes are DOWN while others are UP

13:35:43.606 [6412] <16> monitor:processStatus: Following Process are found DOWN: nbevtmgr nbstserv nbpem nbjm nbaudit nbsl nbrmms nbemm nbrb

13:35:43.606 [6412] <16> monitor:processStatus: Following Process are found UP: vmd bprd bpdbm NB_dbsrv

13:36:43.814 [6599] <16> monitor:processStatus: Some Processes are DOWN while others are UP

13:36:43.815 [6599] <16> monitor:processStatus: Following Process are found DOWN: nbevtmgr nbstserv nbpem nbjm nbaudit nbsl nbrmms nbemm nbrb

13:36:43.815 [6599] <16> monitor:processStatus: Following Process are found UP: vmd bprd bpdbm NB_dbsrv

13:36:44.442 [6615] <4> Offline::main: Offline called with 2 Parameters

13:36:44.442 [6615] <4> Offline::main: Initializing NBCluster using /usr/openv/netbackup/bin/cluster/NBU_RSP file

13:36:44.443 [6615] <4> NBClusterApp::stopApp: Executing Command : /usr/openv/netbackup/bin/bpclusterkill -forcekill -timeout 120 15,TERM -verbose

13:36:44.839 [6616] <4> standard_shutdown:

13:36:44.839 [6616] <4> standard_shutdown: Looking for NetBackup processes that need to be terminated.

13:36:44.849 [6616] <4> standard_shutdown: Stopping nbcssc...

13:36:44.936 [6616] <4> standard_shutdown: Suspending or cancelling selective jobs...

13:36:46.106 [6616] <4> standard_shutdown: Stopping bprd...

13:37:06.140 [6616] <4> standard_shutdown: Stopping bpcompatd...

13:37:06.302 [6616] <4> standard_shutdown: Stopping bpdbm...

13:37:11.862 [6616] <4> standard_shutdown:

13:37:11.863 [6616] <4> standard_shutdown: The following processes are still active

13:37:11.865 [6616] <4> standard_shutdown: root 6258 1 0 13:34:56 ? 0:01 /usr/openv/netbackup/bin/admincmd/bpstsinfo -UPDATE

13:37:11.865 [6616] <4> standard_shutdown: root 6145 1 0 13:34:35 ? 0:00 /usr/openv/netbackup/bin/bpdbm

13:37:11.867 [6616] <4> standard_shutdown: They will be terminated....

13:37:11.869 [6616] <4> standard_shutdown: Killing remaining processes...

13:37:11.989 [6616] <4> standard_shutdown:

13:37:11.990 [6616] <4> standard_shutdown: Looking for Media Manager processes that need to be terminated.

13:37:11.994 [6616] <4> standard_shutdown: Stopping ltid...

13:38:45.000 [6616] <2> standard_shutdown: /usr/openv/netbackup/bin/bp.kill_all FORCEKILL 2>&1 < /dev/null has been running for 121 seconds. Timing out.

13:38:45.000 [6616] <16> bpclusterkill main: standard_shutdown() failed: 76.

13:39:00.001 [6616] <2> find_processes: total malloc-ed = 4096

13:39:00.012 [6616] <2> get_solaris_processes: 121 files considered, 119 files opened/read

13:39:00.012 [6616] <2> find_processes: progs_list_size = 4096 == total calculated size = 4096 == (total_space_used = 404 + whats_left = 3692)

13:39:00.013 [6616] <2> find_processes: realloc down to 404

13:39:00.013 [6616] <2> kill_process: PID 5925 /usr/openv/netbackup/bin/vnetd -standalone killed by signal 15.

13:39:00.013 [6616] <2> kill_process: PID 5985 /usr/openv/db//bin/NB_dbsrv @/usr/openv/var/global/server.conf @/usr/openv/var/ killed by signal 15.

13:39:00.013 [6616] <2> kill_process: PID 6101 vmd killed by signal 15.

13:39:00.013 [6616] <2> kill_process: PID 5928 /usr/openv/netbackup/bin/bpcd -standalone killed by signal 15.

13:39:02.013 [6616] <2> find_processes: total malloc-ed = 4096

13:39:02.019 [6616] <2> get_solaris_processes: 118 files considered, 116 files opened/read

13:39:02.019 [6616] <2> find_processes: progs_list_size = 4096 == total calculated size = 4096 == (total_space_used = 140 + whats_left = 3956)

13:39:02.019 [6616] <2> find_processes: realloc down to 140

13:39:02.019 [6616] <8> kill_stragglers: Process 5985 /usr/openv/db//bin/NB_dbsrv @/usr/openv/var/global/server.conf @/usr/openv/var/ still running.

13:39:02.019 [6616] <8> kill_stragglers: 1 processes still running.

13:39:02.019 [6616] <16> bpclusterkill main: kill_stragglers() failed: 11.

13:39:12.229 [6616] <4> bpclusterkill main: Stopping vmd...

13:39:32.476 [6616] <4> bpclusterkill main:

13:39:32.476 [6616] <4> bpclusterkill main: Looking for more NetBackup processes that need to be terminated.

13:39:37.601 [6616] <4> bpclusterkill main: Shutdown command completed with status 0.

13:39:37.606 [6615] <4> NBClusterApp::stopApp: returning status : 2816

13:39:37.606 [6615] <4> NBClusterApp::deleteLinks: Entering

13:39:37.607 [6615] <4> NBClusterApp::deleteLinks: Removing link /usr/openv/volmgr/misc/robotic_db

13:39:37.607 [6615] <4> NBClusterApp::deleteLinks: Removing link /usr/openv/netbackup/db

13:39:37.607 [6615] <4> NBClusterApp::deleteLinks: Removing link /usr/openv/netbackup/vault/sessions

13:39:37.608 [6615] <4> NBClusterApp::deleteLinks: Removing link /usr/openv/var/global

13:39:37.608 [6615] <4> NetBackupApp::postOffline: Entering

13:39:37.608 [6615] <4> NBClusterApp::runCommand: Entering, with command:/usr/openv/netbackup/bin/vnetd -standalone

13:39:37.656 [6615] <4> NBClusterApp::runCommand: Exiting

13:39:37.656 [6615] <4> NBClusterApp::runCommand: Entering, with command:/usr/openv/netbackup/bin/bpcd -standalone

13:39:37.709 [6615] <4> NBClusterApp::runCommand: Exiting

13:39:37.709 [6615] <4> NBClusterApp::stopApp: Exiting

13:39:37.709 [6615] <4> Offline::main: Stop completed with status 99.
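For anyone reading a log like the one above, the repeated monitor:processStatus entries are the key part: they show which daemons the cluster monitor keeps finding down before it takes the resource offline. A minimal Python sketch for summarizing them follows. The function name and the sample list are illustrative assumptions, not part of NetBackup; the regular expression only matches the message wording seen in this log.

```python
import re
from collections import Counter

# Matches the monitor lines exactly as they are worded in the log above, e.g.
# "... monitor:processStatus: Following Process are found DOWN: nbevtmgr nbpem"
STATUS_RE = re.compile(
    r"monitor:processStatus: Following Process are found (DOWN|UP):\s*(.*)$"
)

def summarize_monitor(lines):
    """Count how many monitor cycles reported each process as DOWN or UP."""
    counts = {"DOWN": Counter(), "UP": Counter()}
    for line in lines:
        match = STATUS_RE.search(line.strip())
        if match:
            state, names = match.groups()
            counts[state].update(names.split())
    return counts

# Two lines copied from the log above, used here as sample input.
sample = [
    "13:34:43.176 [6241] <16> monitor:processStatus: Following Process are found DOWN: "
    "nbevtmgr nbstserv nbpem nbjm nbaudit nbsl nbrmms nbemm nbrb",
    "13:34:43.176 [6241] <16> monitor:processStatus: Following Process are found UP: "
    "vmd bprd bpdbm NB_dbsrv",
]

summary = summarize_monitor(sample)
for state in ("DOWN", "UP"):
    print(state, sorted(summary[state]))
```

In this thread, every monitor cycle lists the same nine daemons as down, which suggests they share one missing dependency rather than failing separately.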

blanco_adalbert · 04-28-2014

Hello everybody,

I applied that technote, but it did not work.

So I opened a support case, and the problem was resolved.

Actually, the problem was that the pbx_exchange process did not start after the reboot. Symantec Support helped me check the environment and found that the pbx_exchange startup script was not linked under /etc/rc2.d (/etc/rc2.d/S20vxpbx_exchanged -> /etc/init.d/vxpbx_exchanged).

Thank you so much Symantec Support.
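A quick way to verify the condition described in this post is to check whether the rc start link exists and points at the init script. The following is a minimal Python sketch; the function name and the status strings are assumptions made for illustration, and only the two paths come from this thread.

```python
import os

def check_rc_link(link_path, expected_target):
    """Return a short status string describing the state of an rc start link."""
    if not os.path.lexists(link_path):
        return "missing"
    if not os.path.islink(link_path):
        # Reported separately: rc entries can also be hard links or plain copies.
        return "not-a-symlink"
    target = os.readlink(link_path)
    # Resolve a relative link target against the directory that holds the link.
    resolved = os.path.normpath(os.path.join(os.path.dirname(link_path), target))
    if resolved != os.path.normpath(expected_target):
        return "wrong-target"
    if not os.path.exists(link_path):
        return "dangling"
    return "ok"

# Paths as given in this thread; the result depends on the host it runs on.
print(check_rc_link("/etc/rc2.d/S20vxpbx_exchanged", "/etc/init.d/vxpbx_exchanged"))
```

A result of "missing" on a cluster node would match the situation found here, where the PBX startup entry was absent from /etc/rc2.d.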


sri_vani · 04-27-2014

Can you please tell us the NetBackup version?

Also check the reference link below. I think the solution will apply to you.

http://www.symantec.com/business/support/index?page=content&id=TECH209415

Cause

Rebooting a node in a VCS cluster that runs NetBackup (whether all nodes reboot at the same time or only the active node) can leave NetBackup in an inconsistent state. When a node reboots, the NetBackup start/stop scripts start NetBackup services outside of the cluster's control. Because some services are already up, the cluster assumes NetBackup is running and does not invoke the cluster online for the NetBackup resource. This leaves NetBackup partially started and requires manual intervention to recover if the cleanup scripts are not invoked.

Solution

This is a known issue that is being fixed in NetBackup release 7.5.0.7 (Symantec Etrack reference 3249871). The fix makes all services cluster-aware so that they only come online on the active node of the cluster.

As a workaround, move the NetBackup startup scripts out of the startup locations. For example, on Solaris: ' mv /etc/rc2.d/S77netbackup /etc/rc2.d/N77netbackup '.
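The rename in the workaround takes effect because the rc mechanism only runs entries whose names begin with S (start) or K (kill), so a script renamed to begin with N is skipped at boot. Below is a minimal Python sketch of the same rename with a dry-run default. The function name and its parameters are hypothetical, and it assumes the script name from the technote example.

```python
import os

def disable_rc_script(rc_dir, name="S77netbackup", dry_run=True):
    """Rename an rc start script so that it is no longer run at boot.

    Returns the (source, destination) pair. With dry_run=True nothing is renamed.
    """
    src = os.path.join(rc_dir, name)
    # Swap the leading "S" for "N", as in the technote's mv example.
    dst = os.path.join(rc_dir, "N" + name[1:])
    if not os.path.lexists(src):
        raise FileNotFoundError(src)
    if os.path.lexists(dst):
        raise FileExistsError(dst)
    if not dry_run:
        os.rename(src, dst)
    return src, dst
```

Calling disable_rc_script("/etc/rc2.d") reports what would be renamed; passing dry_run=False performs the rename and has to be repeated on every cluster node. Renaming the file back restores the original boot behavior.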
