Thanks for the responses, guys.
NetBackup version is 7.1
VCS version is 5.0 MP3
I have posted the available logs below. Just FYI, the cluster setup is similar on all the masters, but only this master is having this issue. The environment is heavily customized.
Agent_debug.log:
Fri Jul 24 20:54:35 2015 Start Offline.......
Fri Jul 24 20:57:37 2015 Start Offline.......
Fri Jul 24 21:08:39 2015 Start Offline.......
Sat Jul 25 05:15:41 2015 Start Offline.......
Sat Jul 25 05:18:43 2015 Start Offline.......
Sat Jul 25 05:20:45 2015 Start Offline.......
Sat Jul 25 05:23:47 2015 Start Offline.......
Sat Jul 25 05:26:49 2015 Start Offline.......
Sat Jul 25 05:28:50 2015 Start Offline.......
Sat Jul 25 05:31:20 2015 Start Online.......
Sun Jul 26 19:04:34 2015 Start Offline.......
Sun Jul 26 19:07:36 2015 Start Offline.......
Sun Jul 26 19:10:38 2015 Start Offline.......
Sun Jul 26 19:13:40 2015 Start Offline.......
nbu_rsp:
NBU_GROUP=$master
NODES=$nodes
SHARED_DISK=/usr/openv
VNAME=$master
PROBE_PROCS=vmd bprd bpdbm nbpem nbjm nbevtmgr nbemm nbrb NB_dbsrv nbaudit
CLUTYPE=VCS
PRODUCT_CODE=NBU
START_PROCS=NB_dbsrv nbevtmgr nbemm nbrb ltid vmd bpcompatd nbjm nbpem nbstserv nbrmms nbsl nbvault nbsvcmon bpdbm bprd bptm bpbrmds bpsched bpcd bpversion bpjobd nbproxy vltcore acsd tl8cd odld tldcd tl4d tlmd tshd rsmd tlhcd pbx_exchange nbkms nbaudit nbatd nbazd
DIR=kms mv
nbu_types_cf -- No such file (FYI, the environment is customized)
engine log:
2015/07/24 21:09:40 VCS ERROR V-16-2-13006 (server node) Resource(NetBackup_$master): clean procedure did not complete
within the expected time.
2015/07/24 21:11:26 VCS INFO V-16-2-13026 (server node) Resource(NetBackup_$master) - monitor procedure finished successfully after failing to complete within the expected time for (4) consecutive times.
2015/07/25 01:50:52 VCS INFO V-16-1-50135 User root fired command: haconf -dump from localhost
2015/07/25 05:09:43 VCS ERROR V-16-2-13027 (server node) Resource(NetBackup_$master) - monitor procedure did not complete within the expected time.
2015/07/25 05:15:41 VCS ERROR V-16-2-13210 (server node) Agent is calling clean for resource(NetBackup_$master) because
4 successive invocations of the monitor procedure did not complete within the expected time.
2015/07/25 05:16:42 VCS ERROR V-16-2-13006 (server node) Resource(NetBackup_$master): clean procedure did not complete
within the expected time.
2015/07/25 05:20:45 VCS INFO V-16-2-13001 (server node) Resource(NetBackup_$master): Output of the completed operation
(monitor)
Some Processes are DOWN while others are UP
Following Process are found DOWN: nbjm
Following Process are found UP: vmd bprd bpdbm nbpem nbevtmgr nbemm nbrb NB_dbsrv nbaudit
2015/07/25 05:20:45 VCS INFO V-16-2-13026 (server node) Resource(NetBackup_$master) - monitor procedure finished successfully after failing to complete within the expected time for (5) consecutive times.
2015/07/25 05:20:45 VCS ERROR V-16-2-13067 (server node) Agent is calling clean for resource(NetBackup_$master) because
the resource became OFFLINE unexpectedly, on its own.
2015/07/25 05:21:46 VCS ERROR V-16-2-13006 (server node) Resource(NetBackup_$master): clean procedure did not complete
within the expected time.
2015/07/25 05:22:47 VCS INFO V-16-2-13001 (server node) Resource(NetBackup_$master): Output of the completed operation
(monitor)
Some Processes are DOWN while others are UP
Following Process are found DOWN: bprd nbjm
Following Process are found UP: vmd bpdbm nbpem nbevtmgr nbemm nbrb NB_dbsrv nbaudit
2015/07/25 05:23:47 VCS INFO V-16-2-13001 (server node) Resource(NetBackup_$master): Output of the completed operation
(monitor)
Some Processes are DOWN while others are UP
Following Process are found DOWN: bprd nbpem nbjm
Following Process are found UP: vmd bpdbm nbevtmgr nbemm nbrb NB_dbsrv nbaudit
2015/07/25 05:24:48 VCS INFO V-16-2-13003 (server node) Resource(NetBackup_$master): Output of the timed out operation
(clean)
Looking for NetBackup processes that need to be terminated.
Stopping nbpem...
Stopping nbproxy...
Stopping bpcompatd...
Stopping bpdbm...
The following processes are still active
root 3599 1 0 Jul24 ? 00:06:05 /usr/openv/netbackup/bin/bpdbm
root 3605 3599 3 Jul24 ? 00:32:37 /usr/openv/netbackup/bin/bpjobd
root 5370 1 0 Jul24 ? 00:00:05 /usr/openv/netbackup/bin/nbproxy dblib nbpem_email
root 10592 24181 0 05:22 ? 00:00:00 /usr/openv/netbackup/bin/admincmd/bpdbjobs -cancel 1108528
root 10595 24222 0 05:22 ? 00:00:00 /usr/openv/netbackup/bin/admincmd/bpdbjobs -summary -ignore_parent_jobs -all_columns
root 10637 24185 0 05:22 ? 00:00:00 /usr/openv/net
2015/07/25 05:25:49 VCS INFO V-16-2-13001 (server node) Resource(NetBackup_$master): Output of the completed operation
(monitor)
Some Processes are DOWN while others are UP
Following Process are found DOWN: vmd bprd bpdbm nbpem nbjm
Following Process are found UP: nbevtmgr nbemm nbrb NB_dbsrv nbaudit
2015/07/25 05:26:49 VCS INFO V-16-2-13001 (server node) Resource(NetBackup_$master): Output of the completed operation
(monitor)
Some Processes are DOWN while others are UP
Following Process are found DOWN: vmd bprd bpdbm nbpem nbjm
Following Process are found UP: nbevtmgr nbemm nbrb NB_dbsrv nbaudit
2015/07/25 05:27:49 VCS INFO V-16-2-13001 (server node) Resource(NetBackup_$master): Output of the completed operation
(clean)
Looking for NetBackup processes that need to be terminated.
Looking for Media Manager processes that need to be terminated.
Looking for more NetBackup processes that need to be terminated.
Stopping nbrb...
Stopping nbemm...
Stopping nbaudit...
Stopping nbevtmgr...
Stopping nbazd...
Stopping VxDBMS database server ...
Stopping bpcd...
Stopping vnetd...
Stopping nbatd...
/usr/openv/netbackup/bin/bp.kill_all FORCEKILL 2>&1 < /dev/null succeeded.
2015/07/25 05:29:22 VCS INFO V-16-2-13001 (server node) Resource(NetBackup_$master): Output of the completed operation
(clean)
Looking for NetBackup processes that need to be terminated.
Looking for Media Manager processes that need to be terminated.
Looking for more NetBackup processes that need to be terminated.
Stopping bpcd...
Stopping vnetd...
/usr/openv/netbackup/bin/bp.kill_all FORCEKILL 2>&1 < /dev/null succeeded.
2015/07/25 05:29:22 VCS INFO V-16-2-13078 (server node) Resource(NetBackup_$master) - clean completed successfully after 3 failed attempts.
2015/07/25 05:29:22 VCS ERROR V-16-2-13073 (server node) Resource(NetBackup_$master) became OFFLINE unexpectedly on its
own. Agent is restarting (attempt number 1 of 2) the resource.
2015/07/25 05:31:33 VCS INFO V-16-2-13001 (server node) Resource(NetBackup_$master): Output of the completed operation
(online)
no new style logging available
2015/07/25 05:31:34 VCS NOTICE V-16-2-13076 (server node) Agent has successfully restarted resource(NetBackup_$master).
2015/07/25 10:21:34 VCS ERROR V-16-2-13027 (server node) Resource(NetBackup_$master) - monitor procedure did not complete within the expected time.
2015/07/25 10:26:56 VCS INFO V-16-2-13026 (server node) Resource(NetBackup_$master) - monitor procedure finished successfully after failing to complete within the expected time for (3) consecutive times.
2015/07/25 17:12:44 VCS INFO V-16-1-50135 User root fired command: haconf -dump from localhost
2015/07/25 17:14:58 VCS INFO V-16-1-50135 User root fired command: haconf -dump from localhost
2015/07/26 00:45:59 VCS INFO V-16-2-13001 (server node) Resource(NetBackup_$master): Output of the completed operation
(monitor)
do_ypcall: clnt_call: RPC: Timed out
2015/07/26 08:40:28 VCS INFO V-16-1-50135 User root fired command: haconf -dump from localhost
2015/07/26 08:42:36 VCS INFO V-16-1-50135 User root fired command: haconf -dump from localhost
NetBackup resource log (VCS):
2015/07/24 21:11:26 VCS INFO V-16-2-13026 Thread(4133485456) Resource(NetBackup_$master) - monitor procedure finished
successfully after failing to complete within the expected time for (4) consecutive times.
2015/07/25 05:09:42 VCS WARNING V-16-2-13139 Thread(4156165008) Canceling thread (4133485456)
2015/07/25 05:09:43 VCS ERROR V-16-2-13027 Thread(4145675152) Resource(NetBackup_$master) - monitor procedure did not
complete within the expected time.
2015/07/25 05:11:40 VCS WARNING V-16-2-13139 Thread(4156165008) Canceling thread (4145675152)
2015/07/25 05:13:40 VCS WARNING V-16-2-13139 Thread(4156165008) Canceling thread (4133485456)
2015/07/25 05:15:40 VCS WARNING V-16-2-13139 Thread(4156165008) Canceling thread (4145675152)
2015/07/25 05:15:41 VCS ERROR V-16-2-13210 Thread(4133485456) Agent is calling clean for resource(NetBackup_$master)
because 4 successive invocations of the monitor procedure did not complete within the expected time.
2015/07/25 05:16:41 VCS WARNING V-16-2-13139 Thread(4156165008) Canceling thread (4133485456)
2015/07/25 05:16:42 VCS ERROR V-16-2-13006 Thread(4145675152) Resource(NetBackup_$master): clean procedure did not complete within the expected time.
2015/07/25 05:18:42 VCS WARNING V-16-2-13139 Thread(4156165008) Canceling thread (4145675152)
2015/07/25 05:19:43 VCS WARNING V-16-2-13139 Thread(4156165008) Canceling thread (4133485456)
2015/07/25 05:20:45 VCS INFO V-16-2-13026 Thread(4145675152) Resource(NetBackup_$master) - monitor procedure finished
successfully after failing to complete within the expected time for (5) consecutive times.
2015/07/25 05:20:45 VCS ERROR V-16-2-13067 Thread(4145675152) Agent is calling clean for resource(NetBackup_$master)
because the resource became OFFLINE unexpectedly, on its own.
2015/07/25 05:21:45 VCS WARNING V-16-2-13139 Thread(4156165008) Canceling thread (4145675152)
2015/07/25 05:21:46 VCS ERROR V-16-2-13006 Thread(4133485456) Resource(NetBackup_$master): clean procedure did not complete within the expected time.
2015/07/25 05:24:47 VCS WARNING V-16-2-13139 Thread(4156165008) Canceling thread (4133485456)
2015/07/25 05:27:49 VCS WARNING V-16-2-13139 Thread(4156165008) Canceling thread (4145675152)
2015/07/25 05:29:22 VCS ERROR V-16-2-13078 Thread(4133485456) Resource(NetBackup_$master) - clean completed successfully after 3 failed attempts.
2015/07/25 05:29:22 VCS ERROR V-16-2-13073 Thread(4133485456) Resource(NetBackup_$master) became OFFLINE unexpectedly
on its own. Agent is restarting (attempt number 1 of 2) the resource.
2015/07/25 10:21:33 VCS WARNING V-16-2-13139 Thread(4156165008) Canceling thread (4133485456)
2015/07/25 10:21:34 VCS ERROR V-16-2-13027 Thread(4145675152) Resource(NetBackup_$master) - monitor procedure did not
complete within the expected time.
2015/07/25 10:23:33 VCS WARNING V-16-2-13139 Thread(4156165008) Canceling thread (4145675152)
2015/07/25 10:25:33 VCS WARNING V-16-2-13139 Thread(4156165008) Canceling thread (4133485456)
2015/07/25 10:26:56 VCS INFO V-16-2-13026 Thread(4145675152) Resource(NetBackup_$master) - monitor procedure finished
successfully after failing to complete within the expected time for (3) consecutive times.
2015/07/26 18:38:33 VCS WARNING V-16-2-13139 Thread(4156165008) Canceling thread (4145675152)
2015/07/26 18:38:34 VCS ERROR V-16-2-13027 Thread(4133485456) Resource(NetBackup_$master) - monitor procedure did not
complete within the expected time.
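In case it helps anyone skimming these logs, here is a rough Python sketch that tallies the VCS message IDs so the monitor and clean timeouts stand out. It is my own helper, not part of VCS or NetBackup; the names `RECORD`, `tally_messages` and `sample` are made up for illustration, and it assumes each record starts with a timestamp followed by `VCS <severity> <message ID>`, as in the excerpts above.

```python
import re
from collections import Counter

# Start of a VCS log record, e.g.
# "2015/07/25 05:09:43 VCS ERROR V-16-2-13027 ..."
# (format assumed from the excerpts in this post)
RECORD = re.compile(
    r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) VCS (\w+) (V-\d+-\d+-\d+)"
)

def tally_messages(lines):
    """Count (severity, message ID) pairs; lines that do not start a record are skipped."""
    counts = Counter()
    for line in lines:
        match = RECORD.match(line)
        if match:
            _, severity, msg_id = match.groups()
            counts[(severity, msg_id)] += 1
    return counts

# A few records taken from the engine log in this post.
sample = [
    "2015/07/25 05:09:43 VCS ERROR V-16-2-13027 (server node) Resource(NetBackup_$master) - monitor procedure did not complete within the expected time.",
    "2015/07/25 05:20:45 VCS INFO V-16-2-13001 (server node) Resource(NetBackup_$master): Output of the completed operation (monitor)",
    "Some Processes are DOWN while others are UP",
    "Following Process are found DOWN: nbjm",
    "2015/07/25 10:21:34 VCS ERROR V-16-2-13027 (server node) Resource(NetBackup_$master) - monitor procedure did not complete within the expected time.",
]

for (severity, msg_id), count in tally_messages(sample).most_common():
    print(severity, msg_id, count)
```

To run it against a real file, pass the open file object to `tally_messages` instead of `sample`.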
@Marianne:
We aren't taking a Vault catalog backup, just the regular image catalog backup. We have been trying to resize the catalog partition for 2 weeks, but because the catalog backup has been unsuccessful we are sitting tight.