Forum Discussion

Amar_Rajan
Level 3
10 years ago
Solved

Netbackup processes going down

Hi all, the nbvault process goes down every night, though not at the same time each night, and this causes my catalog backup to fail with status 150. I have checked various logs but can't find any clue. Could you please help me with what needs to be checked and where?

 

Also, on some days all the processes go down and come back up by themselves, and as a result all the backups are killed with EC 50. Again I can't find any cause, except in the VCS engine logs, which say:

*******************************************************log SNIP***********************

"clean procedure did not complete within the expected time"

"monitor procedure finished successfully after failing to complete within the expected time for (8) consecutive times"

"Agent is calling clean for resource(NetBackup_$service) because  4 successive invocations of the monitor procedure did not complete within the expected time"

Some Processes are DOWN while others are UP
Following Process are found DOWN: bprd nbjm
Following Process are found UP: vmd bpdbm nbpem nbevtmgr nbemm nbrb NB_dbsrv nbaudit

Looking for NetBackup processes that need to be terminated.
Stopping nbpem...
Stopping nbproxy...
Stopping bpcompatd...
Stopping bpdbm...

 

*************************************************************************

Could you guys please help.

 

 

  • We see that someone has changed the Critical attribute to 0.

    This means that VCS will NOT take the resource down when the MonitorTimeout is exceeded.

    But it seems that someone or something other than VCS has killed some monitored NBU processes:

    2015/07/25 05:26:49 VCS INFO V-16-2-13001 (server node) Resource(NetBackup_$master): Output of the completed operation
    (monitor)
    Some Processes are DOWN while others are UP
    Following Process are found DOWN: vmd bprd bpdbm nbpem nbjm
    Following Process are found UP: nbevtmgr nbemm nbrb NB_dbsrv nbaudit

    This is why the CLEAN entry point was called: 


    2015/07/25 05:20:45 VCS ERROR V-16-2-13067 Thread(4145675152) Agent is calling clean for resource(NetBackup_$master)
    because the resource became OFFLINE unexpectedly, on its own.

     

    You need to enable additional logging as per Martin's suggestion and also check /var/adm/messages for this date and time. 

    Ensure logging is enabled for processes that were found to be DOWN.
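
To decide which processes need logging enabled, it can help to pull every "found DOWN" report out of the engine log together with its timestamp. The following is only a rough sketch in Python, not a Veritas tool: it assumes the log lines look exactly like the excerpts quoted above, and the names `down_events`, `TS_RE` and `DOWN_RE` are made up for illustration.

```python
import re
from collections import Counter

# Assumed line shapes, taken from the excerpts quoted in this thread:
#   2015/07/25 05:26:49 VCS INFO V-16-2-13001 ... Output of the completed operation
#   Following Process are found DOWN: vmd bprd bpdbm nbpem nbjm
TS_RE = re.compile(r"^\s*(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) VCS ")
DOWN_RE = re.compile(r"Following Process are found DOWN:\s*(.*)$")


def down_events(lines):
    """Yield (timestamp, [process names]) for each DOWN report.

    The timestamp is the most recent VCS-stamped line seen before the
    report, because the monitor output itself carries no timestamp.
    """
    last_ts = None
    for line in lines:
        ts_match = TS_RE.match(line)
        if ts_match:
            last_ts = ts_match.group(1)
        down_match = DOWN_RE.search(line)
        if down_match:
            yield last_ts, down_match.group(1).split()


# Sample input copied from the log excerpt above.
sample = """\
2015/07/25 05:26:49 VCS INFO V-16-2-13001 (server node) Resource(NetBackup_$master): Output of the completed operation
(monitor)
Some Processes are DOWN while others are UP
Following Process are found DOWN: vmd bprd bpdbm nbpem nbjm
Following Process are found UP: nbevtmgr nbemm nbrb NB_dbsrv nbaudit
"""

events = list(down_events(sample.splitlines()))
# Count how often each process was reported DOWN across all events.
counts = Counter(proc for _, procs in events for proc in procs)

for ts, procs in events:
    print(ts, "DOWN:", ", ".join(procs))
```

To run it against a real log, replace `sample.splitlines()` with an open file handle on the engine log (often /var/VRTSvcs/log/engine_A.log, but check your installation). The timestamps it prints are the times to look up in /var/adm/messages, and `counts` shows which processes are reported DOWN most often.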
