
Vault generates zombie processes

Anton_Panyushki
Level 6
Certified
Sometimes NetBackup Vault spawns a number of child processes that appear to be zombie (defunct) processes.
Has anyone faced this issue before?



# ptree 20904
20904 bpbrmvlt -bt 1238644819 -jobid 10601298 -jobgrpid 10601298 -masterversion 60000
  20908 /usr/openv/netbackup/bin/vltrun 1/lib2-vault/vlt-mc-bc-daily-dup -jobid 1060129
    2415  <defunct>
    20982 <defunct>
    26081 <defunct>
    2581  <defunct>
    26887 <defunct>
    21137 /usr/openv/netbackup/bin/vltrun 1/lib2-vault/vlt-mc-bc-daily-dup -jobid 1060129
    4929  /usr/openv/netbackup/bin/admincmd/bpduplicate -dstunit mediasrv2-hcart2-robot
    14917 /usr/openv/netbackup/bin/admincmd/bpduplicate -dstunit dr-mediasrv-hcart2-rob
    22933 /usr/openv/netbackup/bin/admincmd/bpduplicate -dstunit mediasrv2-hcart2-robot
    23319 /usr/openv/netbackup/bin/admincmd/bpduplicate -dstunit mediasrv2-hcart2-robot
    28582 /usr/openv/netbackup/bin/admincmd/bpduplicate -dstunit dr-mediasrv5-hcart2-ro
    29543 /usr/openv/netbackup/bin/admincmd/bpduplicate -dstunit dr-mediasrv4-hcart2-ro
    6166  /usr/openv/netbackup/bin/admincmd/bpduplicate -dstunit dr-mediasrv5-hcart2-ro
    11587 /usr/openv/netbackup/bin/admincmd/bpduplicate -dstunit mediasrv2-hcart2-robot
    14737 /usr/openv/netbackup/bin/admincmd/bpduplicate -dstunit dr-mediasrv4-hcart2-ro
    17705 /usr/openv/netbackup/bin/admincmd/bpduplicate -dstunit dr-mediasrv-hcart2-rob
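In the listing above, the defunct children hang off the vltrun process (PID 20908). A defunct process has already exited and only holds its process-table slot until its parent reaps it with wait(), so killing the zombie itself does nothing. The useful questions are which program each child was and why the parent has not collected it. The Python sketch below groups zombie PIDs by parent PID. It assumes a `ps` that accepts `-o pid= -o ppid= -o s=` (Solaris and Linux procps do), and the function names are hypothetical, not part of NetBackup.

```python
import subprocess
from collections import defaultdict


def zombies_by_parent(ps_output):
    """Group zombie PIDs by parent PID.

    Expects lines of "PID PPID STATE"; a state of "Z" marks a zombie.
    """
    groups = defaultdict(list)
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        pid, ppid, state = fields[0], fields[1], fields[2]
        if state.startswith("Z"):
            groups[int(ppid)].append(int(pid))
    return dict(groups)


def list_zombies():
    """Run ps on this host and return zombie PIDs grouped by parent PID."""
    # Assumption: ps supports the "s" (state) column with empty headers.
    result = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "ppid=", "-o", "s="],
        capture_output=True, text=True, check=True,
    )
    return zombies_by_parent(result.stdout)
```

Run on the master server while the problem is visible, `list_zombies()` should show 20908 as the parent holding the defunct children.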

reson8
Level 4

Looks like you are running a NetBackup 6.0 Master server. Are you running any maintenance packs?

You might want to try to determine which process is leaving the defunct processes behind. My bet would be bpduplicate or, less likely, the vltrun process.

Create the log directories bpduplicate, vault, and admin under the /usr/openv/netbackup/logs directory.
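A minimal Python sketch of this step, assuming the default UNIX install path quoted in the post. `create_log_dirs` is a hypothetical helper, and ownership or permissions may need adjusting for your site.

```python
from pathlib import Path

# Default UNIX install location and the directory names from the post.
LOG_BASE = Path("/usr/openv/netbackup/logs")
LOG_DIRS = ("bpduplicate", "vault", "admin")


def create_log_dirs(base=LOG_BASE, names=LOG_DIRS):
    """Create each log directory under base if it is missing; return the paths."""
    paths = []
    for name in names:
        path = Path(base) / name
        # exist_ok makes the helper safe to re-run.
        path.mkdir(parents=True, exist_ok=True)
        paths.append(path)
    return paths
```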

Increase the verbose logging level to 5 in bp.conf and restart the NBU daemons.
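Editing bp.conf by hand works fine; the sketch below does the same in Python, assuming bp.conf lives at /usr/openv/netbackup/bp.conf and uses `KEY = value` lines. It drops any existing VERBOSE entries, appends `VERBOSE = 5` at the end, and keeps a `.bak` copy of the original. The helper names are hypothetical, and the daemons still have to be restarted afterwards.

```python
import re
from pathlib import Path

# Assumed default location of bp.conf on a UNIX master server.
BP_CONF = Path("/usr/openv/netbackup/bp.conf")
# Matches an existing "VERBOSE = n" entry (leading whitespace allowed).
VERBOSE_LINE = re.compile(r"\s*VERBOSE\s*=")


def set_verbose(conf_text, level=5):
    """Return bp.conf text with any VERBOSE entries replaced by one new entry."""
    lines = [line for line in conf_text.splitlines()
             if not VERBOSE_LINE.match(line)]
    lines.append(f"VERBOSE = {level}")
    return "\n".join(lines) + "\n"


def update_bp_conf(path=BP_CONF, level=5):
    """Rewrite bp.conf in place, keeping the original as bp.conf.bak."""
    path = Path(path)
    original = path.read_text()
    path.with_name(path.name + ".bak").write_text(original)
    path.write_text(set_verbose(original, level))
```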

Once you have some defunct processes, try grepping for their process IDs in those three log directories.
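A rough Python equivalent of that grep, which matches each PID as a whole number so that 2415 does not also match 24150. It assumes only that the logs record PIDs as plain decimal numbers (no particular log line format is assumed), and `find_pid_mentions` is a hypothetical name.

```python
import re
from pathlib import Path


def find_pid_mentions(log_dirs, pids):
    """Yield (file, line_number, line) for every log line that mentions a PID.

    Each PID is matched as a whole number, so 2415 does not match 24150.
    """
    pids = [str(int(p)) for p in pids]
    if not pids:
        return  # nothing to search for
    pattern = re.compile(r"(?<!\d)(?:" + "|".join(pids) + r")(?!\d)")
    for log_dir in map(Path, log_dirs):
        if not log_dir.is_dir():
            continue  # skip directories that were never created
        for path in sorted(log_dir.iterdir()):
            if not path.is_file():
                continue
            # Read as text; replace undecodable bytes rather than failing.
            with path.open(errors="replace") as fh:
                for number, line in enumerate(fh, start=1):
                    if pattern.search(line):
                        yield path, number, line.rstrip("\n")
```

For example, pass the bpduplicate, vault, and admin log directories together with the defunct PIDs from the ptree listing (2415, 20982, 26081, 2581, 26887) and print each hit as `file:line: text`.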

This should at least start you on the right path.

Anton_Panyushki
Level 6
Certified
Yes, you are right. They really do appear to be bpduplicate processes that terminated abnormally with status code 50. According to the bpduplicate log, the termination was due to an ltid failure on the media server.

I have also noticed that although a vault job sometimes appears in the Activity Monitor as terminated with status code 150, it still keeps running. It looks like a second attempt of the vault job.

I wonder what causes this false termination of the vault job.

reson8
Level 4

I would download the Vault patch and read the readme file. It lists the ETRACKs for identified issues; if your issue is listed, the direction you want to take is to upgrade your Master and Media servers.

Otherwise, you may want to open a support ticket with SYMC and provide your findings. It might be an issue someone else has already seen, in which case SYMC support may be able to provide an updated binary.