Some parent jobs stay stuck after their child jobs finish

Hi everybody,

For about a week I have been seeing odd behavior in Activity Monitor: some parent jobs stay in progress and never finish, even though their child jobs completed successfully. The only way to get rid of them is to terminate them forcibly.

Any ideas?

Feb 17, 2019 9:38:26 AM - Info nbjm (pid=872) starting backup job (jobid=3756) for client , policy Catalog-Tape, schedule Full
Feb 17, 2019 9:38:26 AM - Info nbjm (pid=872) requesting CATALOG_BACKUP_RESOURCE resources from RB for backup job (jobid=3756, request id:{6DA0215D-799B-4F49-82A4-B5E2D20AB7AA})

Feb 17, 2019 9:38:26 AM - requesting resource .NBU_CATALOG.MAXJOBS

Feb 17, 2019 9:38:26 AM - granted resource .NBU_CATALOG.MAXJOBS

Feb 17, 2019 9:38:26 AM - estimated 0 kbytes needed

Feb 17, 2019 9:38:26 AM - begin Parent Job

Feb 17, 2019 9:38:26 AM - begin Catalog Backup: Start Notify Script

Feb 17, 2019 9:38:26 AM - Info RUNCMD (pid=8368) started

Feb 17, 2019 9:38:26 AM - Info RUNCMD (pid=8368) exiting with status: 0

Operation Status: 0

Feb 17, 2019 9:38:26 AM - end Catalog Backup: Start Notify Script; elapsed time 0:00:00

Feb 17, 2019 9:38:26 AM - begin Catalog Backup: Database Manager Query

Operation Status: 0

Feb 17, 2019 9:43:13 AM - end Catalog Backup: Database Manager Query; elapsed time 0:04:47

Feb 17, 2019 9:43:13 AM - begin Catalog Backup: Validate Image

Operation Status: 0

Feb 17, 2019 9:43:13 AM - end Catalog Backup: Validate Image; elapsed time 0:00:00

Feb 17, 2019 9:43:13 AM - begin Catalog Backup: End Notify Script

Feb 17, 2019 9:43:13 AM - Info RUNCMD (pid=10216) started

Feb 17, 2019 9:43:13 AM - Info RUNCMD (pid=10216) exiting with status: 0
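
For anyone comparing job details like the ones above, one quick way to see where a parent job stopped reporting is to pair each "begin" line with its "end" line. The following is a minimal Python sketch, not a NetBackup tool. The regular expression assumes only the timestamp and begin/end layout visible in the pasted log, and the names `unmatched_steps` and `SAMPLE` are made up for illustration.

```python
import re

# Assumed layout, taken from the pasted job details:
#   "<Mon> <d>, <yyyy> <h>:<mm>:<ss> AM|PM - begin|end <step>[; <extra>]"
STEP_RE = re.compile(
    r"^\w{3} \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M - "
    r"(?P<kind>begin|end) (?P<step>.+?)(?:;.*)?$"
)


def unmatched_steps(lines):
    """Return the steps that have a 'begin' line but no later 'end' line."""
    open_steps = []
    for line in lines:
        match = STEP_RE.match(line.strip())
        if not match:
            continue  # Info lines, "Operation Status" lines, blanks, etc.
        step = match.group("step").strip()
        if match.group("kind") == "begin":
            open_steps.append(step)
        elif step in open_steps:
            open_steps.remove(step)
    return open_steps


# Begin/end lines copied from the job details in this post.
SAMPLE = """\
Feb 17, 2019 9:38:26 AM - begin Parent Job
Feb 17, 2019 9:38:26 AM - begin Catalog Backup: Start Notify Script
Feb 17, 2019 9:38:26 AM - Info RUNCMD (pid=8368) exiting with status: 0
Feb 17, 2019 9:38:26 AM - end Catalog Backup: Start Notify Script; elapsed time 0:00:00
Feb 17, 2019 9:38:26 AM - begin Catalog Backup: Database Manager Query
Feb 17, 2019 9:43:13 AM - end Catalog Backup: Database Manager Query; elapsed time 0:04:47
Feb 17, 2019 9:43:13 AM - begin Catalog Backup: Validate Image
Feb 17, 2019 9:43:13 AM - end Catalog Backup: Validate Image; elapsed time 0:00:00
Feb 17, 2019 9:43:13 AM - begin Catalog Backup: End Notify Script
Feb 17, 2019 9:43:13 AM - Info RUNCMD (pid=10216) exiting with status: 0
"""

if __name__ == "__main__":
    for step in unmatched_steps(SAMPLE.splitlines()):
        print("no 'end' line for:", step)
```

On this excerpt the sketch leaves "Parent Job" and "Catalog Backup: End Notify Script" open. That only shows where the pasted details stop; it does not by itself explain why the job hangs.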


Re: some of parent jobs stuck after their child finish

Can you please post the details of the successfully completed child job so we can understand this better? If required, verbose logging should also be enabled to check how the processes are communicating.

Do you notice any other performance-related issues apart from the one reported?

How are the performance tuning parameters set in the environment?

 

Re: some of parent jobs stuck after their child finish

@sri_vani 

Policy name: Site10.0.4.228

Type: Windows

No attributes are selected in the policy properties except "multiple data streams".

The "Backup Selections" tab contains two UNC paths.

Child job Details:

---

Feb 18, 2019 8:36:39 PM - Info nbjm (pid=4392) starting backup job (jobid=3798) for client srv83, policy Site10.0.4.228, schedule Diff-Inc

Feb 18, 2019 8:36:40 PM - estimated 11374987 kbytes needed

Feb 18, 2019 8:36:40 PM - Info nbjm (pid=4392) started backup (backupid=srv83_1550509599) job for client srv83, policy Site10.0.4.228, schedule Diff-Inc on storage unit 10-hcart-robot-tld-0

Feb 18, 2019 8:36:40 PM - started process bpbrm (pid=6444)

Feb 18, 2019 8:36:52 PM - Info bpbrm (pid=6444) srv83 is the host to backup data from

Feb 18, 2019 8:36:52 PM - Info bpbrm (pid=6444) reading file list for client

Feb 18, 2019 8:36:58 PM - connecting

Feb 18, 2019 8:37:05 PM - Info bpbrm (pid=6444) starting bpbkar32 on client

Feb 18, 2019 8:37:05 PM - connected; connect time: 0:00:00

Feb 18, 2019 8:37:15 PM - Info bpbkar32 (pid=19136) Backup started

Feb 18, 2019 8:37:15 PM - Info bpbkar32 (pid=19136) change time comparison:<disabled>

Feb 18, 2019 8:37:15 PM - Info bpbkar32 (pid=19136) archive bit processing:<enabled>

Feb 18, 2019 8:37:15 PM - Info bptm (pid=6364) start

Feb 18, 2019 8:37:15 PM - Info bptm (pid=6364) using 65536 data buffer size

Feb 18, 2019 8:37:15 PM - Info bptm (pid=6364) setting receive network buffer to 263168 bytes

Feb 18, 2019 8:37:15 PM - Info bptm (pid=6364) using 30 data buffers

Feb 18, 2019 8:37:15 PM - Info bpbkar32 (pid=19136) not using change journal data for <F:\Daily Backup>: not enabled

Feb 18, 2019 8:37:15 PM - Info bptm (pid=6364) start backup

Feb 18, 2019 8:37:15 PM - Info bptm (pid=6364) backup child process is pid 200.7088

Feb 18, 2019 8:37:15 PM - Info bptm (pid=6364) Waiting for mount of media id PW7702 (copy 1) on server .

Feb 18, 2019 8:37:15 PM - Info bptm (pid=200) start

Feb 18, 2019 8:37:15 PM - mounting PW7702

Feb 18, 2019 8:38:09 PM - Info bptm (pid=6364) media id PW7702 mounted on drive index 13, drivepath {2,0,6,0}, drivename HP.ULTRIUM4-SCSI.013, copy 1

Feb 18, 2019 8:38:09 PM - mounted PW7702; mount time: 0:00:54

Feb 18, 2019 8:38:15 PM - positioning PW7702 to file 113

Feb 18, 2019 8:40:02 PM - Info bptm (pid=6364) waited for full buffer 0 times, delayed 0 times

Feb 18, 2019 8:40:02 PM - positioned PW7702; position time: 0:01:47

Feb 18, 2019 8:40:02 PM - begin writing

Feb 18, 2019 8:40:11 PM - Info bptm (pid=6364) EXITING with status 0 <----------

Feb 18, 2019 8:40:11 PM - Info bpbrm (pid=6444) validating image for client srv83

Feb 18, 2019 8:40:18 PM - Info bpbkar32 (pid=19136) done. status: 0: the requested operation was successfully completed

Feb 18, 2019 8:40:18 PM - end writing; write time: 0:00:16

the requested operation was successfully completed (0)

--

About the verbose logging: I will enable it and report the details.

About performance: all jobs that have a hierarchical structure (parent and child) are facing this issue.

Performance tuning parameters: nothing is set. I will search for best practices, but if you have a link, please post it here.

Besides these, please note that the logs in the first post were from a catalog backup.

Re: some of parent jobs stuck after their child finish

From your first post, I thought the issue was confined to the catalog parent job. However, it affects all backups with multiple data streams enabled.

Is it an occasional issue, or do the jobs always get stuck so that you end up cancelling the backups? It might be that the backup is stuck waiting to execute the next child jobs, or that a child job failed to update its status to the parent job.

A NetBackup restart or a reboot will generally help with such issues as a tactical solution.

Please refer to the NetBackup Backup Planning and Performance Tuning Guide.

 

Re: some of parent jobs stuck after their child finish

Similar issue over here:

https://vox.veritas.com/t5/NetBackup/catalog-backup-policy-does-not-work/m-p/863064

Post was marked as 'Solved', but there was no real solution. 
My recommendation was to log a call with Veritas Support.

Re: some of parent jobs stuck after their child finish

@sri_vani 

It is an occasional issue and, as I told you, if we stop all services with the bpdown command, the "problematic jobs" end. But I want to find the real cause.

Re: some of parent jobs stuck after their child finish

Veritas Support can assist by digging into logs. 

They will probably need max logging levels. 

Re: some of parent jobs stuck after their child finish

@Marianne Is this a special situation that needs a Support call? I ask because I see lots of issues that were answered in the VOX forums.

In what situations should we log a Support call?

Re: some of parent jobs stuck after their child finish

@Noyan 

VOX is a community forum where fellow NBU users can share advice from their own experience.
Sometimes fellow VOX users find previous posts, TNs and KB articles that they share.
Sometimes fellow VOX users have not had the same or a similar experience for which they can share a solution.

Other than the links that I have posted, I do not see anyone here on VOX with a similar experience that was solved.

So, if other users do not have helpful advice and we can see that detailed, high-level logging is needed, then it is best to log a Support call.
Support engineers have the time and tools to analyze huge logs.