02-12-2018 02:36 PM
Every month, we run bpduplicate (manually) to make an encrypted copy of our backups to send off site. This month, I have a set of backup images whose duplication never seems to finish. I've cancelled and retried the duplication three times, and each time it seems to get stuck. The 7-image set uses about 11 TB of data. The duplication seems to run OK for the first day and then gets stuck at 35%. Our backup server (alexandria) is running NetBackup 7.7.3.
According to the trylogs (db/jobs/trylogs), there doesn't appear to be anything wrong:
LOG 1518203352 4 bptm 14933 INF - Waiting for positioning of media id 1063L5 on server alexandria for reading.
LOG 1518203352 4 bptm 14933 INF - Waiting for positioning of media id 1063L5 on server alexandria for reading.
LOG 1518203352 4 bptm 14933 INF - Waiting for positioning of media id 1063L5 on server alexandria for reading.
LOG 1518203352 4 bptm 14933 INF - Waiting for positioning of media id 1063L5 on server alexandria for reading.
LOG 1518203352 4 bptm 14933 INF - Waiting for positioning of media id 1063L5 on server alexandria for reading.
LOG 1518203352 4 bptm 14933 INF - Waiting for positioning of media id 1063L5 on server alexandria for reading.
LOG 1518203352 4 bptm 14933 INF - Waiting for positioning of media id 1063L5 on server alexandria for reading.
POSITIONING 1518203352 1063L5 1
POSITIONED 1518203352 1063L5
POSITIONING 1518203392 1063L5 2
POSITIONED 1518203392 1063L5
POSITIONING 1518203843 1063L5 3
POSITIONED 1518203844 1063L5
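The trylog records above carry Unix epoch timestamps, which makes it hard to see when the job actually went quiet. A minimal Python sketch, assuming the whitespace-separated layout visible in the excerpt (record type, epoch seconds, then fields), converts the latest entry to a readable UTC time and measures how long each POSITIONING of a media ID took. It is an illustration only, not a NetBackup tool, and the excerpt embedded in it has the repeated LOG lines trimmed.

```python
import sys
from datetime import datetime, timezone

# Excerpt of the trylog quoted above (repeated LOG lines trimmed).
SAMPLE = """\
LOG 1518203352 4 bptm 14933 INF - Waiting for positioning of media id 1063L5 on server alexandria for reading.
POSITIONING 1518203352 1063L5 1
POSITIONED 1518203352 1063L5
POSITIONING 1518203392 1063L5 2
POSITIONED 1518203392 1063L5
POSITIONING 1518203843 1063L5 3
POSITIONED 1518203844 1063L5
"""

RECORD_TYPES = ("LOG", "POSITIONING", "POSITIONED")


def parse_trylog(lines):
    """Return (latest epoch seen, [(media_id, attempt, seconds_to_position), ...]).

    Assumes each record starts with a type keyword followed by epoch seconds,
    as in the excerpt; any other record type or malformed line is skipped.
    """
    pending = {}  # media id -> (attempt number, epoch of its POSITIONING record)
    durations = []
    latest = None
    for line in lines:
        fields = line.split()
        if len(fields) < 3 or fields[0] not in RECORD_TYPES or not fields[1].isdigit():
            continue
        epoch = int(fields[1])
        latest = epoch if latest is None else max(latest, epoch)
        if fields[0] == "POSITIONING" and len(fields) >= 4:
            pending[fields[2]] = (int(fields[3]), epoch)
        elif fields[0] == "POSITIONED" and fields[2] in pending:
            attempt, started = pending.pop(fields[2])
            durations.append((fields[2], attempt, epoch - started))
    return latest, durations


def report(lines):
    """Print the time of the last trylog record and each positioning duration."""
    latest, durations = parse_trylog(lines)
    if latest is not None:
        stamp = datetime.fromtimestamp(latest, tz=timezone.utc)
        print(f"last trylog entry: {stamp:%Y-%m-%d %H:%M:%S} UTC")
    for media, attempt, seconds in durations:
        print(f"{media} positioning #{attempt}: {seconds}s")


if __name__ == "__main__":
    # Pass a trylog path to analyse it; with no argument, use the excerpt above.
    if len(sys.argv) > 1:
        with open(sys.argv[1], errors="replace") as handle:
            report(handle)
    else:
        report(SAMPLE.splitlines())
```

For the excerpt, the last record is 2018-02-09 19:17:24 UTC and every positioning of 1063L5 completed within a second, which hints that the stall happened after the tape was positioned rather than during positioning.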
I let the job run over the weekend, but it never progressed past 35% and none of the logs were updated. I then attempted to cancel the job; the cancel request was received, but the job never cancelled. I tried to gracefully kill the bpduplicate process, but it wouldn't die. I looked in netbackup/logs/bptm and saw a couple of bptm processes that appeared to hold the "lock". I gracefully killed the bptm process that held the lock, and then everything cleared up. The bptm process that seemed to be hung generated a lot of these messages:
process_brm_msg: no pending message from bpbrm
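Picking out the stuck bptm process by hand can be scripted: count how often each process logs `process_brm_msg: no pending message from bpbrm`. This Python sketch assumes the legacy debug-log layout in which the process ID appears in square brackets near the start of each line (e.g. `[14933]`); verify that against your own bptm log before trusting the counts. The script name and log path in the usage note are hypothetical.

```python
import re
import sys
from collections import Counter

# Assumption: legacy debug-log lines carry the PID in square brackets near the
# start of the line; only the first bracketed number on a line is used.
PID_RE = re.compile(r"\[(\d+)\]")
STALL_MESSAGE = "process_brm_msg: no pending message from bpbrm"


def count_brm_waits(lines, message=STALL_MESSAGE):
    """Count how many lines containing `message` each process ID wrote."""
    counts = Counter()
    for line in lines:
        if message not in line:
            continue
        match = PID_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    # Each command-line argument is a bptm log file to scan.
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            counts = count_brm_waits(handle)
        for pid, total in counts.most_common():
            print(f"{path}: PID {pid} logged the message {total} times")
```

Usage might look like `python count_brm_waits.py /usr/openv/netbackup/logs/bptm/log.021218`, where the default UNIX install path and the `log.MMDDYY` file name are assumptions. A PID with a far larger count than the others is a candidate for the hung bptm and is worth matching against the PID shown in the trylog (14933 in the excerpt).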
How do I find out what's causing the duplication jobs to hang? There's no indication of why they hang. The only explanation I can think of is some sort of deadlock on resources. This is the third time I've attempted to copy these backup images, and each time it hangs. I ran bpimagelist and found that none of the images were copied to completion.
Any pointers would be great!
PS: I'm fairly new to being a NetBackup admin, as I recently took over from someone else.
Quang
02-12-2018 10:51 PM
My guess is that there is something wrong with the source tape - media id 1063L5.
Before you try again, increase the bptm logging level to 3 or higher. (I am prepared to help read a level 3 log; Veritas employees will ask for level 5.)
The bpbrm log (level 3) may also be helpful.
Please copy the logs to .txt files (e.g. bptm.txt) and upload them as file attachments.
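Copying the logs into .txt files can be scripted. This Python sketch picks the most recently modified file in each of the bptm and bpbrm log directories and copies it into the current directory as bptm.txt and bpbrm.txt. The base log directory is passed on the command line (e.g. /usr/openv/netbackup/logs on a default UNIX install, which is an assumption), and "newest file" assumes the failing run is in the latest log; if the hang started on an earlier day, copy that day's file instead.

```python
import shutil
import sys
from pathlib import Path


def newest_log(log_dir):
    """Return the most recently modified file in log_dir, or None if empty or missing."""
    log_dir = Path(log_dir)
    if not log_dir.is_dir():
        return None
    files = [p for p in log_dir.iterdir() if p.is_file()]
    return max(files, key=lambda p: p.stat().st_mtime) if files else None


def export_log(src, dest_dir, name):
    """Copy src to <dest_dir>/<name>.txt so it can be attached to a forum post."""
    dest = Path(dest_dir) / f"{name}.txt"
    shutil.copyfile(src, dest)
    return dest


if __name__ == "__main__":
    # Usage: python export_logs.py /usr/openv/netbackup/logs   (path is an assumption)
    if len(sys.argv) > 1:
        base = Path(sys.argv[1])
        for process in ("bptm", "bpbrm"):
            src = newest_log(base / process)
            if src is None:
                print(f"no log found under {base / process}")
            else:
                print(f"copied {src} -> {export_log(src, '.', process)}")
```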