Vault job hung?

ixat
Level 4

Hi guys, happy new year to you all. I have an issue with one of my vault jobs that has been recurring recently, and I would appreciate some advice on it.

 

The vault job should vault 4 images - A B C D.

 

Typical time taken for each client image to be vaulted is:

 

A - 1hr15min
B - 10min
C - 10min
D - 10min

 

Yesterday, client A took much longer than it should have and never completed. The log showed:

 

>07:48:49 INF - Waiting for positioning of media id ABCXXX on server ACME01.
doMonitor(): TCB1:duplicate.log.1_en
>07:48:50 INF - Continuing duplicate on server ACME01 of client A.
runDups(): duplicate.log.1_en: Waiting for completion of duplication (93 min elapsed)
runDups(): duplicate.log.1_en: Waiting for completion of duplication (124 min elapsed)
runDups(): duplicate.log.1_en: Waiting for completion of duplication (155 min elapsed)
runDups(): duplicate.log.1_en: Waiting for completion of duplication (186 min elapsed)
runDups(): duplicate.log.1_en: Waiting for completion of duplication (217 min elapsed)
runDups(): duplicate.log.1_en: Waiting for completion of duplication (248 min elapsed)
runDups(): duplicate.log.1_en: Waiting for completion of duplication (279 min elapsed)
runDups(): duplicate.log.1_en: Waiting for completion of duplication (310 min elapsed)
runDups(): duplicate.log.1_en: Waiting for completion of duplication (341 min elapsed)
runDups(): duplicate.log.1_en: Waiting for completion of duplication (372 min elapsed)
Kill_Signal: shutdown signal received

 

Today, client A completed within its typical time frame of 1hr15min. However, client C, which typically takes 10min to duplicate, got stuck instead and spent 1hr20min reading.

 

>10:02:18 INF - Waiting for positioning of media id ABCXXX on server ACME01.

 

doMonitor(): TCB1:duplicate.log.3_en
>10:02:52 INF - Beginning duplicate on server ACME01 of client C.
runDups(): duplicate.log.3_en: Waiting for completion of duplication (0 min elapsed)
runDups(): duplicate.log.3_en: Waiting for completion of duplication (31 min elapsed)
doMonitor(): TCB1:duplicate.log.3_en
>11:22:27 INF - Duplicate of backup id MOFA08_1231251249 failed, termination requested by administrator (150).
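
One way to spot a stalled duplication from excerpts like the two above is to pull the "min elapsed" values out of the runDups() lines and compare them with the typical durations. The following is only a rough sketch: the regular expression is modelled on the lines quoted in this post, the mapping of duplicate.log.1_en to client A and duplicate.log.3_en to client C is inferred from these excerpts, and the function names and the 1.5x threshold are hypothetical choices, not anything NetBackup provides.

```python
import re

# Pattern modelled on the lines quoted in the post, e.g.
# runDups(): duplicate.log.1_en: Waiting for completion of duplication (93 min elapsed)
WAIT_RE = re.compile(
    r"runDups\(\): (?P<log>\S+): Waiting for completion of duplication "
    r"\((?P<minutes>\d+) min elapsed\)"
)


def max_elapsed(lines):
    """Return {duplicate log name: largest 'min elapsed' value seen}."""
    longest = {}
    for line in lines:
        match = WAIT_RE.search(line)
        if match:
            log = match.group("log")
            minutes = int(match.group("minutes"))
            longest[log] = max(longest.get(log, 0), minutes)
    return longest


def overdue(longest, expected_minutes, factor=1.5):
    """Return the logs whose elapsed time exceeds factor x the expected duration.

    The factor of 1.5 is an arbitrary, hypothetical threshold.
    """
    return sorted(
        log
        for log, minutes in longest.items()
        if log in expected_minutes and minutes > factor * expected_minutes[log]
    )


# Lines copied from the excerpts above.
sample = [
    "runDups(): duplicate.log.1_en: Waiting for completion of duplication (93 min elapsed)",
    "runDups(): duplicate.log.1_en: Waiting for completion of duplication (372 min elapsed)",
    "Kill_Signal: shutdown signal received",
    "runDups(): duplicate.log.3_en: Waiting for completion of duplication (0 min elapsed)",
    "runDups(): duplicate.log.3_en: Waiting for completion of duplication (31 min elapsed)",
]

# Typical durations from the post: client A 1hr15min, client C 10min.
# The log-name-to-client mapping is an assumption based on the excerpts.
expected = {"duplicate.log.1_en": 75, "duplicate.log.3_en": 10}

print(overdue(max_elapsed(sample), expected))
```

Both duplications are flagged here, since 372 min and 31 min are well beyond 1.5 times the typical 75 min and 10 min.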
 

Points to note:

1) No other job running during vault. 

2) Required tapes were available and mounted by the vault job.

3) Running NetBackup DC 4.5_FP6 on Solaris 9 with 4 x LTO2 drives in an L180 library.

 

What other logs should I be looking at to find the error/cause? Any solution to this issue?

 

Thank you!

1 REPLY

Amit_Karia
Level 6

You can have a look at the logs in /usr/openv/netbackup/db/error and grep for the backup ID (after setting verbosity to 5 in bp.conf). Also have a look at detail.log in the vault's SID (session ID) directory.
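
If you would rather script the search than run grep by hand, something along these lines could work. It is only a sketch: find_backup_id is a hypothetical helper name, and it assumes the files in the error directory are readable as plain text and sit directly in that directory.

```python
from pathlib import Path


def find_backup_id(log_dir, backup_id):
    """Return (file name, line number, line) for every line mentioning backup_id.

    Only regular files directly inside log_dir are scanned; a missing
    directory yields an empty list.
    """
    root = Path(log_dir)
    if not root.is_dir():
        return []
    hits = []
    for path in sorted(root.iterdir()):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going if a file contains odd bytes.
        with path.open(errors="replace") as handle:
            for lineno, line in enumerate(handle, 1):
                if backup_id in line:
                    hits.append((path.name, lineno, line.rstrip("\n")))
    return hits
```

For the failure quoted above, the call would be find_backup_id("/usr/openv/netbackup/db/error", "MOFA08_1231251249"), and each returned tuple points at a line worth reading in context.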