cancel
Showing results forΒ 
Search instead forΒ 
Did you mean:Β 

Image duplication fails at about 70%

Jaap_Hartog
Level 4

Dear all,

in order to be able to restore old SAN data in the future, I am planning to duplicate the images as stored on old LTO2-tapes onto LTO4-tapes.
 

After a succesful start:

5/5/2015 7:53:57 AM - begin Duplicate
5/5/2015 7:53:58 AM - requesting resource mediaserver-hcart3-robot-tld-2-singledrive
5/5/2015 7:53:58 AM - requesting resource QXH138
5/5/2015 7:53:58 AM - reserving resource QXH138
5/5/2015 7:53:58 AM - reserving resource QXH140
5/5/2015 7:53:58 AM - reserving resource QXH125
5/5/2015 7:53:58 AM - reserving resource QXH126
5/5/2015 7:53:58 AM - reserved resource QXH138
5/5/2015 7:53:58 AM - reserved resource QXH140
5/5/2015 7:53:58 AM - reserved resource QXH125
5/5/2015 7:53:58 AM - reserved resource QXH126
5/5/2015 7:53:58 AM - granted resource VAB456
5/5/2015 7:53:58 AM - granted resource HP.ULTRIUM4-SCSI.002
5/5/2015 7:53:58 AM - granted resource mediaserver-hcart3-robot-tld-2-singledrive
5/5/2015 7:53:58 AM - granted resource QXH138
5/5/2015 7:53:58 AM - granted resource HP.ULTRIUM4-SCSI.000
5/5/2015 7:54:01 AM - Info bptm(pid=5628) start           
5/5/2015 7:54:01 AM - started process bptm (5628)
5/5/2015 7:54:02 AM - Info bptm(pid=5628) start backup          
5/5/2015 7:54:03 AM - Info bptm(pid=5388) start           
5/5/2015 7:54:03 AM - started process bptm (5388)
5/5/2015 7:54:04 AM - Info bpdm(pid=5388) reading backup image         
5/5/2015 7:54:04 AM - Info bptm(pid=5388) using 64 data buffers        
5/5/2015 7:54:04 AM - Info bptm(pid=5628) Waiting for mount of media id VAB456 (copy 1) on server mediaserver.
5/5/2015 7:54:04 AM - started process bptm (5628)
5/5/2015 7:54:04 AM - mounting VAB456
5/5/2015 7:54:06 AM - Info bptm(pid=5628) INF - Waiting for mount of media id VAB456 on server mediaserver for writing.
5/5/2015 7:54:06 AM - Warning bptm(pid=5388) media id QXH138 not found in Media Manager, mount request will most likely occur
5/5/2015 7:54:06 AM - Info bptm(pid=5388) Waiting for mount of media id QXH138 (copy 2) on server mediaserver.
5/5/2015 7:54:06 AM - started process bptm (5388)
5/5/2015 7:54:06 AM - mounting QXH138
5/5/2015 7:54:08 AM - Info bptm(pid=5388) INF - Waiting for mount of media id QXH138 on server mediaserver for reading.
5/5/2015 7:55:00 AM - Info bptm(pid=5628) media id VAB456 mounted on drive index 2, drivepath {5,0,2,0}, drivename HP.ULTRIUM4-SCSI.002, copy 1
5/5/2015 7:55:02 AM - mounted; mount time: 00:00:56
5/5/2015 7:55:02 AM - Info bptm(pid=5388) QXH138           
5/5/2015 7:55:04 AM - Info bptm(pid=5388) INF - Waiting for positioning of media id QXH138 on server mediaserver for reading.
5/5/2015 7:55:04 AM - positioning QXH138 to file 1
5/5/2015 7:55:04 AM - positioned QXH138; position time: 00:00:00
5/5/2015 7:55:06 AM - begin reading
5/5/2015 7:55:06 AM - Info bptm(pid=5388) waited for empty buffer 0 times, delayed 0 times   
5/5/2015 7:55:06 AM - end reading; read time: 00:00:00
5/5/2015 7:55:10 AM - Info bptm(pid=5628) waited for full buffer 0 times, delayed 0 times   
5/5/2015 7:55:11 AM - Info bptm(pid=5628) setting receive network buffer to 263168 bytes     
5/5/2015 7:55:11 AM - positioning QXH138 to file 2
5/5/2015 7:55:12 AM - positioned QXH138; position time: 00:00:01
5/5/2015 7:55:14 AM - Info bptm(pid=5388) completed reading backup image        
5/5/2015 7:55:20 AM - Info bptm(pid=5388) using 64 data buffers        
5/5/2015 7:55:23 AM - Info bptm(pid=5388) INF - Waiting for positioning of media id QXH138 on server mediaserver for reading.
5/5/2015 7:55:23 AM - positioning QXH138 to file 3
5/5/2015 7:55:23 AM - positioned QXH138; position time: 00:00:00
5/5/2015 7:55:25 AM - begin reading
5/5/2015 7:56:37 AM - Info bptm(pid=5388) waited for empty buffer 134 times, delayed 805 times   
5/5/2015 7:56:37 AM - end reading; read time: 00:01:12
5/5/2015 7:56:38 AM - positioning QXH138 to file 4
5/5/2015 7:56:38 AM - positioned QXH138; position time: 00:00:00
5/5/2015 7:56:38 AM - begin reading
5/5/2015 7:58:02 AM - Info bptm(pid=5388) waited for empty buffer 129 times, delayed 973 times   
5/5/2015 7:58:02 AM - end reading; read time: 00:01:24
5/5/2015 7:58:02 AM - positioning QXH138 to file 5
5/5/2015 7:58:02 AM - positioned QXH138; position time: 00:00:00
5/5/2015 7:58:02 AM - begin reading
5/5/2015 7:59:07 AM - Info bptm(pid=5388) waited for empty buffer 78 times, delayed 573 times   
5/5/2015 7:59:07 AM - end reading; read time: 00:01:05

the other tapes were mounted consecutively but at about 70% it all halted:

5/5/2015 4:21:24 PM - Info bptm(pid=3976) waited for empty buffer 51 times, delayed 354 times   
5/5/2015 4:21:24 PM - Info bptm(pid=2352) waited for full buffer 208403 times, delayed 825555 times   
5/5/2015 4:21:24 PM - end reading; read time: 00:00:42
5/5/2015 4:21:25 PM - Info bptm(pid=2352) setting receive network buffer to 263168 bytes     
5/5/2015 4:21:25 PM - positioning QXH126 to file 41
5/5/2015 4:21:25 PM - positioned QXH126; position time: 00:00:00
5/5/2015 4:21:32 PM - Info bptm(pid=3976) completed reading backup image        
5/5/2015 4:21:32 PM - Info bptm(pid=3976) EXITING with status 0 <----------       
5/5/2015 4:21:39 PM - Info bptm(pid=2352) EXITING with status 0 <----------       
client process aborted(50)

So after keeping my fingers crossed for about 8 1/2 hours, the duplication ended.

Has anybody got an idea what might be the cause of this problem ?
The detailed status does not give that much information and does not state anything about an error.
I have attached the detailed status in a text-file

Thanks in advance for your advice !
 

Jaap Hartog

1 ACCEPTED SOLUTION

Accepted Solutions

RiaanBadenhorst
Moderator
Moderator
Partner    VIP    Accredited Certified

Run the image duplucationss one at a time and lets see what happens. its should fill up one lto4 but it as long as you make sure to use only 1 write drive. You're using seperate pools for the duplication source and target right?

View solution in original post

17 REPLIES 17

RiaanBadenhorst
Moderator
Moderator
Partner    VIP    Accredited Certified

Hi,

 

What are you duplicating? One images, or a whole bunch of images in one go?

 

I would break it down into batches. Have you checked which ones worked and which ones didn't?

 

Are the source and destination drives on one server? Check the bpbrm and bptm logs.

Marianne
Moderator
Moderator
Partner    VIP    Accredited Certified
Are you using the same media server for read and write? We will need logs to find out more about this 'client process' that 'aborted'. bptm and bpbrm at level 3 (or higher ) should do it.

Nicolai
Moderator
Moderator
Partner    VIP   

According to 

http://www.symantec.com/docs/HOWTO103980

a status 50 is :

  • Explanation: The client backup terminates abnormally. For example, a NetBackup master or media server is shut down or restarted when a backup or restore is in progress.

I would look at the all log entires section of the NBU  GUI, and the look around the time when duplication fails. Also take a look at Windows event view 

Jaap_Hartog
Level 4

@Riaan:
I am duplicating a bunch of images (representing one full backup session), spanning 4 LTO2-tapes in 1 go.
Theoretically it should fit on 1 LTO4-tape, somewhere during the last tape it just quits.
Source and destination-drives are attached to one and the same mediaserver 
(we only have 1)
Some time ago I have duplicated backups (made shortly before the duplication) spanning 1 LTO4-tape per backupjob, that went
without any problem.
@Marianne:
we are indeed using the same media server for read and write, because we only have 1 media server.
@Nicolai:
There was no intervention with a possible shutdown/restart of either the Netbackup master or Media server.
The duplication job even stopped hours before regular incremental backup started, so there was nothing running 
at that moment, other than the duplication job.

@all: I'll dive into the logs and let you know

RiaanBadenhorst
Moderator
Moderator
Partner    VIP    Accredited Certified

No issue with the duplication in the attached log

 

16:21:25.242 [2352.4600] <4> write_backup: successfully wrote backup id OldSan_1128792038

Jaap_Hartog
Level 4

That applies for that specific backup id, but that's the 7th of in total 12 backup id's.....
There are some errormessages after the last "succesfully wrote...." statement, but I cannot deduce anything from that.

RiaanBadenhorst
Moderator
Moderator
Partner    VIP    Accredited Certified

need the full log.

Jaap_Hartog
Level 4

bpbrm:050515.log shows:
09:59:27.630 [5268.3908] <2> job_monitoring_exex: ACK disconnect
09:59:27.630 [5268.3908] <2> job_disconnect: Disconnected
17:30:02.651 [4712.1392] <2> bpbrm main: INITIATING (VERBOSE = 5): version NetBackup 7.5 0

First two entries must be from the catalog-backup, the last one from the scheduled backup that starts at 5:30 PM.
No entries whatsoever concerning the duplication-attempt.

But the bptm-log showed more.

I have narrowed it down a bit more towards the point in time that the duplicationjob halted and attached it as 
bptm-050515.log.

Hopefully you can find something in in that explains the crash of the job.

 

Thanks in advance !

Jaap_Hartog
Level 4

Ok,

here's the full bptm-log.
 

RiaanBadenhorst
Moderator
Moderator
Partner    VIP    Accredited Certified

I don't see any issues in that log. please provide the error code its failing with and the image affected.

Born2rise
Level 4
Hi Jaap it looks like, Connectivity has been broken between master and media server. Either you can try to increase connect and read timeout on master server and media server In host properperty. It may workes or you can try to perform duplication in small bunch of backup ids.

Jaap_Hartog
Level 4

@Riaan: I don't have any other errorcode than what's mentioned in the describing text at the beginning of this thread.

 

@born2rise:timeout settings are as follows (all in seconds) :

  Mediaserver Masterserver
Client connect timeout 300 300
Client read timeout 300 300
Backup start notify timeout 300 300
Backup end notify timeout 300 300
File browse timeout 300 300
Media server connect timeout 30 30

Above mentioned attempt comprised of 12 images, spanning 4 LTO2-tapes, theoretically fitting onto 1 LTO4-tape.
Is this behaviour (all resulting duplicate images being written to 1 tape -if it fits- ) still applicable, or will each batch take a new tape to write on ?
It would be a waste of space on the tapes, if the tapes aren't "filled" as far as possible.

 

Thanks for the advices so far !

RiaanBadenhorst
Moderator
Moderator
Partner    VIP    Accredited Certified

Run the image duplucationss one at a time and lets see what happens. its should fill up one lto4 but it as long as you make sure to use only 1 write drive. You're using seperate pools for the duplication source and target right?

Jaap_Hartog
Level 4

Using 1 write drive is defined in the used Storage Unit, right ?
I am indeed using separate pools for source and target...
To get this going, it is indeed a good idea to run 1 image-duplication at a time, that makes more progress than duplicating none at all, which is what is happening now.

RiaanBadenhorst
Moderator
Moderator
Partner    VIP    Accredited Certified

Yes on the storage unit, and yes, also easier to troubleshoot what is happnening.

Jaap_Hartog
Level 4

Running Duplications of 1 image at a time really does the job !

Thumbs up for Riaan !

 

Jaap

RiaanBadenhorst
Moderator
Moderator
Partner    VIP    Accredited Certified

:)