
Duplication slow from Media agents

raffl
Level 4

Master server on 7.6.0.4 - WS 2008 R2

4 media servers on 7.6.0.4 - WS 2012 R2

LTO 5 drives and LTO 5 tapes

HBA on master server and media servers is 8 Gbit/sec

NIC is 10 Gbit on master and media servers

Using 6 drives shared between the 4 media servers and 1 master server.

1 STU per server

Master server STU points to \\ABC\A (the login account for the share and the NBU services is already configured on the master and all the media servers)

Media server 1 pointing to \\ABC\B and so on

Running a basic duplication from the Catalog utility (Duplicate action).

Duplication to the master server's STU runs at a very good speed, 150 GB/hour over the WAN.

Duplication through the media servers is very slow, around 60 GB/hour.
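To put these figures on the same scale as drive and link ratings, it can help to convert GB/hour to MB/s. A minimal Python sketch, assuming decimal units (1 GB = 1000 MB) and the published LTO-5 native rate of 140 MB/s; the helper name is illustrative and not part of NetBackup:

```python
LTO5_NATIVE_MB_PER_S = 140  # published native (uncompressed) LTO-5 rate


def gb_per_hour_to_mb_per_s(gb_per_hour):
    """Convert GB/hour to MB/s using decimal units (1 GB = 1000 MB)."""
    return gb_per_hour * 1000 / 3600


# Rates reported in the post: master server STU vs. media server STU.
for label, rate in (("master", 150), ("media", 60)):
    mb_s = gb_per_hour_to_mb_per_s(rate)
    share = mb_s / LTO5_NATIVE_MB_PER_S
    print(f"{label}: {mb_s:.1f} MB/s, {share:.0%} of LTO-5 native speed")
```

Both rates come out well below what a single LTO-5 drive can stream, which suggests the drive itself is probably not the limiting factor.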

Number_Data_Buffers on Master and Media is 32

Size_Data_Buffer on master and media is 262144
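For reference, these two settings determine how much shared memory bptm needs. A minimal Python sketch, assuming the sizing rule from the Performance Tuning Guide (number of buffers x buffer size x drives x multiplexing factor); the multiplexing factor of 1 is an assumption, since the post does not state it:

```python
def shared_memory_bytes(number_data_buffers, size_data_buffers, drives=1, mpx=1):
    """Approximate shared memory used for bptm data buffers."""
    return number_data_buffers * size_data_buffers * drives * mpx


# Values from the post: 32 buffers of 262144 bytes, 6 shared drives.
per_drive = shared_memory_bytes(32, 262144)
all_drives = shared_memory_bytes(32, 262144, drives=6)  # mpx=1 is assumed

print(per_drive // (1024 * 1024), "MiB per drive")
print(all_drives // (1024 * 1024), "MiB if all 6 drives are active")
```

With these settings the footprint is small, so there is likely room to experiment with more buffers on the media servers.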

If I transfer files directly from a server to the share behind the target STU, the speed is fine, so I am sure the problem is not the network. Am I missing some setting on the media servers?

Please help me.


11 REPLIES

Marianne
Level 6
Partner    VIP    Accredited Certified

Please compare bptm log on master server with bptm log on a media server. 

If you need assistance (time permitting) copy logs to .txt file(s) e.g. media1-bptm.txt and upload here.

 

Marianne - what exactly should I look for in the logs? Please tell me, I will try to learn.

Marianne
Level 6
Partner    VIP    Accredited Certified

I like it when people want to learn...

There is a lengthy chapter in the NetBackup Backup Planning and Performance Tuning Guide:

Tuning the NetBackup data transfer path
Look for the section explaining what to look for in bptm w.r.t. 'Waited for full buffers...' and 'waited for empty buffers...'.

Seeing that the only difference between the master and media servers seems to be the OS, you may want to go through the 'OS-related tuning factors for Windows' section as well.

 

Tousif
Level 6

Hello,

 

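First, enable the bpbkar log and run a bpbkar null test against the image path on the source machine.

The bpbkar log will show how long it took to read the data.

Then enable the bptm log on the target media server and run the duplication. It will show the 'waited for full buffer' and 'waited for empty buffer' counters.

Waited for full buffer: the writing process had to wait for data, which means the side supplying the data (the client, or the read side in a duplication) is slow.

Waited for empty buffer: the process filling the buffers had to wait, which means the side consuming the data (the media server, its tape library or its disk storage) is slow.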
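The two counters can be pulled out of a bptm log with a short script. The following is only a sketch in Python: the sample lines follow the format of the bptm excerpts posted in this thread, the function names are made up for illustration, and comparing raw delay counts is a simplification because the two sides may use different delay intervals.

```python
import re

# Matches e.g. "fill_buffer: ... waited for empty buffer 1373 times, delayed 18682 times"
WAIT_RE = re.compile(
    r"(?P<func>\w+): .*?waited for (?P<kind>empty|full) buffer "
    r"(?P<waits>\d+) times, delayed (?P<delays>\d+) times"
)


def parse_wait_counters(lines):
    """Collect the wait/delay counters found in bptm log lines."""
    counters = []
    for line in lines:
        match = WAIT_RE.search(line)
        if match:
            counters.append({
                "func": match["func"],
                "kind": match["kind"],
                "waits": int(match["waits"]),
                "delays": int(match["delays"]),
            })
    return counters


def slower_side(counters):
    """Guess which side is slower by comparing total delay counts."""
    empty = sum(c["delays"] for c in counters if c["kind"] == "empty")
    full = sum(c["delays"] for c in counters if c["kind"] == "full")
    if empty > full:
        return "write side (buffers are not emptied fast enough)"
    if full > empty:
        return "read side (buffers are not filled fast enough)"
    return "no clear difference"


sample = [
    "05:04:32.377 [8752.9232] <2> fill_buffer: [15712] socket is closed, "
    "waited for empty buffer 1373 times, delayed 18682 times, read 6922006 Kbytes",
    "05:04:32.209 [15712.8272] <2> write_data: waited for full buffer 506 times, "
    "delayed 4276 times",
]
print(slower_side(parse_wait_counters(sample)))
```

Running the same script against the master server's bptm log and a media server's bptm log gives a quick side-by-side comparison of where each one waits.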

NBU is just an application. We can tune the buffer size and count, but that only improves buffering, which helps in some areas. If the available bandwidth is low, NBU cannot do much about it.

NBU never demands a specific amount of resources from the server. The resources (CPU, memory and bandwidth) are provided by the OS. NBU only requests them and works with whatever the OS provides.

So if the OS provides low bandwidth, the backup will take more time.

In your case, if you are getting low bandwidth, I would suggest checking with the OS team.

 

Thanks & Regards,

 

I was reading the doc you provided. It is very big, but I think I am getting it. In my case bpbkar will not be used, as I am not doing a backup, just a duplication.

Why is it taking that long, and why is the socket closed while it waited for an empty buffer?

04:58:22.377 [15712.8272] <4> write_backup: begin writing backup id XXXXX_1367647225, copy 2, fragment 1, destination path \\ABC\01
04:58:22.377 [15712.8272] <2> write_data: twin_index: 0 active: 1 dont_process: 0 wrote_backup_hdr: 0 finished_buff: 0 saved_cindex: -1 twin_is_disk 1 delay_brm: 0
04:58:22.377 [15712.8272] <2> write_data: Total Kbytes transferred 0
04:59:16.532 [15712.8272] <2> write_data: first write, twin_index: 0 cindex: 0 dont_process: 1 wrote_backup_hdr: 0 finished_buff: 0
04:59:16.532 [15712.8272] <2> write_data: received first buffer (262144 bytes), begin writing data
04:59:16.532 [15712.8272] <2> send_media_kbytes_written_establish_threshold: CINDEX 0, RB Kbytes for monitoring = 3000000
04:59:39.022 [15712.8272] <2> send_MDS_msg: KBYTES_WRITTEN 0 {A452AB9D-54DF-4C1D-A0FD-FEC3174525B1} 9667 2 3000064 2227666
04:59:39.022 [15712.8272] <2> JobInst::sendIrmMsg: returning
05:03:15.322 [15712.8272] <2> send_MDS_msg: KBYTES_WRITTEN 0 {A452AB9D-54DF-4C1D-A0FD-FEC3174525B1} 9667 2 6000128 2227666
05:03:15.322 [15712.8272] <2> JobInst::sendIrmMsg: returning
05:04:32.178 [8752.9232] <2> fill_buffer: [15712] socket is closed, waited for empty buffer 1373 times, delayed 18682 times, read 6922006 Kbytes
05:04:32.209 [15712.8272] <2> write_data: writing block shorter than BUFF_SIZE, 22528 bytes
05:04:32.209 [15712.8272] <2> write_data: writing short block, 22528 bytes, remainder 0
05:04:32.209 [15712.8272] <2> write_data: waited for full buffer 506 times, delayed 4276 times
05:04:32.209 [15712.8272] <2> write_data: Total Kbytes transferred 6922006
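The timestamps and byte counts in this excerpt are enough to estimate throughput for the fragment. A rough Python sketch, assuming that "Kbytes" means 1024-byte units and that the sixth field of the KBYTES_WRITTEN message is a cumulative kilobyte count (neither is confirmed in the thread):

```python
from datetime import datetime


def seconds_between(start, end):
    """Elapsed seconds between two HH:MM:SS.mmm stamps on the same day."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


def mib_per_s(kbytes, start, end):
    """Average rate in MiB/s for `kbytes` moved between two stamps."""
    return kbytes / 1024 / seconds_between(start, end)


def gib_per_hour(rate_mib_s):
    return rate_mib_s * 3600 / 1024


# Whole fragment: "begin writing" to "Total Kbytes transferred 6922006".
overall = mib_per_s(6922006, "04:58:22.377", "05:04:32.209")
# First ~3 GB: first buffer received to first KBYTES_WRITTEN message.
burst = mib_per_s(3000064, "04:59:16.532", "04:59:39.022")
# Next ~3 GB: between the two KBYTES_WRITTEN messages.
sustained = mib_per_s(6000128 - 3000064, "04:59:39.022", "05:03:15.322")

for name, rate in (("overall", overall), ("burst", burst), ("sustained", sustained)):
    print(f"{name}: {rate:.1f} MiB/s ({gib_per_hour(rate):.0f} GiB/hour)")
```

Under those assumptions the overall rate works out to roughly 18 MiB/s, in line with the 60 GB/hour reported for the media servers, while the first 3 GB appear to move far faster than the next 3 GB.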

Hello,

"In my case bpbkar will not be used, as I am not doing a backup."

I only suggested general information that can help you isolate the issue. In your case the bpbkar null test will give you disk I/O figures, which help separate a client-side problem from a media-server-side problem.

'Waited for empty buffer' means the writing side is not keeping up, so something is slowing the media server or the storage it writes to (tape library or disk target). But if I calculate the delay, it is not more than 4 minutes. You can also try tuning and increasing the buffer size on the target media server to improve performance.
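The delay estimate can be reproduced from the counters in the posted log. A small Python sketch, assuming the commonly documented default delays of 10 ms (child) and 15 ms (parent), and assuming fill_buffer uses the child delay and write_data the parent delay as in the backup case; check for CHILD_DELAY and PARENT_DELAY touch files before relying on these values:

```python
CHILD_DELAY_MS = 10   # assumed default, may be overridden by a CHILD_DELAY file
PARENT_DELAY_MS = 15  # assumed default, may be overridden by a PARENT_DELAY file


def delay_seconds(delayed_count, delay_ms):
    """Total time spent sleeping for a given 'delayed N times' counter."""
    return delayed_count * delay_ms / 1000


# Counters from the posted bptm excerpt.
fill_wait = delay_seconds(18682, CHILD_DELAY_MS)    # waited for empty buffer
write_wait = delay_seconds(4276, PARENT_DELAY_MS)   # waited for full buffer
elapsed = 6 * 60 + 9.832                            # 04:58:22.377 to 05:04:32.209

print(f"fill_buffer delay: {fill_wait:.0f} s ({fill_wait / 60:.1f} min)")
print(f"write_data delay:  {write_wait:.0f} s")
print(f"fill_buffer delay as share of fragment: {fill_wait / elapsed:.0%}")
```

With these assumed delays the fill_buffer wait is a little over 3 minutes, which is consistent with the "not more than 4 minutes" estimate but is also about half of the fragment's elapsed time.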

Could you please check how many times this delay occurred during the duplication?

How much bandwidth are you getting during duplication?

How much bandwidth do you expect during duplication?

Thanks & Regards,

 

I am getting good throughput on the master server (WS 2008), around 120 GB/hr, but on all 4 media servers I am getting 50-60 GB/hr.

Can you tell me which server reads the data from tape: the media server or the master server? Only the master server is the robot control host.

I see that the duplication starts at a good speed but then drops to a very slow speed.

I see in Resource Monitor that pbx_exchange.exe on the media server is receiving data at 50 mbps.

 

01:58:31.785 [9944.9704] <2> main: Sending [EXIT STATUS 0] to NBJM
01:58:31.785 [9944.9704] <2> bptm: EXITING with status 0 <----------
01:59:26.446 [9936.9940] <2> send_MDS_msg: KBYTES_WRITTEN 0 {F5CC47CB-8C8E-43FD-8563-CE4DF2E44680} 10703 1 75001600 2227801
01:59:26.446 [9936.9940] <2> JobInst::sendIrmMsg: returning

Can someone explain what this is? It took 10 minutes for the next line to appear:


02:08:38.156 [9296.8200] <4> RedirectNBLoggerToLegacyLog: Setting up redirection of VxUL messages to legacy log
02:08:38.172 [9296.8200] <4> RedirectNBLoggerToLegacyLog: Verbosity from VxUL configuration: 1
02:08:38.172 [9296.8200] <4> RedirectNBLoggerToLegacyLog: logmsg verbosity set: 1
02:08:38.172 [9296.8200] <2> bptm: INITIATING (VERBOSE = 0): -rptdrv -jobid -1489985947 -jm
02:08:38.172 [9296.8200] <2> main: Sending [EXIT STATUS 0] to NBJM
02:08:38.172 [9296.8200] <2> bptm: EXITING with status 0 <----------

Again, it took 10 minutes for the next log line:


02:18:34.756 [9400.7444] <4> RedirectNBLoggerToLegacyLog: Setting up redirection of VxUL messages to legacy log
02:18:34.756 [9400.7444] <4> RedirectNBLoggerToLegacyLog: Verbosity from VxUL configuration: 1
02:18:34.756 [9400.7444] <4> RedirectNBLoggerToLegacyLog: logmsg verbosity set: 1
02:18:34.756 [9400.7444] <2> bptm: INITIATING (VERBOSE = 0): -rptdrv -jobid -1489985952 -jm
02:18:34.756 [9400.7444] <2> main: Sending [EXIT STATUS 0] to NBJM
02:18:34.756 [9400.7444] <2> bptm: EXITING with status 0 <----------
02:19:09.713 [9936.9940] <2> send_MDS_msg: KBYTES_WRITTEN 0 {F5CC47CB-8C8E-43FD-8563-CE4DF2E44680} 10703 1 78001664 2227801
02:19:09.713 [9936.9940] <2> JobInst::sendIrmMsg: returning
02:19:16.496 [9936.9940] <2> ConnectionCache::connectAndCache: Acquiring new connection for host XXXXX, query type 1
02:19:16.527 [9936.9940] <2> vnet_pbxConnect: pbxConnectEx Succeeded
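One way to read these excerpts is to group lines by process ID and measure progress between consecutive KBYTES_WRITTEN messages. The Python sketch below does that, assuming the sixth field after KBYTES_WRITTEN is a cumulative kilobyte count and that no intermediate messages were left out of the pasted log; the function name is illustrative:

```python
import re
from collections import defaultdict
from datetime import datetime

PROGRESS_RE = re.compile(
    r"^(?P<time>\d\d:\d\d:\d\d\.\d{3}) \[(?P<pid>\d+)\.\d+\] .*"
    r"KBYTES_WRITTEN \d+ \{[^}]+\} \d+ \d+ (?P<kbytes>\d+) \d+"
)


def progress_rates(lines):
    """Return {pid: [(seconds, kbytes_delta, MiB/s), ...]} per message pair."""
    samples = defaultdict(list)
    for line in lines:
        m = PROGRESS_RE.match(line)
        if m:
            stamp = datetime.strptime(m["time"], "%H:%M:%S.%f")
            samples[m["pid"]].append((stamp, int(m["kbytes"])))
    rates = {}
    for pid, points in samples.items():
        rates[pid] = []
        for (t0, k0), (t1, k1) in zip(points, points[1:]):
            secs = (t1 - t0).total_seconds()
            if secs > 0:  # skip identical stamps to avoid dividing by zero
                rates[pid].append((secs, k1 - k0, (k1 - k0) / 1024 / secs))
    return rates


log = [
    "01:59:26.446 [9936.9940] <2> send_MDS_msg: KBYTES_WRITTEN 0 "
    "{F5CC47CB-8C8E-43FD-8563-CE4DF2E44680} 10703 1 75001600 2227801",
    "02:08:38.172 [9296.8200] <2> bptm: INITIATING (VERBOSE = 0): -rptdrv -jobid -1489985947 -jm",
    "02:19:09.713 [9936.9940] <2> send_MDS_msg: KBYTES_WRITTEN 0 "
    "{F5CC47CB-8C8E-43FD-8563-CE4DF2E44680} 10703 1 78001664 2227801",
]
for pid, pairs in progress_rates(log).items():
    for secs, delta, rate in pairs:
        print(f"pid {pid}: {delta} KB in {secs:.0f} s = {rate:.2f} MiB/s")
```

The -rptdrv lines carry different process IDs (9296, 9400), so they appear to come from separate short-lived bptm processes rather than from the duplication process 9936. Under the assumptions above, process 9936 wrote about 3 GB in roughly 20 minutes during this window.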

Hi,

 

Kindly share the complete job details status (parent/child) in .txt format.

 

Thanks & Regards,

 

Hey Tousif,

One more thing I checked: I am running the duplication from tape to disk with the preserve multiplexing option.

I can only use this option when I am using the master server STU.

If I use a media server STU, it gives me "error cannot connect to the writing side process for duplication".

If I run the duplication on a media server without preserve multiplexing checked, it runs fine.

Will it cause the slow throughput?

Hi,

 

Thanks for sharing this info.

Multiplexing is an option that works around a lack of resources in an NBU environment.

Example:

You have only one tape drive and want to run 2 or more jobs at the same time. That is when multiplexing gets used.

Multiplexing solves the resource problem, but it interleaves the data from the different jobs on tape, which makes reading any one backup slower.

I never recommend using multiplexing. It helps with the resource problem, but restore performance degrades. I always recommend adding resources instead. A restore request always has higher priority than a backup.

Kindly un-check multiplexing if you have enough resources. Multiplexing degrades read performance, and therefore duplication.
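A very rough way to see the read penalty is to model a multiplexed tape as evenly interleaved streams. The Python sketch below is a simplified model and not a NetBackup formula: it assumes the drive streams at its native rate and that the backups were interleaved in equal shares. The 140 MB/s figure is the LTO-5 native rate and the stream counts are hypothetical.

```python
def per_image_read_rate(drive_mb_per_s, mpx_streams):
    """Rough rate at which one image's data is recovered from a multiplexed tape."""
    if mpx_streams < 1:
        raise ValueError("mpx_streams must be at least 1")
    # Only 1 block in every `mpx_streams` blocks belongs to the wanted image.
    return drive_mb_per_s / mpx_streams


for streams in (1, 2, 4):  # hypothetical multiplexing levels
    rate = per_image_read_rate(140, streams)
    print(f"{streams} stream(s): about {rate:.0f} MB/s per image")
```

In this model, duplicating one image at a time from a tape written with 4 streams would recover data at about a quarter of the drive's rate, which is one possible reason a non-preserved duplication of multiplexed tapes runs slowly.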

 

Thanks & Regards