10-11-2011 04:46 PM
Noticed a performance drop when a single bptm process spawns multiple duplication jobs.
Problem Description:
ServerA is backed up using the "_Test_XYZ1Day" SLP. As per the SLP config, the backup goes to a disk staging pool (AdvancedDisk pool XYZStagingpool1) and is then duplicated (3 copies) to tape. Throughput of each duplication job to its tape drive drops in direct proportion to the number of duplications per SLP. (NOTE: each duplication job uses a separate tape drive.)
Test Results
NOTE: Combined throughput of all the duplication jobs seems limited to 160-170 MB/sec.
Test Conditions: all the test stats above were gathered under the conditions below.
Storage lifecycle policy details:
#nbstl _Test_XYZ1Day -L
Name: _Test_XYZ1Day
Data Classification: (none specified)
Duplication job priority: 0
State: active
Version: 5
Destination 1 Use for: backup
Storage Unit: XYZStagingPool1
Volume Pool: (none specified)
Server Group: (none specified)
Retention Type: Capacity Managed
Retention Level: 10 (1 day)
Alternate Read Server: (none specified)
Preserve Multiplexing: false
State: active
Source: (client)
Destination ID: (none specified)
Destination 2 Use for: duplication
Storage Unit: media3-hcart-tld-2
Volume Pool: temp_cp_test1
Server Group: Any
Retention Type: Fixed
Retention Level: 10 (1 day)
Alternate Read Server: (none specified)
Preserve Multiplexing: false
State: active
Source: (primary)
Destination ID: (none specified)
Destination 3 Use for: duplication
Storage Unit: media3-hcart-tld-2
Volume Pool: temp_cp_test1
Server Group: Any
Retention Type: Fixed
Retention Level: 10 (1 day)
Alternate Read Server: (none specified)
Preserve Multiplexing: false
State: active
Source: (primary)
Destination ID: (none specified)
Destination 4 Use for: duplication
Storage Unit: media3-hcart-tld-2
Volume Pool: temp_cp_test2
Server Group: Any
Retention Type: Fixed
Retention Level: 10 (1 day)
Alternate Read Server: (none specified)
Preserve Multiplexing: false
State: active
Source: (primary)
Destination ID: (none specified)
Any ideas would be appreciated.
Thanks.
10-11-2011 09:02 PM
Well, the figures you mention seem to correlate well with 2 Gbit FC, approx 170-180 MB/s.
I think your problem may be that you use all three drives on one HBA port. A simple test here would be to "down" 1-2 drives on the first port and see if the same behaviour occurs when the jobs are forced to run to different ports. If it is an HBA port bandwidth problem, then in this case you should have two duplication jobs, each running to drives on separate ports, and you should get full speed. If not, then perhaps the I/O bus on the media server is the limiting factor.
Also, you don't mention how the disk system is connected. Is it on the same HBA ports, or dedicated ones? Even though the I/O direction differs (read from disk, write to tape), there is overhead in handling all I/O on the HBA, the device driver, and the OS kernel. In this case a higher I/O block size may help; the exact configuration varies by OS.
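As a back-of-the-envelope check (standard FC figures, not from this thread): 2 Gbit FC runs at 2.125 Gbaud with 8b/10b encoding, so the payload ceiling works out to roughly 212 MB/s, which is why 170-180 MB/s sustained is a typical real-world number:

```shell
# 2 Gbit FC: 2.125 Gbaud line rate, 8b/10b encoding -> 80% payload efficiency.
FC_PAYLOAD_MBS=$((2125 * 8 / 10 / 8))   # Mbit/s payload, divided by 8 -> MB/s
echo "2Gb FC payload ceiling: ${FC_PAYLOAD_MBS} MB/s"   # ~212 MB/s theoretical
```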
/A
10-12-2011 04:36 AM
Hi
If this is not all on a single HBA (and hopefully it isn't), then when you run more than one duplication at a time (inline copies), the way the bptm process handles the data through its buffers has been seen to cause exactly this.
Tuning your data buffers can help - just try different numbers.
As you are using LTO4 drives, the best-performing value for SIZE_DATA_BUFFERS is 262144.
If you are not already using this value you will need to bplabel any previously used (BUT EXPIRED!) tapes, otherwise NBU will just read the header and change the buffer size to match the old setting.
So then try 16, 32 and 64 in the NUMBER_DATA_BUFFERS file to see what gives you the best performance.
All buffer files in \netbackup\db\config\
32 is usually good but this will vary when using inline copy
Of course you can also tune your DISK buffers which may improve things further
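Putting the touch files in place can be sketched as follows. A scratch directory stands in for the real config directory here so the snippet runs anywhere; on a UNIX media server the actual location is typically /usr/openv/netbackup/db/config (treat the path as an assumption for your platform):

```shell
# Sketch of creating the buffer tuning files; CFG stands in for the real
# NetBackup config directory (/usr/openv/netbackup/db/config on UNIX).
CFG=$(mktemp -d)
echo 262144 > "$CFG/SIZE_DATA_BUFFERS"    # 256 KB tape blocks for LTO4
echo 32     > "$CFG/NUMBER_DATA_BUFFERS"  # starting point; also try 16 and 64
cat "$CFG/SIZE_DATA_BUFFERS" "$CFG/NUMBER_DATA_BUFFERS"
```

bptm reads these files at job start, so no daemon restart is needed; just remember the note above about relabelling expired tapes.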
Hope this helps
10-12-2011 08:15 PM
Thanks guys for the comments.
I want to clarify a few more things:
- 2 Gbit port: I am able to get 193 MB/sec sustained throughput on a 2 Gbit port.
- Most of the tests were done with 2x duplication jobs. I started testing 3x duplication to correlate the 160-170 MB/sec bptm limitation.
- During the 2x duplication job tests, I made sure to use tape drives on different HBAs, connected to different fabric switches.
CONDITIONS COMMON TO ALL TESTED SCENARIOS:
- Disk is a storage LUN from a StorageTek SAN array dedicated to backup staging. The array is connected using another set of two single-port HBAs. (A bpbkar-to-null test gave 270 MB/sec sustained throughput.)
- All HBAs are PCI-E (64 bit), so nowhere close to saturating the bus.
- Disk buffers: tested with 128k, 256k, 512k, 1024k. 1024k seems slightly better, but 512k best balances the delayed waits.
- Number of data buffers: 256. (The server has 32 GB RAM, and no more than 200 concurrent jobs run at the same time.)
- Tuning: the point to note is that we get 120 MB/sec sustained to an LTO4 drive, which is the optimal value for LTO4; anything over 120 MB/sec is just due to compression.
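For what it's worth, those settings imply a fair amount of bptm shared memory per drive (NUMBER_DATA_BUFFERS x SIZE_DATA_BUFFERS is the standard NetBackup sizing rule; assuming the 256 count applies to the 512k buffers quoted above):

```shell
# Shared memory per drive = NUMBER_DATA_BUFFERS * SIZE_DATA_BUFFERS
BUF_MEM=$((256 * 512 * 1024))                    # 256 buffers of 512 KB each
echo "$((BUF_MEM / 1024 / 1024)) MB per drive"   # 128 MB
```

Comfortably within 32 GB of RAM even with many concurrent jobs, which supports the poster's point that memory is not the constraint.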
SCENARIO (1): 1x duplication job going to 1x tape drive on Hba1 or Hba2 (processes involved: bptm controlling 1x tape drive, bpdm reading from disk).
I get 120-140 MB/sec throughput.
SCENARIO (2): as in (1), but 2x duplication jobs are forked from a single bptm, writing to different tape drives, on different HBA ports, connected to different fabric switches (processes involved: bptm controlling 2x tape drives, bpdm reading from disk).
I get 80-85 MB/sec throughput. But ideally I should get the same throughput as in Scenario (1), i.e. 120-140 MB/sec.
The only change between SCENARIO (1) and SCENARIO (2) is the NUMBER OF DUPLICATION JOBS.
This clearly looks like some kind of bptm process contention, not resource contention.
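The contention reading is consistent with the arithmetic: the aggregate across drives barely moves between scenarios, pointing at a per-bptm ceiling rather than per-drive or fabric limits (figures taken from the tests above):

```shell
# Aggregate throughput per scenario, using the measured per-drive figures:
AGG_ONE=$((1 * 120))   # scenario 1: one duplication, drive-limited by LTO4
AGG_TWO=$((2 * 80))    # scenario 2: two duplications, total hits ~160 MB/s
echo "$AGG_ONE vs $AGG_TWO MB/s"   # scenario 2 sits at the observed ceiling
```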
10-13-2011 02:43 AM
Hi
Your disk buffers look quite small and your number of buffers quite high.
An example of what I have found works really well in multiple-drive LTO4 or LTO5 environments is as follows:
SIZE_DATA_BUFFERS 2162144
NUMBER_DATA_BUFFERS 32
SIZE_DATA_BUFFERS_DISK 1048576
NUMBER_DATA_BUFFERS_DISK 32
Do bear in mind that you need to test this using new tapes or any available expired tapes will need relabelling to overwrite the header with the new block size
Hope this helps
10-13-2011 04:02 PM
Hi Mark,
I guess you have mistyped SIZE_DATA_BUFFERS 262144.
I listed my figures in kilobytes (you might have missed the trailing 'k'). The number you mention is already among my tested block sizes (262144 bytes = 256k). So our SIZE_DATA_BUFFERS is 512k = 512x1024 = 524288.
Regarding NUMBER_DATA_BUFFERS, yes it is high, because we have sufficient RAM in relation to the number of concurrent processes.
Thanks for the suggestions, but I did extensive testing before settling on those numbers.
SYSTEM DETAILS:
OS Version: Solaris 10
Server Hardware: SPARC Enterprise T5220
NetBackup Version: 6.5.5
10-13-2011 04:25 PM
This looks like an interesting problem. Can you clarify (somehow in words) your HBA and SAN and storage topology and zoning?
10-13-2011 07:43 PM
Share the OS you have and we can help you check the zoning and binding of the box, just to confirm how your drives are set up. With this we can confirm the drives are actually split across the ports; guessing or thinking they might be split is not enough.
Regards.
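A minimal sketch of how that check is often done on a Solaris media server. The paths are assumptions (tpconfig usually lives under /usr/openv/volmgr/bin, and Solaris no-rewind compressed tape paths follow the /dev/rmt/*cbn convention), and the commands only show anything useful on a host with NetBackup installed:

```shell
# Guarded so it is harmless on machines without NetBackup installed.
TPCONFIG=/usr/openv/volmgr/bin/tpconfig
if [ -x "$TPCONFIG" ]; then
    "$TPCONFIG" -d            # list configured drives with their device paths
    ls -l /dev/rmt/*cbn       # symlink targets reveal the controller/HBA path
else
    echo "tpconfig not found; run this on the media server"
fi
```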
10-13-2011 08:52 PM
FABRIC ZONING: we use soft zoning (WWN based)
OS Version: Solaris 10
Server Hardware: SPARC-Enterprise-T5220.
NetBackup Version: 6.5.5
THE POINTS TO BE FOCUSED ON ARE:
IF THERE IS A PROBLEM WITH THE CONFIG I SHOULD SEE THE ISSUE DURING BACKUP, RESTORE, 1XDUPLICATION.
BUT
1. I DO NOT have any PERFORMANCE issues doing BACKUPS, RESTORES, 1x DUPLICATION. In fact we are getting optimal figures.
2.THE ISSUE only happens when we start using 2 or more Duplication jobs.
For Example:
MediaServer03 –>hba01 ->switchA->(Tape3, Tape4, Tape5)
MediaServer03 –>hba03 ->switchB->(Tape1, Tape2, Robot Control)
Scenario 1: (2xduplication jobs running at the same time from 2xSLP’s)
SLP1 configured with 1x Duplication job -> writing to Tape1: I get 120+MB/sec
Processes involved: (1x bpdm reading from disk at 120 MB/sec, 1x bptm writing to tape at 120 MB/sec)
SLP2 configured with 1xDuplication job-> writing to Tape3: I get 120+MB/sec
Processes involved: (1x bpdm reading from disk at 120 MB/sec, 1x bptm writing to tape at 120 MB/sec)
Scenario 2: (2xduplication jobs running at the same time from 1xSLP)
SLP1 configured with 2x Duplication job
Duplication job 1-> writing to Tape1: I get 80 - 85MB/sec
Duplication job 2-> writing to Tape3: I get 80 - 85MB/sec
Processes involved: (1x bpdm reading from disk at 80 MB/sec, 1x bptm writing to 2x tape drives at 80 MB/sec each)
NOTE: bpdm slows down because bptm cannot go beyond 80-85 MB/sec, so bpdm waits for empty buffers.
10-14-2011 01:30 AM
Fully with you and your setup and tuning tests now.
The issue here is that you are using inline copy, and its handling within the bptm process is a known problem.
This tech note explains it better http://www.symantec.com/docs/HOWTO56160
It may be worth looking at this to see if there is a bottleneck anywhere that could improve things, but the issue stems from using inline copy.
Hope this helps
10-14-2011 04:59 AM
Thanks Mark for the pointer.
"Inline Copy (multiple copies) takes one stream of data that the bptm buffers receive and writes the data to two or more destinations sequentially"
Well that explains the issue.