
Still troubleshooting NetBackup Performance issues

jmanini
Level 4

 

My tape drives are failing too often, and Quantum says I'm not getting data to the tape drives fast enough...

 

Here is what I'm seeing in my bptm log

write_backup_completion_stats: waited for full buffer 571151 times, delayed 660055 times
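For anyone else decoding these counters: the tape-writing side of bptm found no full buffer ready 571,151 times and took 660,055 short sleeps while waiting for one, which points at the inbound data path rather than the drives or the buffer settings. A rough sketch of the arithmetic in Python; the 20 ms per delay is only an illustrative assumption (the actual sleep interval is tunable, commonly via the PARENT_DELAY/CHILD_DELAY touch files), not a measured value:

import re

line = ("write_backup_completion_stats: waited for full buffer 571151 times, "
        "delayed 660055 times")

# Pull the two counters out of the bptm log line.
waited, delayed = map(int, re.search(
    r"waited for full buffer (\d+) times, delayed (\d+) times", line).groups())

ASSUMED_DELAY_MS = 20  # placeholder value, not a documented default

print(f"waits for a full buffer: {waited:,}")
print(f"delays while waiting:    {delayed:,}")
print(f"delays per wait:         {delayed / waited:.2f}")
print(f"dead time if each delay were {ASSUMED_DELAY_MS} ms: "
      f"{delayed * ASSUMED_DELAY_MS / 1000 / 3600:.1f} hours")

Under that assumption this is on the order of hours of the drives sitting idle waiting for data, which is consistent with what Quantum is telling you.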

 

NetBackup 7.1.0.3

I have 2 Windows servers: 1 master/media and 1 media

Windows 2008 R2, 96 GB of RAM each, with dual 6-core CPUs

I write directly to tape (Quantum i500 with 10 LTO-5 tape drives)

I have 4 x 1 Gb NICs teamed for incoming traffic

Each server has 2 QLogic 4 Gb fibre cards connected to the tape library

 

D:\Netbackup\db\config

NUMBER_DATA_BUFFERS = 128

NUMBER_DATA_BUFFERS_DISK = 128

NUMBER_DATA_BUFFERS_RESTORE = 32

SIZE_DATA_BUFFERS = 262144

SIZE_DATA_BUFFERS_DISK = 1048576

 

C:\Program Files\Veritas\NetBackup

NET_BUFFER_SZ = 1050624

NET_BUFFER_SZ_REST = 1050624
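One quick sanity check on these values is the shared memory they imply on the media server: roughly NUMBER_DATA_BUFFERS x SIZE_DATA_BUFFERS per active tape stream, multiplied by the number of drives and any multiplexing. A small sketch of that arithmetic using the figures above (the drive count comes from the post; the MPX value is only an example):

NUMBER_DATA_BUFFERS = 128
SIZE_DATA_BUFFERS = 262144      # 256 KB per buffer
TAPE_DRIVES = 10                # Quantum i500 with 10 LTO-5 drives
MPX_PER_DRIVE = 1               # example only; use your policy's multiplexing setting

per_stream = NUMBER_DATA_BUFFERS * SIZE_DATA_BUFFERS
worst_case = per_stream * TAPE_DRIVES * MPX_PER_DRIVE

print(f"shared memory per stream: {per_stream / 2**20:.0f} MiB")
print(f"worst case, all drives:   {worst_case / 2**20:.0f} MiB")

That works out to about 32 MiB per stream and 320 MiB with all ten drives busy, which is tiny against 96 GB of RAM, so there is plenty of headroom to raise the buffer counts if the data ever arrives fast enough to need it.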

 


11 REPLIES

Nicolai
Moderator
Partner    VIP   

The buffers only enable fast transfer between the host and the device. If data can't get to the host, you have a different ball game that buffer tuning can't solve.

Has the network admin verified the network infrastructure? Not getting the teaming right can negatively impact performance.

By the way:

Set NUMBER_DATA_BUFFERS_RESTORE to a higher value than 32 - I suggest 128 or 256.
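For reference, these values live as single-line "touch files" in the media server's db\config directory (D:\Netbackup\db\config in the original post). A minimal sketch of making that change; the path is taken from the post and 256 is just the upper value suggested above:

from pathlib import Path

config_dir = Path(r"D:\Netbackup\db\config")   # adjust to your install location

# Each buffer tuning file holds a single numeric value on one line.
(config_dir / "NUMBER_DATA_BUFFERS_RESTORE").write_text("256\n")

print((config_dir / "NUMBER_DATA_BUFFERS_RESTORE").read_text())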

 

mph999
Level 6
Employee Accredited

Looks like Quantum are right ...

If you log a call with Symantec and ask very nicely, we'll run AppCritical and see if there are network issues ...

Martin


jmanini
Level 4

Thanks for the reply; I will make the buffer change. I'm the one who set up the NIC teaming, and as far as I can see all looks good there. Even at a busy time I do not see the incoming traffic overburdening the team. I set up the configuration as Symantec suggested with regard to what to disable and so on.

Nicolai
Moderator
Partner    VIP   

You can't tell whether packet loss is extensive just by looking at the teaming. You need to run some performance tools to find out.
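No specific tool is named here beyond AppCritical, but even a crude end-to-end TCP throughput probe between a backup client and the media server will show whether the teamed NICs are delivering anything close to line rate (it does not measure packet loss directly). A minimal sketch, with the port number and mode handling as placeholder choices:

import socket, sys, time

PORT = 5001                         # placeholder port; pick any free one
CHUNK = 1024 * 1024                 # 1 MiB per send/recv call
TOTAL = 2 * 1024**3                 # stream 2 GiB per test

def serve():
    # Run on the media server: accept one connection and discard the data.
    with socket.create_server(("", PORT)) as srv:
        conn, addr = srv.accept()
        with conn:
            received, start = 0, time.time()
            while (data := conn.recv(CHUNK)):
                received += len(data)
            secs = time.time() - start
            print(f"received {received / 2**20:.0f} MiB from {addr[0]} "
                  f"at {received / secs / 2**20:.0f} MiB/s")

def send(host):
    # Run on a backup client: stream TOTAL bytes of zeros to the media server.
    payload = b"\0" * CHUNK
    with socket.create_connection((host, PORT)) as conn:
        start, sent = time.time(), 0
        while sent < TOTAL:
            conn.sendall(payload)
            sent += len(payload)
    print(f"sent {sent / 2**20:.0f} MiB in {time.time() - start:.1f} s")

if __name__ == "__main__":
    serve() if len(sys.argv) == 1 else send(sys.argv[1])

Run it with no arguments on the media server first, then run it on a client with the media server's hostname as the only argument. Keep in mind that most teaming modes pin a single TCP flow to one physical link, so one stream topping out around a single gigabit NIC's ~110 MB/s is expected; numbers far below that point at the network path rather than NetBackup.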

mph999
Level 6
Employee Accredited

NIC teaming often causes issues - but perhaps it is something else? We do not know.

It might be worth disabling teaming just to see if this clears the problem. If it doesn't, you know to investigate other areas.

Martin

 

devendra_singh
Level 3

I have taken a backup of 32 GB, and when we restore it the restore performance drops drastically.

I have applied all of the settings above, but they have had no effect on restore performance; it remains the same.

Kindly help us.

 

 

Omar_Villa
Level 6
Employee

If your file system was formatted with a small block size, it will create a huge bottleneck: you are pushing big blocks, so the FS has to split them, building up a cache that holds the data. Likewise, if your LUNs and RAID at the SAN level were created with a different block size than the FS and the NBU buffers, you will hammer the array, and everything will sit in cache while it either slices the blocks up or puts them back together.
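To make that concrete, the usual check is whether each layer's transfer size is an exact multiple of the layer below it - NBU buffer size against file-system cluster size against RAID stripe unit - since anything that does not divide cleanly gets split on the way down and stitched back together on the way up. A small sketch of that check; the cluster and stripe figures here are examples, not the poster's actual configuration:

SIZE_DATA_BUFFERS = 262144   # NBU buffer from the thread (256 KB)
FS_CLUSTER = 4096            # example NTFS allocation unit size
STRIPE_UNIT = 65536          # example RAID stripe unit per disk (64 KB)

def aligned(big, small):
    # A transfer that is an exact multiple of the layer below it can pass
    # through without being split and re-joined.
    return big % small == 0

for name, unit in (("FS cluster", FS_CLUSTER), ("RAID stripe unit", STRIPE_UNIT)):
    status = "aligned" if aligned(SIZE_DATA_BUFFERS, unit) else "MISALIGNED"
    print(f"{SIZE_DATA_BUFFERS} vs {name} ({unit}): {status}")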

When I have an infrastructure performance issue, I always start with the arrays, because SAN engineers typically tune all their arrays for databases. A DB demands high IO and low bandwidth because it moves a lot of small blocks; the backup world is exactly the opposite, low IO and high bandwidth, because the bigger the blocks, the higher the bandwidth demand but the fewer the IO requests. So why don't you share with us:

1. FS format data and type (MSFS or VxFS)

2. LUN block/stripe sizes and RAID info at the SAN level

3. Array type and vendor; there are some tricks you can apply depending on the array, mainly these three:

     A. Enable read-ahead cache (prefetching)

     B. Raise the cache block size as high as you can, e.g. a CX4 without FAST Cache can only go up to 16 KB, but an HDS AMS2500 can go up to 512 KB, and on that array you can partition the cache to better serve the LUNs you want

     C. Lower your queue depth to 8-16

 

This is a different perspective, but I normally fix global performance issues by tuning the SAN arrays first.

Hope this helps.

Regards.

V4
Level 6
Partner Accredited

Check the block size setting on the tape drives. Also check the paging file: with 96 GB of memory, your paging file is probably around 144 GB (unless you have already corrected it). You do not need that much virtual memory with 96 GB of RAM, unless the RAM is actually being fully utilized, so try reducing the paging file.

mph999
Level 6
Employee Accredited

Was the backup multiplexed?

MikeM11
Level 4

I have a similar environment, but with an HP EML library and drives and the same fibre/teamed NIC config. I have recently run through a few performance improvement measures in my Windows 2008 R2 / NetBackup 7.1.0.4 environment and found a big benefit in disabling TUR on the media servers.

Where you have SSO, the Test Unit Ready polling can sometimes hugely impact the performance of backups going to tape. Please see Microsoft KB842411 for the procedure. It's an article for Windows 2003, but it appears to apply to 2008 as well.

I hope this helps, Mike