
Slow backups with NBU 7.1.0.3 and MSL8096

Somniumus
Level 3

- Clustered master server with 2 Linux nodes running NBU 7.1.0.3.

- Two Windows Server 2008 R2 media servers on HP ProLiant DL380 G5 (2x E5410 @ 2.33GHz & 24GB RAM each)

- HP MSL8096 tape library with 3 drives (multipath connected, latest drive and library firmware installed), LTO-5 tapes only.

- A mix of Windows 2003 R2 and Windows 2008 R2 clients (VMs and physical systems - approx. 200+ servers), all on NBU Client 7.1.0.3. Physical servers are backed up over the 1Gb network; VMs over fiber.

Tests with HP Library and Tape Tools (installed on one of the media servers) show the MSL8096 drives can back up 30GB of data at an average speed of approx. 310MB/s (unencrypted) and up to 260MB/s (with 2.1 encryption).

When backing up physical systems over the 1Gb network, however, throughput remains at a disappointing 5-20MB/s. Even if a single backup job is started manually, with no other jobs running, it does not come close to the magic threshold of 50MB/s. Backing up shares on our NetApp via NDMP sometimes reaches 40MB/s, but that is still not enough.

Several of our config files contain the following values (where applicable):

NUMBER_DATA_BUFFERS = 256

NUMBER_DATA_BUFFERS_DISK = 64

SIZE_DATA_BUFFERS = 524288

SIZE_DATA_BUFFERS_DISK = 524288

SIZE_DATA_BUFFERS_NDMP = 524288
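
For reference, a rough Python sketch of the shared memory these buffer settings imply on a media server (the drive count is ours; the MPX factor is a placeholder, since we have not tuned multiplexing):

# Sketch: bptm shared memory implied by the touch-file values above.
NUMBER_DATA_BUFFERS = 256
SIZE_DATA_BUFFERS = 524288   # bytes (512KB)
TAPE_DRIVES = 3              # our MSL8096 has 3 drives
MPX = 1                      # placeholder; substitute the actual multiplexing setting

per_stream = NUMBER_DATA_BUFFERS * SIZE_DATA_BUFFERS   # 128 MB per stream
total = per_stream * TAPE_DRIVES * MPX
print(per_stream // 2**20, "MB per stream,", total // 2**20, "MB total")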

Getting this resolved is proving very time-consuming and, quite honestly, I'm lost as to where to look.

Any assistance is appreciated.

Thanks in advance!

1 ACCEPTED SOLUTION (Marianne's first reply, below)

7 REPLIES

Marianne
Level 6
Partner    VIP    Accredited Certified

A 1 Gb network can at BEST receive data at 100MB/sec.

This means you will never be able to get close to 1 tape drive's capability.
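
(For the arithmetic: 1 Gbit/s is 125 MB/s in theory - divide by 8 bits per byte - and Ethernet and TCP overhead typically leave roughly 100-115 MB/s usable, well below the 140 MB/s native speed of a single LTO-5 drive.)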

Have you configured Multiplexing in STUs as well as in schedules?

Have you increased Max Jobs per Client?

99.9% of slow backups are due to clients' slow read speed from disk.
For this reason, we use multiplexing to increase the number of simultaneous jobs to a tape drive and so increase the overall transfer rate.

Enable the bptm log on the media servers as a starting point.

At the end of backups, look for 'waiting for full/empty buffer' counts.
'Waiting for full buffers' means that the media server was waiting for data - the problem is on the client or the network.
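
As a rough illustration, a small Python sketch to tally those summary lines (the exact log wording is from memory and may differ between versions - adjust the pattern to your own bptm log):

import re, sys

# Sketch: count bptm buffer-wait summary lines at the end of a backup.
# Assumes lines like "... waited for full buffer 1681 times, delayed 12296 times".
pattern = re.compile(r"waited for (full|empty) buffer (\d+) times, delayed (\d+) times")
for line in open(sys.argv[1]):
    match = pattern.search(line)
    if match:
        kind, waits, delays = match.groups()
        side = "client/network" if kind == "full" else "media server/tape"
        print(f"{kind}-buffer waits: {waits} (delays: {delays}) -> look at the {side} side")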

Please see this URL for performance tuning tips:

http://www.symantec.com/docs/TECH147296

 
 
 

BTLOMS
Level 5

Do SAN-based backups for best performance, or implement SLPs.

Somniumus
Level 3

Thanks Marianne!

It looks like setting Multiplexing in the schedules solved my problem - it was set to 1, so I changed it to 4 for testing purposes and started a backup; the whole process ran at approx. 70MB/s instead of the earlier ~20MB/s.

The question now is: what would be an appropriate Multiplexing setting for the schedules? Is it dependent on several factors?

mph999
Level 6
Employee Accredited

An MPX setting of about 4 seems to be generally accepted as a good balance between backup speed and restore speed.

The higher the MPX value: 1. the more memory you need (each stream takes up 'buffer' space), and 2. the longer a restore takes.

An MPX backup is of course multiple clients' data 'mixed' together, so for a restore all of it has to be read, even if you are only restoring data for one client.

Just occasionally I see issues where a restore is running slowly, only to find that a very high MPX value was used - >12, that sort of value.

I have seen a restore that was taking >36 hours. Re-running the backup with a sensible MPX value allowed exactly the same data to be restored in a much, much shorter time (I can't remember the exact value, but it was single figures).

Martin

Somniumus
Level 3

Thanks for the useful info Martin.

Would you say the buffer sizes are correct to use with a multiplexing setting of 4?

 

Marianne
Level 6
Partner    VIP    Accredited Certified

You need to check bptm logs to see what the effect is on throughput and the difference in 'waits' at the end of the backup.

The key to finding an MPX value that works well in your environment is to increase it in small increments and perform restore tests.
Test restore with MPX value at 4. Record the result.
Try MPX of 6 for backups; do a test restore.
See if there is a significant difference in restore speed/time.
 

mph999
Level 6
Employee Accredited

Difficult to say. As Marianne has pointed out above, you may find better performance with 6, but then again you have a trade-off with restore speed.

In a nutshell, you'll have to do some serious testing: run some backups, then try a restore and see how long it takes - if the time is acceptable to you, then you have the answer.

Be aware, there is nothing that can be done to speed up a restore that is slow due to a high MPX value.

Also, the higher the MPX, the more memory is required ...

 

 
The total amount of shared memory used across all tape drives is:
 
(number_data_buffers * size_data_buffers) * number_tape_drives * max_multiplexing_setting
 
For two tape drives, each with a multiplexing setting of 4 and with 16 buffers of 256KB, the total shared memory usage would be: (16 * 262144) * 2 * 4 = 32768 KB (32 MB)
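 
Or, as a quick Python sketch of the same arithmetic:
 
# Shared memory per the formula above: 2 drives, MPX 4, 16 x 256KB buffers.
def total_shm_bytes(num_buffers, buf_size, drives, mpx):
    return num_buffers * buf_size * drives * mpx

print(total_shm_bytes(16, 262144, 2, 4) // 1024, "KB")   # 32768 KB (32 MB)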
 
 
So from this we can see that suddenly increasing MPX to 16 across one or more drives causes a massive increase in the amount of memory required.
 
 
Additionally ...
 
 
From my bptm log, we can see that I have 12 data buffers, each with a size of 131072 bytes ...
 
08:55:45.665 [18200] <2> io_init: using 131072 data buffer size
08:55:45.665 [18200] <2> io_init: CINDEX 0, sched Kbytes for monitoring = 60000
08:55:45.665 [18200] <2> io_init: using 12 data buffers
 
 
... therefore, each tape drive, or each stream to a tape drive, will require 131072 x 12 = 1572864 bytes
 
 
Now, this example is actually from an MPX backup with 2 streams, so you might think that the amount of shared memory will be 1572864 x 2 = 3145728 bytes
 
Now, here is the catch ...
 
Looking down my bptm log I find these lines ...
 
08:55:45.748 [18200] <2> mpx_setup_shm: buf control for CINDEX 0 is ffffffff79a00000
08:55:45.748 [18200] <2> mpx_setup_shm: shared memory address for group 0 is ffffffff76800000, size is 6291456, shmid is 117440636
08:55:45.748 [18200] <2> mpx_setup_shm: shared memory address for CINDEX 0 is ffffffff76800000, group 0, num_active 1
 
 
So we see the amount of memory allocated is 6291456 bytes.
 
Now, 6291456 / 1572864 = 4
 
So, what has happened is that even though I have one tape drive and 2 streams, the amount of memory NBU allocates is the amount required by 4 streams - NBU has 'rounded it up'. In fact, it will round up to the nearest multiple of 4, so if you have 5 streams it will allocate the same amount of memory as if there were 8 streams. I cannot remember the exact explanation for this - it was once explained to me by a BL engineer - but NBU has always worked this way and it is done for efficiency.
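 
As a small Python sketch of that allocation behaviour (the round-up-to-4 rule as described above, with the buffer numbers from my log):
 
import math

# NBU sizes the shared memory segment for a multiplexed group as if the
# active stream count were rounded up to the nearest multiple of 4.
def mpx_shm_bytes(num_buffers, buf_size, streams):
    rounded_streams = math.ceil(streams / 4) * 4
    return num_buffers * buf_size * rounded_streams

print(mpx_shm_bytes(12, 131072, 2))   # 6291456 - matches the mpx_setup_shm line above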
 
Martin