LTO9 tape performance poor - bottlenecks?

cian
Level 3

Hi all

We've recently completed a staged, total replacement of our environment: from NBU8.2 writing first to HPE MSA2040 storage and then out to LTO5 drives, to NBU9.1 writing to HPE MSA2060 storage and out to LTO9 drives.

Write performance on the 2040/LTO5 setup was approximately 50GB/hr/drive * 4 drives (* 2 setups, but each library was writing data from a single MSA); this jumped to approximately 200GB/hr/drive when we moved to a 2060/LTO5 setup. That improvement was badly needed, as I had reduced our tape-outs to a bare minimum to ensure they actually got written out - and turning everything that should be written out back on got us close to exhausting the write-out capacity of the setup again.

Now that we have LTO9 drives, I am seeing approx 250GB/hr/drive * 3 drives at each site, which is less than we were getting with the four LTO5 drives. Some writes peak at 300GB/hr, but this isn't common. This is causing backlog issues again.
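(For comparison: 3 * 250GB/hr ≈ 750GB/hr per site now, versus 4 * 200GB/hr ≈ 800GB/hr per site on the LTO5 setup - and a long way below what the LTO9 hardware should be capable of.)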

That is a fraction of the rated read speed of the MSA2060, of the write speed of the LTO9 drives, and of the Fibre Channel throughput, with nothing showing saturation at any point. Back when we had the MSA2040s I would frequently see waits recorded in the duplication job logs, but that isn't the case any more.

Is there something obvious I'm missing, such as a rate limit set somewhere? Or is there a cap on encryption speed? We use the drives' native encryption, handled via an ENCR_ media pool, and I can confirm that the LTO9 drives are encrypting.

8 REPLIES

CConsult
Moderator
Partner    VIP   

Hi,

How exactly are you measuring the GB/h?

Theoretically an LTO9 drive has a maximum native rate of 160-180 GB/h,

and LTO5 should be somewhere around 60 GB/h.

Things that can improve transfer rates to tape:

1. cache to reduce "shoeshining" - to have a consistent stream of data

2. Multiplexing to write to tape with more than one job

3. HW compression

4. Check everything else - drivers /  connection / throughput of source system etc.

cian
Level 3

I'm counting the number of bytes shown as written by the duplication job and dividing by the time since the job was assigned a drive. Crude, but when looking for large variation it gives enough info.
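In code terms it's nothing more than this back-of-the-envelope division (a rough sketch; the kilobyte count and timestamps are read by hand from the Activity Monitor job details, and the figures below are made up for illustration):

    from datetime import datetime

    kb_written     = 1_288_490_189                     # "Kilobytes" figure shown for the job (hypothetical)
    drive_assigned = datetime(2023, 10, 11, 6, 0, 0)   # when the duplication was given a drive
    checked_at     = datetime(2023, 10, 11, 11, 0, 0)  # when I looked at the job

    hours = (checked_at - drive_assigned).total_seconds() / 3600
    gb_per_hour = kb_written / 1024 / 1024 / hours
    print(f"~{gb_per_hour:.0f} GB/hr")                 # ~246 GB/hr in this example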

Those throughput figures seem on the low side compared to both HPE's specs and what we were getting - HPE claim 1.44TB/hr for LTO9 and 504GB/hr for LTO5 - and we were getting 200GB/hr per single LTO5 drive, uncompressed but encrypted.

I would have presumed I'd see wait messages within a duplication job if there were a problem with data being provided (as I used to get when we were running at the maximum read performance of the MSA2040 kit). And I thought HW compression and encryption don't generally work together?

Nicolai
Moderator
Partner    VIP   

Hi @cian 

Tape subsystem performance has been discussed on VOX many times. You may want to do some searches for clues.

Also the "backup planning and performance tuning guide is worth reading":

https://www.veritas.com/support/en_US/doc/21414900-146141073-0/index

The number 1 tweak to look at first:

What are SIZE_DATA_BUFFERS and NUMBER_DATA_BUFFERS set to?

The default for SIZE_DATA_BUFFERS is 262144 and for NUMBER_DATA_BUFFERS the default is 64. Try increasing NUMBER_DATA_BUFFERS to 512.

https://www.veritas.com/support/en_US/article.100008276

Please change for all media servers.
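The values go in plain touch files (no file extension) containing just the number. On a Windows media server with a default install location that would look something like this - adjust the install path to match your servers:

    C:\Program Files\Veritas\NetBackup\db\config\SIZE_DATA_BUFFERS     (single line:  262144)
    C:\Program Files\Veritas\NetBackup\db\config\NUMBER_DATA_BUFFERS   (single line:  512)

On UNIX/Linux media servers the equivalent directory is /usr/openv/netbackup/db/config/.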

Second, we need to look at the "waited for" messages in the job details in the Activity Monitor. Why? Well, we don't know whether data is coming in too slowly or going out too slowly, but these messages give an indication:

Waited for full buffer - data is coming in too slowly.

Waited for empty buffer - writing to tape is the issue.

https://www.mass.dk/netbackup-guides/netbackup-buffer-tuning-2/
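If you want to scan a pile of saved job detail output for these counters instead of eyeballing each job, a quick sketch along these lines works (assuming the empty-buffer message follows the same wording pattern as the full-buffer one):

    import re, sys

    # Scan pasted NetBackup job detail text for bptm buffer-wait counters.
    # "full buffer"  -> the writer was starved: the source side is too slow
    # "empty buffer" -> buffers backed up: the destination (tape) side is too slow
    pattern = re.compile(r"waited for (full|empty) buffer (\d+) times, delayed (\d+) times")

    for line in sys.stdin:
        m = pattern.search(line)
        if m:
            kind, waits, delays = m.group(1), int(m.group(2)), int(m.group(3))
            side = "source too slow" if kind == "full" else "tape/destination too slow"
            print(f"{kind} buffer: waited {waits} times, delayed {delays} times -> {side}")

Run it as, for example: python buffer_waits.py < job_details.txt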

cian
Level 3

There are no settings files / registry entries for any of these settings, so presumably everything is at the defaults. I'm creating the NUMBER_DATA_BUFFERS files now - the doc only mentions a reboot being required for registry changes; will one be required when creating the files?

Not every job has waited-for messages in it - indeed most don't - but I've found a few in some duplication jobs, some with quite large numbers:

11-Oct-2023 12:04:26 - Info bptm (pid=10408) waited for full buffer 2556 times, delayed 4739 times

By comparison, there is a duplication job writing out at an insanely slow pace - 363GB in 22 hours - that has no waited-for messages at all. This 'just happens' sometimes, generally from one specific media server, and killing the job will often result in another duplication starting from the same media server at ~200GB/hr.

The answer to 'do I need to reboot the servers' is no - jobs started since the change have picked up the new buffer setting.

LTO9 performance is markedly better on the first duplications started since then - between 600GB/hr and 800GB/hr - but the waited-for numbers are stratospheric on one of them:

waited for full buffer 42564 times, delayed 75144 times

Coincidentally, they're nearly all on the same media server, which is currently under immense load - two duplications running with 512 buffers and three with the old default (there is a backlog on the LTO5 SLPs that we decided to let run out to LTO5 tape, hence more jobs than LTO9 drives) - so that may be a very temporary limiting factor.
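As a very rough sense of scale - assuming each delay is somewhere in the region of 30ms, which is an assumption I'd want to verify against the tuning guide - 75144 delays works out to roughly 75144 * 0.03s ≈ 2250s, i.e. well over half an hour of that duplication spent waiting for data rather than writing it.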

Nicolai
Moderator
Partner    VIP   

NUMBER_DATA_BUFFERS will be read by the BPTM process when it starts up; long-running jobs started before the change will continue to use the old value.

To me the "waited for" doesn't look to bad. You can't avoid them, but keeping the numbers low should be keep a eye on

What OS are the media servers ?

If Linux then try the GEN_DATA directive. And no, this feature isn't available on Windows.

https://www.veritas.com/support/en_US/article.100030600

GEN_DATA will generate data in memory and write it to either disk or tape. This is a brilliant way to test whether the tape subsystem has the intended performance. You can also "restore" the data, but nothing will actually be written to any location.
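As a rough sketch, the Backup Selections of a test policy could look something like the below (GEN_KBSIZE / GEN_MAXFILES are my recollection of the sizing directives - double-check the exact names and units in the linked article before relying on them):

    GEN_DATA
    GEN_KBSIZE=10485760
    GEN_MAXFILES=100

Run that policy with the media server itself as the client and the tape storage unit as the destination, and the throughput you see is purely the media server to drive path - no disk reads and no network hop in the way.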

Duplication: if the duplication is from disk to tape, please ensure you don't mix read and write workloads. That really kills performance, especially on spinning disk.

cian
Level 3

The entire environment is Windows. Duplications are disk to disk to tape; the storage array is spinning disk, I believe (I don't admin it, thankfully). Generally there would only be write operations to the secondary disk storage in the early morning; tape windows are currently 24/7, but if I can get out of the backlog situation I have been considering controlling the windows to avoid overlap.

I won't have any particularly large jobs to write to the LTO9 environment until the weeklies run on Friday night, but the old SLPs that are writing to the LTO5 drives have accelerated significantly too - one completed at nearly native drive speed (the job was a little above 500GB/hr) - so I'm hopeful that this has solved the problem.

Nicolai
Moderator
Partner    VIP   

Hi @cian 

I highly recommend not mixing read and write workloads on spinning disk - performance doesn't just degrade a bit, it degrades a lot. Please consider implementing writes to disk during night time and SLP operations during day time. The advantage of daytime SLPs is that they can be cancelled for maintenance and NetBackup will automatically re-run the SLP operation.