Poor performance on NFS backup.

o1crimson1o · ‎10-13-2017

Master server (nbu1):

Gen9 HP, 2x 10G ethernet, 2x 8GB HBA.

Media server (nbu2)

Gen9 HP, 2x 10G ethernet, 2x 8GB hba

Storage server:

R730, 256GB RAM, 2x 10GB ethernet, only client reading data (or writing to is netbackup.

Doing iperf tests, I can get 9Gbit/s no problem (sending or receiving).

Drives are LTO7, media is LTO7.

Right now I'm doing some test back ups on 30TB of data.

rsyncing the data I can easily hit 100-200MB/s on big files.

Using my policy (which only uses 1 drive) I avg around 17-22MB/s being written to tape.

This seems really off.

I've done kernel tuning, some netbackup tuning

[root@nbu1 netbackup]# find .|grep BUFFER
./NET_BUFFER_SZ
./db/config/SIZE_DATA_BUFFERS
./db/config/NUMBER_DATA_BUFFERS
./db/config/SIZE_DATA_BUFFERS_DISK
./db/config/NUMBER_DATA_BUFFERS_DISK

These have been set on both nbu1 and nbu2.

I'm really at a loss right now where to turn to get better performance. Right now our LTO4 drives avg around 18MB/s So I'm finding it really unbelivable that The speeds I'm getting are it.

Also note, putting a client on the heads is not really an option (It would take a lot of retooling of our tools to make this happen.)

sdo · ‎10-14-2017

Try a disk speed read test on the client:

https://vox.veritas.com/t5/NetBackup/BPBKAR-TEST/td-p/633298

.

Try a GEN_DATA test from the client:

http://www.mass.dk/netbackup-quick-hints/netbackup-admins-test-tool/

o1crimson1o · ‎10-16-2017

Very nice I'll do some more testing.

o1crimson1o · ‎10-16-2017

So using disk performance tools, I get the kind of performance I would expect from good SSD (500+MB/s).

Backing up nbu2 to itself (since it's the media server) I see speeds that top out at around 83450KB/s with avgs during the prcess being 40000KB/s down to 17000KB/s Which is well below what I remember having using a different backup software.

I'm still really lost :(

Michael_G_Ander · ‎10-16-2017

Have you used the Veritas recommended kernel parameters for the linux media servers ?

You write the you have the buffer files setup, but not what values your using. Increasing these will use more shared memory. Think 256144 (256 KB) still is the prefered/recommended value for LTO drives.

Can you verify the bpkbar is waiting on empty buffers ? should be in the job details now.

On some OSes is better to not have a NET_BUFFER_SZ and just let the OS handle the tcp traffic

The standard questions: Have you checked: 1) What has changed. 2) The manual 3) If there are any tech notes or VOX posts regarding the issue

sdo · ‎10-17-2017

When you say disk performance tools, do you mean for example dd for raw LUN/partition read, or perhaps something like IOzone or IOmeter doing IO to/from a large private container IO test file... but that's not really my question... because to compare a disk IO exerciser with actually having to walk a file system and open files are two very different things.

Also be aware that any IO test file needs to be at least twice the size of RAM - usually I go for three times the size of RAM - in an effort to avoid/defeat file caching. e.g. IO testing with a 10GB file on a system with 128GB of RAM is going to give some totally unreal yet wow amazing IO rates. Try it with a 500GB test container file and you'll start to see some real world disk IO transfer/throughput rates.

What about trying a tar of the file system that you are trying to backup to /dev/null. Or maybe a gzip (but choose to simply store and not compress) of that mount-point / folder-tree / file-system to /dev/null. I guess you could even try a dd from /dev/zero to /dev/null for say 100GB just to see what the CPU is capable of. What I mean is you need to compare apples with apples, and reading and walking the same source storage in the same manner... i.e. via the file-system.

My actual question then is... this... did you not try the bpbkar read test to null... which will walk the file system. If the bpbkar test read speed is the same, or even nearly the same, as your real backup job speed then the problem is with the disk storage that you are reading, and not with NetBackup, or your client to media server connectivity.