11-23-2017 03:07 PM
I have a weird performance issue I can't resolve.
I have an older HP DL380 with 4 x 1Gb NICs in an LACP team, and when I back up my NetApp to the AltaVault with more than 1 path, I get amazing performance at around 125MB/sec or 1000Mb/sec. Once there's just 1 path to back up, it crawls at ~10MB/sec or ~40Mb/sec.
While this path is going slow, if I kick off another job, and it doesn't matter which one, both jobs will go to 125MB/sec total and the original path that was slow all of a sudden goes fast again.
This tells me it's not an issue of having millions of very small files but some sort of self-imposed bottleneck or throttling.
I have buffers set to 0 and I also applied all the performance tweaks in the AltaVault guide, and I know it works well, provided I'm backing up 2 paths. Even 1 job with 2 paths is fast. Or 2 jobs with 1 path each are fast. The second there's just 1 path left to back up, it goes slow.
Hopefully someone else has seen this and will save me a support call.
Thanks.
11-23-2017 11:09 PM
I am certainly not a network expert, but IMHO this is a network switch issue rather than an NBU one.
11-24-2017 02:31 AM
Have you looked at the hash algorithm on the LACP team? I have seen a 4 Gbit team/trunk never get above 1 Gbit due to the wrong algorithm.
Do be aware that the algorithm on the network switch and on the connected boxes should match.
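
To illustrate why the hash algorithm matters, here is a minimal sketch of a layer3+4-style policy. It is a simplified stand-in, not the exact algorithm of any particular switch or NIC driver, and the addresses and ports are hypothetical. It shows that one flow always maps to the same link, while flows that differ in source port can spread across the team.

```python
import ipaddress

def pick_link(src_ip, dst_ip, src_port, dst_port, n_links):
    """Simplified layer3+4-style hash: XOR addresses and ports, modulo link count."""
    ip_bits = int(ipaddress.ip_address(src_ip)) ^ int(ipaddress.ip_address(dst_ip))
    port_bits = src_port ^ dst_port
    return (ip_bits ^ port_bits) % n_links

# Hypothetical addresses and ports, for illustration only.
one_flow = ("10.0.0.5", "10.0.0.9", 50000, 10000)
print([pick_link(*one_flow, 4) for _ in range(3)])  # same link index every time

# Four flows between the same two hosts that differ only in source port.
flows = [("10.0.0.5", "10.0.0.9", 50000 + i, 10000) for i in range(4)]
print(sorted({pick_link(*f, 4) for f in flows}))  # set of links actually used
```

A policy that hashes only on MAC or IP addresses would send every flow between the same two hosts down one link, however many flows there are.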
11-24-2017 06:16 AM
>>I get amazing performance at around 125MB/sec or 1000Mb/sec. Once there's just 1 path to back up, it crawls at ~10MB/sec or ~40Mb/sec.
-
Gigabit Ethernet (1 GigE) gives a maximum speed of roughly 120 megabytes per second. Real-world speed for TCP without tuning is somewhere around 400-600 Mbit/s, or roughly 40-60 megabytes per second. You can check it with iperf.
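
Since this thread mixes megabits and megabytes, a small conversion sketch may help. It only divides or multiplies by 8 and ignores protocol overhead, which is presumably why usable throughput is quoted somewhat below the raw figure.

```python
def mbit_to_mbyte(mbit_per_s):
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return mbit_per_s / 8

def mbyte_to_mbit(mbyte_per_s):
    """Convert megabytes per second to megabits per second."""
    return mbyte_per_s * 8

print(mbit_to_mbyte(1000))  # raw line rate of one 1 Gbit/s link, in MB/s
print(mbyte_to_mbit(10))    # a 10 MB/s job expressed in Mbit/s
```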
That's normal for one flow through a 1Gb Ethernet interface. You can monitor it with Task Manager in Windows or with ethtool eth1 / ethtool -S eth1 in Linux, and you can look at the network switch statistics directly (e.g. on Cisco: show interfaces Gi0/1) or via SNMP (Zabbix / Cacti / PRTG / MRTG / etc.).
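
If the backup server runs Linux, another way to see which team member carries the traffic is to sample the per-interface byte counters in /proc/net/dev. This is only a rough sketch: the function names are made up for illustration, and it assumes the standard layout with two header lines and TX bytes as the ninth counter.

```python
import time

def tx_bytes_by_interface(proc_net_dev_text):
    """Parse /proc/net/dev text into {interface name: transmitted bytes}."""
    counters = {}
    for line in proc_net_dev_text.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        name, stats = line.split(":", 1)
        fields = stats.split()
        counters[name.strip()] = int(fields[8])  # ninth counter is TX bytes
    return counters

def tx_rates_mb_per_s(interval_s=5.0, path="/proc/net/dev"):
    """Sample the counters twice and return MB/s transmitted per interface."""
    with open(path) as f:
        before = tx_bytes_by_interface(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = tx_bytes_by_interface(f.read())
    return {name: (after[name] - before[name]) / interval_s / 1e6
            for name in after if name in before}

# Example (Linux only): print(tx_rates_mb_per_s())
```

If only one member of the team shows a non-zero rate during a single-path backup, that matches the one-flow-per-link behaviour of LACP.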
When you have LACP (Link Aggregation Control Protocol) / EtherChannel, then
LACP will not split packets across multiple interfaces for a single stream/thread. For example, a single TCP stream will always send/receive packets on the same NIC.
That means that:
1. Usually you can get 120 megabytes per second of total bandwidth by using 4 x 30 MB/s (4 data flows), but only 30-40 MB/s for any single flow.
2. You need matching network configuration on both sides of the EtherChannel: on the server and on the switch.
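
The effect of the one-flow-per-link rule on throughput can be sketched with an idealized model. It assumes each link delivers about 120 MB/s, that flows hashed to the same link share it equally, and that nothing else (disk, CPU, the far end) is a bottleneck. The link indices are hypothetical inputs, not the output of a real hash.

```python
from collections import Counter

def per_flow_rates(flow_links, link_mb_per_s=120.0):
    """Give each flow an equal share of the link it was hashed to."""
    flows_on_link = Counter(flow_links)
    return [link_mb_per_s / flows_on_link[link] for link in flow_links]

# One flow on a 4-link team still uses a single link.
print(sum(per_flow_rates([0])))
# Four flows spread over all four links.
print(sum(per_flow_rates([0, 1, 2, 3])))
# Four flows that all hash to the same link share that one link.
print(sum(per_flow_rates([2, 2, 2, 2])))
```

In this model, adding links never raises the ceiling for a single flow; only additional flows that hash to different links do.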