Mismatched MTU sizes between Media Server & Clients

RonCaplinger
Level 6
Backups from our Solaris 10 NBU media server are slower than we expect (10MB/sec to LTO3 drives, with nothing else running).  The Solaris media server is set up for an MTU of 8000, and all of the clients (Windows & Solaris) are set to an MTU of 1500.  I suspect that this mismatch could be the cause of (or contributing factor to) the slow backups.

I would expect that TCP/IP would negotiate down on the media server from 8000 to 1500, but is there anything I can check to confirm my suspicions?  FYI, I have already gone through the Performance & Tuning Guide and performed the steps there, and MTU is only mentioned in passing in one section. 
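
For reference, a minimal sketch of how TCP usually arrives at a segment size when the two ends have different MTUs. It assumes IPv4 and TCP with no header options, and the function names are made up for illustration. Each side announces an MSS derived from its own MTU in the SYN, and a sender should not exceed what the peer announced. It says nothing about non-TCP traffic or about a device in the path with a smaller MTU than either end.

```python
# Sketch only: IPv4 + TCP without options; helper names are hypothetical.
IP_HEADER_BYTES = 20
TCP_HEADER_BYTES = 20


def advertised_mss(mtu: int) -> int:
    """MSS a host would announce in its SYN for a given interface MTU."""
    return mtu - IP_HEADER_BYTES - TCP_HEADER_BYTES


def effective_mss(sender_mtu: int, receiver_mtu: int) -> int:
    """Largest TCP payload the sender should put in one segment."""
    return min(advertised_mss(sender_mtu), advertised_mss(receiver_mtu))


if __name__ == "__main__":
    # Media server at MTU 8000 sending to a client at MTU 1500.
    print(effective_mss(8000, 1500))  # 1460
```

If that holds on the wire, a packet capture of the SYN/SYN-ACK exchange should show an MSS of 1460 from the clients and no data segments larger than that from the media server.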

And does anyone have any supporting docs that I can show to my co-worker that the higher MTU size on the Solaris 10 media server could be causing performance issues and should be set back down to 1500?  We apparently have a lot of instances of mismatched MTU sizes (but they all seem to be 1500 or below, and not in the Jumbo range), and they haven't caused any problems before, so he wants to see some documentation to that effect. 

Scott_Lundberg
Level 4
Ron,

The only way I know of to see what is happening at the Ethernet frame level is to put a sniffer on one host or the other and capture the packets. wireshark.org has a free one that is pretty good at decoding common protocols.

BTW, what is the speed of the network?  On a 100 Mbit network, roughly 10-12 MB/sec is the best you will ever get.  Maybe the frame size is not the problem...
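
A rough back-of-the-envelope sketch of that ceiling, for anyone who wants the arithmetic. It assumes standard Ethernet framing (38 bytes per frame for preamble, header, FCS and inter-frame gap), 40 bytes of IPv4 and TCP headers with no options, and MB meaning 10^6 bytes. The function name is made up for illustration, and real backups will land below these figures.

```python
# Sketch only: theoretical TCP payload ceiling for an Ethernet link.
ETHERNET_OVERHEAD_BYTES = 38  # preamble/SFD 8 + header 14 + FCS 4 + gap 12
IP_TCP_HEADER_BYTES = 40      # IPv4 20 + TCP 20, no options


def tcp_payload_ceiling_mb_per_s(link_mbit: float, mtu: int = 1500) -> float:
    """Best-case TCP payload rate in MB/s for a link speed and MTU."""
    payload_bytes = mtu - IP_TCP_HEADER_BYTES
    wire_bytes = mtu + ETHERNET_OVERHEAD_BYTES
    raw_mb_per_s = link_mbit / 8.0
    return raw_mb_per_s * payload_bytes / wire_bytes


if __name__ == "__main__":
    for link_mbit, mtu in [(100, 1500), (1000, 1500), (1000, 8000)]:
        rate = tcp_payload_ceiling_mb_per_s(link_mbit, mtu)
        print(f"{link_mbit} Mbit, MTU {mtu}: {rate:.1f} MB/s")
```

Under these assumptions a 100 Mbit link tops out just under 12 MB/sec, and on gigabit the difference between MTU 1500 and MTU 8000 is only a few percent of wire efficiency, so framing overhead by itself would not explain a drop to 10 MB/sec.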

Scott

S_H1
Level 5
Partner

snoop (Solaris) and Network Monitor (Windows) can also be used instead of Wireshark (or Ethereal).


You should measure the disk (read) speed independent of the network components and tape components.

Check [Performance & Tuning Guide P.92 "To measure disk I/O using bpbkar"]

*** Perform bpbkar (disk read only) by Solaris client
 bpbkar -nocont -dt 0 -nofileinfo -nokeepalives <PATH> > /dev/null

*** Perform bpbkar (disk read only) by Windows client
 bpbkar32 -nocont <PATH> > nul 2>&1

*** Perform local backup by Media server



 

RonCaplinger
Level 6
Thanks for all the advice!  Some more backstory:

I've already gone through the Performance & Tuning guide.  That's where I started realizing the test results were coming up inconsistent from one test to the next.  A regular Win file test one day would give us 10MB/sec, and another test it might be 30MB/sec.  The Exchange backups, however, never get higher than about 10MB/sec per stream.  We can break that backup by Storage Group and still get 10MB/sec per stream, for a total of 40MB/sec, whether going to one tape or 4 different ones.  So we really have two problems: inconsistent backup speeds of the same data going to the same drives, and Exchange backups that used to run at least at twice this speed, but all backups seemed to have slowed to current levels when the media server was updated from Solaris 9 to 10.

The NICs and switch ports involved are all set to auto/auto, negotiating to 1000/Full.  Traceroute shows no hops; the network guys say that the switch can't go higher than 1500, but think these servers should be negotiating down to that.  None of the logs are reporting any dropped packets or retransmits.

My concern with the MTU on the Solaris box is that Jumbo Frames are not industry standard and I've read that some manufacturers might implement the negotiation process differently.  That's why I'm suspicious of the Solaris media server's high MTU.

Ran the bpbkar32 on the Windows (Exchange) client, 50MB/sec. throughput.

Ran the bpbkar on the Solaris media server, 56MB/sec. throughput.

Ran self-backup of the Solaris media server (it is also our master), getting about 35MB/sec.  Other jobs are also running to other tape drives on it, so I could accept that they could account for some of the slower performance of this backup.
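
To line those numbers up against the 10MB/sec Exchange streams, here is a small sketch of the elimination reasoning. The stage labels, the function name and the 1.5x margin are arbitrary choices for illustration; the rates are the ones measured above. Note that the bpbkar32 test is a plain file read and does not exercise the Exchange backup path.

```python
# Sketch only: which measured stages are clearly faster than the slow backup?
def stages_ruled_out(stage_rates_mb_s, observed_mb_s, margin=1.5):
    """Return stages whose standalone rate beats the observed rate by `margin`."""
    return sorted(
        name
        for name, rate in stage_rates_mb_s.items()
        if rate >= observed_mb_s * margin
    )


if __name__ == "__main__":
    measured = {
        "Windows client disk read (bpbkar32)": 50,
        "media server disk read (bpbkar)": 56,
        "media server self-backup to tape": 35,
    }
    for stage in stages_ruled_out(measured, observed_mb_s=10):
        print("not the bottleneck on its own:", stage)
```

All three stages clear the margin, which leaves the network path between client and media server, and the Exchange-specific read path, as the pieces not yet measured in isolation.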


RichardXClark
Level 4

We have full jumbo frames support throughout our gig datacenter, end to end.
We have not yet implemented it.
AIX master / media server is 1500MTU & so are wintel boxes, exchange etc.
We get 10MB per storage group / file system volume / stream

It seems that most customers experience this on wintel

I have duplicated the setup using Legato Networker with dedicated hardware & the rates are just the same.

Our AIX boxes can send to the master / media at 40+ MB/sec per stream

We are looking at non-LAN methods to back up large systems in order to meet our restore SLA. Maybe dedupes, staging, SAN snaps. We have a design exercise ongoing.