Experts' opinion on client-server interface vs network link speed

DPO
Level 6

We have a situation where the NetBackup hosts (master, media and clients) have 1 G interfaces, but the underlying network link capacity is 10 G. Every client has only one interface, i.e. 1 G. These are mostly huge databases. I'm raising the concern that although the network link speed is 10 G, the actual client/server interface capacity is 1 G, so the faster link adds no value for the data being transferred from the client. Is my argument correct? Please add your thoughts on interface speed vs network link speed, and how it affects backups and client/server performance in busy environments.
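For context, here is a rough back-of-envelope sketch of what the two interface speeds mean in practice; the 5 TB database size and ~80% protocol efficiency are illustrative assumptions, not figures from the environment described:

```python
# Rough check of the 1G-interface argument.
# Database size and protocol efficiency are illustrative assumptions.

def effective_mb_per_sec(link_gbit, efficiency=0.8):
    """Convert a nominal link speed in Gbit/s to an approximate MB/s ceiling."""
    return link_gbit * 1000 / 8 * efficiency  # Gbit/s -> MB/s, minus protocol overhead

db_size_gb = 5000  # assumed: roughly a 5 TB database
for nic_gbit in (1, 10):
    rate = effective_mb_per_sec(nic_gbit)
    hours = db_size_gb * 1024 / rate / 3600
    print(f"{nic_gbit:>2} Gbit NIC: ~{rate:.0f} MB/s ceiling, ~{hours:.1f} h to move 5 TB")

# Whatever the backbone is rated at, the client can never push data faster
# than its own NIC, so the 1 Gbit interface is the hard ceiling per client.
```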


Marianne
Level 6
Partner    VIP    Accredited Certified
It would be great if all servers could be on a 10Gb network, but before you submit such a request - can you actually prove that the 1Gb interface is the bottleneck? Are the clients backing up at close to 100MB/sec sustained throughput? Can you run backup tests on the clients to a null device to see the max speed at which each client can produce a backup stream? And do you have sufficient infrastructure on the media server(s) to handle the increased throughput?
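As a crude stand-in for that null-device test (assuming a Linux/UNIX client and a large file you are allowed to read), the sketch below simply measures how fast the client can read its own data, which bounds what any backup stream from it can achieve. The path is a placeholder:

```python
# Measure raw client read speed: if this is well below ~100 MB/s, the 1 Gb
# NIC is not the bottleneck. The path below is a placeholder.
import time

def read_speed(path, block=8 * 1024 * 1024, limit_gb=10):
    """Read up to limit_gb from path and return the observed MB/s."""
    read = 0
    start = time.monotonic()
    with open(path, "rb") as f:
        while read < limit_gb * 1024**3:
            chunk = f.read(block)
            if not chunk:
                break
            read += len(chunk)
    secs = time.monotonic() - start
    return read / 1024**2 / secs

if __name__ == "__main__":
    print(f"client read speed: {read_speed('/data/bigfile.dbf'):.0f} MB/s")
```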

Nicolai
Moderator
Partner    VIP   

I would ask for the backup servers to be attached to 10G; that would in theory allow you to back up 10 clients at full speed vs only 1 today.

Backing up a client at a full 1G (100 MB/sec) is relatively rare.

You should ask the network department whether the ports are blocking or non-blocking.

A blocking port shares a given bandwidth between multiple clients, typically 10 clients sharing 1G, while a non-blocking port gives access to the full bandwidth.

Blocking ports are cheaper than non-blocking ones.
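To make the blocking-port point concrete, here is a tiny illustrative calculation using Nicolai's example figure of 10 clients sharing a 1G uplink (not a measured value):

```python
# Per-client bandwidth when several 1 Gbit access ports share a blocking
# (oversubscribed) uplink, using the example of 10 clients per 1 Gbit.
uplink_gbit = 1.0
clients = 10

non_blocking_mb = 1.0 * 1000 / 8                 # each client keeps its full 1 Gbit
blocking_mb = uplink_gbit * 1000 / 8 / clients   # the uplink is shared by all clients

print(f"non-blocking: ~{non_blocking_mb:.0f} MB/s per client")
print(f"blocking    : ~{blocking_mb:.0f} MB/s per client when all 10 stream at once")
```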


DPO
Level 6

Thanks for your responses.

In our case I have seen that our DB backups run with 4 channels, each channel at a speed of 30 MB/sec (total > 100 MB/sec). I'm also worried about using the same interface for the actual DB listener communication. Although the network team claims the network link/device supports 10G, in the end the client server has only a 1 G interface, which I feel will create a bottleneck. Ours is quite a busy environment.

What is your suggestion now?
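As a quick sanity check on the figures quoted above (a rough sketch; real-world TCP throughput on 1 GbE usually tops out around 110-118 MB/s):

```python
# Does the observed backup load already fill the client's 1 Gbit NIC?
channels, per_channel_mb = 4, 30           # figures quoted in the post above
line_rate_mb = 1 * 1000 / 8                # 1 Gbit/s = 125 MB/s on the wire

backup_mb = channels * per_channel_mb      # 120 MB/s of backup traffic
print(f"backup streams   : {backup_mb} MB/s")
print(f"1 Gbit line rate : {line_rate_mb:.0f} MB/s")
print(f"left for everything else (DB listener etc.): {line_rate_mb - backup_mb:.0f} MB/s")
# Real-world TCP throughput on 1 GbE is usually ~110-118 MB/s, so in practice
# the NIC is already saturated by the backup streams alone.
```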

sdo
Moderator
Partner    VIP    Certified

If you are meeting your backup SLAs, then you do not have a problem.

You only have a problem when you are not achieving your backups within your backup window - or possibly about to have a problem if you are only just completing backups within the backup SLA window.

If you have no SLA, then it could be argued that you can never have a problem ;) 

.

Hi Ashok, I'm no expert, but I have been known to have an opinion.  I would always go for 10Gb connectivity on the LAN ingest ports of a medium to large sized NetBackup Server or NetBackup Appliance, just as Nicolai suggests.

Any responsiveness/latency caused by a flooded network port will be largely down to the implementation of the TCP stack, how it allows traffic into its buffers, and the round-robin nature of shared resources.  The only way to detect whether you have an application response-time issue (whilst a NIC appears to be flooded) is to actually measure the application response times, i.e. do not just measure the packet response times.  In short, a full network port is not in itself symptomatic of an issue.  A full network port can easily and equally be construed as nothing more than making full use of available resources.
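As a sketch of that idea, the snippet below times a bare TCP connect alongside a full application-level request to the same service; only the latter tells you whether users actually feel the congestion. The host and URL are hypothetical placeholders for whatever the clients really talk to (DB listener, app server, etc.):

```python
# Time the application-level operation itself, not just the network round trip.
import socket
import time
import urllib.request

HOST, URL = "app.example.com", "http://app.example.com/health"  # placeholders

def tcp_connect_ms(host, port=80):
    """Rough packet-level proxy: how long does a TCP handshake take?"""
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=5):
        pass
    return (time.monotonic() - start) * 1000

def app_request_ms(url):
    """Application-level view: how long does a complete request/response take?"""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=5) as resp:
        resp.read()
    return (time.monotonic() - start) * 1000

for _ in range(5):
    print(f"TCP connect: {tcp_connect_ms(HOST):6.1f} ms   "
          f"full request: {app_request_ms(URL):6.1f} ms")
# If the full-request times stay flat while the NIC looks "full", the port is
# simply busy; if they climb, you have a real responsiveness problem.
```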

This link (scroll down when opened) has a nice description of what Nicolai was referring to, re: blocking vs. non-blocking.

http://www.lantronix.com/resources/networking-tutorials/network-switching-tutorial/

...and the topic of "blocking vs. non-blocking" is of significant importance for designers and implementers of backup AND of storage.

e.g. I have a LAN switch with 40 x 1Gb ports.  Behind these 40 ports there are 4 x ASIC chips.  Each ASIC looks after 10 ports and each ASIC can only handle 2Gb.  My LAN switch appears to be able to handle 40 x 1Gb = 40Gb... but really it can only handle 4 x 2Gb = 8Gb (the aggregated total of the ASICs), i.e. inter-port switching within the LAN switch can only handle 8Gb.  This issue is further compounded when there are, say, only two up-links out of the LAN switch into the rest of the LAN switching infrastructure, e.g. only 2Gb out - in which case all out-going traffic from the LAN switch is capped at 2Gb, and so if all 38 remaining ports want to send out, then that 2Gb is shared/divided across 38.
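Reduced to arithmetic, that example switch looks like this (all figures are the ones from the example above):

```python
# Oversubscription in the example switch: the faceplate promises far more
# bandwidth than the ASICs (or the uplinks) can actually switch.
ports, port_gbit = 40, 1
asics, asic_gbit = 4, 2
uplink_gbit = 2

faceplate = ports * port_gbit              # 40 Gbit advertised on the faceplate
fabric = asics * asic_gbit                 # 8 Gbit the ASICs can really switch
print(f"faceplate bandwidth : {faceplate} Gbit")
print(f"switching capacity  : {fabric} Gbit  (oversubscribed {faceplate / fabric:.0f}:1)")
print(f"uplink capacity     : {uplink_gbit} Gbit -> ~{uplink_gbit * 1000 / 38:.0f} Mbit "
      f"per port if all 38 remaining ports send off-switch at once")
```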

Also, due to the ASIC bottlenecks, you can see how LAN and SAN switches which are "blocking" can become very sensitive to where cables are plugged in.

In short, to really understand whether you have a bottleneck, viewing numbers using OS tools is only ever half the story.  One has to understand one's switching infrastructure in great detail to be able to spot bottlenecks.

.

Of equal importance for backup and storage is another related topic in the above link: "store and forward" vs. "cut-through" routing.  Here's an article with (further down, once opened) a very nice description of cut-through routing, i.e. packet/frame routing (not inter-LAN routing):

http://community.brocade.com/t5/Mainframe-Solutions/Building-FICON-I-O-Fabric-Super-Highways-in-the-...

Nicolai
Moderator
Partner    VIP   

In our case I have seen that our DB backups run with 4 channels, each channel at a speed of 30 MB/sec (total > 100 MB/sec)

Then I would argue that missed backup windows are just a question of time. Bottlenecks should be removed as they appear, not piled up.

You should ask for a 10G NIC. Alternatively, get 2-4 x 1G NICs and configure them in a LACP type 4 configuration. LACP is not as effective as 10G, but much better than 1G.

LACP type 4 requires both server- and network-side configuration.

sdo
Moderator
Partner    VIP    Certified

When Nicolai says "LACP is not as effective as 10G, but much better than 1G."

...I think he is referring to this:

Balance-alb bonding wouldn't normally be used here, because balance-alb balances by managing ARP replies to different hosts.  As ARP requests are only ever exchanged with devices on the local LAN 'IP subnet', that means balance-alb will only provide balancing for local IP addresses.  All traffic from off the subnet will arrive on the same NIC, as it will all originate from the same source, i.e. the subnet's gateway.

It is better to implement bonded resiliency and balancing by using LACP and thus have the LAN switches load balance across the links to the backup server/appliance.  Using the ‘source and destination IP’ configuration of LACP will result in a fairly even distribution across all links.  Although the destination IP address will always be the same, the source will vary and thus result in some variation - but note that LACP is not guaranteed to always result in variation.

The net effect of say 4x1Gb in an LACP bond is that you cannot guarantee to get 4Gb ingest at the backup server.  Sometimes four backup clients will be spread nicely across the four links, and sometimes the four backup clients will be assigned to just one of the 1Gb links of the backup server, and so each client will only get around 300Mb/s, and the other 3 x 1 Gb NICs on the backup server will sit there doing nothing.  So, if you need to get closer to guaranteeing that many 1Gb based backup clients can all get close to concurrently sending at 1Gb and that NetBackup server/appliance can get close to concurrently receiving at 4Gb, or better still exceeding this, then you need at least one 10Gb link on your backup server.

HTH
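A toy simulation of the "sometimes all clients land on one link" behaviour described above; the hash function and the addresses below are illustrative stand-ins, not any vendor's actual algorithm:

```python
# Toy model of LACP src-dst-ip link selection: each flow is pinned to one
# member link by a hash, so a 4 x 1 Gbit bond never guarantees 4 Gbit ingest.
import random
from collections import Counter

MEDIA_SERVER_IP = "10.0.0.50"   # assumed backup server address, illustration only
LINKS = 4                       # number of 1 Gbit members in the bond

def chosen_link(src_ip, dst_ip):
    # Stand-in for the switch's src-dst-ip hash, reduced modulo the member count.
    return hash((src_ip, dst_ip)) % LINKS

clients = [f"10.0.1.{n}" for n in random.sample(range(1, 255), 4)]
spread = Counter(chosen_link(ip, MEDIA_SERVER_IP) for ip in clients)
print("clients per member link:", dict(spread))
# Run this a few times: sometimes the four clients spread across all four
# links, sometimes several share one link and each gets a fraction of 1 Gbit
# while the other members sit idle.
```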

Nicolai
Moderator
Partner    VIP   

Yes, you are right, in a highly technical fashion :)

From my own practical use, bonding more than 4 NICs does not add noticeable bandwidth.

sdo
Moderator
Partner    VIP    Certified

Ashok - your LAN switches should have a choice of several different load balancing algorithms on "ether-channel" (aka LACP) bonds, such as:

src-mac, src-ip, dst-mac, dst-ip, src-dst-mac, src-dst-ip

...and this choice of "ether-channel" load balancing algorithm is usually a switch-wide setting, i.e. the same algorithm is used for all ether-channel bonds on a switch - you cannot usually have one ether-channel using src-mac and another ether-channel on the same switch using src-ip.

The way these algorithms actually distribute load is utterly dependent upon your connectivity, so in some situations certain algorithms do better than others.   To really understand what is happening on your backup server's LAN ingest ether-channels, you need to sit down and think about where your backup data is being sent from and where it is arriving... i.e. how many different MAC addresses are involved (for src-mac, src-dst-mac, dst-mac), or how many different IP addresses are involved (for src-ip, src-dst-ip, dst-ip).
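One way to reason about this is simply to count the distinct hash keys each algorithm would see. The sketch below uses made-up flows in which all backup clients arrive via one router, so every frame carries the router's source MAC but a different source IP:

```python
# Count the distinct keys each ether-channel hash algorithm would see for a
# set of incoming backup flows. Few distinct keys = poor spread across links.
# The flows are made up: remote clients arriving via one router, so every
# frame carries the router's MAC while the source IP still varies per client.
flows = [
    {"src_mac": "aa:aa:aa:00:00:01", "src_ip": "192.168.10.11", "dst_ip": "10.0.0.50"},
    {"src_mac": "aa:aa:aa:00:00:01", "src_ip": "192.168.10.12", "dst_ip": "10.0.0.50"},
    {"src_mac": "aa:aa:aa:00:00:01", "src_ip": "192.168.20.13", "dst_ip": "10.0.0.50"},
    {"src_mac": "aa:aa:aa:00:00:01", "src_ip": "192.168.20.14", "dst_ip": "10.0.0.50"},
]

algorithms = {
    "src-mac":    lambda f: f["src_mac"],
    "src-ip":     lambda f: f["src_ip"],
    "src-dst-ip": lambda f: (f["src_ip"], f["dst_ip"]),
}

for name, key in algorithms.items():
    distinct = len({key(f) for f in flows})
    print(f"{name:<10}: {distinct} distinct key(s) -> at most {distinct} member link(s) used")
```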