Can NIC bonding on a Linux media server cause backup failures with network errors?
Hello Everyone,
We have 4 media servers running RHEL 6.1 with NetBackup 7.6.0.1.
The storage target is Data Domain.
The master server is Solaris 10, also running NetBackup 7.6.0.1.
All 4 media servers have bonding configured. Below is the bonding status from one of them; link speed is 10 Gbps for each NIC:
[root@xxxxxx-2~]# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2+3 (2)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
802.3ad info
LACP rate: slow
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
Aggregator ID: 1
Number of ports: 2
Actor Key: 33
Partner Key: 221
Partner Mac Address: 00:23:04:ee:c1:85
Slave Interface: eth4
MII Status: up
Link Failure Count: 1
Permanent HW addr: 00:1e:27:57:1j:7g
Aggregator ID: 1
Slave queue ID: 0
Slave Interface: eth5
MII Status: up
Link Failure Count: 1
Permanent HW addr: 00:1e:27:57:1j:7g
Aggregator ID: 1
Slave queue ID: 0
================================================================
Only the full backups of our huge file servers are failing, with status code 13/24, and they fail randomly after anywhere from 50 GB to 1 TB.
Oracle jobs are also failing intermittently with a network error.
There are no issues with incremental backups.
=================================================
A test of one full backup through a media server that does not have bonding was successful, without any retries.
We logged a case with Symantec, but no luck yet.
So does it look like bonding may be causing the network drops?
Kindly advise, and also let me know if there is a recommended / most commonly used bonding mode. What is the best bonding mode?
I would say try bond mode=1 (active-backup) on one media server and run the backups through it. If that works, discuss the configuration with your architect/designer, systems admins and network admins, and then implement it.
You need to make sure that when you use mode 1, you balance the interface/network usage across the switches, with the help of the network admins, so that the load is distributed evenly for all media servers.
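On RHEL 6, switching the bond to active-backup is an ifcfg change. A minimal sketch is below; the IP address and netmask are placeholders (not your real values), the interface names just follow the eth4/eth5 seen in this thread, and miimon=100 matches the MII polling interval shown in the bonding status. Confirm everything with your systems/network admins before touching a production media server:

```shell
# /etc/sysconfig/network-scripts/ifcfg-bond0 -- sketch only,
# IPADDR/NETMASK are placeholders, not your real values
DEVICE=bond0
ONBOOT=yes
BOOTPROTO=none
USERCTL=no
IPADDR=192.168.1.10
NETMASK=255.255.255.0
# mode=1 = active-backup; miimon=100 matches the current MII polling interval
BONDING_OPTS="mode=1 miimon=100"

# /etc/sysconfig/network-scripts/ifcfg-eth4 (repeat for eth5)
DEVICE=eth4
MASTER=bond0
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none
```

Restart networking (`service network restart`) in a maintenance window, then verify the new mode with `cat /proc/net/bonding/bond0` before re-running the full backup.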
For sure, if the backups run OK without bonding, then bonding IS the problem.
"Testing of one full backup with one of the media server which does not have bonding was successfull without any retry"
There is a lot of documentation out there on how to configure bonding, so I recommend taking a look at the switch vendor's documentation and teaming up with the network admin.
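One thing worth checking while the jobs run is whether the per-slave Link Failure Count in /proc/net/bonding/bond0 increments at the moment a backup dies. A small sketch of pulling those counters out is below; the `bond_status` function just replays a sample of the output posted above so the snippet is self-contained, and on the media server you would `cat /proc/net/bonding/bond0` instead:

```shell
#!/bin/sh
# Sketch: extract per-slave link-failure counters from bonding status.
# bond_status replays sample output (modeled on the post above); on a
# real media server replace it with: cat /proc/net/bonding/bond0
bond_status() {
cat <<'EOF'
Slave Interface: eth4
MII Status: up
Link Failure Count: 1
Slave Interface: eth5
MII Status: up
Link Failure Count: 1
EOF
}

# Print "<interface> <failure count>" for each slave
bond_status | awk '
/^Slave Interface:/   { iface = $3 }
/^Link Failure Count:/ { print iface, $4 }
'
```

If the counters climb in step with the status 13/24 failures, the physical links or switch ports are flapping. If they stay flat, look at the LACP side instead; note that with the layer2+3 hash policy shown above, a single backup stream always rides one slave link, so 802.3ad gives no extra per-stream speed, only aggregate bandwidth and failover.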