Can NIC bonding on a Linux media server cause backup failures with network errors?

rVikas
Level 5
Partner

Hello Everyone,

We have 4 media servers running RHEL 6.1 and NetBackup 7.6.0.1.

The storage target is Data Domain.

The master server is Solaris 10, running NetBackup 7.6.0.1.

All 4 media servers have bonding configured. The link speed is 10 Gbps for each NIC.

Below is the bonding status of one of the media servers:

[root@xxxxxx-2~]# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2+3 (2)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: slow
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
        Aggregator ID: 1
        Number of ports: 2
        Actor Key: 33
        Partner Key: 221
        Partner Mac Address: 00:23:04:ee:c1:85

Slave Interface: eth4
MII Status: up
Link Failure Count: 1
Permanent HW addr: 00:1e:27:57:1j:7g
Aggregator ID: 1
Slave queue ID: 0

Slave Interface: eth5
MII Status: up
Link Failure Count: 1
Permanent HW addr: 00:1e:27:57:1j:7g
Aggregator ID: 1
Slave queue ID: 0

================================================================

Only the full backups of my huge file servers are failing, with status code 13/24, and they fail at random points: sometimes after 50 GB, sometimes after 1 TB.

Even Oracle jobs are failing intermittently with a network error.

There are no issues with incremental backups.

================================================= 

Testing of one full backup with one of the media servers which does not have bonding was successful without any retry.

I logged a case with Symantec, but no luck yet.

So does it look like bonding may be causing the network drops?

Kindly advise, and also let me know if there is a recommended or most commonly used bonding mode.

What is the best bonding mode?

2 ACCEPTED SOLUTIONS

RamNagalla
Moderator
Partner    VIP    Certified

I would say try bond mode 1 (mode=1, active-backup) on one media server and run the backups through that media server. If that works, discuss this configuration with the architect or designer, the systems admins, and the network admins, and then implement it.

When you use mode 1, make sure, with the help of the network admins, that the interface/network usage is balanced across the switches so that the load is distributed evenly for all media servers.

Nicolai
Moderator
Partner    VIP   

For sure: if the backup runs OK without bonding, then bonding IS the problem.

"Testing of one full backup with one of the media servers which does not have bonding was successful without any retry"

There is a lot of documentation out there on how to configure bonding, so I recommend taking a look at the switch vendor's documentation and teaming up with the network admin.

8 REPLIES

Nicolai
Moderator
Partner    VIP   

Have the network switches been configured for LACP?

Bonding on Linux is very stable; however, if the switch side isn't configured for LACP, network packet drops will occur.

Veritas won't be able to help you. This is a network issue outside the control of NetBackup.

rVikas
Level 5
Partner

 

Hi Ram,

I will try a backup after changing the bonding mode to 1 (active-backup).

I need to:

just modify the bonding parameter

restart network services on that media server

Or do I also need to modify anything at the switch level?

 

Hi Nicolai,

At this point I am unable to show management whether bonding is the cause. I will try one more backup with the same media server and see if it succeeds.

 

 

Nicolai
Moderator
Partner    VIP   

No problem 

Correct. After the network config files have been edited you have to restart network services. Ensure you are connected via a terminal that does not require a network connection. If you have a typo in the network configuration and restart IP services, you may be left with no IP connection.

Mode 1 does not need switch-side configuration. It is basically just an active/passive setup that provides NIC redundancy. Network bandwidth is not increased.

Mode 4 requires switch-side configuration. Mode 4 increases network bandwidth and provides fault tolerance at the NIC level.

Please see https://help.ubuntu.com/community/UbuntuBonding
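For the "just modify the bonding parameter" step, the two modes discussed here differ only in the options passed to the bonding driver. The Python sketch below just builds the option strings side by side. It assumes the RHEL-style BONDING_OPTS line in /etc/sysconfig/network-scripts/ifcfg-bond0; the miimon, lacp_rate and xmit_hash_policy values are copied from the status output posted earlier in the thread, and the names MODES and bonding_opts are made up for this illustration.

```python
# Bonding driver options per mode. The values mirror the status output
# posted in this thread; adjust them to your own environment.
MODES = {
    # Mode 1: active/passive, no switch-side channel configuration.
    "active-backup": {"mode": "active-backup", "miimon": "100"},
    # Mode 4: LACP, needs a matching LACP port-channel on the switch.
    "802.3ad": {
        "mode": "802.3ad",
        "miimon": "100",
        "lacp_rate": "slow",
        "xmit_hash_policy": "layer2+3",
    },
}


def bonding_opts(mode):
    """Return the BONDING_OPTS line for the given bonding mode name."""
    opts = " ".join(f"{key}={value}" for key, value in MODES[mode].items())
    return f'BONDING_OPTS="{opts}"'


print(bonding_opts("active-backup"))
print(bonding_opts("802.3ad"))
```

After editing the line by hand, the network service has to be restarted (on RHEL 6 typically with service network restart), from a console session as described above.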

 

Marianne
Moderator
Partner    VIP    Accredited Certified
You have actually confirmed in your post over here: https://www.veritas.com/community/forums/most-windows-20082012-file-servers-backup-fails-code-2413 that backups are successful where bonding is disabled. The post over here also confirms that bonding can cause a backup issue: https://www.veritas.com/community/forums/rhel-5-netbackup-7nic-bonding-question-and-issue

rVikas
Level 5
Partner

Hi All,

 

Sorry for the late reply; due to critical business days I could not run the backup.

3 days back I ran the backup again with a media server that does not have bonding. The backup ran for nearly 58 hours to write 3.7 TB of data and completed in one attempt.

So I am going to test the backup after changing the bonding mode from dynamic link aggregation to active-backup.

I will change the bonding mode, restart the network services, and try the backup.

Thank you all for the help.

bbahnmiller
Level 4

We are successfully running LACP bonding on our RHEL 5 systems. But we are using hash policy "layer3+4". Other than that, our bonding configuration looks to be the same.
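To illustrate what the hash policy changes, here is a rough Python sketch of the two transmit hash formulas as they were described in the kernel bonding documentation of that era. It is simplified and not the driver code: it assumes IPv4, assumes only the last byte of each MAC takes part in the XOR, and all addresses and ports are made up. Newer kernels hash differently.

```python
import ipaddress


def _ip(addr):
    """Convert a dotted IPv4 address to an integer."""
    return int(ipaddress.IPv4Address(addr))


def layer2_3(src_mac, dst_mac, src_ip, dst_ip, slave_count):
    # Assumption: only the last byte of each MAC is used in the XOR.
    l2 = src_mac[-1] ^ dst_mac[-1]
    l3 = (_ip(src_ip) ^ _ip(dst_ip)) & 0xFFFF
    return (l2 ^ l3) % slave_count


def layer3_4(src_ip, dst_ip, src_port, dst_port, slave_count):
    l3 = (_ip(src_ip) ^ _ip(dst_ip)) & 0xFFFF
    return ((src_port ^ dst_port) ^ l3) % slave_count


# Hypothetical media server and storage target.
MEDIA_MAC = bytes.fromhex("020000000001")
TARGET_MAC = bytes.fromhex("020000000002")
MEDIA_IP, TARGET_IP = "192.0.2.10", "192.0.2.20"

# layer2+3 does not look at ports, so every stream between the same two
# hosts is sent out on the same slave.
same_slave = layer2_3(MEDIA_MAC, TARGET_MAC, MEDIA_IP, TARGET_IP, 2)

# layer3+4 includes the ports, so parallel streams can use different slaves.
per_stream = [
    layer3_4(MEDIA_IP, TARGET_IP, port, 2049, 2)
    for port in (40000, 40001, 40002, 40003)
]
```

The point is only about how traffic is spread: with layer2+3, all traffic from one media server to one storage address leaves on a single 10 GbE link, while layer3+4 can spread several streams over both. Whether that has anything to do with the status 13/24 failures in this thread is not established.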

We did have some issues getting this to work successfully. The RHEL bonding driver actually had some issues at one point, and a bug crept back into the bonding driver when we did a minor version upgrade. And you definitely need to have your network team configure the switch ports in a corresponding bond configuration.

BTW, make sure that you are NOT running jumbo frames to the Data Domain. That gave us a bunch of status 84 errors. In fact, we ended up using 1500 MTU on all of our bonds (10 GbE). That seemed to minimize the strange errors we were seeing.

  Bryan
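A quick way to confirm the MTU on a bond and its slaves is to read it from sysfs. The Python sketch below assumes the standard Linux paths /sys/class/net/<iface>/mtu and /sys/class/net/<bond>/bonding/slaves; the function names bond_mtus and mtu_problems are made up for this illustration, and 1500 is simply the value mentioned above.

```python
from pathlib import Path


def bond_mtus(bond, sysfs_root="/sys/class/net"):
    """Return {interface: mtu} for the bond and each of its slaves."""
    root = Path(sysfs_root)
    slaves = (root / bond / "bonding" / "slaves").read_text().split()
    return {
        name: int((root / name / "mtu").read_text())
        for name in [bond, *slaves]
    }


def mtu_problems(mtus, expected=1500):
    """List the interfaces whose MTU differs from the expected value."""
    return [
        f"{name}: MTU {mtu}, expected {expected}"
        for name, mtu in mtus.items()
        if mtu != expected
    ]
```

For example, mtu_problems(bond_mtus("bond0")) returns an empty list when bond0 and its slaves are all at 1500. The switch ports and the Data Domain interfaces have their own MTU settings, which this does not check.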