NBU Master Server: 188.8.131.52
Client 7.5.07 / Client Name: test.123
I have gone through all the Tech Notes and found that 2 mount points are causing the issue. To investigate, I raised the logging level to 5, created the touch file bpbkar_path_tr and ran a backup. It fails with status 24.
Can someone please check the logs and suggest what needs to be excluded? So far there is no exclusion list. I have attached the bpbkar logs, and they do seem to show a problem. Could someone please check?
The above settings prove that the issue is not with any NBU timeout settings.
You need to work with OS and Firewall admins to find out where this 15 minute timeout is happening.
As per Martin's excellent post:
"NBU is the casualty in this issue, not the cause."
PS: Those timeouts are way too big and can cause backups to appear to hang for almost 3 hours before the NBU timeout kicks in.
There should be no reason for timeouts larger than 1800.
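If it helps, here is a rough sketch for listing any timeout entries above 1800 in bp.conf. It assumes the usual UNIX location /usr/openv/netbackup/bp.conf and entries written as NAME = value; the function name list_large_timeouts is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print every *_TIMEOUT entry in a bp.conf-style file
# whose value is above a limit (default 1800 seconds).
list_large_timeouts() {
    conf_file="${1:-/usr/openv/netbackup/bp.conf}"   # assumed default path
    limit="${2:-1800}"
    # Entries are expected to look like "CLIENT_READ_TIMEOUT = 300".
    awk -F'=' -v limit="$limit" '
        $1 ~ /_TIMEOUT[ \t]*$/ {
            name = $1; value = $2
            gsub(/[ \t]/, "", name)     # strip padding around the name
            gsub(/[ \t]/, "", value)    # strip padding around the value
            if (value + 0 > limit) print name, value
        }
    ' "$conf_file"
}
```

Run list_large_timeouts with no arguments on the master, media server and client; anything it prints is a candidate for lowering.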
The problem is that these are tuning settings: what works for one environment may not work for another. That is why we can't really provide a document, because the right values are unique to your environment.
The last 'case' I had that was along these lines, these are the settings that were found to work:
However, the system these were applied to was 'probably' totally different from your environment, so you can see why I am reluctant to start dishing out values, and why I recommend you work with the network/OS admins.
The strange thing is that I initiated a Diff backup, which completed successfully after 4 attempts, but the Full backup fails after 15 min...
Attempt 1 took 15 min
Attempt 2 took 15 min
Attempt 3 took 15 min
Attempt 4 took 1 hr and the backup completed.
Can you guys please suggest which parameters need to be checked on the OS and networking side? The customer says other systems are working with the same configuration, so we now need to be very specific about what should be checked. The system is a Linux box.
We cannot say as the error is not caused by NBU.
Your customer needs to understand:
NBU is the casualty. Not the cause.
Your customer will have to investigate to see 'what is different' on this Linux box.
A couple of years ago we had a similar situation where backups kept on failing for certain Linux clients - especially over weekends when large, full backups were running.
Troubleshooting from the NBU side did not reveal anything, and we got no co-operation from the OS team.
We simply VPN'ed in over a weekend and resumed the backup every time we saw status 24 (with checkpoint restart we got a bit further each time...)
At some point, these problematic Linux clients got replaced with new hardware and obviously newer OS version.
Status 24's magically disappeared.
Do you have a support agreement with Symantec?
If so, log a call and ask to run AppCritical between the media server and the client, and then between the client and the media server (it only goes in one direction, hence why it is run twice).
If not, there is a free alternative to AppCritical. Hopefully someone on here will know what it is called because, for the life of me, I can't remember.
The problem with this is, how many possible causes would you like:
Faulty hardware (including cables)
Drivers or firmware on any of the hardware involved (e.g. NIC, switch)
Faulty ports on switch
Firewalls / Routers
OS settings (we have discussed some, did you try the ones I posted up?)
** If you change them, make a note of what they were. If the new settings don't fix or improve things, put them back, else we could start introducing more faults that cause the same symptoms, which will be very, very hard to sort out. **
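A minimal sketch of "make a note of what they were" for kernel settings, assuming a Linux client where the values are readable under /proc/sys. The helper name save_sysctl and the PROC_SYS_ROOT override are invented for this example, and the key in the usage note is only an illustration, not a recommendation of what to tune.

```shell
#!/bin/sh
# Hypothetical helper: print the current value of each named kernel setting
# as "key = value", the format that sysctl -p can load back later.
save_sysctl() {
    root="${PROC_SYS_ROOT:-/proc/sys}"   # override exists only to ease testing
    for key in "$@"; do
        # net.ipv4.tcp_keepalive_time -> /proc/sys/net/ipv4/tcp_keepalive_time
        path="$root/$(echo "$key" | tr . /)"
        if [ -r "$path" ]; then
            echo "$key = $(cat "$path")"
        fi
    done
}
```

For example, save_sysctl net.ipv4.tcp_keepalive_time > /root/sysctl-before.txt before the change, and sysctl -p /root/sysctl-before.txt to put the old value back (the file path is just an example).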
I've even seen an error where the network card wouldn't send a particular type of data, all was ok until it hit a certain file, and just wouldn't send it ...think it was a .tar file, can't remember the fix though I think it was hardware related.
When the issue happens again, at the 'exact' time of failure, run netstat -a, save the output to a file and attach it to this thread.
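One possible way to grab that output with a timestamp, so it can be lined up against the job's failure time. This is only a sketch; the function name snapshot and the output directory are made up for the example.

```shell
#!/bin/sh
# Hypothetical helper: run a command and save its output, with a short
# header, to a timestamped file. Prints the name of the file it wrote.
snapshot() {
    out_dir="$1"; shift
    stamp=$(date +%Y%m%d-%H%M%S)
    out_file="$out_dir/snapshot-$stamp.txt"
    {
        echo "# captured: $(date)"
        echo "# command: $*"
        "$@"                      # the command itself, e.g. netstat -a
    } > "$out_file" 2>&1
    echo "$out_file"
}
```

Usage would be something like snapshot /tmp netstat -a, run on the client as soon as the status 24 appears.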
Did this client ever work, and if so, when did it stop working?
If so, what was changed on the day it stopped working? It's 99.9% certain that something was, and this could be the key to finding the cause. Your customer is going to have to try and remember, because I really don't believe nothing changed.
If none of the above leads to anything, then I can only think to get a tcpdump of the interface while the backup is running, up to the point where it fails, and then look at it in Wireshark (free). For that, you will really need to find a network type person who is used to looking at tcpdumps. From that, it should be possible to see why it fails, or at the very least narrow it down.
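As a starting point, the capture could look roughly like the command this sketch prints. The interface eth2 is taken from elsewhere in this thread; the media server name media1, the output path and the helper name build_capture_cmd are placeholders. The options are an assumption about what is sensible here: -s 128 keeps only the first 128 bytes of each packet (headers, not backup data), and -C 100 -W 10 rotates through ten files of about 100 MB so a long backup cannot fill the disk.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) a tcpdump command line that captures
# traffic between this host and one peer into a rotating set of files.
build_capture_cmd() {
    iface="$1"   # interface carrying the backup traffic
    peer="$2"    # media server host name or IP
    out="$3"     # base name for the capture files
    echo "tcpdump -i $iface -s 128 -C 100 -W 10 -w $out host $peer"
}
```

For example, build_capture_cmd eth2 media1 /tmp/status24.pcap prints the command to review; run it as root just before starting the backup and stop it with Ctrl-C once the job fails.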
Another source of these problems could be the TCP "ring buffers" and TCP offloading. We tried using some Cisco UCS blades as media servers last year and had to switch back to physical hardware when we had intermittent status 24's and 2074's. A couple of the steps we tried were to increase the TCP ring buffers to their max value, 4096, and to disable all TCP offloading.
Here's a link describing how the ring buffers in Linux work:
mph999: Before the backup fails the backup speed is very good, but it gets stuck at some point and then within 15 min it throws error 24. I have re-installed the client binaries, thinking they might be corrupt, but no luck.
RonCaplinger: I will check with the Linux guy about the TCP "ring buffers" and TCP offloading.
Thanks for your efforts. I will update you soon on this.
As it's a production server, I just need to know whether it is safe to run the command below on Linux. Also, what is the default value of the ring buffer?
ethtool -G eth2 rx 4096
Is 4096 the highest value? If the issue is not resolved, can we put the default value back?
Also, do I need to disable TCP offloading? Is it safe, as it's a prod box?
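On the default and the maximum: both depend on the NIC and its driver, so there is no single answer, but ethtool -g eth2 (lowercase g) reports the pre-set maximums and the current settings without changing anything, and ethtool -k eth2 lists the current offload settings. Below is a small sketch that pulls the two RX numbers out of the ethtool -g output so they can be noted down before any change. The function name rx_ring_summary is made up, and the parsing assumes the usual "Pre-set maximums:" / "Current hardware settings:" layout.

```shell
#!/bin/sh
# Hypothetical helper: read "ethtool -g <iface>" output on stdin and print
# the pre-set maximum and the current RX ring size, one per line.
rx_ring_summary() {
    awk '
        /^Pre-set maximums:/          { section = "max" }
        /^Current hardware settings:/ { section = "current" }
        # Match only the plain "RX:" row, not "RX Mini:" or "RX Jumbo:".
        $1 == "RX:" && section != ""  { print section, $2 }
    '
}
```

Usage: ethtool -g eth2 | rx_ring_summary prints one "max" line and one "current" line, for example max 4096 and current 256 (illustrative values only). Keep the current value; if the change does not help, ethtool -G eth2 rx with that value puts it back. On some drivers changing the ring size may briefly reset the link, so on a prod box it is probably wise to do it in a quiet window and check with the Linux admin first.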