06-25-2014 08:47 PM
Hello Team,
NBU Master Server: 7.5.0.7
Client: 7.5.0.7 / Client name: test.123
Policy :policy.xxx
I have gone through all the tech notes and found that 2 mount points are causing the issue. I enabled logging at level 5, created the touch file bpbkar_path_tr, and started a backup. It fails with status 24.
Can someone please check the logs and suggest what needs to be excluded? So far there is no exclude list. I have attached the bpbkar logs, and there seems to be some issue in them.
07-02-2014 09:29 AM
Another source of these problems could be the TCP "ring buffers" and TCP offloading. We tried using some Cisco UCS blades as media servers last year and had to switch back to physical hardware when we had intermittent status 24s and 2074s. A couple of the steps we tried were to increase the TCP ring buffers to their max value, 4096, and to disable all TCP offloading.
Here's a link describing how the ring buffers in Linux work:
http://www.linuxjournal.com/content/queueing-linux-network-stack
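For anyone wanting to try the same two steps, a rough sketch with ethtool follows. The interface name eth0 and the helper name nic_tuning_cmds are assumptions for illustration, and the post does not say exactly which offloads were turned off, so tso/gso/gro are only common candidates. The helper just prints the commands so they can be reviewed before running them as root.

```shell
#!/bin/sh
# Sketch only: IFACE is an assumed interface name; adjust for your system.
IFACE="${IFACE:-eth0}"

# nic_tuning_cmds IFACE: print (do not run) the ethtool commands to review.
nic_tuning_cmds() {
    # Show current and hardware-maximum ring sizes first
    echo "ethtool -g $1"
    # Raise RX/TX rings to 4096 (only valid if the NIC maximum allows it)
    echo "ethtool -G $1 rx 4096 tx 4096"
    # Turn off common segmentation/receive offloads (assumed set, driver dependent)
    echo "ethtool -K $1 tso off gso off gro off"
}

nic_tuning_cmds "$IFACE"
```

Check the "Pre-set maximums" reported by ethtool -g before setting 4096, since not every NIC supports that ring size.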
06-25-2014 08:52 PM
Uploaded BPBKAR Logs
06-25-2014 10:09 PM
With this kind of issue we need logs on the media server as well: bptm and bpbrm.
We can see this error in bpbkar log:
ERR - Cannot write to STDOUT. Errno = 110: Connection timed out
What is Client Connect and Client Timeout on the media server?
There is about 15 minutes between the last file entry and the failure:
10:21:35.971 [24473] <2> bpbkar SelectFile: INF - cwd = /export/muse/home/OvidSPUS_template/backup
10:21:35.971 [24473] <2> bpbkar SelectFile: INF - path = PUBMED.1215558944584.bak
10:36:29.828 [23897] <16> flush_archive(): ERR - Cannot write to STDOUT. Errno = 110: Connection timed out
10:36:29.828 [23897] <16> bpbkar Exit: ERR - bpbkar FATAL exit status = 24: socket write failed
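A gap like this can be picked out of a large bpbkar log mechanically. The following is a minimal sketch, assuming every log line starts with an HH:MM:SS.mmm timestamp as in the excerpt above; the function name log_gaps is made up for illustration, and it does not handle logs that cross midnight.

```shell
#!/bin/sh
# log_gaps MIN_SECONDS < logfile
# Print every pause of at least MIN_SECONDS between consecutive log lines.
log_gaps() {
    awk -v min="$1" '
    # Only consider lines that begin with an HH:MM:SS timestamp
    /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]/ {
        split($1, t, ":")
        now = t[1] * 3600 + t[2] * 60 + t[3]   # seconds since midnight
        if (seen && now - prev >= min)
            printf "%d seconds of silence before: %s\n", now - prev, $0
        prev = now
        seen = 1
    }'
}

# Example: log_gaps 600 < /usr/openv/netbackup/logs/bpbkar/log.062514
```

The log file name in the example comment is hypothetical; use whichever bpbkar log covers the failed job.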
What is the size of these .bak files?
Similar issues here:
http://www.symantec.com/docs/TECH74690
https://www-secure.symantec.com/connect/forums/failed-error-24
06-26-2014 01:22 AM
Thanks for the reply.
What is the size of these .bak files?
-rw-r--r-- 1 webman webman 31583 Aug 4 2008 /export/muse/home/OvidSPUS_template/backup/PUBMED.1215558944584.bak
31K
-----------------------------------------------------------------------------------
Ipv6 is disabled on Linux box
-----------------------------------------------------------------------------------
Media Server:
Client connect timeout :9600
Client read Timeout :9600
-------------------------------------------------------------------------------------
Client :
Client read timeout :9600
-------------------------------------------------------------------------------------
The backup is going to disk. I can also assure you there is no issue with the media server, because many other backups go to the same media server without problems. There are other media servers too, and when this backup runs on them it also fails with 24. So the issue seems to be with this client.
I also found this in the bpbkar logs:
10:04:32.065 [7762] <2> bpbkar SelectFile: INF - path = failed
10:04:32.065 [7762] <2> bpbkar SelectFile: INF - cwd = /dev/.udev/failed
10:04:32.065 [7762] <2> bpbkar SelectFile: INF - path = class@input@input0
10:04:32.065 [7762] <2> bpbkar SelectFile: INF - cwd = /dev/.udev/failed
10:04:32.065 [7762] <2> bpbkar SelectFile: INF - path = class@input@input1
10:04:32.066 [7762] <2> bpbkar SelectFile: INF - cwd = /dev/.udev/failed
10:04:32.066 [7762] <2> bpbkar SelectFile: INF - path = class@input@input2
10:04:32.066 [7762] <2> bpbkar SelectFile: INF - cwd = /dev/.udev/failed
10:04:32.066 [7762] <2> bpbkar SelectFile: INF - path = class@input@input3
10:04:32.066 [7762] <2> bpbkar SelectFile: INF - cwd = /dev/.udev/failed
10:04:32.066 [7762] <2> bpbkar SelectFile: INF - path = class@input@input4
10:04:32.066 [7762] <2> bpbkar SelectFile: INF - cwd = /dev/.udev/failed
10:04:32.066 [7762] <2> bpbkar SelectFile: INF - path = class@misc@nvram
10:04:32.066 [7762] <2> bpbkar SelectFile: INF - cwd = /dev/.udev/failed
10:04:32.066 [7762] <2> bpbkar SelectFile: INF - path = devices@pci0000:00@0000:00:00.0
10:04:32.066 [7762] <2> bpbkar SelectFile: INF - cwd = /dev/.udev/failed
10:04:32.066 [7762] <2> bpbkar SelectFile: INF - path = devices@pci0000:00@0000:00:10.0
10:04:32.066 [7762] <2> bpbkar SelectFile: INF - cwd = /dev/.udev/failed
10:04:32.066 [7762] <2> bpbkar SelectFile: INF - path = devices@pci0000:00@0000:00:10.1
10:04:32.066 [7762] <2> bpbkar SelectFile: INF - cwd = /dev/.udev/failed
10:04:32.066 [7762] <2> bpbkar SelectFile: INF - path = devices@pci0000:00@0000:00:10.2
10:04:32.066 [7762] <2> bpbkar SelectFile: INF - cwd = /dev/.udev/failed
10:04:32.066 [7762] <2> bpbkar SelectFile: INF - path = devices@pci0000:00@0000:00:11.0
10:04:32.066 [7762] <2> bpbkar SelectFile: INF - cwd = /dev/.udev/failed
10:04:32.066 [7762] <2> bpbkar SelectFile: INF - path = devices@pci0000:00@0000:00:13.0
10:04:32.066 [7762] <2> bpbkar SelectFile: INF - cwd = /dev/.udev/failed
10:04:32.066 [7762] <2> bpbkar SelectFile: INF - path = devices@pci0000:00@0000:00:15.0
10:04:32.066 [7762] <2> bpbkar SelectFile: INF - cwd = /dev/.udev/failed
10:04:32.066 [7762] <2> bpbkar SelectFile: INF - path = devices@pci0000:00@0000:00:16.0
10:04:32.066 [7762] <2> bpbkar SelectFile: INF - cwd = /dev/.udev/failed
10:04:32.066 [7762] <2> bpbkar SelectFile: INF - path = devices@pci0000:00@0000:00:1d.0@usb1@1-0:1.0
10:04:32.066 [7762] <2> bpbkar SelectFile: INF - cwd = /dev/.udev/failed
10:04:32.066 [7762] <2> bpbkar SelectFile: INF - path = devices@pci0000:00@0000:00:1d.1@usb2@2-0:1.0
10:04:32.066 [7762] <2> bpbkar SelectFile: INF - cwd = /dev/.udev/failed
10:04:32.066 [7762] <2> bpbkar SelectFile: INF - path = devices@pci0000:00@0000:00:1d.2@usb3@3-0:1.0
10:04:32.066 [7762] <2> bpbkar SelectFile: INF - cwd = /dev/.udev/failed
10:04:32.066 [7762] <2> bpbkar SelectFile: INF - path = devices@pci0000:00@0000:00:1d.3@usb4@4-0:1.0
10:04:32.066 [7762] <2> bpbkar SelectFile: INF - cwd = /dev/.udev/failed
10:04:32.066 [7762] <2> bpbkar SelectFile: INF - path = devices@pci0000:00@0000:00:1d.7@usb6@6-0:1.0
10:04:32.066 [7762] <2> bpbkar SelectFile: INF - cwd = /dev/.udev/failed
10:04:32.066 [7762] <2> bpbkar SelectFile: INF - path = devices@pci0000:00@0000:00:1e.0
10:04:32.066 [7762] <2> bpbkar SelectFile: INF - cwd = /dev/.udev/failed
10:04:32.066 [7762] <2> bpbkar SelectFile: INF - path = devices@pci0000:00@0000:00:1e.0@0000:01:03.0
10:04:32.066 [7762] <2> bpbkar SelectFile: INF - cwd = /dev/.udev/failed
10:04:32.067 [7762] <2> bpbkar SelectFile: INF - path = devices@pci0000:00@0000:00:1e.0@0000:01:04.0
10:04:32.067 [7762] <2> bpbkar SelectFile: INF - cwd = /dev/.udev/failed
10:04:32.067 [7762] <2> bpbkar SelectFile: INF - path = devices@pci0000:00@0000:00:1e.0@0000:01:04.2
10:04:32.067 [7762] <2> bpbkar SelectFile: INF - cwd = /dev/.udev/failed
10:04:32.067 [7762] <2> bpbkar SelectFile: INF - path = devices@pci0000:00@0000:00:1e.0@0000:01:04.4@usb5@5-0:1.0
10:04:32.067 [7762] <2> bpbkar SelectFile: INF - cwd = /dev/.udev/failed
10:04:32.067 [7762] <2> bpbkar SelectFile: INF - path = devices@pci0000:00@0000:00:1e.0@0000:01:04.4@usb5@5-1@5-1:1.0
10:04:32.067 [7762] <2> bpbkar SelectFile: INF - cwd = /dev/.udev/failed
10:04:32.067 [7762] <2> bpbkar SelectFile: INF - path = devices@pci0000:00@0000:00:1e.0@0000:01:04.4@usb5@5-1@5-1:1.1
10:04:32.067 [7762] <2> bpbkar SelectFile: INF - cwd = /dev/.udev/failed
10:04:32.067 [7762] <2> bpbkar SelectFile: INF - path = devices@pci0000:00@0000:00:1e.0@0000:01:04.4@usb5@5-2@5-2:1.0
10:04:32.067 [7762] <2> bpbkar SelectFile: INF - cwd = /dev/.udev/failed
10:04:32.067 [7762] <2> bpbkar SelectFile: INF - path = devices@pci0000:00@0000:00:1e.0@0000:01:04.6
10:04:32.067 [7762] <2> bpbkar SelectFile: INF - cwd = /dev/.udev/failed
10:04:32.067 [7762] <2> bpbkar SelectFile: INF - path = devices@pci0000:00@0000:00:1f.0
10:04:32.067 [7762] <2> bpbkar SelectFile: INF - cwd = /dev/.udev/failed
10:04:32.067 [7762] <2> bpbkar SelectFile: INF - path = devices@platform@i8042
10:04:32.067 [7762] <2> bpbkar SelectFile: INF - cwd = /dev/.udev/failed
10:04:32.067 [7762] <2> bpbkar SelectFile: INF - path = devices@platform@i8042@serio0
10:04:32.067 [7762] <2> bpbkar SelectFile: INF - cwd = /dev/.udev/failed
10:04:32.067 [7762] <2> bpbkar SelectFile: INF - path = devices@platform@ipmi_bmc.0000.17
10:04:32.067 [7762] <2> bpbkar SelectFile: INF - cwd = /dev/.udev/failed
10:04:32.067 [7762] <2> bpbkar SelectFile: INF - path = devices@platform@serial8250
10:04:32.067 [7762] <2> bpbkar SelectFile: INF - cwd = /dev/.udev/failed
10:04:32.067 [7762] <2> bpbkar SelectFile: INF - path = devices@platform@vesafb.0
10:04:32.067 [7762] <2> bpbkar SelectFile: INF - cwd = /dev/.udev/failed
10:04:32.067 [7762] <2> bpbkar SelectFile: INF - path = devices@pnp0@00:00
10:04:32.067 [7762] <2> bpbkar SelectFile: INF - cwd = /dev/.udev/failed
10:04:32.067 [7762] <2> bpbkar SelectFile: INF - path = devices@pnp0@00:01
10:04:32.067 [7762] <2> bpbkar SelectFile: INF - cwd = /dev/.udev/failed
10:04:32.067 [7762] <2> bpbkar SelectFile: INF - path = devices@pnp0@00:02
10:04:32.067 [7762] <2> bpbkar SelectFile: INF - cwd = /dev/.udev/failed
10:04:32.067 [7762] <2> bpbkar SelectFile: INF - path = devices@pnp0@00:03
10:04:32.067 [7762] <2> bpbkar SelectFile: INF - cwd = /dev/.udev/failed
10:04:32.067 [7762] <2> bpbkar SelectFile: INF - path = devices@pnp0@00:04
10:04:32.067 [7762] <2> bpbkar SelectFile: INF - cwd = /dev/.udev/failed
10:04:32.067 [7762] <2> bpbkar SelectFile: INF - path = devices@pnp0@00:05
10:04:32.067 [7762] <2> bpbkar SelectFile: INF - cwd = /dev/.udev/failed
10:04:32.067 [7762] <2> bpbkar SelectFile: INF - path = devices@pnp0@00:06
10:04:32.068 [7762] <2> bpbkar SelectFile: INF - cwd = /dev/.udev/failed
10:04:32.068 [7762] <2> bpbkar SelectFile: INF - path = devices@pnp0@00:07
10:04:32.068 [7762] <2> bpbkar SelectFile: INF - cwd = /dev/.udev/failed
10:04:32.068 [7762] <2> bpbkar SelectFile: INF - path = devices@pnp0@00:08
10:04:32.068 [7762] <2> bpbkar SelectFile: INF - cwd = /dev/.udev/failed
------------------------------------------------------------------------------------------------------------------------------
I even excluded the path:
There is no permission issue, but it still failed with 24.
06-26-2014 01:51 AM
Please find the latest bpbkar logs after excluding the path: /export/muse/home/OvidSPUS_template/backup
06-26-2014 01:54 AM
Thanks for the reply.
What is the size of these .bak files?
-rw-r--r-- 1 webman webman 31583 Aug 4 2008 /export/muse/home/OvidSPUS_template/backup/PUBMED.1215558944584.bak
31K
*********************************
Ipv6 is disabled on Linux box
------------------------------
Media Server:
Client connect timeout :9600
Client read Timeout :9600
*********************************
Client read timeout :9600
The backup is going to disk. I can also assure you there is no issue with the media server, because many other backups go to the same media server without problems. There are other media servers too, and when this backup runs on them it also fails with 24. So the issue seems to be with this client.
I also found this in the bpbkar logs:
10:04:32.065 [7762] <2> bpbkar SelectFile: INF - path = failed
10:04:32.065 [7762] <2> bpbkar SelectFile: INF - cwd = /dev/.udev/failed
10:04:32.065 [7762] <2> bpbkar SelectFile: INF - path = class@input@input0
10:04:32.065 [7762] <2> bpbkar SelectFile: INF - cwd = /dev/.udev/failed
10:04:32.065 [7762] <2> bpbkar SelectFile: INF - path = class@input@input1
10:04:32.066 [7762] <2> bpbkar SelectFile: INF - cwd = /dev/.udev/failed
10:04:32.066 [7762] <2> bpbkar SelectFile: INF - path = class@input@input2
10:04:32.066 [7762] <2> bpbkar SelectFile: INF - cwd = /dev/.udev/failed
10:04:32.066 [7762] <2> bpbkar SelectFile: INF - path = class@input@input3
10:04:32.066 [7762] <2> bpbkar SelectFile: INF - cwd = /dev/.udev/failed
10:04:32.066 [7762] <2> bpbkar SelectFile: INF - path = class@input@input4
10:04:32.066 [7762] <2> bpbkar SelectFile: INF - cwd = /dev/.udev/failed
10:04:32.066 [7762] <2> bpbkar SelectFile: INF - path = class@misc@nvram
10:04:32.066 [7762] <2> bpbkar SelectFile: INF - cwd = /dev/.udev/failed
----------------------------------------------------------------------------
I even excluded the path. I have also attached the logs taken after excluding the path below.
cat /usr/openv/netbackup/exclude_list
/export/muse/home/OvidSPUS_template/backup/
There is no permission issue, but it still failed with 24.
06-26-2014 01:54 AM
Could you please try MP1 and MP2 in separate policies and increase the client connect timeout setting to the maximum? Give it a try. Even if it fails, the bpbkar logs will be easier to diagnose.
06-26-2014 01:57 AM
I did not get you. What do you mean by MP1 and MP2? Can you please explain?
06-26-2014 02:11 AM
Currently I am backing up the 2 different mount points by enabling multiple streams in a single policy. Do you want me to create 2 policies and run the mount points one at a time?
06-26-2014 02:26 AM
...... found 2 mount points is creating issue
Lets go back to your original post...
Have you tried to create separate policies or streams for these 2 mount points?
Before starting separate backup, rename existing bpbkar log to ensure new log is created for each mount point.
/dev should be excluded from backups - device files are automatically created when OS is installed.
So, there will never be a reason to restore /dev.
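As a sketch of how /dev could be added, the helper below appends a path to an exclude list only if it is not already there. The function name add_exclude is made up for illustration; the file location is the client exclude_list path shown earlier in this thread.

```shell
#!/bin/sh
# add_exclude FILE PATH: append PATH to an exclude list unless already present.
add_exclude() {
    touch "$1"
    # -x matches the whole line, -F treats the path as a fixed string
    grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# Example (as root on the client):
# add_exclude /usr/openv/netbackup/exclude_list /dev
```

Running it twice leaves a single /dev entry, so it is safe to repeat.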
Anything 'different' about /export?
What type of filesystem?
I seem to remember other forum posts about issues with /export....
Let me see if I can find it....
We still see 15 minutes between last entry in bpbkar and the failure:
02:54:16.596 [16463] <2> fscp_is_tracked: disabled tla_init
03:10:17.822 [16463] <16> flush_archive(): ERR - Cannot write to STDOUT. Errno = 110: Connection timed out
So - there is 15 minute timeout 'somewhere'.
Have you enabled bptm and bpbrm logs on the media server?
We may need to look at those as well.
Is there a firewall anywhere in the picture?
What is OS KeepAlive setting on the client?
Mark_Solutions has previously posted these recommendations. Please implement this and see if it makes a difference:
# echo 510 > /proc/sys/net/ipv4/tcp_keepalive_time
# echo 3 > /proc/sys/net/ipv4/tcp_keepalive_intvl
# echo 3 > /proc/sys/net/ipv4/tcp_keepalive_probes
The changes would be rendered persistent with an addition such as the following to /etc/sysctl.conf
## Keepalive at 8.5 minutes
# start probing for heartbeat after 8.5 idle minutes (default 7200 sec)
net.ipv4.tcp_keepalive_time=510
# close connection after 3 unanswered probes (default 9)
net.ipv4.tcp_keepalive_probes=3
# wait 3 seconds for response to each probe (default 75)
net.ipv4.tcp_keepalive_intvl=3
and then run : chkconfig boot.sysctl on
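To see what these numbers mean in practice: an idle connection whose peer has gone away is declared dead after roughly tcp_keepalive_time + tcp_keepalive_probes x tcp_keepalive_intvl seconds. A small sketch of that arithmetic (the function name keepalive_detect_secs is made up for illustration):

```shell
#!/bin/sh
# keepalive_detect_secs TIME INTVL PROBES
# Approximate seconds before an idle, dead TCP peer is detected:
# idle time before the first probe, plus one interval per unanswered probe.
keepalive_detect_secs() {
    echo $(( $1 + $2 * $3 ))
}

keepalive_detect_secs 510 3 3      # suggested values  -> 519
keepalive_detect_secs 7200 75 9    # Linux defaults    -> 7875
```

The point of the suggested values is that the first keepalive probe goes out at 510 seconds, well inside a 15 minute (900 second) idle timeout, whereas with the defaults nothing is sent for 2 hours.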
06-26-2014 03:12 AM
Currently I am backing up the 2 different mount points by enabling multiple streams in a single policy. However, I will create 2 policies, rename the bpbkar log, and send the logs.
As this is a prod server, it will be rebooted in another 3 days. Should I wait for the reboot, or make the changes you suggest now?
What is OS KeepAlive setting on the client?
net.ipv4.tcp_keepalive_intvl = 75
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_time = 7200
Firewall is not there.
Full data: 29 GB in /export and 7.5 GB in /.
06-26-2014 03:48 AM
You can go ahead and make the changes.
The /etc/sysctl.conf entries are needed to make the changes persistent across reboots.
06-26-2014 08:36 PM
Marianne,
To implement the changes below, the customer is looking for a Symantec doc:
Mark_Solutions has previously posted these recommendations. Please implement this and see if it makes a difference:
# echo 510 > /proc/sys/net/ipv4/tcp_keepalive_time
# echo 3 > /proc/sys/net/ipv4/tcp_keepalive_intvl
# echo 3 > /proc/sys/net/ipv4/tcp_keepalive_probes
The changes would be rendered persistent with an addition such as the following to /etc/sysctl.conf
## Keepalive at 8.5 minutes
# start probing for heartbeat after 8.5 idle minutes (default 7200 sec)
net.ipv4.tcp_keepalive_time=510
# close connection after 3 unanswered probes (default 9)
net.ipv4.tcp_keepalive_probes=3
# wait 3 seconds for response to each probe (default 75)
net.ipv4.tcp_keepalive_intvl=3
and then run : chkconfig boot.sysctl on
Can you please help me get this?
06-26-2014 08:46 PM
Marianne,
Can you please give me a Symantec doc for the OS KeepAlive settings so that we can implement the changes? This is a customer requirement.
06-27-2014 01:34 AM
2013 - these settings have come from experience and from contact with Symantec backline support over many years of using NetBackup - not sure if they appear in a document anywhere.
Your backup clearly fails after a 15 minute timeout so something is closing down the ports after that period of time (15 minutes = 900 seconds)
The usual cause is a firewall (a hardware one that sits on the network between the client and the media server), but it can also be NetBackup timeout settings.
As the timeouts are generally governed by the media server settings, you have already exceeded all of these. Out of interest, though, if you connect to the client's host properties, what are its timeouts?
The most likely cause is a network or operating system error, and as other jobs work, it is likely to be on this client or between this client and the media server.
Try the keepalive settings. There is no need for anything other than the below for your test, and you can change them back if it doesn't help:
# echo 510 > /proc/sys/net/ipv4/tcp_keepalive_time
# echo 3 > /proc/sys/net/ipv4/tcp_keepalive_intvl
# echo 3 > /proc/sys/net/ipv4/tcp_keepalive_probes
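To make changing them back easy, the current values can be recorded before the test. This is only a sketch: the function names and the KA_DIR variable are made up for illustration, and writing under /proc/sys needs root.

```shell
#!/bin/sh
# KA_DIR can be overridden so the functions can be tried on a scratch directory.
KA_DIR="${KA_DIR:-/proc/sys/net/ipv4}"

# save_keepalive FILE: record the three current values as name=value lines.
save_keepalive() {
    for k in time intvl probes; do
        printf '%s=%s\n' "$k" "$(cat "$KA_DIR/tcp_keepalive_$k")"
    done > "$1"
}

# restore_keepalive FILE: write the recorded values back.
restore_keepalive() {
    while IFS='=' read -r k v; do
        echo "$v" > "$KA_DIR/tcp_keepalive_$k"
    done < "$1"
}

# Example (as root):
# save_keepalive /root/keepalive.before
# ... apply the echo commands above and rerun the backup ...
# restore_keepalive /root/keepalive.before
```

The file name /root/keepalive.before is just a placeholder.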
06-27-2014 01:37 AM
I'm not aware that Symantec has such a document. These are not NBU settings, they are OS settings, so really you should investigate them with your OS support vendor. NBU is the casualty in this issue, not the cause.
After all, if you had toothache, you would go to the dentist, not the doctor ...
The examples I have for these types of settings have been customer specific, that is set to an individual environment.
I have attached some notes for you on these sort of settings.
06-27-2014 01:58 AM
And to be doubly sure, can we see the output of this command on your client please:
# iptables -L -n
06-27-2014 03:37 AM
I did it but no luck.
# echo 510 > /proc/sys/net/ipv4/tcp_keepalive_time
# echo 3 > /proc/sys/net/ipv4/tcp_keepalive_intvl
# echo 3 > /proc/sys/net/ipv4/tcp_keepalive_probes
Again it hangs after 15 minutes.
iptables is disabled everywhere in the Linux environment; we are relying on network firewall....
06-27-2014 03:44 AM
You need to investigate this issue then with your OS support and Network admins.
In the same way that Symantec would provide details for NetBackup tuning settings, the OS provider should assist you with OS settings.
06-27-2014 03:47 AM
"we are relying on network firewall...."
Does that lie between the client and the media server?
Did you check the client host properties timeout settings?