05-02-2013 09:18 AM
I have one backup client (out of an estate of 600+ clients) failing with status code 24 during a server-initiated file system backup. Both the client and the server run Solaris 10 (client: SPARC, server: x86). The NetBackup version is 7.0 on both.
I've checked the relevant tech articles that address code 24 errors but they don't seem to apply to my problem.
http://www.symantec.com/business/support/index?page=content&id=TECH188129 - the backup is failing in less than one minute, so lowering the TCP_KEEPALIVE_INTERVAL to 5 mins seems unlikely to help.
http://www.symantec.com/business/support/index?page=content&id=TECH76201&key=15143&basecat=TROUBLESHOOTING&actp=LIST - I've checked the tcp_recv_hiwat, tcp_xmit_hiwat and tcp_max_buf parameters and they are the same on the client (and other clients I've checked) and the server.
http://www.symantec.com/business/support/index?page=content&id=HOWTO34910 - OS patching is reasonably up to date (January Critical Path Update applied a few months ago). Backup has been re-tried at times when network load is low.
Also NET_BUFFER_SZ has not been configured on the client or the master/media server.
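For reference, a minimal sketch of how the tunable comparison could be scripted. It assumes the values are read with `ndd -get /dev/tcp <name>` on each Solaris 10 host and saved as "name value" lines; the function and file names are made up for illustration, and it is written for ksh or bash rather than the legacy Solaris /bin/sh.

```shell
# Hypothetical helpers; the names are illustrative, not part of NetBackup or Solaris.

# Print "name value" for each TCP tunable mentioned above.
# Assumes Solaris 10, where ndd reads /dev/tcp parameters (may need root).
dump_tcp_tunables() {
    for p in tcp_recv_hiwat tcp_xmit_hiwat tcp_max_buf; do
        printf '%s %s\n' "$p" "$(ndd -get /dev/tcp "$p")"
    done
}

# Compare two saved dumps. Prints each tunable whose value differs and
# returns 1 if any differ, 0 if all shared tunables match.
compare_tunables() {
    awk 'NR == FNR { v[$1] = $2; next }
         ($1 in v) && v[$1] != $2 { printf "%s: %s vs %s\n", $1, v[$1], $2; bad = 1 }
         END { exit bad }' "$1" "$2"
}
```

Usage would be along the lines of running `dump_tcp_tunables > client.txt` on the client, the same on the server into `server.txt`, and then `compare_tunables client.txt server.txt` on either host.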
Does anybody have a suggestion as to what else I can look at? I'm guessing it's a network issue but I can't find a difference in network configuration between this client and all the others.
Logs and other info from the client and server. Names have been changed to protect the innocent.
05-02-2013 09:28 AM
I would first look at /u01/app/oracle/admin/cd02/adump/cd02u01_ora_23013_1.xml
or the file immediately after it, as a corrupt file could be causing this error.
Worth checking the files / file system in case there is some corruption - also check previous / next logs and see if it always fails at the same file
Hope this helps
05-02-2013 09:33 AM
Thanks for the reply.
I have previously tried adding /u01/app/oracle/admin to exclude_list. The end result is that it fails in a different place.
File systems are all ZFS and there's no reported corruption.
05-02-2013 11:06 AM
On the client:
add VERBOSE = 5 to the /usr/openv/netbackup/bp.conf file
run: touch /usr/openv/netbackup/bpbkar_path_tr
run: bpbkar -nocont <filesystem_that_fails> >/dev/null
(wait until it returns to the command prompt)
Check the bpbkar log to see if the command above has created the same error.
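The steps above could be wrapped in a small script, roughly as sketched here. This is an illustration under assumptions, not a supported procedure: `NB_DIR` and `BPBKAR` are hypothetical override variables of my own (handy for a dry run against a scratch directory), the bpbkar binary is assumed to live under `bin/`, and the legacy log directory is assumed to be `logs/bpbkar` under the NetBackup directory. Written for ksh or bash.

```shell
# Sketch of the manual test above. NB_DIR and BPBKAR are made-up override
# variables; the defaults assume a standard UNIX client install.
run_bpbkar_test() {
    fs=$1
    nb=${NB_DIR:-/usr/openv/netbackup}
    bin=${BPBKAR:-$nb/bin/bpbkar}
    conf=$nb/bp.conf

    # Add VERBOSE = 5 only if there is no VERBOSE entry yet; an existing
    # entry with a lower value is left alone and must be edited by hand.
    grep '^VERBOSE' "$conf" > /dev/null 2>&1 || echo 'VERBOSE = 5' >> "$conf"

    # Touch file from the steps above.
    touch "$nb/bpbkar_path_tr"

    # Legacy logs are only written if the log directory exists
    # (location is an assumption).
    mkdir -p "$nb/logs/bpbkar"

    # Read the file system locally and throw the data away, so the
    # network path to the media server is not involved.
    rc=0
    "$bin" -nocont "$fs" > /dev/null || rc=$?
    echo "bpbkar exit status: $rc"
    return $rc
}
```

If this completes cleanly while the real backup still fails, the file system read itself is unlikely to be the problem. Remember to remove the VERBOSE entry and the touch file afterwards, as the logs grow quickly at this level.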
05-02-2013 11:10 AM
The few network settings that there are in NetBackup are very unlikely to cause a status 24. I won't say it's impossible, but I've never seen it.
You need to look outside NBU I'm afraid.
Did this client ever work? If it did, what has changed recently? (I appreciate you may not know this.)
As only one client is affected, this reinforces the suggestion that no settings in nbu are going to be the cause, else I would expect more clients to be having issues.
I've not seen NET_BUFFER_SZ cause failures. Performance issues yes, definitely, but not failures.
Are there any firewalls? Does it fail after the same time, or after the same amount of data? Are there any patterns you can see?
05-03-2013 06:21 AM
Thanks for the advice. The command is completing successfully, so I'm guessing it's a network problem rather than corrupt files/file systems.
05-03-2013 07:16 AM
Thanks for the reply.
There isn't a firewall between the client and the media/master server. There was a host-based firewall (ipfilter), but that was one of the first things I disabled.
It does seem to be based on the amount of data or elapsed time. The backup fails about 55-56 secs after starting on the client (based on the bpbkar log). I've tried progressively adding directories to exclude_list after each failure, but the subsequent backup still fails after the same elapsed time.
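To take the guesswork out of the elapsed-time figure, something like the following could pull it from the bpbkar log for one job. It is only a sketch: it assumes each legacy log line starts with an `HH:MM:SS.mmm` timestamp followed by the process ID in square brackets, and `log_elapsed` is a made-up helper name.

```shell
# Hypothetical helper: print the seconds between the first and last log
# lines written by one bpbkar process.
# Assumed line layout: "HH:MM:SS.mmm [PID] <level> message".
log_elapsed() {
    awk -v pid="$2" '
        $2 == "[" pid "]" && $1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]/ {
            # Convert HH:MM:SS to seconds since midnight.
            split(substr($1, 1, 8), t, ":")
            s = t[1] * 3600 + t[2] * 60 + t[3]
            if (!seen) { first = s; seen = 1 }
            last = s
        }
        END {
            if (!seen) exit 1          # no lines for that PID
            print last - first
        }' "$1"
}
```

Called as `log_elapsed <bpbkar_log_file> <pid>`, it prints whole seconds. Running it for several failed jobs with different exclude lists, and noting how much data each job had written when it failed, should show whether the time or the size is the constant.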
05-03-2013 03:01 PM
OK, any chance you can work out if it is elapsed time, or size of data ...