Incremental backups consistently getting error code 14

LAToro
Level 4
Here's the environment:

Solaris 10 Master/media servers running NB 6.5.1
Windows 2000 client running 5.1 MP5
The client has a SAN drive with approx. 650GB of data, across 3 million files, in addition to C and D, and is part of a cluster.

I only get this error on differential incremental backups, and only when I single stream (which is the standard; backups, as a matter of policy, are not multistreamed).
Allowing multiple streams yields 4 streams (C, D, G, and system state), and incrementals run fine.
With multiple streams not allowed, the backup fails. Symantec says it hangs on the C drive, yet if I exclude G, the single-stream incremental runs fine.
Fulls run fine regardless of streaming.

Before you say "just allow multistreaming"... I tried that, but it only resolves the issue on this particular client; there is another (part of the same cluster) that just has C and D, and it gets the same error on differential incrementals, even with streaming turned on.

Got an open call with Symantec, but I'm not getting a warm and fuzzy that they will resolve this issue.


Excerpt from bpbkar:

01:58:22.227 PM: [2768.496] <4> bpio::read_string: INF - read non-blocking message of length 1
01:58:22.227 PM: [2768.496] <2> dtcp_read: TCP - success: recv socket (436), 1 of 1 bytes
01:58:22.227 PM: [2768.496] <2> dtcp_read: TCP - success: recv socket (436), 2 of 4 bytes
01:58:22.227 PM: [2768.496] <4> bpio::read_string: INF - read non-blocking message of length 8960
01:58:22.227 PM: [2768.496] <2> dtcp_read: TCP - success: recv socket (436), 4303 of 8960 bytes
01:58:22.227 PM: [2768.496] <2> dtcp_read: TCP - success: recv socket (436), 4657 of 4657 bytes
01:58:22.227 PM: [2768.496] <2> dtcp_write: TCP - success: send socket (704), 20480 of 20480 bytes
01:58:22.227 PM: [2768.496] <2> dtcp_read: TCP - success: recv socket (436), 4 of 4 bytes
01:58:22.227 PM: [2768.496] <4> bpio::read_string: INF - read non-blocking message of length 68096
01:58:22.227 PM: [2768.496] <2> dtcp_read: TCP - success: recv socket (436), 40189 of 68096 bytes
01:59:22.260 PM: [2768.496] <4> bpio::bread: INF - read timeout
01:59:22.260 PM: [2768.496] <4> tar_base::V_vTarMsgW: INF - tar message received from tar_backup::backup_send_chkp_data_state
01:59:22.260 PM: [2768.496] <2> tar_base::V_vTarMsgW: FTL - tar file write error (40)
01:59:22.260 PM: [2768.496] <2> dtcp_write: TCP - success: send socket (720), 6 of 6 bytes
01:59:22.260 PM: [2768.496] <2> dtcp_write: TCP - success: send socket (720), 26 of 26 bytes
01:59:22.260 PM: [2768.496] <2> tar_base::V_vTarMsgW: INF - Client completed sending data for backup
01:59:22.260 PM: [2768.496] <2> tar_base::V_vTarMsgW: INF - EXIT STATUS 14: file write failed
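The telling detail in this excerpt is the roughly 60-second pause between the last partial dtcp_read and the "read timeout" line. A minimal sketch for spotting pauses like that in a longer bpbkar log, assuming every line starts with a timestamp in the same format as above; the function name find_gaps and the 30-second threshold are made up for illustration and are not part of NetBackup:

```python
import re
from datetime import datetime

# Timestamp at the start of a bpbkar line, e.g. "01:58:22.227 PM:"
STAMP = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3} [AP]M):")


def find_gaps(lines, threshold_seconds=30.0):
    """Return (gap_seconds, line_before, line_after) for each pause above the threshold."""
    gaps = []
    prev_time = None
    prev_line = None
    for line in lines:
        match = STAMP.match(line)
        if not match:
            continue  # skip lines without a leading timestamp
        stamp = datetime.strptime(match.group(1), "%I:%M:%S.%f %p")
        if prev_time is not None:
            delta = (stamp - prev_time).total_seconds()
            if delta < 0:
                delta += 86400  # the log has no date, so assume one midnight rollover
            if delta > threshold_seconds:
                gaps.append((delta, prev_line, line))
        prev_time, prev_line = stamp, line
    return gaps


# Three lines copied from the excerpt above.
sample = [
    "01:58:22.227 PM: [2768.496] <4> bpio::read_string: INF - read non-blocking message of length 68096",
    "01:58:22.227 PM: [2768.496] <2> dtcp_read: TCP - success: recv socket (436), 40189 of 68096 bytes",
    "01:59:22.260 PM: [2768.496] <4> bpio::bread: INF - read timeout",
]

for gap, before, after in find_gaps(sample):
    print(f"{gap:.3f}s pause before: {after}")
```

Running it over the whole log (for example `find_gaps(open(path))`, where path is wherever the bpbkar log lives on the client) would show whether the stall always follows the same kind of read.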

Thanks


9 REPLIES

Andy_Welburn
Level 6
It seems to indicate (in the main) network or disk full issues?

In-depth Troubleshooting Guide for Exit Status Code 14

Nathan_B
Level 4
Is it possible for you to upgrade the client to match the master server's version?  I haven't seen too many issues with 5.1 to 6.5.x but I have seen a few cases where errors just sort of... corrected themselves, when I took them to 6.5.x.

Andy_Welburn
Level 6
that's often my first choice if I come across an 'old' client with issues.

Will_Restore
Level 6

LAToro
Level 4
Thanks for the replies. Let me answer in order...

- had already gone through the technote, and there is plenty of available free space.
- I also inquired as to why not upgrade the client to 6.5.1 to see if the issue goes away, but this is a big environment and before anything can be rolled out (ie., 6.5 to the clients) it has to be "certified" by a separate group.
- support from Symantec is not an issue (as I do have an open case on this).

We've checked the NIC drivers and all are up to date. Also had the Windows SA do a defrag on the C: drive, and it looked like the backup ran fine after the defrag. But I ran another incremental and got a 14.

LAToro
Level 4
After speaking with a Symantec engineer, he recommended increasing some TCP parameters on the Windows servers. Namely:

Increase Windows TCP settings to keep the socket open longer during backups
  -TcpMaxDataRetransmissions
  -TcpMaxConnectRetransmissions
 
This has solved a number of cases that end in socket failures.
Microsoft Reference
    TCP/IP and NetBT configuration parameters for Windows 2000 or Windows NT
    http://support.microsoft.com/kb/120642
 

Waiting on confirmation from the Windows admins that they've made these changes to see if it fixes the issue.
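For anyone who needs to hand the change to a Windows admin, here is a rough sketch that renders a .reg file for the two values. Assumptions: both are REG_DWORD values under the Tcpip\Parameters key described in the KB article, with the second one spelled with a trailing "s" there. The number 10 is only a placeholder, since the engineer's recommended values are not quoted here, and the helper name build_reg_file is made up.

```python
# Registry key that holds the TCP/IP parameters (per the KB article referenced above).
TCPIP_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"


def build_reg_file(values):
    """Render .reg file text that sets each name in `values` as a REG_DWORD."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{TCPIP_KEY}]"]
    for name, number in values.items():
        if not 0 <= number <= 0xFFFFFFFF:
            raise ValueError(f"{name}: {number} does not fit in a DWORD")
        # .reg files write DWORDs as eight hexadecimal digits
        lines.append(f'"{name}"=dword:{number:08x}')
    return "\r\n".join(lines) + "\r\n"


# Placeholder values only; substitute whatever support actually recommends.
settings = {
    "TcpMaxDataRetransmissions": 10,
    "TcpMaxConnectRetransmissions": 10,
}

print(build_reg_file(settings))
```

The output can be saved to a file and reviewed before anyone imports it; exporting the existing key first gives an easy way back.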

LAToro
Level 4
As I wait for the Windows admin to apply the aforementioned TCP parameter change, I continued to troubleshoot on my end on a different client.

Same deal: fulls run fine, incrementals fail with error code 14 (FTL - tar file write error (40)). This is a node on a Windows SQL cluster, and I'm trying to back up only a single drive (i.e., I'm not doing ALL_LOCAL_DRIVES).

So I maxed out all the logging, and decided to exclude a directory based on what I saw in a verbose bpbkar log.
The incremental ran fine. So I figure since the full is backing up fine, let me create a schedule-specific exclude for this directory. Did that, and the incremental failed...

Why excluding the directory on all schedules works, but a schedule-specific exclude does not, is beyond me.

SteveYu
Level 4
Employee Accredited Certified
Your case does not sound like a fun one. I had a couple of questions just more for my curiosity.

- So you are experiencing the same behavior on a different client? Have you tried the backup to another media server?
- Have these ever worked?
- If it were my case, I might ask you to do either or both of the following: run a bpbkar-to-null test on the data selection, and/or use the bpbkar trace touch file so it prints the files being read on the client during the backup.
- I suppose upgrading the NetBackup client is out of the question? 5.1 is old (but it is one of my favorite releases).
- Symantec support does not like our customers to have unresolved cases for so long. If you PM me your case number, I will see if I can give it a little push, though I can't say how much it will or won't help.