
Daily-Incr backups fail continuously with EC:14 while Weekly Full backups succeed!!

Shyam_Prasad
Level 4
Certified

Hi All,

The daily incremental backups on one of my Windows 2000 hosts are continuously failing with EC:14, with the following error:

01/20/2010 09:09:49 - begin writing
01/20/2010 09:19:40 - Critical bpbrm (pid=3541) from client psprod01.xxx.xx.com: FTL - tar file write error (40)
01/20/2010 09:19:43 - end writing; write time: 0:09:54
file write failed (14)

The very surprising thing here is that all the weekly full backups on the same host succeed without any error.
I tried rerunning the weekly full on a weekday... it also succeeded.
Can anyone help me fix this issue? It is a production box.

Thanks,
Shyam Prasad


Nicolai
Moderator
Partner VIP

NetBackup is taking too long to work out which files need to go into the incremental backup. Try increasing CLIENT_READ_TIMEOUT on the client.

You can also apply the bpbkar_path_tr trick (from your previous thread) to find the directory causing the issue.

Shyam_Prasad
The CLIENT_READ_TIMEOUT value on the client is set to 3600 seconds.
Please advise...

Thanks,
Shyam

Nicolai

Try setting it to 21600 (6 hours) as a test. I would also find the directory that is causing the timeout and, as a test, exclude it from the incremental backup.

You can find the directory by following this procedure:

Usage specifics of the bpbkar_path_tr touch file to enable enhanced debug logging of the bpbkar proc...
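For anyone who wants a rough cross-check outside NetBackup, here is a minimal Python sketch of the same idea: walk a volume, count the files an incremental would likely pick up, and time each directory so the slow ones stand out. It assumes timestamp-based incrementals (files modified after the last backup); a Windows client can also be configured to use the archive bit, which this does not model. The function `slowest_dirs` is hypothetical and not part of NetBackup, and recent Python releases do not run on Windows 2000, so on that host bpbkar_path_tr remains the practical tool.

```python
import os
import time
from typing import List, Tuple


def slowest_dirs(root: str, since: float, top: int = 5) -> Tuple[List[Tuple[float, str]], int]:
    """Walk `root`, count files modified after `since` (epoch seconds), and
    return the `top` slowest directories as (seconds, path) plus that count.

    Hypothetical helper: assumes timestamp-based incrementals only."""
    timings: List[Tuple[float, str]] = []
    changed = 0
    last = time.perf_counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                if os.stat(os.path.join(dirpath, name)).st_mtime > since:
                    changed += 1  # would be an incremental candidate
            except OSError:
                continue  # locked or vanished files are skipped, not fatal
        now = time.perf_counter()
        # Elapsed since the previous directory finished; this includes the
        # time os.walk spent listing this directory before yielding it.
        timings.append((now - last, dirpath))
        last = now
    timings.sort(reverse=True)
    return timings[:top], changed
```

For example, `slowest_dirs("U:\\", since=time.time() - 24 * 3600)` approximates one day's incremental on U:; the directories it reports are candidates for the test exclude suggested above.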

LAToro
Level 4
I've been having the same issue, and I have an open call with Symantec. The timeout values did not help me at all, particularly since the incremental fails after only 5-10 minutes (and my timeouts are set to 3600 seconds).

Symantec recommended two things:

1 - Defrag the drive that is causing the problem.
2 - Increase some client TCP parameters (google "NetBackup" plus these parameter names for details):
-TcpMaxDataRetransmissions
-TcpMaxConnectRetransmissions
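Both settings are DWORD values under the Windows key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters. If the change has to travel through test and UAT anyway, a minimal Python sketch that writes them into a .reg file for review might look like this. The helper name `tcp_reg_file` is hypothetical, and no values are assumed here: use whatever Symantec's technote or your Windows team specifies.

```python
from typing import Dict

TCPIP_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"


def tcp_reg_file(values: Dict[str, int]) -> str:
    """Build the text of a .reg file that sets DWORD values under TCPIP_KEY.

    Hypothetical helper; the caller supplies every name and value."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{TCPIP_KEY}]"]
    for name, value in sorted(values.items()):
        if not 0 <= value <= 0xFFFFFFFF:
            raise ValueError(f"{name}: {value} does not fit in a DWORD")
        # .reg files write DWORDs as 8 hex digits after "dword:".
        lines.append(f'"{name}"=dword:{value:08x}')
    return "\r\n".join(lines) + "\r\n"
```

Call it as `tcp_reg_file({"TcpMaxDataRetransmissions": n, "TcpMaxConnectRetransmissions": m})` with `n` and `m` from the technote, and import the resulting file with regedit on a test box first.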

The defrag requires downtime and off-hours coverage from the Windows guys here, and they are slow to act on it (partly because the fulls are running fine... they're pushing for fulls every day).

The TCP changes require a lot of red tape to implement (change in test, then UAT... you get the idea), and again, their answer is "just do fulls".

If your environment is less stringent and you are able to try one or both of these suggestions, I'd be curious to see whether it resolves the issue. In the meantime, I'm reviewing the bpbkar logs (I just requested that the trace file be created) to see if I can isolate the issue to a specific subdirectory.
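When isolating the subdirectory, the useful line is the last "TAR - Backup:" entry before the FTL error. A small Python sketch can pull it out of a long verbose log. It assumes one message per line in the bpbkar format, e.g. `... tar_backup::backup_startfile_state: TAR - Backup: <path>`, and the helper name is hypothetical.

```python
import re
from typing import Iterable, Optional

# Matches the path at the end of a "TAR - Backup: <path>" line.
BACKUP_RE = re.compile(r"TAR - Backup: (.+)$")


def last_path_before_failure(lines: Iterable[str]) -> Optional[str]:
    """Return the last path bpbkar started before the first FTL line,
    or None if no FTL line appears."""
    last_path: Optional[str] = None
    for line in lines:
        m = BACKUP_RE.search(line)
        if m:
            last_path = m.group(1).strip()
        elif "FTL -" in line:
            return last_path
    return None
```

For example, `with open(log_path, errors="replace") as f: print(last_path_before_failure(f))`, where `log_path` points at one of the files in the bpbkar log directory. The path reported is the last one bpbkar started, which is not necessarily the one that stalled.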

Stumpr2
Level 6
The Windows guys need to step up to the plate and defrag the drives.
There are 3rd-party tools for scheduling online defrags.
Shake 'em up and tell them to do their job :)

Nicolai
Defrag never fixed my timeout issues, but bpbkar_path_tr always did (well, except for a reboot now and then) :D

LAToro
I'm wondering if I'm interpreting the log output correctly. I maxed out the verbose level and got the bpbkar_path_tr file created. I see the following...

03:42:44.945 PM: [2184.3148] <2> tar_backup::backup_startfile_state: TAR - Backup: U:\Shared\Apps\data\CMO
03:42:44.945 PM: [2184.3148] <2> tar_base::V_vTarMsgM: DIR - 456 6 54 38 16832 root root 0 1264105013 1263938793 925837900 /U/Shared/Apps/data/CMO/
03:42:44.945 PM: [2184.3148] <2> dtcp_write: TCP - success: send socket (700), 107 of 107 bytes
03:42:44.945 PM: [2184.3148] <2> tar_backup::backup_startfile_state: TAR - Backup: U:\Shared\Apps\data\CMO\spools
03:42:44.960 PM: [2184.3148] <2> tar_base::V_vTarMsgM: DIR - 456 7 59 47 16832 root root 0 1264106564 1245855951 1156728366 /U/Shared/Apps/data/CMO/spools/
03:42:44.960 PM: [2184.3148] <2> dtcp_write: TCP - success: send socket (700), 117 of 117 bytes
03:42:44.976 PM: [2184.3148] <2> tar_backup::backup_startfile_state: TAR - Backup: U:\Shared\Apps\data\CMO\spools\gnma.hsp
03:42:44.976 PM: [2184.3148] <2> dtcp_write: TCP - success: send socket (900), 32768 of 32768 bytes
03:42:44.976 PM: [2184.3148] <2> dtcp_read: TCP - success: recv socket (516), 4 of 4 bytes
03:42:44.976 PM: [2184.3148] <4> bpio::read_string: INF - read non-blocking message of length 1
03:42:44.976 PM: [2184.3148] <2> dtcp_read: TCP - success: recv socket (516), 1 of 1 bytes
03:42:44.976 PM: [2184.3148] <2> dtcp_read: TCP - success: recv socket (516), 4 of 4 bytes
03:42:44.976 PM: [2184.3148] <4> bpio::read_string: INF - read non-blocking message of length 1
03:42:44.976 PM: [2184.3148] <2> dtcp_read: TCP - success: recv socket (516), 1 of 1 bytes
03:42:44.976 PM: [2184.3148] <2> dtcp_read: TCP - success: recv socket (516), 4 of 4 bytes
03:42:44.976 PM: [2184.3148] <4> bpio::read_string: INF - read non-blocking message of length 1


[Those last messages repeat over and over, until I get...]

03:42:46.804 PM: [2184.3148] <4> bpio::read_string: INF - read non-blocking message of length 8960
03:42:46.804 PM: [2184.3148] <2> dtcp_read: TCP - success: recv socket (516), 4303 of 8960 bytes
03:42:46.804 PM: [2184.3148] <2> dtcp_read: TCP - success: recv socket (516), 4657 of 4657 bytes
03:42:46.914 PM: [2184.3148] <2> dtcp_write: TCP - success: send socket (900), 32768 of 32768 bytes
03:42:46.914 PM: [2184.3148] <2> dtcp_read: TCP - success: recv socket (516), 4 of 4 bytes
03:42:46.914 PM: [2184.3148] <4> bpio::read_string: INF - read non-blocking message of length 68096
03:42:46.914 PM: [2184.3148] <2> dtcp_read: TCP - success: recv socket (516), 40189 of 68096 bytes
03:43:46.921 PM: [2184.3148] <4> bpio::bread: INF - read timeout
03:43:46.921 PM: [2184.3148] <4> tar_base::V_vTarMsgW: INF - tar message received from tar_backup::backup_data_state
03:43:46.921 PM: [2184.3148] <2> tar_base::V_vTarMsgW: FTL - tar file write error (40)

It seems to time out after 1 minute (the last read is at 03:42:46.914 PM and the read timeout at 03:43:46.921 PM). I excluded the mbspools subdirectory, and I still get status 14 on the incremental.
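The one-minute figure can be checked directly from the timestamps. A minimal Python sketch, assuming the `hh:mm:ss.mmm AM/PM:` prefix seen in these bpbkar lines and a failure that does not straddle midnight (the prefix carries no date); both helper names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional

STAMP_FMT = "%I:%M:%S.%f %p"  # e.g. "03:43:46.921 PM"


def stamp(line: str) -> datetime:
    # The first 15 characters hold the time; no date, so midnight is not handled.
    return datetime.strptime(line[:15], STAMP_FMT)


def gap_before_timeout(lines: Iterable[str]) -> Optional[timedelta]:
    """Time between the first 'read timeout' line and the line just before it."""
    previous: Optional[str] = None
    for line in lines:
        if "read timeout" in line and previous is not None:
            return stamp(line) - stamp(previous)
        previous = line
    return None
```

Fed the two lines around the failure above (03:42:46.914 PM and 03:43:46.921 PM), it reports a gap of about 60 seconds, far below the 3600-second CLIENT_READ_TIMEOUT, which suggests a different timer is firing. Note also that the last read before the timeout received only 40189 of 68096 bytes.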


Nicolai

If it times out after 60 seconds, it's probably more of a communication issue. The media server throws a timeout if a connection isn't accepted within 60 seconds. Is any software firewall active?
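A quick way to test the firewall question from the media server is to check whether the client accepts TCP connections at all. Minimal Python sketch; the ports are an assumption based on the usual NetBackup defaults (13724 for vnetd, 13782 for bpcd), so check them against your own configuration. It only proves a connection can be opened; a firewall that drops an established session mid-backup would not show up here.

```python
import socket


def port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port is accepted within `timeout` seconds."""
    try:
        # create_connection raises OSError (including socket.timeout) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example: `for port in (13724, 13782): print(port, port_open("psprod01.xxx.xx.com", port))`.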

Have you tried running bpbkar by hand to see whether it can read through the file system?

E.g.: install_path\NetBackup\bin\bpbkar32 -nocont U:\ > NUL

The files read will be sent to the bit bucket.
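The same read-through idea can be approximated outside NetBackup with a short Python sketch: read every file, drop the data, and record the slowest files and any that cannot be opened. This is not a substitute for bpbkar32 (no exclude lists, no open-file handling, a different read path), the helper name is hypothetical, and recent Python releases do not run on Windows 2000, so it is mainly useful on hosts with a current Python installed.

```python
import os
import time
from typing import List, Tuple


def read_through(root: str, chunk: int = 1 << 20, top: int = 10) -> Tuple[int, List[Tuple[float, str]], List[str]]:
    """Read every file under `root`, discarding the data.

    Returns (bytes read, `top` slowest files as (seconds, path), unreadable paths)."""
    total = 0
    timings: List[Tuple[float, str]] = []
    errors: List[str] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            start = time.perf_counter()
            try:
                with open(path, "rb") as f:
                    while True:
                        data = f.read(chunk)
                        if not data:
                            break
                        total += len(data)  # data is dropped, like "> NUL"
            except OSError:
                errors.append(path)  # bytes read before the error still count
                continue
            timings.append((time.perf_counter() - start, path))
    timings.sort(reverse=True)
    return total, timings[:top], errors
```

A file that takes far longer than its neighbours, or one that lands in the error list, would be a good candidate for the bpbkar_path_tr trace or a test exclude.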