Error backing up Windows client over WAN

tbl
Level 3
Partner Certified
Hello

I've got an issue backing up Windows servers (only) over a WAN (only) to tape (only; there is no problem to disk).
I write "only" because there is no problem backing up Linux over the WAN, to tape or to disk, nor backing up the same Windows clients (same policy) to tape via a staging area.

When I try to back up any Windows client over this WAN, it fails with the following error:

### Job details :
[...] // request & grant tape resource without error
date - connecting
date - connected; connect time: 0:00:00
date - mounting 0120L3
date - Critical bpbrm (pid=16807) from client win-test: FTL - socket write failed
date - Error bptm (pid=16808) media manager terminated during mount of media id 0120L3, possible media mount timeout
date - Error bptm (pid=16808) media manager terminated by parent process
date - Error bpbrm (pid=16807) could not send server status message
date - end writing
socket write failed (24)

### bpbrm log on the master/media server :
date [pid] <2> MNG: backup_cmd = /usr/openv/netbackup/bin/bpbkar bpbkar -r 1209600 -ru root -dt 0 -to 0 -clnt win-test \
-class DIGORA_DOSSIERS_win-test -sched FULL_Manuel_On-tape -st FULL -bpstart_to 500 -bpend_to 600 -read_to 600 \
-blks_per_buffer 127 -use_otm -use_ofb -b win-test_1248426819 -kl 28 -fso -Z -ct 13
date [pid] <2> bpbrm handle_backup: forking client backup
date [pid] <2> bpbrm send_bpsched_connected_msg: sending bpsched msg: CONNECTED TO CLIENT FOR win-test_1248426819
date [pid] <2> get_readline_type: Returning BPBRM_READ_INFORM_WHEN_DONE
date [pid] <2> bpbrm read_backup_start: from client win-test: read client start message
date [pid] <2> bpbrm write_continue_backup: wrote CONTINUE BACKUP on COMM_SOCK <7>
date [pid] <2> write_file_names: buffering file name 'D:\win-test\dir1' for output
date [pid] <2> write_file_names: buffering file name 'D:\win-test\dir2' for output
date [pid] <2> write_file_names: successfully wrote buffer to COMM_SOCK
date [pid] <2> bpbrm write_filelist: wrote CONTINUE on COMM_SOCK
date [pid] <2> bpbrm handle_backup: from client win-test: INF - File data will be compressed when appropriate
date [pid] <2> bpbrm handle_backup: ESTIMATE -1 -1 win-test_1248426819
date [pid] <32> bpbrm handle_backup: from client win-test: FTL - tar file write error (10054)
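
The number in parentheses on the last bpbrm line can be decoded. The sketch below assumes that 10054 is a Winsock error code passed through unchanged from the Windows client (the thread does not confirm this); the three codes in the table are standard Windows values, and the helper name `explain_socket_error` is made up for this example.

```python
import re

# Standard Winsock error codes; the table is deliberately tiny.
WINSOCK_ERRORS = {
    10053: "WSAECONNABORTED: software caused connection abort",
    10054: "WSAECONNRESET: connection reset by peer",
    10060: "WSAETIMEDOUT: connection timed out",
}

def explain_socket_error(log_line):
    """Describe the numeric code in parentheses at the end of a log line, or return None."""
    match = re.search(r"\((\d+)\)\s*$", log_line)
    if match is None:
        return None
    code = int(match.group(1))
    return WINSOCK_ERRORS.get(code, f"unrecognised code {code}")

# Message text copied from the bpbrm log above.
line = "bpbrm handle_backup: from client win-test: FTL - tar file write error (10054)"
print(explain_socket_error(line))
```

If that assumption holds, the client saw its TCP connection reset rather than a slow or stalled write, which points at the network path as much as at NetBackup itself.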

### Technical info :
NBU server 6.5.4 (the problem also occurs with 6.5.3)
Windows clients :
- physical Win 2k3 x64 with NBU client 6.5.3
- physical Win 2k3 x86 with NBU client 6.5.3
  (this one also has an Oracle DB policy, which runs perfectly)
- virtual Win 2k3 x86 with NBU client 5.1 MP5
- virtual Win 2k3 x86 with NBU client 6.5.3
- virtual Win XP x86 with NBU client 6.5.3

WAN : 100 Mbps Fiber Channel with QoS

### Tests performed :
- back up single files : OK
- back up directories : NOK
- disable QoS on the WAN : NOK
- disable compression : NOK
- with / without snapshot : NOK

Does anyone have another idea?

thomas

1 ACCEPTED SOLUTION

teiva-boy
Level 6
Over a WAN, to tape...  This is expected.  Tape needs a minimum sustained write speed, and over a limited WAN link you are not going to hit that.  

Always write to disk first over a WAN, then do a duplicate to tape.

This is NOT a backup software issue, but more so an architectural/design one.
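
To put rough numbers on the rate gap described above, here is a minimal sketch. It assumes the 100 Mbps WAN figure from the original post and an LTO-3 drive (the job details list an IBM.ULTRIUM-TD3 drive) with the published LTO-3 native rate of 80 MB/s. The drive's minimum streaming rate is lower than its native rate and is not given anywhere in the thread, so the ratio is only an upper-bound illustration.

```python
def mbps_to_mbytes_per_s(megabits_per_second):
    """Convert a link speed in megabits per second to megabytes per second."""
    return megabits_per_second / 8

WAN_MBPS = 100          # WAN speed quoted in the original post
LTO3_NATIVE_MB_S = 80   # published LTO-3 native (uncompressed) transfer rate

wan_ceiling = mbps_to_mbytes_per_s(WAN_MBPS)   # best case, ignoring protocol overhead
gap = LTO3_NATIVE_MB_S / wan_ceiling

print(f"WAN ceiling: {wan_ceiling} MB/s")
print(f"LTO-3 native rate is {gap:.1f}x the WAN ceiling")
```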


11 REPLIES 11

Giroevolver
Level 6
How fast does the backup run to disk?

zippy
Level 6
From the master server, run this command:

bpgetconfig -M <windows server> > /tmp/bp.conf.<windows server>

Now edit the file /tmp/bp.conf.<windows server> and save it.

Then run:

bpsetconfig -h <windows server> /tmp/bp.conf.<windows server>

This command will update the bp.conf on the Windows server. To find the timeout entries, search Google for "TIMEOUT bp.conf".

I find this way more convenient.
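
The "edit the file" step in the middle can also be scripted. The following is only a sketch: it assumes the dumped file uses `KEY = value` lines and that the relevant entries are named `CLIENT_READ_TIMEOUT` and `CLIENT_CONNECT_TIMEOUT` (check your own dump before relying on either assumption). The helper `set_conf_entry` and the sample contents are invented for illustration.

```python
def set_conf_entry(conf_text, key, value):
    """Return conf_text with `key = value` set, replacing an existing entry or appending one."""
    new_line = f"{key} = {value}"
    lines = conf_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        # Compare only the part before the first '=' so values are never matched.
        if line.split("=", 1)[0].strip() == key:
            lines[i] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)
    return "\n".join(lines) + "\n"

# Hypothetical dump contents, not taken from a real client.
sample = "SERVER = backuplinux\nCLIENT_NAME = win-test\nCLIENT_READ_TIMEOUT = 300\n"

updated = set_conf_entry(sample, "CLIENT_READ_TIMEOUT", 800)
updated = set_conf_entry(updated, "CLIENT_CONNECT_TIMEOUT", 800)
print(updated)
```

The function works on text only; reading the dumped file and writing it back before running bpsetconfig is left to the caller.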

tbl
Level 3
Partner Certified
My backups to disk run at 3 to 5 MBps, which is ~ half the WAN pipe.
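
As a quick check of that estimate, here is a small sketch. It assumes "MBps" means megabytes per second and that the pipe is the 100 Mbps link from the first post; if the figure was meant in megabits, the shares would be eight times smaller.

```python
def link_share(rate_mbytes_per_s, link_mbps):
    """Fraction of a link (given in megabits/s) used by a transfer rate in megabytes/s."""
    link_mbytes_per_s = link_mbps / 8
    return rate_mbytes_per_s / link_mbytes_per_s

LINK_MBPS = 100  # WAN speed from the first post

for rate in (3, 5):
    print(f"{rate} MB/s is {link_share(rate, LINK_MBPS):.0%} of the link")
```

Under those assumptions the disk backups use roughly a quarter to two fifths of the raw link capacity, before protocol overhead.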

And do Windows clients use the bp.conf file at all?
I thought everything was in the registry...

I did not find an explicit TIMEOUT parameter for bp.conf.

I'll look more thoroughly tomorrow.

tom

Yasuhisa_Ishika
Level 6
Partner Accredited Certified
Set a greater timeout for "Client Connect Timeout" and "Client Read Timeout". You can find these parameters in Host Properties. What zippy shows is a tip for configuring parameters on Windows the Unix way.

tbl
Level 3
Partner Certified
I changed "Client Connect Timeout" and "Client Read Timeout" to 800 seconds on the master/media server.
I changed "List files timeout" to 800 sec on the client (BAR -> NetBackup client properties).

It does not change the error. Here are the job details:
07/30/2009 12:18:52 - requesting resource SFR
07/30/2009 12:18:52 - requesting resource backuplinux.NBU_CLIENT.MAXJOBS.win-test
07/30/2009 12:18:53 - granted resource backuplinux.NBU_CLIENT.MAXJOBS.win-test
07/30/2009 12:18:53 - granted resource 0120L3
07/30/2009 12:18:53 - granted resource IBM.ULTRIUM-TD3.000
07/30/2009 12:18:53 - granted resource backuplinux-hcart3-robot-tld-0
07/30/2009 12:18:53 - estimated 17192 kbytes needed
07/30/2009 12:18:53 - started process bpbrm (pid=25794)
07/30/2009 12:18:55 - connecting
07/30/2009 12:18:56 - connected; connect time: 0:00:00
07/30/2009 12:19:01 - mounting 0120L3
07/30/2009 12:19:49 - mounted 0120L3; mount time: 0:00:48
07/30/2009 12:19:49 - positioning 0120L3 to file 62
07/30/2009 12:20:05 - Critical bpbrm (pid=25794) from client win-test: FTL - socket write failed
07/30/2009 12:20:28 - Error bptm (pid=25803) media manager terminated by parent process
07/30/2009 12:20:26 - positioned 0120L3; position time: 0:00:37
07/30/2009 12:20:26 - begin writing
07/30/2009 12:20:35 - Error bpbrm (pid=25794) could not send server status message
07/30/2009 12:20:35 - end writing; write time: 0:00:09
termination requested by administrator (150)   // (I canceled the retry)
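
The job details are not printed in strict time order, so sorting them helps. The sketch below copies six of the lines above and measures how long the connection had been open when the client reported the failure, and how long before the server began writing. Only the timestamps come from the thread; the helper names are invented for this example.

```python
from datetime import datetime

# Selected lines copied from the job details above.
JOB_DETAILS = """\
07/30/2009 12:18:56 - connected; connect time: 0:00:00
07/30/2009 12:19:01 - mounting 0120L3
07/30/2009 12:19:49 - mounted 0120L3; mount time: 0:00:48
07/30/2009 12:20:05 - Critical bpbrm (pid=25794) from client win-test: FTL - socket write failed
07/30/2009 12:20:26 - positioned 0120L3; position time: 0:00:37
07/30/2009 12:20:26 - begin writing
"""

def parse_events(text):
    """Return (timestamp, message) pairs sorted by timestamp."""
    events = []
    for line in text.splitlines():
        stamp, _, message = line.partition(" - ")
        events.append((datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S"), message))
    return sorted(events, key=lambda event: event[0])

def first_time(events, needle):
    """Timestamp of the first event whose message contains needle."""
    return next(when for when, message in events if needle in message)

events = parse_events(JOB_DETAILS)
connected = first_time(events, "connected")
failed = first_time(events, "socket write failed")
writing = first_time(events, "begin writing")

seconds_open = (failed - connected).total_seconds()
seconds_before_write = (writing - failed).total_seconds()
print(f"failure {seconds_open:.0f} s after connect, {seconds_before_write:.0f} s before writing began")
```

On these timestamps the client reports the failure about a minute after connecting, while the tape is still being positioned. That is consistent with the connection being cut during the wait for the tape, although the job details alone cannot prove it.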

Yasuhisa_Ishika
Level 6
Partner Accredited Certified
It seems that the network connection broke before the timeout was reached.
For detailed inspection, enable debug logging on both server and client.
The failure happens once the backup job has begun, so I think the unified logs are not needed. Just enable
the legacy debug logs. See Troubleshooting Guide pages 96 and 112.

NetBackup 6.5 Troubleshooting Guide
http://support.veritas.com/docs/290230

tbl
Level 3
Partner Certified
Hi, I had already enabled it, and I noticed something strange in bptm on the media server. First, the relevant part of the bpbkar log on the Windows client:
2:24:40.500 PM: [3824.1496] <4> tar_backup_cpr::start: INF - checkpoint thread started
2:24:40.500 PM: [3824.1496] <2> tar_backup_tfi::setupFileDirectives: TAR - backup filename = C:\Documents and Settings\Administrator\Desktop\Virtual Center 2.5 104265\vi\
2:24:40.500 PM: [3824.1496] <2> tar_base::V_vTarMsgW: INF - File data will be compressed when appropriate
2:24:40.515 PM: [3824.2404] <4> tar_base::keepaliveThread: INF - keepalive thread is active with an interval of 60 seconds
2:24:41.609 PM: [3824.1496] <4> dos_backup::V_VerifyFileList: INF - UBS Local Type for 'C:\Documents and Settings\Administrator\Desktop\Virtual Center 2.5 104265\vi\' --> 10020002
2:24:47.093 PM: [3824.1496] <4> backup_create: INF - NetBackup Temp Directory: 'C:\Program Files\Veritas\\NetBackup\Temp'
2:24:48.171 PM: [3824.1496] <2> ov_log::V_GlobalLogEx: INF - file_access (constructor): 0 non-NTFS volumes
2:25:28.953 PM: [3824.1496] <32> TransporterRemote::write[2](): FTL - SocketWriteException: send() call failed, could not write data to the socket, possible broken connection.
2:25:28.953 PM: [3824.1496] <16> NBUException::traceException(): (
An Exception of type [Symantec::NetBackup::Ncf::OperationFailedException] was thrown. Details about the exception follow...:
Error code  = (-1008).
Src file    = (d:\653\src\cl\clientpc\util\tar_tfi.cpp).
Src Line    = (275).
Description = (%s getBuffer operation failed).
Operation type=().
)
2:25:28.953 PM: [3824.1496] <16> NBUException::traceException(): (
An Exception of type [Symantec::NetBackup::Ncf::SocketWriteException] was thrown. Details about the exception follow...:
Error code  = (-1027).
Src file    = (TransporterRemote.cpp).
Src Line    = (310).
Description = (send() call failed, could not write data to the socket, possible broken connection).
Local IP=(). Remote IP=(). Remote Port No.=(0).
No. of bytes to write=(32768) while No. of bytes written=(0).
)
2:25:28.953 PM: [3824.1496] <2> tar_base::V_vTarMsgW: FTL - socket write failed
2:25:28.953 PM: [3824.1496] <4> tar_backup::backup_done_state: INF - number of file directives not found: 0
2:25:28.953 PM: [3824.1496] <4> tar_backup::backup_done_state: INF -     number of file directives found: 1
2:25:28.953 PM: [3824.2404] <4> tar_base::keepaliveThread: INF - keepalive thread terminating (reason: WAIT_OBJECT_0)
2:25:28.953 PM: [3824.1496] <4> tar_base::stopKeepaliveThread: INF - keepalive thread has exited. (reason: WAIT_OBJECT_0)
2:25:28.953 PM: [3824.1496] <2> tar_base::V_vTarMsgW: INF - EXIT STATUS 24: socket write failed
2:25:28.953 PM: [3824.1496] <4> tar_backup::backup_done_state: INF - Not waiting for server status

Interesting part of the bptm log on the master/media server:
14:21:18.277 [24131] <2> tapelib: wait_for_ltid, Mount, timeout 0
14:22:03.422 [24131] <2> Media_signal_poll: 2:Terminate detected (tapelib.c:625)
14:22:03.422 [24131] <2> mount_open_media: mount canceled detected in tpreq(), signo = 1
14:22:03.423 [24131] <2> set_job_details: Sending Tfile jobid (61113)
14:22:03.423 [24131] <2> set_job_details: LOG 1250079723 16 bptm 24131 media manager terminated during mount of media id 0120L3, possible media mount timeout

Any ideas?

thomas

thesanman
Level 6
I'm having a similar problem.

I'm upgrading Win2003 clients from v6.0MP5 to v6.6.6 and am seeing some, but not all, clients failing with "FTL - socket write failed".

If I write to disk, there are no problems, so it's as if something is timing out during the tape mount and positioning. If the tape is already mounted and in use, there are no problems.

Just wondering whether this issue was ever resolved and, if so, whether you could post your fix?

Thanks.

Korkiatupa
Not applicable

I have the same problem with Win2003 and NBU 7.0.

Writing to disk works fine, but writing to tape does not.
Does anyone have a solution for this?

/Grego

tbl
Level 3
Partner Certified
Hello teiva-boy,

We use a 100 Mbps WAN with fiber channel to our datacenter. I would hope NetBackup is able to back up data over a 100 Mbps network; the latency (measured via ping and mtr) is at most 20 ms and on average less than 4 ms.

After some tests, we noticed that we had no trouble backing up when we replaced our Netasq router with a Juniper Netscreen NS25.

As we need this router, we kept the Netasq, but we plan to bypass it and/or use backup to disk.

thomas