
Flashbackup-Windows socket read failure Err 13

Kev_Lamb
Level 6

Hi,

Having managed to get the Flashbackup-Windows policy to validate (the problem was the firewall), I am now getting a status 13 error each time the backup runs.

The snapshot part of the policy completes without error and the backup starts OK, but it gets to about 400128 KB and 325500 files and then terminates with the following in the Activity Monitor:

01/13/2014 09:17:37 - Info bpbrm (pid=21133) lonbfbelvis1.corp.ad.timeinc.com is the host to backup data from
01/13/2014 09:17:37 - Info bpbrm (pid=21133) reading file list from client
01/13/2014 09:17:38 - Info bpbrm (pid=21133) starting bpbkar on client
01/13/2014 09:17:40 - Info bpbkar (pid=6580) Backup started
01/13/2014 09:17:40 - Info bpbrm (pid=21133) bptm pid: 21135
01/13/2014 09:17:40 - Info bptm (pid=21135) start
01/13/2014 09:17:40 - Info bptm (pid=21135) using 262144 data buffer size
01/13/2014 09:17:40 - Info bptm (pid=21135) using 30 data buffers
01/13/2014 09:17:40 - Info bptm (pid=21135) start backup
01/13/2014 09:17:45 - Info bptm (pid=21135) backup child process is pid 21152
01/13/2014 09:18:01 - Info nbjm (pid=16501) starting backup job (jobid=233765) for client lonbfbelvis1.corp.ad.timeinc.com, policy ELVIS1-SNAP-TEST, schedule Weekly-Full
01/13/2014 09:18:01 - estimated 0 kbytes needed
01/13/2014 09:18:01 - Info nbjm (pid=16501) started backup (backupid=lonbfbelvis1.corp.ad.timeinc.com_1389604681) job for client lonbfbelvis1.corp.ad.timeinc.com, policy ELVIS1-SNAP-TEST, schedule Weekly-Full on storage unit BFB-VMWARE-OST
01/13/2014 09:18:02 - started process bpbrm (pid=21133)
01/13/2014 09:18:03 - connecting
01/13/2014 09:18:03 - connected; connect time: 0:00:00
01/13/2014 09:18:11 - begin writing
01/13/2014 09:25:52 - Error bpbrm (pid=21133) socket read failed: errno = 104 - Connection reset by peer
01/13/2014 09:25:52 - Error bptm (pid=21152) system call failed - Connection reset by peer (at child.c.1298)
01/13/2014 09:25:52 - Error bptm (pid=21152) unable to perform read from client socket, connection may have been broken
01/13/2014 09:25:52 - Error bptm (pid=21135) media manager terminated by parent process
01/13/2014 09:26:01 - Error bpbrm (pid=21133) could not send server status message
01/13/2014 09:26:02 - Critical bpbrm (pid=21133) unexpected termination of client lonbfbelvis1.corp.ad.timeinc.com
01/13/2014 09:26:02 - Info bpbkar (pid=0) done. status: 13: file read failed
01/13/2014 09:26:27 - end writing; write time: 0:08:16
file read failed  (13)
 
The data area is approx. 3 TB in size and is made up of millions of small files. It sits on a 10 TB disk, so I know the snapshot space is adequate (>15%). Could this be solely down to I/O issues while backing up the snapshot from the same disk? Also, are there any rules that need to be followed when using an alternative host for the backup I/O?
 
The backup is currently being written to a B6200 using OST.
 
Kev
Attitude is a small thing that makes a BIG difference

Mark_Solutions
Level 6
Partner Accredited Certified

Take a look at this to adjust the read size on unix:

http://www.symantec.com/docs/HOWTO56178

I know it relates to the client, but I am sure that in the past it needed to be done on the media server.

Kev_Lamb
Level 6

Hi Mark,

Just tried that. I doubled the suggested amounts, but it still fails. I'm going to make the virtual memory change on Tuesday and see how we go with that. I have just tested Flashbackup on another server outside our DMZ and it worked without a problem.


Mark_Solutions
Level 6
Partner Accredited Certified

Ok - was worth a try - keep us updated

Kev_Lamb
Level 6

Will do, obviously this will be after Tuesday evening :)


Kev_Lamb
Level 6

I made the virtual memory change (with two reboots) and I still get the same problem as before. I am now wondering if this could be an issue with our DMZ; we do see reduced bandwidth on anything our backups send through the DMZ. I'm going to get the network guys in the US to take a look at this while the backup is running.

I have successfully run a Flashbackup-Windows policy on the Test & Dev counterpart of the failing server, and that one sits on the internal side of the network.

Bit stuck now on what else I can try.

UPDATE:

I have just run the test on a W2003 server within the DMZ and it is fine; the server in question is a W2008 R2 64-bit Enterprise server.


Marianne
Level 6
Partner    VIP    Accredited Certified

Time to log a call with Symantec Support?

They will need lots of level 5 logs...

Kev_Lamb
Level 6

Hi,

A call has been logged with support. I will leave this thread open and update it when/if I get a resolution.

 

Kev


Nick_M
Level 3

Hi Kevin,

In response to your query on how to find out which (if any) corrupt file may be causing an issue: there's a good technote explaining how to use the bpbkar_tr touch file to debug which file(s) backups are failing on. Beware, though, that this can produce large debug logs, and I can't vouch for its relevance with Flashbackup types.

http://www.symantec.com/business/support/index?page=content&id=TECH31513

Kev_Lamb
Level 6

Latest: Symantec have done a WebEx session and taken all the logs back for analysis. The case has now been escalated to backline support; hopefully we'll hear soon.


Kev_Lamb
Level 6

Looks like we fixed this problem ourselves. We found that the application was creating recursive directories for whatever reason, and one of these was also causing Storage Essentials to fail on a discovery. We scanned the area and it came back OK, so we deleted that directory and re-ran the SE discovery, which worked (we still have a few other recursive directories, but these seem OK). I then re-ran the Flashbackup policy and it is now working OK.

Not sure what was wrong with the directory, but that was definitely the issue.
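For anyone hitting the same status 13 with an application that creates nested folders, a quick pre-check of the data area may find such directories before the backup does. The Python below is a minimal sketch under my own assumptions, not a NetBackup or Symantec tool: it walks a tree and flags (a) directories that loop back to one of their own ancestors through a symlink or junction, and (b) paths where the same folder name repeats more than `max_repeats` times or that go deeper than `max_depth` levels. `find_suspect_dirs`, `max_depth` and `max_repeats` are hypothetical names, and the default thresholds are arbitrary values to tune.

```python
import os
import sys
from collections import Counter


def find_suspect_dirs(root, max_depth=40, max_repeats=5):
    """Return (path, reason) pairs for directories that look recursive.

    Heuristics (assumptions, not NetBackup behaviour):
    * a directory whose (st_dev, st_ino) already appears among its
      ancestors, i.e. a symlink/junction loop;
    * a path deeper than max_depth, or one in which a single directory
      name repeats more than max_repeats times.
    Flagged directories are not descended into, so loops cannot hang it.
    """
    suspects = []
    stack = [(root, ())]  # (path, tuple of ancestor (dev, ino) keys)
    while stack:
        path, ancestors = stack.pop()
        try:
            st = os.stat(path)  # follows symlinks/junctions
        except OSError as exc:
            suspects.append((path, f"stat failed: {exc}"))
            continue
        key = (st.st_dev, st.st_ino)
        # st_ino can be 0 on filesystems without stable IDs; skip the check there.
        if st.st_ino and key in ancestors:
            suspects.append((path, "loop back to an ancestor"))
            continue
        rel = os.path.relpath(path, root)
        parts = [] if rel == "." else rel.split(os.sep)
        if len(parts) > max_depth:
            suspects.append((path, f"depth {len(parts)} > {max_depth}"))
            continue
        if parts:
            name, count = Counter(parts).most_common(1)[0]
            if count > max_repeats:
                suspects.append((path, f"name {name!r} repeated {count} times"))
                continue
        try:
            with os.scandir(path) as it:
                for entry in it:
                    if entry.is_dir():  # follows links, so loops are detectable
                        stack.append((entry.path, ancestors + (key,)))
        except OSError as exc:
            suspects.append((path, f"listdir failed: {exc}"))
    return suspects


if __name__ == "__main__" and len(sys.argv) > 1:
    for found_path, reason in find_suspect_dirs(sys.argv[1]):
        print(f"{reason}: {found_path}")
```

Usage would be something like `python find_suspect_dirs.py D:\data` (hypothetical script name), ideally run against the snapshot or during a quiet period because it touches every directory. On Windows, very deep trees can exceed the 260-character MAX_PATH limit; passing the root as an extended-length path (`\\?\D:\data`) may help, and any such failures are reported as `stat failed` / `listdir failed` entries rather than stopping the scan. This only points at candidates; it does not explain why a particular directory upset bpbkar.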

 

Kev
