
Socket write failed(24)

knox890
Level 3

Team,

 

I am running backups on a Windows 2008 server and I got an error saying socket write failed (24).

The connection between the master server and the client uses a private IP, and this can be verified by running ping and bptestbpcd. All connections are okay, but I am still getting the error.

 

Size of file per drive = 350GB. It was previously running correctly, but right now it's having errors.

 

Can anyone help me with this?

 

 

May 31, 2012 9:12:49 AM - requesting resource atpsk01-hcart3-robot-tld-0
May 31, 2012 9:12:49 AM - requesting resource ATPBCKSP01.NBU_CLIENT.MAXJOBS.p1cl01n3
May 31, 2012 9:12:49 AM - requesting resource ATPBCKSP01.NBU_POLICY.MAXJOBS.Weekly_LNotes_P1P3
May 31, 2012 9:12:50 AM - granted resource  ATPBCKSP01.NBU_CLIENT.MAXJOBS.p1cl01n3
May 31, 2012 9:12:50 AM - granted resource  ATPBCKSP01.NBU_POLICY.MAXJOBS.Weekly_LNotes_P1P3
May 31, 2012 9:12:50 AM - granted resource  P10466
May 31, 2012 9:12:50 AM - granted resource  HP.ULTRIUM3-SCSI.005
May 31, 2012 9:12:50 AM - granted resource  atpsk01-hcart3-robot-tld-0
May 31, 2012 9:12:50 AM - estimated 1178294689 kbytes needed
May 31, 2012 9:12:51 AM - started process bpbrm (pid=23051)
May 31, 2012 9:13:26 AM - connecting
May 31, 2012 9:13:27 AM - connected; connect time: 0:00:00
May 31, 2012 9:13:35 AM - mounting P10466
May 31, 2012 9:14:17 AM - mounted P10466; mount time: 0:00:42
May 31, 2012 9:14:17 AM - positioning P10466 to file 4
May 31, 2012 9:15:35 AM - Warning bptm (pid=23058) read error on media id P10466, drive index 5 reading header block, No space left on device
May 31, 2012 9:16:50 AM - positioned P10466; position time: 0:02:33
May 31, 2012 9:16:50 AM - begin writing
May 31, 2012 9:16:50 AM - current media P10466 complete, requesting next media Any
May 31, 2012 9:16:51 AM - granted resource  P10733
May 31, 2012 9:16:51 AM - granted resource  HP.ULTRIUM3-SCSI.004
May 31, 2012 9:16:51 AM - granted resource  atpsk01-hcart3-robot-tld-0
May 31, 2012 9:16:52 AM - mounting P10733
May 31, 2012 9:17:34 AM - mounted P10733; mount time: 0:00:42
May 31, 2012 9:17:34 AM - positioning P10733 to file 1
May 31, 2012 9:17:42 AM - positioned P10733; position time: 0:00:08
May 31, 2012 9:17:42 AM - begin writing
May 31, 2012 10:12:07 AM - Critical bpbrm (pid=23051) from client p1cl01n3: FTL - socket write failed
May 31, 2012 10:12:10 AM - Error bptm (pid=23058) media manager terminated by parent process
May 31, 2012 10:12:55 AM - end writing; write time: 0:55:13
socket write failed  (24)
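
For anyone reading job details like the above, here is a minimal Python sketch that pulls out the Warning/Critical/Error entries and measures how long the job had been writing before the failure. The timestamp layout is assumed from the lines above, and the function names (`parse_job_details`, `flag_events`, `time_to_failure`) are hypothetical helpers, not part of NetBackup.

```python
import re
from datetime import datetime

# Timestamp layout assumed from the job details above, e.g. "May 31, 2012 9:12:49 AM"
LINE_RE = re.compile(r"^(\w{3} \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M) - (.*)$")
SEVERITIES = ("Warning", "Critical", "Error")

def parse_job_details(text):
    """Return a list of (timestamp, message) tuples for lines that match the layout."""
    events = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m.group(1), "%b %d, %Y %I:%M:%S %p")
            events.append((ts, m.group(2)))
    return events

def flag_events(events):
    """Keep only entries whose message starts with a severity keyword."""
    return [(ts, msg) for ts, msg in events if msg.startswith(SEVERITIES)]

def time_to_failure(events):
    """Elapsed time from the most recent 'begin writing' to the first Critical entry after it."""
    start = None
    for ts, msg in events:
        if msg == "begin writing":
            start = ts
        elif msg.startswith("Critical") and start is not None:
            return ts - start
    return None

sample = """May 31, 2012 9:17:42 AM - begin writing
May 31, 2012 10:12:07 AM - Critical bpbrm (pid=23051) from client p1cl01n3: FTL - socket write failed
May 31, 2012 10:12:10 AM - Error bptm (pid=23058) media manager terminated by parent process"""

events = parse_job_details(sample)
for ts, msg in flag_events(events):
    print(ts, msg)
print("time to failure:", time_to_failure(events))
```

On the three sample lines, the gap between the last "begin writing" and the Critical entry is a little under 55 minutes, which is the kind of detail worth quoting when posting logs.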
28 REPLIES

mph999
Level 6
Employee Accredited

You only sent bptm and bpbrm; there are no client logs here. We need bpbkar and, I would say, bpcd as well.

You also included 51216-200 (FT Client) log and 51216-263  (win gui)

Martin

Marianne
Level 6
Partner    VIP    Accredited Certified

We need bpcd and bpbkar from client as well, covering the same period.

As mentioned yesterday:

Log folders do not exist by default, they need to be created.
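
A minimal sketch of creating the folders from Python, in case it helps. The log root shown is an assumption (a commonly used default install path on a Windows client); check where NetBackup is actually installed on your system. `create_log_dirs` is a hypothetical helper name.

```python
import os

# ASSUMPTION: default legacy log location on a Windows client; verify on your system.
DEFAULT_LOG_ROOT = r"C:\Program Files\VERITAS\NetBackup\logs"

def create_log_dirs(processes, log_root=DEFAULT_LOG_ROOT):
    """Create one folder per process name under log_root and return the paths."""
    created = []
    for name in processes:
        path = os.path.join(log_root, name)
        os.makedirs(path, exist_ok=True)  # no error if the folder already exists
        created.append(path)
    return created
```

On the client this would be called as `create_log_dirs(["bpcd", "bpbkar"])`, and on the media server with `["bpbrm", "bptm"]` and that server's own log root.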

 

knox890
Level 3

Okay, I created the folders and reran the policy.

I'll post the results once we've got the logs; for now, please also see these from the master server.

Marianne
Level 6
Partner    VIP    Accredited Certified

We DON'T need bpbkar and bpcd from the master.

We need these logs from the client.

As explained previously:

bpbkar log will tell us what happened on the client during backup process.

bpbkar on the master will log master server's backups of itself. This does not help us to troubleshoot CLIENT activity.

bpcd will log media server -> client comms.
bpcd on the master will log comms with itself...........

 

 

watsons
Level 6

Just throwing out a few more possibilities here, which I had gathered from past experience on error 24:

1) Disable TCP Chimney - though you have decided not to go down that path...

2) Reschedule the backups to another time of day to see if the same problem occurs. With more backups running simultaneously, this error is more likely.

3) Could be the NIC itself. Go for the latest driver... also, the following NICs are known to cause problems:

 

- Hewlett-Packard NC373i Multifunction Gigabit Server Adapter
- Broadcom BCM5708C NetXtreme II GigE

knox890
Level 3

Here are the logs I've gathered from the client.

 

 

Marianne
Level 6
Partner    VIP    Accredited Certified

PLEASE believe us when we say we need ALL of these logs covering the same start and end time of the job failure:

On media server: bpbrm and bptm

On client: bpcd and bpbkar

Please rename all logs to reflect the process name, e.g. bpcd.txt.

So, please post new output of job details as you did on 31 May, along with a full set of logs.

We really need to follow the full process for one backup failure:
bpbrm on media server connects to bpcd on client.
bpcd on client accepts connection from media server and starts bpbkar.
bpbrm on media server starts bptm.
bptm waits for data from client.
bpbrm waits for metadata info.
bpbkar sends data to bptm on media server.
... and so on .....
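
As a rough aid for checking that a full set has been collected before posting, here is a minimal Python sketch. It assumes the logs have been copied into one directory per host, with one subfolder per process, and that legacy log file names embed the job date as mmddyy (for example 053112); both are assumptions to verify against your own files. `missing_logs` is a hypothetical helper name.

```python
import os
from datetime import date

# The four logs requested above, grouped by the host they come from.
REQUIRED = {"media server": ["bpbrm", "bptm"], "client": ["bpcd", "bpbkar"]}

def missing_logs(log_roots, job_date):
    """Return (role, process) pairs with no log file for job_date.

    log_roots maps "media server" / "client" to a directory holding one
    subfolder per process. ASSUMPTION: file names contain the date as mmddyy.
    """
    stamp = job_date.strftime("%m%d%y")
    missing = []
    for role, processes in REQUIRED.items():
        for proc in processes:
            folder = os.path.join(log_roots[role], proc)
            names = os.listdir(folder) if os.path.isdir(folder) else []
            if not any(stamp in name for name in names):
                missing.append((role, proc))
    return missing
```

An empty result from `missing_logs(roots, date(2012, 5, 31))` would mean all four logs exist for the day of the failed job; it does not check that they cover the exact start and end time.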
 

Martin has also explained why and what we need regarding logs under point (F) in this 'featured item' post at the top of the NetBackup Forum page:
https://www-secure.symantec.com/connect/forums/netbackup-basics-and-how-make-your-life-easier

mph999
Level 6
Employee Accredited

To be honest, with status 24 the logs usually give information but not a definite answer, and it very often means you just have to work through the different possibilities.

Hence the reason I posted the long post previously, showing just about all the different causes of 24s, along with technotes.

I suggest you start working through it ...

Status 24s are almost always caused by events outside NBU (I have never seen one caused by NBU) and thus, if we don't cause it, then we can't report many details in the logs.

Martin

 

 

V4
Level 6
Partner Accredited

May 31, 2012 9:15:35 AM - Warning bptm (pid=23058) read error on media id P10466, drive index 5 reading header block, No space left on device

 

Did you try with different media, and with an increased read timeout value? Check if it helps.