Backups failing with EC 24 (socket write failed)

dixit47
Level 4

Hi,

In our environment, backups are failing with EC 24 (status code 24: socket write failed).

Windows client: Windows Server 2008 R2, NBU version 7.6

NetBackup master server: Solaris, NBU 7.6

NetBackup media server: Solaris, NBU 7.6

The detailed job status is below:

04/22/2015 03:15:07 - Info nbjm (pid=16618) starting backup job (jobid=3161892) for client XXXXXXXXXX, policy XXXXXXXXXX_test, schedule Full
04/22/2015 03:15:07 - Info nbjm (pid=16618) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=3161892, request id:{4DE7C35A-E8BF-11E4-9428-00C0DD1C0E45})
04/22/2015 03:15:07 - requesting resource XXXXXXXXXX_dd_nfs001_dsu_st
04/22/2015 03:15:07 - requesting resource asprd537-ebr.XXXXXXXXXX.NBU_CLIENT.MAXJOBS.XXXXXXXXXX
04/22/2015 03:15:07 - requesting resource asprd537-ebr.XXXXXXXXXX.NBU_POLICY.MAXJOBS.XXXXXXXXXX_test
04/22/2015 03:15:07 - granted resource  asprd537-ebr.XXXXXXXXXX.NBU_CLIENT.MAXJOBS.XXXXXXXXXX
04/22/2015 03:15:07 - granted resource  asprd537-ebr.XXXXXXXXXX.NBU_POLICY.MAXJOBS.XXXXXXXXXX_test
04/22/2015 03:15:07 - granted resource  MediaID=@aaaa6;Path=/opt/app/ebr/XXXXXXXXXX/nfs_stu001/nbu_dsu_st;MediaServer=XXXXXXXXXX.XXXXXXXXXX
04/22/2015 03:15:07 - granted resource  XXXXXXXXXX_dd_nfs001_dsu_st
04/22/2015 03:15:08 - estimated 0 kbytes needed
04/22/2015 03:15:08 - Info nbjm (pid=16618) started backup (backupid=XXXXXXXXXX_1429686907) job for client XXXXXXXXXX, policy XXXXXXXXXX_test, schedule Full on storage unit XXXXXXXXXX_dd_nfs001_dsu_st
04/22/2015 03:15:12 - started process bpbrm (pid=9051)
04/22/2015 03:15:13 - Info bpbrm (pid=9051) XXXXXXXXXX is the host to backup data from
04/22/2015 03:15:14 - Info bpbrm (pid=9051) reading file list for client
04/22/2015 03:15:14 - connecting
04/22/2015 03:15:16 - Info bpbrm (pid=9051) starting bpbkar on client
04/22/2015 03:15:16 - connected; connect time: 0:00:00
04/22/2015 03:15:17 - Info bpbkar (pid=4896) Backup started
04/22/2015 03:15:17 - Info bpbrm (pid=9051) bptm pid: 9057
04/22/2015 03:15:17 - Info bpbkar (pid=4896) change time comparison:<disabled>
04/22/2015 03:15:17 - Info bpbkar (pid=4896) archive bit processing:<enabled>
04/22/2015 03:15:18 - Info bpbkar (pid=4896) not using change journal data for <C:\>: not enabled
04/22/2015 03:15:18 - Info bpbkar (pid=4896) not using change journal data for <D:\>: not enabled
04/22/2015 03:15:18 - Info bpbkar (pid=4896) not using change journal data for <E:\>: not enabled
04/22/2015 03:15:18 - Info bpbkar (pid=4896) not using change journal data for <F:\>: not enabled
04/22/2015 03:15:19 - Info bptm (pid=9057) start
04/22/2015 03:15:20 - Info bptm (pid=9057) using 262144 data buffer size
04/22/2015 03:15:20 - Info bptm (pid=9057) using 32 data buffers
04/22/2015 03:15:31 - Info bptm (pid=9057) start backup
04/22/2015 03:15:33 - Info bptm (pid=9057) backup child process is pid 9093
04/22/2015 03:15:33 - begin writing
04/22/2015 03:32:09 - Critical bpbrm (pid=9051) from client XXXXXXXXXX: FTL - socket write failed
04/22/2015 03:32:09 - Error bptm (pid=9093) system call failed - Connection reset by peer (at child.c.1306)
04/22/2015 03:32:10 - Error bptm (pid=9093) unable to perform read from client socket, connection may have been broken
04/22/2015 03:32:11 - Error bptm (pid=9057) media manager terminated by parent process
04/22/2015 03:32:49 - Error bpbrm (pid=9051) could not send server status message
04/22/2015 03:32:51 - Info bpbkar (pid=4896) done. status: 24: socket write failed
04/22/2015 03:32:51 - end writing; write time: 0:17:18
socket write failed  (24)

 

We have already checked or tried the following:

  • Checked the TCP Chimney settings.
  • Checked the buffer settings.
  • Disabled snapshot backups.
  • Enabled multi-streaming.
  • Checked routing; communication uses the correct interfaces.
  • Confirmed that the backups do not fail at any one particular point.
  • Changed the media server storage unit, but no luck.
  • Updated the NIC drivers on the client.
  • Tried backups to tape as well as disk.

 

 

 

1 ACCEPTED SOLUTION

dixit47
Level 4

It's not a NetBackup issue. The Windows SA changed some setting at the OS level, and the issue has been resolved.

Thanks, all, for your valuable suggestions.

11 REPLIES

dixit47
Level 4

Experts: could you please help fix this EC 24 issue?

Jaime_Vazquez
Level 6
Employee

There is a significant time lag shown in the details between the "begin writing" and the "socket write failed" message.  Over 16 minutes in fact.  What is the client connection timeout value? What is the policy type and options being used for the backup?
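The lag can be read straight off the job details. As a minimal sketch (the function names are illustrative, and only three lines of the detailed status are copied in), the following computes the time between "begin writing" and the first "socket write failed" message:

```python
from datetime import datetime, timedelta

# Three lines copied from the detailed status in the original post.
JOB_DETAILS = """\
04/22/2015 03:15:33 - begin writing
04/22/2015 03:32:09 - Critical bpbrm (pid=9051) from client XXXXXXXXXX: FTL - socket write failed
04/22/2015 03:32:51 - end writing; write time: 0:17:18
"""


def parse_line(line: str) -> tuple[datetime, str]:
    """Split a job-detail line into its timestamp and message."""
    stamp, _, message = line.partition(" - ")
    return datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S"), message


def lag_before_failure(details: str) -> timedelta:
    """Time between 'begin writing' and the first socket write failure."""
    start = failure = None
    for line in details.splitlines():
        when, message = parse_line(line)
        if message.startswith("begin writing"):
            start = when
        elif "socket write failed" in message and failure is None:
            failure = when
    if start is None or failure is None:
        raise ValueError("job details lack a start or a failure line")
    return failure - start


print(lag_before_failure(JOB_DETAILS))  # 0:16:36
```

A gap this long suggests that data was flowing (or at least that the connection was up) for a while before it broke, which is why the timeout values are worth checking.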

 

For diagnostic work I would enable/verify debug logging of bpbkar and bpcd on the client and see what it shows for any level of progress.  I would do the same for bpbrm and bptm on the Media Server.

The message "Connection reset by peer" typically means that the client process that bptm was talking to (bpbkar) died unexpectedly. It could also mean that the network connection itself went down. The bptm process has no idea why, only that it can no longer "talk" to the client process. That is analogous to talking to somebody on a cell phone and having the call drop on you.  Was it because the other person hung up (application crash), or because the link to the cell tower failed (network failure)?
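To make the "the reader cannot tell why" point concrete, here is a small loopback sketch. It is not NetBackup code: it assumes a POSIX-like system where `struct linger` is two C ints, and it forces the accepting side to abort the connection (RST instead of FIN). All the reading side sees is the same "connection reset by peer" error, whatever the underlying reason.

```python
import socket
import struct
import threading


def peer_abort_demo() -> str:
    """Connect to a local listener that aborts the connection, then read."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    def abort_peer() -> None:
        conn, _ = listener.accept()
        # SO_LINGER enabled with a zero timeout makes close() send an RST,
        # which is what the other end sees when a peer dies abruptly.
        conn.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                        struct.pack("ii", 1, 0))
        conn.close()

    worker = threading.Thread(target=abort_peer)
    worker.start()
    client = socket.create_connection(("127.0.0.1", port))
    worker.join()  # the peer has aborted before we try to read
    try:
        data = client.recv(1024)
        return "closed cleanly" if data == b"" else "received data"
    except ConnectionResetError:
        return "connection reset by peer"
    finally:
        client.close()
        listener.close()


print(peer_abort_demo())
```

The reading side gets an error code and nothing else, which is why logs from both ends (or a packet capture) are needed to tell an application crash from a network failure.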

So, strictly as a first guess, the problem appears to be happening on the client.

 

 

mph999
Level 6
Employee Accredited

Another option is to capture a TCP dump on both the media server and the client and run it through Wireshark.

Look at the logs as suggested by Jaime first though. 

mnolan
Level 6
Employee Accredited Certified

netsh int tcp show global

Run this on the client (and on the media server, if it runs Windows).

If TCP autotuning is enabled, disable it:

netsh int tcp set global autotuning=disabled

STATUS CODE 24: Socket write failed
Article: TECH150369
Updated: July 22, 2014
Article URL: http://www.symantec.com/docs/TECH150369
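If several Windows clients need checking, the `show global` output can be parsed rather than read by eye. The sketch below is only illustrative: `SAMPLE_OUTPUT` is an assumed example of what the command prints, and the field label "Receive Window Auto-Tuning Level" may differ between Windows versions and locales.

```python
# Assumed, abbreviated example of `netsh int tcp show global` output.
SAMPLE_OUTPUT = """\
TCP Global Parameters
----------------------------------------------
Receive-Side Scaling State          : enabled
Chimney Offload State               : automatic
Receive Window Auto-Tuning Level    : normal
"""

AUTOTUNING_LABEL = "Receive Window Auto-Tuning Level"


def parse_tcp_globals(text: str) -> dict[str, str]:
    """Turn 'Name   : value' lines into a dictionary; skip everything else."""
    settings = {}
    for line in text.splitlines():
        name, sep, value = line.partition(" : ")
        if sep:
            settings[name.strip()] = value.strip()
    return settings


def autotuning_enabled(text: str) -> bool:
    """True unless the auto-tuning level is reported as 'disabled'."""
    level = parse_tcp_globals(text).get(AUTOTUNING_LABEL)
    if level is None:
        raise KeyError(f"{AUTOTUNING_LABEL!r} not found in output")
    return level.lower() != "disabled"


print(autotuning_enabled(SAMPLE_OUTPUT))  # True
```

On a real Windows host, the text could come from `subprocess.run(["netsh", "int", "tcp", "show", "global"], capture_output=True, text=True).stdout` instead of the sample string.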

 

 

dixit47
Level 4

I have already tried the netsh int tcp set global autotuning=disabled command, and I have also checked the bptm and bpbrm logs on the media server and the bpbkar and bpcd logs on the client, but no luck. The EC 24 issue still persists.

dixit47
Level 4

I am getting the error below in the bpbkar log file:

21:04:05.127 [13332.13884] <16> dtcp_read: TCP - failure: recv socket (612) (TCP 10058: Can't send after socket shutdown)
21:04:06.141 [13332.13884] <16> dtcp_read: TCP - failure: recv socket (612) (TCP 10058: Can't send after socket shutdown)
21:04:07.155 [13332.13884] <16> dtcp_read: TCP - failure: recv socket (612) (TCP 10058: Can't send after socket shutdown)
21:04:08.169 [13332.13884] <16> dtcp_read: TCP - failure: recv socket (612) (TCP 10058: Can't send after socket shutdown)
21:04:09.183 [13332.13884] <16> dtcp_read: TCP - failure: recv socket (612) (TCP 10058: Can't send after socket shutdown)
21:04:10.197 [13332.13884] <16> dtcp_read: TCP - failure: recv socket (612) (TCP 10058: Can't send after socket shutdown)
21:04:11.211 [13332.13884] <16> dtcp_read: TCP - failure: recv socket (612) (TCP 10058: Can't send after socket shutdown)


Any suggestions?
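For anyone triaging similar output: 10058 is the Winsock error WSAESHUTDOWN, meaning the socket had already been shut down when bpbkar tried to use it. The following is a minimal sketch (the regular expression and helper name are illustrative and only fitted to the lines above) for summarizing such entries in a longer bpbkar log:

```python
import re
from collections import Counter

# A few standard Winsock error codes that commonly show up with status 24.
WINSOCK_ERRORS = {
    10053: "WSAECONNABORTED",
    10054: "WSAECONNRESET",
    10058: "WSAESHUTDOWN",
    10060: "WSAETIMEDOUT",
}

# Three lines copied from the bpbkar log excerpt in this post.
LOG_LINES = """\
21:04:05.127 [13332.13884] <16> dtcp_read: TCP - failure: recv socket (612) (TCP 10058: Can't send after socket shutdown)
21:04:06.141 [13332.13884] <16> dtcp_read: TCP - failure: recv socket (612) (TCP 10058: Can't send after socket shutdown)
21:04:07.155 [13332.13884] <16> dtcp_read: TCP - failure: recv socket (612) (TCP 10058: Can't send after socket shutdown)
"""

# time, [pid.tid], <severity>, function name, then the "(TCP <code>: " part
PATTERN = re.compile(
    r"^(\d{2}:\d{2}:\d{2}\.\d{3}) \[(\d+)\.(\d+)\] <(\d+)> (\w+): .*\(TCP (\d+): "
)


def summarize(log_text: str) -> Counter:
    """Count TCP failures per (function, Winsock error code)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = PATTERN.match(line)
        if match:
            counts[(match.group(5), int(match.group(6)))] += 1
    return counts


for (function, code), count in summarize(LOG_LINES).items():
    name = WINSOCK_ERRORS.get(code, "unknown")
    print(f"{function}: {code} ({name}) x{count}")
```

The entries repeat about once per second with the same code, which suggests bpbkar was retrying on a socket that was already gone rather than hitting a new failure each time.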

mph999
Level 6
Employee Accredited

OK, when did this problem start? (It seems to affect only one client?)

Has anyone applied any OS patches, or something similar, to this server?

NBU doesn't cause status 24s!

I suggest you get the logs (though I think they may only tell us the error we already know), and then look at the TCP dumps in Wireshark.

sdo
Moderator
Moderator
Partner    VIP    Certified

1) How many clients in total in the whole environment?

2) How many clients fail with status 24?

3) How long has this been happening for?

4) Is the master server also a media server?  i.e. is it a 'master/media' server?

5) How many media servers are there in the NetBackup domain which is experiencing this issue?

dixit47
Level 4

It's not a NetBackup issue. The Windows SA changed some setting at the OS level, and the issue has been resolved.

Thanks, all, for your valuable suggestions.

sdo
Moderator
Moderator
Partner    VIP    Certified

Hi dixit47 - any chance you could share with us some detail around the actual problem, and the actual solution?  Many thanks.