08-19-2013 11:20 AM
Hi,
NBU master server: 7.5.0.4
Client: 7.5.0.3
Full backups of one particular drive (a different drive on each client) have suddenly started failing with status code 24 (socket write failed) on many Windows clients. However, differentials for the same clients complete successfully. The full backups ran fine until last week. No changes have been made on the NetBackup side or on the server side.
bpbkar and bpcd logs of the client are attached for your reference.
08-19-2013 01:11 PM
Don't see any attachments.
From the Status Code Guide
A possible cause is a high network load. For example, this problem occurs with Cannot write to STDOUT when a Windows system that monitors network load detects a high load. It then sends an ICMP packet to other systems to inform them that the route those systems use was disconnected.
08-20-2013 05:00 AM
Would also need bpbrm logs from the media server for further investigation.
08-20-2013 05:22 AM
Hi Tejas9024,
It would be better to log a case with NetBackup Support. They may ask you to run the AppCritical tool to check network health between the media server and the client.
08-20-2013 08:36 AM
Hi All,
Thanks for the suggestions. Please find the log details from client.
08-20-2013 10:01 AM
seeing a lot of these in your bpcd log
ABC is not a master server
ABC is not a media server either
FTL - BPCD EXIT STATUS 46
Server access denied
Seems you need to add ABC to client YYYY's server list.
08-26-2013 08:13 AM
hi
ABC is actually the domain name. Please ignore the previous logs and find the latest logs from the client and media server.
Weekend full backups have failed again with status 24; however, differentials completed without any issues.
Thanks
08-27-2013 11:50 AM
bpbrm handle_backup: client CLIENT1 EXIT STATUS = 24: socket write failed
http://www.symantec.com/business/support/index?page=content&id=TECH150369
Solution
1. Change client read timeout parameter from 300 to 9600
2. Change Communication buffer size from 32K to 128K. Go to Host Properties > Clients > Client Properties > Windows Client > Client Settings > Communication buffer size = 128
3. If antivirus software is running, disable it for troubleshooting purposes.
4. Disable autotuning and chimney features. From a command prompt, run:
netsh int tcp set global autotuninglevel=disabled
(on Windows Server 2008) netsh int tcp set global chimney=disabled
(on Windows Server 2003) netsh int ip set chimney DISABLED
5. Create the registry key TcpTimedWaitDelay (of type REG_DWORD) in HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters and set the value to 30 seconds.
Reference: http://technet.microsoft.com/en-us/library/cc757512(WS.10).aspx
6. Reboot the server.
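Step 5 can be scripted rather than edited by hand in regedit. The following is only a minimal sketch in Python, not part of the TechNote: the helper name build_timed_wait_command is hypothetical, and it assumes the standard Windows reg.exe tool is available on the client. It only builds the command line; run the result from an elevated prompt on the client and reboot afterwards as in step 6.

```python
# Sketch: build the reg.exe command for step 5 (TcpTimedWaitDelay).
# build_timed_wait_command is a hypothetical helper, not a NetBackup tool.

TCPIP_PARAMS_KEY = r"HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"


def build_timed_wait_command(seconds=30):
    """Return the reg.exe argument list that sets TcpTimedWaitDelay.

    Assumption: the accepted range is 30-300 seconds, as documented by
    Microsoft for this value; anything outside it is rejected here.
    """
    if not 30 <= seconds <= 300:
        raise ValueError("TcpTimedWaitDelay must be between 30 and 300 seconds")
    return [
        "reg", "add", TCPIP_PARAMS_KEY,
        "/v", "TcpTimedWaitDelay",   # value name from step 5
        "/t", "REG_DWORD",           # value type from step 5
        "/d", str(seconds),          # data, in seconds (decimal)
        "/f",                        # overwrite without prompting
    ]


if __name__ == "__main__":
    # Print the command so it can be reviewed before anyone runs it.
    print(" ".join(build_timed_wait_command(30)))
```

Printing instead of executing keeps the change reviewable by the Windows team before it touches the registry.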
08-29-2013 05:36 AM
Thanks wr, we have forwarded the recommendations to Windows team.
However, I would like to highlight that around 10 of the 25 clients across 3 policies are failing with the same error. Sometimes differential backups also fail with 24 and complete only after multiple restarts. I am not able to find the root cause. Please help.
08-29-2013 05:45 AM
Also, the following 2 errors in bpbkar are common in all those clients.
[20708.19300] <16> tar_tfi::processException:
An Exception of type [SocketWriteException] has occured at:
Module: @(#) $Source: src/ncf/tfi/lib/TransporterRemote.cpp,v $ $Revision: 1.54.126.1 $ , Function: TransporterRemote::write[2](), Line: 338
Module: @(#) $Source: src/ncf/tfi/lib/Packer.cpp,v $ $Revision: 1.90.44.1 $ , Function: Packer::getBuffer(), Line: 652
Module: tar_tfi::getBuffer, Function: D:\NB\NB_7.5.0.3\src\cl\clientpc\util\tar_tfi.cpp, Line: 311
Local Address: [0.0.0.0]:0
Remote Address: [0.0.0.0]:0
OS Error: 10060 (A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
)
Expected bytes: 131072
10:31:00.424 AM: [20708.19300] <2> tar_base::V_vTarMsgW: FTL - socket write failed
10:31:00.424 AM: [20708.19300] <4> tar_backup::backup_done_state: INF - number of file directives not found: 0
10:31:00.424 AM: [20708.19300] <4> tar_backup::backup_done_state: INF - number of file directives found: 3
10:31:00.424 AM: [20708.21620] <4> tar_base::keepaliveThread: INF - keepalive thread terminating (reason: WAIT_OBJECT_0)
10:31:00.424 AM: [20708.19300] <4> tar_base::stopKeepaliveThread: INF - keepalive thread has exited. (reason: WAIT_OBJECT_0)
10:31:00.424 AM: [20708.19300] <2> tar_base::V_vTarMsgW: INF - EXIT STATUS 24: socket write failed
===============================================
<16> dtcp_write: TCP - failure: send socket (1772) (TCP 10054: Connection reset by peer)
5:57:14.528 AM: [6780.9912] <16> dtcp_write: TCP - failure: attempted to send 220 bytes
5:57:14.543 AM: [6780.9912] <16> dtcp_write: TCP - failure: send socket (1772) (TCP 10054: Connection reset by peer)
5:57:14.543 AM: [6780.9912] <16> dtcp_write: TCP - failure: attempted to send 220 bytes
5:57:14.559 AM: [6780.9912] <16> dtcp_write: TCP - failure: send socket (1772) (TCP 10054: Connection reset by peer)
5:57:14.559 AM: [6780.9912] <16> dtcp_write: TCP - failure: attempted to send 220 bytes
5:57:14.575 AM: [6780.9912] <16> dtcp_write: TCP - failure: send socket (1772) (TCP 10054: Connection reset by peer)
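With around 10 clients failing, it may help to tally which socket errors each client log actually contains. The two codes above are standard Winsock errors: 10060 is WSAETIMEDOUT and 10054 is WSAECONNRESET. Below is a rough Python sketch for counting them; the function names are hypothetical, and the two patterns are assumptions based only on the line formats pasted in this thread ("OS Error: NNNNN" and "(TCP NNNNN: ...)"), so other log formats may need extra patterns.

```python
import re
from collections import Counter

# Standard Winsock names for the two codes seen in this thread.
WINSOCK_NAMES = {
    10054: "WSAECONNRESET",
    10060: "WSAETIMEDOUT",
}

# Assumed line formats, taken from the bpbkar excerpts above.
_CODE_PATTERNS = (
    re.compile(r"OS Error:\s*(\d+)"),
    re.compile(r"\(TCP (\d+):"),
)


def count_socket_errors(lines):
    """Count Winsock error codes found in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        for pattern in _CODE_PATTERNS:
            for match in pattern.finditer(line):
                counts[int(match.group(1))] += 1
    return counts


def summarize(counts):
    """Return one 'code name: count' string per code, sorted by code."""
    return [
        f"{code} {WINSOCK_NAMES.get(code, 'unknown')}: {n}"
        for code, n in sorted(counts.items())
    ]


if __name__ == "__main__":
    import sys
    # Usage: python count_errors.py <path to a bpbkar log file>
    with open(sys.argv[1], errors="replace") as handle:
        for row in summarize(count_socket_errors(handle)):
            print(row)
```

Comparing the tallies across failing and healthy clients could show whether the failures are mostly timeouts (10060) or resets from the peer (10054), which is useful detail to hand to support or the network team.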