10-13-2010 06:27 AM
I am having a problem backing up a couple of Windows servers; both generate either a Status 156 or a Status 150 error. The backup job fails on a volume that contains between 400 and 600 GB of data, and one of the servers has a folder that contains millions of small files.
Below is an excerpt from the bpbkar log on one of the servers:
<32> TransporterRemote::write[2](): FTL - SocketWriteException: send() call failed, could not write data to the socket, possible broken connection.
05:00:14.355: [4500.4244] <16> NBUException::traceException(): ( An Exception of type [Symantec::NetBackup::Ncf::OperationFailedException] was thrown. Details about the exception follow...: Error code = (-1008). Src file = (D:\654\src\cl\clientpc\util\tar_tfi.cpp). Src Line = (275). Description = (%s getBuffer operation failed). Operation type=(). )
05:00:14.355: [4500.4244] <16> NBUException::traceException(): ( An Exception of type [Symantec::NetBackup::Ncf::SocketWriteException] was thrown. Details about the exception follow...: Error code = (-1027). Src file = (TransporterRemote.cpp). Src Line = (310). Description = (send() call failed, could not write data to the socket, possible broken connection). Local IP=(). Remote IP=(). Remote Port No.=(0). No. of bytes to write=(32768) while No. of bytes written=(0). )
05:00:14.355: [4500.4244] <2> tar_base::V_vTarMsgW: FTL - socket write failed
05:00:14.355: [4500.4244] <16> dtcp_write: TCP - failure: send socket (1856) (TCP 10054: Connection reset by peer)
05:00:14.355: [4500.4244] <16> dtcp_write: TCP - failure: attempted to send 26 bytes
05:00:14.355: [4500.4244] <4> tar_backup::backup_done_state: INF - number of file directives not found: 0
05:00:14.355: [4500.4244] <4> tar_backup::backup_done_state: INF - number of file directives found: 1
05:00:14.355: [4500.5292] <4> tar_base::keepaliveThread: INF - keepalive thread terminating (reason: WAIT_OBJECT_0)
05:00:14.355: [4500.4244] <4> tar_base::stopKeepaliveThread: INF - keepalive thread has exited. (reason: WAIT_OBJECT_0)
05:00:14.355: [4500.4244] <2> tar_base::V_vTarMsgW: INF - EXIT STATUS 24: socket write failed
05:00:14.355: [4500.4244] <16> dtcp_write: TCP - failure: send socket (1856) (TCP 10054: Connection reset by peer)
05:00:14.355: [4500.4244] <16> dtcp_write: TCP - failure: attempted to send 42 bytes
05:00:14.355: [4500.4244] <4> tar_backup::backup_done_state: INF - Not waiting for server status
05:00:14.355: [4500.4244] <4> dos_backup::tfs_reset: INF - Snapshot deletion start
05:00:14.355: [4500.4244] <4> ov_log::OVLoop: INF - Cycling log file
05:00:14.355: [4500.4244] <4> ov_log::OVClose: INF - Closing log file: C:\Program Files\VERITAS\NetBackup\logs\BPBKAR\101210.LOG
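When a bpbkar log runs to many thousands of lines (likely with millions of small files), it can help to pull out just the failure lines. Below is a rough Python sketch that assumes every entry follows the layout seen in the excerpt above (`HH:MM:SS.mmm: [pid.tid] <severity> function: message`); that layout is inferred from this one log, not from any documented format. Lines that don't match, such as the truncated first line, are simply skipped. The helper names `parse_bpbkar` and `summarize` are hypothetical.

```python
import re
from dataclasses import dataclass

# Line layout inferred from the excerpt above (an assumption, not a documented format):
#   HH:MM:SS.mmm: [pid.tid] <severity> function: message
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}): "
    r"\[(?P<pid>\d+)\.(?P<tid>\d+)\] "
    r"<(?P<sev>\d+)> "
    r"(?P<func>\S+?): (?P<msg>.*)$"
)
EXIT_RE = re.compile(r"EXIT STATUS (\d+)")


@dataclass
class LogEntry:
    time: str
    thread: str    # "pid.tid"
    severity: int  # number in angle brackets, e.g. 4 or 16
    func: str
    msg: str


def parse_bpbkar(text: str) -> list[LogEntry]:
    """Parse one entry per line; lines that don't match the layout are skipped."""
    entries = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            entries.append(LogEntry(m["time"], f'{m["pid"]}.{m["tid"]}',
                                    int(m["sev"]), m["func"], m["msg"]))
    return entries


def summarize(entries: list[LogEntry], min_severity: int = 16) -> dict:
    """Collect failure lines, reported exit statuses and TCP 10054 resets."""
    # Treat high-severity lines and "FTL" messages as failures; note that
    # "FTL - socket write failed" is logged at severity <2> in the excerpt.
    failures = [e for e in entries
                if e.severity >= min_severity or e.msg.startswith("FTL")]
    statuses = [int(m.group(1)) for e in entries
                if (m := EXIT_RE.search(e.msg))]
    resets = sum("TCP 10054" in e.msg for e in failures)
    return {"failures": failures, "exit_statuses": statuses, "tcp_resets": resets}
```

Run against the log above, this should report exit status 24 (socket write failed) and the "TCP 10054: Connection reset by peer" lines, which point at the network rather than at the snapshot itself.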
10-13-2010 07:10 AM
Status 156 (snapshot error) is widely related to timeout issues, so increase the timeouts to 3600 seconds (1 hour) or higher.
The other issue is a network connection problem, seen many times on Windows servers, where the connection is broken, as in the log: "TCP 10054: Connection reset by peer".
Make sure that the NIC and the switch port are both set to Full duplex if possible.
The following basic commands change the relevant network settings.
Two features, TCP receive window auto-tuning and TCP Chimney Offload, have been found to cause unexpected socket drops under the network load generated by backups.
- Run this command to check whether those features are enabled:
netsh int tcp show global
- Disable the features by running these commands:
netsh int tcp set global autotuninglevel=disabled
netsh int tcp set global chimney=disabled
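Because these are production boxes that need a Change Request, it may be useful to check the current state first and only generate the commands, rather than run them. The Python sketch below does that. The display labels it looks for ("Receive Window Auto-Tuning Level", "Chimney Offload State") are assumptions based on Windows Server 2008-era `netsh int tcp show global` output and may differ on other Windows versions; the documented netsh parameter for auto-tuning is `autotuninglevel`. The function names are hypothetical.

```python
import subprocess

# Settings this thread suggests disabling: displayed label -> netsh parameter.
# The labels are assumptions based on Windows Server 2008-era output and may
# differ on other Windows versions.
FEATURES = {
    "Receive Window Auto-Tuning Level": "autotuninglevel",
    "Chimney Offload State": "chimney",
}


def read_netsh_global() -> str:
    """Run `netsh int tcp show global` (Windows only) and return its output."""
    result = subprocess.run(
        ["netsh", "int", "tcp", "show", "global"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_netsh_global(output: str) -> dict[str, str]:
    """Turn 'Label : value' lines into a {label: value} dict (values lowercased)."""
    settings = {}
    for line in output.splitlines():
        label, sep, value = line.partition(":")
        if sep and value.strip():
            settings[label.strip()] = value.strip().lower()
    return settings


def disable_commands(settings: dict[str, str]) -> list[str]:
    """Build netsh commands for listed features that are present and not disabled."""
    commands = []
    for label, param in FEATURES.items():
        value = settings.get(label)
        if value is not None and value != "disabled":
            commands.append(f"netsh int tcp set global {param}=disabled")
    return commands
```

On a client, `print("\n".join(disable_commands(parse_netsh_global(read_netsh_global()))))` would list only the commands still needed, which can then be attached to the Change Request instead of being applied directly.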
Then run the backup again.
The relevant Microsoft TechNet reference:
10-14-2010 09:19 AM
Hi Mohamed,
Thanks for your answer. I increased the Client Read Timeout values on the two affected clients to 3600 seconds (1 hour), but last night the backup jobs failed again. I haven't changed any of the NIC settings on the client servers, because they are production boxes and this would require an approved Change Request. I do know that most, if not all, of our servers are set to "Link Speed and Duplex = Auto Detect" and have all of the "Offload" options enabled.
Just to clean things up, I rebooted both of my NetBackup servers this morning. I know that something network-related is going on, so I don't expect the reboot to help, but a reboot is always good.
After looking through the performance tuning manual, I discovered that high network/CPU utilisation on the master server can be fixed by changing the Master Server Firewall properties (see screenshot below). I have no experience changing this, so I don't know what effect the change could have on all of my backups.
If I cannot fix this in the next day or so, I will have to contact Support.
10-14-2010 11:09 AM
Looking at the screenshot: remove 'localhost' from the NetBackup firewall configuration.
What is meant there is the Windows Firewall service.
Disable Windows Firewall from the Services configuration.
Increase the 'BPSTART_notify' timeout on the master server; this should alleviate the 156 error code.
Also, instead of relying on DNS, update the hosts file on all the servers with the relevant hostnames and IP addresses.
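Keeping hosts files consistent across many servers is easy to get wrong, so a small check can help. The Python sketch below compares a hosts file's text against an expected name-to-IP mapping and returns the lines that are missing or point to a different address. The hostnames and IPs are hypothetical placeholders (192.0.2.0/24 is a documentation range); on Windows the hosts file normally lives at C:\Windows\System32\drivers\etc\hosts.

```python
# Hypothetical names/IPs below; replace them with your master, media and client servers.
def parse_hosts(text: str) -> dict[str, str]:
    """Map each hostname or alias in hosts-file text to its IP, ignoring comments."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            # Keep the first occurrence of each name (names compared case-insensitively).
            mapping.setdefault(name.lower(), ip)
    return mapping


def missing_entries(hosts_text: str, expected: dict[str, str]) -> list[str]:
    """Return hosts lines for expected names that are absent or map to another IP."""
    current = parse_hosts(hosts_text)
    return [f"{ip}\t{name}" for name, ip in expected.items()
            if current.get(name.lower()) != ip]


expected = {  # hypothetical servers
    "nbumaster": "192.0.2.10",
    "nbumedia01": "192.0.2.11",
    "fileserver01": "192.0.2.21",
}
```

Running `missing_entries(open(path).read(), expected)` on each server gives the lines to add or correct; it only reports differences and leaves the file itself untouched.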
...Good Luck