cancel
Showing results for 
Search instead for 
Did you mean: 

NetBackup Windows 2003 SQL client suddenly 'FTL - socket write failed'

Michael__
Level 3

Hi,

I have a backup solution with 1x Solaris master and 4x Solaris media servers (all 6.5.6).   One of the servers (Windows 2003 SP2 32 bit w/ SQL 2008 Netbackup 6.5.4) has been backing up fine for sometime and has suddenly started ending with status '(24) socket write failed'.

Below is a snippet of the bpbkar log, has anyone seen this before?   I've tried all sorts like turning off TCP chimney, turning of checksum offload, updating the NIC drivers etc.. Still fails! Nothing has changed server side either.

Mike

 

BPBKAR  NetBackup Backup/Archive  6.5GA  [May  1 2009]
Copyright 1993 - 2007 VERITAS Software Corporation
All Rights Reserved.

15:32:08.443: [5580.5240] <2> WinMain: DAT - _pgmptr = 'C:\Program Files\Veritas\NetBackup\bin\bpbkar32.exe'
15:32:08.443: [5580.5240] <2> WinMain: DAT - lpCmdLine = '-r 2419200 -ru root -dt 512225 -to 0 -clnt {REMOVED} -class 002_Clients -sched Diff-Incremental -st INCR -bpstart_to 300 -bpend_to 300 -read_to 300 -blks_per_buffer 512 -use_otm -fso -b {REMOVED}_1312468317 -kl 28 -WOFB_enabled -WOFB_fim 0 -WOFB_usage 0 -WOFB_error 0 -ct 13 -use_ofb '
15:32:08.443: [5580.5240] <2> date_debug: DAT - timezone: GMT Standard Time, offset=0, dst: GMT Daylight Time
15:32:08.443: [5580.5240] <2> date_debug: DAT - current time: 1312468328, 04/08/2011 15:32:08
15:32:08.443: [5580.5240] <2> date_debug: DAT - 01/01/94 UCT:  757382400, 01/01/1994 01:00:00
15:32:08.443: [5580.5240] <2> date_debug: DAT - 07/01/94 UCT:  773020800, 01/07/1994 01:00:00
15:32:08.443: [5580.5240] <2> WinMain: DAT - standard input handle = 536
15:32:08.443: [5580.5240] <2> WinMain: DAT - standard output handle = 1916
15:32:08.443: [5580.5240] <2> WinMain: DAT - standard error handle = 1856
15:32:08.474: [5580.5240] <4> tar_backup::V_SetupJobData: INF - dwJobData: ffffffff
15:32:08.474: [5580.5240] <4> tar_backup::V_SetupJobData: INF -     dwJob: ffffffff
15:32:08.490: [5580.5240] <4> dos_backup::V_PreProcessing: INF - backup privileges enabled, previous = 0
15:32:08.490: [5580.5240] <4> dos_backup::V_PreProcessing: INF - restore privileges enabled, previous = 0
15:32:08.490: [5580.5240] <4> dos_backup::V_PreProcessing: INF - security privileges enabled, previous = 0
15:32:08.490: [5580.5240] <4> dos_backup::V_PreProcessing: INF - tcb privileges enabled, previous = 0
15:32:08.490: [5580.5240] <4> dos_backup::V_PreProcessing: INF - create token privileges enabled, previous = 0
15:32:08.490: [5580.5240] <4> dos_backup::V_PreProcessing: INF - user name: root
15:32:08.490: [5580.5240] <4> dos_backup::V_PreProcessing: INF - no access token: scheduled backup
15:32:08.506: [5580.5240] <2> ov_log::V_GlobalLog: INF - BEDS_Init() Enter InitFlags:0x0
15:32:08.615: [5580.5240] <2> ov_log::V_GlobalLog: INF - DumpDleInfo() DLE Device Name: C:
15:32:08.615: [5580.5240] <2> ov_log::V_GlobalLog: INF - DumpDleInfo() DLE Device Name: D:
15:32:08.615: [5580.5240] <2> ov_log::V_GlobalLog: INF - DumpDleInfo() DLE Device Name: E:
15:32:08.615: [5580.5240] <2> ov_log::V_GlobalLog: INF - DumpDleInfo() DLE Device Name: Microsoft Terminal Services
15:32:08.615: [5580.5240] <2> ov_log::V_GlobalLog: INF - DumpDleInfo() DLE Device Name: Microsoft Windows Network
15:32:08.615: [5580.5240] <2> ov_log::V_GlobalLog: INF - DumpDleInfo() DLE Device Name: Web Client Network
15:32:08.615: [5580.5240] <2> ov_log::V_GlobalLog: INF - DumpDleInfo() DLE Device Name: Oracle
15:32:08.615: [5580.5240] <2> ov_log::V_GlobalLog: INF - DumpDleInfo() DLE Device Name: Shadow?Copy?Components
15:32:08.615: [5580.5240] <2> ov_log::V_GlobalLog: INF - DumpDleInfo() DLE Device Name: System?State
15:32:08.615: [5580.5240] <2> ov_log::V_GlobalLog: INF - DumpDleInfo() DLE Device Name: Active Directory Application Mode
15:32:10.849: [5580.5240] <2> tar_backupt_tfi::create: TAR - Incremental backup using the archive bit
15:32:10.928: [5580.5240] <2> tar_backup::V_SetupProcessContinue: TAR - CONTINUE BACKUP received
15:32:10.928: [5580.5240] <2> tar_backup_tfi::setupFileDirectives: TAR - backup filename = ALL_LOCAL_DRIVES
15:32:10.928: [5580.5668] <4> tar_base::keepaliveThread: INF - keepalive thread is active with an interval of 60 seconds
15:32:11.959: [5580.5240] <2> tar_backup_vxbsa::add: INF - called with '/'
15:32:11.959: [5580.5240] <4> dos_backup::V_VerifyFileSystem: INF - Added: C:\
15:32:11.959: [5580.5240] <4> dos_backup::V_VerifyFileSystem: INF - Added: D:\
15:32:11.959: [5580.5240] <4> dos_backup::V_VerifyFileSystem: INF - Added: E:\
15:32:11.959: [5580.5240] <4> dos_backup::V_VerifyFileSystem: INF - Added: Shadow Copy Components:\
15:32:11.959: [5580.5240] <4> dos_backup::V_VerifyFileSystem: INF - Added: System State:\
15:32:15.553: [5580.5240] <4> dos_backup::V_VerifyFileList: INF - UBS Local Type for 'System State:\' --> 1002002d
15:32:15.553: [5580.5240] <4> dos_backup::V_VerifyFileList: INF - UBS Local Type for '_BACKUP_SPECIAL_OBJECTS AFTER System State:' --> 1002002d
15:32:15.553: [5580.5240] <4> dos_backup::V_VerifyFileList: INF - UBS Local Type for 'D:\' --> 10020002
15:32:15.553: [5580.5240] <4> dos_backup::V_VerifyFileList: INF - UBS Local Type for 'E:\' --> 10020002
15:32:17.615: [5580.5240] <4> dos_backup::V_VerifyFileList: INF - UBS Local Type for 'Shadow Copy Components:\' --> 10020027
15:32:17.615: [5580.5240] <4> dos_backup::V_VerifyFileList: INF - UBS Local Type for 'C:\' --> 10020002
15:32:19.459: [5580.5240] <4> VssSnapshot::V_VssSetMetadata(): INF - Unable to locate DFSR metadata
15:32:20.912: [5580.5240] <4> dos_backup::V_InitializeShadowCopy: INF - ============================================
15:32:20.912: [5580.5240] <4> dos_backup::V_InitializeShadowCopy: INF - ShadowCopy PrePost: Begin
15:32:22.599: [5580.5240] <4> VssSnapshot::V_VssVolumeSnapshot(): INF - Drive 'C:' added to VSS Volume Set
15:32:22.646: [5580.5240] <4> SnapshotAssociation::SnapshotAssociation: INF - VSS_MAXIMUM_STORAGE_SIZE is unavailable or the wrong type, staying with the default action of limiting unbounded associations
15:32:22.646: [5580.5240] <4> SnapshotAssociation::AdjustUnboundedAssocations: INF - checking association status for drive "C:\"
15:32:22.662: [5580.5240] <4> SnapshotAssociation::AdjustUnboundedAssocations: INF - QueryDiffAreasForVolume returned failure status; no Diff areas for this volume
15:32:40.084: [5580.5240] <4> VssSnapshot::V_VssVolumeSnapshot(): INF - VSS Volume Set snapped
15:32:40.084: [5580.5240] <4> VssSnapshot::V_VssVolumeSnapshot(): INF - About to commence post-snap unbounded association checks
15:32:40.115: [5580.5240] <4> SnapshotAssociation::SnapshotAssociation: INF - VSS_MAXIMUM_STORAGE_SIZE is unavailable or the wrong type, staying with the default action of limiting unbounded associations
15:32:40.115: [5580.5240] <4> SnapshotAssociation::AdjustUnboundedAssocations: INF - checking association status for drive "C:\"
15:32:40.131: [5580.5240] <4> SnapshotAssociation::AdjustUnboundedAssocations: INF - QueryDiffAreasForVolume returned failure status; no Diff areas for this volume
15:32:40.131: [5580.5240] <4> dos_backup::V_InitializeShadowCopy: INF - ShadowCopy PrePost: End
15:32:40.131: [5580.5240] <4> dos_backup::V_InitializeShadowCopy: INF - ============================================
15:32:40.131: [5580.5240] <4> VssSnapshot::V_VssSetMetadata(): INF - Unable to locate DFSR metadata
15:32:46.818: [5580.5240] <4> dos_backup::tfs_scanstart: INF - Backup Special Objects request AFTER System State:
15:32:48.021: [5580.5240] <4> dos_backup::tfs_findfirst: WRN - unable to find first file: 'System State:\'
15:32:48.021: [5580.5240] <4> VssSnapshot::V_VssVolumeSnapshot(): INF - Drive 'C:' added to VSS Volume Set
15:32:48.037: [5580.5240] <4> SnapshotAssociation::SnapshotAssociation: INF - VSS_MAXIMUM_STORAGE_SIZE is unavailable or the wrong type, staying with the default action of limiting unbounded associations
15:32:48.037: [5580.5240] <4> SnapshotAssociation::AdjustUnboundedAssocations: INF - checking association status for drive "C:\"
15:32:48.053: [5580.5240] <4> SnapshotAssociation::AdjustUnboundedAssocations: INF - QueryDiffAreasForVolume returned failure status; no Diff areas for this volume
15:32:48.053: [5580.5240] <4> VssSnapshot::V_VssVolumeSnapshot(): INF - Drive 'c:' added to VSS Volume Set
15:32:48.084: [5580.5240] <4> SnapshotAssociation::SnapshotAssociation: INF - VSS_MAXIMUM_STORAGE_SIZE is unavailable or the wrong type, staying with the default action of limiting unbounded associations
15:32:48.084: [5580.5240] <4> SnapshotAssociation::AdjustUnboundedAssocations: INF - checking association status for drive "c:\"
15:32:48.099: [5580.5240] <4> SnapshotAssociation::AdjustUnboundedAssocations: INF - QueryDiffAreasForVolume returned failure status; no Diff areas for this volume
15:32:56.881: [5580.5240] <4> VssSnapshot::V_VssVolumeSnapshot(): INF - VSS Volume Set snapped
15:32:56.881: [5580.5240] <4> VssSnapshot::V_VssVolumeSnapshot(): INF - About to commence post-snap unbounded association checks
15:32:56.896: [5580.5240] <4> SnapshotAssociation::SnapshotAssociation: INF - VSS_MAXIMUM_STORAGE_SIZE is unavailable or the wrong type, staying with the default action of limiting unbounded associations
15:32:56.896: [5580.5240] <4> SnapshotAssociation::AdjustUnboundedAssocations: INF - checking association status for drive "C:\"
15:32:57.006: [5580.5240] <4> SnapshotAssociation::AdjustUnboundedAssocations: INF - QueryDiffAreasForVolume returned failure status; no Diff areas for this volume
15:32:57.021: [5580.5240] <4> SnapshotAssociation::SnapshotAssociation: INF - VSS_MAXIMUM_STORAGE_SIZE is unavailable or the wrong type, staying with the default action of limiting unbounded associations
15:32:57.021: [5580.5240] <4> SnapshotAssociation::AdjustUnboundedAssocations: INF - checking association status for drive "c:\"
15:32:57.037: [5580.5240] <4> SnapshotAssociation::AdjustUnboundedAssocations: INF - QueryDiffAreasForVolume returned failure status; no Diff areas for this volume
15:32:57.037: [5580.5240] <4> backup_create: INF - NetBackup Temp Directory: 'C:\Program Files\Veritas\\NetBackup\Temp'
15:32:58.224: [5580.5240] <2> ov_log::V_GlobalLogEx: INF - beds_ss_access::V_OpenForRead(): FS_OpenObj() Successful.
15:33:06.381: [5580.5240] <4> dos_backup::tfs_include: INF - folder (System Files) has been created recently (since 29/07/2011 16:15:05).  It will be backed up in full.
15:33:06.381: [5580.5240] <2> ov_log::V_GlobalLogEx: INF - beds_ss_access::V_OpenForRead(): FS_OpenObj() Successful.
15:33:14.740: [5580.5240] <2> ov_log::V_GlobalLogEx: INF - beds_ss_access::V_OpenForRead(): FS_OpenObj() Successful.
15:38:15.162: [5580.5240] <32> TransporterRemote::write[2](): FTL - SocketWriteException: send() call failed, could not write data to the socket, possible broken connection.
15:38:15.162: [5580.5240] <16> NBUException::traceException(): (
An Exception of type [Symantec::NetBackup::Ncf::OperationFailedException] was thrown. Details about the exception follow...:
Error code  = (-1008).
Src file    = (D:\654\src\cl\clientpc\util\tar_tfi.cpp).
Src Line    = (275).
Description = (%s getBuffer operation failed).
Operation type=().
)
15:38:15.162: [5580.5240] <16> NBUException::traceException(): (
An Exception of type [Symantec::NetBackup::Ncf::SocketWriteException] was thrown. Details about the exception follow...:
Error code  = (-1027).
Src file    = (TransporterRemote.cpp).
Src Line    = (310).
Description = (send() call failed, could not write data to the socket, possible broken connection).
Local IP=(). Remote IP=(). Remote Port No.=(0).
No. of bytes to write=(32768) while No. of bytes written=(0).
)
15:38:15.162: [5580.5240] <2> tar_base::V_vTarMsgW: FTL - socket write failed
15:38:15.162: [5580.5240] <4> tar_backup::backup_done_state: INF - number of file directives not found: 0
15:38:15.162: [5580.5240] <4> tar_backup::backup_done_state: INF -     number of file directives found: 4
15:38:15.162: [5580.5668] <4> tar_base::keepaliveThread: INF - keepalive thread terminating (reason: WAIT_OBJECT_0)
15:38:15.162: [5580.5240] <4> tar_base::stopKeepaliveThread: INF - keepalive thread has exited. (reason: WAIT_OBJECT_0)
15:38:15.162: [5580.5240] <2> tar_base::V_vTarMsgW: INF - EXIT STATUS 24: socket write failed
15:38:15.162: [5580.5240] <4> tar_backup::backup_done_state: INF - Not waiting for server status
15:38:15.334: [5580.5240] <4> dos_backup::tfs_reset: INF - Snapshot deletion start
15:38:15.537: [5580.5240] <4> OVStopCmd: INF - EXIT - status = 0
15:38:15.537: [5580.5240] <2> tar_base::V_Close: closing...
15:38:15.537: [5580.5240] <4> dos_backup::tfs_reset: INF - Snapshot deletion start
15:38:15.943: [5580.5240] <2> ov_log::V_GlobalLog: INF - BEDS_Term(): enter - InitFlags:0x00000001
15:38:29.959: [5580.5240] <4> OVShutdown: INF - Finished process
15:38:29.990: [5580.5240] <4> WinMain: INF - Exiting C:\Program Files\Veritas\NetBackup\bin\bpbkar32.exe
15:38:31.990: [5580.5240] <4> ov_log::OVClose: INF - Closing log file: C:\Program Files\Veritas\NetBackup\logs\BPBKAR\080411.LOG 

11 REPLIES 11

Yogesh9881
Level 6
Accredited

What NBU version on Client ?

Do u hav enough disk space for VSS ??

kindly run below command from master server & post o/p.

/usr/openv/netbackup/bin/admincmd$ ./bptestbpcd -client <client> -verbose -debug

Michael__
Level 3

Hi, versions are as mentioned above.. Client is 6.5.4, master/media are 6.5.6.

C:\ drive is 33.8GB total with 10GB free.

Output from bptestbpcd:

18:05:10.335 [7920] <2> bptestbpcd: VERBOSE = 0
18:05:10.376 [7920] <2> vnet_vnetd_service_socket: vnet_vnetd.c.2048: VN_REQUEST_SERVICE_SOCKET: 6 0x00000006
18:05:10.376 [7920] <2> vnet_vnetd_service_socket: vnet_vnetd.c.2062: service: bpcd
18:05:10.517 [7920] <2> logconnections: BPCD CONNECT FROM XXXXXXXX.35001 TO XXXXXXXX.13724
18:05:10.518 [7920] <2> vnet_connect_to_vnetd_extra: vnet_vnetd.c.182: msg: VNETD CONNECT FROM XXXXXXXX.35002 TO XXXXXXXX.13724 fd = 5
18:05:10.571 [7920] <2> vnet_vnetd_connect_forward_socket_begin: vnet_vnetd.c.535: VN_REQUEST_CONNECT_FORWARD_SOCKET: 10 0x0000000a
18:05:10.723 [7920] <2> vnet_vnetd_connect_forward_socket_begin: vnet_vnetd.c.552: ipc_string: 4344
1 1 1
XXXXXXXX:35001 -> XXXXXXXX:13724
XXXXXXXX:35002 -> XXXXXXXX:13724
18:05:10.842 [7920] <2> bpcr_get_peername_rqst: Server peername length = 27
18:05:10.902 [7920] <2> bpcr_get_hostname_rqst: Server hostname length = 14
18:05:10.962 [7920] <2> bpcr_get_clientname_rqst: Server client name length = 14
18:05:11.022 [7920] <2> bpcr_get_version_rqst: bpcd version: 06540000
18:05:11.082 [7920] <2> bpcr_get_platform_rqst: Server client platform length = 2
18:05:11.142 [7920] <2> bpcr_get_version_rqst: bpcd version: 06540000
18:05:11.202 [7920] <2> bpcr_patch_version_rqst: theRest == > <
18:05:11.203 [7920] <2> bpcr_get_version_rqst: bpcd version: 06540000
18:05:11.262 [7920] <2> bpcr_patch_version_rqst: theRest == > <
18:05:11.263 [7920] <2> bpcr_get_version_rqst: bpcd version: 06540000
PEER_NAME = {REMOVED}
HOST_NAME = {REMOVED}
CLIENT_NAME = {REMOVED}
VERSION = 0x06540000
PLATFORM = nt
PATCH_VERSION = 6.5.4.0
SERVER_PATCH_VERSION = 6.5.4.0
MASTER_SERVER = {REMOVED}
EMM_SERVER = {REMOVED}
18:05:11.323 [7920] <2> vnet_connect_to_vnetd_extra: vnet_vnetd.c.182: msg: VNETD CONNECT FROM XXXXXXXX.35011 TO XXXXXXXX.13724 fd = 6
18:05:11.358 [7920] <2> vnet_vnetd_connect_forward_socket_begin: vnet_vnetd.c.535: VN_REQUEST_CONNECT_FORWARD_SOCKET: 10 0x0000000a
18:05:11.623 [7920] <2> vnet_vnetd_connect_forward_socket_begin: vnet_vnetd.c.552: ipc_string: 4347
XXXXXXXX:35011 -> XXXXXXXX:13724
<2>bptestbpcd: EXIT status = 0
18:05:11.629 [7920] <2> bptestbpcd: EXIT status = 0

Michael__
Level 3

As mentioned, I've tried turning off the TCP chimney, upgrading NIC drivers etc.   Just checked the master, media server and the client they're all set to full duplex and the correct port speed.

No idea where to go or what to try next.. Mike.

CRZ
Level 6
Employee Accredited Certified

This won't help too much, but comparing log snippets, this TechNote almost exactly describes your issue:

Status code 24 on Windows 2003 client.

 http://www.symantec.com/docs/TECH127930

The conclusion in this doc is that it isn't NetBackup, but something hardware-related in the network between your server and client.  Probably not what you wanted to hear but it does seem to make sense, logically...

Marianne
Moderator
Moderator
Partner    VIP    Accredited Certified

What is the Client Read Timeout on the media server backing up this client?

The default is 300 (5 minutes).

You will notice that there is 5 minutes between last activity on the client and the failure:

15:33:14.740: [5580.5240] <2> ov_log::V_GlobalLogEx: INF - beds_ss_access::V_OpenForRead(): FS_OpenObj() Successful.
15:38:15.162: [5580.5240] <32> TransporterRemote::write[2](): FTL - SocketWriteException: send() call failed, could not write data to the socket, possible broken connection.

 

Please also check bpbrm on media server for evidence of timeout.

Increase the timeout to 900 and see if that makes a difference.
 

Michael__
Level 3

I ran a backup over the prod LAN last night which completed fine, then immediately ran another job over the backup LAN which started however was extremely slow (~159KB/s) and failed with the same error as before.  How odd.. Currently attempting another backup over the backup LAN and its running at around 600KB/s compared to it's usual 26,000KB/s+.

Michael__
Level 3

It just failed too.. Same entry as before in the bpbkar logs.

Yogesh9881
Level 6
Accredited

Can you check backup speed by changing STU ??

Michael__
Level 3

Another storage unit?  It used the same storage unit and same tape drive as the prod LAN backup from last night.

I've got networks checking the switch, port speed and duplex settings again. Just for info, the NIC card settings are:

802.11p QOS - Disable

Checksum Offload - None

Ethernet@WireSpeed - Enable

Flow Control - Auto

Jumbo MTU 1500

Large Send Offload - Disable

Speed & Duplex - Auto

Wakeup Capabilities - Both

WOL Speed - Auto