cancel
Showing results for 
Search instead for 
Did you mean: 

socket write failed  (24)

Niyaz1
Level 3

Hi all , i am stuck with the below backup failure , i have checked bptestbpcd it seems fine, i have disabled AV , i have network connection it is also fine.KIndly your suggestions.

Dec 31, 2020 11:27:38 AM - Error bptm (pid=9920) socket operation failed - 10054 (at ../child.c.1287)
Dec 31, 2020 11:27:38 AM - Critical bpbrm (pid=3928) from client PMCRABKPNAS01B: FTL - socket write failed
Dec 31, 2020 11:27:38 AM - Error bptm (pid=9920) unable to perform read from client socket, connection may have been broken
Dec 31, 2020 11:27:43 AM - Error bpbrm (pid=3928) could not send server status message to client
Dec 31, 2020 11:27:45 AM - Info bpbkar32 (pid=5592) done. status: 24: socket write failed
Dec 31, 2020 11:27:45 AM - end writing; write time: 0:00:28
socket write failed  (24)
Client OS : windows server 2019
netbackup verssion : 8.2
client version also 8.2

 

11 REPLIES 11

Niyaz1
Level 3

when checked at client BPCD logs i found the below error. howver the connection was successful.

11:05:53.983 [1384.6896] <16> dump_proxy_info: statusmsg: A SSL accept failed. Status: 5 Msg: A non-recoverable I/O error occurred. The ssl error queue was empty , nbu status = 7624, severity = 2
11:05:53.983 [1384.6896] <16> dump_proxy_info: local_proxy_info:

Hamza_H
Moderator
Moderator
   VIP   

Hello,

is the master is also the media server?

is the bptestbpcd successful from master and the media?

I can see the below error in detailled status :

socket operation failed - 10054

could you please share the whole job's details?

also the bptestbpcd (from master) and bpclntcmd -pn (from client).

have you made any changes recently? please try a clear cache on master,media and client.

also, please share bptestnetconn -s -a -v -p from the master, media and client.

 

Hi, thanks for your response.

checked bptestbpcd from master and media as well it is fine.

below is the job policy details


Policy Name: MCRA-ECM-FS01

Policy Type: MS-Windows
Active: yes
Effective date: 08/12/2020 14:35:16
Backup network drvs: no
Collect TIR info: no
Mult. Data Streams: no
Client Encrypt: no
Checkpoint: no
Policy Priority: 0
Max Jobs/Policy: Unlimited
Disaster Recovery: 0
Collect BMR info: no
Residence: MCRA_SO_FS_STO01-STU
Volume Pool: NetBackup
Server Group: *ANY*
Keyword: (none specified)
Data Classification: -
Residence is Storage Lifecycle Policy: no
Application Discovery: no
Discovery Lifetime: 0 seconds
ASC Application and attributes: (none defined)

Granular Restore Info: no
Ignore Client Direct: no
Use Accelerator: no
Optimized Backup: no
HW/OS/Client: Windows-x64 Windows PMCRABKPNAS01B

Include: \\PMCRAVFSCIFS02\ctxhome$


Schedule: Weekly-Full
Type: Full Backup
Calendar sched: Enabled
Included Dates-----------
Wednesday, Week 1
Wednesday, Week 2
Wednesday, Week 3
Wednesday, Week 4
Wednesday, Week 5
Excluded Dates----------
No specific exclude dates entered
No exclude days of week entered
Day 21 of month excluded
Synthetic: 0
Checksum Change Detection: 0
PFI Recovery: 0
Maximum MPX: 1
Retention Level: 3 (1 month)
Number Copies: 1
Fail on Error: 0
Residence: (specific storage unit not required)
Volume Pool: (same as policy volume pool)
Server Group: (same as specified for policy)
Residence is Storage Lifecycle Policy: 0
Daily Windows:
Sunday 00:30:00 --> Sunday 02:30:00
Monday 00:30:00 --> Monday 02:30:00
Tuesday 00:30:00 --> Tuesday 02:30:00
Wednesday 00:30:00 --> Wednesday 02:30:00
Thursday 00:30:00 --> Thursday 02:30:00
Friday 00:30:00 --> Friday 02:30:00
Saturday 00:30:00 --> Saturday 02:30:00

********************************************************************************************

From master;:

14:58:46.014 [14256.5596] <2> bptestbpcd: VERBOSE = 0
14:58:46.029 [14256.5596] <2> read_client: dname=., offline=0, online_at=0 offline_at=0 offlineres=0 onlineres_at=0 offlineres_at=0
14:58:46.029 [14256.5596] <2> read_client: dname=.., offline=0, online_at=0 offline_at=0 offlineres=0 onlineres_at=0 offlineres_at=0
14:58:46.029 [14256.5596] <2> read_client: dname=CO_0, offline=0, online_at=0 offline_at=0 offlineres=0 onlineres_at=0 offlineres_at=0
14:58:46.029 [14256.5596] <2> read_client: dname=host_info, offline=0, online_at=0 offline_at=0 offlineres=0 onlineres_at=0 offlineres_at=0
14:58:46.029 [14256.5596] <2> read_client: dname=OA_0, offline=0, online_at=0 offline_at=0 offlineres=0 onlineres_at=0 offlineres_at=0
14:58:46.045 [14256.5596] <2> vnet_pbxConnect_ex: pbxConnectExEx Succeeded
14:58:46.045 [14256.5596] <4> create_user_group_id_marker_WIN: connecting to named pipe:\\.\pipe\{0D9DC596-B9C4-481C-AD04-B9B304AD4F2C}:OUTBOUND
14:58:46.045 [14256.5596] <4> create_user_group_id_marker_WIN: successfully connected to server named pipe
14:58:46.592 [14256.5596] <2> logconnections: PROXY CONNECT FROM 10.161.4.10.61560 TO 10.161.5.138.1556 fd = 172
14:58:46.592 [14256.5596] <2> logconnections: BPCD CONNECT FROM 127.0.0.1.53657 TO 127.0.0.1.61563 fd = 172
14:58:46.592 [14256.5596] <2> vnet_connect_to_vnetd_bpcd: js_bpcd_info: 0000000004414D10
14:58:49.623 [14256.5596] <2> vnet_pbxConnect_ex: pbxConnectExEx Succeeded
14:58:49.639 [14256.5596] <4> create_user_group_id_marker_WIN: connecting to named pipe:\\.\pipe\{32DB0825-80B9-4583-AC59-B1F660CCAA0B}:OUTBOUND
14:58:49.639 [14256.5596] <4> create_user_group_id_marker_WIN: successfully connected to server named pipe
14:58:52.389 [14256.5596] <8> do_pbx_service: [vnet_connect.c:3510] via PBX VNETD CONNECT FROM 10.161.4.10.61564 TO 10.161.5.138.1556 fd = 852
14:58:52.389 [14256.5596] <8> vnet_vnetd_connect_forward_socket_begin: [vnet_vnetd.c:476] VN_REQUEST_CONNECT_FORWARD_SOCKET 10 0xa
14:58:52.389 [14256.5596] <8> vnet_vnetd_connect_forward_socket_begin: [vnet_vnetd.c:501] ipc_string 56227
14:58:52.436 [14256.5596] <2> bpcr_get_version_rqst: bpcd version: 08200000
1 1 1
127.0.0.1:53657 -> 127.0.0.1:61563 PROXY 10.161.4.10:61560 -> 10.161.5.138:1556
127.0.0.1:53125 -> 127.0.0.1:61569 PROXY 10.161.4.10:61564 -> 10.161.5.138:1556
14:58:52.451 [14256.5596] <2> bpcr_get_version_rqst: bpcd version: 08200000
14:58:52.482 [14256.5596] <2> bpcr_get_peername_rqst: Server peername length = 13
14:58:52.529 [14256.5596] <2> bpcr_get_hostname_rqst: Server hostname length = 14
14:58:52.592 [14256.5596] <2> bpcr_get_clientname_rqst: Server clientname length = 14
14:58:52.654 [14256.5596] <2> bpcr_get_version_rqst: bpcd version: 08200000
14:58:52.670 [14256.5596] <2> bpcr_get_platform_rqst: Server platform length = 7
14:58:52.686 [14256.5596] <2> bpcr_get_version_rqst: bpcd version: 08200000
14:58:52.701 [14256.5596] <2> bpcr_patch_version_rqst: theRest == > <
14:58:52.701 [14256.5596] <2> bpcr_get_version_rqst: bpcd version: 08200000
14:58:52.732 [14256.5596] <2> bpcr_patch_version_rqst: theRest == > <
14:58:52.732 [14256.5596] <2> bpcr_get_version_rqst: bpcd version: 08200000
LOCAL_CERT_ISSUER_NAME = O=vx,OU=root@PMCRABKPMST01,CN=broker
LOCAL_CERT_SUBJECT_COMMON_NAME = da0cd8c6-d954-4c49-99fe-143af177f77d
PEER_CERT_ISSUER_NAME = O=vx,OU=root@PMCRABKPMST01,CN=broker
PEER_CERT_SUBJECT_COMMON_NAME = 6f68e22c-fcf2-4a70-ace4-80241c2161f8
PEER_NAME = PMCRABKPMST01
HOST_NAME = pmcrabkpnas01b
CLIENT_NAME = pmcrabkpnas01b
VERSION = 0x08200000
PLATFORM = win_x64
PATCH_VERSION = 8.2.0.0
SERVER_PATCH_VERSION = 8.2.0.0
MASTER_SERVER = PMCRABKPMST01
EMM_SERVER = PMCRABKPMST01
NB_MACHINE_TYPE = CLIENT
SERVICE_TYPE = VNET_DOMAIN_CLIENT_TYPE
PROCESS_HINT = 6f68e22c-fcf2-4a70-ace4-80241c2161f8
14:58:52.826 [14256.5596] <2> vnet_connect_to_vnetd_bpcd: js_bpcd_info: 00000000044181F0
14:58:55.857 [14256.5596] <2> vnet_pbxConnect_ex: pbxConnectExEx Succeeded
14:58:55.889 [14256.5596] <4> create_user_group_id_marker_WIN: connecting to named pipe:\\.\pipe\{B6C24A6E-0752-40D2-9211-5050FA1CC3C3}:OUTBOUND
14:58:55.889 [14256.5596] <4> create_user_group_id_marker_WIN: successfully connected to server named pipe
14:58:55.951 [14256.5596] <8> do_pbx_service: [vnet_connect.c:3510] via PBX VNETD CONNECT FROM 10.161.4.10.61570 TO 10.161.5.138.1556 fd = 180
14:58:55.951 [14256.5596] <8> vnet_vnetd_connect_forward_socket_begin: [vnet_vnetd.c:476] VN_REQUEST_CONNECT_FORWARD_SOCKET 10 0xa
14:58:55.951 [14256.5596] <8> vnet_vnetd_connect_forward_socket_begin: [vnet_vnetd.c:501] ipc_string 56231
127.0.0.1:53657 -> 127.0.0.1:61573 PROXY 10.161.4.10:61570 -> 10.161.5.138:1556
<2>bptestbpcd: EXIT status = 0
14:58:55.982 [14256.5596] <2> bptestbpcd: EXIT status = 0
***************************************************************************************************************************

C:\Program Files\Veritas\NetBackup\bin>bpclntcmd -pn
expecting response from server PMCRABKPMST01
PMCRABKPNAS01B PMCRABKPNAS01B 10.161.5.138 56234

***************************************************************************************************************************

 

 

quebek
Moderator
Moderator
   VIP    Certified

hey

for errors 10054 - I would looked at NIC drives and its firmware on all servers - so client, media and master (if separate from media).

check also the event logs on the client...

Hamza_H
Moderator
Moderator
   VIP   

Hello,

you shared policy details, I asked for the backup job details (detailled status).

From the client's name, I can see that is a NAS (filer?), I asked for the detailled status to check if there is a delay because the rror 10054 is a Microsoft Winsock error. there is a delay/reset somewhere or an idle drop socket. did you try to rise timeout?

more details about error 10054 : https://www.veritas.com/support/en_US/article.100018361

 

also you didn't provide the output of the command line bptestnetconn -s -a -v -p .

Below is the job details.

Dec 31, 2020 11:26:46 AM - Info nbjm (pid=11892) starting backup job (jobid=109657) for client PMCRABKPNAS01B, policy MCRA-ECM-FS01, schedule Weekly-Full
Dec 31, 2020 11:26:46 AM - Info nbjm (pid=11892) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=109657, request id:{53FC0925-10B0-4055-AF08-0EF5CFA2B2AF})
Dec 31, 2020 11:26:46 AM - requesting resource  MCRA_SO_FS_STO01-STU
Dec 31, 2020 11:26:46 AM - requesting resource  PMCRABKPMST01.NBU_CLIENT.MAXJOBS.PMCRABKPNAS01B
Dec 31, 2020 11:26:46 AM - granted resource  PMCRABKPMST01.NBU_CLIENT.MAXJOBS.PMCRABKPNAS01B
Dec 31, 2020 11:26:46 AM - granted resource  MediaID=@aaaak;DiskVolume=MCRA_SO_FS_STO01;DiskPool=MCRA_SO_FS_STO01;Path=MCRA_SO_FS_STO01;StorageServer=COFC-PMCRASTORNCE01;MediaServer=pmcrabkpmed02
Dec 31, 2020 11:26:46 AM - granted resource  MCRA_SO_FS_STO01-STU
Dec 31, 2020 11:26:47 AM - estimated 736458104 kbytes needed
Dec 31, 2020 11:26:47 AM - Info nbjm (pid=11892) started backup (backupid=PMCRABKPNAS01B_1609403207) job for client PMCRABKPNAS01B, policy MCRA-ECM-FS01, schedule Weekly-Full on storage unit MCRA_SO_FS_STO01-STU
Dec 31, 2020 11:26:48 AM - started process bpbrm (pid=3928)
Dec 31, 2020 11:27:00 AM - Info bpbrm (pid=3928) PMCRABKPNAS01B is the host to backup data from
Dec 31, 2020 11:27:00 AM - Info bpbrm (pid=3928) reading file list for client
Dec 31, 2020 11:27:01 AM - connecting
Dec 31, 2020 11:27:06 AM - Info bpbrm (pid=3928) starting bpbkar32 on client
Dec 31, 2020 11:27:12 AM - connected; connect time: 0:00:00
Dec 31, 2020 11:27:15 AM - Info bpbkar32 (pid=5592) Backup started
Dec 31, 2020 11:27:15 AM - Info bpbkar32 (pid=5592) change time comparison:<disabled>
Dec 31, 2020 11:27:15 AM - Info bpbkar32 (pid=5592) archive bit processing:<enabled>
Dec 31, 2020 11:27:15 AM - Info bptm (pid=6296) start
Dec 31, 2020 11:27:16 AM - Info bptm (pid=6296) using 262144 data buffer size
Dec 31, 2020 11:27:16 AM - Info bptm (pid=6296) setting receive network buffer to 1049600 bytes
Dec 31, 2020 11:27:16 AM - Info bptm (pid=6296) using 30 data buffers
Dec 31, 2020 11:27:16 AM - Info bptm (pid=6296) start backup
Dec 31, 2020 11:27:17 AM - Warning bpbrm (pid=3928) from client PMCRABKPNAS01B: WRN - can't open file: \\PMCRAVFSCIFS02\CTXPROFILES$\.copy_offload\.tokens (WIN32 5: Access is denied. )
Dec 31, 2020 11:27:17 AM - Info bptm (pid=6296) backup child process is pid 9920.8540
Dec 31, 2020 11:27:17 AM - Info bptm (pid=9920) start
Dec 31, 2020 11:27:17 AM - begin writing
Dec 31, 2020 11:27:38 AM - Error bptm (pid=9920) socket operation failed - 10054 (at ../child.c.1287)
Dec 31, 2020 11:27:38 AM - Critical bpbrm (pid=3928) from client PMCRABKPNAS01B: FTL - socket write failed
Dec 31, 2020 11:27:38 AM - Error bptm (pid=9920) unable to perform read from client socket, connection may have been broken
Dec 31, 2020 11:27:43 AM - Info pmcrabkpmed02 (pid=6296) StorageServer=hp-StoreOnceCatalyst:COFC-PMCRASTORNCE01; Report=scanned: 0 KB, CR sent: 0 KB, dedup: 0.00%
Dec 31, 2020 11:27:43 AM - Error bpbrm (pid=3928) could not send server status message to client
Dec 31, 2020 11:27:45 AM - Info bpbkar32 (pid=5592) done. status: 24: socket write failed
Dec 31, 2020 11:27:45 AM - end writing; write time: 0:00:28
socket write failed  (24)

******************************************************************

Below is the output of bptestnetconn from client  which is showing DNS Lookup failed for host pmcrabkpnas01b error:9003! , for others like master media servers the output is fine without any error.

And please note the  the backup was working fine previuosly ..

SERVER = PMCRABKPMST01
SERVER = PMCRABKPMED01
SERVER = PMCRABKPMED02
SERVER = PMCRABKPMED03
SERVER = PHCRABKPMED01
------------------------------------------------------------------------
NBU IP_ADDRESS_FAMILY configured to use Remote Addresses: IPv4(yes) IPv6(no)
FL: PMCRABKPMST01 -> 10.161.4.10 : 0 ms FAST (< 10 sec) SRC: ANY
FL: PMCRABKPMED01 -> 10.161.4.11 : 0 ms FAST (< 10 sec) SRC: ANY
FL: PMCRABKPMED02-> 10.161.4.12 : 0 ms FAST (< 10 sec) SRC: ANY
FL: PMCRABKPMED03-> 10.161.4.13 : 0 ms FAST (< 10 sec) SRC: ANY
FL: PHCRABKPMED01-> 10.161.9.11 : 0 ms FAST (< 10 sec) SRC: ANY
------------------------------------------------------------------------
RL: 10.161.4.10 -> PMCRABKPMST01 : 0 ms FAST (< 10 sec)
RL: 10.161.4.11 -> PMCRABKPMED01 : 0 ms FAST (< 10 sec)
RL: 10.161.4.12 -> PMCRABKPMED02 : 0 ms FAST (< 10 sec)
RL: 10.161.4.13 -> PMCRABKPMED03 : 0 ms FAST (< 10 sec)
RL: 10.161.9.11 -> PHCRABKPMED01 : 0 ms FAST (< 10 sec)
RL: 127.0.0.1 -> PMCRABKPNAS01 : 0 ms FAST (< 10 sec)
DNS Lookup failed for host pmcrabkpnas01b error:9003!
------------------------------------------------------------------------
Total elapsed time: 2 sec

Marianne
Moderator
Moderator
Partner    VIP    Accredited Certified

@Niyaz1 

I am trying to make sense of the network path that is used for this backup.

The data source seems to be on a UNC path:  \\PMCRAVFSCIFS02\ctxhome$ that is backed up through client name PMCRABKPNAS01B to a media server pmcrabkpmed02 using StoreOnce storage server PMCRASTORNCE01.
So, PMCRAVFSCIFS02 -> PMCRABKPNAS01B -> pmcrabkpmed02 -> PMCRASTORNCE01.
This looks like a lot of network hops.

Status 24 simply means that a connection was initially established, but then the connection got interrupted.

With this amount of network hops, it is difficult to troubleshoot.

Why do you need to backup through client PMCRABKPNAS01B?
Can PMCRAVFSCIFS02 not be mapped directly to MediaServer mcrabkpmed02?

What type of storage is PMCRAVFSCIFS02? Best will probably be to backup data directly off the NAS - either with NDMP policy type (if this is supported for this NAS device) or else the new Dynamic NAS feature.

@Marianne 

PMCRAVFSCIFS02 - This is netapp fileshare  storage 

PMCRABKPNAS01B - I am using this server to access the fileshare storage, this is the reason i am using this server to acess the file share from net app stoarage.

*** I am very glad to have ur reponse , i have troubleshooted many issue by your provided solutions.

Your advice for my issue will be  very valuble, thank you.

Marianne
Moderator
Moderator
Partner    VIP    Accredited Certified

@Niyaz1 

No guarantee that I can help here - status 24 is one the most difficult errors to troubleshoot, simply because the issue is outside of NBU at the network level.

I have bookmarked this explanation of status codes 23/24/25 : https://vox.veritas.com/t5/NetBackup/Error-23/m-p/738836#M201891

It is possible to map the NAS storage to the media server and back it up from there (using same UNC path)?
In other words, use the media server as Client in the policy.
This will eliminate one network hop.

Another way to backup will be via NDMP protocol, using NDMP policy type. That is, if you have NBU licensing that allows NDMP.

For troubleshooting, you can create logs to get an idea of more or less where the break in comms is.

On the client, bpbkar log at level 3. 
bpbkar will log every filename that is read and sent to the media server.
This will tell us up to where the client was able to read from the NAS and comms were good with media server.

On the media server: bptm and bpbrm logs at level 3.
bpbrm will log comms with client, metadata received from the client and comms with master server bpdbm to update metadata.
bptm will log each data buffer received from the client and write actions to the storage.

So, these 3 logs will give you an idea of where the break in comms is.
You can also ask your network team to monitor comms between filer -> client -> media server -> storage
during backup window.

@Marianne 

Thank you i will try to reduce network hopsand check , but i would like to let u know the backup was working fine before suddenly this issue happened.

Marianne
Moderator
Moderator
Partner    VIP    Accredited Certified

@Niyaz1 wrote:

....  but i would like to let u know the backup was working fine before suddenly this issue happened.


@Niyaz1 

I hear you.
Fact is - there is a network issue outside of NBU. It could simply be something wrong at OS-level on the client or the media server, causing network connection to drop.  With so many network hops, where do you start to troubleshoot?

Are ALL backups for this client (CIFS share and local drives) failing? Or just this CIFS one?
Or all / many backups going to this media server?

I remember back in W2003 days that TCP Chimney settings caused status 24. (Requirement is different for W2008 and above)

You only have to Google for Status 24 errors to see possible reasons:

https://www.veritas.com/support/en_US/article.100004801

https://vox.veritas.com/t5/NetBackup/NetBackup-Status-Code-24-Possible-Parameters-to-Check/td-p/6592...
(None of the Symantec links work anymore...)

https://www.veritas.com/support/en_US/article.100026326
(Very old, but worth checking.)

https://vox.veritas.com/t5/NetBackup/Netbackup-failure-with-code-24/td-p/576636
(Again Symantec links, but enough useful info.)

https://vox.veritas.com/t5/NetBackup/Backup-ends-in-Status-Code-24/m-p/364149
(Also old, but issue was caused by insufficient memory on media server.)

There is more if you want to Google...