
Error 24 socket write failed, Cannot write to STDOUT...Again !

Dackey
Level 4

Hello dear all ! ^^

I have read almost every post about this error and tried a lot of things, without success.

All the clients are VMware virtual machines.

Almost all the clients have an NBU agent installed, with bp.conf filled in.

Details: when I install the NBU agent, the certificate is not deployed automatically. Why?

 

ERROR 24, Socket Write Failed ! :

  • Test firewall ports 1556 / 13724 / 13782

netstat -na | findstr LISTEN: the ports were closed.

So, together with a network administrator, we opened them.

Test backup: same error.
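
A minimal sketch of the same port test done from another machine, in case it helps anyone repeating this check. It only tries to open a TCP connection to each of the three ports listed above; the function names are illustrative and not part of NetBackup, and a successful connect says nothing about whether the NetBackup service behind the port is healthy.

```python
import socket

# Ports listed in the post above (1556 / 13724 / 13782).
NBU_PORTS = (1556, 13724, 13782)


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_ports(host: str, ports=NBU_PORTS, timeout: float = 3.0) -> dict:
    """Map each port to True (reachable) or False (not reachable)."""
    return {port: port_is_open(host, port, timeout) for port in ports}
```

Calling `check_ports("client.FQDN")` from the master or media server (with the real client name) returns a dictionary such as `{1556: True, ...}`; any `False` points at a firewall or a service that is not listening.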

 

  • Test connectivity between master and client:

bptestbpcd against dhcpServer, run from the master: OK

bpclntcmd -hn dhcp, run from the client: OK

bpclntcmd -ip dhcp, run from the client: OK

bpclntcmd -pn, run from the client: OK

bpclient -client <Client Name>, run from the master: Error code 227: No entity was found => same result on a server whose backups work fine!

- IPv6 has been disabled in the registry, to be sure.

  • Followed a Veritas KB article and modified the NIC advanced parameters:

https://www.veritas.com/support/en_US/article.100004801

Disabled all offload parameters on the NIC: netsh int tcp set global chimney=disabled

Modified / created these registry keys:

HKey_LocalMachine\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters 

Value Name: TcpMaxDataRetransmissions

Data Type: REG_DWORD

Default: 5

Change to Value = 15    (Decimal)

 

Value Name: KeepAliveTime

Data Type: REG_DWORD

Default: 7200000

Change to Value = 300000    (Decimal)

 

Restarted the server.

Backup test: NOK, same error.
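
For readers wondering what the KeepAliveTime change above amounts to: the value is expressed in milliseconds, so the small sketch below simply converts the two numbers from the post into hours, minutes and seconds. The helper name is illustrative only.

```python
from datetime import timedelta


def keepalive_ms_to_text(milliseconds: int) -> str:
    """Render a KeepAliveTime value (milliseconds) as H:MM:SS."""
    return str(timedelta(milliseconds=milliseconds))


print(keepalive_ms_to_text(7_200_000))  # default value: 2:00:00
print(keepalive_ms_to_text(300_000))    # changed value: 0:05:00
```

In other words, the change makes an idle connection send its first keepalive probe after 5 minutes instead of 2 hours.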

 

  • Check certificate

On the client, look up the certificate:

nbcertcmd -displayCACertDetail -server mymaster: not really OK

Field: CA Certificate State: Not Trusted

  • Deploy the certificate :

nbcertcmd -getCACertificate -server mymaster

 

Check certificate:

nbcertcmd -displayCACertDetail -server mymaster

Field: CA Certificate State: Trusted !!! Yes

=>  Backup test: NOK, same error!

  • Last-resort test: uninstalled the agent v8.1.2:

Test backup: Same error.

 

I don't know what to do now... please help me ^^

 

 

 

 

 

12 REPLIES

Marianne
Moderator
Partner    VIP    Accredited Certified

@Dackey 

Please show us all text in Job Details of failed backup.

Status 24 is difficult to troubleshoot because the connection is normally fine and the backup starts writing.
After a while, the connection is broken by something outside of NetBackup.

The Job details should help us with troubleshooting to see how long the backup runs and which processes on media server and client are involved.
You can then go to the relevant logs to get an idea where the break in network connection took place - client or media server. This will determine where to look next.

Hey Marianne ^^

I was waiting for you ;)

Which logs do you want to see?

bpbrm / bpbkar / bptm / ... ?

Please note that this is a confidential environment, so I can only give you parts of the logs, with server names and IPs changed.

Logs :
bpbrm:
10:04:01.491 [18668.12580] <2> read_client: opendir() failed: Client.FQDN: No such file or directory (2)
10:04:01.491 [18668.12580] <2> check_dynamic_client_and_set_tss: Client entry for client Client.FQDN not found in client DB
10:04:01.491 [18668.12580] <2> bpbrm main: Cleaning cs_cache for client:[Master.FQDN] older than:[30] sec

bpbkar:
10:28:25.689 [8700.12604] <16> dtcp_write: TCP - failure: send socket (580) (TCP 10054: Connection reset by peer)
10:28:25.689 [8700.12604] <16> dtcp_write: TCP - failure: attempted to send 1 bytes

Marianne
Moderator
Partner    VIP    Accredited Certified

Please post all text in Job Details. 
This will tell us which logs on which server and client, PIDs and timestamps. 

The job:

30-May-2020 18:48:18 - Info bpbrm (pid=243561) Client.FQDN is the host to backup data from
30-May-2020 18:48:18 - Info bpbrm (pid=243561) reading file list for client
30-May-2020 18:48:21 - Info bpbrm (pid=243561) accelerator enabled
30-May-2020 18:48:27 - Info bpbrm (pid=243561) starting bpbkar on client
30-May-2020 18:48:28 - Info bpbkar (pid=223829) Backup started
30-May-2020 18:48:28 - Info bpbrm (pid=243561) bptm pid: 243580
30-May-2020 18:48:28 - Info bpbkar (pid=223829) INF - Backing up vCenter server ServerVMware.FQDN, ESX host Hyperviseur.FQDN, BIOS UUID 4236010e-bb66-5dfd-404f-d436f5a548c4, Instance UUID 50360deb-8bd4-d9eb-cc6f-5890b7a33007, Display Name Client, Hostname Client.FQDN
30-May-2020 18:48:28 - Info bptm (pid=243580) start
30-May-2020 18:48:31 - Info bptm (pid=243580) using 1048576 data buffer size
30-May-2020 18:48:31 - Info bptm (pid=243580) using 256 data buffers
30-May-2020 18:48:33 - Info bptm (pid=243580) start backup
30-May-2020 18:48:34 - Info bptm (pid=243580) backup child process is pid 243621
30-May-2020 18:49:37 - Info bpbkar (pid=223829) 0 entries sent to bpdbm
30-May-2020 18:52:44 - Info bpbkar (pid=223829) 95000 entries sent to bpdbm
30-May-2020 18:56:11 - Info bpbkar (pid=223829) 190000 entries sent to bpdbm
30-May-2020 19:02:08 - Info bpbkar (pid=223829) 285000 entries sent to bpdbm
30-May-2020 19:05:01 - Info nbjm (pid=5528) starting backup job (jobid=705362) for client Client.FQDN, policy PolicyName, schedule Full_xx_H_5S
30-May-2020 19:05:01 - Info nbjm (pid=5528) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=705362, request id:{2B57E509-A7AE-4C48-BA24-09717F5373EE})
30-May-2020 19:05:01 - requesting resource  StorageUnit
30-May-2020 19:05:01 - requesting resource  MasterServer.FQDN.NBU_CLIENT.MAXJOBS.Client.FQDN
30-May-2020 19:05:01 - requesting resource  MasterServer.FQDN.VMware.Datastore.ServerVMware.FQDN/DatacenterName/VolumeDiskName
30-May-2020 19:05:01 - requesting resource  MasterServer.FQDN.VMware.ESXserver.Hyperviseur.FQDN
30-May-2020 19:05:01 - granted resource  MasterServer.FQDN.NBU_CLIENT.MAXJOBS.Client.FQDN
30-May-2020 19:05:01 - granted resource  MasterServer.FQDN.VMware.Datastore.ServerVMware.FQDN/DatacenterName/VolumeDiskName
30-May-2020 19:05:01 - granted resource  MasterServer.FQDN.VMware.ESXserver.Hyperviseur.FQDN
30-May-2020 19:05:01 - granted resource  MediaID=@aaaac;DiskVolume=PureDiskVolume;DiskPool=DiskPoolAppliance;Path=PureDiskVolume;StorageServer=Appliance.FQDN;MediaServer=Appliance.FQDN
30-May-2020 19:05:01 - granted resource  StorageUnit
30-May-2020 19:05:01 - estimated 24292714 kbytes needed
30-May-2020 19:05:01 - Info nbjm (pid=5528) started backup (backupid=Client.FQDN_1590858301) job for client Client.FQDN, policy PolicyName, schedule Full_xx_H_5S on storage unit StorageUnit using backup host applibck-rgscd.wsd.tadfr.thales
30-May-2020 19:05:06 - started process bpbrm (pid=243561)
30-May-2020 19:05:18 - connecting
30-May-2020 19:05:19 - connected; connect time: 0:00:00
30-May-2020 19:05:26 - begin writing
30-May-2020 19:07:55 - Info bpbkar (pid=223829) 380000 entries sent to bpdbm
30-May-2020 19:11:56 - Info bpbkar (pid=223829) 427475 entries sent to bpdbm
30-May-2020 19:12:15 - Info bpbkar (pid=223829) INF - Transport Type =  nbd
30-May-2020 19:12:29 - Info bpbkar (pid=223829) 427525 entries sent to bpdbm
30-May-2020 19:28:58 - Error bpbrm (pid=243561) from client Client.FQDN: ERR - Cannot write to STDOUT. Errno = 110: Connection timed out
30-May-2020 19:28:58 - Critical bpbrm (pid=243561) from client Client.FQDN: FTL - cleanup() failed, status 24
30-May-2020 19:28:59 - Error bptm (pid=243621) system call failed - Connection reset by peer (at ../child.c.1289)
30-May-2020 19:28:59 - Error bptm (pid=243621) unable to perform read from client socket, connection may have been broken
30-May-2020 19:29:00 - Error bptm (pid=243580) media manager terminated by parent process
30-May-2020 19:29:14 - Info Appliance.FQDN (pid=243580) StorageServer=PureDisk:Appliance.FQDN; Report=PDDO Stats for (Appliance.FQDN): scanned: 444565 KB, CR sent: 18218 KB, CR sent over FC: 0 KB, dedup: 95.9%, cache disabled
30-May-2020 19:29:14 - Error bpbrm (pid=243561) could not send server status message to client
30-May-2020 19:29:15 - Critical bpbrm (pid=243561) unexpected termination of client Client.FQDN
30-May-2020 19:29:16 - Info bpbkar (pid=0) done. status: 24: socket write failed
30-May-2020 19:46:08 - end writing; write time: 0:40:42
socket write failed  (24)
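
A small sketch of how the process names and PIDs can be pulled out of Job Details text like the above, since those are what you search for in the corresponding log files. It assumes every relevant line contains `name (pid=N)` as in this job; the function name is illustrative, and skipping `pid=0` is a choice made here because it does not identify a real process.

```python
import re
from collections import defaultdict

# Matches e.g. "bpbrm (pid=243561)" and captures the name and the PID.
PID_PATTERN = re.compile(r"(\S+) \(pid=(\d+)\)")


def pids_by_process(lines):
    """Return {process name: sorted list of PIDs} found in Job Details lines."""
    found = defaultdict(set)
    for line in lines:
        match = PID_PATTERN.search(line)
        if not match:
            continue
        name, pid = match.group(1), int(match.group(2))
        if pid != 0:  # pid=0 appears on the final status line only
            found[name].add(pid)
    return {name: sorted(pids) for name, pids in found.items()}
```

For the job above this yields bpbrm 243561, bpbkar 223829, bptm 243580 and 243621, and nbjm 5528, which are the PIDs to look for in the logs of the same day.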

Oh! The bpbrm and bpbkar logs I posted are not the logs for this job.

I have 10 servers with this problem.

I don't understand: I have no log for this timestamp...

I will post the details of a new job here; I am waiting for the job to end.

But the error will be exactly the same.

Sorry.

The backup has finished. Here are the job details:

01-Jun-2020 14:01:01 - Info bpbrm (pid=8113) client.FQDN is the host to backup data from
01-Jun-2020 14:01:01 - Info bpbrm (pid=8113) reading file list for client
01-Jun-2020 14:01:02 - Info bpbrm (pid=8113) accelerator enabled
01-Jun-2020 14:01:02 - Info bpbrm (pid=8113) There is no complete backup image match with track journal, a regular full backup will be performed.
01-Jun-2020 14:01:03 - Info bpbrm (pid=8113) starting bpbkar on client
01-Jun-2020 14:01:04 - Info bpbkar (pid=344637) Backup started
01-Jun-2020 14:01:04 - Info bpbrm (pid=8113) bptm pid: 8235
01-Jun-2020 14:01:04 - Info bpbkar (pid=344637) INF - Backing up vCenter server vcenter.FQDN, ESX host Hyperviseur.FQDN, BIOS UUID 564d7d23-1e5c-32ff-d9a6-03780b525944, Instance UUID 5036437d-10bc-fdd1-cd52-69ce6f90a447, Display Name Client, Hostname client.FQDN
01-Jun-2020 14:01:04 - Info bptm (pid=8235) start
01-Jun-2020 14:01:05 - Info bptm (pid=8235) using 1048576 data buffer size
01-Jun-2020 14:01:05 - Info bptm (pid=8235) using 256 data buffers
01-Jun-2020 14:01:07 - Info bptm (pid=8235) start backup
01-Jun-2020 14:01:10 - Info bptm (pid=8235) backup child process is pid 8270
01-Jun-2020 14:02:04 - Info bpbkar (pid=344637) 0 entries sent to bpdbm
01-Jun-2020 14:03:14 - Info bpbkar (pid=344637) 95000 entries sent to bpdbm
01-Jun-2020 14:04:37 - Info bpbkar (pid=344637) 190000 entries sent to bpdbm
01-Jun-2020 14:05:31 - Info bpbkar (pid=344637) 285000 entries sent to bpdbm
01-Jun-2020 14:07:04 - Info bpbkar (pid=344637) 375265 entries sent to bpdbm
01-Jun-2020 14:07:04 - Info bpbkar (pid=344637) 375266 entries sent to bpdbm
01-Jun-2020 14:17:56 - Info nbjm (pid=5528) starting backup job (jobid=705711) for client client.FQDN, policy PolicyName, schedule Full_xx_H_5S
01-Jun-2020 14:17:57 - estimated 0 kbytes needed
01-Jun-2020 14:17:57 - Info nbjm (pid=5528) started backup (backupid=client.FQDN_1591013876) job for client client.FQDN, policy PolicyName, schedule Full_xx_H_5S on storage unit StorageUnit using backup host Appliance.FQDN
01-Jun-2020 14:17:58 - started process bpbrm (pid=8113)
01-Jun-2020 14:18:02 - connecting
01-Jun-2020 14:18:03 - connected; connect time: 0:00:00
01-Jun-2020 14:18:09 - begin writing
01-Jun-2020 14:35:55 - Error bpbrm (pid=8113) from client client.FQDN: ERR - Cannot write to STDOUT. Errno = 110: Connection timed out
01-Jun-2020 14:35:55 - Critical bpbrm (pid=8113) from client client.FQDN: FTL - cleanup() failed, status 24
01-Jun-2020 14:35:55 - Error bptm (pid=8270) system call failed - Connection reset by peer (at ../child.c.1289)
01-Jun-2020 14:35:58 - Error bptm (pid=8235) media manager terminated by parent process
01-Jun-2020 14:35:58 - Error bptm (pid=8270) unable to perform read from client socket, connection may have been broken
01-Jun-2020 14:36:06 - Info Appliance.FQDN (pid=8235) StorageServer=PureDisk:Appliance.FQDN; Report=PDDO Stats for (Appliance.FQDN): scanned: 232451 KB, CR sent: 27596 KB, CR sent over FC: 0 KB, dedup: 88.1%, cache disabled
01-Jun-2020 14:36:06 - Error bpbrm (pid=8113) could not send server status message to client
01-Jun-2020 14:36:07 - Critical bpbrm (pid=8113) unexpected termination of client client.FQDN
01-Jun-2020 14:36:08 - Info bpbkar (pid=0) done. status: 24: socket write failed
01-Jun-2020 14:53:08 - end writing; write time: 0:34:59
socket write failed  (24)
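
To see how long the backup ran before the connection broke, here is a rough sketch that measures the time between "begin writing" and the first Error or Critical line in Job Details text. It assumes the timestamp format shown above (English month abbreviations) and reads the lines in the order they are listed; since the lines come from different hosts, a clock difference between hosts would skew the result. The names are illustrative.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple

TS_FORMAT = "%d-%b-%Y %H:%M:%S"  # e.g. 01-Jun-2020 14:18:09


def parse_line(line: str) -> Optional[Tuple[datetime, str]]:
    """Split a Job Details line into (timestamp, message), or None."""
    stamp, sep, message = line.partition(" - ")
    if not sep:
        return None
    try:
        return datetime.strptime(stamp.strip(), TS_FORMAT), message.strip()
    except ValueError:
        return None


def time_until_first_error(lines: Iterable[str]) -> Optional[timedelta]:
    """Time from 'begin writing' to the first Error/Critical line after it."""
    start = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        when, message = parsed
        if message.startswith("begin writing"):
            start = when
        elif start is not None and message.startswith(("Error", "Critical")):
            return when - start
    return None
```

For the two jobs posted here this gives 0:23:32 (30-May) and 0:17:46 (01-Jun), so the connection is not dropped after a fixed interval.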

Marianne
Moderator
Partner    VIP    Accredited Certified

So, important information is that this is a VMware backup. 

This means that bpbkar log is not on the VM, but on the 'backup host'. 
In this case, it seems to be the same Appliance media server. 

So, all logs need to be collected from the appliance: 
bpbkar, bpbrm and bptm. 

If you want us to assist with troubleshooting, please copy the logs to the process name, e.g. bpbkar.txt, bpbrm.txt, bptm.txt and upload as attachments. 

 

No, no, my eyes deceived me... sorry. The logs really are on the Master.

Something I don't understand:

In the bpbrm log, the last timestamp is 10:32:53.

The actual time was 15:05. There was a difference of 25 min from the real time.

I corrected the time on the Appliance and launched a backup of the same client.

 

As for the logs, I can only send you the parts where I found errors / failures; as you can see, I have to edit the logs.

Marianne
Moderator
Partner    VIP    Accredited Certified

If the Appliance is the media server (as per Job Details), then you need to collect logs on the Appliance, NOT the master. 

bpbrm and bptm processes run on the media server.
bpbkar runs on the 'backup host', which is also the Appliance media server in this instance. 

Do you understand NetBackup backup process flow? 
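
The process placement described in this post can be summarized as a small lookup. This is only a sketch of the three processes mentioned in this thread for a VMware backup, not a complete picture of the NetBackup process flow; the names are illustrative.

```python
# Where each process runs for a VMware backup, as described in this thread.
LOG_LOCATION = {
    "bpbrm": "media server",
    "bptm": "media server",
    "bpbkar": "backup host",
}


def where_to_collect(process: str, media_server: str, backup_host: str) -> str:
    """Return the host on which to collect the log for the given process."""
    role = LOG_LOCATION[process]  # raises KeyError for other processes
    return media_server if role == "media server" else backup_host
```

In this case both arguments would be the Appliance, so all three logs come from the same machine and none from the master or the VM.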

I am starting to know NBU well: command line, processes, etc.

But indeed, I do not know the flow.

OK, I will read these logs and get back to you, probably tomorrow. Have a nice evening, Marianne ;)

Thanks for helping me ;)

Here are the logs from the Appliance, as txt, for bpbrm and bptm; the bpbkar folder was empty.

Hello Marianne,

 

After you left me on my own ^^, I opened a case with Veritas Support.

We found the problem, and I am ashamed of myself: in the VMware policy, the backup media server was different from the backup media server in the Attributes.

Hence error 24...