06-01-2020 02:36 AM - edited 06-01-2020 03:49 AM
Hello all! ^^
I have read almost every post about this error and tried a lot of things without success.
All the clients are VMware virtual machines.
Almost all the clients have an NBU agent installed, with bp.conf filled in.
Details: When I install the NBU agent, the certificate cannot be deployed automatically. Why?
Error 24, socket write failed:
netstat -na | FINDSTR LISTEN: the port was closed.
So, together with a network administrator, we opened it.
Test backup: same error.
bptestbpcd on dhcpServer, source (master): OK
bpclntcmd -hn dhcp, source (client): OK
bpclntcmd -ip dhcp, source (client): OK
bpclntcmd -pn, source (client): OK
bpclient -client <Client Name>, source (master): Error code 227: No entity was found => same result on a server that works fine!
- IPv6 has been disabled in the registry to be sure.
https://www.veritas.com/support/en_US/article.100004801
Disabled all offload parameters on the NIC: netsh int tcp set global chimney=disabled
Modified / created registry keys:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
Value Name: TcpMaxDataRetransmissions
Data Type: REG_DWORD
Default: 5
Change to Value = 15 (Decimal)
Value Name: KeepAliveTime
Data Type: REG_DWORD
Default: 7200000
Change to Value = 300000 (Decimal)
Restarted the server.
Backup test: NOK, same error.
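For readers comparing the registry values above: KeepAliveTime is expressed in milliseconds, so the change is from two hours to five minutes (TcpMaxDataRetransmissions is a plain retry count). A minimal sketch of the conversion, where `describe_ms` is only an illustrative helper name:

```python
from datetime import timedelta


def describe_ms(ms: int) -> str:
    """Render a millisecond value as H:MM:SS."""
    return str(timedelta(milliseconds=ms))


default_keepalive_ms = 7_200_000  # default KeepAliveTime from the post
tuned_keepalive_ms = 300_000      # new KeepAliveTime from the post

print(describe_ms(default_keepalive_ms))  # 2:00:00
print(describe_ms(tuned_keepalive_ms))    # 0:05:00
```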
On the client, check the certificate:
nbcertcmd -displayCACertDetail -server mymaster: not really OK
Field: CA Certificate State : Not Trusted
nbcertcmd -getCACertificate -server mymaster
Check the certificate again:
nbcertcmd -displayCACertDetail -server mymaster
Field: CA Certificate State : Trusted !!! Yes
=> Backup test: NOK, same error!
6) Silly test, uninstalled the agent v8.1.2:
Test backup: same error.
I don't know what to do now... please help me ^^
06-01-2020 03:43 AM
Please show us all the text in the Job Details of the failed backup.
Status 24 is difficult to troubleshoot, because the connection is normally fine and the backup starts writing.
After a while, the connection is broken, usually by something outside of NetBackup.
The Job details should help us with troubleshooting to see how long the backup runs and which processes on media server and client are involved.
You can then go to the relevant logs to get an idea where the break in network connection took place - client or media server. This will determine where to look next.
06-01-2020 03:53 AM - edited 06-01-2020 04:00 AM
Hey Marianne ^^
I was waiting for you ;)
Which logs do you want to see?
bpbrm / bpbkar / bptm / ... ?
You should know that this is a confidential environment, so I can only give you parts of the logs, with the server names and IPs changed.
Logs:
bpbrm:
10:04:01.491 [18668.12580] <2> read_client: opendir() failed: Client.FQDN: No such file or directory (2)
10:04:01.491 [18668.12580] <2> check_dynamic_client_and_set_tss: Client entry for client Client.FQDN not found in client DB
10:04:01.491 [18668.12580] <2> bpbrm main: Cleaning cs_cache for client:[Master.FQDN] older than:[30] sec
bpbkar:
10:28:25.689 [8700.12604] <16> dtcp_write: TCP - failure: send socket (580) (TCP 10054: Connection reset by peer)
10:28:25.689 [8700.12604] <16> dtcp_write: TCP - failure: attempted to send 1 bytes
06-01-2020 03:57 AM
Please post all text in Job Details.
This will tell us which logs on which server and client, PIDs and timestamps.
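To illustrate what the Job Details give you, here is a minimal sketch that pulls the process names and PIDs out of Job Details lines. The regular expression and the function name `processes_in_job` are assumptions for illustration, based only on the line layout seen in the Job Details posted later in this thread; it is not a NetBackup tool.

```python
import re

# Layout assumed from this thread: "<date> <time> - <level> <process> (pid=<n>) ..."
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}-\w{3}-\d{4} \d{2}:\d{2}:\d{2}) - "
    r"(?P<level>\w+) (?P<proc>\S+) \(pid=(?P<pid>\d+)\)"
)


def processes_in_job(lines):
    """Return {process name: sorted list of PIDs} found in Job Details lines."""
    seen = {}
    for line in lines:
        m = LINE_RE.match(line)
        if m:  # lines without a "(pid=...)" field are skipped
            seen.setdefault(m["proc"], set()).add(int(m["pid"]))
    return {proc: sorted(pids) for proc, pids in seen.items()}


sample = [
    "01-Jun-2020 14:01:03 - Info bpbrm (pid=8113) starting bpbkar on client",
    "01-Jun-2020 14:01:04 - Info bpbkar (pid=344637) Backup started",
    "01-Jun-2020 14:01:10 - Info bptm (pid=8235) backup child process is pid 8270",
    "01-Jun-2020 14:18:02 - connecting",
]
print(processes_in_job(sample))
```

With the PIDs and timestamps in hand, you can search the matching process log for the same PID around the same time.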
06-01-2020 04:16 AM
The job:
30-May-2020 18:48:18 - Info bpbrm (pid=243561) Client.FQDN is the host to backup data from
30-May-2020 18:48:18 - Info bpbrm (pid=243561) reading file list for client
30-May-2020 18:48:21 - Info bpbrm (pid=243561) accelerator enabled
30-May-2020 18:48:27 - Info bpbrm (pid=243561) starting bpbkar on client
30-May-2020 18:48:28 - Info bpbkar (pid=223829) Backup started
30-May-2020 18:48:28 - Info bpbrm (pid=243561) bptm pid: 243580
30-May-2020 18:48:28 - Info bpbkar (pid=223829) INF - Backing up vCenter server ServerVMware.FQDN, ESX host Hyperviseur.FQDN, BIOS UUID 4236010e-bb66-5dfd-404f-d436f5a548c4, Instance UUID 50360deb-8bd4-d9eb-cc6f-5890b7a33007, Display Name Client, Hostname Client.FQDN
30-May-2020 18:48:28 - Info bptm (pid=243580) start
30-May-2020 18:48:31 - Info bptm (pid=243580) using 1048576 data buffer size
30-May-2020 18:48:31 - Info bptm (pid=243580) using 256 data buffers
30-May-2020 18:48:33 - Info bptm (pid=243580) start backup
30-May-2020 18:48:34 - Info bptm (pid=243580) backup child process is pid 243621
30-May-2020 18:49:37 - Info bpbkar (pid=223829) 0 entries sent to bpdbm
30-May-2020 18:52:44 - Info bpbkar (pid=223829) 95000 entries sent to bpdbm
30-May-2020 18:56:11 - Info bpbkar (pid=223829) 190000 entries sent to bpdbm
30-May-2020 19:02:08 - Info bpbkar (pid=223829) 285000 entries sent to bpdbm
30-May-2020 19:05:01 - Info nbjm (pid=5528) starting backup job (jobid=705362) for client Client.FQDN, policy PolicyName, schedule Full_xx_H_5S
30-May-2020 19:05:01 - Info nbjm (pid=5528) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=705362, request id:{2B57E509-A7AE-4C48-BA24-09717F5373EE})
30-May-2020 19:05:01 - requesting resource StorageUnit
30-May-2020 19:05:01 - requesting resource MasterServer.FQDN.NBU_CLIENT.MAXJOBS.Client.FQDN
30-May-2020 19:05:01 - requesting resource MasterServer.FQDN.VMware.Datastore.ServerVMware.FQDN/DatacenterName/VolumeDiskName
30-May-2020 19:05:01 - requesting resource MasterServer.FQDN.VMware.ESXserver.Hyperviseur.FQDN
30-May-2020 19:05:01 - granted resource MasterServer.FQDN.NBU_CLIENT.MAXJOBS.Client.FQDN
30-May-2020 19:05:01 - granted resource MasterServer.FQDN.VMware.Datastore.ServerVMware.FQDN/DatacenterName/VolumeDiskName
30-May-2020 19:05:01 - granted resource MasterServer.FQDN.VMware.ESXserver.Hyperviseur.FQDN
30-May-2020 19:05:01 - granted resource MediaID=@aaaac;DiskVolume=PureDiskVolume;DiskPool=DiskPoolAppliance;Path=PureDiskVolume;StorageServer=Appliance.FQDN;MediaServer=Appliance.FQDN
30-May-2020 19:05:01 - granted resource StorageUnit
30-May-2020 19:05:01 - estimated 24292714 kbytes needed
30-May-2020 19:05:01 - Info nbjm (pid=5528) started backup (backupid=Client.FQDN_1590858301) job for client Client.FQDN, policy PolicyName, schedule Full_xx_H_5S on storage unit StorageUnit using backup host applibck-rgscd.wsd.tadfr.thales
30-May-2020 19:05:06 - started process bpbrm (pid=243561)
30-May-2020 19:05:18 - connecting
30-May-2020 19:05:19 - connected; connect time: 0:00:00
30-May-2020 19:05:26 - begin writing
30-May-2020 19:07:55 - Info bpbkar (pid=223829) 380000 entries sent to bpdbm
30-May-2020 19:11:56 - Info bpbkar (pid=223829) 427475 entries sent to bpdbm
30-May-2020 19:12:15 - Info bpbkar (pid=223829) INF - Transport Type = nbd
30-May-2020 19:12:29 - Info bpbkar (pid=223829) 427525 entries sent to bpdbm
30-May-2020 19:28:58 - Error bpbrm (pid=243561) from client Client.FQDN: ERR - Cannot write to STDOUT. Errno = 110: Connection timed out
30-May-2020 19:28:58 - Critical bpbrm (pid=243561) from client Client.FQDN: FTL - cleanup() failed, status 24
30-May-2020 19:28:59 - Error bptm (pid=243621) system call failed - Connection reset by peer (at ../child.c.1289)
30-May-2020 19:28:59 - Error bptm (pid=243621) unable to perform read from client socket, connection may have been broken
30-May-2020 19:29:00 - Error bptm (pid=243580) media manager terminated by parent process
30-May-2020 19:29:14 - Info Appliance.FQDN (pid=243580) StorageServer=PureDisk:Appliance.FQDN; Report=PDDO Stats for (Appliance.FQDN): scanned: 444565 KB, CR sent: 18218 KB, CR sent over FC: 0 KB, dedup: 95.9%, cache disabled
30-May-2020 19:29:14 - Error bpbrm (pid=243561) could not send server status message to client
30-May-2020 19:29:15 - Critical bpbrm (pid=243561) unexpected termination of client Client.FQDN
30-May-2020 19:29:16 - Info bpbkar (pid=0) done. status: 24: socket write failed
30-May-2020 19:46:08 - end writing; write time: 0:40:42
socket write failed (24)
06-01-2020 04:20 AM - edited 06-01-2020 04:28 AM
Oh!! The bpbrm and bpbkar logs I posted are not the logs for this job.
I have 10 servers affected by this.
I don't understand, I have no logs for this timestamp...
I will post the details of a new job here; I'm waiting for the job to finish.
But the error will be exactly the same.
Sorry.
06-01-2020 06:02 AM
The backup has finished. The job details:
01-Jun-2020 14:01:01 - Info bpbrm (pid=8113) client.FQDN is the host to backup data from
01-Jun-2020 14:01:01 - Info bpbrm (pid=8113) reading file list for client
01-Jun-2020 14:01:02 - Info bpbrm (pid=8113) accelerator enabled
01-Jun-2020 14:01:02 - Info bpbrm (pid=8113) There is no complete backup image match with track journal, a regular full backup will be performed.
01-Jun-2020 14:01:03 - Info bpbrm (pid=8113) starting bpbkar on client
01-Jun-2020 14:01:04 - Info bpbkar (pid=344637) Backup started
01-Jun-2020 14:01:04 - Info bpbrm (pid=8113) bptm pid: 8235
01-Jun-2020 14:01:04 - Info bpbkar (pid=344637) INF - Backing up vCenter server vcenter.FQDN, ESX host Hyperviseur.FQDN, BIOS UUID 564d7d23-1e5c-32ff-d9a6-03780b525944, Instance UUID 5036437d-10bc-fdd1-cd52-69ce6f90a447, Display Name Client, Hostname client.FQDN
01-Jun-2020 14:01:04 - Info bptm (pid=8235) start
01-Jun-2020 14:01:05 - Info bptm (pid=8235) using 1048576 data buffer size
01-Jun-2020 14:01:05 - Info bptm (pid=8235) using 256 data buffers
01-Jun-2020 14:01:07 - Info bptm (pid=8235) start backup
01-Jun-2020 14:01:10 - Info bptm (pid=8235) backup child process is pid 8270
01-Jun-2020 14:02:04 - Info bpbkar (pid=344637) 0 entries sent to bpdbm
01-Jun-2020 14:03:14 - Info bpbkar (pid=344637) 95000 entries sent to bpdbm
01-Jun-2020 14:04:37 - Info bpbkar (pid=344637) 190000 entries sent to bpdbm
01-Jun-2020 14:05:31 - Info bpbkar (pid=344637) 285000 entries sent to bpdbm
01-Jun-2020 14:07:04 - Info bpbkar (pid=344637) 375265 entries sent to bpdbm
01-Jun-2020 14:07:04 - Info bpbkar (pid=344637) 375266 entries sent to bpdbm
01-Jun-2020 14:17:56 - Info nbjm (pid=5528) starting backup job (jobid=705711) for client client.FQDN, policy PolicyName, schedule Full_xx_H_5S
01-Jun-2020 14:17:57 - estimated 0 kbytes needed
01-Jun-2020 14:17:57 - Info nbjm (pid=5528) started backup (backupid=client.FQDN_1591013876) job for client client.FQDN, policy PolicyName, schedule Full_xx_H_5S on storage unit StorageUnit using backup host Appliance.FQDN
01-Jun-2020 14:17:58 - started process bpbrm (pid=8113)
01-Jun-2020 14:18:02 - connecting
01-Jun-2020 14:18:03 - connected; connect time: 0:00:00
01-Jun-2020 14:18:09 - begin writing
01-Jun-2020 14:35:55 - Error bpbrm (pid=8113) from client client.FQDN: ERR - Cannot write to STDOUT. Errno = 110: Connection timed out
01-Jun-2020 14:35:55 - Critical bpbrm (pid=8113) from client client.FQDN: FTL - cleanup() failed, status 24
01-Jun-2020 14:35:55 - Error bptm (pid=8270) system call failed - Connection reset by peer (at ../child.c.1289)
01-Jun-2020 14:35:58 - Error bptm (pid=8235) media manager terminated by parent process
01-Jun-2020 14:35:58 - Error bptm (pid=8270) unable to perform read from client socket, connection may have been broken
01-Jun-2020 14:36:06 - Info Appliance.FQDN (pid=8235) StorageServer=PureDisk:Appliance.FQDN; Report=PDDO Stats for (Appliance.FQDN): scanned: 232451 KB, CR sent: 27596 KB, CR sent over FC: 0 KB, dedup: 88.1%, cache disabled
01-Jun-2020 14:36:06 - Error bpbrm (pid=8113) could not send server status message to client
01-Jun-2020 14:36:07 - Critical bpbrm (pid=8113) unexpected termination of client client.FQDN
01-Jun-2020 14:36:08 - Info bpbkar (pid=0) done. status: 24: socket write failed
01-Jun-2020 14:53:08 - end writing; write time: 0:34:59
socket write failed (24)
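As a quick sanity check on how long the job ran, the reported write time can be recomputed from the "begin writing" and "end writing" entries above. This is only a sketch: the function name `write_duration` is illustrative, and the timestamp format is assumed from the lines in this job (English month abbreviations).

```python
from datetime import datetime

# Timestamp layout assumed from the Job Details, e.g. "01-Jun-2020 14:18:09"
FMT = "%d-%b-%Y %H:%M:%S"


def write_duration(lines):
    """Return (end writing - begin writing) as a timedelta, or None if either is missing."""
    begin = end = None
    for line in lines:
        # The first " - " separates the timestamp from the rest of the entry.
        stamp, _, rest = line.partition(" - ")
        if rest.startswith("begin writing"):
            begin = datetime.strptime(stamp, FMT)
        elif rest.startswith("end writing"):
            end = datetime.strptime(stamp, FMT)
    if begin is None or end is None:
        return None
    return end - begin


job = [
    "01-Jun-2020 14:18:09 - begin writing",
    "01-Jun-2020 14:35:55 - Error bpbrm (pid=8113) from client client.FQDN: "
    "ERR - Cannot write to STDOUT. Errno = 110: Connection timed out",
    "01-Jun-2020 14:53:08 - end writing; write time: 0:34:59",
]
print(write_duration(job))  # 0:34:59
```

The result matches the "write time: 0:34:59" reported on the last line of the job.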
06-01-2020 06:35 AM - edited 06-01-2020 06:42 AM
So, important information is that this is a VMware backup.
This means that bpbkar log is not on the VM, but on the 'backup host'.
In this case, it seems to be the same Appliance media server.
So, all logs need to be collected from the appliance:
bpbkar, bpbrm and bptm.
If you want us to assist with troubleshooting, please copy each log to a file named after its process, e.g. bpbkar.txt, bpbrm.txt, bptm.txt, and upload the files as attachments.
06-01-2020 06:39 AM - edited 06-01-2020 06:44 AM
No, no, I misread... sorry, the logs really are on the master.
Something I don't understand:
In the bpbrm log, the last logged time is 10:32:53.
The actual time was 15:05. There was a difference of 25 min from the real time.
I set the time on the Appliance and launched a backup of the same client.
As for the logs, I can only send you the parts where I found errors / failures; as you can see, I have to modify the logs.
06-01-2020 06:48 AM
If the Appliance is the media server (as per Job Details), then you need to collect logs on the Appliance, NOT the master.
bpbrm and bptm processes run on the media server.
bpbkar runs on the 'backup host', which is also the Appliance media server in this instance.
Do you understand NetBackup backup process flow?
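As a rough sketch of the point above (where to collect each log for a VMware backup), the mapping can be written down as a small lookup. The names `LOG_HOST_ROLE` and `logs_to_collect` are purely illustrative, and the mapping covers only the three processes named in this thread.

```python
# Which host role runs each process, as described above for a VMware backup.
LOG_HOST_ROLE = {
    "bpbrm": "media server",
    "bptm": "media server",
    "bpbkar": "backup host",
}


def logs_to_collect(processes, media_server, backup_host):
    """Return sorted (host, process) pairs telling where each process log lives."""
    hosts = {"media server": media_server, "backup host": backup_host}
    return sorted(
        (hosts[LOG_HOST_ROLE[proc]], proc)
        for proc in processes
        if proc in LOG_HOST_ROLE  # processes not covered by the mapping are skipped
    )


# In this thread the Appliance is both the media server and the backup host.
for host, proc in logs_to_collect(
    ["bpbrm", "bptm", "bpbkar"], "Appliance.FQDN", "Appliance.FQDN"
):
    print(f"{proc}: collect on {host}")
```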
06-01-2020 07:15 AM - edited 06-01-2020 07:32 AM
I am starting to know NBU well: command lines, processes, etc.
But indeed, I do not know the flow.
OK, I will read these logs and get back to you, certainly tomorrow, so have a nice evening Marianne ;)
Thanks for helping me ;)
06-01-2020 08:11 AM - edited 06-01-2020 08:24 AM
Here are the Appliance logs for bpbrm and bptm, as txt files; the bpbkar folder was empty.
06-10-2020 06:42 AM
Hello Marianne,
After you left me alone ^^,
I opened a case with Veritas Support.
We found the problem, and I am ashamed of myself: on a VMware policy, the backup media server was different from the backup media server in the Attributes.
Hence error 24.....