cancel
Showing results for 
Search instead for 
Did you mean: 

VM backup failing with (11) system call failed

jos_hua
Level 3

My nbu v8.1.2 daily backups of VMware & MSSQL polices have been running successfully.

Suddenly, I had encountered the below error only for VMware policy:

Error bptm cannot add fragment to image database, error = cannot connect on socket
(11) system call failed

My MSSQL backups are still running successfully.

i had checked firewall rules are already opened.

tcp 443 & 902 for nbu master/media & vmware vcenter & ESX hosts
- nbu master/media able to telnet vcenter tcp port 443 & ESX tcp port 902

tcp 1556 & 13724 for nbu master/media
- able to telnet nbu master/media tcp ports 1556/13724

I had searched the knowledgebase and tried the technotes.

https://www.veritas.com/support/en_US/article.100001950
https://www.veritas.com/support/en_US/article.100013415
https://vox.veritas.com/t5/NetBackup/cannot-connect-on-socket-25/td-p/620192
https://vox.veritas.com/t5/NetBackup/My-FS-Bakup-is-failing-with-25/td-p/525005
https://vox.veritas.com/t5/NetBackup/VMware-Backups-fails-with-status-11/m-p/342301#M69430

Does anyone have similiar error and managed to resolve?

Regards
Joshua

1 ACCEPTED SOLUTION

Accepted Solutions

@Marianne 

Unfortunately my veritas support had lapsed recently. Smiley Sad

But good news! I did the following steps to resolve the backup issue. Smiley Happy

1) Deactivate the CDPUserSvc service via Registry Editor

  • HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\CDPUserSvc

2) Set the value Start to 4

3) Invole an administrative command prompt window and enter the following command:

  • sc config cdpusersvc type=own

4) Reboot the both NBU Master & Media servers

5) Check that CDPUserSvc service is no longer running via Services control panel applet.

It seems the KB4048953 from Microsoft Windows Update had caused my NBU services to go haywire.

View solution in original post

13 REPLIES 13

jos_hua
Level 3

I checked the logs in bprd folder and noticed this error in the log:

00:01:52.255 [15764.15204] <2> emmlib_QueryAppCredentials: (0) EMM_AppCredentials_Record => { MediaServerName => vcenter.abc.net AppServerName => vcenter.abc.net AppServerType => 10 (CEMM_NBU_VIRTUAL_MACHINE) AppServerSubType => 1 PortNumber => 443 UserName => nbuadmin Password => ****** PasswordKey => ****** CredentialType => 1563766176 (INVALID_CREDENTIAL_TYPE) SubTypeName => DisplayName =>

However, i could login to my vcenter using the same account & password and browse the vcenter to select additional VMs to add to my VMware_Backup policy.

Marianne
Moderator
Moderator
Partner    VIP    Accredited Certified

The line that you saw in bprd is probably a 'red herring'. Plus it is a <2> - debug, not a warning <8>, error <16> or severe error <32>.

I would start with with bpbrm log on the media server - level 3.
This looks like comms error between bpbrm on media server and bpdbm on the master:

... cannot add fragment to image database, error = cannot connect on socket

Hi Marianne,

I have tried to filter thru both logs for <16> & <32> : bpbrm from the media server and bpdbm from the master

I search the knowledgebase for solution on "The proxy protocol has been violated."

https://www.veritas.com/support/en_US/article.100043072

I checked my Netbackup Client Service was as per default set to start as LocalSystem and running.

bpdbm log (master server)

11:34:10.840 [7532.7860] <16> dump_proxy_info: statusmsg: The proxy protocol has been violated. The remote connector of connection ([local] [AF_INET:[10.20.34.38]:1556] -- [remote] [AF_INET:[10.20.36.21]:58215]) is not correctly using the proxy., nbu status = 7647, severity = 2, Additional Message: [PROXY] Encountered error (CERT_PROTOCOL_READING_JSON_LENGTH) while processing(CertProtocol)., nbu status = 2, severity = 1

bpbrm log (media server)

10:41:07.906 [11364.11332] <32> bpbrm Exit: unexpected termination of client cute_cats.abc.net
10:41:07.906 [11364.11332] <2> set_job_details: Tfile (219762): LOG 1603680067 32 bpbrm 11364 unexpected termination of client cute_cats.abc.net
10:41:07.906 [11364.11332] <2> send_job_file: job ID 219762, ftype = 3 msg len = 91, msg = LOG 1603680067 32 bpbrm 11364 unexpected termination of client cute_cats.abc.net
10:41:07.906 [11364.11332] <2> ConnectionCache::connectAndCache: Acquiring new connection for host masterserver01.abc.net, query type 1

 

Regards

Terence

Marianne
Moderator
Moderator
Partner    VIP    Accredited Certified

@jos_hua 

You need to find log entries that correspond with this timestamp:

23 Oct, 2020 8:26:31 cannot add fragment to image database, error = cannot connect on socket

Best to locate the PID in Job details, then extract all entries for that PID.
Among the bpbrm entries, there will be attemp to connect to bpbdm.

If you cannot post full logs, please log a call with Veritas Support and submit logs to them.

I hope you can help me take a look.

I have attached new logs instead.

Job ID: 219837 (Snapshot) & 219838 (Backup)

Marianne
Moderator
Moderator
Partner    VIP    Accredited Certified

@jos_hua 

I need the PIDs that can be seen in Job Details.

@Marianne 

Thank you for your help!

Managed to locate the following PIDs from the Job Details:

Job Type: Snapshot (Job ID: 219837)

  • bpbrm (pid=9136)
  • bpfis (pid=3948)
  • nbjm (pid=9984)
  • RUNCMD (pid=11408)
  • bpbrm (pid=5720)
  • bpfis (pid=7200)
  • RUNCMD (pid=7016)

Job Type: Backup (Job ID: 219838)

  • bpbrm (pid=7404)
  • bpbkar32 (pid=1704)
  • bptm (pid=8900)
  • nbjm (pid=9984)

Marianne
Moderator
Moderator
Partner    VIP    Accredited Certified

Thanks!
So, the snapshot was successful:

14:50:30.420 [9136.8680] <2> bpbrm Exit: client snapshot EXIT STATUS 0: the requested operation was successfully completed

Backup job started 1 second later :

14:50:31.139 [7404.1340] <2> logparams: -backup -S masterserver01.abc.net -c cute_cats.abc.net -ct 40 -ru root -cl VMware_Backup -sched Full .....

Weird to see that the Scedule name is Full, but schedule type is INCR ?
14:50:32.795 [7404.1340] <2> build_cmd_for_non_mpx_backup_archive_verify_import: backup_cmd = /usr/openv/netbackup/bin/bpbkar bpbkar32 -r 1209600 -dt 0 -to 0 -bpstart_time 1603695360 -clnt cute_cats.abc.net -class VMware_Backup -sched Full -st INCR .....

It seems that the backup then fails within a few seconds without any data being transferred. 
This time we do not see the same "cannot add fragment to image database, .... " error as before, just status 25 and 11:

14:50:39.858 [7404.1340] <2> bpbrm wait_for_child: start
14:50:39.858 [7404.1340] <2> bpbrm wait_for_child: child exit_status = 25
14:50:39.874 [7404.1340] <2> inform_client_of_status: INF - Server status = 11
14:50:39.874 [7404.1340] <2> put_string: cannot write data to network: An existing connection was forcibly closed by the remote host.
14:50:39.874 [7404.1340] <16> inform_client_of_status: could not send server status message to client

I interpret 'bpbrm wait_for_child' as waiting for bpbkar.
So, here it looks like issue between bpbrm and bpbkar.
Can we assume that the media server is also the backup host?
Do you have bpbkar log on the media server?
Have you perhaps checked Task Manager for orphaned bpbkar processes after completed or failed backups?

Curious about the schedule name and type - this failed backup is actually running an INCR backup.
When last did a Full backup (with type FULL schedule) run?

@Marianne 

Yes, the backup schedule was originally named as Full.

I was troubleshooting by testing out Differential Incremental Backup and Full Backup.

The last successful Full backup was on 20th Oct.

Yes, media server is also the backup host.

Marianne
Moderator
Moderator
Partner    VIP    Accredited Certified

I do not see much in bpbkar other than the status 11 and then socket errors because bpbrm has already terminated the backup.

I am wondering about this debug <2> message:

14:50:35.186 [1704.4896] <2> tar_base::V_vTarMsgW: INF - FIML_initialize_vxms failed (14) (../vxms_bridge.cpp:257)

A search for this error found an old TN that I do not believe is relevant in your situation:

https://www.veritas.com/support/en_US/article.100003025

Probably best to log a Support call with Veritas.
Be sure that you have VxMS logging enable - IMHO, this is probably where the issue will be found.

@Marianne 

Unfortunately my veritas support had lapsed recently. Smiley Sad

But good news! I did the following steps to resolve the backup issue. Smiley Happy

1) Deactivate the CDPUserSvc service via Registry Editor

  • HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\CDPUserSvc

2) Set the value Start to 4

3) Invole an administrative command prompt window and enter the following command:

  • sc config cdpusersvc type=own

4) Reboot the both NBU Master & Media servers

5) Check that CDPUserSvc service is no longer running via Services control panel applet.

It seems the KB4048953 from Microsoft Windows Update had caused my NBU services to go haywire.

Marianne
Moderator
Moderator
Partner    VIP    Accredited Certified

@jos_hua that is fantastic news!

If I may ask - how did you come across the solution?

 

@Marianne 

I searched all over the net any possible solution and chanced upon an article which mentioned KB4048953 & CDPUsersvc causing problem on Window 10 & Windows Server 2016 platform.