cancel
Showing results forΒ 
Search instead forΒ 
Did you mean:Β 

Backup error 43

Hamza_H
Moderator
Moderator
   VIP   

Hello Guys,

I really need your help regarding a confusing error code which is 43: unexpected message received.

Master server is on Windows server 2012R2 and NetBackup version 8.2

Media serve: RHEL, NBU 8.2

Client : nbu version 7.6.1.2 and Os : Windows server 2012R2

Policy type : Ms-Windows

snip detailled status :

Aug 31, 2020 4:13:09 PM - Info nbjm (pid=4420) starting backup job (jobid=309891) for client SRV-CLIENT.domain.com, policy TEST_POLICY, schedule WFULL
Aug 31, 2020 4:13:09 PM - Info nbjm (pid=4420) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=309891, request id:{03415F63-91B6-4431-8179-F3BA8595898B})
Aug 31, 2020 4:13:09 PM - requesting resource  srv-media-stu
Aug 31, 2020 4:13:09 PM - requesting resource  srv-mas.domain.com.NBU_CLIENT.MAXJOBS.SRV-CLIENT.domain.com
Aug 31, 2020 4:13:09 PM - requesting resource  srv-mas.domain.com.NBU_POLICY.MAXJOBS.TEST_POLICY
Aug 31, 2020 4:13:09 PM - granted resource  srv-mas.domain.com.NBU_CLIENT.MAXJOBS.SRV-CLIENT.domain.com
Aug 31, 2020 4:13:09 PM - granted resource  srv-mas.domain.com.NBU_POLICY.MAXJOBS.TEST_POLICY
Aug 31, 2020 4:13:09 PM - granted resource  MediaID=@aaaaz;DiskVolume=PureDiskVolume;DiskPool=srv-media-dp;Path=PureDiskVolume;StorageServer=srv-media.domain.com;MediaServer=srv-media.domain.com
Aug 31, 2020 4:13:09 PM - granted resource  srv-media-stu
Aug 31, 2020 4:13:09 PM - estimated 263878 kbytes needed
Aug 31, 2020 4:13:09 PM - begin Parent Job
Aug 31, 2020 4:13:09 PM - begin WOFB: Step By Condition
Operation Status: 0
Aug 31, 2020 4:13:09 PM - end WOFB: Step By Condition; elapsed time 0:00:00
Aug 31, 2020 4:13:09 PM - begin WOFB: Stream Discovery
Operation Status: 0
Aug 31, 2020 4:13:09 PM - end WOFB: Stream Discovery; elapsed time 0:00:00
Aug 31, 2020 4:13:09 PM - begin WOFB: Read File List
Operation Status: 0
Aug 31, 2020 4:13:09 PM - end WOFB: Read File List; elapsed time 0:00:00
Aug 31, 2020 4:13:09 PM - begin WOFB: Create Snapshot
Aug 31, 2020 4:13:19 PM - started process bpbrm (pid=23478)
Aug 31, 2020 4:13:19 PM - end writing
Operation Status: 43
Aug 31, 2020 4:13:19 PM - end WOFB: Create Snapshot; elapsed time 0:00:10
Aug 31, 2020 4:13:19 PM - begin WOFB: Stop On Error
Operation Status: 0
Aug 31, 2020 4:13:19 PM - end WOFB: Stop On Error; elapsed time 0:00:00
Operation Status: 43
Aug 31, 2020 4:13:19 PM - end Parent Job; elapsed time 0:00:10
unexpected message received  (43)

Note: we also have this on clients with NetBackup version 8.0 and on other media servers, but all other backups are running just fine on other media servers (we have a lot fo media servers)

As soon as we execute the backup,it gets stuck for almost 10 minutes and fails with that error code 43.

it didn't show on which process it hangs, because allow multuiple data stream was enabled, and when I disabled it, the detailled status showed bpbrm process.

But as per bpbrm logs, I don't think the problem is coming from this process, because the log only reported this error: 

13:13:50.907 [25298.25298] <2> vnet_apply_domain_constraints: Domain hint(f7185221-b5d1-44da-84a0-164dd102486d) is ignored for client connection
13:13:50.907 [25298.25298] <2> dump_proxy_info: local_proxy_info:
13:13:50.907 [25298.25298] <2> dump_proxy_info: ----pid: 1684
13:13:50.907 [25298.25298] <2> dump_proxy_info: ----connection_id: {987318CA-EC4C-11EA-9EA3-E54F786B6AB8}:OUTBOUND
13:13:50.907 [25298.25298] <2> dump_proxy_info: ----ca_roots: f7185221-b5d1-44da-84a0-164dd102486d
13:13:50.907 [25298.25298] <2> dump_proxy_info: ----ca_roots_excluded: UNCONSTRAINED
13:13:50.907 [25298.25298] <2> dump_proxy_info: remote_proxy_info:
13:13:50.907 [25298.25298] <2> dump_proxy_info: ----pid: 3408
13:13:50.907 [25298.25298] <2> dump_proxy_info: ----connection_id: {F0DD5121-0265-47FD-AA6F-5898F072B9E0}:INBOUND
13:13:50.907 [25298.25298] <2> dump_proxy_info: ----ca_roots: f7185221-b5d1-44da-84a0-164dd102486d
13:13:50.907 [25298.25298] <2> dump_proxy_info: ----ca_roots_excluded: f772adf8-78b7-4565-8713-e2d6a12d8245, aa4559d8-7cbd-4758-a0e9-8279f8a1d084, 2472d64c-15ad-4417-836d-08f452c2dc6c, 7f0fcc50-121f-4000-bbfa-e78b23188bdc, 08ca4903-d9e5-4033-a609-cce797308811, 631bc3d3-6af6-417d-96e0-05088d927652, 32af3387-6fa4-407e-aa18-0a7140cbecaa, 7162123d-ba3c-4fee-8d73-fded949ee1df
13:13:50.907 [25298.25298] <2> dump_proxy_info: remote_endpoint_info:
13:13:50.907 [25298.25298] <2> dump_proxy_info: ----pid: 15352
13:13:50.907 [25298.25298] <2> dump_proxy_info: ----exe_name: bpdbm
13:13:50.907 [25298.25298] <2> dump_proxy_info: ----local_hostname: srv-mas.domain.com
13:13:50.907 [25298.25298] <2> dump_proxy_info: domain_constraints_set:
13:13:50.907 [25298.25298] <2> dump_proxy_info: ----process_hint_server_name: srv-mas.domain.com
13:13:50.907 [25298.25298] <2> dump_proxy_info: ----process_hint: f7185221-b5d1-44da-84a0-164dd102486d
13:13:50.907 [25298.25298] <2> dump_proxy_info: ----process_hint_reason: the domain was inherited from the local peer process
13:13:50.909 [25298.25298] <2> is_connection_INBOUND: Connection-Is-INBOUND:[FALSE]
13:13:50.910 [25298.25298] <2> logconnections: PROXY CONNECT FROM 10.X.X.1.54741 TO 10.X.X.2.1556 fd = 6
13:13:50.910 [25298.25298] <2> logconnections: BPDBM CONNECT FROM 127.0.0.1.46627 TO 127.0.0.1.40458 fd = 6
13:13:50.910 [25298.25298] <2> vnet_check_vxss_client_magic_with_info_ex: [vnet_vxss_helper.c:1137] Ignoring VxSS authentication 2 0x2
13:13:50.935 [25298.25298] <2> isDONE: reply DONE 43 <?xml version="1.0" encoding="utf-8"?><StatusMsgDoc><StatusMsg NBUStatus="43" Severity="2" VXULoid="271" VXULtid="43"><MsgString>unexpected&#32;message&#32;received</MsgString></StatusMsg></StatusMsgDoc>
13:13:50.936 [25298.25298] <2> isDONE: reply DONE 43
13:13:50.936 [25298.25298] <2> db_startrequest: protocol error 1
13:13:50.936 [25298.25298] <16> db_begin: db_startrequest() failed: unexpected message received
13:13:50.936 [25298.25298] <2> db_getCLIENT: db_begin() failed: unexpected message received
13:13:50.936 [25298.25298] <2> verify_client: ../bpbrm.c.37760: db_getCLIENT failed for CLIENT: srv-client.domain.com
13:13:50.936 [25298.25298] <2> verify_client: ../bpbrm.c.37956: db_getCLIENT failed: 43 43 0x0000002b
13:13:50.936 [25298.25298] <2> bpbrm main: ../bpbrm.c.3622: verify_client failed: 43 43 0x0000002b
13:13:50.937 [25298.25298] <2> bpbrm kill_child_process: start
13:13:50.937 [25298.25298] <2> inform_client_of_status: COMM_SOCK == INVALID_SOCKET, 0
13:13:50.937 [25298.25298] <2> bpbrm Exit: attempting to send mail to root on srv-client.domain.com

I attached a snip of nbjm logs (I had to change names).

 

I can't find anything anywhere regarding this, in fact, we don't know what is causing the problem, media, master or client

Note: we have already rebooted media and client.

 

 

1 ACCEPTED SOLUTION

Accepted Solutions

Hamza_H
Moderator
Moderator
   VIP   
Hello David,
Thanks a lot for your persistence and your help,
Actually we found the root cause of the problem, it was a problem in Reverse DNS (media->master) once we corrected that, backups start to work again.

View solution in original post

10 REPLIES 10

davidmoline
Level 6
Employee

My reading of the logs is indicating a problem in taking the snapshot. As such I would review the VSS status of the server to see if there are issues there.

Secondly, is there any reason for leaving the NBU client at an unsupported version (7.6.x)? You are running an unsupported configuration and there seems top be no reason why this server couldn't have a more current client installed.

Is this a fresh issue or something that has been around for some time?

Hamza_H
Moderator
Moderator
   VIP   
Hello David,

Thank you for your reply,

I actually suspected that could be a vss problem but I didn’t find anything abnormal on this side, besides, I enabled bpfis bpbkar bpcd and vnetd logs on client and when I run the backup I don’t have logs on the client, only vnetd and bpcd when I try bptestbpcd commands which lead that the problem is on the media or master server( I suspect it is on media server but can’t figure it out..)
And actually I thought about upgrading netbackup version but as mentioned we have the same problem on some clients in higher version (8.0)..
And no this is not a fresh install.. it used to work fine before and suddenly stopped (no
Change were made..)

Can you share the admin logs from the media server?

And the bpcd & vnetd logs from the client (and if anything else is logged as part of the backup that too). 

Hamza_H
Moderator
Moderator
   VIP   

Hello David,

sorry for the delay , I had to ask our customer for these logs..

for admin logs on media server, I enabled them and there are no record of the job that I run, only in bpbrm ( attached to this reply)

same thing for bpcd & vnetd logs, nothing is in there, I don't think there is any connection to the client to initiate the backup.. which is confusing..

I attached a snip of bpbrm log & detailled status (for the time stamp)

 

thanks again for your help

The one thing that sticks out (and I noticed before) is that the time doesn't match between what is displayed in the activity monitor and the time in the bpbrm log. In particular this looks problematic:

14:53:52.464 [1796.1796] <2> get_long_base: (2) premature end of file (byte 1)
14:53:52.465 [1796.1796] <2> job_connect: Can't get server time
14:53:52.465 [1796.1796] <2> check_connect_time: It took 1 seconds to connect.
14:53:52.465 [1796.1796] <2> check_connect_time: Giving up trying to connect. Function elapsed time (301) > client connect timeout (300).
14:53:52.465 [1796.1796] <16> event_msg_to_monitor: failed to connect to bpjobd on master server SRV-MASTER.domain.com (23)

While in the job details the time is way different:
Sep 2, 2020 3:38:44 PM - started process bpbrm (pid=1796)

It's possible fixing the time mismatch may solve your problems.

Hamza_H
Moderator
Moderator
   VIP   

Hello @davidmoline ,

thanks again for your patience and your help,

is that really what could be stopping it from connecting to bpjobd on master server?

I mean there isn't a huge difference in time (Less than 1 hour..) 

I will give it a try, I will ask the EC to synchronize the time between these two servers (media & master) then re-run the backup.

 

Hamza_H
Moderator
Moderator
   VIP   

Hello @davidmoline ,

we checked time synchronization between the media server and the master and it is synchronized .. 

 

**edited**

and the command below confirms it:

nbcertcmd -checkClockSkew -server master_server_name

The current host clock is in sync with the master server.

Interesting - now I don't understand the time differences shown between the bpbrm log and the job details. How does the customer account for that? It's not as though it might be accounted for by timezone differences, the skew is around 45 mins.

Are you sure you have the correct log file snippets?

Hamza_H
Moderator
Moderator
   VIP   
Hello David,
Thanks a lot for your persistence and your help,
Actually we found the root cause of the problem, it was a problem in Reverse DNS (media->master) once we corrected that, backups start to work again.

Thanks for letting us know @Hamza_H - even if I didn't help with the solution it's always nice to know how the problem was solved.