hello guys, I have a problem with a backup of a server operating system. "file read failed (13)" This backup is configured to copy C, D, and System State. However the system state backup is configured not to run on the incremental differential schedule. Added to exclud list, but realized that it still backs up the EFI partition. I checked the network settings, server overload and event viewer but found nothing. Attached are the logs bpcd, bpfis, bpbkar and active monitor. Could someone help?

You need to change Client Read Timeout on the Media Server, not the client.

danguss You also need bpbrm and bptm logs on the media server. According to Activity Monitor, bpbrm on the media server failed at this point: 15/10/2019 02:57:13 - Error bpbrm (pid=544) socket read failed: errno = 62 - Timer expired bpbkar on the client however was at this point stil 'happily' generating backup stream until 03:02: 03:02:41.287 [18812.15328] <4> dos_backup::OVDeInitSpecialInfo: INF - Special File: 1 C:\Windows\system32\winevt\logs\HardwareEvents.evtx HardwareEvents03:02:41.287 [18812.15328] <4> dos_backup::OVDeInitSpecialInfo: INF - Special File: 1 C:\Windows\System32\winevt\Logs\Security.evtx Security03:02:41.287 [18812.15328] <4> dos_backup::OVDeInitSpecialInfo: INF - Special File: 1 C:\Windows\system32\winevt\Logs\System.evtx System03:02:41.287 [18812.15328] <4> dos_backup::V_ShadowCopyBackupEnd: INF - foreign snapshot released. At this point, bpbkar realised that connection with media server is broken: 03:02:41.292 [18812.15328] <2> ov_log::V_GlobalLogEx: KeepAliveThread::isKeepAliveRunning keep alive not running: 203:02:41.292 [18812.15328] <16> dtcp_read: TCP - failure: recv socket (124) (TCP 10053: Software caused connection abort)03:02:42.292 [18812.15328] <16> dtcp_read: TCP - failure: recv socket (124) (TCP 10053: Software caused connection abort)03:02:43.293 [18812.15328] <16> dtcp_read: TCP - failure: recv socket (124) (TCP 10053: Software caused connection abort)03:02:43.293 [18812.15328] <16> dtcp_read: TCP - failure: recv socket (100) (TCP 10058: Can't send after socket shutdown) Level 3 logs on media server will give us this info: bpbrm (pid=544) will tell you up to what time metadata from client was received and information about the timeout.bptm ((pid=3595) and child process pid 4153 ) will tell you up to what time actual data was received from the client.

Apologies, I've not looked at the logs... but when I consistently get status 13/14 for file-system style backups across SDN (software defined networking) (aka virtual networks) (e.g. Hyper-V NVGRE)... the one consistent solution for me (for quite a few (and I do mean like a 100+) guest VMs) has been to disable client-side-dedupe, and only very occaisionally to also disable accelerator.What I mean is... that.. accelerator doesn't seem to be a problem for me... but client-side dedupe... hmmm... there's something, that I can't quite yet put my finger on, about client-side-dedupe that seems to not sit too happily with SDN, e.g. NAT across NVGRE.So, without looking at your logs... maybe you could turn off client-side dedupe for the problem clients?You don't need to make any policy adjustments... because client-side-dedupe is granular at the client-name level, e.g. :bpclient -update -client blah.com -client_direct 0

02:42:13.321 [18812.15328] <2> ov_log::V_GlobalLog: INF - VssSnapshotVolume::BackupComplete(): Trying to get lock to run BackupComplete 02:42:13.321 [18812.15328] <2> ov_log::V_GlobalLog: INF - VssSnapshotVolume::BackupComplete(): Got lock - running BackupComplete 02:42:13.321 [18812.15328] <2> ov_log::V_GlobalLog: INF - Attached [0x1f0d430] At this operation time out occurred on media server but the client doesn't know yet03:02:40.443 [18812.15328] <2> ov_log::V_GlobalLog: INF - Unattaching [0x1f0d430] 03:02:40.443 [18812.15328] <2> ov_log::V_GlobalLog: INF - Attempting to delete VSS snapshot set: {81b8f0aa-336e-44a9-a84d-2686026a0f91} 03:02:41.136 [18812.15328] <2> ov_log::V_GlobalLog: INF - VSS snapshot set deleted. Snapshot SetID = {81b8f0aa-336e-44a9-a84d-2686026a0f91} 03:02:41.137 [18812.15328] <2> ov_log::V_GlobalLog: INF - Freeing gathered writer metadata 03:02:41.137 [18812.15328] <2> ov_log::V_GlobalLog: INF - Releasing vss interface 0x1f0d430 03:02:41.277 [18812.15328] <8> tar_backup_tfi::cleanupTemp: WRN - will not cleanup 'temp' directory, debug level(s) are too high 03:02:41.277 [18812.15328] <2> tar_base::V_vTarMsgW: INF - EXIT STATUS 0: the requested operation was successfully completed 03:02:41.277 [18812.15328] <4> dos_backup::V_PostCleanup: INF - ================================================================================ Try to start with increasing CLIENT_READ_TIMEOUT

HELLO, After reviewing the logs, there is a certain communication/timeout between the client and media server snippet from bpbkar :16:24:10.803 [17432.41552] <2> ov_log::V_GlobalLog: INF - Attached [0x427a860]16:44:39.080 [17432.41552] <2> ov_log::V_GlobalLog: INF - Unattaching [0x427a860] as you can see there is a gap between the two lines (about 20 minutes)from bpbrm logs :15:20:08.927 [8328.9244] <2> bpbrm readline: sending keepalive15:20:13.928 [8328.9244] <2> bpbrm mm_sig: received ready signal from media manager15:21:03.929 [8328.9244] <2> bpbrm send_ping: PING15:31:03.934 [8328.9244] <2> bpbrm send_ping: PINGmedia server is sending a keepalive and ping requests to say "hey i'm still here" but after a while it gives up. You said that changed the client_read_timeout, what is the current value? because i'm still seeing 300 seconds which is 5 minutes which is the gap between the two lines here before it returns the error [8328.9244] <2> bpbrm send_ping: PING16:01:03.938 [8328.9244] <2> bpbrm send_ping: PING16:11:03.947 [8328.9244] <2> bpbrm send_ping: PING16:21:03.948 [8328.9244] <2> bpbrm send_ping: PING16:24:09.032 [8328.9244] <2> non_mpx_backup_archive_verify_import: ESTIMATE -1 -1 srv11420_157133637816:24:10.407 [8328.9244] <2> non_mpx_backup_archive_verify_import: ADDED FILES TO DB FOR srv11420_1571336378 1 /EFI System Partition/EFI/16:29:10.887 [8328.9244] <2> bpbrm readline: bpbrm timeout after 300 seconds16:29:10.887 [8328.9244] <2> bpbrm kill_child_process_Ex: start16:29:15.606 [8328.9244] <2> bpbrm wait_for_child: start16:29:15.606 [8328.9244] <2> bpbrm wait_for_child: child exit_status = 15016:29:15.606 [8328.9244] <2> inform_client_of_status: COMM_SOCK == INVALID_SOCKET, 016:29:15.606 [8328.9244] <2> set_job_details: Tfile (6251779): LOG 1571340555 4 bpbkar32 17432 done. status: 41: network connection timed out16:29:15.606 [8328.9244] <2> send_job_file: job ID 6251779, ftype = 3 msg len = 79, msg = LOG 1571340555 4 bpbkar32 17432 done. status: 41: network connection timed out16:29:15.606 [8328.9244] <2> bpbrm Exit: client snapshot EXIT STATUS 41: network connection timed out16:29:15.621 [8328.9244] <2> vnet_close_socket_safely_ex: [vnet.c:1176] safe close 0 0x016:29:15.621 [8328.9244] <4> JobdSockList::UnregisterSocket: Unregister socket (624).16:29:15.637 [8328.9244] <2> job_monitoring_exex: ACK disconnect16:29:15.637 [8328.9244] <2> job_disconnect: Disconnected try to increase timeout/ and let us know the results. Good luck

danguss wrote: The policy is set to "Allow Multiple data streams" and the odd thing is that only the job pointed to the "System state" that as mentioned is set in the client exclude list fails. What exactly is recorded in this Client's Exclude List? I see the following in the bpbkar log that you posted the other day : 00:23:04.925 [13752.29060] <2> ov_log::V_GlobalLog: INF - Successfully attached to Device 'Shadow?Copy?Components' BackupReason:0x80000:23:04.925 [13752.29060] <2> ov_log::V_GlobalLog: INF - ATTACH_BACKUP_EXCLUDE: Shadow excludes will be processed Attach=0x800...I have extracted all lines containing 'exclude' in attached text file..... too many lines for me to go through.... I want to suggest that you log a Support call with Veritas - they have engineers with time and knowledge to sift through very large log files.

backup fail with status 13

18 Replies

Replies have been turned off for this discussion

Marianne
Level 6
5 years ago
danguss

You also need bpbrm and bptm logs on the media server.

According to Activity Monitor, bpbrm on the media server failed at this point:

15/10/2019 02:57:13 - Error bpbrm (pid=544) socket read failed: errno = 62 - Timer expired

bpbkar on the client however was at this point stil 'happily' generating backup stream until 03:02:

03:02:41.287 [18812.15328] <4> dos_backup::OVDeInitSpecialInfo: INF - Special File: 1 C:\Windows\system32\winevt\logs\HardwareEvents.evtx HardwareEvents
03:02:41.287 [18812.15328] <4> dos_backup::OVDeInitSpecialInfo: INF - Special File: 1 C:\Windows\System32\winevt\Logs\Security.evtx Security
03:02:41.287 [18812.15328] <4> dos_backup::OVDeInitSpecialInfo: INF - Special File: 1 C:\Windows\system32\winevt\Logs\System.evtx System
03:02:41.287 [18812.15328] <4> dos_backup::V_ShadowCopyBackupEnd: INF - foreign snapshot released.

At this point, bpbkar realised that connection with media server is broken:

03:02:41.292 [18812.15328] <2> ov_log::V_GlobalLogEx: KeepAliveThread::isKeepAliveRunning keep alive not running: 2
03:02:41.292 [18812.15328] <16> dtcp_read: TCP - failure: recv socket (124) (TCP 10053: Software caused connection abort)
03:02:42.292 [18812.15328] <16> dtcp_read: TCP - failure: recv socket (124) (TCP 10053: Software caused connection abort)
03:02:43.293 [18812.15328] <16> dtcp_read: TCP - failure: recv socket (124) (TCP 10053: Software caused connection abort)
03:02:43.293 [18812.15328] <16> dtcp_read: TCP - failure: recv socket (100) (TCP 10058: Can't send after socket shutdown)

Level 3 logs on media server will give us this info:

bpbrm (pid=544) will tell you up to what time metadata from client was received and information about the timeout.
bptm ((pid=3595) and child process pid 4153 ) will tell you up to what time actual data was received from the client.
- Hamza_H
  Moderator
  5 years ago
  Hello,
  Plus what Marianne and sdo mentionned, there is absolutly a communication problem between the client and the media, looking at timeout or your firewall (not the local one, but the general one) sometimes the session closes after a mount of time)..
  
  Best of luck,
  - danguss
    Level 4
    5 years ago
    The policy is set to "Allow Multiple data streams" and the odd thing is that only the job pointed to the "System state" that as mentioned is set in the client exclude list fails.
    When you open this job, you measure something from the EFI volume. Could it be something related to this volume?
- danguss
  Level 4
  5 years ago
  I performed another backup with a longer timeout and the backup now fails with status code 41. I increased the verbose of the attached media server.
  Erro 13 SRV11420.rar4.6 MB
- danguss
  Level 4
  5 years ago
  The Detailing of this job occurs between 10/17/2019 15:19:36 to 17/10/2019 16:29:15
  
  Job details.rar1 KB
  - Hamza_H
    Moderator
    5 years ago
    HELLO,
    
    After reviewing the logs, there is a certain communication/timeout between the client and media server
    
    snippet from bpbkar :
    16:24:10.803 [17432.41552] <2> ov_log::V_GlobalLog: INF - Attached [0x427a860]
    16:44:39.080 [17432.41552] <2> ov_log::V_GlobalLog: INF - Unattaching [0x427a860]
    
    as you can see there is a gap between the two lines (about 20 minutes)
    from bpbrm logs :
    15:20:08.927 [8328.9244] <2> bpbrm readline: sending keepalive
    15:20:13.928 [8328.9244] <2> bpbrm mm_sig: received ready signal from media manager
    15:21:03.929 [8328.9244] <2> bpbrm send_ping: PING
    15:31:03.934 [8328.9244] <2> bpbrm send_ping: PING
    
    media server is sending a keepalive and ping requests to say "hey i'm still here"
    but after a while it gives up.
    
    You said that changed the client_read_timeout, what is the current value? because i'm still seeing 300 seconds which is 5 minutes which is the gap between the two lines here before it returns the error
    [8328.9244] <2> bpbrm send_ping: PING
    16:01:03.938 [8328.9244] <2> bpbrm send_ping: PING
    16:11:03.947 [8328.9244] <2> bpbrm send_ping: PING
    16:21:03.948 [8328.9244] <2> bpbrm send_ping: PING
    16:24:09.032 [8328.9244] <2> non_mpx_backup_archive_verify_import: ESTIMATE -1 -1 srv11420_1571336378
    16:24:10.407 [8328.9244] <2> non_mpx_backup_archive_verify_import: ADDED FILES TO DB FOR srv11420_1571336378 1 /EFI System Partition/EFI/
    16:29:10.887 [8328.9244] <2> bpbrm readline: bpbrm timeout after 300 seconds
    16:29:10.887 [8328.9244] <2> bpbrm kill_child_process_Ex: start
    16:29:15.606 [8328.9244] <2> bpbrm wait_for_child: start
    16:29:15.606 [8328.9244] <2> bpbrm wait_for_child: child exit_status = 150
    16:29:15.606 [8328.9244] <2> inform_client_of_status: COMM_SOCK == INVALID_SOCKET, 0
    16:29:15.606 [8328.9244] <2> set_job_details: Tfile (6251779): LOG 1571340555 4 bpbkar32 17432 done. status: 41: network connection timed out
    16:29:15.606 [8328.9244] <2> send_job_file: job ID 6251779, ftype = 3 msg len = 79, msg = LOG 1571340555 4 bpbkar32 17432 done. status: 41: network connection timed out
    16:29:15.606 [8328.9244] <2> bpbrm Exit: client snapshot EXIT STATUS 41: network connection timed out
    16:29:15.621 [8328.9244] <2> vnet_close_socket_safely_ex: [vnet.c:1176] safe close 0 0x0
    16:29:15.621 [8328.9244] <4> JobdSockList::UnregisterSocket: Unregister socket (624).
    16:29:15.637 [8328.9244] <2> job_monitoring_exex: ACK disconnect
    16:29:15.637 [8328.9244] <2> job_disconnect: Disconnected
    
    try to increase timeout/ and let us know the results.
    
    Good luck
sdo
Moderator
5 years ago
Apologies, I've not looked at the logs... but when I consistently get status 13/14 for file-system style backups across SDN (software defined networking) (aka virtual networks) (e.g. Hyper-V NVGRE)... the one consistent solution for me (for quite a few (and I do mean like a 100+) guest VMs) has been to disable client-side-dedupe, and only very occaisionally to also disable accelerator.
What I mean is... that.. accelerator doesn't seem to be a problem for me... but client-side dedupe... hmmm... there's something, that I can't quite yet put my finger on, about client-side-dedupe that seems to not sit too happily with SDN, e.g. NAT across NVGRE.
So, without looking at your logs... maybe you could turn off client-side dedupe for the problem clients?
You don't need to make any policy adjustments... because client-side-dedupe is granular at the client-name level, e.g. :
bpclient -update -client blah.com -client_direct 0
- danguss
  Level 4
  5 years ago
  sdo
  Deduplication is always performed by the Media server. In the case of this policy the backup is directed to a stage area (basic disk).
  I did some tests on changing the media server, increasing the client_read_timeout and it doesn't seem to work.
  while analyzing the bpbkar logs I found the following messages:
  
  01:38:01.856 [18812.15328] <2> ov_log::V_GlobalLog: INF - EFI Partition mount access point is GLOBALROOT\Device\HarddiskVolume2!
  01:38:01.868 [18812.15328] <2> ov_log::V_GlobalLog: INF - CreateFile(\\.\EFI>System>Partition) failed. return code: 2
  01:27:24.806 [27532.19004] <16> dtcp_read: TCP - failure: recv socket (736) (TCP 10053: Software caused connection abort)
  01:27:25.807 [27532.19004] <16> dtcp_read: TCP - failure: recv socket (736) (TCP 10053: Software caused connection abort)
  01:27:25.807 [27532.19004] <16> dtcp_read: TCP - failure: recv socket (740) (TCP 10058: Can't send after socket shutdown)
  01:27:25.807 [27532.19004] <4> OVShutdown: INF - Shutdown wait finished
  01:27:25.807 [27532.19004] <4> OVShutdown: INF - Closing stderr connection
  01:27:25.807 [27532.19004] <2> dtcp_close: TCP - success: close socket (740)
  01:27:25.807 [27532.19004] <4> OVShutdown: INF - Closing stdout/stdin connection
  01:27:25.807 [27532.19004] <2> dtcp_close: TCP - success: close socket (736)
  01:27:25.807 [27532.19004] <4> OVShutdown: INF - Finished process
  01:38:01.856 [18812.15328] <2> ov_log::V_GlobalLog: INF - EFI Partition mount access point is GLOBALROOT\Device\HarddiskVolume2!
  01:38:01.868 [18812.15328] <2> ov_log::V_GlobalLog: INF - CreateFile(\\.\EFI>System>Partition) failed. return code: 2
  Do these messages mean anything?
  - sdo
    Moderator
    5 years ago
    I think EFI messages can be ignored for now.
    The first error is TCP status 10053, which is listed here:
    https://docs.microsoft.com/en-us/windows/win32/winsock/windows-sockets-error-codes-2
    As for why the underlying network TCP conversation failed, then I'm not sure that we can tell from the bpbkar log snippet. I think you might have to google for causes of TCP 10053 and posisble solutions.

Mike_Gavrilov

Level 6

5 years ago


02:42:13.321 [18812.15328] <2> ov_log::V_GlobalLog: INF - VssSnapshotVolume::BackupComplete(): Trying to get lock to run BackupComplete
02:42:13.321 [18812.15328] <2> ov_log::V_GlobalLog: INF - VssSnapshotVolume::BackupComplete(): Got lock - running BackupComplete
02:42:13.321 [18812.15328] <2> ov_log::V_GlobalLog: INF - Attached [0x1f0d430]
          At this operation time out occurred on media server but the client doesn't know yet
03:02:40.443 [18812.15328] <2> ov_log::V_GlobalLog: INF - Unattaching [0x1f0d430]
03:02:40.443 [18812.15328] <2> ov_log::V_GlobalLog: INF - Attempting to delete VSS snapshot set: {81b8f0aa-336e-44a9-a84d-2686026a0f91}
03:02:41.136 [18812.15328] <2> ov_log::V_GlobalLog: INF - VSS snapshot set deleted. Snapshot SetID = {81b8f0aa-336e-44a9-a84d-2686026a0f91}
03:02:41.137 [18812.15328] <2> ov_log::V_GlobalLog: INF - Freeing gathered writer metadata
03:02:41.137 [18812.15328] <2> ov_log::V_GlobalLog: INF - Releasing vss interface 0x1f0d430
03:02:41.277 [18812.15328] <8> tar_backup_tfi::cleanupTemp: WRN - will not cleanup 'temp' directory, debug level(s) are too high
03:02:41.277 [18812.15328] <2> tar_base::V_vTarMsgW: INF - EXIT STATUS 0: the requested operation was successfully completed
03:02:41.277 [18812.15328] <4> dos_backup::V_PostCleanup: INF - ================================================================================

Try to start with increasing CLIENT_READ_TIMEOUT

danguss
Level 4
5 years ago
Mike_Gavrilov I had already made this change and it seems to have no effect. = (

Forum Discussion

backup fail with status 13

18 Replies

Related Content

BAckup fails with status 24

VM backup failing with status 23 and status 6

VM Backup failed with status code 6

Backup fails with status 58.

Incremental backup fail with status code 1

Recent Discussions

MS-SharePoint policy restore error (2804) .

How to restore a backup

How to configure RBAC

command: bperror

10 years old netbackup appliance database service down, ssl certification out date