cancel
Showing results for 
Search instead for 
Did you mean: 

backup is failing with erro 13

rupadeviy
Level 4

Hi All,

backup is failing with error 13, backup is failing after writing some data.

job details :

 

12/01/2017 21:43:09 - Info bptm (pid=9218) backup child process is pid 9741
12/01/2017 21:43:09 - begin writing
12/01/2017 21:43:19 - Info bpbkar (pid=23118) 5000 entries sent to bpdbm
12/01/2017 21:45:30 - Info bpbrm (pid=9160) from client cddbrp01f.backup.internal.vf.net.nz: TRV - [/sys] is on file system type sysfs. Skipping
12/01/2017 22:07:53 - Error bpbrm (pid=9160) socket read failed: errno = 62 - Timer expired
12/01/2017 22:07:55 - Error bptm (pid=9218) media manager terminated by parent process
12/01/2017 22:08:05 - Info nbmsrp03f (pid=9218) StorageServer=hp-StoreOnceCatalyst:COFC-2M2601282501; Report=scanned: 39145217 KB, CR sent: 30823728 KB, dedup: 21.26%
12/01/2017 22:09:05 - Error bpbrm (pid=9160) cannot send mail to DL-INFRA-NZ-BACKUP-TSSC@vodafone.com
12/01/2017 22:09:35 - Error bpbrm (pid=9160) cannot send mail to root on client cddbrp01f.backup.internal.vf.net.nz
12/01/2017 22:09:35 - Info bpbkar (pid=23118) done. status: 13: file read failed

bptestbpcd output:

0 1 2
172.18.192.20:624 -> 172.27.226.64:13782
172.18.192.20:1556 <- 172.27.226.64:64974
172.18.192.20:1556 <- 172.27.226.64:44692

 

 

 

 

 

7 REPLIES 7

Marianne
Level 6
Partner    VIP    Accredited Certified

Status 13 is not a connectivity issue - it is a network-related problem that occurred some time after the start of the data transfer.

socket read failed: errno = 62 - Timer expired 
is normally Client Read Timeout.

You can check on the media server what the current value is. If 300, then try to change it to 1800.
You don't want to make this value too big as it will simply delay the error if something else is wrong, e.g. bpbkar process getting hung or stuck on a specific folder.

To troubleshoot, ensure that all of these log folders exist: 
On media server: bpbrm and bptm
On client: bpbkar
Level 3 logs should be fine.

PS:
You may want to remove domain names when you copy and paste info in a public forum.

Bpbrm logs on client

 

22:29:57.054 [24785] <2> bpbkar SelectFile: cwd=/migrate path=db_full_08sl2b78_8_1
22:30:08.394 [24785] <2> bpbkar PrintFile: /migrate/db_full_08sl2b78_8_1
22:30:08.394 [24785] <2> bpbkar SelectFile: cwd=/migrate path=test_file5.txt
22:30:28.416 [24785] <2> bpbkar PrintFile: /migrate/test_file5.txt
22:30:28.416 [24785] <2> bpbkar SelectFile: cwd=/migrate path=db_full_05sl1us4_5_1
22:54:20.048 [24785] <16> flush_archive(): ERR - Cannot write to STDOUT. Errno = 32: Broken pipe
22:54:20.049 [24785] <16> bpbkar Exit: ERR - bpbkar FATAL exit status = 24: socket write failed
22:54:20.049 [24785] <4> bpbkar Exit: INF - EXIT STATUS 24: socket write failed
22:54:20.049 [24785] <2> bpbkar Exit: INF - Close of stdout complete
22:54:20.050 [24785] <4> bpbkar Exit: INF - setenv FINISHED=0

Marianne
Level 6
Partner    VIP    Accredited Certified

You need to look at all logs that I have mentioned in order to follow the entire backup data flow.
Follow the specific PIDs that you see in job details.
bptm (pid=9218)
bpbrm (pid=9160)
bpbrm (pid=9160)

Just a snippet of a single log file does not help. 

 

You are getting an error from bpbkar. Look in the bpbkar log on the client to see why.

Bpbkar writes to two sockets. It sends messages to bpbrm to tell how the backup is going. These are getting through, so consider the other socket. Bpkar sends data to bptm. @Marianne has already suggested checking the bptm log. I suggest you match it up with the bpbkar log.

-----

<edit> I see on a more careful read that the bpkar failure came after bptm failed. That's natural. You're back to the relationship between bpbrm and bptm.

please check below logs and suggest me

 

Below are the logs:

 

 

Detail status:

 

12/04/2017 09:38:46 - Info bpbkar (pid=28023) Backup started
12/04/2017 09:38:46 - Info bpbrm (pid=27316) bptm pid: 27342
12/04/2017 09:38:46 - started process bpbrm (pid=27316)
12/04/2017 09:38:46 - connecting
12/04/2017 09:38:46 - connected; connect time: 0:00:00
12/04/2017 09:38:46 - Info bptm (pid=27342) start
12/04/2017 09:38:47 - Info bptm (pid=27342) using 262144 data buffer size
12/04/2017 09:38:47 - Info bptm (pid=27342) setting receive network buffer to 1049600 bytes
12/04/2017 09:38:47 - Info bptm (pid=27342) using 30 data buffers
12/04/2017 09:38:48 - Info bptm (pid=27342) start backup
12/04/2017 09:38:51 - Info bptm (pid=27342) backup child process is pid 27585
12/04/2017 09:38:51 - begin writing
12/04/2017 09:39:05 - Info bpbrm (pid=27316) from client cd: TRV - [/tmp/.font-unix/fs7100] is a socket special file. Skipping
12/04/2017 09:39:05 - Info bpbrm (pid=27316) from client cd: TRV - [/tmp/ssh-DqVcd16242/agent.16242] is a socket special file. Skipping
12/04/2017 09:39:15 - Info bpbkar (pid=28023) 5000 entries sent to bpdbm
12/04/2017 09:39:19 - Info bpbrm (pid=27316) from client cd: TRV - [/tmp/ssh-RcfeE21592/agent.21592] is a socket special file. Skipping
12/04/2017 09:39:20 - Info bpbkar (pid=28023) 10000 entries sent to bpdbm
12/04/2017 09:39:28 - Info bpbkar (pid=28023) 15000 entries sent to bpdbm
12/04/2017 09:39:35 - Info bpbkar (pid=28023) 20000 entries sent to bpdbm
12/04/2017 09:39:41 - Info bpbkar (pid=28023) 25000 entries sent to bpdbm
12/04/2017 09:39:47 - Info bpbkar (pid=28023) 30000 entries sent to bpdbm
12/04/2017 09:39:53 - Info bpbkar (pid=28023) 35000 entries sent to bpdbm
12/04/2017 09:40:01 - Info bpbkar (pid=28023) 40000 entries sent to bpdbm
12/04/2017 09:40:04 - Info bpbkar (pid=28023) 45000 entries sent to bpdbm
12/04/2017 09:40:43 - Info bpbkar (pid=28023) 50000 entries sent to bpdbm
12/04/2017 09:40:43 - Info bpbrm (pid=27316) from client c: TRV - [/sys] is on file system type sysfs. Skipping
12/04/2017 10:06:23 - Error bpbrm (pid=27316) socket read failed: errno = 62 - Timer expired
12/04/2017 10:06:25 - Error bptm (pid=27342) media manager terminated by parent process
12/04/2017 10:06:52 - Info nbmsrp03f (pid=27342) StorageServer=hp-StoreOnceCatalyst:COFC-; Report=scanned: 45004033 KB, CR sent: 36844858 KB, dedup: 18.13%
12/04/2017 10:07:42 - Error bpbrm (pid=27316) cannot send mail to
12/04/2017 10:08:12 - Error bpbrm (pid=27316) cannot send mail to root on client cddbrp01f.backup.internal.vf.net.nz
12/04/2017 10:08:12 - Info bpbkar (pid=28023) done. status: 13: file read failed
12/04/2017 10:08:12 - end writing; write time: 0:29:21
file read failed  (13)

 

bpbrm log :

10:07:42.262 [27316] <2> isDONE: reply DONE 0 <?xml version="1.0" encoding="utf-8"?><StatusMsgDoc><StatusMsg NBUStatus="0" Severity="0" VXULoid="271" VXULtid="0"><MsgString>the&#32;requested&#32;operation&#32;was&#32;successfully&#32;completed</MsgString></StatusMsg></StatusMsgDoc>
10:07:42.262 [27316] <2> isDONE: reply DONE 0
10:07:48.796 [27316] <2> bpcr_get_version_rqst: bpcd version: 07610000
10:07:54.896 [27316] <2> bpcr_get_version_rqst: bpcd version: 07610000
10:08:12.264 [27316] <2> get_long_base: (1) cannot read (byte 1) from network: Interrupted system call (4)
10:08:12.264 [27316] <2> bpcr_send_mail_rqst: couldn't read the user's (root) configuration file
10:08:12.264 [27316] <16> bpbrm Exit: cannot send mail to root on client cddbrp01f.backup.internal.vf.net.nz
10:08:12.264 [27316] <2> set_job_details: Tfile (7457061): LOG 1512335292 16 bpbrm 27316 cannot send mail to root on client cddbrp01f.backup.internal.vf.net.nz
10:08:12.264 [27316] <2> send_job_file: job ID 7457061, ftype = 3 msg len = 101, msg = LOG 1512335292 16 bpbrm 27316 cannot send mail to root on client cddbrp01f.backup.internal.vf.net.nz
10:08:12.264 [27316] <2> ConnectionCache::connectAndCache: Acquiring new connection for host nbcsrp01f.backup.internal.vf.net.nz, query type 1
10:08:12.264 [27316] <2> vnet_sortaddrs: [vnet_addrinfo.c:3852] sorted addrs: 1 0x1
10:08:12.264 [27316] <2> vnet_get_pref_netconnection: [vnet_addrinfo.c:4750] using interface  ANY
10:08:12.265 [27316] <2> vnet_sortaddrs: [vnet_addrinfo.c:3852] sorted addrs: 1 0x1
10:08:12.265 [27316] <2> vnet_get_pref_netconnection: [vnet_addrinfo.c:4750] using interface  ANY
10:08:12.265 [27316] <2> vnet_sortaddrs: [vnet_addrinfo.c:3852] sorted addrs: 1 0x1
10:08:12.265 [27316] <2> vnet_get_pref_netconnection: [vnet_addrinfo.c:4750] using interface  ANY
10:08:12.265 [27316] <2> async_connect: [vnet_connect.c:1499] connect in progress 1 0x1
10:08:12.266 [27316] <2> vnet_pbxConnect: pbxConnectEx Succeeded
10:08:12.266 [27316] <2> do_pbx_service: [vnet_connect.c:2164] via PBX bpdbm CONNECT FROM 172.27.209.53.42165 TO 172.18.192.20.1556 fd = 6
10:08:12.267 [27316] <2> async_connect: [vnet_connect.c:1691] connect async CONNECT FROM 172.27.209.53.42165 TO 172.18.192.20.1556 fd = 6
10:08:12.267 [27316] <2> connect_to_service: connect succeeded STATUS (0) SUCCESS FROM 0.0.0.0 TO 1f.backup.internal..net.nz 172.18.192.20 bpdbm VIA pbx
10:08:12.267 [27316] <2> logconnections: BPDBM CONNECT FROM 172.27.209.53.42165 TO 172.18.192.20.1556 fd = 6
10:08:12.267 [27316] <2> vnet_check_vxss_client_magic_with_info_ex: [vnet_vxss_helper.c:939] Ignoring VxSS authentication 2 0x2
10:08:12.273 [27316] <2> db_end: Need to collect reply
10:08:12.282 [27316] <2> isDONE: reply DONE 0 <?xml version="1.0" encoding="utf-8"?><StatusMsgDoc><StatusMsg NBUStatus="0" Severity="0" VXULoid="271" VXULtid="0"><MsgString>the&#32;requested&#32;operation&#32;was&#32;successfully&#32;completed</MsgString></StatusMsg></StatusMsgDoc>
10:08:12.282 [27316] <2> isDONE: reply DONE 0
10:08:12.283 [27316] <2> set_job_details: Tfile (7457061): LOG 1512335292 4 bpbkar 28023 done. status: 13: file read failed
10:08:12.283 [27316] <2> send_job_file: job ID 7457061, ftype = 3 msg len = 65, msg = LOG 1512335292 4 bpbkar 28023 done. status: 13: file read failed
10:08:12.283 [27316] <2> bpbrm Exit: client backup EXIT STATUS 13: file read failed
10:08:12.284 [27316] <2> vnet_close_socket_safely: [vnet.c:1101] safe close 0 0x0
10:08:12.292 [27316] <2> job_monitoring_exex: ACK disconnect
10:08:12.292 [27316] <2> job_disconnect: Disconnected

 

bptm log :

ORE1, Session=7457061:bptm:27342:03f}  return: OK (delete_image)
10:06:26.577 [27342] <2> delete_image_disk_sts_impl: Deleting disk header for cd_1512333525_C1_HDR
10:06:26.577 [27342] <4> 7457061:bptm:27342:03f: ----- Entering delete_image {Server=hp-StoreOnceCatalyst:COFC-2M2601282501, Lsu=STORE1, Session=7457061:bptm:27342:nbmsrp03f}  (delete_image)
10:06:26.577 [27342] <4> 7457061:bptm:27342:nbmsrp03f: Delete image: cd_1512333525_C1_HDR (delete_image)
10:06:26.787 [27342] <4> 7457061:bptm:27342:nbmsrp03f: ----- Leaving delete_image {Server=hp-StoreOnceCatalyst:COFC, Lsu=STORE1, Session=7457061:bptm:27342:nb}  return: OK (delete_image)
10:06:26.788 [27342] <2> ConnectionCache::connectAndCache: Acquiring new connection for host nbcsrp01f.backup.internal.vf.net.nz, query type 161
10:06:26.790 [27342] <2> vnet_pbxConnect: pbxConnectEx Succeeded
10:06:26.791 [27342] <2> logconnections: BPDBM CONNECT FROM 172.27.209.53.45989 TO 172.18.192.20.1556 fd = 0
10:06:26.799 [27342] <2> db_end: Need to collect reply

10:06:52.749 [27342] <2> send_MDS_msg: MEDIA_DONE 0 -1505639338 0 @aaaaK 0 0 {F525EB32-D869-11E7-AE90-17D6817F5938}
10:06:52.749 [27342] <2> JobInst::sendIrmMsg: starting
10:06:52.749 [27342] <2> packageBptmResourceDoneMsg: msg (MEDIA_DONE 0 -1505639338 0 @aaaaK 0 0 {F525EB32-D869-11E7-AE90-17D6817F5938})
10:06:52.749 [27342] <2> packageBptmResourceDoneMsg: keyword MEDIA_DONE version 0 jobid -1505639338 copyNum 0 mediaId @aaaaK mediaKey 0 unloadDelay 0 allocId {F525EB32-D869-11E7-AE90-17D6817F5938}
10:06:52.749 [27342] <2> packageBptmResourceDoneMsg: returns 0
10:06:52.751 [27342] <2> JobInst::sendIrmMsg: returning
10:06:52.760 [27342] <2> catch_signal: EXITING with status 82
10:06:52.763 [27342] <4> 7457061:bptm:27342:nbmsrp03f: ----- Entering get_server_prop {Server=hp-StoreOnceCatalyst:COFC-2M2601282501, Session=7457061:bptm:27342:nbmsrp03f}  (get_server_prop)
10:06:52.763 [27342] <2> 7457061:bptm:27342:03f: SERVER INFO (get_server_prop)
10:06:52.763 [27342] <2> 7457061:bptm:27342:03f: ============= (get_server_prop)
10:06:52.763 [27342] <2> 7457061:bptm:27342:03f: srv_server:hp-StoreOnceCatalyst:COFC-2M2601282501 (get_server_prop)
10:06:52.763 [27342] <2> 7457061:bptm:27342:03f: srv_sth:0x7fff47714850 (get_server_prop)
10:06:52.763 [27342] <2> 7457061:bptm:27342:03f: srv_flags:790 (get_server_prop)
10:06:52.763 [27342] <2> 7457061:bptm:27342:03f: srv_maxconnect:0 (get_server_prop)
10:06:52.763 [27342] <2> 7457061:bptm:27342:03f: srv_nconnect:1 (get_server_prop)
10:06:52.763 [27342] <2> 7457061:bptm:27342:03f: srv_cap:0x7fff47714a5c (get_server_prop)
10:06:52.763 [27342] <2> 7457061:bptm:27342:03f: srv_interface:0x7fff47714b60 (get_server_prop)
10:06:52.763 [27342] <2> 7457061:bptm:27342:03f: srv_cred:0x7fff47718b60 (get_server_prop)
10:06:52.763 [27342] <2> 7457061:bptm:27342:03f: srv_ndmp_host: (get_server_prop)
10:06:52.763 [27342] <4> 7457061:bptm:27342:03f: ----- Leaving get_server_prop {Server=hp-StoreOnceCatalyst:COFC-2M2601282501, Session=7457061:bptm:27342:03f}  return: OK (get_server_prop)
10:06:52.763 [27342] <2> report_dedup_stats: include stats: verify: ops=0, size=0, time=0 include: ops=0, size=0, time=0
10:06:52.763 [27342] <4> 7457061:bptm:27342:03f: ----- Entering ioctl {Server=hp-StoreOnceCatalyst:COFC-2M2601282501, Session=7457061:bptm:27342:03f}  (ioctl)
10:06:52.763 [27342] <8> 7457061:bptm:27342:03f: ioctl: Wrote 46084129792 logical bytes, cloned 0 bytes, wrote 360378130 meta bytes, wrote 37729135211 dedupe bytes (ioctl)
10:06:52.763 [27342] <4> 7457061:bptm:27342:nbmsrp03f: ----- Leaving ioctl {Server=hp-StoreOnceCatalyst:COFC-2M2601282501, Session=7457061:bptm:27342:03f}  return: OK (ioctl)
10:06:52.763 [27342] <2> ConnectionCache::connectAndCache: Acquiring new connection for host nbcsrp01f.backup.internal.vf.net.nz, query type 1
10:06:52.765 [27342] <2> vnet_pbxConnect: pbxConnectEx Succeeded
10:06:52.765 [27342] <2> logconnections: BPDBM CONNECT FROM 172.27.209.53.53383 TO 172.18.192.20.1556 fd = 0
10:06:52.771 [27342] <2> db_end: Need to collect reply
10:06:52.777 [27342] <4> report_dedup_stats: StorageServer=hp-StoreOnceCatalyst:COFC-2M2601282501; Report=scanned: 45004033 KB, CR sent: 36844858 KB, dedup: 18.13%
10:06:52.777 [27342] <2> set_job_details: Tfile (7457061): LOG 1512335212 4 03f 27342 StorageServer=hp-StoreOnceCatalyst:COFC-2M2601282501; Report=scanned: 45004033 KB, CR sent: 36844858 KB, dedup: 18.13%
10:06:52.777 [27342] <2> send_job_file: job ID 7457061, ftype = 3 msg len = 152, msg = LOG 1512335212 4 03f 27342 StorageServer=hp-StoreOnceCatalyst:COFC-2M2601282501; Report=scanned: 45004033 KB, CR sent: 36844858 KB, dedup: 18.13%
10:06:52.777 [27342] <2> set_job_details: jobData (7457061)
10:06:52.777 [27342] <2> send_structure_data: Index 52 Field m_nDedupRatio Value <18.129999>
10:06:52.786 [27342] <2> job_monitoring_exex: ACK disconnect
10:06:52.786 [27342] <2> job_disconnect: Disconnected
10:06:52.786 [27342] <4> 7457061:bptm:27342:03f: ----- Entering close_server {Server=hp-StoreOnceCatalyst:COFC-2M2601282501, Session=7457061:bptm:27342:03f}  (close_server)
10:06:52.786 [27342] <4> 7457061:bptm:27342:03f: Closing server hp-StoreOnceCatalyst:COFC-2M2601282501 (close_server)
10:06:52.786 [27342] <2> 7457061:bptm:27342:03f: Deleting Server Handle (close_server)
10:06:52.786 [27342] <2> 7457061:bptm:27342:03f: Setting Server Handle NULL (close_server)
10:06:52.786 [27342] <2> 7457061:bptm:27342:03f: Done (close_server)
10:06:52.786 [27342] <4> 7457061:bptm:27342:03f: ----- Leaving close_server {Server=hp-StoreOnceCatalyst:COFC-2M2601282501, Session=7457061:bptm:27342:03f}  return: OK (close_server)
10:06:52.786 [27342] <2> cleanup: Detached from BPBRM shared memory

 

Sujay24
Level 4
Employee

Hi Rupa,

Add client name in Resilient network:-

Admin console->Master server host properties->Resilient Network->Add Client name here and ON the resiliency

Or

Admin Console->Media server Host properties->Select the media server used for backups->Timeout set Client Read timeout to 3600 and Uncheck use OS dependent timeout and set File Browse time to 300

Admin Console->Client Host properties->Select the Client of whose backup is taken->Timeout set Client Read timeout to 3600 and Uncheck use OS dependent timeout and set File Browse time to 300

Initiate the backup and monitor the issue.

Your bpbrm and bptm logs snippets start after the failure.

>12/04/2017 10:06:23 - Error bpbrm (pid=27316) socket read failed: errno = 62 - Timer expired

What happened between 9:40:43 and 10:06:23? Up until 9:40 you had bpbkar reporting to bpbpm that it was sending data to bptm. What's in the bpbkar and bptm logs during that gap?