San Client Backup Error Netbackup 7.1.0.2

jounix
Level 3

I have a Linux SAN client and a Solaris media server. Backups of the SAN client fail with status 13. Most files on the server back up fine, but the backup errors out when it reaches a certain point.

This is the end of the bpbrm log. It shows where the media server aborts the backup for no apparent reason.

13:33:10.673 [10047] <2> vnet_pbxConnect: pbxConnectEx Succeeded
13:33:10.675 [10047] <2> do_pbx_service: ../../libvlibs/vnet_connect.c.1776: 0: via PBX: VNETD CONNECT FROM 10.248.234.126.45013 TO 10.248.237.60.1556 fd = 11
13:33:10.675 [10047] <2> vnet_async_connect: ../../libvlibs/vnet_connect.c.1367: 0: connect: async CONNECT FROM 10.248.234.126.45013 TO 10.248.237.60.1556 fd = 11
13:33:10.675 [10047] <2> vnet_cached_getaddrinfo_and_update: ../../libvlibs/vnet_addrinfo.c.1370: 0: found in cache name: 10.248.237.60
13:33:10.675 [10047] <2> vnet_cached_getaddrinfo_and_update: ../../libvlibs/vnet_addrinfo.c.1371: 0: found in cache service: NULL
13:33:10.675 [10047] <2> vnet_vnetd_connect_forward_socket_begin: ../../libvlibs/vnet_vnetd.c.445: 0: VN_REQUEST_CONNECT_FORWARD_SOCKET: 10 0x0000000a
13:33:10.720 [10047] <2> vnet_vnetd_connect_forward_socket_begin: ../../libvlibs/vnet_vnetd.c.462: 0: ipc_string: /tmp/vnet-24727326479590415632000000488-JEccVT
13:33:10.831 [10047] <2> vnet_vnetd_connect_forward_socket_begin: ../../libvlibs/vnet_vnetd.c.473: 0: hash_str1: b45c0891a78e07ed4e66604e06c7d783
13:33:10.951 [10047] <2> bpbrm mail_status: OUT_SOCK from bpcr = 10
13:33:10.951 [10047] <2> bpbrm mail_status: IN_SOCK from bpcr = 11
13:33:10.951 [10047] <2> OpenMailPipe: /usr/ucb/mail -s "Backup on frux0327 - 13" "ush@geico.com" >/dev/null 2>/dev/null
13:33:11.070 [10047] <2> bpcr_get_version_rqst: bpcd version: 07100000
13:33:11.192 [10047] <2> bpcr_get_version_rqst: bpcd version: 07100000
13:33:11.430 [10047] <2> bpcr_get_version_rqst: bpcd version: 07100000
13:33:12.054 [10047] <2> bpbrm read_parent_msg: read from parent TERMINATE
13:33:12.054 [10047] <2> bpbrm tell_mm: sending media manager msg: TERMINATE
13:33:12.054 [10047] <2> set_job_details: Tfile (8366971): LOG 1326479592 4 bpbrm 10047 sending media manager msg: TERMINATE

 Here are the details from the Activity Monitor:

 

01/13/2012 13:21:34 - Info nbjm (pid=2338) starting backup job (jobid=8366971) for client frux0327, policy UNIX_TEST_FRUX0327, schedule Incr
01/13/2012 13:21:35 - estimated 0 kbytes needed
01/13/2012 13:21:35 - Info nbjm (pid=2338) started backup job for client frux0327, policy UNIX_TEST_FRUX0327, schedule Incr on storage unit nbuf3a_sep2
01/13/2012 13:21:36 - Info bpbrm (pid=10047) starting bptm
01/13/2012 13:21:36 - Info bpbrm (pid=10047) Started media manager using bpcd successfully
01/13/2012 13:21:36 - started process bpbrm (pid=10047)
01/13/2012 13:21:38 - Info bpbrm (pid=10047) frux0327 is the host to backup data from
01/13/2012 13:21:38 - Info bpbrm (pid=10047) telling media manager to start backup on client
01/13/2012 13:21:38 - Info bptm (pid=10052) using 65536 data buffer size
01/13/2012 13:21:38 - Info bptm (pid=10052) using 32 data buffers
01/13/2012 13:21:38 - Opening Fibre Transport connection, Backup Id: frux0327_1326478895
01/13/2012 13:21:39 - Info bpbrm (pid=10047) spawning a brm child process
01/13/2012 13:21:39 - Info bpbrm (pid=10047) child pid: 10053
01/13/2012 13:21:39 - Info bptm (pid=10052) start backup
01/13/2012 13:21:39 - Info bptm (pid=10052) Waiting for mount of media id SF2037 (copy 1) on server nbuf3a.
01/13/2012 13:21:39 - mounting SF2037
01/13/2012 13:21:39 - connecting
01/13/2012 13:21:40 - Info bpbrm (pid=10047) sending bpsched msg: CONNECTING TO CLIENT FOR frux0327_1326478895
01/13/2012 13:21:40 - Info bpbrm (pid=10047) start bpbkar on client
01/13/2012 13:21:40 - Info bpbkar (pid=30476) Backup started
01/13/2012 13:21:40 - Info bpbrm (pid=10047) Sending the file list to the client
01/13/2012 13:21:40 - connected; connect time: 0:00:00
01/13/2012 13:21:47 - Info bptm (pid=10052) media id SF2037 mounted on drive index 24, drivepath /dev/rmt/16cbn, drivename Sep2-40, copy 1
01/13/2012 13:21:47 - mounted SF2037; mount time: 0:00:08
01/13/2012 13:21:47 - positioning SF2037 to file 1
01/13/2012 13:21:48 - positioned SF2037; position time: 0:00:01
01/13/2012 13:21:48 - begin writing
01/13/2012 13:32:32 - Error bpbrm (pid=10053) socket read failed: errno = 131 - Connection reset by peer
01/13/2012 13:32:32 - Info bpbrm (pid=10047) sending media manager msg: STOP BACKUP frux0327_1326478895
01/13/2012 13:32:37 - end writing; write time: 0:10:49
01/13/2012 13:32:38 - Info bpbrm (pid=10047) media manager for backup id frux0327_1326478895 exited with status 150: termination requested by administrator
01/13/2012 13:33:12 - Info bpbrm (pid=10047) sending media manager msg: TERMINATE
file read failed  (13)
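
To put a number on "a certain point", here is a minimal sketch that measures how long the job was writing before the first Error line appeared. It assumes the job details use the `MM/DD/YYYY HH:MM:SS - ...` layout shown above; the function names are hypothetical, not part of NetBackup.

```python
from datetime import datetime

# Timestamp layout assumed from the Activity Monitor output above.
STAMP_FORMAT = "%m/%d/%Y %H:%M:%S"


def parse_stamp(line):
    """Return the leading timestamp of a job-detail line, or None if absent."""
    try:
        return datetime.strptime(line[:19], STAMP_FORMAT)
    except ValueError:
        return None


def seconds_until_first_error(lines):
    """Seconds between 'begin writing' and the first ' - Error ' line, or None."""
    started = None
    for line in lines:
        stamp = parse_stamp(line)
        if stamp is None:
            continue  # e.g. the trailing "file read failed  (13)" line
        if started is None and "begin writing" in line:
            started = stamp
        elif started is not None and " - Error " in line:
            return (stamp - started).total_seconds()
    return None


# Two lines copied from the job details above.
job_details = [
    "01/13/2012 13:21:48 - begin writing",
    "01/13/2012 13:32:32 - Error bpbrm (pid=10053) socket read failed: "
    "errno = 131 - Connection reset by peer",
]
print(seconds_until_first_error(job_details))  # 644.0
```

Comparing this figure across several failed jobs would show whether the failure tracks elapsed time or a position in the file list.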

3 REPLIES 3

Marianne
Level 6
Partner    VIP    Accredited Certified

You cannot look at only the last 2 seconds of one log file to try to determine the reason for the failure...

You need all of the following logs (between 13:21 and 13:33):

On media server: bpbrm and bptm

On client : bpbkar

bpbkar will tell if any data was sent. bptm will tell if any data was received. If no data was received, we are probably looking at a timeout.
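
A rough sketch of cutting each log down to that window, assuming every entry starts with an `HH:MM:SS.mmm` timestamp as in the bpbrm excerpt above. The function name is hypothetical, and lines without a leading timestamp (continuation lines) are skipped.

```python
from datetime import datetime, time


def lines_in_window(lines, start, end):
    """Yield log lines whose leading HH:MM:SS.mmm stamp lies in [start, end]."""
    for line in lines:
        try:
            stamp = datetime.strptime(line[:12], "%H:%M:%S.%f").time()
        except ValueError:
            continue  # no parseable timestamp at the start of the line
        if start <= stamp <= end:
            yield line


# In-memory sample; the second line is copied from the bpbrm excerpt above,
# the others are made-up filler to show the filtering.
sample = [
    "13:20:59.000 [10047] <2> filler: before the window",
    "13:33:12.054 [10047] <2> bpbrm tell_mm: sending media manager msg: TERMINATE",
    "a continuation line without a timestamp",
    "13:34:00.000 [10047] <2> filler: after the window",
]

for entry in lines_in_window(sample, time(13, 21), time(13, 33, 59, 999999)):
    print(entry)
```

The same function can be fed an open file object for each of the bpbrm, bptm and bpbkar logs so the three can be read side by side.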

jounix
Level 3

I should have mentioned that. Both logs have been evaluated: data was sent and data was received. The last 2 seconds of the log file are where the error first occurs, and the job fails from that point.

Omar_Villa
Level 6
Employee

Why don't you try running a test over the TCP/IP layer? Error 13 is more of a file system issue than a SAN Client issue. I would try running the same backup across the LAN. I know it will take longer, but if it fails then you can create the touch file /usr/openv/netbackup/bpbkar_path_tr and catch the corrupted file that is giving you a hard time.

Regards.
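
For anyone scripting that last step, a minimal sketch of creating and later removing the touch file. The path is the one given in the reply above; the helper names are hypothetical, and whether bpbkar honors the file on a given NetBackup version is taken from the reply, not verified here.

```python
from pathlib import Path

# Path quoted in the reply above; needs write access to /usr/openv/netbackup.
DEFAULT_FLAG = Path("/usr/openv/netbackup/bpbkar_path_tr")


def enable_path_trace(flag=DEFAULT_FLAG):
    """Create the empty flag file, equivalent to running `touch` on it."""
    flag.touch(exist_ok=True)
    return flag


def disable_path_trace(flag=DEFAULT_FLAG):
    """Remove the flag file again; return False if it was not there."""
    if flag.exists():
        flag.unlink()
        return True
    return False
```

Run `enable_path_trace()` on the client before the test backup and `disable_path_trace()` afterwards, so the extra logging does not stay on.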