Forum Discussion

symsonu
Level 6
11 years ago
Solved

Few streams failed with status code 49 (client did not start)

 

Below are the logs from the detailed status of the job:

12/02/2014 08:34:51 - Info nbjm (pid=5922) starting backup job (jobid=20968) for client ossbkupIP, policy OSS_i386_DATA_ossbkupIP_syb1bkupIP, schedule Daily_Incr
12/02/2014 08:34:51 - Info nbjm (pid=5922) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=20968, request id:{68FF7CD4-79C3-11E4-97E9-0017A477FC20})
12/02/2014 08:34:51 - requesting resource bksrst1-hcart-robot-tld-0
12/02/2014 08:34:51 - requesting resource bksrst1.NBU_CLIENT.MAXJOBS.ossbkupIP
12/02/2014 08:34:51 - requesting resource bksrst1.NBU_POLICY.MAXJOBS.OSS_i386_DATA_ossbkupIP_syb1bkupIP
12/02/2014 08:34:51 - awaiting resource bksrst1-hcart-robot-tld-0. No drives are available.
12/02/2014 09:12:32 - Info nbrb (pid=5900) Limit has been reached for the logical resource bksrst1.NBU_CLIENT.MAXJOBS.ossbkupIP
12/02/2014 09:58:08 - granted resource  bksrst1.NBU_CLIENT.MAXJOBS.ossbkupIP
12/02/2014 09:58:08 - granted resource  bksrst1.NBU_POLICY.MAXJOBS.OSS_i386_DATA_ossbkupIP_syb1bkupIP
12/02/2014 09:58:08 - granted resource  1128L4
12/02/2014 09:58:08 - granted resource  HP.ULTRIUM4-SCSI.003
12/02/2014 09:58:08 - granted resource  bksrst1-hcart-robot-tld-0
12/02/2014 09:58:08 - estimated 0 kbytes needed
12/02/2014 09:58:08 - Info nbjm (pid=5922) started backup (backupid=ossbkupIP_1417489088) job for client ossbkupIP, policy OSS_i386_DATA_ossbkupIP_syb1bkupIP, schedule Daily_Incr on storage unit bksrst1-hcart-robot-tld-0
12/02/2014 09:58:08 - started process bpbrm (pid=24154)
12/02/2014 09:58:09 - Info bpbrm (pid=24154) ossbkupIP is the host to backup data from
12/02/2014 09:58:09 - Info bpbrm (pid=24154) reading file list from client
12/02/2014 09:58:09 - Info bpbrm (pid=24154) starting bpbkar on client
12/02/2014 09:58:09 - Info bpbkar (pid=0) Starting bpstart_notify script
12/02/2014 09:58:09 - connecting
12/02/2014 09:58:09 - connected; connect time: 0:00:00
12/02/2014 09:58:23 - Error bpbrm (pid=24154) client ossbkupIP aborted
12/02/2014 09:58:23 - Info bpbkar (pid=0) done. status: 49: client did not start
12/02/2014 09:58:23 - end writing
client did not start  (49)
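To put the timeline in the job details above into numbers, here is a minimal Python sketch. It only does arithmetic on four timestamps copied from the log; the event labels (`job_start`, `resources_granted`, and so on) are my own names, not NetBackup terms, and the MM/DD/YYYY reading of the dates is an assumption.

```python
from datetime import datetime

# Timestamps copied from the job details above.
# Assumption: the dates are MM/DD/YYYY.
FMT = "%m/%d/%Y %H:%M:%S"
events = {
    "job_start": "12/02/2014 08:34:51",          # nbjm starting backup job
    "resources_granted": "12/02/2014 09:58:08",  # drive and media granted
    "connected": "12/02/2014 09:58:09",          # connected to the client
    "client_aborted": "12/02/2014 09:58:23",     # bpbrm: client aborted
}
times = {name: datetime.strptime(stamp, FMT) for name, stamp in events.items()}

# Time spent queued, waiting for a drive and the MAXJOBS limits.
queue_wait = times["resources_granted"] - times["job_start"]
# Time between connecting to the client and the abort.
run_time = times["client_aborted"] - times["connected"]

print(f"waited for resources: {queue_wait}")
print(f"connected to abort:   {run_time}")
```

The job sat in the queue for well over an hour and then failed within seconds of connecting, which suggests the failure happens at client start-up (around the bpstart_notify step) rather than during the data transfer.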

 

 

I found that the two commands below report different host names:

admrst1{root} # ./bpclntcmd -pn
expecting response from server bksrst1
admrst1-bk *NULL* 10.224.16.101 25614
admrst1{root} # ./bpclntcmd -self
current domain = rstom.net
NIS does not seem to be running: (10) can't communicate with ypbind
gethostname() returned: ossbkupIP
host ossbkupIP: ossbkupIP at 10.224.16.102
aliases:     ossbkupIP     10.224.16.102
 getfqdn: Error 0
admrst1{root} #
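The mismatch can be made explicit with a small Python sketch that parses the two outputs above. The field layout is an assumption read off this one sample (first field of the last `-pn` line is the name the master server sees, third field is the IP; the `host ...` line of `-self` carries the client's own name and IP). The helper names `peer_name` and `self_name` are hypothetical.

```python
import re

# Output copied from the post above (only the lines that are parsed).
pn_output = """expecting response from server bksrst1
admrst1-bk *NULL* 10.224.16.101 25614"""

self_output = """gethostname() returned: ossbkupIP
host ossbkupIP: ossbkupIP at 10.224.16.102"""


def peer_name(pn_text):
    """Return (name, ip) from the last line of `bpclntcmd -pn` output.

    Assumption: fields are <peer name> <configured name> <ip> <port>.
    """
    fields = pn_text.strip().splitlines()[-1].split()
    return fields[0], fields[2]


def self_name(self_text):
    """Return (name, ip) from the `host NAME: NAME at IP` line of `-self`."""
    match = re.search(r"^host (\S+): \S+ at (\S+)", self_text, re.MULTILINE)
    if match is None:
        raise ValueError("no 'host ...' line found in -self output")
    return match.group(1), match.group(2)


seen_by_master = peer_name(pn_output)
seen_by_client = self_name(self_output)

if seen_by_master != seen_by_client:
    print(f"mismatch: master sees {seen_by_master}, client reports {seen_by_client}")
```

With the sample above, the master sees the connection as coming from admrst1-bk at 10.224.16.101, while the client identifies itself as ossbkupIP at 10.224.16.102, so the two differ in both name and IP address.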

 

admrst1-bk and ossbkupIP are two IPs on the backup network, but we created the policy using ossbkupIP.

I think the name admrst1-bk that bpclntcmd -pn shows is wrong.

How can I correct this?

Below are the bprd logs captured while running the above command:

 

17:06:45.405 [10601] <2> process_request: EXIT STATUS 0
17:06:45.444 [10476] <2> listen_loop: do_schild = 1
17:06:45.444 [10476] <2> childterm: pid=10601 exit=0, signo=0 core=no
17:06:45.444 [10476] <2> schild: wait2() ECHILD
17:06:58.475 [10476] <2> listen_loop: initial schedule event
17:06:58.516 [10476] <2> launch: /usr/openv/netbackup/bin/admincmd/bpstsinfo, pid=10848
17:06:58.517 [10476] <2> listen_loop: do_schild = 1
17:06:58.517 [10476] <2> childterm: pid=10848 exit=0, signo=0 core=no
17:06:58.517 [10476] <2> schild: wait2() ECHILD
17:07:17.647 [10476] <2> vnet_pbxAcceptSocket: Accepted sock[10] from 10.224.16.104:45137
17:07:17.648 [10476] <2> bprd: socket fd from accept() is 10
17:07:17.649 [10476] <2> listen_loop: request complete
17:07:17.654 [11077] <2> logconnections: BPRD ACCEPT FROM 10.224.16.104.45137 TO 10.224.16.105.1556 fd = 10
17:07:17.654 [11077] <2> process_request: setsockopt SO_LINGER on 10 succeeded.
17:07:17.654 [11077] <2> vnet_pcache_init_table: [vnet_private.c:235] starting cache size 200 0xc8
17:07:17.657 [11077] <2> vnet_cached_getnameinfo: [vnet_addrinfo.c:1895] found via getnameinfo OUR_HOST=admrst2-bk IPSTR=10.224.16.104
17:07:17.657 [11077] <2> connected_peer: Connection from host admrst2-bk, 10.224.16.104, on non-reserved port 45137
17:07:17.657 [11077] <2> db_valid_master_server: admrst2-bk is not a valid server
17:07:17.658 [11077] <8> vnet_check_vxss_server_magic: [vnet_vxss_helper.c:495] VxSS magic 329199 0x505ef
17:07:17.658 [11077] <8> vnet_check_vxss_server_magic: [vnet_vxss_helper.c:496] remote_vxss 45 0x2d
17:07:17.658 [11077] <8> vnet_check_vxss_server_magic: [vnet_vxss_helper.c:538] Ignoring VxSS authentication 2 0x2
17:07:17.658 [11077] <2> process_request: command C_CLIENT_ID (45) received
17:07:17.658 [11077] <2> process_request: admrst2-bk is 7.5
17:07:17.658 [11077] <2> get_ccname: determine configured name for admrst2-bk
17:07:17.658 [11077] <2> ConnectionCache::connectAndCache: Acquiring new connection for host bksrst1, query type 84
17:07:17.658 [11077] <2> vnet_same_host_and_update: [vnet_addrinfo.c:2876] matched as locals NAME1=bksrst1 NAME2=localhost
17:07:17.658 [11077] <2> vnet_in_resilient_network: [vnet_addrinfo.c:8752] ignoring local host 0 0x0
17:07:17.658 [11077] <2> vnet_sortaddrs: [vnet_addrinfo.c:3945] sorted addrs: 1 0x1
17:07:17.658 [11077] <2> vnet_get_pref_netconnection: [vnet_addrinfo.c:4776] Local [strong] check, using interface  ANY
17:07:17.659 [11077] <2> async_connect: [vnet_connect.c:1433] connect immediate CONNECT FROM 10.224.16.105.58202 TO 10.224.16.105.13721 fd = 6
17:07:17.659 [11077] <2> connect_to_service: connect succeeded STATUS (0) SUCCESS FROM 0.0.0.0 TO bksrst1 10.224.16.105 bpdbm
17:07:17.659 [11077] <2> logconnections: BPDBM CONNECT FROM 10.224.16.105.58202 TO 10.224.16.105.13721 fd = 6
17:07:17.659 [11077] <8> vnet_check_vxss_client_magic_with_info: [vnet_vxss_helper.c:871] Ignoring VxSS authentication 2 0x2
17:07:17.681 [11077] <2> get_ccname: unable to get configured name: no entity was found (227)
17:07:17.712 [10476] <2> listen_loop: do_schild = 1
17:07:17.712 [10476] <2> childterm: pid=11077 exit=0, signo=0 core=no
17:07:17.712 [10476] <2> schild: wait2() ECHILD
17:07:44.715 [10476] <2> insert_if_not_dup: [vnet_addrinfo.c:5705] ignoring IPv6 link local 0 0x0
17:08:45.721 [10476] <2> insert_if_not_dup: [vnet_addrinfo.c:5705] ignoring IPv6 link local 0 0x0
17:09:46.737 [10476] <2> insert_if_not_dup: [vnet_addrinfo.c:5705] ignoring IPv6 link local 0 0x0
17:10:47.744 [10476] <2> insert_if_not_dup: [vnet_addrinfo.c:5705] ignoring IPv6 link local 0 0x0
17:11:43.749 [10476] <2> ConnectionCache::connectAndCache: Acquiring new connection for host bksrst1, query type 98
17:11:43.749 [10476] <2> vnet_same_host_and_update: [vnet_addrinfo.c:2876] matched as locals NAME1=bksrst1 NAME2=localhost
17:11:43.749 [10476] <2> vnet_in_resilient_network: [vnet_addrinfo.c:8752] ignoring local host 0 0x0
17:11:43.749 [10476] <2> vnet_sortaddrs: [vnet_addrinfo.c:3945] sorted addrs: 1 0x1
17:11:43.749 [10476] <2> vnet_get_pref_netconnection: [vnet_addrinfo.c:4776] Local [strong] check, using interface  ANY
17:11:43.750 [10476] <2> async_connect: [vnet_connect.c:1433] connect immediate CONNECT FROM 10.224.16.105.58293 TO 10.224.16.105.13721 fd = 10
17:11:43.750 [10476] <2> connect_to_service: connect succeeded STATUS (0) SUCCESS FROM 0.0.0.0 TO bksrst1 10.224.16.105 bpdbm
17:11:43.750 [10476] <2> logconnections: BPDBM CONNECT FROM 10.224.16.105.58293 TO 10.224.16.105.13721 fd = 10
17:11:43.750 [10476] <8> vnet_check_vxss_client_magic_with_info: [vnet_vxss_helper.c:871] Ignoring VxSS authentication 2 0x2
17:11:43.750 [10476] <2> db_end: Need to collect reply
17:11:43.801 [10476] <2> launch: /usr/openv/netbackup/bin/admincmd/bpstsinfo, pid=11749
17:11:43.805 [10476] <2> listen_loop: do_schild = 1
17:11:43.805 [10476] <2> childterm: pid=11749 exit=0, signo=0 core=no
17:11:43.805 [10476] <2> schild: wait2() ECHILD
17:11:48.806 [10476] <2> insert_if_not_dup: [vnet_addrinfo.c:5705] ignoring IPv6 link local 0 0x0
17:12:49.813 [10476] <2> insert_if_not_dup: [vnet_addrinfo.c:5705] ignoring IPv6 link loca
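To pull the relevant request out of a bprd log like the one above, here is a minimal Python sketch. It groups lines by the PID in square brackets and records the connecting peer and any `get_ccname` failure. The line pattern is inferred from this excerpt only, and `summarize` is a hypothetical helper, not a NetBackup tool.

```python
import re

# Three lines copied from the bprd log above.
bprd_lines = [
    "17:07:17.657 [11077] <2> connected_peer: Connection from host admrst2-bk, 10.224.16.104, on non-reserved port 45137",
    "17:07:17.658 [11077] <2> get_ccname: determine configured name for admrst2-bk",
    "17:07:17.681 [11077] <2> get_ccname: unable to get configured name: no entity was found (227)",
]

# Assumed layout: <time> [<pid>] <<level>> <function>: <message>
LINE = re.compile(r"^(\S+) \[(\d+)\] <(\d+)> (\S+): (.*)$")
PEER = re.compile(r"Connection from host (\S+), (\S+),")


def summarize(lines):
    """Map each bprd child PID to its peer host and get_ccname error, if any."""
    by_pid = {}
    for line in lines:
        match = LINE.match(line)
        if match is None:
            continue  # skip truncated or unrecognized lines
        _time, pid, _level, func, msg = match.groups()
        info = by_pid.setdefault(pid, {})
        if func == "connected_peer":
            peer = PEER.match(msg)
            if peer:
                info["peer"], info["ip"] = peer.groups()
        elif func == "get_ccname" and msg.startswith("unable"):
            info["ccname_error"] = msg
    return by_pid


summary = summarize(bprd_lines)
for pid, info in summary.items():
    print(pid, info)
```

For this excerpt, the only request logged (PID 11077) comes from admrst2-bk at 10.224.16.104 and ends with "no entity was found (227)". That is a different host and IP from the admrst1-bk / 10.224.16.101 shown in the bpclntcmd -pn output, so this part of the log may belong to a run from another node.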

 

 

 

 

 

 

  • The bptm entries are quite normal.
    bpbrm (the parent process) stops the backup if it receives an error from the client or if a timeout occurs.
    bpbrm then terminates its child process, bptm.

    You need to deal with the long path names and the bpstart_notify errors.

    Once these have been resolved, if status 49 is still seen, increase the bpbkar logging level to see whether anything additional is logged on the client.
    Also check the client's syslog for other issues that may indicate a reboot when status 50 was seen.
    Or ask the server owners whether the bpbkar process was killed. Neither of these scenarios will leave anything in the bpbkar log.

    You may also want to limit the number of concurrent jobs on this client. Not sure what it is right now, but 4 is normally a good average.

     

19 Replies