A few streams failed with status code 49 (client did not start)
Below are the logs from the detailed status of the job:
12/02/2014 08:34:51 - Info nbjm (pid=5922) starting backup job (jobid=20968) for client ossbkupIP, policy OSS_i386_DATA_ossbkupIP_syb1bkupIP, schedule Daily_Incr
12/02/2014 08:34:51 - Info nbjm (pid=5922) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=20968, request id:{68FF7CD4-79C3-11E4-97E9-0017A477FC20})
12/02/2014 08:34:51 - requesting resource bksrst1-hcart-robot-tld-0
12/02/2014 08:34:51 - requesting resource bksrst1.NBU_CLIENT.MAXJOBS.ossbkupIP
12/02/2014 08:34:51 - requesting resource bksrst1.NBU_POLICY.MAXJOBS.OSS_i386_DATA_ossbkupIP_syb1bkupIP
12/02/2014 08:34:51 - awaiting resource bksrst1-hcart-robot-tld-0. No drives are available.
12/02/2014 09:12:32 - Info nbrb (pid=5900) Limit has been reached for the logical resource bksrst1.NBU_CLIENT.MAXJOBS.ossbkupIP
12/02/2014 09:58:08 - granted resource bksrst1.NBU_CLIENT.MAXJOBS.ossbkupIP
12/02/2014 09:58:08 - granted resource bksrst1.NBU_POLICY.MAXJOBS.OSS_i386_DATA_ossbkupIP_syb1bkupIP
12/02/2014 09:58:08 - granted resource 1128L4
12/02/2014 09:58:08 - granted resource HP.ULTRIUM4-SCSI.003
12/02/2014 09:58:08 - granted resource bksrst1-hcart-robot-tld-0
12/02/2014 09:58:08 - estimated 0 kbytes needed
12/02/2014 09:58:08 - Info nbjm (pid=5922) started backup (backupid=ossbkupIP_1417489088) job for client ossbkupIP, policy OSS_i386_DATA_ossbkupIP_syb1bkupIP, schedule Daily_Incr on storage unit bksrst1-hcart-robot-tld-0
12/02/2014 09:58:08 - started process bpbrm (pid=24154)
12/02/2014 09:58:09 - Info bpbrm (pid=24154) ossbkupIP is the host to backup data from
12/02/2014 09:58:09 - Info bpbrm (pid=24154) reading file list from client
12/02/2014 09:58:09 - Info bpbrm (pid=24154) starting bpbkar on client
12/02/2014 09:58:09 - Info bpbkar (pid=0) Starting bpstart_notify script
12/02/2014 09:58:09 - connecting
12/02/2014 09:58:09 - connected; connect time: 0:00:00
12/02/2014 09:58:23 - Error bpbrm (pid=24154) client ossbkupIP aborted
12/02/2014 09:58:23 - Info bpbkar (pid=0) done. status: 49: client did not start
12/02/2014 09:58:23 - end writing
client did not start (49)
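The timeline in the job details above can be summarized with a short script. This is only a sketch: the four lines are copied (shortened) from the log above, the helper names `parse_ts` and `elapsed` are made up for illustration, and the MM/DD/YYYY date format is an assumption. It shows that the job spent most of its time queued for resources and then failed within seconds of connecting to the client.

```python
from datetime import datetime

# Lines copied (shortened) from the detailed status above.
# Assumption: timestamps are MM/DD/YYYY HH:MM:SS.
JOB_LOG = """\
12/02/2014 08:34:51 - Info nbjm (pid=5922) starting backup job (jobid=20968) for client ossbkupIP
12/02/2014 09:58:08 - granted resource bksrst1-hcart-robot-tld-0
12/02/2014 09:58:09 - connected; connect time: 0:00:00
12/02/2014 09:58:23 - Error bpbrm (pid=24154) client ossbkupIP aborted
"""


def parse_ts(line):
    """Return the timestamp at the start of a job-detail line."""
    return datetime.strptime(line[:19], "%m/%d/%Y %H:%M:%S")


def elapsed(log, start_marker, end_marker):
    """Seconds between the first line containing start_marker
    and the first line containing end_marker."""
    lines = log.splitlines()
    start = next(parse_ts(l) for l in lines if start_marker in l)
    end = next(parse_ts(l) for l in lines if end_marker in l)
    return (end - start).total_seconds()


queued = elapsed(JOB_LOG, "starting backup job", "granted resource")
failed_after = elapsed(JOB_LOG, "connected;", "aborted")
print(f"waited for resources: {queued:.0f}s")
print(f"failed {failed_after:.0f}s after connecting")
```

A failure this soon after "connected" suggests looking at what happens on the client right at startup rather than at the tape drive or media.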
I found that the following two commands report different host names:
admrst1{root} # ./bpclntcmd -pn
expecting response from server bksrst1
admrst1-bk *NULL* 10.224.16.101 25614
admrst1{root} # ./bpclntcmd -self
current domain = rstom.net
NIS does not seem to be running: (10) can't communicate with ypbind
gethostname() returned: ossbkupIP
host ossbkupIP: ossbkupIP at 10.224.16.102
aliases: ossbkupIP 10.224.16.102
getfqdn: Error 0
admrst1{root} #
admrst1-bk and ossbkupIP are two IPs on the backup network, but we created the policy using ossbkupIP.
I think the admrst1-bk name shown by bpclntcmd -pn is wrong.
How can I correct this?
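To make the mismatch explicit, here is a minimal sketch that compares the two outputs pasted above. The sample text is copied from this post, and the helper names `peer_name` and `self_name` are hypothetical. It assumes the second line of the `-pn` output is laid out as peer name, configured client name, IP, port, and that `*NULL*` in the second field means the master found no configured client for that peer.

```python
# Output copied from the two bpclntcmd runs above (trimmed to the relevant lines).
PN_OUTPUT = """\
expecting response from server bksrst1
admrst1-bk *NULL* 10.224.16.101 25614
"""

SELF_OUTPUT = """\
gethostname() returned: ossbkupIP
host ossbkupIP: ossbkupIP at 10.224.16.102
"""


def peer_name(pn_output):
    """Return (peer name, configured name, IP) from the second line of -pn output.
    Assumed field order: peer name, configured name, IP, port."""
    fields = pn_output.splitlines()[1].split()
    return fields[0], fields[1], fields[2]


def self_name(self_output):
    """Return (host name, IP) as reported by the client itself."""
    host = ip = None
    for line in self_output.splitlines():
        if line.startswith("gethostname() returned:"):
            host = line.split(":", 1)[1].strip()
        elif line.startswith("host ") and " at " in line:
            ip = line.rsplit(" at ", 1)[1].strip()
    return host, ip


peer, configured, peer_ip = peer_name(PN_OUTPUT)
host, host_ip = self_name(SELF_OUTPUT)

if peer != host or peer_ip != host_ip:
    print(f"master sees {peer} ({peer_ip}), client calls itself {host} ({host_ip})")
if configured == "*NULL*":
    print(f"no configured client name was returned for {peer}")
```

With the values above, both messages are printed: the connection to the master arrives from 10.224.16.101 (admrst1-bk), while the policy uses the name that maps to 10.224.16.102.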
Please find below the logs from bprd captured while running the above command:
17:06:45.405 [10601] <2> process_request: EXIT STATUS 0
17:06:45.444 [10476] <2> listen_loop: do_schild = 1
17:06:45.444 [10476] <2> childterm: pid=10601 exit=0, signo=0 core=no
17:06:45.444 [10476] <2> schild: wait2() ECHILD
17:06:58.475 [10476] <2> listen_loop: initial schedule event
17:06:58.516 [10476] <2> launch: /usr/openv/netbackup/bin/admincmd/bpstsinfo, pid=10848
17:06:58.517 [10476] <2> listen_loop: do_schild = 1
17:06:58.517 [10476] <2> childterm: pid=10848 exit=0, signo=0 core=no
17:06:58.517 [10476] <2> schild: wait2() ECHILD
17:07:17.647 [10476] <2> vnet_pbxAcceptSocket: Accepted sock[10] from 10.224.16.104:45137
17:07:17.648 [10476] <2> bprd: socket fd from accept() is 10
17:07:17.649 [10476] <2> listen_loop: request complete
17:07:17.654 [11077] <2> logconnections: BPRD ACCEPT FROM 10.224.16.104.45137 TO 10.224.16.105.1556 fd = 10
17:07:17.654 [11077] <2> process_request: setsockopt SO_LINGER on 10 succeeded.
17:07:17.654 [11077] <2> vnet_pcache_init_table: [vnet_private.c:235] starting cache size 200 0xc8
17:07:17.657 [11077] <2> vnet_cached_getnameinfo: [vnet_addrinfo.c:1895] found via getnameinfo OUR_HOST=admrst2-bk IPSTR=10.224.16.104
17:07:17.657 [11077] <2> connected_peer: Connection from host admrst2-bk, 10.224.16.104, on non-reserved port 45137
17:07:17.657 [11077] <2> db_valid_master_server: admrst2-bk is not a valid server
17:07:17.658 [11077] <8> vnet_check_vxss_server_magic: [vnet_vxss_helper.c:495] VxSS magic 329199 0x505ef
17:07:17.658 [11077] <8> vnet_check_vxss_server_magic: [vnet_vxss_helper.c:496] remote_vxss 45 0x2d
17:07:17.658 [11077] <8> vnet_check_vxss_server_magic: [vnet_vxss_helper.c:538] Ignoring VxSS authentication 2 0x2
17:07:17.658 [11077] <2> process_request: command C_CLIENT_ID (45) received
17:07:17.658 [11077] <2> process_request: admrst2-bk is 7.5
17:07:17.658 [11077] <2> get_ccname: determine configured name for admrst2-bk
17:07:17.658 [11077] <2> ConnectionCache::connectAndCache: Acquiring new connection for host bksrst1, query type 84
17:07:17.658 [11077] <2> vnet_same_host_and_update: [vnet_addrinfo.c:2876] matched as locals NAME1=bksrst1 NAME2=localhost
17:07:17.658 [11077] <2> vnet_in_resilient_network: [vnet_addrinfo.c:8752] ignoring local host 0 0x0
17:07:17.658 [11077] <2> vnet_sortaddrs: [vnet_addrinfo.c:3945] sorted addrs: 1 0x1
17:07:17.658 [11077] <2> vnet_get_pref_netconnection: [vnet_addrinfo.c:4776] Local [strong] check, using interface ANY
17:07:17.659 [11077] <2> async_connect: [vnet_connect.c:1433] connect immediate CONNECT FROM 10.224.16.105.58202 TO 10.224.16.105.13721 fd = 6
17:07:17.659 [11077] <2> connect_to_service: connect succeeded STATUS (0) SUCCESS FROM 0.0.0.0 TO bksrst1 10.224.16.105 bpdbm
17:07:17.659 [11077] <2> logconnections: BPDBM CONNECT FROM 10.224.16.105.58202 TO 10.224.16.105.13721 fd = 6
17:07:17.659 [11077] <8> vnet_check_vxss_client_magic_with_info: [vnet_vxss_helper.c:871] Ignoring VxSS authentication 2 0x2
17:07:17.681 [11077] <2> get_ccname: unable to get configured name: no entity was found (227)
17:07:17.712 [10476] <2> listen_loop: do_schild = 1
17:07:17.712 [10476] <2> childterm: pid=11077 exit=0, signo=0 core=no
17:07:17.712 [10476] <2> schild: wait2() ECHILD
17:07:44.715 [10476] <2> insert_if_not_dup: [vnet_addrinfo.c:5705] ignoring IPv6 link local 0 0x0
17:08:45.721 [10476] <2> insert_if_not_dup: [vnet_addrinfo.c:5705] ignoring IPv6 link local 0 0x0
17:09:46.737 [10476] <2> insert_if_not_dup: [vnet_addrinfo.c:5705] ignoring IPv6 link local 0 0x0
17:10:47.744 [10476] <2> insert_if_not_dup: [vnet_addrinfo.c:5705] ignoring IPv6 link local 0 0x0
17:11:43.749 [10476] <2> ConnectionCache::connectAndCache: Acquiring new connection for host bksrst1, query type 98
17:11:43.749 [10476] <2> vnet_same_host_and_update: [vnet_addrinfo.c:2876] matched as locals NAME1=bksrst1 NAME2=localhost
17:11:43.749 [10476] <2> vnet_in_resilient_network: [vnet_addrinfo.c:8752] ignoring local host 0 0x0
17:11:43.749 [10476] <2> vnet_sortaddrs: [vnet_addrinfo.c:3945] sorted addrs: 1 0x1
17:11:43.749 [10476] <2> vnet_get_pref_netconnection: [vnet_addrinfo.c:4776] Local [strong] check, using interface ANY
17:11:43.750 [10476] <2> async_connect: [vnet_connect.c:1433] connect immediate CONNECT FROM 10.224.16.105.58293 TO 10.224.16.105.13721 fd = 10
17:11:43.750 [10476] <2> connect_to_service: connect succeeded STATUS (0) SUCCESS FROM 0.0.0.0 TO bksrst1 10.224.16.105 bpdbm
17:11:43.750 [10476] <2> logconnections: BPDBM CONNECT FROM 10.224.16.105.58293 TO 10.224.16.105.13721 fd = 10
17:11:43.750 [10476] <8> vnet_check_vxss_client_magic_with_info: [vnet_vxss_helper.c:871] Ignoring VxSS authentication 2 0x2
17:11:43.750 [10476] <2> db_end: Need to collect reply
17:11:43.801 [10476] <2> launch: /usr/openv/netbackup/bin/admincmd/bpstsinfo, pid=11749
17:11:43.805 [10476] <2> listen_loop: do_schild = 1
17:11:43.805 [10476] <2> childterm: pid=11749 exit=0, signo=0 core=no
17:11:43.805 [10476] <2> schild: wait2() ECHILD
17:11:48.806 [10476] <2> insert_if_not_dup: [vnet_addrinfo.c:5705] ignoring IPv6 link local 0 0x0
17:12:49.813 [10476] <2> insert_if_not_dup: [vnet_addrinfo.c:5705] ignoring IPv6 link loca
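Most of the bprd excerpt above is routine listener activity. The lines that matter belong to the child process that handled the client request (pid 11077). The following sketch filters a log by pid; the sample lines are copied from the excerpt, while the regular expression and the helper name `entries_for_pid` are my own assumptions about the line layout (time, [pid], <level>, function: message).

```python
import re

# Lines copied from the bprd log above.
BPRD_LOG = """\
17:07:17.657 [11077] <2> connected_peer: Connection from host admrst2-bk, 10.224.16.104, on non-reserved port 45137
17:07:17.657 [11077] <2> db_valid_master_server: admrst2-bk is not a valid server
17:07:17.658 [11077] <2> process_request: command C_CLIENT_ID (45) received
17:07:17.681 [11077] <2> get_ccname: unable to get configured name: no entity was found (227)
17:07:17.712 [10476] <2> listen_loop: do_schild = 1
"""

# Assumed layout: time [pid] <level> function: message
LINE_RE = re.compile(
    r"^(?P<time>\S+) \[(?P<pid>\d+)\] <(?P<level>\d+)> (?P<func>[^ ]+?): (?P<msg>.*)$"
)


def entries_for_pid(log, pid):
    """Return (function, message) pairs logged by the given process id."""
    out = []
    for line in log.splitlines():
        m = LINE_RE.match(line)
        if m and int(m.group("pid")) == pid:
            out.append((m.group("func"), m.group("msg")))
    return out


request = entries_for_pid(BPRD_LOG, 11077)
for func, msg in request:
    print(f"{func}: {msg}")
```

Note that the request in this excerpt comes from admrst2-bk (10.224.16.104), and get_ccname finds no configured name for it, which appears consistent with the *NULL* in the bpclntcmd -pn output.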
The bptm entries are quite normal.
bpbrm (the parent process) will stop the backup if an error is received from the client or if a timeout occurs.
bpbrm will then terminate the child process, bptm.
You need to deal with the long path names and the bpstart_notify errors.
Once these have been resolved, if status 49 is still seen, increase the bpbkar logging level to see whether anything additional shows up on the client.
Also check the client's syslog for other issues that may indicate a reboot when status 50 was seen.
Or ask the server owners whether the bpbkar process was killed. Neither of these scenarios will leave anything in the bpbkar log.
You may also want to limit the number of concurrent jobs on this client. Not sure what it is right now, but 4 is normally a good average.