Unable to start bprd service, stops after 5 seconds

truwarrior22
Level 3

The server is an MS Windows 2000 Server running NetBackup 5.1 MP7. Here's what's in the log file.

 

20:39:25.335 [3304.3344] <2> bprd: INITIATING bprd (VERBOSE = 0): NetBackup 5.1 0 on mm1-100725
20:39:25.335 [3304.3344] <2> bprd: the request timeout value is 300 seconds
20:39:30.367 [3304.3344] <2> logconnections: bpdbm CONNECT FROM 10.42.10.15.4343 TO 10.42.10.15.13721
20:39:30.367 [3304.3344] <2> logconnections: BPDBM CONNECT FROM 10.42.10.15.4343 TO 10.42.10.15.13721
20:39:30.382 [3304.3344] <2> ParseConfigExA: Unknown configuration option on line 50: RenameIfExists = 0
20:39:30.414 [3304.3344] <2> ParseConfigExA: Unknown configuration option on line 50: RenameIfExists = 0
20:39:30.539 [3304.3344] <2> get_long: (1) cannot read (byte 1) from network: An existing connection was forcibly closed by the remote host.
20:39:30.539 [3304.3344] <2> db_getdata: get_string() failed: An existing connection was forcibly closed by the remote host.  (10054) network read error (-3)
20:39:30.539 [3304.3344] <2> db_dbm_alive: unexpected reply: network read failed (42)
20:39:30.539 [3304.3344] <2> bprd: cannot contact database daemon...exiting
20:55:27.210 [3332.1056] <2> bprd: INITIATING bprd (VERBOSE = 0): NetBackup 5.1 0 on mm1-100725
20:55:27.210 [3332.1056] <2> bprd: the request timeout value is 300 seconds
20:55:34.132 [3332.1056] <2> logconnections: bpdbm CONNECT FROM 10.42.10.15.4420 TO 10.42.10.15.13721
20:55:34.132 [3332.1056] <2> logconnections: BPDBM CONNECT FROM 10.42.10.15.4420 TO 10.42.10.15.13721
20:55:34.132 [3332.1056] <2> ParseConfigExA: Unknown configuration option on line 50: RenameIfExists = 0
20:55:34.164 [3332.1056] <2> ParseConfigExA: Unknown configuration option on line 50: RenameIfExists = 0
20:55:34.335 [3332.1056] <2> get_long: (1) cannot read (byte 1) from network: An existing connection was forcibly closed by the remote host.
20:55:34.335 [3332.1056] <2> db_getdata: get_string() failed: An existing connection was forcibly closed by the remote host.  (10054) network read error (-3)
20:55:34.335 [3332.1056] <2> db_dbm_alive: unexpected reply: network read failed (42)
20:55:34.335 [3332.1056] <2> bprd: cannot contact database daemon...exiting

 

 

Also, when I try to telnet to bpdbm on port 13721, I lose the connection almost immediately, and this appears in the log:

 

22:21:19.601 [2096.2144] <2> logconnections: getsockname(460) failed: 10038
22:21:19.601 [2096.2144] <32> bpdbm: cannot determine connection host name
22:21:19.601 [2096.2144] <2> bpdbm: request complete: exit status 23 socket read failed
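The telnet test above can also be scripted. What follows is only a minimal sketch, not a NetBackup tool: `probe_port` is a hypothetical helper name, and the address and port in the comment are the ones from the log lines in this post. It tells apart a connection that is refused outright from one that is accepted and then dropped by the listener, which is what the telnet session appears to show.

```python
import socket


def probe_port(host, port, timeout=5.0):
    """Connect to host:port and describe what the listener does next.

    Returns one of: "refused", "timeout", "closed", "reset", "open",
    or "error: ..." for any other socket failure.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.settimeout(timeout)
            try:
                data = conn.recv(1)
            except socket.timeout:
                # Connected, and the peer is simply waiting for us to speak.
                return "open"
            except ConnectionResetError:
                # Peer tore the connection down abruptly (Winsock error 10054).
                return "reset"
            # An empty read means the peer closed the connection cleanly.
            return "closed" if data == b"" else "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"


# Example call with the values from this thread (not executed here):
#   probe_port("10.42.10.15", 13721)
```

A result of "closed" or "reset" would match the symptom in the bprd log (the connection is accepted, then dropped), while "refused" would mean nothing is listening on the port at all.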

 

Any help would be appreciated!

Message Edited by truwarrior22 on 09-03-2008 08:24 PM
Message Edited by truwarrior22 on 09-03-2008 08:25 PM
19 REPLIES

Andy_Welburn
Level 6

Has something changed recently? (I presume it has worked at some point?)

 

Certain parts of the error messages indicate that this could be a hostname resolution issue. Check your DNS/hosts entries. bpclntcmd could be useful here (e.g. bpclntcmd -self, bpclntcmd -hn hostname, bpclntcmd -ip ip address, bpclntcmd -pn), nslookup etc.

 

A few Tech Notes to keep you going (!) :

 

How to troubleshoot backups that fail with status 23 "socket read failed".

 

GENERAL ERROR: Windows Remote Administration console fails to connect, with the error " does not hav...

 

DOCUMENTATION: Explanation of bpclntcmd options, the system calls being used, and recommended troubl...

 

How to verify name resolution for Veritas NetBackup (tm) systems, using the "bpclntcmd" command

 

They may not relate directly to your scenario, but may give you some avenues to check out.

 

sdo
Moderator
Partner    VIP    Certified

Classic gotchas are multiple DNS entries in the forward name lookup, the reverse IP lookup, or both.

 

Always enter your nslookup commands twice to see if DNS rotates around to another instance.  I mean, you may enter "nslookup $client" and see the correct address, but enter "nslookup $client" again immediately to see if there is a duplicate name.  Same goes for duplicate reverse entries, i.e. do "nslookup $ip" twice in succession to see if you have duplicates.

 

Any duplicates, either way, will give NetBackup a headache.
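The duplicate check described above can be sketched in a few lines of Python. This is an illustration under assumptions, not a replacement for nslookup: the function names are made up for this example, and `socket.gethostbyname_ex` / `socket.gethostbyaddr` go through the operating system's resolver (hosts file as well as DNS), whereas nslookup queries the DNS server directly. Because the resolver returns all addresses or names it knows in a single call, there is no need to repeat the query to catch rotation.

```python
import socket


def forward_addresses(hostname):
    """Return every distinct IPv4 address the resolver reports for hostname."""
    _name, _aliases, addresses = socket.gethostbyname_ex(hostname)
    return sorted(set(addresses))


def reverse_names(ip_address):
    """Return every distinct name (primary plus aliases) reported for an IP."""
    name, aliases, _addresses = socket.gethostbyaddr(ip_address)
    return sorted({name.lower(), *(alias.lower() for alias in aliases)})


def has_duplicates(entries):
    """More than one distinct entry means the lookup is ambiguous."""
    return len(set(entries)) > 1


# Example calls (host name and address are placeholders, not executed here):
#   has_duplicates(forward_addresses("myclient"))
#   has_duplicates(reverse_names("192.0.2.10"))
```

If either call reports more than one entry, that is the kind of duplicate worth cleaning up before looking further.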

truwarrior22
Level 3

The DNS servers have correct forward and reverse lookup entries, and the NetBackup server is referencing the correct DNS server.  The server was previously using a hosts file for its own name lookup.  I commented out the hosts file entries and the issue still occurs. I ran the nslookup commands multiple times by name and by IP, and the results are correct.

 

The only thing that has changed is that someone ran a utility called ccleaner to remove temp files from the server.  Also, right before the issue occurred I noticed that one of the tape drives was down, so I rebooted the server.  Ever since the reboot and the ccleaner run, I've been unable to start the bprd service, and the console won't open because of it.  I tried uninstalling McAfee 8.0i antivirus and the issue is still occurring.

 

I'll try the suggestions above, but if anyone has any further suggestions, please send them my way!  Thank you!

Message Edited by truwarrior22 on 09-04-2008 07:02 AM

Andy_Welburn
Level 6

Did you get expected results from the bpclntcmd?

e.g.

 

bpclntcmd -self

bpclntcmd -hn <hostname>

bpclntcmd -ip <ip address>

bpclntcmd -pn

 

You can do this in any 'direction' to test connectivity/name resolution (Master->Client, Master->Master, Client->Client, Client->Master).

truwarrior22
Level 3

Ran the bpclntcmd command above and everything looks ok.

 

E:\Program Files\VERITAS\NetBackup\bin>bpclntcmd -self
gethostname() returned: mm1-100725
host mm1-100725: MM1-100725.mydomain.local at 10.42.10.15 (0xf0a2a0a)
checkhname: aliases:

E:\Program Files\VERITAS\NetBackup\bin>bpclntcmd -hn mm1-100725
host mm1-100725: MM1-100725.mydomain.local at 10.42.10.15 (0xf0a2a0a)
checkhname: aliases:
^C
E:\Program Files\VERITAS\NetBackup\bin>bpclntcmd -ip 10.42.10.15
checkhaddr: host   : mm1-100725: mm1-100725.mydomain.local at 10.42.10.15 (0xf0a2a0a)
checkhaddr: aliases:

E:\Program Files\VERITAS\NetBackup\bin>

truwarrior22
Level 3

bpclntcmd -pn didn't seem to return anything.  I also tried bpclntcmd -check_vxss, which seemed to fail, but I'm not sure what vxss is.

 

E:\Program Files\VERITAS\NetBackup\bin>bpclntcmd -pn
^C
E:\Program Files\VERITAS\NetBackup\bin>bpclntcmd -check_vxss
cannot connect on socket

E:\Program Files\VERITAS\NetBackup\bin>bpclntcmd -check_vxss_with_host 10.42.10.15
cannot connect on socket

 

Thanks for the help so far!

truwarrior22
Level 3
I think that bpclntcmd -pn doesn't return anything because the bprd service will not stay running.

Andy_Welburn
Level 6

@truwarrior22 wrote:
I think that bpclntcmd -pn doesn't return anything because the bprd service will not stay running.

True!!

 

It no longer seems to be a hostname resolution problem, then.

 

From one of the Tech Notes earlier I presume the entry is correct for bprd?

"...

 - Ensure the entry in the services file (in C:\WINDOWS\system32\drivers\etc) for the bprd service is correct - if this is missing/commented out or the port number is incorrect, then this problem can occur.   (i.e:  bprd   13720/tcp    bprd)

..."

truwarrior22
Level 3

Here's the contents of the services file for the netbackup services:

 

bpcd  13782/tcp
bprd  13720/tcp
vopied  13783/tcp
vnetd  13724/tcp
bpdbm  13721/tcp
vmd  13701/tcp
acsd  13702/tcp
tl8cd  13705/tcp
odld  13706/tcp
ts8d  13709/tcp
tldcd  13711/tcp
tl4d  13713/tcp
tsdd  13714/tcp
tshd  13715/tcp
tlmd  13716/tcp
tlhcd  13717/tcp
lmfcd  13718/tcp
bpjobd  13723/tcp
rsmd  13719/tcp
nbdbd  13784/tcp
visd  9284/tcp
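A listing like the one above can be checked mechanically against the Tech Note's requirement. The following is a rough sketch only: the function names are invented for this example, and the expected ports are limited to the two confirmed elsewhere in this thread (bprd 13720 from the quoted Tech Note, bpdbm 13721 from the bprd log). The alias column is optional in the services file format, so the parser ignores anything after the port/protocol field.

```python
import re

# name, whitespace, port/protocol; anything after that (aliases) is ignored
ENTRY = re.compile(r"^(\S+)\s+(\d+)/(\w+)")

# Assumption: only the ports confirmed in this thread are checked.
EXPECTED = {"bprd": 13720, "bpdbm": 13721}


def parse_services(text):
    """Map (service name, protocol) -> port for every active entry."""
    entries = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        match = ENTRY.match(line)
        if match:
            name, port, proto = match.groups()
            entries[(name, proto.lower())] = int(port)
    return entries


def check_services(text, expected=EXPECTED):
    """Return a list of problems; an empty list means the entries look right."""
    entries = parse_services(text)
    problems = []
    for name, port in expected.items():
        actual = entries.get((name, "tcp"))
        if actual is None:
            problems.append(f"{name}: no active tcp entry (missing or commented out)")
        elif actual != port:
            problems.append(f"{name}: port {actual}, expected {port}")
    return problems
```

To use it, read the services file from the directory named in the Tech Note for your system and pass its text to `check_services`. For the entries posted above it reports no problems, with or without the trailing alias.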

 

Does it require an alias, as in the entry you quoted? I would think not, since it's the same name.

 

sdo
Moderator
Partner    VIP    Certified

Before you do anything else - do you know which media your catalog backup is on?  Write protect that media!!! 

 

 

Did ccleaner leave a log file of what was deleted?  Or even just some totals?  What action was ccleaner configured to take?  What's its default behaviour?  Did it actually delete files, or just move them to a holding area before deletion, i.e. can you put the files back?  Any files in the recycler?

 

 

I think the tool may have removed something that one or more NetBackup services require.

 

You could try uninstalling whatever was the latest maintenance pack, and re-installing it - but that might fail and leave you in worse state if files needed for the uninstall have been removed too.

 

Or just try installing the latest update patch kit over the top of what you have right now. Your EMM and catalog data should be OK, but you'll lose any custom "notify" scripts, so save a copy of these beforehand.

sdo
Moderator
Partner    VIP    Certified

Thinking about it, it is possible that ccleaner has removed some of your catalog meta data, and without a log file you'll never know.  When was ccleaner run?  Did you have a successful catalog backup before ccleaner was run?

 

Got a DR plan?

 

 

If you're on v6.x and above - then do you have your DR file for your last successful catalog backups stored safely off-host, or were these deleted too?

truwarrior22
Level 3

The catalog is backed up to disk and to tape. I don't have a log of what ccleaner removed, since another person ran it and it doesn't save a log.  Where is the catalog metadata stored, so I can check whether something is missing?

 

While the issue was already occurring, I upgraded from 5.1 MP2 to 5.1 MP7.  The same issue remained, but the upgrade to 5.1 MP7 appears to have been successful.

 

Thank you.

Message Edited by truwarrior22 on 09-04-2008 10:38 AM
Message Edited by truwarrior22 on 09-04-2008 10:44 AM

truwarrior22
Level 3
Any idea how I can at least get NetBackup into a state where I can open the console and perform backups, even if I lose some previous backup information, if that's what's causing the issue?

truwarrior22
Level 3

I performed some more testing after importing the server into a VMware virtual machine...  If I perform a repair or modify install, the service still will not run for longer than a few seconds.

 

If I completely uninstall and reinstall NetBackup, the service works fine, but of course I lose all my settings.

 

Any further ideas?  No suggestion will be turned down; I can just revert to a VM snapshot if it makes things worse. I'm getting desperate here! lol

truwarrior22
Level 3

I also can't seem to run bpverify or verify_images under the goodies directory; it returns:

verify not allowed: network read failed (42)

 

I wonder if it's not working because I can't start the NetBackup Request Manager, or if it's something else.

Jussi_Riipinen
Level 2

@truwarrior22 wrote:

 Also, right before the issue occurred I noticed that one of the tape drives was down, so I rebooted the server.  Ever since the reboot and the ccleaner run, I've been unable to start the bprd service, and the console won't open because of it.  I tried uninstalling McAfee 8.0i antivirus and the issue is still occurring.

One tape drive was down; see if the OS can see it. If not, try removing the missing drive from the NetBackup configuration from the command line, then see if NetBackup starts up.

sdo
Moderator
Partner    VIP    Certified

A tape drive being down won't stop "bprd" which is the Request Manager Daemon.

 

We know that bprd won't stay up, but there's no indication yet of why.  It could be something seriously fundamental, or it could need just a simple fix.  I'd call Symantec Support if you don't want to un-install and re-install and then perform a catalog recovery (which BTW will recover the entire catalog, EMM database, policies, and device configuration) - and see if they can help you find out what is missing or corrupted.

 

Have you tried looking in latest log file in:

   \NetBackup\logs\bprd\*.log

...to see if bprd is even logging why it is failing. 

 

 

 

If you do decide to un-install and re-install, then you will need to re-install the master server software to the patch level/version that ran the catalog backup.  So, if your catalog backup media was written with v5.1 MP2, then don't go patching to MP7.

 

Before installing, you need to be aware that you will lose any site-specific modifications, so... things to save beforehand:

1) Save a copy of your vm.conf from volmgr.

2) Try saving your master server "bp.conf" settings from the Windows registry with:

    bpgetconfig -M $master > old.lis               (but if this requires bprd, you won't be able to).

3) Save a copy of any "notify" scripts that you may have customized in:

    \NetBackup\bin

4) Look for any SIZE_DATA_BUFFERS*, NET_BUFFER_SZ*, NUMBER_DATA_BUFFERS* files - I use a "*" here to indicate that you may also find the "restore" counterparts to these tuning files.

5) Save these files:

     \Java\Debug.Properties

     \Java\auth.conf

     \NetBackup\db\altnames\*.*      (i.e. all files in this folder)

6) Also, any other fully CAPITALIZED files without a file name extension in:

     \NetBackup

     \NetBackup\db

     \NetBackup\db\config

     \NetBackup\bin

     \Volmgr

     \Volmgr\config

...as these may be other touch files that you require.

7) Try to save your cold catalog information (if you need to):

    bpsyncinfo -L > old-bpsyncinfo.lis

8) Try to save some of the basic config details:

   bpconfig -L > old-bpconfig.lis
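Steps 4 and 6 above (hunting for touch files) are easy to miss by eye, so here is a small sketch that lists candidates. It is only an illustration: `find_touch_files` is a made-up name, "fully capitalized with no extension" is interpreted as "the name has no suffix and contains no lowercase letters", and the install root in the comment is the one seen earlier in this thread and will differ on other servers.

```python
from pathlib import Path

# Folders from step 6, relative to the install root (an assumption: adjust
# the root to wherever NetBackup is installed on your server).
SUBFOLDERS = [
    "NetBackup",
    "NetBackup/db",
    "NetBackup/db/config",
    "NetBackup/bin",
    "Volmgr",
    "Volmgr/config",
]


def find_touch_files(directories):
    """List files whose names are all upper case and have no extension."""
    found = []
    for directory in directories:
        root = Path(directory)
        if not root.is_dir():
            continue  # skip folders that do not exist on this server
        for entry in sorted(root.iterdir()):
            if entry.is_file() and entry.suffix == "" and entry.name.isupper():
                found.append(entry)
    return found


# Example call (not executed here):
#   root = Path(r"E:\Program Files\VERITAS")
#   for path in find_touch_files(root / sub for sub in SUBFOLDERS):
#       print(path)
```

The search is deliberately not recursive, so it only looks in the folders listed in step 6. Review the output by hand before copying anything, since not every upper-case file is necessarily a tuning file.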

 

 

If you do go ahead with a catalog recovery... when done, I suggest:

1) Carefully check the recovery wizard GUI screen log to be sure that it actually has worked.

2) De-activate all policies immediately after the successful recovery (if the GUI will let you).

3) Compare and amend the new vm.conf with your old copy.

4) Take another listing of the master server settings with:

    bpgetconfig -M $master > new.lis

..and compare old.lis to new.lis and see what else you need.
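The old.lis / new.lis comparison in step 4 can be done with any diff tool. As one possible sketch, Python's standard `difflib` module produces a unified diff of the two listings; `diff_listings` is a hypothetical helper name, and nothing here assumes a particular bpgetconfig output format beyond it being line-oriented text.

```python
import difflib


def diff_listings(old_text, new_text):
    """Return unified-diff lines between two configuration listings."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile="old.lis",
            tofile="new.lis",
            lineterm="",
        )
    )


# Example call (not executed here):
#   from pathlib import Path
#   for line in diff_listings(Path("old.lis").read_text(),
#                             Path("new.lis").read_text()):
#       print(line)
```

Lines starting with "-" exist only in old.lis and are the settings to consider re-applying; lines starting with "+" exist only in new.lis.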

 

 

HTH.

Omar_Villa
Level 6
Employee
bprd is the main daemon. If it doesn't come up, I would stop all the services and restart them; if it still goes down, take it to Symantec.

truwarrior
Not applicable

Uninstalling a Kaseya agent resolved the issue.  It must have conflicted with one of the ports NetBackup uses...  Thanks for the help everyone!  Now I just have to fix the SL500, which has 1 of its 2 drives in a down state :\

 

Thanks again!
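For anyone who lands here with the same symptom, a port conflict like this can be looked for directly. The sketch below is an assumption-laden illustration rather than a NetBackup utility: `port_in_use` is a made-up name, the port list is a subset of the services entries posted earlier in this thread, and the check is only meaningful while the NetBackup services are stopped, since otherwise NetBackup itself holds the ports.

```python
import socket

# Subset of the services-file entries posted earlier in this thread.
NETBACKUP_PORTS = {
    "bprd": 13720,
    "bpdbm": 13721,
    "bpjobd": 13723,
    "vnetd": 13724,
    "bpcd": 13782,
}


def port_in_use(port, host="0.0.0.0"):
    """Return True if another process already holds the TCP port.

    The test is a plain bind attempt: if binding fails, something else
    owns the port.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return True
    return False


# Example call (run with the NetBackup services stopped; not executed here):
#   for name, port in NETBACKUP_PORTS.items():
#       print(name, port, "IN USE" if port_in_use(port) else "free")
```

Any port reported as in use while NetBackup is stopped belongs to some other software, which is worth identifying before restarting the services.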