09-03-2008 08:23 PM
The server runs MS Windows 2000 Server with NetBackup 5.1 MP7. Here's what's in the log file:
20:39:25.335 [3304.3344] <2> bprd: INITIATING bprd (VERBOSE = 0): NetBackup 5.1 0 on mm1-100725
20:39:25.335 [3304.3344] <2> bprd: the request timeout value is 300 seconds
20:39:30.367 [3304.3344] <2> logconnections: bpdbm CONNECT FROM 10.42.10.15.4343 TO 10.42.10.15.13721
20:39:30.367 [3304.3344] <2> logconnections: BPDBM CONNECT FROM 10.42.10.15.4343 TO 10.42.10.15.13721
20:39:30.382 [3304.3344] <2> ParseConfigExA: Unknown configuration option on line 50: RenameIfExists = 0
20:39:30.414 [3304.3344] <2> ParseConfigExA: Unknown configuration option on line 50: RenameIfExists = 0
20:39:30.539 [3304.3344] <2> get_long: (1) cannot read (byte 1) from network: An existing connection was forcibly closed by the remote host.
20:39:30.539 [3304.3344] <2> db_getdata: get_string() failed: An existing connection was forcibly closed by the remote host. (10054) network read error (-3)
20:39:30.539 [3304.3344] <2> db_dbm_alive: unexpected reply: network read failed (42)
20:39:30.539 [3304.3344] <2> bprd: cannot contact database daemon...exiting
20:55:27.210 [3332.1056] <2> bprd: INITIATING bprd (VERBOSE = 0): NetBackup 5.1 0 on mm1-100725
20:55:27.210 [3332.1056] <2> bprd: the request timeout value is 300 seconds
20:55:34.132 [3332.1056] <2> logconnections: bpdbm CONNECT FROM 10.42.10.15.4420 TO 10.42.10.15.13721
20:55:34.132 [3332.1056] <2> logconnections: BPDBM CONNECT FROM 10.42.10.15.4420 TO 10.42.10.15.13721
20:55:34.132 [3332.1056] <2> ParseConfigExA: Unknown configuration option on line 50: RenameIfExists = 0
20:55:34.164 [3332.1056] <2> ParseConfigExA: Unknown configuration option on line 50: RenameIfExists = 0
20:55:34.335 [3332.1056] <2> get_long: (1) cannot read (byte 1) from network: An existing connection was forcibly closed by the remote host.
20:55:34.335 [3332.1056] <2> db_getdata: get_string() failed: An existing connection was forcibly closed by the remote host. (10054) network read error (-3)
20:55:34.335 [3332.1056] <2> db_dbm_alive: unexpected reply: network read failed (42)
20:55:34.335 [3332.1056] <2> bprd: cannot contact database daemon...exiting
Also, when I try to telnet to bpdbm on port 13721, I lose my connection right away, with this in the log:
22:21:19.601 [2096.2144] <2> logconnections: getsockname(460) failed: 10038
22:21:19.601 [2096.2144] <32> bpdbm: cannot determine connection host name
22:21:19.601 [2096.2144] <2> bpdbm: request complete: exit status 23 socket read failed
Any help would be appreciated!
09-04-2008 12:43 AM
Has something changed recently? (I presume it has worked at some point?)
Certain parts of the error messages indicate that this could be a hostname resolution issue. Check your DNS/hosts entries. bpclntcmd could be useful here (e.g. bpclntcmd -self, bpclntcmd -hn hostname, bpclntcmd -ip ip address, bpclntcmd -pn), nslookup etc.
A few Tech Notes to keep you going (!) :
How to troubleshoot backups that fail with status 23 "socket read failed".
GENERAL ERROR: Windows Remote Administration console fails to connect, with the error " does not hav...
DOCUMENTATION: Explanation of bpclntcmd options, the system calls being used, and recommended troubl...
How to verify name resolution for Veritas NetBackup (tm) systems, using the "bpclntcmd" command
They may not relate directly to your scenario, but may give you some avenues to check out.
09-04-2008 02:45 AM
Classic gotchas are multiple DNS entries in the name lookup, the reverse IP lookup, or both.
Always enter your nslookup commands twice to see if DNS rotates around to another instance. I mean, you may enter "nslookup $client" and see the correct address, but enter "nslookup $client" again immediately to see if there is a duplicate name. Same goes for duplicate reverse entries, i.e. do "nslookup $ip" twice in succession to see if you have duplicates.
Any duplicates, either way, will give NetBackup a headache.
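The "ask twice" check above can be sketched in Python. This is a minimal sketch, not a NetBackup tool: `forward_lookup`, `reverse_lookup`, and `distinct_answers` are hypothetical helper names, and the lookups go through the operating system resolver, which may cache or order answers differently from nslookup.

```python
import socket


def forward_lookup(name):
    """Return the IPv4 addresses the system resolver gives for a host name."""
    _canonical, _aliases, addresses = socket.gethostbyname_ex(name)
    return tuple(sorted(addresses))


def reverse_lookup(ip):
    """Return the primary host name the system resolver gives for an address."""
    primary, _aliases, _addresses = socket.gethostbyaddr(ip)
    return (primary.lower(),)


def distinct_answers(lookup, query, attempts=2):
    """Repeat a lookup and collect every distinct answer seen across attempts."""
    seen = set()
    for _ in range(attempts):
        seen.update(lookup(query))
    return sorted(seen)
```

Calling `distinct_answers(forward_lookup, "mm1-100725")` or `distinct_answers(reverse_lookup, "10.42.10.15")` with your own name and address should give a single entry; more than one entry is the duplicate case described above.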
09-04-2008 07:00 AM
The DNS servers have correct forward and reverse lookup entries, and the NetBackup server references the correct DNS server. The server was previously using a hosts file for its own name lookup. I commented out the hosts file and the issue still occurs. I ran the nslookup commands multiple times by name and IP, and the results are correct.
The only things that have changed: someone ran a utility called ccleaner to remove temp files from the server. Also, right before the issue occurred, I noticed that one of the tape drives was down, so I rebooted the server. Ever since the reboot and the ccleaner run, I've been unable to start the bprd service, and the console won't open because of it. I tried uninstalling McAfee 8.0i antivirus and the issue is still occurring.
I'll attempt the suggestions above, but if anyone has any further suggestions, please send them my way! Thank you!
09-04-2008 07:27 AM
Did you get expected results from the bpclntcmd?
bpclntcmd -hn <hostname>
bpclntcmd -ip <ip address>
You can do this in any 'direction' to test connectivity/name resolution (Master->Client, Master->Master, Client->Client, Client->Master).
09-04-2008 07:30 AM
Ran the bpclntcmd command above and everything looks ok.
E:\Program Files\VERITAS\NetBackup\bin>bpclntcmd -self
gethostname() returned: mm1-100725
host mm1-100725: MM1-100725.mydomain.local at 10.42.10.15 (0xf0a2a0a)
E:\Program Files\VERITAS\NetBackup\bin>bpclntcmd -hn mm1-100725
host mm1-100725: MM1-100725.mydomain.local at 10.42.10.15 (0xf0a2a0a)
E:\Program Files\VERITAS\NetBackup\bin>bpclntcmd -ip 10.42.10.15
checkhaddr: host : mm1-100725: mm1-100725.mydomain.local at 10.42.10.1
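A side note on the output above: the hex value in parentheses looks like the same IPv4 address printed as a raw 32-bit integer with its bytes in little-endian order. That reading is an assumption drawn from the numbers in this post, not from NetBackup documentation, and `decode_hex_address` is a hypothetical helper name.

```python
import socket
import struct


def decode_hex_address(value):
    """Treat a 32-bit integer as an IPv4 address stored in little-endian byte order."""
    # Packing little-endian puts the lowest byte first, which is the first octet.
    return socket.inet_ntoa(struct.pack("<I", value))
```

Under that assumption, `decode_hex_address(0xf0a2a0a)` gives the dotted address reported next to it in the bpclntcmd output.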
09-04-2008 07:36 AM
bpclntcmd -pn didn't seem to return anything. I also tried the bpclntcmd vxss checks, which seemed to fail, but I'm not sure what vxss is...?
E:\Program Files\VERITAS\NetBackup\bin>bpclntcmd -pn
E:\Program Files\VERITAS\NetBackup\bin>bpclntcmd -check_vxss
cannot connect on socket
E:\Program Files\VERITAS\NetBackup\bin>bpclntcmd -check_vxss_with_host 10.42.10.
cannot connect on socket
Thanks for the help so far!
09-04-2008 08:05 AM
I think bpclntcmd -pn doesn't return anything because the bprd service will not stay running.
So it no longer looks like a hostname resolution problem.
Going by one of the Tech Notes from earlier: I presume the services entry for bprd is correct?
- Ensure the entry in the services file (in C:\WINDOWS\system32\drivers\etc) for the bprd service is correct - if this is missing/commented out or the port number is incorrect, then this problem can occur. (i.e: bprd 13720/tcp bprd)
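The services-file check from the Tech Note can be sketched as a small parser. This is a rough sketch under assumptions: `find_service` is a hypothetical helper, and it only understands the usual `name port/protocol [aliases] # comment` layout.

```python
def find_service(lines, name):
    """Return (port, protocol, aliases) for each active entry of a service name."""
    entries = []
    for line in lines:
        # Drop anything after '#', so commented-out entries are ignored.
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2 or fields[0].lower() != name.lower():
            continue
        port, _, protocol = fields[1].partition("/")
        if port.isdigit() and protocol:
            entries.append((int(port), protocol.lower(), fields[2:]))
    return entries
```

Feeding it the lines of the services file from the directory named above, `find_service(lines, "bprd")` should return exactly one entry with port 13720 and protocol tcp. An empty list means the entry is missing or commented out, and a different port number is the other failure case the Tech Note mentions.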
09-04-2008 08:31 AM
Here are the contents of the services file for the NetBackup services:
Does it require an alias like the one in your example? I would think not, since it's the same name.
09-04-2008 08:51 AM
Before you do anything else - do you know which media your catalog backup is on? Write protect that media!!!
Did ccleaner leave a log file of what was deleted? Or even just some totals? What action was ccleaner configured to take? What's its default behaviour? Did it actually delete files, or just move them to a holding area before deletion, i.e. can you put the files back? Any files in the recycler?
I think the tool may have removed something that one or more NetBackup services require.
You could try uninstalling whatever was the latest maintenance pack, and re-installing it - but that might fail and leave you in worse state if files needed for the uninstall have been removed too.
Or just try installing the latest update patch kit over the top of what you have right now. Your EMM and catalog data should be OK, but you'll lose any custom "notify" scripts, so save a copy of these beforehand.
09-04-2008 08:58 AM
Thinking about it, it is possible that ccleaner has removed some of your catalog metadata, and without a log file you'll never know. When was ccleaner run? Did you have a successful catalog backup before ccleaner was run?
Got a DR plan?
If you're on v6.x and above - then do you have your DR file for your last successful catalog backups stored safely off-host, or were these deleted too?
09-04-2008 10:37 AM
The catalog is backed up to disk and to tape. I don't have a log of what ccleaner removed, since another person ran it and it doesn't save a log. Where is the catalog metadata stored, so I can check whether something is missing?
While the issue was already occurring, I upgraded from 5.1 MP2 to 5.1 MP7. The same issue remains, but the upgrade to 5.1 MP7 appears to have been successful.
09-04-2008 10:37 PM
Performed some more testing after importing the server into a VMware virtual machine... If I perform a repair or modify, the service still will not run for longer than a few seconds.
If I completely uninstall and reinstall NetBackup, the service works fine, but of course I lose all my settings.
Any further ideas? No suggestion will be turned down; I can just revert to a VM snapshot if it makes things worse. I'm getting desperate here! lol
09-05-2008 12:53 AM
Also, I can't seem to run a bpverify or the verify_images under the goodies directory; it returns:
verify not allowed: network read failed (42)
I wonder if it's not working because I can't start the NetBackup Request Manager, or if it's something else.
09-05-2008 02:06 AM
@truwarrior22 wrote:
Also, right before the issue occurred, I noticed that one of the tape drives was down, so I rebooted the server. Ever since the reboot and the ccleaner run, I've been unable to start the bprd service, and the console won't open because of it. I tried uninstalling McAfee 8.0i antivirus and the issue is still occurring.
One tape drive was down; see if the OS can see it. If not, try removing that missing drive from the NetBackup configuration from the command line. Then see if NetBackup starts up.
09-05-2008 08:40 AM
A tape drive being down won't stop "bprd" which is the Request Manager Daemon.
We know that bprd won't stay up, but there's no indication yet of why. It could be something seriously fundamental, or it may need just a simple fix. If you don't want to un-install, re-install, and then perform a catalog recovery (which, BTW, will recover the entire catalog, EMM database, policies, and device configuration), I'd call Symantec Support and see if they can help you find out what is missing or corrupted.
Have you tried looking in the latest log file in:
...to see if bprd is even logging why it is failing.
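One way to scan a bprd log for the lines that explain an exit is sketched below. It assumes the line layout seen in the first post (time, `[pid.tid]`, `<severity>`, source, message). `parse_line`, `failure_lines`, and the `FAILURE_HINTS` keyword list are illustrative choices, not anything NetBackup defines.

```python
import re

# Layout assumed from the log excerpt in the first post of this thread.
LOG_LINE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[(?P<pid>\d+)\.(?P<tid>\d+)\] "
    r"<(?P<severity>\d+)> "
    r"(?P<source>[^:]+): (?P<message>.*)$"
)

# Hypothetical keywords that tend to mark a failure in these messages.
FAILURE_HINTS = ("cannot", "failed", "exiting")


def parse_line(line):
    """Split one log line into its fields, or return None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def failure_lines(lines):
    """Return the parsed entries whose message contains a failure keyword."""
    hits = []
    for line in lines:
        entry = parse_line(line)
        if entry and any(hint in entry["message"].lower() for hint in FAILURE_HINTS):
            hits.append(entry)
    return hits
```

Run against the excerpt in the first post, this would pick out the get_long, db_getdata, db_dbm_alive, and final bprd lines, which all point at the bpdbm connection rather than at bprd itself.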
If you do decide to un-install and re-install, then you will need to re-install the master server software to the patch level/version that ran the catalog backup. So, if your catalog backup media was written with v5.1 MP2, then don't go patching to MP7.
Before installing, you need to be aware that you will lose any site-specific modifications, so here are things to save beforehand:
1) Save a copy of your vm.conf from volmgr.
2) Try saving your master server "bp.conf" settings from the Windows registry with:
bpgetconfig -M $master > old.lis (but if this requires bprd, you won't be able to).
3) Save a copy of any "notify" scripts that you may have customized in:
4) Look for any SIZE_DATA_BUFFERS*, NET_BUFFER_SZ*, NUMBER_DATA_BUFFERS* files - I use a "*" here to indicate that you may also find the "restore" counterparts to these tuning files.
5) Save these files:
\NetBackup\db\altnames\*.* (i.e. all files in this folder)
6) Also, any other fully CAPITALIZED files without a file name extension in:
...as these may be other touch files that you require.
7) Try to save your cold catalog information (if you need to):
bpsyncinfo -L > old-bpsyncinfo.lis
8) Try to save some of the basic config details:
bpconfig -L > old-bpconfig.lis
If you do go ahead with a catalog recovery... when done, I suggest:
1) Carefully check the recovery wizard GUI screen log to be sure that it actually has worked.
2) De-activate all policies immediately after the successful recovery (if the GUI will let you).
3) Compare and amend the new vm.conf with your old copy.
4) Take another listing of the master server settings with:
bpgetconfig -M $master > new.lis
...and compare old.lis to new.lis to see what else you need.
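The old.lis versus new.lis comparison can be done with any diff tool; a minimal Python sketch using the standard `difflib` module follows. `changed_settings` is a hypothetical helper name.

```python
import difflib


def changed_settings(old_lines, new_lines):
    """Return only the lines that were removed ('- ') or added ('+ ')."""
    return [
        line
        for line in difflib.ndiff(old_lines, new_lines)
        if line.startswith(("- ", "+ "))
    ]
```

Read each listing with something like `open("old.lis").read().splitlines()` and pass both lists in. Lines prefixed with `- ` exist only in the old listing and are candidates to re-apply after the recovery.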
09-06-2008 03:53 PM
Uninstalling a Kaseya agent resolved the issue. It must conflict with one of the ports NetBackup uses... Thanks for the help, everyone! Now I just have to fix the SL500, which has 1 of its 2 drives in a downed state :\
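For anyone who lands here with the same symptom: the port conflict is only a guess in this thread, but it is cheap to test. A minimal sketch, assuming the NetBackup services are stopped while you run it (otherwise they will hold the ports themselves); `port_is_free` is a hypothetical helper.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Something else already holds the port (or binding is not permitted).
            return False
    return True
```

Checking 13720 (bprd) and 13721 (bpdbm), the two ports that appear earlier in the thread, would show whether some other program has taken one of them.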