
NDMP Backups failing for NetApp shares, Status 99

jsimpson42
Level 3

My office maintains a backup server (Windows Server 2008 R2) and tape library for disaster recovery; we use NetBackup 7.5.0.6 to manage the backups. I inherited this system about a year ago, so my knowledge is limited, though I'm working on it. I work for the government, so all server and domain names have been changed to generic placeholders.

The current issue is NDMP backup failures. We have multiple shares hosted on NetApp filers. These shares are backed up by policies in NetBackup, and I hadn't had any issues with them since I took over the system. Last week, however, all of them failed with Status 99. There are two NetApps (generic-netapp1, generic-netapp2), and backups were failing for both.

Today, the backups for generic-netapp1 started working again. The only change I made was that a week ago I changed one policy's client from 'generic-netapp1' to 'generic-netapp1.foo.bar' (it still failed, obviously), and today I changed it back. The other filer, generic-netapp2, still fails with the same messages.

'tpautoconf -verify generic-netapp2' fails ("NDMP failed to verify host"). It now succeeds on generic-netapp1.

A bpclntcmd lookup of generic-netapp2 resolves correctly.

I have no idea what is happening, and any help is appreciated. I'm able to browse to the NetApp shares from the backup server (bk-srv-1) without issue. I've attached some files from the ndmpagent logs, hopefully sanitized.


PatS729
Level 5

Hi,

It looks like the change you made (the client name change in the policy) caused the problem, but the cause could go beyond that.

Try the following steps:

1> Confirm that both tpautoconf -verify filer_ip_address and tpautoconf -verify filer_name work.

2> Add short name and FQDN entries to the hosts file on both the server and the filer side, if required.

3> Confirm that forward and reverse lookups (short name and FQDN) work from both the server and the filer side.

4> Run bpclntcmd -clear_host_cache on the master/media server.

5> Retry the backup.

If this doesn't work, then we need VERBOSE / high debug level logs to investigate.
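Step 3 can be scripted from the server side. Below is a minimal Python sketch using only the standard-library resolver; the function name is hypothetical, and it can only test lookups from the machine it runs on, so the filer-side lookups still have to be checked on the filer itself. It treats a reverse name as matching when its short name equals the short name of the queried host, so it works for both the short-name and FQDN forms.

```python
import socket
import sys


def check_forward_reverse(name, forward=socket.gethostbyname_ex, reverse=socket.gethostbyaddr):
    """Resolve name to addresses, reverse-resolve each address, report mismatches.

    Returns a list of problem descriptions; an empty list means every address
    reverse-resolves to a name whose short name matches the query.
    forward/reverse are injectable so the logic can be tested without DNS.
    """
    try:
        _canonical, _aliases, addresses = forward(name)
    except OSError as exc:  # socket.gaierror and socket.herror subclass OSError
        return [f"forward lookup of {name!r} failed: {exc}"]

    short = name.split(".")[0].lower()
    problems = []
    for addr in addresses:
        try:
            rname, raliases, _ = reverse(addr)
        except OSError as exc:
            problems.append(f"reverse lookup of {addr} failed: {exc}")
            continue
        # Compare short names so "host" and "host.domain" are treated as equal.
        rshorts = {n.split(".")[0].lower() for n in [rname, *raliases]}
        if short not in rshorts:
            problems.append(f"{addr} reverse-resolves to {rname!r}, not {name!r}")
    return problems


if __name__ == "__main__":
    for host in sys.argv[1:]:
        issues = check_forward_reverse(host)
        print(host, "OK" if not issues else "; ".join(issues))
```

For example, running it on the backup server with both generic-netapp2 and generic-netapp2.foo.bar as arguments checks the short-name and FQDN forms in one pass.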

I think I caused some confusion in my question. To be clear, the issue started BEFORE I changed the client name in the policy. Both the generic-netapp1 and generic-netapp2 policies were failing. I changed the client name in one generic-netapp1 policy, and that policy continued to fail. A week later I changed the client name back to what it had been, and it started working on ALL of the policies, not just the one I had changed. Policies for generic-netapp2 continue to fail.

1> tpautoconf fails for generic-netapp2 and succeeds for generic-netapp1. Using it with an IP address doesn't work for either. tpautoconf -verify WAS previously failing for generic-netapp1, back when its policies were failing with Status 99.

C:\>tpautoconf -verify generic-netapp2
Connecting to host "generic-netapp2" as user "ndmpuser"...
Waiting for connect notification message...
: Unable to process NDMP message
: host "generic-netapp2" failed
NDMP failed to verify host

C:\>tpautoconf -verify generic-netapp1
Connecting to host "generic-netapp1" as user "ndmpuser"...
Waiting for connect notification message...
Opening session--attempting with NDMP protocol version 4...
Opening session--successful with NDMP protocol version 4
host supports MD5 authentication
Getting MD5 challenge from host...
Logging in using MD5 method...
Host info is:
host name "generic-NETAPP1"
os type "NetApp"
os version "NetApp Release 8.1.1P1 7-Mode"
host id "0151753230"
Login was successful
Host supports LOCAL backup/restore
Host supports 3-way backup/restore

C:\>tpautoconf -verify 192.168.1.220
This NDMP hostname does not exist.
NDMP failed to verify host

C:\>tpautoconf -verify 192.168.1.200
This NDMP hostname does not exist.
NDMP failed to verify host

2> This is a Windows Server 2008 R2 system that uses DNS for name resolution, so there should be no need for changes in the hosts file. It's also worth noting that there have never been entries in the hosts file, and it was previously working.

3> Forward and reverse lookups work in both directions.

4&5> I did this previously; unfortunately, it didn't help.
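The tpautoconf output above shows where each attempt stops: generic-netapp1 gets past "Waiting for connect notification message..." into session open and login, generic-netapp2 stops right after that line, and the IP-address attempts end with "This NDMP hostname does not exist." without any connection line. A minimal Python sketch that sorts outputs by the last stage reached; it only pattern-matches the message strings quoted in this thread, the stage labels and function names are hypothetical, and it assumes tpautoconf is on PATH.

```python
import subprocess
import sys

# Marker strings copied from the tpautoconf -verify output quoted in this thread.
# Checked in order; the first match wins. The labels are hypothetical.
STAGES = [
    ("This NDMP hostname does not exist", "ndmp_hostname_not_found"),
    ("Login was successful", "ok"),
    ("Opening session--successful", "session_opened_login_failed"),
    ("Waiting for connect notification message", "no_connect_notification"),
    ("Connecting to host", "connect_started"),
]


def classify_verify_output(text: str) -> str:
    """Map tpautoconf -verify output to the furthest stage it reached."""
    for marker, label in STAGES:
        if marker in text:
            return label
    return "unrecognized"


def verify(host: str) -> str:
    # Assumes tpautoconf can be found on PATH on the backup server.
    result = subprocess.run(["tpautoconf", "-verify", host],
                            capture_output=True, text=True)
    return classify_verify_output(result.stdout + result.stderr)


if __name__ == "__main__":
    for host in sys.argv[1:]:
        print(host, verify(host))
```

With the outputs in this thread, generic-netapp1 classifies as "ok" and generic-netapp2 as "no_connect_notification", which is consistent with the filer never seeing the session.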

Additional information: the NetApp logs on generic-netapp2 do not show any attempted NDMP connections from the backup server. On generic-netapp1, they do show connections since the policies started working again.

Regarding the VERBOSE / HIGH DEBUG logs, I included the logs from the ndmpagent in my original post. What additional logs would be best?

Marianne
Moderator
Partner    VIP    Accredited Certified
Forget about NBU.

As long as tpautoconf -verify doesn't work, nothing else is going to work.
Is NDMP up and running on the problematic filer?

NDMP service is running on generic-netapp2, and connections are being monitored. The filer isn't seeing any connection attempts being made.

Marianne
Moderator
So the NBU server is sending a connection request, but nothing is arriving at netapp2?
Sounds like something is wrong at the network level.
An IP address change, a routing problem, a DNS issue, a firewall?

A network issue was my thought as well, but there's no firewall, port 10000 is open on both the filer and the server, ping works, and even telnet to port 10000 connects. DNS resolution and routing don't seem to have any issues; only NetBackup fails when it runs the backup policy.

C:\>bpclntcmd -hn generic-netapp2
host generic-netapp2: generic-netapp2.foo.bar at 192.168.1.220
aliases: generic-netapp2.foo.bar generic-netapp2 192.168.1.220

mnolan
Level 6
Employee Accredited Certified

https://www.veritas.com/support/en_US/article.000024101

You mentioned the verify also failed and the filer is not seeing an incoming connect.

This definitely means that we are not reaching it on port 10000. If telnet to port 10000 works, then either something in between (a firewall) is accepting the connection, or something else on the filer is accepting it (unlikely).
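The distinction above (the TCP connection succeeds, but the filer never sees an NDMP session) can be tested a little more precisely than with telnet. As the NDMP specification describes it, messages are XDR-encoded and framed with ONC RPC record marking, and the server sends a connection-status notification as soon as a client connects; that is what tpautoconf is waiting for in "Waiting for connect notification message...". Treat those protocol details as assumptions to verify. A minimal Python sketch built on them: it connects to port 10000 and waits for the first bytes. A real NDMP listener should send something right away, while a device in the path that accepts TCP without forwarding NDMP would typically stay silent or close the connection. The function names and the 5-second timeout are hypothetical, and the record-mark decode is only a sanity check, not an NDMP parser.

```python
import socket
import struct
import sys

NDMP_PORT = 10000  # the NDMP port discussed in this thread


def parse_record_mark(data: bytes):
    """Decode a 4-byte ONC RPC record mark into (is_last_fragment, fragment_length)."""
    if len(data) < 4:
        raise ValueError("a record mark needs 4 bytes")
    (word,) = struct.unpack(">I", data[:4])
    return bool(word & 0x80000000), word & 0x7FFFFFFF


def probe_ndmp(host: str, port: int = NDMP_PORT, timeout: float = 5.0) -> str:
    """Connect and report whether the listener sends an unsolicited first message."""
    try:
        # create_connection also applies the timeout to later recv() calls.
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        return f"tcp_failed: {exc}"
    with sock:
        try:
            data = sock.recv(256)
        except socket.timeout:  # must precede OSError: timeout is a subclass
            return "connected_but_silent"
        except OSError as exc:
            return f"connection_error: {exc}"
    if not data:
        return "connected_then_closed"
    if len(data) < 4:
        return f"greeting: {len(data)} bytes (too short for a record mark)"
    last, length = parse_record_mark(data)
    return f"greeting: {len(data)} bytes, record mark last={last} length={length}"


if __name__ == "__main__":
    for host in sys.argv[1:]:
        print(host, probe_ndmp(host))
```

Run from bk-srv-1 against both filers: if generic-netapp1 returns a greeting while generic-netapp2 reports "connected_but_silent" or "connected_then_closed", that would support the idea that something other than the filer's NDMP service is answering on port 10000.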

Hi,

As Marianne mentioned, unless tpautoconf -verify succeeds for generic-netapp2, NDMP backups are not going to work. But I see from your posts that ping and telnet to port 10000 work, and that there is no firewall that might be dropping incoming connections on port 10000. I think the NDMP service on the filer needs a RESTART. Is that possible? If restarting the NDMP service doesn't help, then we need the following logs to investigate:

Master / NDMP Host: tpcommand, NDMP and NDMPAGENT

For tpcommand logs:

1. Create the "tpcommand" log directory: /usr/openv/volmgr/debug/tpcommand

2. Add "VERBOSE" to /usr/openv/volmgr/vm.conf

For NDMP and NDMPAGENT logs:

Run: /usr/openv/netbackup/bin/vxlogcfg -a -p 51216 -o 134 -s DebugLevel=6 (NDMPAGENT)

/usr/openv/netbackup/bin/vxlogcfg -a -p 51216 -o 151 -s DebugLevel=6 (NDMP)

Once you have enabled the logs, run the "tpautoconf -verify" command a couple of times and then gather these logs:

/usr/openv/netbackup/bin/vxlogview -p 51216 -o 134 -t 00:20:00 -d all -y > /tmp/134.log

/usr/openv/netbackup/bin/vxlogview -p 51216 -o 151 -t 00:20:00 -d all -y > /tmp/151.log

Also gather the log file from the /usr/openv/volmgr/debug/tpcommand directory for review.
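The enable-and-collect steps above can be wrapped in a small script so the same originator IDs and time window are used on every run. A minimal Python sketch: the paths, product ID 51216, originator IDs 134 (NDMPAGENT) and 151 (NDMP), and the vxlogcfg/vxlogview options are copied from the post, while the function names and output file names are hypothetical. The paths are the UNIX ones given above; on a Windows master such as bk-srv-1, NBU_BIN would have to point at the bin directory of the NetBackup install instead. The tpcommand directory and the vm.conf VERBOSE entry are not handled here.

```python
import subprocess
from pathlib import Path

NBU_BIN = Path("/usr/openv/netbackup/bin")  # UNIX path from the post; adjust on Windows
PRODUCT_ID = "51216"
ORIGINATORS = {"134": "NDMPAGENT", "151": "NDMP"}  # originator ID -> log name


def vxlogcfg_args(oid: str, level: int = 6) -> list:
    """Command line that raises the debug level for one originator ID."""
    return [str(NBU_BIN / "vxlogcfg"), "-a", "-p", PRODUCT_ID, "-o", oid,
            "-s", f"DebugLevel={level}"]


def vxlogview_args(oid: str, window: str = "00:20:00") -> list:
    """Command line that dumps recent entries for one originator ID."""
    return [str(NBU_BIN / "vxlogview"), "-p", PRODUCT_ID, "-o", oid,
            "-t", window, "-d", "all", "-y"]


def enable_logging() -> None:
    for oid in ORIGINATORS:
        subprocess.run(vxlogcfg_args(oid), check=True)


def collect_logs(out_dir: str = "/tmp") -> list:
    """Write each originator's log to <out_dir>/<oid>.log and return the paths."""
    written = []
    for oid in ORIGINATORS:
        out = Path(out_dir) / f"{oid}.log"
        with out.open("w") as fh:
            subprocess.run(vxlogview_args(oid), stdout=fh, check=True)
        written.append(out)
    return written
```

Typical use is to call enable_logging() once, repeat tpautoconf -verify a few times against each filer, and then call collect_logs().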


I restarted the NDMP service on generic-netapp2; unfortunately, no change. I changed the debug levels and ran tpautoconf -verify a few times against both netapp1 (success, as before) and netapp2 (still failing).

I found it odd that there were no entries for netapp2 in either ndmp log.

Logs are attached.

Oops, it seems you missed uploading the logs.

Huh, I'd swear I attached them. No matter, I'll try again.

Edit: There they go. The issue appeared to be that it wouldn't upload any of them because the NDMPAGENT_log.txt file is actually empty. So that one isn't uploaded; you'll have to take my word that it doesn't have anything in it. :)

I edited my last reply again to upload a properly sanitized tpcommand log. Sorry about that.