fatal error on device 5 (ndmp_connect_open_and_auth), NDMP attribute lookup failed - verify attributes with set_ndmp_attr, DOWN'ing it

rclegarto
Level 4
Certified

Does anyone have an idea how to resolve this issue? We have re-run the set_ndmp_attr -auth command several times, restarted the NBU services, and rebooted the master server, but the drives keep going down. Here are the results of the -probe and -verify options. I hope you can help me with this. It has already been a week of failed backups for us :(


D:\Program Files\VERITAS\Volmgr\bin>set_ndmp_attr -probe sm3apnas05.ap.lilly.com

Host "sm3apnas05.ap.lilly.com" tape device model "ULTRIUM-TD2 ":
Device "c64t0l0" attributes=(0x4) RAW
COMPRESSION=1
SERIAL_NUMBER=1110143331
Host "sm3apnas05.ap.lilly.com" tape device model "ULTRIUM-TD2 ":
Device "c96t0l0" attributes=(0x4) RAW
COMPRESSION=1
SERIAL_NUMBER=9110327583



D:\Program Files\VERITAS\Volmgr\bin>set_ndmp_attr -verify
Verify Host name: sm3apnas05.ap.lilly.com
Connecting to host "sm3apnas05.ap.lilly.com" as user "ndmp"...
Waiting for connect notification message...
Opening session--attempting with NDMP protocol version 4...
Opening session--successful with NDMP protocol version 4
host supports TEXT authentication
host supports MD5 authentication
Getting MD5 challenge from host...
Logging in using MD5 method...
Host info is:
host name "server_2"
os type "DartOS"
os version "EMC Celerra File Server.T.5.5.33.2"
host id "abc1997"
Login was successful
Host supports LOCAL backup/restore
Host supports 3-way backup/restore
8 REPLIES

schmaustech
Level 5
I can only speak from NetApp experience, but when we saw this issue, the problem was that the NetApp filer could no longer see the tape device. I am not sure which commands list the tape devices on an EMC, but I would start there. Verify that all the tape devices are visible on the EMC side.

Regards,

Benjamin Schmaus

rclegarto
Level 4
Certified
Thanks for the tip, schmaustech, but I think this output from the -probe command shows the tape devices the filer is seeing.

Device "c64t0l0" attributes=(0x4) RAW
COMPRESSION=1
SERIAL_NUMBER=1110143331
Host "sm3apnas05.ap.lilly.com" tape device model "ULTRIUM-TD2 ":
Device "c96t0l0" attributes=(0x4) RAW
COMPRESSION=1
SERIAL_NUMBER=9110327583


The 2nd drive was actually just replaced, and this command displayed the correct serial number of the newly installed device.

BTW, I forgot to mention that my system runs on Windows Server 2003. The NBU version is 5.1 MP6.

Andy_Welburn
Level 6
Can't see this on your output ....

***Edit***

The only reason I say this is that I was under the impression, back in the days when we used 5.1, that to utilise NDMP tape drives the robot control host had to be the same host as that for the NDMP tape drives. I'm sure I'll be corrected if that's not the case! :p

John_Stockard
Level 5
Partner Certified
First off, I'd highly recommend that you upgrade to NBU 6.0 or NBU 6.5.  NBU 5.1 is now an end-of-support product (as of March 31, 2008).  Symantec will not issue any more patches for it and they will not offer any phone tech support for it.

Do the tape drives go down when you try to use them for a backup job, or do they go down even during idle periods (when there are no backups running)? 

If they only go down when you try to run a backup to them, it's possible that they could be mapped incorrectly in NBU.  In other words, NBU thinks that device c64t0l0 is drive 1 in the library, when in reality it's drive 2 in the library.  When NBU tells the library to mount a tape in drive 1, the library happily mounts a tape in what it thinks is drive 1, but then NBU never sees a tape show up in device c64t0l0 (because the tape actually ended up in device c96t0l0).  NBU will then down device c64t0l0, thinking that it's malfunctioning.
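One way to check for that kind of mismapping is to compare serial numbers. The sketch below is only an illustration: it parses the `set_ndmp_attr -probe` output posted earlier in this thread, and it assumes you can separately write down which serial number the library reports for each drive position. The function names and the two dictionaries (`nbu_drive_to_device`, `library_drive_to_serial`) are hypothetical and are not part of NetBackup.

```python
import re

# Output copied from the set_ndmp_attr -probe run earlier in this thread.
PROBE_OUTPUT = """\
Host "sm3apnas05.ap.lilly.com" tape device model "ULTRIUM-TD2 ":
Device "c64t0l0" attributes=(0x4) RAW
COMPRESSION=1
SERIAL_NUMBER=1110143331
Host "sm3apnas05.ap.lilly.com" tape device model "ULTRIUM-TD2 ":
Device "c96t0l0" attributes=(0x4) RAW
COMPRESSION=1
SERIAL_NUMBER=9110327583
"""


def parse_probe(text):
    """Return {device_name: serial_number} from -probe output."""
    serials = {}
    device = None
    for raw in text.splitlines():
        line = raw.strip()
        match = re.match(r'Device "([^"]+)"', line)
        if match:
            device = match.group(1)
            continue
        match = re.match(r"SERIAL_NUMBER=(\S+)", line)
        if match and device is not None:
            serials[device] = match.group(1)
    return serials


def find_mismapped(probe_serials, nbu_drive_to_device, library_drive_to_serial):
    """List drives whose NDMP device serial differs from the library's serial."""
    mismapped = []
    for drive_number, device in sorted(nbu_drive_to_device.items()):
        seen = probe_serials.get(device)
        expected = library_drive_to_serial.get(drive_number)
        if seen != expected:
            mismapped.append((drive_number, device, expected, seen))
    return mismapped


if __name__ == "__main__":
    probe = parse_probe(PROBE_OUTPUT)
    # HYPOTHETICAL: how NBU is configured (robot drive number -> NDMP device).
    nbu_drive_to_device = {1: "c64t0l0", 2: "c96t0l0"}
    # HYPOTHETICAL: serials read off the library for each drive position.
    library_drive_to_serial = {1: "9110327583", 2: "1110143331"}
    for drive, device, expected, seen in find_mismapped(
        probe, nbu_drive_to_device, library_drive_to_serial
    ):
        print(f"drive {drive}: {device} has serial {seen}, library says {expected}")
```

With the made-up values above, both drives are reported as swapped. An empty result would suggest the mapping is consistent, at least as far as serial numbers go.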

Are the tape drives which are attached to the Celerra also zoned to other hosts in the SAN, or is the Celerra the only device zoned to see the tape drives?  These tape drives need to be zoned so that only the Celerra data mover has access to them.  You cannot share SAN tape drives between an NDMP host and other hosts in NBU 5.1.  (This feature was added in NBU 6.0 and NBU 6.5, though.)

Have these tape drives ever worked?  If so, what changed around the time when they started going down?

When you defined the NDMP tape drives in the NBU Admin Console, did you use the fully-qualified hostname of the Celerra (sm3apnas05.ap.lilly.com) or did you use the short name of the Celerra (sm3apnas05)?  NetBackup is picky about this when it comes to NDMP devices and NDMP backups.  If you defined the NDMP tape drives with only the short host name, try changing it to use the fully-qualified host name so that it exactly matches the information that NBU has in its NDMP authorization database.
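As a rough illustration of that check, here is a small sketch that flags drive definitions whose host name is short or does not exactly match the authorized name. It does not query NetBackup; the `drive_hosts` dictionary is a hypothetical hand-written list of what you see in the Admin Console, and the exact string comparison is my assumption about what "exactly matches" should mean.

```python
def is_fully_qualified(hostname):
    """Treat a name with at least one interior dot as fully qualified."""
    return "." in hostname.strip().strip(".")


def check_drive_hosts(authorized_host, drive_hosts):
    """Return (drive_name, host, reason) for each suspicious drive definition."""
    findings = []
    for drive_name, host in sorted(drive_hosts.items()):
        if not is_fully_qualified(host):
            findings.append((drive_name, host, "short host name"))
        elif host != authorized_host:
            findings.append((drive_name, host, "differs from authorized name"))
    return findings


if __name__ == "__main__":
    # Host name as used with set_ndmp_attr in this thread.
    authorized = "sm3apnas05.ap.lilly.com"
    # HYPOTHETICAL drive definitions copied by hand from the Admin Console.
    drive_hosts = {
        "ndmp_drive_1": "sm3apnas05",
        "ndmp_drive_2": "sm3apnas05.ap.lilly.com",
    }
    for drive_name, host, reason in check_drive_hosts(authorized, drive_hosts):
        print(f"{drive_name}: {host} ({reason})")
```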

You might also want to upgrade to NetBackup 5.1 MP7 (if you don't want to upgrade to NBU 6.0 or NBU 6.5).  There were a few NDMP-related bugs in NBU 5.1 that were fixed in NBU 5.1 MP7:

--------------------------------------------------------------------------------
Etrack Incident = ET830908

Description:
Using long hostnames for NDMP hosts may result in the credentials being added incorrectly.
--------------------------------------------------------------------------------
Etrack Incident = ET968678
Associated Primary Etrack = ET805849
Titan cases: 290-374-573

Description:
NDMP environment variables passed using a file list to the filer is only being honored for the first volume in the file list.

Workaround:
To resolve this issue, set the environment variable after each entry in the file list.

Additional Notes:
This problem also existed in NetBackup 5.1MP5 and 5.1MP6.
--------------------------------------------------------------------------------
Etrack Incident = ET1020130
Associated Primary Etrack = ET1003895
Titan cases: 220-088-034

Description:
A successful NDMP restore may produce a status code 83 when the media is write protected and spanning media. The following error is logged in the bptm log file.

15:51:14.479 [6272.8404] <16> open_ndmp_device: cannot open ndmp device nrst07a, error code 11 (NDMP_WRITE_PROTECT_ERR)

Workaround:
If you receive this error, you can ignore the error status because the restore completes successfully, or unmark the tape not to be write protected.
--------------------------------------------------------------------------------
Etrack Incident = ET1026333
Associated Primary Etrack = ET1018020
Titan cases: 230-335-679

Description:
A restore attempt would fail when using a duplicated copy of a standard backup image that was duplicated using an NDMP attached tape drive. An exit status 25 (cannot connect on socket) would be the result. The root cause of this problem was that bpdm was started instead of bptm.
--------------------------------------------------------------------------------
Etrack Incident = ET1051661

Description:
Restore files on the second tape of an NDMP backup would hang and fail.
--------------------------------------------------------------------------------
Etrack Incident = ET1094378
Associated Primary Etrack = ET1086614
Titan cases: 311-593-827

Description:
If a directory name contained Japanese characters then an NDMP restore of an item in that directory would fail. This issue only applied to Microsoft Windows platforms.
--------------------------------------------------------------------------------
Etrack Incident = ET647774

Description:
An NDMP restore would mount the wrong fragment after mover_paused, reason EOF. This problem occurred with Overland Storage, but might occur with other NDMP servers as well.

The restore would fail and the bptm log showed that either a fragment was used twice in a row, or a fragment that should not have been skipped was skipped.
--------------------------------------------------------------------------------
Etrack Incident = ET848039
Associated Primary Etrack = ET841853
Titan cases: 280-687-557 290-397-014

Description:
Incremental NDMP backups would fail with status code 99 when no data has changed. The problem occurred because of an incremental backup that had no changed data, and thus, was a 0-byte backup.

A change was made to add support for backup, restore, duplicate, verify, and import of 0-byte backups.
--------------------------------------------------------------------------------
Etrack Incident = ET1204221
Associated Primary Etrack = ET1114189
Titan cases: 220-091-399

Description:
A change was made to correct a TLDCD handle leak that would occur when using an NDMP control path for robotic control and the robot had an error that prevented the open from succeeding.

rclegarto
Level 4
Certified
We first determined that one of the drives was faulty, so it had to be replaced, and I had to update the serial number of the new drive in NBU. Doing that revealed irregular drive mappings, so we reconfigured all the drives with the correct drive index. Now that the drive index and robot drive number are consistent between NBU and the actual tape library, this issue with the 2 NDMP drives has appeared.

They are being down'd even when idle. I up the drive manually, and within the next 2 minutes the drive is down'd again with the error given above.

John_Stockard
Level 5
Partner Certified
If the NDMP drives are going down even when idle, that suggests to me one of the following things:

1. The Celerra data mover can't consistently see the tape drives through the SAN fabric.  Make sure that only the Celerra data mover is zoned to these tape drives.  No other host on the SAN needs access to these drives, not even the NetBackup master server or media servers.

2. NetBackup can't correctly communicate with the Celerra data mover.  Make sure that the drive definitions in NetBackup are set up with the fully-qualified hostname of the Celerra data mover, not the short hostname.

3. Cover the basics -- make sure the tape library is not powered down or offline.

If none of the above help, try enabling detailed bptm debug logging on your NetBackup media server.  This log should tell you why NetBackup is marking the drives as being "down", since the bptm process is used by NetBackup to check on the status of the NDMP drives.
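Once you have a bptm log, something like the sketch below can help pull out the interesting lines. It assumes the line layout seen in the bptm excerpt quoted in the MP7 notes above (time, [pid.tid], <severity>, function name, message) and that a severity of 16 or higher marks an error; both are assumptions on my part, and the function names are made up for illustration.

```python
import re

# Layout assumed from the bptm line quoted in the MP7 notes:
# time [pid.tid] <severity> function: message
LOG_LINE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[(?P<ids>[\d.]+)\] "
    r"<(?P<severity>\d+)> "
    r"(?P<function>[^:]+): "
    r"(?P<message>.*)$"
)


def parse_line(line):
    """Split one log line into its fields, or return None if it does not fit."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["severity"] = int(entry["severity"])
    return entry


def find_errors(lines, min_severity=16, keyword="ndmp"):
    """Return parsed entries at or above min_severity that mention keyword."""
    hits = []
    for line in lines:
        entry = parse_line(line)
        if entry and entry["severity"] >= min_severity and keyword in line.lower():
            hits.append(entry)
    return hits


if __name__ == "__main__":
    sample = [
        # Line quoted in the MP7 notes earlier in this post's thread.
        "15:51:14.479 [6272.8404] <16> open_ndmp_device: cannot open ndmp device "
        "nrst07a, error code 11 (NDMP_WRITE_PROTECT_ERR)",
        # HYPOTHETICAL low-severity line, invented to show the filter.
        "15:51:14.100 [6272.8404] <2> example_function: routine debug message",
    ]
    for entry in find_errors(sample):
        print(entry["time"], entry["function"], entry["message"])
```

To run it against a real log, read the file's lines and pass them to `find_errors` instead of the `sample` list.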

rclegarto
Level 4
Certified
Unfortunately, I have very limited access (only to the master server), and I am working remotely, which makes it more difficult for me to troubleshoot. I'll suggest this to the appropriate person and keep you updated. Thanks for the tip.

zippy
Level 6
The password has not been set on the filer; if it has, you need to add it to the NetBackup server.

search my ID for NDMP