Isilon Backups failing with error cannot open ndmp device tape005, error code 7 (NDMP_IO_ERR)

koribilli12
Level 4
Accepted Solutions

Thank you everyone for your valuable suggestions.

However, the issue is still not resolved. We now suspect a problem with the KMA (Key Manager) device, which may need to be replaced. I browsed through the Oracle KB articles and found that backups fail with status 83/84 when the KMA device has a problem.

We have a KMS manager in place for encrypting the data via the tape drive KMA card, which is in fact attached to all the drives.

 

Hello Everyone,

The issue is now fixed after we removed the NIC teaming from the backup server.

Earlier we had NIC teaming enabled: the server has two NICs, and the virtual teaming software presented both NICs under one IP address. Now only one NIC is in use, and all the backups completed successfully once teaming was disabled.

So what I learned is that teaming might cause network disruptions, and because of that backups can fail with error 84.

16 REPLIES

Michael_G_Ander
Level 6
Certified

As you have changed something on the Isilon, it is the natural place to start.

But you probably have to reconfigure the tape devices in NetBackup too.

The standard questions. Have you checked: 1) what has changed, 2) the manual, 3) whether there are any tech notes or VOX posts regarding the issue?

@Michael_G_Ander

We have rescanned from the Isilon and reconfigured the drives as well. Other backups run fine, and Isilon backups also complete fine if the volume is small. The issue is only with the huge volume (>5 TB).

The tape library and the server have also been rebooted multiple times.

 

Nicolai
Moderator
Moderator
Partner    VIP   

This looks like a device configuration issue. From the logs:

cannot open ndmp device tape005, error code 7 

Suspend media 010702 with

bpmedia -m 010702 -suspend

If the media is damaged, the tape drive won't be able to open it.

Check that tape005 is defined at the right location in the robot (a common configuration error). For example, the configuration says tape005 is located at index 1 while the robot has it at index 2. The Isilon then looks at a drive with no tape mounted and thus cannot open the tape. This configuration error usually also causes other tape drives to go to the DOWN state because they were found full when expected to be empty.

Marianne
Moderator
Moderator
Partner    VIP    Accredited Certified

I have followed the PID for this particular backup in bptm log.

All was fine until the first tape filled up and the next tape was loaded. 

02:57:25.356 [4452.6128] <16> open_ndmp_device: cannot open ndmp device tape005, error code 7 (NDMP_IO_ERR)

There are more errors like these - can you confirm that this is only seen during large backups when 1st tape is filled up and a new tape is requested?
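One way to answer that question is to pull every open_ndmp_device error out of the bptm log and compare the timestamps with the tape changes. Below is a minimal Python sketch. The regular expression is modelled only on the single log line quoted above, and it assumes the bracketed pair is the process ID and thread ID; the function name ndmp_open_failures is made up for this illustration and is not part of NetBackup.

```python
import re

# Pattern modelled on the one bptm line quoted in this thread;
# other bptm lines may be laid out differently (assumption).
BPTM_LINE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[(?P<pid>\d+)\.(?P<tid>\d+)\] "
    r"<(?P<severity>\d+)> "
    r"(?P<function>\w+): (?P<message>.*)$"
)


def ndmp_open_failures(lines):
    """Return (time, pid, message) for each open_ndmp_device error line."""
    hits = []
    for line in lines:
        m = BPTM_LINE.match(line.strip())
        if m and m["function"] == "open_ndmp_device" and "error code" in m["message"]:
            hits.append((m["time"], int(m["pid"]), m["message"]))
    return hits


sample = [
    "02:57:25.356 [4452.6128] <16> open_ndmp_device: cannot open ndmp device "
    "tape005, error code 7 (NDMP_IO_ERR)",
]
for hit in ndmp_open_failures(sample):
    print(hit)
```

Fed the full bptm log, the list of timestamps can be lined up against the times the first tape filled and the next one was mounted.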

 

There seems to be a problem with the filer that does not acknowledge or report the tape change.
Can you check the Isilon system log?

Probably a good idea to log a Support call with EMC and Veritas for joint troubleshooting. 

I found an OLD TN with a similar error message, but the situation is different: here we do not see NBU choosing the NDMP media server device name instead of the filer device name: http://www.veritas.com/docs/000035830

This means that we can rule out NBU having the same bug.

We are getting multiple media errors and the number is increasing every day. I suspect a problem with the media, so I am replacing all the media and loading brand-new scratch tapes to eliminate the error.

Let's hope for some positive results.

@Marianne I have now encountered the problems with the file system as well. I can see that many backups are failing with errors 83 and 84. The tape library vendor confirmed that there is no issue with the library or drives.

I have tested the Isilon backup to local disk, and it succeeds without any issues.

Is there anything else we can check? Do we need to analyze any other logs to find the exact root cause?

Nicolai
Moderator
Moderator
Partner    VIP   

A media write error does not mean it is a tape drive issue. A status code 84 simply means the write operation failed. An FC issue can also cause a status 84. Status 83 (media open error) is not seen that often. Any power outages lately?

Do you see SCSI sense keys in either the syslog or the event log?

You can see the list of sense codes at:

https://en.wikipedia.org/wiki/Key_Code_Qualifier
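For quick triage, the sense key (the first part of the Key Code Qualifier) can be turned into its standard name with a few lines of Python. This is only a sketch: the sense key names are the standard SCSI ones, but the "K/C/Q" input format and the function name describe_kcq are assumptions made for this example. How the triple actually appears in a syslog or event log depends on the OS and driver.

```python
# Standard SCSI sense key names, indexed by the 4-bit sense key value.
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0x7: "DATA PROTECT",
    0x8: "BLANK CHECK",
    0x9: "VENDOR SPECIFIC",
    0xA: "COPY ABORTED",
    0xB: "ABORTED COMMAND",
    0xD: "VOLUME OVERFLOW",
    0xE: "MISCOMPARE",
}


def describe_kcq(kcq):
    """Split a hex 'K/C/Q' triple (assumed format) and name the sense key."""
    key, asc, ascq = (int(part, 16) for part in kcq.split("/"))
    name = SENSE_KEYS.get(key, "not in table")
    return f"sense key {key:X}h ({name}), ASC {asc:02X}h, ASCQ {ascq:02X}h"


# Hypothetical input, not taken from the logs in this thread.
print(describe_kcq("3/0C/00"))
```

The ASC/ASCQ pair still has to be looked up in the table linked above or in the drive vendor's documentation.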

 

 

 

Marianne
Moderator
Moderator
Partner    VIP    Accredited Certified

You have shown us a device scan on a Windows media server.

Are your NDMP backups sent across the network to a Windows media server? 
The bptm log seems to indicate something else  - that NDMP backups are done to NDMP devices (drives).

Maybe best to go back to the drawing board and tell us how devices are attached...

'tpconfig -l' will be a good start (please copy the text instead of posting a screenshot).
Your 2nd attachment does not have an extension, meaning we have no idea what kind of file this is.

If the OS is losing device connectivity, you need to troubleshoot device connectivity, starting with your SAN. 

Nicolai
Moderator
Moderator
Partner    VIP   

This is for sure a SAN-related issue.

On the Isilon, inspect /var/log/messages and, as I mentioned before, look for SCSI sense key codes.

@Marianne I have sent you the tpconfig -l output as an attachment.

 

Marianne
Moderator
Moderator
Partner    VIP    Accredited Certified

Have you completely deleted the old NDMP device info before you reconfigured the devices?

Probably not, as most drives show 3 paths (assuming a Windows media server and 2 NDMP paths):

  drive    -    0  hcart   10      UP  -          000-HP.ULTRIUM4-SCSI.10  {4,0,0,0} 
  drive    -    0  hcart   10      UP  -          000-HP.ULTRIUM4-SCSI.10  tape011   
  drive    -    0  hcart   10      UP  -          000-HP.ULTRIUM4-SCSI.10  c32t0l0   

Probably best to delete ALL devices, restart Device Management and reconfigure NBU devices.
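To spot drives with leftover paths before deleting anything, the tpconfig -l text can be grouped by drive name. The Python sketch below assumes the layout of the three lines quoted above, where the drive name is the second-to-last whitespace-separated field and the path is the last one. Real output may contain other row types, and paths_per_drive is a name invented for this example.

```python
from collections import defaultdict


def paths_per_drive(tpconfig_lines):
    """Map each drive name to the list of device paths listed for it."""
    paths = defaultdict(list)
    for line in tpconfig_lines:
        fields = line.split()
        # Assumed layout: "drive ... <drive name> <path>" as in the lines above.
        if len(fields) >= 2 and fields[0] == "drive":
            paths[fields[-2]].append(fields[-1])
    return dict(paths)


sample = [
    "  drive    -    0  hcart   10      UP  -          000-HP.ULTRIUM4-SCSI.10  {4,0,0,0} ",
    "  drive    -    0  hcart   10      UP  -          000-HP.ULTRIUM4-SCSI.10  tape011   ",
    "  drive    -    0  hcart   10      UP  -          000-HP.ULTRIUM4-SCSI.10  c32t0l0   ",
]
for drive, drive_paths in paths_per_drive(sample).items():
    print(drive, len(drive_paths), drive_paths)
```

Any drive that reports more paths than the hosts actually attached to it is a candidate for stale device entries.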

Before you do that, I suggest that you carefully check device attachment/zoning. It seems that you have way too many devices attached to one HBA on the Windows server: the robot and 8 drives on c4, and 2 drives on c5.
c5 seems to be problematic, with both drives down. You need to check the Event Viewer System log for errors, or else the switch logs for errors on the port.

Check filer syslog for errors as well.
Where the drive path is DOWN on both Windows server and the filer, the problem is likely to be between the switch port and the tape drive. 

As per previous posts - you need to carefully check your SAN.

            

Marianne
Moderator
Moderator
Partner    VIP    Accredited Certified

You have either misunderstood me or you did not read my previous post. 

I honestly don't know what else to say.... 

Good luck!
