StorageTek SL500 Issues with the drives

Hello Everyone,

 

We have an SL500 tape library in our environment.

We recently started seeing the error below whenever we try to load tapes and start backup operations:

Mar 12, 2019 4:11:29 PM - granted resource  HP.ULTRIUM5-SCSI.004
Mar 12, 2019 4:11:29 PM - granted resource  XXXXX-hcart2-robot-tld-0
Mar 12, 2019 4:11:29 PM - end writing
Mar 12, 2019 4:11:29 PM - mounting BAR245
Mar 12, 2019 4:11:35 PM - current media BAR245 complete, requesting next media Any
Mar 12, 2019 4:12:19 PM - current media -- complete, awaiting next media Any. Waiting for resources. 

Mar 12, 2019 4:12:24 PM - Info bptm (pid=20049) Waiting for mount of media id BAR245 (copy 1) on server XXXXX.
Mar 12, 2019 4:12:30 PM - Error bptm (pid=20049) error requesting media, TpErrno = Robot operation failed
Mar 12, 2019 4:12:30 PM - Warning bptm (pid=20049) media id BAR245 load operation reported an error

 

We contacted Oracle support and, after we shared a lot of logs, they advised us to replace the robot hand.

We did, but the problem persisted. We then turned to the media: we removed all the old tapes and inserted new ones, but we are still stuck at the same point and getting the same error.

We also tried replacing some drives, without success; note that the drives report a tape error whenever we load or unload tapes.

The tape drives themselves are HP LTO5 FC, and the drive dumps show that they are functioning well without errors.

But whenever we try to run a backup, the drive goes down on the host and gives the error.

 

Has anyone faced this problem before?

10 Replies

Re: StorageTek SL500 Issues with the drives

Test your library with robtest:

Check these guides:

https://www.veritas.com/support/en_US/article.100016165.html

http://qsupport.quantum.com/callisto/node/2600

If you are unable to mount/unmount tape media through robtest, you won't be able to use the library in NBU.

Re: StorageTek SL500 Issues with the drives

Please check whether the density of your tapes matches that of your tape drives; if none of the drives is working, that might be the issue.

Is this a new setup, or did you change anything recently?

Are you able to move a tape to a drive via robtest, or maybe via the library GUI? If tape movement within the library works, it might be a configuration issue.

 

Re: StorageTek SL500 Issues with the drives

The density does match; if it didn't, the tape wouldn't even be assigned, and no attempt to load a tape would be made.

Re: StorageTek SL500 Issues with the drives

@lteife 

There are quite a number of TechNotes and forum posts related to 'TpErrno = Robot operation failed' errors.

Very few of them were caused by a faulty robot or faulty media.

https://www.veritas.com/support/en_US/article.100018475.html
Here incorrect zoning was part of the problem.

https://vox.veritas.com/t5/NetBackup/error-requesting-media-TpErrno-Robot-operation-failed-on-all-ne...
Incorrect labels caused this problem

https://www.veritas.com/support/en_US/article.100026246
Caused by Linux device drivers

https://vox.veritas.com/t5/NetBackup/Error-bptm-pid-5832-error-requesting-media-TpErrno-Robot/td-p/5...
A post that lists various possible reasons was marked as solution, but the OP never told us what the real problem was.

Best TN ever: 
Troubleshooting Robot or Drive Issues in NetBackup
https://www.veritas.com/content/support/en_US/article.100014480.html#Robot_load_issue
A number of possible reasons and troubleshooting steps are listed here.

Please add a VERBOSE entry to volmgr/vm.conf on the robot control host as well as on all media servers, and restart ltid.
This will increase media manager logging to the OS syslog (e.g. /var/adm/messages on Solaris).
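
As a minimal sketch of the vm.conf step, the Python helper below appends a VERBOSE line only if one is not already present. The function name `ensure_verbose` is made up for illustration, and the full path in the comment assumes the usual UNIX install location; adjust it for your servers. The sketch does not restart ltid, so that still has to be done by hand afterwards.

```python
from pathlib import Path


def ensure_verbose(vm_conf):
    """Append a VERBOSE entry to vm.conf unless one is already there.

    Returns True if the file was changed, False if VERBOSE was already set.
    (Hypothetical helper, not part of NetBackup.)
    """
    path = Path(vm_conf)
    lines = path.read_text().splitlines() if path.exists() else []
    # VERBOSE is a bare keyword on its own line, so compare whole lines.
    if any(line.strip() == "VERBOSE" for line in lines):
        return False
    lines.append("VERBOSE")
    path.write_text("\n".join(lines) + "\n")
    return True


# Example call (assumed default location on a UNIX media server):
# ensure_verbose("/usr/openv/volmgr/vm.conf")
```

Run it on the robot control host and on every media server, then restart ltid so the setting takes effect.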

Ensure that the bptm log directory exists on all media servers, with the bptm logging level set to 3 (only enable level 5 if you intend to log a Support call with Veritas). No restart is needed.

Check the above logs after the next failure.
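
To narrow down what to look for, here is a rough Python sketch that pulls the Error and Warning lines out of job details. It assumes the Job Details format quoted in the first post of this thread; lines in the bptm log files or in syslog are formatted differently and would need a different pattern. The function and pattern names are made up for illustration.

```python
import re

# Matches Job Details lines such as:
# "Mar 12, 2019 4:12:30 PM - Error bptm (pid=20049) error requesting media, ..."
LINE_RE = re.compile(
    r"^(?P<when>\w{3} \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M) - "
    r"(?P<severity>Error|Warning) (?P<process>\w+) \(pid=(?P<pid>\d+)\) "
    r"(?P<message>.*)$"
)
MEDIA_RE = re.compile(r"media id (?P<media>\w+)")
TPERRNO_RE = re.compile(r"TpErrno = (?P<reason>.+)$")


def find_mount_failures(lines):
    """Return one dict per Error/Warning line, with media id and TpErrno if present."""
    failures = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # Info lines and blank lines are skipped
        entry = match.groupdict()
        media = MEDIA_RE.search(entry["message"])
        reason = TPERRNO_RE.search(entry["message"])
        entry["media"] = media.group("media") if media else None
        entry["tperrno"] = reason.group("reason").strip() if reason else None
        failures.append(entry)
    return failures


# Sample lines copied from the first post in this thread.
SAMPLE = [
    "Mar 12, 2019 4:12:24 PM - Info bptm (pid=20049) Waiting for mount of media id BAR245 (copy 1) on server XXXXX.",
    "Mar 12, 2019 4:12:30 PM - Error bptm (pid=20049) error requesting media, TpErrno = Robot operation failed",
    "Mar 12, 2019 4:12:30 PM - Warning bptm (pid=20049) media id BAR245 load operation reported an error",
]

for failure in find_mount_failures(SAMPLE):
    print(failure["when"], failure["severity"], failure["media"], failure["tperrno"])
```

The timestamps it returns tell you which window to search in the syslog and bptm logs on the media server.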

Re: StorageTek SL500 Issues with the drives

Look for the SCSI sense keys reported by the robot. Sense keys are usually written to the syslog or event log.

Sense data consists of a sense key, an additional sense code and an additional sense code qualifier (Sense Key, ASC, ASCQ), for example:

Sense 5h 3Ah 00h  Medium Not Present, Drive Not Unloaded

The SL500 interface manual at https://docs.oracle.com/cd/E19724-01/96122G/96122G.pdf

lists all the possible values along with their root causes on page 212. Once you know what is causing the mount issues, it is much easier to fix them.
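
A small Python sketch for splitting such a triple into its parts, assuming the "5h 3Ah 00h" notation used above (your syslog may print sense data in a different layout, so the pattern is an assumption). The sense key names are the generic SCSI ones; what a given ASC/ASCQ pair means on the SL500 still has to be looked up in the interface manual. The function name is made up for illustration.

```python
import re

# Generic SCSI sense key names; keys not listed here are reported as "Unknown".
SENSE_KEYS = {
    0x0: "No Sense",
    0x1: "Recovered Error",
    0x2: "Not Ready",
    0x3: "Medium Error",
    0x4: "Hardware Error",
    0x5: "Illegal Request",
    0x6: "Unit Attention",
    0x7: "Data Protect",
    0x8: "Blank Check",
    0xB: "Aborted Command",
}

# Three hex values, each followed by "h": sense key, ASC, ASCQ.
SENSE_RE = re.compile(
    r"\b([0-9A-Fa-f]{1,2})h\s+([0-9A-Fa-f]{2})h\s+([0-9A-Fa-f]{2})h"
)


def parse_sense(text):
    """Extract (sense key, ASC, ASCQ) from a line, or return None if absent."""
    match = SENSE_RE.search(text)
    if match is None:
        return None
    key, asc, ascq = (int(group, 16) for group in match.groups())
    return {
        "key": key,
        "key_name": SENSE_KEYS.get(key, "Unknown"),
        "asc": asc,
        "ascq": ascq,
    }


info = parse_sense("Sense 5h 3Ah 00h  Medium Not Present, Drive Not Unloaded")
print(info)
```

With the key, ASC and ASCQ separated, it is easier to find the matching row in the manual's table.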

Re: StorageTek SL500 Issues with the drives

Please share "print log error" output from the tape library.

Re: StorageTek SL500 Issues with the drives

Are your tapes encrypted? Our OKM had a switch go bad, and the drives could not get the keys, and I was getting the same errors.

Replaced the switch and voila!

Check your key manager to see whether the keys are getting there.

Agree with previous posts - most likely the issue is at robot/drive level.

Heck, I once had a drive firmware where, once it had labelled a tape, that tape could only be read in that drive.

Check drive firmware as well.

 

NetBackup 8.1.2 on Solaris 11, writing to DataDomain 9800
duplicating via SLP to LTO5 in SL8500 via ACSLS

Re: StorageTek SL500 Issues with the drives

Weird thing is that @lteife has not replied to any of our attempts to assist...

Re: StorageTek SL500 Issues with the drives

 Hello Marianne,

 

First, thanks for the support.

Second, we actually replaced the media server and migrated all the tape drives to the new one.

Once that was done, we ran a test backup, but not all the drives worked correctly.

After that, we asked Oracle to replace all the tape drives that were not working, and once they were replaced, all of them have been working up to now. The problem took a long time to resolve, since all the parts were coming from Oracle Germany.

 

At the moment, the TL and all the drives are working fine without a single error.

Thanks all again.