03-12-2019 09:39 AM
We have an SL500 tape library in our environment.
We recently started getting the error below whenever we load media and start backup operations:
Mar 12, 2019 4:11:29 PM - granted resource HP.ULTRIUM5-SCSI.004
Mar 12, 2019 4:11:29 PM - granted resource XXXXX-hcart2-robot-tld-0
Mar 12, 2019 4:11:29 PM - end writing
Mar 12, 2019 4:11:29 PM - mounting BAR245
Mar 12, 2019 4:11:35 PM - current media BAR245 complete, requesting next media Any
Mar 12, 2019 4:12:19 PM - current media -- complete, awaiting next media Any. Waiting for resources.
Mar 12, 2019 4:12:24 PM - Info bptm (pid=20049) Waiting for mount of media id BAR245 (copy 1) on server XXXXX.
Mar 12, 2019 4:12:30 PM - Error bptm (pid=20049) error requesting media, TpErrno = Robot operation failed
Mar 12, 2019 4:12:30 PM - Warning bptm (pid=20049) media id BAR245 load operation reported an error
We contacted Oracle support and, after we shared a lot of logs, they advised replacing the robot hand.
We did, but the problem persists. We then turned to the media: we removed all the old tapes and inserted new ones, and we still get the same error at the same point.
We also tried replacing some drives, without success. Note that the drives report a tape error whenever we load or unload media.
The tape drives themselves are HP LTO5 FC, and the drive dump shows that they are functioning well without errors.
But whenever we try to run a backup, the drive goes down on the host and gives the error.
Has anyone faced this problem before?
03-12-2019 04:42 PM
Test your library with robtest:
Check these guides:
If you are unable to mount/unmount tape media through robtest, you won't be able to use the library in NBU.
03-12-2019 11:41 PM
Please check that the density of your tapes matches that of the tape drives; if none of the drives is working, that might be the issue.
Is this a new setup, or did you change anything recently?
Are you able to move a tape to a drive via robtest or via the library GUI? If tape movement within the library works, it might be a configuration issue.
03-13-2019 12:28 AM
The density does match; if it didn't, the tape wouldn't even be assigned and no attempt to load it would be made.
03-13-2019 12:49 AM
There are quite a number of TechNotes and forum posts related to 'TpErrno = Robot operation failed' errors.
Very few of them were caused by a faulty robot or faulty media.
Here incorrect zoning was part of the problem.
Incorrect labels caused this problem.
This one was caused by Linux device drivers.
A post that lists various possible reasons was marked as solution, but the OP never told us what the real problem was.
Best TN ever:
Troubleshooting Robot or Drive Issues in NetBackup
A number of possible reasons and troubleshooting steps are listed here.
Please add a VERBOSE entry to volmgr/vm.conf on the robot control host as well as on all media servers, and restart ltid.
This will increase Media Manager logging to the OS syslog (e.g. /var/adm/messages on Solaris).
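If you have many media servers, it may help to script the vm.conf check. The following is only a minimal Python sketch: the function name `ensure_verbose` is made up for illustration, and the path in the usage note is an assumption (a typical UNIX install location), so adjust it to your installation. It only prepares the file content; ltid still has to be restarted afterwards.

```python
def ensure_verbose(conf_text: str) -> str:
    """Return vm.conf content with a VERBOSE entry, adding one if it is missing.

    Hypothetical helper for illustration; it works on text only and does
    not touch any file or restart any daemon.
    """
    lines = conf_text.splitlines()
    # An existing VERBOSE line (ignoring surrounding whitespace) is left alone.
    if any(line.strip() == "VERBOSE" for line in lines):
        return conf_text
    lines.append("VERBOSE")
    return "\n".join(lines) + "\n"
```

Usage would be along the lines of reading the file (for example /usr/openv/volmgr/vm.conf, assumed here), passing its text through `ensure_verbose`, and writing the result back after taking a backup copy.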
Ensure that the bptm log directory exists on all media servers, with the bptm logging level set to 3 (only enable level 5 if you intend to log a Support call with Veritas). No restart is needed.
Check the above logs after the next failure.
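When going through the logs after a failure, a quick filter for the TpErrno lines can save some scrolling. This is a rough Python sketch, not a Veritas tool: `find_tperrno` is a hypothetical name, and the pattern only assumes the `TpErrno = <reason>` wording seen in the job details at the top of this thread.

```python
import re

# Matches the "TpErrno = <reason>" wording shown in the job details above.
TPERRNO_RE = re.compile(r"TpErrno\s*=\s*(?P<reason>.+?)\s*$")


def find_tperrno(lines):
    """Return (line_number, reason) for every line that reports a TpErrno."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = TPERRNO_RE.search(line)
        if match:
            hits.append((number, match.group("reason")))
    return hits


sample = [
    "Mar 12, 2019 4:12:24 PM - Info bptm (pid=20049) Waiting for mount of media id BAR245 (copy 1) on server XXXXX.",
    "Mar 12, 2019 4:12:30 PM - Error bptm (pid=20049) error requesting media, TpErrno = Robot operation failed",
]
print(find_tperrno(sample))
```

The same function can be fed the lines of a bptm log file or a syslog extract; the timestamps around each hit then tell you where to look in the Media Manager messages.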
03-13-2019 12:51 AM - edited 03-13-2019 12:53 AM
Look for the SCSI sense keys reported by the robot. Sense keys are usually written to the syslog or event log.
Sense data consists of a key plus an Additional Sense Code and Qualifier (Sense Key, ASC, ASCQ), for example:
Sense 5h 3Ah 00h Medium Not Present, Drive Not Unloaded
The SL500 interface manual at https://docs.oracle.com/cd/E19724-01/96122G/96122G.pdf
(page 212) lists all the possible values along with their root causes. Once you know what is causing the mount issues, it is much easier to fix them.
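To make the key/ASC/ASCQ triplet easier to pick out of a log line, here is a small Python sketch. It assumes the sense data is printed in the `5h 3Ah 00h` style used in the example above (real syslog formats differ, so the pattern will likely need adjusting), and `parse_sense` is a hypothetical name. The sense key names are the generic SCSI ones; the library-specific meaning of each ASC/ASCQ pair still has to be looked up in the interface manual.

```python
import re

# Generic SCSI sense key names (not library-specific).
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0x7: "DATA PROTECT",
    0x8: "BLANK CHECK",
    0x9: "VENDOR SPECIFIC",
    0xA: "COPY ABORTED",
    0xB: "ABORTED COMMAND",
    0xD: "VOLUME OVERFLOW",
    0xE: "MISCOMPARE",
}

# Assumed layout: three hex values, each followed by "h", e.g. "5h 3Ah 00h".
SENSE_RE = re.compile(
    r"\b([0-9A-Fa-f]{1,2})h\s+([0-9A-Fa-f]{2})h\s+([0-9A-Fa-f]{2})h"
)


def parse_sense(text):
    """Extract sense key, ASC and ASCQ from a line, or return None."""
    match = SENSE_RE.search(text)
    if match is None:
        return None
    key, asc, ascq = (int(group, 16) for group in match.groups())
    return {
        "key": key,
        "key_name": SENSE_KEYS.get(key, "UNKNOWN"),
        "asc": asc,
        "ascq": ascq,
    }


print(parse_sense("Sense 5h 3Ah 00h Medium Not Present, Drive Not Unloaded"))
```

With the three values in hand you can go straight to the matching row in the manual instead of reading the whole table.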
03-13-2019 02:24 AM
Please share "print log error" output from the tape library.
03-14-2019 12:19 AM
04-09-2019 01:14 PM
Are your tapes encrypted? Our OKM had a switch go bad, so the drives could not get the keys, and I was getting the same errors.
We replaced the switch and voila!
Check your key manager to see whether the keys are getting there.
Agree with previous posts - the most likely issue is at the robot/drive level.
Heck, I once had a drive firmware where, once it labelled a tape, that tape could only be read in that drive.
Check drive firmware as well.
04-09-2019 09:17 PM
04-09-2019 11:46 PM
First, thanks for the support.
Second, we actually replaced the media server and migrated all the tape drives to the new one.
Once that was done, we ran a test backup, but not all the drives worked correctly.
After that we asked Oracle to replace all the tape drives that were not working, and since the replacement all of them have been working fine. The problem took a long time to resolve, since all the parts were coming from Oracle Germany.
At the moment, the TL and all the drives are working fine without a single error.
Thanks all again. :)