24/03/2014 11:46:04 - Info bptm(pid=10040) Waiting for mount of media id 000107 (copy 2) on server appnbu5.
24/03/2014 11:46:04 - started process bptm (10040)
24/03/2014 11:46:04 - mounting 000107
24/03/2014 11:46:04 - Info bptm(pid=10040) INF - Waiting for mount of media id 000107 on server appnbu5 for writing.
24/03/2014 11:46:05 - granted resource 000107
24/03/2014 11:46:05 - granted resource Drive002
24/03/2014 11:46:05 - granted resource Mediaserver5-hcart-robot-tld-0
24/03/2014 12:00:52 - Error bptm(pid=10040) error requesting media, TpErrno = Robot operation failed
24/03/2014 12:00:52 - Warning bptm(pid=10040) media id 000107 load operation reported an error
24/03/2014 12:00:53 - current media 000107 complete, requesting next resource Any
24/03/2014 12:01:06 - Info bptm(pid=10040) Waiting for mount of media id 000107 (copy 2) on server appnbu5.
24/03/2014 12:01:06 - started process bptm (10040)
24/03/2014 12:01:06 - mounting 000107
24/03/2014 12:01:06 - Info bptm(pid=10040) INF - Waiting for mount of media id 000107 on server appnbu5 for writing.
24/03/2014 12:01:06 - granted resource 000107
24/03/2014 12:01:06 - granted resource Drive002
24/03/2014 12:01:06 - granted resource Mediaserver5-hcart-robot-tld-0
24/03/2014 12:15:49 - Error bptm(pid=10040) error requesting media, TpErrno = Robot operation failed
24/03/2014 12:15:49 - Warning bptm(pid=10040) media id 000107 load operation reported an error
24/03/2014 12:15:50 - current media 000107 complete, requesting next resource Any
I wouldn't log any calls just yet, as per my usual comments on cases like this it's unlikely to be NBU (the vast majority of tape/ library issues are outside of NBU) but it's not impossible it's not us.
We would need to demonstrate it's not NBU before passing the issue to a 3rd party vendor (only polite ...)
Sometimes I request assistance from a 3rd part if I need their 'help' but we're not at that stage.
The problem here is tha the log messages generally don't get any better than what we're seen here - for whatever reason, NBU does't think the tape is in the drive, hence the error - it can't tell why, so we get no real details.
You've done som good work so far, so let's just confirm :
The fact the library console can move the tape is good - it shows the physical mechanics work.
The jobs fails as explained, but you see the tape loaded in the robot log:
: <../../lib/MmDeviceMappings.cpp:974> search robot list (length=406) for ADIC Scalar i500, type 8
12:00:36.866 [9040.9048] <4> MmDeviceMappings::GetRobotAttributes
: <../../lib/MmDeviceMappings.cpp:1227> found match: "ADIC Scalar i500" ADIC Scalar i500
12:00:36.866 [9040.9048] <5> tldcd:inquiry: inquiry() function processing library ADIC Scalar i500 636G:
12:00:36.866 [9040.9048] <6> tldcd:read_element_status_drive: RES drive 1
12:00:37.773 [9040.9048] <6> tldcd:tape_in_drive: valid = 1, sel = 4127, barcode = (000107L4 )
12:00:37.773 [9040.9048] <6> tldcd:read_element_status_drive: RES drive 1
12:00:39.085 [9040.9048] <6> tldcd:read_element_status_slot: RES storage element 32
12:00:40.382 [9040.9048] <5> tldcd:move_medium: TLD(0) initiating MOVE_MEDIUM from addr 256 to addr 4127
12:00:51.101 [9040.9048] <5> tldcd:tld_main: TLD(0) closing/unlocking robotic path
12:00:51.101 [9040.9048] <6> tldcd:ChildExit: TLD Child process 9040 has exited normally: STATUS_SUCCESS
Persoanlly my next step would be to run a job and then check this is true using robtest from the robot control host.
It's easy to do:
1. Run job, what activity monitor until you see it mounting media, give it a minute to be sure
2. Run robtest from the robot control host, select the correct robot
3. Once in robtest, run the command s d
This shows the drives, and you can see the barcodes of the tapes in the drives, do you see the tape in a drive that is listed in Activity Monitor.
If you don't report back here ...
If you do, um, report back here ...
What should happen, is that once a tape is requested to be put in the drive by NBU, NBU has to determine when the tape is actually in the drive and ready for use.
It does this by constant asking the drive 'Are you ready' ... it does this multiple times (quite quickly) until the drive responds, 'Yep I've got a tape loaded, I'm ready'.
The way it does this, is by sending the scsi command TUR (Test unit ready).
If TUR never returns with the correct response (or sometimes any responce) then NBU cannot determine the tape is loaded and you get a robot load failure, even though, the tape did actually get loaded - it's a bit of a false error in a way, but that's how it is.
I suspect that this is what is happening here, there is some issue on the TUR side (the reason is that this is the only thing I can think of, providing the tape is really in the drive, hence the robtest check). Marks post is worth checking first, as if the tape is in the wrong drive, then we don;t get the TUR back.
It could also be worth watching the library physically if some one is close enough - is the tape fully in the drive, or does it pop out again in an 'ejected' position (not sure what robtest would report for that).
If I am correct, then the issue changes slightly - but you are most likely looking at one of:
Drive fault
Firmware issue
Driver issue
SAN issue (see below, if HBA firmware can cause thyis, then quite possible some other SAN issue can cause it (eg switch firmware)
HBA issue (I've seen HBA firmware cause this exact issue)
Of course if no drivers have been changed recently, this is unlikely (+ I would expect it to affect other drives)
We can kinda rule out NBU already as this affects only one drive, if it was NBU I would expect it to affect all drives, as the same code and commands (scsi commands) are sent for each drive.
One test I'm thinking of is loading te tape using robtest, and then sending a TUR command outside of NBU. I can;t think if windows has the scsi_command (if it does it will be in volmgr\bin dir ) - ok this is a Symantec command, but not a NBU command as such and is useful (at least in Unix land) for issues like this.
Hope this helps.