We have an issue where, all of a sudden, our Media/Master server (NetBackup 18.104.22.168) will not write to any tapes.
It fires off the job, it says it has selected a tape to mount, I see the tape move in the library into the drive (Can see that on the web gui also), but that is all that happens. It never writes anything to the tape. The tape sits idle in the tape drive and Netbackup just sits there waiting for the tape to be read.
When using vmoprcmd it shows the tape drives, but does not show that there is a tape in them.
I have upgraded and downgraded the firmware on the tape drives and library, and it makes no difference. Still the same issue. I have checked the drivers and they are all up to date, and none of those have changed anyway.
I have removed Microsoft security patches to see if that would make a difference, but it did not.
It seems that either Netbackup is not reading what is returned from the library/tape drives or the library/tape drives are not sending the load/mount completion back to Netbackup.
Any help would be greatly appreciated.
Ok, that was worth a try.
My best advice from here is inspecting debug logs. vxlogview is a good command since it will report everything it can find.
E.g : #vxlogview -p 51216 -X "jobid=12345"
Firstly have you inventoried the robot recently?
And one suggestion (if the tapes are new and unused): change the media ID generation rules for the robot to use the first 6 characters of the barcode and not the default last 6 (I never understood why NetBackup did it this way). Delete the tapes from the GUI, then update the generation rule for the robot (under advanced options while doing an inventory). Refer to one of the Server Admin Guides for details. You could also add something like "MEDIA_ID_BARCODE_CHARS = 0 8 1:2:3:4:5:6" to /usr/openv/volmgr/vm.conf to achieve the same result (the first digit is the robot number, the second is the number of chars in the barcode, and the last is the chars to use from the barcode).
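To make the effect of that rule concrete, here is a rough Python sketch of how a position rule such as 1:2:3:4:5:6 picks characters out of a barcode. This is only an illustration of the description above, not NetBackup's own code: the function name and the sample barcode "ABC123L7" are made up, and it only handles plain numeric positions.

```python
def media_id_from_barcode(barcode: str, rule: str) -> str:
    """Build a media ID by picking 1-based character positions from a barcode.

    Hypothetical helper for illustration only; `rule` is the colon-separated
    position list, e.g. "1:2:3:4:5:6".
    """
    positions = [int(field) for field in rule.split(":")]
    # Positions are counted from the left, starting at 1.
    return "".join(barcode[pos - 1] for pos in positions)


barcode = "ABC123L7"  # made-up 8-character barcode

# First six characters, as suggested above.
print(media_id_from_barcode(barcode, "1:2:3:4:5:6"))

# Last six characters of an 8-character barcode.
print(media_id_from_barcode(barcode, "3:4:5:6:7:8"))
```

With the first rule the media ID keeps the leading part of the barcode, so it matches what you read on the label from the left.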
A tape is not 'mounted' just because it is in the drive (a very common misconception).
The following is the process ....
CDB 0xa5 MOVE MEDIUM command to the library
CDB 0x1d SEND DIAGNOSTIC to the drive, a valid correct response is required.
CDB 0x00 Test Unit Ready over and over every few seconds; once TUR finally returns
0x00 READY, only then do we know the tape is actually physically and correctly loaded in the drive.
A failure of any of those steps will result in the tape not being mounted and available for use - for example, if step 5 or 6 fails, although the tape is physically in the drive, NetBackup was not able to get confirmation of this, and so this would cause a 'robot load error', despite the fact the robot physically moved the tape into the drive.
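The Test Unit Ready polling described above can be sketched roughly as below. This is not NetBackup code, just the general shape of a poll-until-ready loop: `test_unit_ready` is a hypothetical callable standing in for whatever actually sends the CDB and returns the SCSI status byte, and the timeout and interval are arbitrary.

```python
import time
from typing import Callable

GOOD = 0x00             # SCSI status: command completed, drive is ready
CHECK_CONDITION = 0x02  # SCSI status: not ready yet (sense data explains why)


def wait_for_drive_ready(
    test_unit_ready: Callable[[], int],
    timeout_s: float,
    poll_interval_s: float,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll a (hypothetical) TUR function until it reports GOOD or we time out."""
    deadline = time.monotonic() + timeout_s
    while True:
        if test_unit_ready() == GOOD:
            return True   # tape is loaded and the drive is ready
        if time.monotonic() >= deadline:
            return False  # tape may be in the drive, but it never became ready
        sleep(poll_interval_s)


# Simulated drive: not ready twice, then ready.
responses = iter([CHECK_CONDITION, CHECK_CONDITION, GOOD])
print(wait_for_drive_ready(lambda: next(responses), timeout_s=5, poll_interval_s=0))
```

The second return path is the situation in this thread: the cartridge is physically in the drive, but the ready status never comes back, so the mount never completes.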
We know 5 happens, as you have confirmed the tape is in the drive.
So the failure is between 6 and 8 (arguably, I guess, even 8 could be happening and not telling the rest of NBU, although I have never seen that happen in 14 years ....)
If 6 or 7 fails to return, I think you do get something in the robots log (timed out waiting for drive to become ready) or words to that effect, but proving it is very difficult and impossible from NetBackup alone. You need either the library vendor to confirm whether the drive/robot received the CDB and whether it sent a response, or a SCSI analyzer.
7 may be easier - on Linux, I'd use strace on AVRD, you can clearly see it 'reading' the header, and of course if that happens, it must have completed 5 and 6.
On Windows, I guess you'd be looking at something like procmon to trace avrd when a tape is trying to mount.
Hi Martin, yes I know. The idea was to check whether the cartridges were somehow rejected by the drive (wrong type, RFID error) and then ended up on the "unmountable" list of tapes. Once we know the tape drives go into the ready state, we know to look at the issue from the OS side. If only mt -f /dev/rtmxxx status was available on Windows ....
So we have figured out the issue now.
When the tapes were inserted, nobody checked what version of tapes they were putting in.
We took one out last night and had a look ..... LTO8 tapes. Drives are LTO7. So correct, not a netbackup issue at all, and in the great words of Homer Simpson - "D'OH!!!!!"
So new tapes should be arriving tomorrow and no more issues we hope.
Thanks all for jumping in to help out with this.
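For anyone hitting the same thing, the check we skipped is trivial. A minimal Python sketch, assuming you know the LTO generation of both drive and cartridge (the function name is made up, and it deliberately ignores the rules about which older generations a drive can still read or write):

```python
def cartridge_too_new(drive_generation: int, cartridge_generation: int) -> bool:
    """Return True when the cartridge is a newer LTO generation than the drive.

    Hypothetical helper: a drive cannot use media from a generation newer
    than its own, which is exactly the LTO8-tape-in-LTO7-drive case here.
    """
    return cartridge_generation > drive_generation


print(cartridge_too_new(drive_generation=7, cartridge_generation=8))  # our situation
print(cartridge_too_new(drive_generation=7, cartridge_generation=7))
```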
I would report that back to the vendor, as I also thought the tape was ejected in such a case, which 'might' suggest a firmware issue.
There is also an 'unsupported tape format' tape alert which I would have thought was logged, and which NetBackup should have read when the tape was ejected. Maybe this was logged in ...netbackup/db/media/errors on the media server.