SLP is trying to write to a stuck tape

randes2000
Level 4

Master server: Netbackup 7.5.0.4 - Solaris 10 SPARC
Media server: Netbackup 7.5.0.4 - Solaris 10 SPARC (also the robot control servers)
Media server Netbackup 7.5.0.4 - Windows 2008R2 (job trying to write to stuck tape)
Tape library - StorageTek SL150 - 6 LTO5 drives
Using SLP for duplication to tape.

I have a job attempting to write to a stuck tape.  Netbackup loaded the tape by tape ID (U00719), and sometime during the job the library could no longer read the tape label.  The tape library console reports the tape as unreadable.  I cannot manually move the tape via the library controls because the library cannot read it.  I brought down the library and removed the tape drive to check the cartridge.  The label looks perfect.  I have since reinstalled the tape drive and set it offline.  I have cycled Netbackup on the master and the Solaris media server and rebooted both servers.  The SLP continues to attempt to write to this tape.  Other jobs are working fine, ignoring the downed drive.  Netbackup shows the tape (media) as a member of my offsite pool (as expected), so I would like to return this tape to the database intact.  I have a service call in to Oracle to fix the issue.  Any ideas as to how I can get this job to move on and complete?

1 ACCEPTED SOLUTION


Nicolai
Moderator
Partner    VIP   

If the tape is not allocated, move it to the None pool; if it is allocated, freeze it.

That will prevent further writing.


9 REPLIES

Nicolai
Moderator
Partner    VIP   

First try to cancel the job; an SLP operation will retry by itself. If that does not work, it is likely a stuck EMM allocation for media ID U00719. If you can't cancel the job, run this command:

# nbrbutil -dump 

Grep for media ID U00719. You then need to cancel the allocation or reservation associated with the tape. nbrbutil has different options to do that.
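
The grep step can be sketched as a small shell helper. This is only a minimal sketch: `find_media_allocations` is a hypothetical name, not a NetBackup command, and it simply filters whatever text arrives on stdin, so it assumes nothing about the exact layout of the dump output.

```shell
# Hypothetical helper (not part of NetBackup): print the lines of a resource
# broker dump that mention a given media ID, prefixed with their line numbers.
find_media_allocations() {
    media_id="$1"
    if [ -z "$media_id" ]; then
        echo "usage: find_media_allocations MEDIA_ID" >&2
        return 2
    fi
    # -i: ignore case, -n: show line numbers, -F: treat the ID as a literal string
    grep -i -n -F -- "$media_id"
}

# Example use on the master server:
#   nbrbutil -dump | find_media_allocations U00719
```

If nothing is printed, the dump holds no line that mentions the media ID, which suggests there is no outstanding allocation or reservation to release for that tape.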

A quick fix is to use either nbrbutil -resetAll or nbrbutil -resetMediaServer [mediaserver]. Before running either command, stop all operations (for -resetAll) or all operations on the media server in question (for -resetMediaServer).

http://www.symantec.com/docs/HOWTO43779

randes2000
Level 4

I was able to cancel the job; however, my experience with SLP is that it will attempt again to complete.  Earlier I completely shut down Netbackup, and upon restart the SLP attempted to write to the stuck tape.  That job just won't give up.
The output of the nbrbutil -dump command did not contain the media ID U00719, but I expected that because I was able to cancel the job.
I attempted to eject the tape via robtest and it failed with:

Initiating MOVE_MEDIUM from address 505 to 10
move_medium failed
sense key = 0x4, asc = 0x53, ascq = 0x0, MEDIA LOAD OR EJECT FAILED

Looks like I may have to manually eject the tape.  What are the procedures to reintroduce the tape to Netbackup as a tape with data?

mph999
Level 6
Employee Accredited

Just stick it back in the library and run an inventory ...

Nicolai
Moderator
Partner    VIP   

If the tape is not allocated, move it to the None pool; if it is allocated, freeze it.

That will prevent further writing.
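
As a rough illustration of that choice, the sketch below prints, rather than runs, the command for each case. `protect_media_cmd` and the host name `winmedia01` are hypothetical, and pool number 0 is assumed to be the None pool; checking the pool numbers with `vmpool -listall` before running anything would be prudent.

```shell
# Hypothetical helper: print (not run) the command that matches the advice.
# Arguments: media ID, "yes"/"no" for allocated, media server that owns the tape.
protect_media_cmd() {
    media_id="$1"
    allocated="$2"
    media_server="$3"
    if [ "$allocated" = "yes" ]; then
        # Allocated media holds images, so freeze it to block further writes.
        echo "bpmedia -freeze -m $media_id -h $media_server"
    else
        # Unallocated media can change pools; pool number 0 is assumed to be None.
        echo "vmchange -p 0 -m $media_id"
    fi
}

# winmedia01 is a placeholder for the media server that owns the tape.
protect_media_cmd U00719 yes winmedia01
```

Printing the command first leaves room to review it before it touches the media database.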

randes2000
Level 4

I guess I can just reassign it to the correct Volume Pool.  I'll go with that once I get the dern thing out of the tape drive. 

Nicolai
Moderator
Partner    VIP   

If the tape jammed in the tape drive, the casing may have cracked somewhere. A crack like that is almost impossible to see, but it is enough to keep the cartridge from fitting properly in the tape drive.

If the tape is re-inserted into the robot, Netbackup will know which volume pool it belongs to (as long as it is not in scratch status).

randes2000
Level 4

When I finally get the tape out of the drive I will inspect it.  The tape is already allocated, robot type TLD,  is in a volume group, a volume pool and has 86 images on it.  I attempted to freeze it or suspend it earlier, but both attempts failed at the time.  I just checked the status and the tape is now frozen.  Freaky.  I'm still waiting for Oracle support to schedule an onsite to fix the issue.  Typical for Oracle if you don't open the support request with the highest priority.

Since cancelling the SLP job this morning, it has not attempted to re-run.  We'll see.

randes2000
Level 4

Oracle support tech had to manually eject tape from drive.  A very long process as the tape had to be rewound by hand.  He broke the tape leader, but was able to repair it.  He also replaced the tape drive.  After configuring the new tape drive all is working.

Marianne
Moderator
Partner    VIP    Accredited Certified

I would not try and reuse that tape. Best to try and duplicate all images on that tape and then get rid of it.
Once all images have been duplicated, expire all images associated with it so that NBU will not attempt to restore from it.
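
A possible command sequence for that clean-up is sketched below as a dry run that only prints the steps. `retire_media_plan` and the storage unit name `stu_tape_offsite` are hypothetical; the NetBackup command reference should be checked for the options that apply to your version, and note that `bpexpdate -d 0` expires the images immediately.

```shell
# Hypothetical helper: print (not run) the steps for retiring a suspect tape.
# Arguments: media ID, destination storage unit for the duplicated copies.
retire_media_plan() {
    media_id="$1"
    dest_stu="$2"
    # 1. List the images that live on the tape.
    echo "bpimmedia -mediaid $media_id"
    # 2. Duplicate every image on that media ID to another storage unit.
    echo "bpduplicate -id $media_id -dstunit $dest_stu"
    # 3. Only after the duplicates are verified, expire the images on the tape.
    echo "bpexpdate -m $media_id -d 0"
}

# stu_tape_offsite is a placeholder storage unit name.
retire_media_plan U00719 stu_tape_offsite
```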

About the following:

 I attempted to freeze it or suspend it earlier, but both attempts failed at the time.  I just checked the status and the tape is now frozen.  Freaky.

NBU will automatically freeze a tape if 3 I/O errors occur within a 12-hour period.
We don't know what your error message was when you tried to freeze the tape, so, we cannot tell you why it failed.