report for frozen tapes and down'ed drives in NB 6.5?

stan56
Level 4
I recently upgraded from NB 5.1 to 6.5.4. I wrote a script a while back that would run "bperror -media -U" and after some grep/awk/sed magic, it would give me a nice report of all frozen tapes for a week, e.g.
Tape ID: UD6678
===============
08/25/2009 14:51:48 candor nyfs08  cannot write image to media id UD6678, drive index 4, Input/output error
08/25/2009 16:40:09 candor nyfs22p  cannot write image to media id UD6678, drive index 0, Input/output error
08/25/2009 18:18:58 candor nyap45p  cannot write image to media id UD6678, drive index 20, Input/output error
08/25/2009 18:18:59 candor nyap45p  FREEZING media id UD6678, it has had at least 3 errors in the last 12 hour(s)

The above example is a bad tape that I would eject and discard. I always get a few bad tapes per week. The script would also search the bperror output for DOWN'ed drives and report them.
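For readers who want to reproduce a report like the one above, here is a minimal sketch in Python. It is not the original script (that used grep/awk/sed and was never posted). It assumes the `bperror -media -U` output has already been captured as a list of text lines, and the function names `frozen_report` and `format_report` are made up for this illustration.

```python
import re
from collections import defaultdict

# Matches "media id UD6678" as well as "Media Id UD0270".
MEDIA_RE = re.compile(r"media id (\w+)", re.IGNORECASE)


def frozen_report(lines):
    """Group log lines by media ID; keep only IDs that have a FREEZING line."""
    by_media = defaultdict(list)
    for line in lines:
        match = MEDIA_RE.search(line)
        if match:
            by_media[match.group(1)].append(line.strip())
    return {
        media_id: entries
        for media_id, entries in by_media.items()
        if any("FREEZING" in entry for entry in entries)
    }


def format_report(report):
    """Render the grouped lines in the 'Tape ID: ...' layout shown above."""
    out = []
    for media_id, entries in sorted(report.items()):
        header = f"Tape ID: {media_id}"
        out.append(header)
        out.append("=" * len(header))
        out.extend(entries)
        out.append("")
    return "\n".join(out)
```

Lines that name the tape without the words "media id" (for example "FREEZING AD0815" at the end of an "incorrect media found" message) are not picked up by this pattern and would need their own rule.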

After I upgraded to 6.5.4, I realized that bperror no longer reports the frozen media or the down drives! Actually, it reports some frozen media (the ones frozen due to a bad label, or NetBackup catalog tapes) but none frozen due to drive/tape errors. I know for a fact that I'm getting frozen tapes and downed drives. How can I produce a report now? I do see TapeAlerts, but they don't really explain much. Here are a few examples from the bperror output; I grepped for some of the tapes that I found to be frozen:
09/17/2009 15:34:35 orion nybc02  ioctl (MTREW) failed on media id UD0270, drive index 20, Input/output error (bptm.c.8142)
09/17/2009 15:35:34 orion nybc02  ioctl (MTREW) failed on media id UD0270, drive index 20, Input/output error (bptm.c.9453)
09/17/2009 15:35:39 orion -  TapeAlert Code: 0x03, Type: Warning, Flag: HARD ERROR, from drive IBMULT3580-TD113 (index 20), Media Id UD0270
09/17/2009 15:35:39 orion -  TapeAlert Code: 0x05, Type: Critical, Flag: READ FAILURE, from drive IBMULT3580-TD113 (index 20), Media Id UD0270
09/17/2009 15:35:40 orion -  TapeAlert Code: 0x06, Type: Critical, Flag: WRITE FAILURE, from drive IBMULT3580-TD113 (index 20), Media Id UD0270
09/17/2009 15:35:40 orion -  TapeAlert Code: 0x27, Type: Warning, Flag: DIAGNOSTICS REQ., from drive IBMULT3580-TD113 (index 20), Media Id UD0270

09/17/2009 17:08:43 utopia mckinley  media id UD0535 load operation reported an error
09/17/2009 17:08:49 utopia -  TapeAlert Code: 0x27, Type: Warning, Flag: DIAGNOSTICS REQ., from drive IBMULT3580-TD114 (index 4), Media Id UD0535
09/17/2009 17:25:24 utopia mckinley  media id UD0535 load operation reported an error
09/17/2009 17:25:32 utopia -  TapeAlert Code: 0x03, Type: Warning, Flag: HARD ERROR, from drive IBMULT3580-TD118 (index 20), Media Id UD0535
09/17/2009 17:25:33 utopia -  TapeAlert Code: 0x05, Type: Critical, Flag: READ FAILURE, from drive IBMULT3580-TD118 (index 20), Media Id UD0535
09/17/2009 17:25:34 utopia -  TapeAlert Code: 0x06, Type: Critical, Flag: WRITE FAILURE, from drive IBMULT3580-TD118 (index 20), Media Id UD0535
09/17/2009 17:25:34 utopia -  TapeAlert Code: 0x27, Type: Warning, Flag: DIAGNOSTICS REQ., from drive IBMULT3580-TD118 (index 20), Media Id UD0535

09/18/2009 01:52:02 orion -  TapeAlert Code: 0x03, Type: Warning, Flag: HARD ERROR, from drive IBMULT3580-TD101 (index 0), Media Id UD0393
09/18/2009 01:52:03 orion -  TapeAlert Code: 0x04, Type: Critical, Flag: MEDIA, from drive IBMULT3580-TD101 (index 0), Media Id UD0393
09/18/2009 01:52:03 orion -  TapeAlert Code: 0x27, Type: Warning, Flag: DIAGNOSTICS REQ., from drive IBMULT3580-TD101 (index 0), Media Id UD0393

09/18/2009 14:05:41 utopia monet  cannot write image to media id UD1602, drive index 12, Input/output error
09/18/2009 14:06:42 utopia -  TapeAlert Code: 0x03, Type: Warning, Flag: HARD ERROR, from drive IBMULT3580-TD101 (index 12), Media Id UD1602
09/18/2009 14:06:42 utopia -  TapeAlert Code: 0x06, Type: Critical, Flag: WRITE FAILURE, from drive IBMULT3580-TD101 (index 12), Media Id UD1602
09/18/2009 14:06:43 utopia -  TapeAlert Code: 0x27, Type: Warning, Flag: DIAGNOSTICS REQ., from drive IBMULT3580-TD101 (index 12), Media Id UD1602
Nowhere does it indicate that the tapes were frozen or that the drive went DOWN.

Also, what happened to the rules, such as a tape being frozen after 3 errors within 12 hours, or a drive being downed after 3 errors? Again, I don't see anything to that effect in the bperror output anymore.
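Since the "3 errors in the last 12 hours" message is no longer logged, one workaround is to recompute that rule from the error lines that still appear. The sketch below is only an approximation under stated assumptions: it does not reflect how NetBackup itself decides to freeze a tape, the choice of which messages count as errors (`ERROR_MARKERS`) is a guess based on the samples in this post, and the function name `tapes_over_threshold` is hypothetical.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Timestamp at the start of the line, media ID somewhere after it.
LINE_RE = re.compile(
    r"^(\d{1,2}/\d{1,2}/\d{4} \d{2}:\d{2}:\d{2}) .*media id (\w+)",
    re.IGNORECASE,
)

# Assumption: these substrings mark lines that count as a media error.
ERROR_MARKERS = (
    "cannot write image",
    "ioctl",
    "load operation reported an error",
)


def tapes_over_threshold(lines, max_errors=3, window_hours=12):
    """Return media IDs with at least max_errors error lines inside the window."""
    window = timedelta(hours=window_hours)
    times = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line)
        if match and any(marker in line for marker in ERROR_MARKERS):
            stamp = datetime.strptime(match.group(1), "%m/%d/%Y %H:%M:%S")
            times[match.group(2)].append(stamp)

    flagged = set()
    for media_id, stamps in times.items():
        stamps.sort()
        # Slide over each run of max_errors consecutive timestamps.
        for i in range(len(stamps) - max_errors + 1):
            if stamps[i + max_errors - 1] - stamps[i] <= window:
                flagged.add(media_id)
                break
    return flagged
```

TapeAlert lines are deliberately not counted here, because one failed operation can raise several alerts within the same second.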

I found another discussion on this, but it didn't answer the question:

http://www.symantec.com/connect/forums/where-does-logging-frozen-tapes-go-6531

There's nothing in /usr/openv/netbackup/db/error logs that I don't find in the bperror output.

Any other ideas/suggestions?
13 REPLIES

Android
Level 6
Partner Accredited Certified
Have you tried sorting the output of available_media to look for frozen tapes? 

Also tpconfig -d should show you the state of your drives.

stan56
Level 4
I need more info than just the list of frozen media. I need to see why they were frozen. In the old report, I was able to find the tapes that were frozen because they had 3 errors within 12 hours and I would know they were bad. I can't just assume every tape that is frozen is a bad tape - I'll run out of tapes in no time!

stan56
Level 4
Judging by the lack of replies, I'm guessing nobody else cares for such a report. So let me ask a different question. What does everyone else do about frozen tapes and downed drives? Do you just unfreeze all tapes and bring the down drives up?

Mouse
Moderator
Partner    VIP    Accredited Certified
You have the list with FROZEN tapes (using available_media, or whatever report), don't you?

Next, we have the log with error messages. Go through this log and grep for each media ID in the previous list.

You'll get TapeAlert codes. You know which codes relate to the tape and which to the drive or library. Sort out the failed tapes and unfreeze all the others.

It's 20-30 minutes of scripting, I believe.
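A rough Python sketch of the steps above, under a few assumptions: the frozen media IDs have already been collected into a list (for example from the available_media report), the bperror output is available as text lines, and the split of TapeAlert codes into "tape" versus "drive" in `MEDIA_SUSPECT_CODES` is purely illustrative and should be adjusted to your own hardware. The function names are hypothetical.

```python
import re
from collections import defaultdict

# Parses lines such as:
# "... TapeAlert Code: 0x04, Type: Critical, Flag: MEDIA,
#  from drive IBMULT3580-TD101 (index 0), Media Id UD0393"
ALERT_RE = re.compile(
    r"TapeAlert Code: (0x[0-9A-Fa-f]+), Type: (\w+), Flag: (.+?), "
    r"from drive (\S+) \(index (\d+)\), Media Id (\w+)"
)

# Assumption: codes treated as pointing at the cartridge rather than the drive.
# 0x04 is the MEDIA flag, 0x12 is DIRECTORY CORRUPTED ON LOAD.
MEDIA_SUSPECT_CODES = {0x04, 0x12}


def alerts_for_frozen(frozen_ids, lines):
    """Collect the set of TapeAlert codes seen for each frozen media ID."""
    wanted = set(frozen_ids)
    alerts = defaultdict(set)
    for line in lines:
        match = ALERT_RE.search(line)
        if match and match.group(6) in wanted:
            alerts[match.group(6)].add(int(match.group(1), 16))
    return alerts


def triage(frozen_ids, lines):
    """Label each frozen tape based on the TapeAlert codes logged against it."""
    alerts = alerts_for_frozen(frozen_ids, lines)
    result = {}
    for media_id in frozen_ids:
        codes = alerts.get(media_id, set())
        if codes & MEDIA_SUSPECT_CODES:
            result[media_id] = "suspect tape"
        elif codes:
            result[media_id] = "check drive"
        else:
            result[media_id] = "no TapeAlert found"
    return result
```

The pattern expects each TapeAlert message on a single line; output that has been wrapped across several lines would need to be rejoined first.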

Mouse
Moderator
Partner    VIP    Accredited Certified
nobody will look inside until you write something :)

J_H_Is_gone
Level 6
I get an email from NOM every time it freezes a tape.

The email tells me if it gets frozen because it was write protected (meaning I still had a tape in the library for a restore and later that night it tried to write to it). Each morning I look at the emails from NOM. For any that said it froze because it was write protected, I unfreeze it and see if my restore finished OK.

If it was frozen for some other reason then I look into it more. My library has advanced reporting that can tell me info about a tape and whether it had write errors, so if I have a tape frozen I look at the report on the library for that tape and see what it has to say. If it says it had write issues, then I work off of what I find.

As for down drives, it's the same thing: NOM will email me when it downs a drive. As I use SSO and can have 4 different media servers using the drive, I have to look at which server downed the drive and why. Again, I can go to advanced reporting on the library to see if the robot says there was anything wrong with the drive. If the library actually took the tape drive offline or had errors, the library would also have emailed me about the drive.

So, based on MY environment - I get emails from NOM and the library about tapes and drives and the emails usually tell me why.  So no need for a special report to get info from the logs.

stan56
Level 4
Funny you just replied suggesting NOM for this - I spent some time this morning playing around in NOM trying to set up alerts for frozen media. I'd seen a post in another thread suggesting NOM for this, so I decided to give it a try. How did you set this up? Did you configure an alert for this? I already got an alert, but it doesn't tell me why the media was frozen. This is what I get in the email (and also in the NOM GUI):

Date: October 12, 2009 11:18 AM
Master Server: xxxx
Frozen Media Name: UD6678
Media Server: xxxx
NOM Policy: Frozen Tapes
NOM Server: xxxx
Severity: Warning


No reason why the tape was frozen is provided. Do you get more information in your alerts?

J_H_Is_gone
Level 6
You are correct that NOM does not tell you why it froze it. My email from my library tells me it tried to write to a write protected tape; I match the two up and know I can unfreeze it. My mistake, I thought NOM was telling me that.

Marianne
Moderator
Partner    VIP    Accredited Certified
Found this in bperror output on NBU 6.5.4 (Solaris) master:
FREEZING media id AD0611, it is write protected and cannot be used for backups
incorrect media found in drive index 10, expected AD0815, found AD0816, FREEZING AD0815

stan56
Level 4
I still get those. What I no longer get are messages like this one:

FREEZING media id UD6678, it has had at least 3 errors in the last 12 hour(s)


I know I have a number of frozen tapes each week due to either mounting errors or read/write errors. It appears NB 6.x no longer logs anywhere that they're being frozen, except in NOM, which seems like my solution for now (see above).

stan56
Level 4
Out of curiosity, what tape library are you using? I have an IBM 3854. It does have SNMP alert capabilities, but I don't see anything in the setup to configure email alerts.

stan56
Level 4
I spent a bit more time playing around in NOM, and so far I'm somewhat impressed with what I'm getting, as far as frozen tapes are concerned. The alerts work well: every time a tape is frozen an email goes out, and one can also see the alerts in the alerts list. As an added bonus, if I unfreeze a tape, the alert is cleared automatically (and again, an email is sent). I can also see all frozen tapes in the media status screen, and I can unfreeze them from there. This is all very helpful!

Now if it would log the reason for the frozen tape... I guess I'd have to rely on bperror output and TapeAlerts for that.

Setting up alerts for DOWN drives doesn't quite give me what I need though. I have SSO; if a drive goes down on just one media server, it's reported as "down path", and the drive becomes "mixed". So it doesn't see it as a down drive and I don't get an alert. Unfortunately there are no alerts for down paths or mixed drives.

Carlos_V
Level 6
Hi everyone!
When I run the bperror command I see this message:

2/15/2009 11:42:47 crcgesms03 -  TapeAlert Code: 0x12, Type: Warning, Flag:
                    DIRECTORY CORRUPTED ON LOAD, from drive STK.T10000B.000
                    (index 6), Media Id 036174


After the backup finishes, the cartridge state is frozen, and then NOM sends me an alert. The error does not always happen, but I'm annoyed.