
Checking on TapeAlert cleaning

Eamonn
Level 3

Hello,

I'm trying to verify whether our tape drives are getting cleaned when a TapeAlert is generated.

 

Using NetBackup 6.5.3 with HP MSL6000 series tape libraries, HCART Media Type. Windows Server 2003 R2, 64bit.

Current configuration for the cleaning tapes is:

Media Type: HC_CLN

Volume Pool: None

The tape libraries are set to "Logging Disabled".

Also there is no "NO_TAPEALERT" being used here.

The drives are all set to Cleaning Frequency: 0 as we want the drives only to be cleaned when a TapeAlert is generated.

The cleaning tapes all have uses left on them.

 

Issue is we received an alert over the weekend from the library:

Report from the library named: , located at IP address: x.x.x.x An alert has been posted. These are the details:

A tape drive requires cleaning.

 

I checked the library, and the drive still reported that it needed cleaning, which I ran manually from the robot management page.

 

Looking at the BPTM logs I can only find the following for TapeAlerts:

00:04:45.258 [21572.21776] <2> process_tapealert: TapeAlert returned 0x00000000 0x00000000 (from io_terminate_tape)

There are no 0x14 or 0x15 entries in the logs as far as I can see.
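
A quick way to check a whole bptm log for this, sketched in Python. The function pulls the two hexadecimal words out of a `process_tapealert: TapeAlert returned ...` line and reports whether any bit is set. The line format is assumed from the single entry quoted above, the function names are made up for illustration, and nothing is assumed about how NetBackup maps individual bits to flags.

```python
import re

# Pattern assumed from the one bptm sample line quoted in this post.
RETURNED_RE = re.compile(
    r"process_tapealert: TapeAlert returned (0x[0-9a-fA-F]+) (0x[0-9a-fA-F]+)"
)

def tapealert_words(line):
    """Return the two flag words as ints, or None if the line does not match."""
    m = RETURNED_RE.search(line)
    if m is None:
        return None
    return int(m.group(1), 16), int(m.group(2), 16)

def any_flag_set(line):
    """True if the line is a 'TapeAlert returned' entry with at least one bit set."""
    words = tapealert_words(line)
    return words is not None and any(words)
```

An all-zero entry like the one above would come back as "no flag set", which matches what I see in the log.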

 

How do I check when/if a drive was cleaned? Also any thoughts on why I wouldn't see an entry for the TapeAlert even though the robot is reporting it?

Thank you in advance for any help you can provide.

1 ACCEPTED SOLUTION

Accepted Solutions

mph999
Level 6
Employee Accredited

... and I believe here we have the answer ...

"I checked the library, and the drive still reported that it needed cleaning, which I ran manually from the robot management page."

From this, I know that the library is reading the status of the drives.

TapeAlerts work like this.

When the drive has a problem (e.g. "clean me"), a "clean me" bit is set in the firmware.

When the library (in this case) reads the bits, it is able to display the message you see. However, the act of the library reading the bits resets them, so NBU can never read them.

So, you need to configure the library so that it knows nothing at all about cleaning or cleaning tapes. Once the library is stopped from reading the drive status, it will no longer reset the bits. Then, when NBU requests the drive status (after each backup, I think), the bits will still be set, and it should detect that the drive needs cleaning.
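
The read-and-reset behaviour described above can be pictured with a toy model in Python. This is only an illustration of the ordering problem, not real SCSI or NetBackup code; the `Drive` class and its method names are invented for the example.

```python
class Drive:
    """Toy drive whose TapeAlert flags are cleared by the act of reading them."""

    def __init__(self):
        self._flags = set()

    def raise_flag(self, name):
        self._flags.add(name)

    def read_tapealert(self):
        # Hand back the current flags and reset them in the same step.
        flags, self._flags = self._flags, set()
        return flags


drive = Drive()
drive.raise_flag("CLEAN NOW")

library_sees = drive.read_tapealert()    # the library polls the drive first
netbackup_sees = drive.read_tapealert()  # NBU asks later, after the backup

print("library:", library_sees)
print("NetBackup:", netbackup_sees)
```

Whoever reads first gets the flag; the second reader finds nothing, which is why the library has to be taken out of the picture.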

Hope this helps,

Martin

6 REPLIES 6

mph999
Level 6
Employee Accredited

If you have the "NO_TAPEALERT" touch file in place, as I understand you do, this makes NBU ignore TapeAlerts and it will not clean the drives.

To enable NBU to clean drives, remove this touch file (you may have to restart ltid, not sure).

I suspect that having the touch file in place also prevents the alerts from being seen in the logs. Remove the touch file and see how it goes.

The settings you describe look correct (apart from the NO_TAPEALERT file).

To see what has been cleaned, use the tpclean command in ...volmgr/bin

 

Martin

Eamonn
Level 3

Sorry, no, there is no "NO_TAPEALERT" being used here.

 

I'll check tpclean and report back.

Chris_Morris
Level 3
Employee Certified

You can right-click on a drive in the topology view of your hardware in the Activity Monitor and choose Drive Details. In there, you'll see the date and time of the last cleaning.

Eamonn
Level 3

First off, thank you both for your replies; each gave me something further to look at and steered me in the right direction.

So, to update on this: last night another drive on the same robot reported that it needed cleaning (not the same drive as the night before). This time NetBackup did clean the drive.

I checked the management page on the robot, and none of the drives reported that they needed cleaning.

I went to the drive details in NetBackup, and it reported that the drive had been cleaned last night, slightly after the alert.

I can also see the job for the drive cleaning.

Looking in the log I see the TapeAlert:

18:31:40.639 [19656.22988] <16> process_tapealert: TapeAlert Code: 0x14, Type: Critical, Flag: CLEAN NOW, from drive Drive003 (index 1), Media Id V70131
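
For anyone wanting to pull these entries out of a bptm log, here is a minimal Python sketch. The regular expression is built from the single line above, so the exact field layout is an assumption, and `cleaning_alerts` is just a name picked for this example.

```python
import re

# Field layout assumed from the one process_tapealert line quoted above.
ALERT_RE = re.compile(
    r"process_tapealert: TapeAlert Code: (0x[0-9a-fA-F]+), Type: (\w+), "
    r"Flag: (.+?), from drive (\S+) \(index (\d+)\), Media Id (\S+)"
)

# The two codes I was looking for earlier in this thread.
CLEANING_CODES = {0x14, 0x15}

def cleaning_alerts(lines):
    """Yield (drive, code, flag, media_id) for each cleaning-related TapeAlert line."""
    for line in lines:
        m = ALERT_RE.search(line)
        if m and int(m.group(1), 16) in CLEANING_CODES:
            yield m.group(4), int(m.group(1), 16), m.group(3), m.group(6)
```

It accepts any iterable of lines, so an open log file can be passed in directly.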

 

What's odd here is that nothing was changed, yet this time it worked. However, looking at all the other drives, they don't report ever being cleaned.

tpclean information below:

D:\Program Files\Veritas\Volmgr\bin>tpclean -L
Drive Name   Type     Mount Time   Frequency   Last Cleaned        Comment
**********   ****     **********   *********   ****************    *******
Drive010     hcart*   192.8        0           N/A
Drive008     hcart*   215.1        0           N/A
Drive009     hcart*   216.4        0           N/A
Drive011     hcart*   374.6        0           N/A
Drive012     hcart*   400.9        0           N/A
Drive013     hcart*   386.2        0           N/A
Drive014     hcart*   369.0        0           N/A
Drive000     hcart*   269.9        0           N/A
Drive001     hcart*   289.8        0           N/A
Drive002     hcart*   342.6        0           N/A
Drive003     hcart*   2.5          0           18:35 03/14/2011
Drive004     hcart*   329.7        0           N/A
Drive005     hcart*   344.1        0           N/A
Drive006     hcart*   352.9        0           N/A
Drive007     hcart*   302.0        0           N/A
Drive015     hcart*   187.6        0           N/A
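
To list the drives that have never been cleaned from output like this, something along these lines should do, as a rough Python sketch. It assumes whitespace-separated columns exactly as in the listing above, ignores the Comment column, and the helper name `parse_tpclean` is invented here.

```python
def parse_tpclean(text):
    """Parse `tpclean -L` rows into dicts; prompt, header and '****' lines are skipped."""
    drives = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows: name, type, mount time, frequency, then 'N/A' or 'HH:MM MM/DD/YYYY'.
        if len(parts) < 5:
            continue
        try:
            mount_time = float(parts[2])
            frequency = int(parts[3])
        except ValueError:
            continue  # not a data row
        last_cleaned = None if parts[4] == "N/A" else " ".join(parts[4:6])
        drives.append({
            "name": parts[0],
            "type": parts[1],
            "mount_time": mount_time,
            "frequency": frequency,
            "last_cleaned": last_cleaned,
        })
    return drives

def never_cleaned(drives):
    """Names of drives whose Last Cleaned column reads N/A."""
    return [d["name"] for d in drives if d["last_cleaned"] is None]
```

Run against the listing above, every drive except Drive003 would be reported as never cleaned.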

 

What might be the reason for this? It's odd that only the one drive that needed cleaning last night is reporting it's ever been cleaned. Could something be clearing this value?

mph999
Level 6
Employee Accredited

I'm not aware that you can clear these values as such. They may reset for a particular drive if one is swapped out; I've honestly no idea, as I've never looked. But, as I said previously, I'm not aware that you can reset them to "0".

Is it possible the drives have never been cleaned by NetBackup?

How old are the drives, for example, and what is the history of the environment? Can you say with 100% certainty that NetBackup has previously cleaned the drives, or has the robot been cleaning the drives previously?

If this was working fine previously, what has changed?

From the details you have given, it is certainly the case that the robot is able to clean at least some of the drives, as it is detecting the requirement for cleaning, and you mentioned that you cleaned from the library console.

We now know that NetBackup is also able to clean the drives. I am not aware that NetBackup would be fussy about only checking certain drives; that is not configurable, it is either on or off. I know we could argue the same kind of idea from the robot side, but then we reach 'stalemate' again, which is no use.

However, from the Symantec point of view, NetBackup does tape and drive operations very well, and in my experience a problem in this area is usually either broken hardware or misconfiguration of some sort. As you will appreciate, most NBU customers clean drives, and if this were an NBU 'bug' or something, I'd probably know about it. I also checked the knowledgebase today and didn't find any hits (well, I did, but not NBU issues). Now, I'm not saying you haven't hit some really odd, unique issue that we haven't seen before, but, given what I know, my experience, the database and your explanation so far (which has been excellent), I wouldn't say NBU is the most likely cause. I could be wrong, but hopefully you will agree that we should start with the more likely causes. I presume your drives are working fine in other respects, and NBU sends various SCSI commands at different times to check status (e.g. when mounting a tape), so this 'mechanism' would appear to be working.

I therefore recommend that we go with my previous suggestion, that is, remove all traces of 'cleaning' from the robot so that it cannot detect when a drive needs cleaning. Then we know the bits are not getting reset, and they should be 'available' for NetBackup to read. Certainly, the drives that the robot detected as needing cleaning would never be cleaned by NBU, due to the bits getting reset. We must stop this happening and go from there. For your reassurance, and to give credit, I chatted briefly today to a BL colleague who is very experienced in this area, so the direction suggested is also confirmed by BL.

Martin