Media getting frozen frequently (around 100 tapes)

A_3

Master/media server: Linux (NetBackup-RedHat2.6.18, NetBackup 7.6.1.2)

I unfroze all the media three days ago, but today I am again seeing around 40 frozen tapes.

Please suggest what to do.


Marianne

There are many reasons why tapes get frozen.
For example: more than 3 errors in 12 hours, certain TapeAlerts (faulty media), tapes written in a different format (e.g. TAR), changed tape labels, incorrect device mappings, write-protected media, and so on.
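To make the "too many errors within a time window" idea concrete, here is a minimal Python sketch of a sliding-window check. It is only an illustration, not NetBackup's actual code: the function name `should_freeze` is hypothetical, and the defaults simply mirror the "more than 3 errors in 12 hours" example above. NetBackup's real thresholds are configurable, as the TN in this post explains.

```python
from datetime import datetime, timedelta


def should_freeze(error_times, now, max_errors=3, window=timedelta(hours=12)):
    """Return True if more than `max_errors` errors fall inside the window ending at `now`.

    Hypothetical illustration of a per-media error threshold, not NetBackup code.
    """
    window_start = now - window
    recent = [t for t in error_times if window_start <= t <= now]
    return len(recent) > max_errors


# Example: four errors within the last 8 hours exceed the default threshold of 3.
now = datetime(2017, 1, 14, 12, 0)
errors = [now - timedelta(hours=h) for h in (1, 3, 5, 8)]
print(should_freeze(errors, now))  # True
```

Errors older than the window drop out of the count, which is why a tape can survive occasional errors but get frozen after a burst of them.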

A good start is to look at the Details tab of the failing jobs.

Please copy the text from Job Details and post it here.

For further troubleshooting, ensure that the bptm log folder exists on each media server, add a VERBOSE entry to /usr/openv/volmgr/vm.conf, then restart NBU.
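For anyone scripting this on several media servers, here is a minimal Python sketch of the two file-system steps. Assumptions: the default /usr/openv install root and the hypothetical helper name `enable_tape_logging`. It only creates the folder and appends a bare VERBOSE line if one is missing, so it is safe to rerun. Restarting NBU (or at least ltid) is still a manual step.

```python
from pathlib import Path


def enable_tape_logging(openv_root="/usr/openv"):
    """Create the bptm log folder and add a bare VERBOSE line to vm.conf.

    Both steps are idempotent; NBU (or ltid) must still be restarted afterwards.
    """
    root = Path(openv_root)

    # bptm only writes its debug log if this folder exists.
    bptm_dir = root / "netbackup" / "logs" / "bptm"
    bptm_dir.mkdir(parents=True, exist_ok=True)

    # VERBOSE in vm.conf is a single word on its own line, with no level.
    vm_conf = root / "volmgr" / "vm.conf"
    vm_conf.parent.mkdir(parents=True, exist_ok=True)
    lines = vm_conf.read_text().splitlines() if vm_conf.exists() else []
    if "VERBOSE" not in (line.strip() for line in lines):
        lines.append("VERBOSE")
        vm_conf.write_text("\n".join(lines) + "\n")
    return bptm_dir, vm_conf
```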

TN:
How Symantec NetBackup determines if a tape should be frozen or the status of a tape drive should be changed to down, and how to change this behavior 
http://www.veritas.com/docs/000042344 

A_3

@Marianne: The jobs are successful; the only problem is the large number of frozen media. The master and media server are the same host.

 

Marianne

Even when the job succeeds, there will still be 'something' in the job details: one piece of media is attempted, frozen for whatever reason, and then the job continues with another piece of media.

You can also look at the Tape Logs report.
Use the filter option to extract only Warning and Error severity entries.

The bptm log is plain text and can be read with any text editor.

You can copy the logs to descriptive .txt files (e.g. bptm14Jan.txt, bptm15Jan.txt, etc.) and upload them here as attachments.

A_3

Hi Marianne, regarding the bptm logs:

Once I create the bptm folder under /usr/openv/netbackup/logs, do I then need to set VERBOSE = 5 in bp.conf or in /usr/openv/volmgr/vm.conf? Please confirm.

Marianne

BPTM_VERBOSE = 5 
goes into bp.conf.

This is where I have a problem with high-level logging:
Veritas Support will always ask for level 5 logs.
Here on VOX, most of us trying to assist do not have the time to sift through level 5 logs.
I have found that in 99% of cases level 0 is sufficient.
In another 0.5% of cases a slightly higher logging level is needed, and level 3 is fine for those.
In extreme cases a level 5 log is needed, and a Support call with Veritas is the only way forward.

So, if you want us here on VOX to assist: no higher than level 3, please.
NBU does not need a restart when adding or changing this logging level.
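If you prefer to script the bp.conf change, here is a minimal Python sketch that works on the file's text only (reading and writing the file back is left to you). The function name `set_bptm_verbose` and the `max_level` cap are hypothetical; the default cap of 3 simply encodes the advice above, and you would pass `max_level=5` if Veritas Support asks for level 5.

```python
import re

_BPTM_VERBOSE = re.compile(r"^\s*BPTM_VERBOSE\s*=.*$")


def set_bptm_verbose(bp_conf_text, level, max_level=3):
    """Return bp.conf text with exactly one BPTM_VERBOSE entry set to `level`."""
    if not 0 <= level <= max_level:
        raise ValueError(f"BPTM_VERBOSE level must be between 0 and {max_level}")
    # Drop any existing BPTM_VERBOSE lines, keep everything else untouched.
    kept = [line for line in bp_conf_text.splitlines() if not _BPTM_VERBOSE.match(line)]
    kept.append(f"BPTM_VERBOSE = {level}")
    return "\n".join(kept) + "\n"
```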

The VERBOSE entry in vm.conf is just this one word on a line of its own, with no level.
It logs device-level errors to /var/log/messages.
NBU (or at least ltid) must be restarted for it to take effect.

Have you looked at the Tape Logs report yet?
There may already be enough information in this report to identify the cause of the problem.

 

Nicolai

What does the command below return?

dmesg | grep ^st

Genericus

Take a look at your drives: media gets FROZEN when it exceeds a limit of too many errors within a certain time.

However, it may not be a tape issue; it can be a bad drive!

If a drive has a worn read/write head, it will detect errors and tapes will get frozen.

If a tape gets stuck in a drive and NetBackup is unaware of the issue, every attempt to load a tape in that drive fails, and NetBackup blames the tape!

My robot has an SLConsole where I can monitor tape and drive errors. Combining that with the Problems report from NetBackup (filtered for entries containing TapeAlert) is the key!

Compare the errors against the drives:

errors on many tapes in one drive = DRIVE ISSUE

errors on one tape across many drives = TAPE ISSUE
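The rule of thumb above can be sketched in Python as a simple cross-tabulation. The input format (one (drive, media ID) pair per error, collected from SLConsole or the Problems report) and the cut-off of 3 are assumptions for illustration, and `classify_errors` is a hypothetical name.

```python
from collections import defaultdict


def classify_errors(events, threshold=3):
    """Apply the 'many tapes in one drive' vs 'one tape in many drives' rule of thumb.

    `events` is an iterable of (drive, media_id) pairs, one per error.
    Returns (suspect_drives, suspect_tapes) as sorted lists.
    """
    tapes_per_drive = defaultdict(set)
    drives_per_tape = defaultdict(set)
    for drive, media_id in events:
        tapes_per_drive[drive].add(media_id)
        drives_per_tape[media_id].add(drive)

    # Many different tapes failing in the same drive points at the drive.
    suspect_drives = sorted(d for d, tapes in tapes_per_drive.items() if len(tapes) >= threshold)
    # The same tape failing in many different drives points at the tape.
    suspect_tapes = sorted(m for m, drives in drives_per_tape.items() if len(drives) >= threshold)
    return suspect_drives, suspect_tapes
```

Counting distinct partners (sets) rather than raw errors keeps one tape retried repeatedly in the same drive from making that drive look guilty.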

 

NetBackup 9.1.0.1 on Solaris 11, writing to Data Domain 9800 7.7.4.0
duplicating via SLP to LTO5 & LTO8 in SL8500 via ACSLS

Genericus

Also, there have been firmware issues where a tape formatted or initially written at one firmware level is unreadable at another. Check with your tape/drive vendor to confirm this is not the case.

 

What has changed?

 


mph999

From each media server that shows the symptoms, please attach the /usr/openv/netbackup/db/media/errors file to this post.

The bptm log is also useful. I like the higher VERBOSE levels, but at the risk of being beaten up by Marianne, I'll agree to whatever she suggests...

The /usr/openv/volmgr/debug/tpcommand and /usr/openv/volmgr/debug/robots logs are also useful. Also add VERBOSE (no number, just the word) to /usr/openv/volmgr/vm.conf and restart ltid:

/usr/openv/volmgr/bin/stopltid  (wait a few moments)

/usr/openv/volmgr/bin/ltid -v

You may not really need those, but sometimes they give more information, depending on what is going on.

The OS messages log is also useful.

A word of warning about unfreezing tapes: NBU freezes tapes or downs drives if it thinks there is an issue. If media is unfrozen without the cause being found, there is a 'remote' chance that, if the drives are bad, they could damage the media. If you then reuse this media over and over, it could damage other drives, which in turn damage other tapes.

It's rare, but it does happen.

Until about last month I had only seen it myself a few times. Last month I was involved in a case where exactly this had happened: the drive(s) were all damaged, and all the media was damaged.

Marianne

@A_3 have you tried running the Media/Tape Logs report yet?

Or try this command (adjust -hoursago so that it covers the time the media got frozen):

/usr/openv/netbackup/bin/admincmd/bperror -media -hoursago 72 |grep -i freez

(grep for 'freez' because the message can be either 'Freeze media id ...' or 'FREEZING media id ...'.)
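If you want to go a step further than grep, this Python sketch parses bperror lines and keeps the freeze events together with their media ID. The field layout is an assumption inferred from the bperror output A_3 posted in reply (11 whitespace-separated fields, the last being the message text); the meaning of the eighth field is unknown to me, so it is labelled as such. The function names are hypothetical.

```python
import re

FIELDS = ("time", "version", "type", "severity", "server", "jobid",
          "jobgroupid", "unknown", "client", "process", "message")

# Matches both 'FREEZING media id B02111' and '... FREEZING B01321' (case-insensitive).
MEDIA_ID = re.compile(r"FREEZ\w*\s+(?:media id\s+)?([A-Z0-9]+)", re.IGNORECASE)


def parse_bperror_line(line):
    """Split one bperror line into its fields, or return None for non-record lines."""
    parts = line.split(maxsplit=len(FIELDS) - 1)
    if len(parts) < len(FIELDS) or not parts[0].isdigit():
        return None  # e.g. a shell prompt or a blank line
    record = dict(zip(FIELDS, parts))
    record["time"] = int(record["time"])
    record["severity"] = int(record["severity"])
    return record


def freeze_events(lines):
    """Rough equivalent of `grep -i freez`, returning parsed records with a media_id key."""
    events = []
    for line in lines:
        record = parse_bperror_line(line)
        if record and "freez" in record["message"].lower():
            match = MEDIA_ID.search(record["message"])
            record["media_id"] = match.group(1) if match else None
            events.append(record)
    return events
```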

Sometimes tapes are automatically frozen due to certain TapeAlerts. Try this as well:

/usr/openv/netbackup/bin/admincmd/bperror -media -hoursago 72 |grep -i tapealert

Please share the output of above commands.

A_3

[root@master admincmd]# bperror -media -hoursago 72 |grep -i freez
[root@master admincmd]# bperror -media -hoursago 7200 |grep -i freez
1483219015 1 388 8 fr0-nbuapm88-p01 1553586 1553586 0 fr0-lxdodad-v22 bptm FREEZING media id B02111, it contains ANSI-format data and cannot be used for backups
1483225325 1 388 8 nbuadv2 1553555 1553552 0 fr0-lxdodba-p09 bptm FREEZING media id B00132, it contains ANSI-format data and cannot be used for backups
1483225432 1 388 16 nbuadv2 1553555 1553552 0 fr0-lxdodba-p09 bptm incorrect media found in drive index 33, expected B01321, found TIME, FREEZING B01321
1483225539 1 388 8 nbuadv2 1553555 1553552 0 fr0-lxdodba-p09 bptm FREEZING media id B00900, it contains ANSI-format data and cannot be used for backups
1483225647 1 388 8 nbuadv2 1553555 1553552 0 fr0-lxdodba-p09 bptm FREEZING media id B02648, it contains ANSI-format data and cannot be used for backups
1483225754 1 388 8 nbuadv2 1553555 1553552 0 fr0-lxdodba-p09 bptm FREEZING media id B00895, it contains ANSI-format data and cannot be used for backups
1483225857 1 388 8 nbuadv2 1553555 1553552 0 fr0-lxdodba-p09 bptm FREEZING media id B00894, it contains ANSI-format data and cannot be used for backups
1483225962 1 388 16 nbuadv2 1553555 1553552 0 fr0-lxdodba-p09 bptm incorrect media found in drive index 33, expected B01710, found TIME, FREEZING B01710
1483226063 1 388 16 nbuadv2 1553555 1553552 0 fr0-lxdodba-p09 bptm incorrect media found in drive index 33, expected B01716, found TIME, FREEZING B01716
1483226189 1 388 16 nbuadv2 1553555 1553552 0 fr0-lxdodba-p09 bptm incorrect media found in drive index 33, expected B01726, found TIME, FREEZING B01726
1483226239 1 388 8 fr0-nbuapm88-p02 1553965 0 0 fr0-lxdodba-p07 bptm FREEZING media id B00682, it contains ANSI-format data and cannot be used for backups
1484333032 1 388 8 fr0-nbuapm88-p02 1626101 0 0 fr0-lxctmcs-p07 bptm FREEZING media id B02111, it contains ANSI-format data and cannot be used for backups
1484371671 1 388 8 nbuadv2 1630308 1628572 0 fr0-lxdodba-p06 bptm FREEZING media id B00682, it contains ANSI-format data and cannot be used for backups
1484371783 1 388 8 nbuadv2 1630308 1628572 0 fr0-lxdodba-p06 bptm FREEZING media id B00132, it contains ANSI-format data and cannot be used for backups
1484371909 1 388 16 nbuadv2 1630308 1628572 0 fr0-lxdodba-p06 bptm incorrect media found in drive index 36, expected B01321, found TIME, FREEZING B01321
1484372008 1 388 8 nbuadv2 1630308 1628572 0 fr0-lxdodba-p06 bptm FREEZING media id B00900, it contains ANSI-format data and cannot be used for backups
1484372125 1 388 8 nbuadv2 1630308 1628572 0 fr0-lxdodba-p06 bptm FREEZING media id B02648, it contains ANSI-format data and cannot be used for backups
1484372230 1 388 8 nbuadv2 1630308 1628572 0 fr0-lxdodba-p06 bptm FREEZING media id B00895, it contains ANSI-format data and cannot be used for backups
1484372334 1 388 8 nbuadv2 1630308 1628572 0 fr0-lxdodba-p06 bptm FREEZING media id B00894, it contains ANSI-format data and cannot be used for backups
1484372451 1 388 16 nbuadv2 1630308 1628572 0 fr0-lxdodba-p06 bptm incorrect media found in drive index 36, expected B01710, found TIME, FREEZING B01710
1484372569 1 388 16 nbuadv2 1630308 1628572 0 fr0-lxdodba-p06 bptm incorrect media found in drive index 36, expected B01716, found TIME, FREEZING B01716
1484372698 1 388 16 nbuadv2 1630308 1628572 0 fr0-lxdodba-p06 bptm incorrect media found in drive index 36, expected B01726, found TIME, FREEZING B01726
[root@master admincmd]#

 

----------------------------

[root@master admincmd]# bperror -media -hoursago 72 |grep -i tapealert
1484393825 1 388 16 fr0-nbuapa29-p02 1630230 0 0 fr0-lxdodad-p03 bptm TapeAlert Code: 0x04, Type: Critical, Flag: MEDIA, from drive B4_LTO4_2_0_1_5 (index 6), Media Id B01135
1484397993 1 388 16 fr0-nbuapa29-p02 1630238 0 0 fr0-atvlive-p02 bptm TapeAlert Code: 0x04, Type: Critical, Flag: MEDIA, from drive B4_LTO4_2_0_1_5 (index 6), Media Id B02265
1484431722 1 386 16 fr0-nbuapm88-p01 0 0 0 *NULL* bptm TapeAlert Code: 0x04, Type: Critical, Flag: MEDIA, from drive B4_LTO4_2_0_1_5 (index 6), Media Id B01136
1484434740 1 388 16 fr0-nbuapm88-p01 1634400 1634400 0 p595n04 bptm TapeAlert Code: 0x04, Type: Critical, Flag: MEDIA, from drive B4_LTO4_2_0_1_5 (index 6), Media Id B00708
1484439449 1 388 16 fr0-nbuapm88-p04 1634677 0 0 fr0-closmessp-p bptm TapeAlert Code: 0x04, Type: Critical, Flag: MEDIA, from drive B4_LTO4_2_0_1_5 (index 6), Media Id B00410
1484443442 1 388 16 fr0-nbuapm88-p04 1634686 0 0 fr0-uprep-p04 bptm TapeAlert Code: 0x04, Type: Critical, Flag: MEDIA, from drive B4_LTO4_2_0_1_5 (index 6), Media Id B00378
1484446809 1 388 16 fr0-nbuapa29-p01 1634666 0 0 fr0-lxdodad-p30 bptm TapeAlert Code: 0x04, Type: Critical, Flag: MEDIA, from drive B4_LTO4_2_0_1_5 (index 6), Media Id B00559
1484449281 1 388 16 fr0-nbuapm88-p01 1634426 1634426 0 fr0-lxdodad-v22 bptm TapeAlert Code: 0x04, Type: Critical, Flag: MEDIA, from drive B4_LTO4_2_0_1_5 (index 6), Media Id B01543
1484450044 1 388 16 fr0-nbuapm88-p01 1635189 1635189 0 fr0-sed01 bptm TapeAlert Code: 0x04, Type: Critical, Flag: MEDIA, from drive B4_LTO4_2_0_1_5 (index 6), Media Id B00532
1484455012 1 386 16 fr0-nbuapm88-p01 0 0 0 *NULL* bptm TapeAlert Code: 0x04, Type: Critical, Flag: MEDIA, from drive B4_LTO4_2_0_1_5 (index 6), Media Id B00645
1484456511 1 388 16 fr0-nbuapm88-p01 1634227 0 0 fr0-dodbam-p01 bptm TapeAlert Code: 0x04, Type: Critical, Flag: MEDIA, from drive B4_LTO4_2_0_1_5 (index 6), Media Id B00499
1484461502 1 388 16 fr0-nbuapm01-p02 1635210 0 0 fr0-lxdodba-p08 bptm TapeAlert Code: 0x04, Type: Critical, Flag: MEDIA, from drive B4_LTO4_2_0_1_5 (index 6), Media Id B00380
1484494769 1 388 16 nbumtr1 1635511 1635504 0 nbumtr1 bptm TapeAlert Code: 0x04, Type: Critical, Flag: MEDIA, from drive B4_LTO4_2_0_1_5 (index 8), Media Id B00818
1484508658 1 388 16 fr0-nbuapm88-p01 1635834 1635834 0 p595dodap02 bptm TapeAlert Code: 0x04, Type: Critical, Flag: MEDIA, from drive B4_LTO4_2_0_1_5 (index 6), Media Id B02026
1484541258 1 386 16 fr0-nbuapm01-p01 0 0 0 *NULL* bptm TapeAlert Code: 0x04, Type: Critical, Flag: MEDIA, from drive B4_LTO4_2_0_1_5 (index 6), Media Id B01494
1484597197 1 388 16 fr0-nbuapa29-p02 1640269 0 0 fr0-lxdodad-p03 bptm TapeAlert Code: 0x04, Type: Critical, Flag: MEDIA, from drive B4_LTO4_2_0_1_5 (index 6), Media Id B02664
1484635127 1 388 16 fr0-nbuapm01-p01 1643228 0 0 fr0-lxdodad-p10 bptm TapeAlert Code: 0x04, Type: Critical, Flag: MEDIA, from drive B4_LTO4_2_0_1_5 (index 6), Media Id B00822
1484641712 1 388 16 fr0-nbuapm01-p03 1644580 0 0 fr0-asi-p37 bptm TapeAlert Code: 0x04, Type: Critical, Flag: MEDIA, from drive B4_LTO4_2_0_1_5 (index 6), Media Id B00714
[root@master admincmd]#

Judging by the messages, these tapes contain data that is not in NetBackup format. Something else must have written data to the tapes that are getting frozen. Are you using another backup solution (Backup Exec, System Recovery, ...)?

You can either move them out of your NetBackup pool so that NetBackup does not use them, or you can format them.

A_3

No, these tapes are used only by NetBackup.

How are you determining that other data was written to them? Please tell me.

A 'tidy-up' of the "grep -i freez" output of bperror:

FREEZING media id B00132, it contains ANSI-format data and cannot be used for backups
FREEZING media id B00682, it contains ANSI-format data and cannot be used for backups
FREEZING media id B00894, it contains ANSI-format data and cannot be used for backups
FREEZING media id B00895, it contains ANSI-format data and cannot be used for backups
FREEZING media id B00900, it contains ANSI-format data and cannot be used for backups
FREEZING media id B02111, it contains ANSI-format data and cannot be used for backups
FREEZING media id B02648, it contains ANSI-format data and cannot be used for backups
incorrect media found in drive index 33, expected B01321, found TIME, FREEZING B01321
incorrect media found in drive index 33, expected B01710, found TIME, FREEZING B01710
incorrect media found in drive index 33, expected B01716, found TIME, FREEZING B01716
incorrect media found in drive index 33, expected B01726, found TIME, FREEZING B01726

- It looks like 7 media are reported as containing ANSI-format data and 4 media are incorrectly labelled (as "TIME").
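The same tidy-up can be done programmatically. Here is a minimal Python sketch, assuming one message per line and using regular expressions built only from the two freeze messages seen in this thread (any other freeze reason is simply ignored); `group_frozen_media` is a hypothetical name. Fed the bperror output above one record per line, it should give the same 7 ANSI / 4 TIME split.

```python
import re
from collections import defaultdict

# Patterns based only on the two freeze messages seen in this thread.
REASONS = (
    ("ANSI-format data", re.compile(r"FREEZING media id (\w+), it contains ANSI-format data")),
    ("wrong internal label", re.compile(r"expected (\w+), found \w+, FREEZING")),
)


def group_frozen_media(messages):
    """Collapse repeated freeze messages into {reason: sorted unique media IDs}."""
    groups = defaultdict(set)
    for message in messages:
        for reason, pattern in REASONS:
            match = pattern.search(message)
            if match:
                groups[reason].add(match.group(1))
                break
    return {reason: sorted(ids) for reason, ids in groups.items()}
```

Because the patterns use `search`, the same function works on the raw bperror lines as well as on the bare message text.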

The output of "grep -i tapealert" seems to indicate a possible drive issue, as all entries but one (index 8) relate to one drive (index 6). Surprised it hasn't been DOWNED?
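A small Python tally makes this pattern easy to spot. Note that all 18 TapeAlert lines above name the same drive, B4_LTO4_2_0_1_5; the single index-8 entry was reported by media server nbumtr1. Assuming drive indexes are numbered per media server, that entry may well be the same physical drive, so the sketch counts by drive name as well as by index. The regex is based only on the TapeAlert message format in this thread, and `tally_tapealerts` is a hypothetical name.

```python
import re
from collections import Counter

# Based on: 'TapeAlert Code: 0x04, ..., from drive <name> (index <n>), Media Id <id>'
TAPEALERT = re.compile(
    r"TapeAlert Code: (0x[0-9A-Fa-f]+).*?from drive (\S+) \(index (\d+)\), Media Id (\w+)"
)


def tally_tapealerts(lines):
    """Count TapeAlert entries per drive name and per drive index; collect media IDs."""
    by_drive = Counter()
    by_index = Counter()
    media = set()
    for line in lines:
        match = TAPEALERT.search(line)
        if not match:
            continue
        _code, drive, index, media_id = match.groups()
        by_drive[drive] += 1
        by_index[int(index)] += 1
        media.add(media_id)
    return by_drive, by_index, media
```

Many different media IDs against a single drive name is the "DRIVE ISSUE" pattern Genericus described earlier in the thread.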

Marianne

First you need to find out where these pieces of media came from: why do some of them have ANSI headers?

Why do some of them have an internal label of TIME?

Did you receive second-hand tapes from somewhere else?

If you are 100% sure they are not needed anywhere else, you can do the following:

Unfreeze all of the frozen media.

In Host Properties -> Media Servers (select Master server) -> Media -> Allow Media Overwrite:
Select ANSI (and any other tape format that might be in your environment).
Click OK.

For the 4 media IDs with the 'TIME' internal label:
Select the media ID in the Media section of the GUI, right-click, and select Label.
In the next screen, deselect the 'Verify label' option.
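If there are many media IDs to unfreeze, the command line may be quicker than the GUI. This minimal Python sketch only builds the `bpmedia -unfreeze -m <media_id>` commands for review rather than running them; the helper names are hypothetical, and the path assumes a default UNIX/Linux install. Bear in mind mph999's warning earlier in the thread about unfreezing media before the cause is known.

```python
import shlex

# Default admincmd location on a UNIX/Linux master server (assumption).
BPMEDIA = "/usr/openv/netbackup/bin/admincmd/bpmedia"


def unfreeze_commands(media_ids):
    """Build, but do not run, one `bpmedia -unfreeze -m <id>` command per unique media ID."""
    return [[BPMEDIA, "-unfreeze", "-m", media_id] for media_id in sorted(set(media_ids))]


def as_shell_lines(commands):
    """Render the commands as shell lines for review or copy/paste."""
    return [shlex.join(cmd) for cmd in commands]


for line in as_shell_lines(unfreeze_commands(["B01321", "B00132", "B01321"])):
    print(line)
```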

 

A_3

No, all drives are up now.

What is the action plan for these now? How should I proceed?

Marianne

Have you seen my previous post?

Let's deal with the frozen tapes first...

 

Marianne

About the TapeAlerts and DOWN drives: it seems you have a number of old tapes in the environment that should be discarded.

See this TN for an explanation of TapeAlerts: http://www.veritas.com/docs/000005226

Extract:
0x04: 'Media Performance Degraded, Data Is At Risk'