06-10-2011 09:35 AM
Netbackup Master server on HP-UX 11.23 IA-64 running NBU 6.5.5
I have a tape (700528) that was frozen, due to getting 3 write errors within 12 hrs. The thing is none of the logs I checked show a message stating that the media was frozen and why.
output from /usr/openv/netbackup/db/error for tape in question.
1307587095 1 388 16 verstke3 4218141 0 0 cis004zil bptm cannot write image to media id 700528, drive index 13, I/O error
1307587103 1 386 8 verstke3 0 0 0 *NULL* bptm TapeAlert Code: 0x03, Type: Warning, Flag: HARD ERROR, from drive STKT9940B1 (index 13), Media Id 700528
1307587103 1 386 16 verstke3 0 0 0 *NULL* bptm TapeAlert Code: 0x04, Type: Critical, Flag: MEDIA, from drive STKT9940B1 (index 13), Media Id 700528
output from bptm log for tape in question.
19:38:15.361 [12954] <16> write_data: cannot write image to media id 700528, drive index 13, I/O error
19:38:23.396 [18408] <8> process_tapealert: TapeAlert Code: 0x03, Type: Warning, Flag: HARD ERROR, from drive STKT9940B1 (index 13), Media Id 700528
19:38:23.398 [18408] <16> process_tapealert: TapeAlert Code: 0x04, Type: Critical, Flag: MEDIA, from drive STKT9940B1 (index 13), Media Id 700528
output from /usr/openv/netbackup/db/media/errors for tape in question
06/08/11 19:38:15 700528 13 WRITE_ERROR STKT9940B1
06/08/11 19:38:23 700528 13 TAPE_ALERT STKT9940B1 0x30000000 0x00000000
The value 0x30000000 0x00000000 corresponds to the two TapeAlert flags reported in the bptm log and the error log: 0x03 (HARD ERROR) and 0x04 (MEDIA).
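As a rough illustration of how that value maps to flag numbers, here is a minimal sketch. It assumes that TapeAlert flag 1 is the most significant bit of the first word and flag 33 the most significant bit of the second; that layout is inferred only from the single entry above, not taken from NetBackup documentation. The function name decode_tapealert_words is hypothetical.

```shell
# Decode the two 32-bit words of a TAPE_ALERT entry into TapeAlert flag numbers.
# ASSUMPTION: flag 1 is the top bit of the first word, flag 33 the top bit of
# the second word (inferred from the one sample entry in this thread).
decode_tapealert_words() {
    flags=""
    word_index=0
    for word in "${1:-0}" "${2:-0}"; do
        bit=0
        while [ "$bit" -lt 32 ]; do
            mask=$(( 1 << (31 - bit) ))
            # A set bit at this position means the corresponding flag was raised.
            if [ $(( $word & mask )) -ne 0 ]; then
                flags="$flags $(( word_index * 32 + bit + 1 ))"
            fi
            bit=$(( bit + 1 ))
        done
        word_index=$(( word_index + 1 ))
    done
    # Unquoted on purpose so the leading space is dropped.
    echo $flags
}
```

Called as decode_tapealert_words 0x30000000 0x00000000, this prints the flag numbers 3 and 4, which matches the HARD ERROR and MEDIA alerts in the logs.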
Status of tape from ADMIN GUI
700528 700528 HCART2 ACS 6/8/2011 9:28 10/5/2008 9:46 6/8/2011 18:59 6/8/2011 19:00 Frozen
I know the available_media command will show the Frozen tapes as well but it does not show why it was frozen, which is what I am looking for. I need this so I can keep track of potential bad media so they can be replaced.
06-10-2011 11:22 AM
The TapeAlert is causing the media freeze.
Please see NBU Admin Guide II under Using TapeAlert:
p. 192:
A set of TapeAlert conditions are defined that can cause the media in use to be frozen. An additional set of conditions are defined that can cause a drive to be downed. Table 3-13 on page 192 describes the TapeAlert codes.
06-13-2011 07:22 AM
"... none of the logs I checked show a message stating that the media was frozen and why"
... the fact that NetBackup is "FREEZING media id XXXXXX, etc etc etc" should be in the media server's bptm log. Maybe the logging level needs increasing? Have you also checked the OS logs?
A few T/N's for NB7.x, but they are still relevant:
Logs for troubleshooting frozen media
http://www.symantec.com/business/support/index?page=content&id=HOWTO33178
Frozen media troubleshooting considerations
http://www.symantec.com/business/support/index?page=content&id=HOWTO33061
About conditions that cause media to freeze
http://www.symantec.com/business/support/index?page=content&id=HOWTO33062
06-13-2011 09:31 AM
I understand why the tape was frozen: it had 3 write errors within the default 12-hour window. What I don't see in the logs, OS or NetBackup, is a statement saying the tape was frozen.
Another example: when I checked frozen tapes today, I found one (701922) that was frozen on the 12th because it had 3 POSITION ERRORS. I checked the log files (bptm, errors, syslog.log) and can see the errors in the bptm log and in the errors log file, but again there is no message saying that the tape is being frozen because of the errors.
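Until a freeze message can be found, one workaround is to count the errors per tape directly from the media errors file. The sketch below assumes the line layout shown earlier in this thread (date, time, media ID, drive index, error type, drive name). The function name summarise_media_errors is hypothetical, the default threshold of 3 is taken from this discussion, and the script only counts totals: it does not apply the 12-hour window and cannot tell whether a tape was actually frozen.

```shell
# List media IDs whose error count of one type reaches a threshold.
# Usage: summarise_media_errors <errors-file> [threshold]
# ASSUMPTION: fields are date, time, media ID, drive index, error type, drive.
summarise_media_errors() {
    awk -v threshold="${2:-3}" '
        NF >= 5 {
            key = $3 " " $5            # media ID plus error type
            count[key]++
            last[key] = $1 " " $2      # timestamp of the most recent entry
        }
        END {
            for (key in count)
                if (count[key] >= threshold)
                    print key, count[key], last[key]
        }
    ' "$1" | sort
}
```

For example, summarise_media_errors /usr/openv/netbackup/db/media/errors prints one line per media ID and error type with the count and the time of the last occurrence, which gives a list of candidate bad tapes to review.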
06-13-2011 10:37 AM
The process that freezes a tape, or downs a drive is ltid.
The log is
/usr/openv/volmgr/debug/ltid (create the directory)
To set verbose, add
VERBOSE
into the /usr/openv/volmgr/vm.conf file. There is no number after VERBOSE, just the word.
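If the entry is added by script, it helps to avoid writing it twice. A minimal sketch, assuming vm.conf is a plain text file with one entry per line; the function name add_verbose_to_vm_conf is hypothetical:

```shell
# Append the VERBOSE entry to vm.conf only if it is not already present.
# Usage: add_verbose_to_vm_conf [path-to-vm.conf]
add_verbose_to_vm_conf() {
    conf="${1:-/usr/openv/volmgr/vm.conf}"
    # -x matches the whole line, so entries that merely contain the word are ignored.
    grep -qx 'VERBOSE' "$conf" 2>/dev/null || echo 'VERBOSE' >> "$conf"
}
```

The change only takes effect once ltid has been restarted.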
I would create these to set up a reasonable log collection for media manager operations:
mkdir /usr/openv/volmgr/debug/ltid
mkdir /usr/openv/volmgr/debug/tpcommand
mkdir /usr/openv/volmgr/debug/robots
Create these empty files:
/usr/openv/volmgr/DRIVE_DEBUG
/usr/openv/volmgr/ROBOT_DEBUG
Restart ltid
stopltid
ltid -v
The touch files will increase the drive and robot messages in the system log.
Martin
06-13-2011 11:49 AM
I have noticed the changed logging behaviour since NBU 6.x. Up to and including NBU 5.1, I could simply run 'bperror -media -hoursago |grep -i freez' daily on the master and catch all tapes being frozen on all media servers.
bperror/Tape Logs report will still log errors like these:
FREEZING media id B02176, it is write protected and cannot be used for backups
FREEZING media id B02696, it is write protected and cannot be used for backups
FREEZING media id B02970, too many data blocks written, check tape/driver block size configuration
These tapes were also frozen, but the Media Log report contains ONLY the TapeAlert (the TapeAlert code is supposed to tell us that the tape has been frozen - as per the TapeAlert codes in Admin Guide II):
TapeAlert Code: 0x08, Type: Warning, Flag: NOT DATA GRADE, from drive IBM.ULTRIUM-TD2.005 (index 5), Media Id BO1826
TapeAlert Code: 0x0e, Type: Critical, Flag: UNREC. MECH. CARTRIDGE FAILURE, from drive IBM.ULTRIUM-TD2.011 (index 14), Media Id B94543
TapeAlert Code: 0x04, Type: Critical, Flag: MEDIA, from drive IBM.ULTRIUM-TD2.011 (index 14), Media Id B04829
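Since a plain grep for "freez" no longer catches every case, a filter that picks up both message styles can produce a single list of suspect tapes. This is a minimal sketch that assumes the messages look exactly like the samples above; the function name extract_suspect_media is hypothetical, and it reads whatever report text is piped into it.

```shell
# Read log or report text on stdin and print one line per suspect tape:
#   <media id> FREEZING
#   <media id> TapeAlert <code> <flag text>
# ASSUMPTION: message wording matches the samples quoted in this thread.
extract_suspect_media() {
    sed -n \
        -e 's/.*FREEZING media id \([A-Za-z0-9]*\),.*/\1 FREEZING/p' \
        -e 's/.*TapeAlert Code: \(0x[0-9a-fA-F]*\),.*Flag: \(.*\), from drive.*Media Id \([A-Za-z0-9]*\).*/\3 TapeAlert \1 \2/p'
}
```

Piping the daily report output through extract_suspect_media and then through sort gives a per-tape list; whether a given TapeAlert code actually froze the tape still has to be checked against the TapeAlert table in Admin Guide II.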
07-11-2011 10:57 AM
Added VERBOSE to the vm.conf file and created the empty DRIVE_DEBUG and ROBOT_DEBUG files. ltid will be stopped and restarted during our weekly maintenance on Thursday.
07-11-2011 11:58 AM
If you have NOM, you can set up an alert to tell you when a tape is frozen.