Daryl_Kinnaird
14 years ago

No log of tape being frozen

NetBackup master server on HP-UX 11.23 IA-64 running NBU 6.5.5.

I have a tape (700528) that was frozen after getting 3 write errors within 12 hrs. The thing is, none of the logs I checked show a message stating that the media was frozen and why.

 

Output from /usr/openv/netbackup/db/error for the tape in question:

 1307587095 1 388 16 verstke3 4218141 0 0 cis004zil bptm cannot write image to media id 700528, drive index 13, I/O error
1307587103 1 386 8 verstke3 0 0 0 *NULL* bptm TapeAlert Code: 0x03, Type: Warning, Flag: HARD ERROR, from drive STKT9940B1 (index 13), Media Id 700528
1307587103 1 386 16 verstke3 0 0 0 *NULL* bptm TapeAlert Code: 0x04, Type: Critical, Flag: MEDIA, from drive STKT9940B1 (index 13), Media Id 700528 

 

Output from the bptm log for the tape in question:

 19:38:15.361 [12954] <16> write_data: cannot write image to media id 700528, drive index 13, I/O error
19:38:23.396 [18408] <8> process_tapealert: TapeAlert Code: 0x03, Type: Warning, Flag: HARD ERROR, from drive STKT9940B1 (index 13), Media Id 700528
19:38:23.398 [18408] <16> process_tapealert: TapeAlert Code: 0x04, Type: Critical, Flag: MEDIA, from drive STKT9940B1 (index 13), Media Id 700528 

Output from /usr/openv/netbackup/db/media/errors for the tape in question:

06/08/11 19:38:15 700528 13 WRITE_ERROR STKT9940B1
06/08/11 19:38:23 700528 13 TAPE_ALERT STKT9940B1 0x30000000 0x00000000

0x30000000 0x00000000 corresponds to the two TapeAlert flags in the bptm log and the error log (0x03 HARD ERROR and 0x04 MEDIA).
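A minimal sketch of how the two hex words on a TAPE_ALERT line could be turned back into TapeAlert flag numbers. It assumes flag 1 is the most significant bit of the first word and flag 33 the most significant bit of the second; that reading matches the 0x03/0x04 pair in this post but is not taken from NetBackup documentation. `decode_tapealert` is a hypothetical helper name.

```shell
#!/bin/sh
# Decode the two 32-bit words logged after TAPE_ALERT into flag numbers.
# Assumption: flag 1 is the most significant bit of the first word,
# flag 33 is the most significant bit of the second word.
decode_tapealert() {
    word_index=0
    for word in "$1" "$2"; do
        value=$(( $word ))
        bit=0
        while [ "$bit" -lt 32 ]; do
            # Test bits from the most significant end downwards.
            mask=$(( 1 << (31 - bit) ))
            if [ $(( value & mask )) -ne 0 ]; then
                printf '0x%02x\n' $(( word_index * 32 + bit + 1 ))
            fi
            bit=$(( bit + 1 ))
        done
        word_index=$(( word_index + 1 ))
    done
}
```

Calling `decode_tapealert 0x30000000 0x00000000` prints 0x03 and 0x04, one per line, which lines up with the HARD ERROR and MEDIA alerts shown above.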

 

Status of tape from ADMIN GUI

 700528    700528    HCART2    ACS    6/8/2011 9:28    10/5/2008 9:46    6/8/2011 18:59    6/8/2011 19:00    Frozen   

I know the available_media command will show the frozen tapes as well, but it does not show why a tape was frozen, which is what I am looking for. I need this so I can keep track of potentially bad media and have them replaced.
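Until the freeze reason shows up in a log, one rough way to keep track of suspect media is to count entries per tape in the media errors file. The sketch below assumes every line follows the layout of the two sample lines above (date, time, media ID, drive index, error type, drive name). It counts per calendar day rather than over a rolling 12-hour window, and it skips TAPE_ALERT lines on the assumption that they are not counted as errors; `suspect_media` and its threshold argument are hypothetical names.

```shell
#!/bin/sh
# List media with at least MIN_ERRORS error entries on the same day.
# Assumed line layout (taken from the samples in this post):
#   MM/DD/YY HH:MM:SS MEDIA_ID DRIVE_INDEX ERROR_TYPE DRIVE_NAME [...]
# Simplification: groups by calendar day, not a rolling 12-hour window.
suspect_media() {
    errors_file=$1
    min_errors=${2:-3}
    awk -v min="$min_errors" '
        # Assumption: TAPE_ALERT entries are informational, so skip them.
        NF >= 5 && $5 != "TAPE_ALERT" { count[$1 " " $3]++ }
        END {
            for (key in count)
                if (count[key] >= min)
                    print key, count[key]
        }
    ' "$errors_file" | sort
}
```

For example, `suspect_media /usr/openv/netbackup/db/media/errors 3` prints one line per date and media ID with the number of matching entries.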

  • I have noticed the changed logging behaviour since NBU 6.x. Up to and including NBU 5.1, I could simply run 'bperror -media -hoursago |grep -i freez' daily on the master and catch all tapes being frozen on all media servers.

    bperror/Tape Logs report will still log errors like these:

    FREEZING  media id B02176, it is write protected and cannot be used for backups

    FREEZING  media id B02696, it is write protected and cannot be used for backups

    FREEZING  media id B02970, too many data blocks written, check tape/driver block size configuration

    These tapes were also frozen, but the Media Log report contains ONLY the TapeAlert (the TapeAlert code is supposed to tell us that the tape has been frozen - as per the TapeAlert codes in Admin Guide II):

    TapeAlert  Code: 0x08, Type: Warning, Flag: NOT DATA GRADE, from drive IBM.ULTRIUM-TD2.005 (index 5), Media Id BO1826

    TapeAlert  Code: 0x0e, Type: Critical, Flag: UNREC. MECH. CARTRIDGE FAILURE, from drive IBM.ULTRIUM-TD2.011 (index 14), Media Id B94543

    TapeAlert  Code: 0x04, Type: Critical, Flag: MEDIA, from drive IBM.ULTRIUM-TD2.011 (index 14), Media Id B04829
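Since a plain grep for "freez" no longer catches the TapeAlert cases, a filter could look for both kinds of line. This is only a sketch: it reads report text on stdin, and its watch list holds just the three codes quoted in this reply (0x04, 0x08, 0x0e), not the full table from the Admin Guide. `frozen_candidates` and `WATCH_CODES` are hypothetical names.

```shell
#!/bin/sh
# Print "MEDIA_ID REASON" for FREEZING lines and for TapeAlert lines whose
# code is on a watch list.
# Assumption: WATCH_CODES contains only the codes quoted in this thread.
WATCH_CODES=${WATCH_CODES:-"0x04 0x08 0x0e"}

frozen_candidates() {
    awk -v codes="$WATCH_CODES" '
        BEGIN {
            n = split(codes, list, " ")
            for (i = 1; i <= n; i++) watch[tolower(list[i])] = 1
        }
        /FREEZING/ {
            # The media ID follows the words "media id".
            for (i = 1; i < NF; i++)
                if ($i == "id") {
                    id = $(i + 1); sub(/,$/, "", id)
                    print id, "FREEZING"
                    break
                }
            next
        }
        /TapeAlert/ {
            # The code follows "Code:"; the media ID is the last field.
            code = ""
            for (i = 1; i < NF; i++)
                if ($i == "Code:") { code = tolower($(i + 1)); sub(/,$/, "", code) }
            if (code in watch) print $NF, "TapeAlert", code
        }
    '
}
```

It could be fed from something like `bperror -media -hoursago 24 | frozen_candidates`, where 24 is an arbitrary choice for a daily run.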

  • The TapeAlert is causing the media freeze.

    Please see NBU Admin Guide II under Using TapeAlert:

    p. 192:

    A set of TapeAlert conditions are defined that can cause the media in use to be frozen. An additional set of conditions are defined that can cause a drive to be downed. Table 3-13 on page 192 describes the TapeAlert codes.

  • "... media was frozen and why"

    The fact that NetBackup is "FREEZING media id XXXXXX, etc." should be in the media server's bptm log. Maybe the logging level needs increasing? Also, have you checked the OS logs?

    A few TechNotes written for NBU 7.x, but they are still relevant:

    Logs for troubleshooting frozen media
    http://www.symantec.com/business/support/index?page=content&id=HOWTO33178

    Frozen media troubleshooting considerations
    http://www.symantec.com/business/support/index?page=content&id=HOWTO33061

    About conditions that cause media to freeze
    http://www.symantec.com/business/support/index?page=content&id=HOWTO33062

  • I understand why the tape was frozen: it had 3 write errors within the default 12 hr window. What I don't see in the logs, OS or NetBackup, is a statement saying the tape was frozen.

    Another example: when I checked frozen tapes today, I had one (701922) that was frozen on the 12th because it had 3 POSITION ERRORS. I checked the log files (bptm, errors, syslog.log) and see the errors in bptm and in the errors log file, but again there is no message saying that the tape is being frozen because of the errors.

  • The process that freezes a tape or downs a drive is ltid.

    Its log goes to

    /usr/openv/volmgr/debug/ltid (create the directory)

    To set verbose logging, add

    VERBOSE

    to the /usr/openv/volmgr/vm.conf file. There is no number after VERBOSE, just the word.

    I would create these to set up a reasonable log collection for media manager operations:

    mkdir /usr/openv/volmgr/debug/ltid

    mkdir /usr/openv/volmgr/debug/tpcommand

    mkdir /usr/openv/volmgr/debug/robots

    Create these empty files:

    /usr/openv/volmgr/DRIVE_DEBUG and /usr/openv/volmgr/ROBOT_DEBUG

    Restart ltid:

    stopltid

    ltid -v

    The touch files increase the drive and robot messages in the system log.

     

    Martin
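The preparation steps in the reply above could be collected into one small script. This is a sketch, not a tested procedure for any particular site: `VOLMGR_DIR` is a hypothetical override so the steps can be tried in a scratch directory first, and the robots directory is placed directly under debug on the assumption that this is where the robotic daemons write their legacy logs. The ltid restart is left as a manual step.

```shell
#!/bin/sh
# Prepare Media Manager debug logging: log directories, touch files and
# the VERBOSE entry in vm.conf.
# VOLMGR_DIR is a hypothetical override; it defaults to the path in this thread.
VOLMGR_DIR=${VOLMGR_DIR:-/usr/openv/volmgr}

enable_volmgr_debug() {
    # Debug log directories (robots directly under debug is an assumption).
    mkdir -p "$VOLMGR_DIR/debug/ltid" \
             "$VOLMGR_DIR/debug/tpcommand" \
             "$VOLMGR_DIR/debug/robots" || return 1

    # Empty touch files that raise drive and robot messages in the system log.
    touch "$VOLMGR_DIR/DRIVE_DEBUG" "$VOLMGR_DIR/ROBOT_DEBUG" || return 1

    # Add VERBOSE to vm.conf only if the line is not already present.
    conf="$VOLMGR_DIR/vm.conf"
    touch "$conf" || return 1
    grep -q '^VERBOSE$' "$conf" || echo VERBOSE >> "$conf"
}
```

After calling `enable_volmgr_debug`, ltid still has to be restarted with `stopltid` and `ltid -v` for the settings to take effect, ideally in a maintenance window when no drives are in use.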

  • Added VERBOSE to the vm.conf file and added the empty DRIVE_DEBUG and ROBOT_DEBUG files. ltid will be stopped and started during our weekly maintenance on Thursday.