Forum Discussion

rsm_gbg
11 years ago

Tape mount problems

I have an NBU master server and a media server; the media server has an SL48 robot with 2 SCSI LTO-4 drives.

All my backups are running just fine, mounting tapes all the time, except my Vault_catalogbackup.

When the catalog backup starts, it tries to mount tapes for 2 inline copies: NLTxxx and NRTxxx.

Looking at the NBU logs, it seems it can't mount either of them and times out.

The strange thing is that it works about 4 days a week and fails about 3.

 

Server: bpbkar log

22:00:48.129 [21007] <2> bpbkar resolve_path: INF - Actual mount point of /opt/openv/db/staging is /opt/openv/db/staging
22:00:48.129 [21007] <2> bpbkar SelectFile: INF - Resolved_path = /opt/openv/db/staging/DBM_DATA.db
22:00:48.130 [21007] <4> bpbkar PrintFile: /opt/openv/db/staging/
22:01:40.949 [21007] <16> flush_archive(): ERR - Cannot write to STDOUT. Errno = 32: Broken pipe
22:01:40.950 [21007] <16> bpbkar Exit: ERR - bpbkar FATAL exit status = 24: socket write failed
22:01:40.950 [21007] <4> bpbkar Exit: INF - EXIT STATUS 24: socket write failed
22:01:40.950 [21007] <2> bpbkar Exit: INF - Close of stdout complete
22:01:40.950 [21007] <4> bpbkar Exit: INF - setenv FINISHED=0
 

media server: bptm log

22:00:47.877 [12382] <2> tapelib: wait_for_ltid, Mount, timeout 0
22:01:10.384 [12397] <2> bptm: INITIATING (VERBOSE = 5): -rptdrv -jobid -1402269145 -jm
22:01:10.384 [12397] <2> bptm: PORT_STATUS = 0x00000000
22:01:10.384 [12397] <2> drivename_open: Called with Create 0, file HP.ULTRIUM4-SCSI.001
22:01:10.384 [12397] <2> drivename_checklock: Called
22:01:10.384 [12397] <2> drivename_checklock: PID 12382 has lock
22:01:10.384 [12397] <2> report_drives: DRIVE = HP.ULTRIUM4-SCSI.001 LOCK = TRUE CURTIME = 1403006470
22:01:10.384 [12397] <2> report_drives: MODE = 0
22:01:10.384 [12397] <2> report_drives: TIME = 1403006446
22:01:10.384 [12397] <2> report_drives: MASTER = ipndms
22:01:10.384 [12397] <2> report_drives: SR_KEY = 0 1
22:01:10.384 [12397] <2> report_drives: PATH = /dev/rmt/1cbn
22:01:10.384 [12397] <2> report_drives: MEDIA = NLT004
22:01:10.384 [12397] <2> report_drives: REQID = -1402269143
22:01:10.384 [12397] <2> report_drives: ALOCID = 139498
22:01:10.384 [12397] <2> report_drives: RBID = {F825C9EE-F616-11E3-894C-00144FF80172}
22:01:10.385 [12397] <2> report_drives: PID = 12382
22:01:10.385 [12397] <2> report_drives: FILE = /usr/openv/netbackup/db/media/tpreq/drive_HP.ULTRIUM4-SCSI.001
22:01:10.385 [12397] <2> drivename_open: Called with Create 0, file HP.ULTRIUM4-SCSI.002
22:01:10.385 [12397] <2> drivename_checklock: Called
22:01:10.385 [12397] <2> drivename_checklock: PID 12382 has lock
22:01:10.385 [12397] <2> report_drives: DRIVE = HP.ULTRIUM4-SCSI.002 LOCK = TRUE CURTIME = 1403006470
22:01:10.385 [12397] <2> report_drives: MODE = 0
22:01:10.385 [12397] <2> report_drives: TIME = 1403006446
22:01:10.385 [12397] <2> report_drives: MASTER = ipndms
22:01:10.385 [12397] <2> report_drives: SR_KEY = 0 1
22:01:10.385 [12397] <2> report_drives: PATH = /dev/rmt/0cbn
22:01:10.385 [12397] <2> report_drives: MEDIA = NRT101
22:01:10.385 [12397] <2> report_drives: REQID = -1402269143
22:01:10.385 [12397] <2> report_drives: ALOCID = 139499
22:01:10.385 [12397] <2> report_drives: RBID = {F82ACD18-F616-11E3-A122-00144FF80172}
22:01:10.385 [12397] <2> report_drives: PID = 12382
22:01:10.385 [12397] <2> report_drives: FILE = /usr/openv/netbackup/db/media/tpreq/drive_HP.ULTRIUM4-SCSI.002
22:01:10.385 [12397] <2> main: Sending [EXIT STATUS 0] to NBJM
22:01:10.385 [12397] <2> bptm: EXITING with status 0 <----------
22:01:40.987 [12382] <2> Media_signal_poll: 2:Terminate detected (tapelib.c:615)
22:01:40.987 [12382] <2> mount_open_media: mount canceled detected in tpreq(), signo = 1
22:01:40.987 [12382] <2> set_job_details: Tfile (61970): LOG 1403006500 16 bptm 12382 media manager terminated during mount of media id NLT004, possible media mount timeout
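
To see how long the mount request actually waited before it was cancelled, the timestamps in the bptm excerpt above can be compared. This is only a rough sketch: the function names (parse_line, mount_wait_seconds) are made up for illustration, the line layout is assumed to be "HH:MM:SS.mmm [pid] <level> message" as in the excerpt, and it assumes the lines do not cross midnight.

```python
import re
from datetime import datetime

# Assumed line shape, taken from the excerpt: HH:MM:SS.mmm [pid] <level> message
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3}) \[(\d+)\] <(\d+)> (.*)$")


def parse_line(line):
    """Return (time, pid, level, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    stamp, pid, level, message = m.groups()
    return datetime.strptime(stamp, "%H:%M:%S.%f"), int(pid), int(level), message


def mount_wait_seconds(lines, pid):
    """Seconds between wait_for_ltid and 'Terminate detected' for one PID."""
    start = end = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None or parsed[1] != pid:
            continue  # skip unparseable lines and other bptm processes
        when, _, _, message = parsed
        if start is None and "wait_for_ltid" in message:
            start = when
        elif start is not None and "Terminate detected" in message:
            end = when
            break
    if start is None or end is None:
        return None
    return (end - start).total_seconds()


sample = [
    "22:00:47.877 [12382] <2> tapelib: wait_for_ltid, Mount, timeout 0",
    "22:01:10.384 [12397] <2> bptm: INITIATING (VERBOSE = 5): -rptdrv -jobid -1402269145 -jm",
    "22:01:40.987 [12382] <2> Media_signal_poll: 2:Terminate detected (tapelib.c:615)",
]
print(mount_wait_seconds(sample, 12382))
```

For the lines above this gives roughly 53 seconds between the mount request and the terminate signal for PID 12382. The terminate lands in the same second as the bpbkar "Broken pipe" error (22:01:40), assuming the clocks on the two servers are in sync.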

I have tried freezing certain tapes to rule out problems with particular tapes, but it doesn't matter what I do.

Checking the SL48 Library logs tells me nothing either.

Any ideas what to do next?

- Roland
 

  • Hi,

     

    This post got a bit sidelined...

    Yes, I think I solved it. I think...

    There was a stray sticker that had come off a tape inside the library.
    It was discovered when a tape was ejected without a sticker!
    After pulling the library apart and removing the sticker, the problem disappeared.

    For whatever reason, the sticker must have been in a very odd spot to cause such strange behaviour.
    And why just the Vault?
    Maybe Vault does a scan of all the tapes and thus hit the sticker "bug" every time?
    One can only speculate!

    - Roland
