Forum Discussion

byx
Level 2
7 years ago

One drive always in PEND-TLD on all media servers after reboot

Hi! I need help.

We have 2 drives in NetBackup with 4 media servers. Several days ago all the media servers were rebooted. Since then, 1 drive (HP.ULTRIUM6-SCSI.000) always goes to PEND-TLD on all 4 media servers, while the other drive (HP.ULTRIUM6-SCSI.001) is OK. I restarted the NetBackup software several times, but no luck.
/dev/rmt/2cbn - HP.ULTRIUM6-SCSI.001
/dev/rmt/3cbn - HP.ULTRIUM6-SCSI.000

I'm not a NetBackup specialist, but I gathered some information:

root@host1# /opt/openv/volmgr/bin/vmoprcmd -crawlreleasebyname HP.ULTRIUM6-SCSI.000
  Host host1 returned: SPC-2 RESERVED
  Host host2 returned: SPC-2 RESERVED
  Host host3 returned: SPC-2 RESERVED
  Host host4 returned: SPC-2 RESERVED

  Host host1 returned: SPC-2 RESERVED

root@host1# /opt/openv/volmgr/bin/vmoprcmd -crawlreleasebyname HP.ULTRIUM6-SCSI.001
  Host host1 returned: No PERSISTENT REGISTRATIONS or SPC-2 RESERVATION found
  Host host2 returned: No PERSISTENT REGISTRATIONS or SPC-2 RESERVATION found
  Host host3 returned: No PERSISTENT REGISTRATIONS or SPC-2 RESERVATION found
  Host host4 returned: No PERSISTENT REGISTRATIONS or SPC-2 RESERVATION found
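
To see at a glance which hosts still report a reservation, the output above can be filtered. This is only a minimal sketch: it assumes the output lines look exactly like the ones pasted in this post (`Host <name> returned: SPC-2 RESERVED`), and the function name `reserved_hosts` is made up for illustration.

```shell
# reserved_hosts: read crawlrelease output on stdin and print each host that
# reported an SPC-2 reservation, once, in order of first appearance.
# Assumption: lines have the form "Host <name> returned: SPC-2 RESERVED".
# Note: the legacy /usr/bin/awk on Solaris may not accept this; nawk should.
reserved_hosts() {
  awk '$1 == "Host" && $3 == "returned:" && $4 == "SPC-2" && $5 == "RESERVED" && !seen[$2]++ { print $2 }'
}

# Example, to be run on a NetBackup server (command taken from the post):
# /opt/openv/volmgr/bin/vmoprcmd -crawlreleasebyname HP.ULTRIUM6-SCSI.000 | reserved_hosts
```

An empty result means no host reported `SPC-2 RESERVED` in the text that was fed in; it does not by itself prove that the drive is free.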

In the Solaris logs:
  Jul 19 08:15:16 host1 tldd[29666]: [ID 702911 daemon.notice] TLD(0) going to UP state
  Jul 19 08:15:22 host1 avrd[29691]: [ID 702911 daemon.notice] st.conf configuration for HP.ULTRIUM6-SCSI.001 (device 0), name [HP Ultrium 6-SCSI ], vid [HP Ultrium 6-SCSI ], type 0x3b, block size 0, options 0x18619 (see st(7D) man page)
  Jul 19 08:15:23 host1 avrd[29691]: [ID 702911 daemon.notice] st.conf configuration for HP.ULTRIUM6-SCSI.000 (device 1), name [], vid [], type 0x0, block size 0, options 0x440 (see st(7D) man page)
  Jul 19 08:23:58 host1 ltid[1732]: [ID 702911 daemon.notice] Operator requested SCSI Release of Drive HP.ULTRIUM6-SCSI.000 returned RESERVATION CONFLICT
  Jul 19 08:24:03 host1 ltid[1851]: [ID 702911 daemon.notice] Operator requested SCSI Release of Drive HP.ULTRIUM6-SCSI.000 returned RESERVATION CONFLICT

It seems strange to me that HP.ULTRIUM6-SCSI.001 and HP.ULTRIUM6-SCSI.000 have different driver (or whatever it is) configurations.

I also tried this and got the same result:

root@host1# mt -f /dev/rmt/2cbn config
  "HP Ultrium 6-SCSI", "HP Ultrium 6-SCSI ", "CFGHPULTRIUM6SCSI";
  CFGHPULTRIUM6SCSI = 2,0x3B,0,0x18619,4,0x58,0x58,0x5A,0x5A,3,60,1200,600,1200,600,600,18000;
root@host1# mt -f /dev/rmt/3cbn config
  "", "", "CFG";
  CFG = 2,0x0,0,0x440,4,0x00,0x00,0x00,0x00,0,120,120,3600,3600,3600,3600,3600;
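
The difference between the two `mt config` outputs above (a filled-in vendor/product string for 2cbn, empty strings for 3cbn) can be checked mechanically on each media server. A small sketch, assuming the first output line always has the quoted-string layout shown in this post; `drive_identified` is a hypothetical helper name, and an empty string only indicates that the st driver reported no identity for that path, not why.

```shell
# drive_identified: read `mt -f <path> config` output on stdin and succeed
# (exit 0) only if the first quoted string on the first line is non-empty.
drive_identified() {
  awk -F'"' 'NR == 1 { found = ($2 != "") } END { exit(found ? 0 : 1) }'
}

# Example (device paths taken from the post):
# for dev in /dev/rmt/2cbn /dev/rmt/3cbn; do
#   if mt -f "$dev" config | drive_identified; then
#     echo "$dev: identified"
#   else
#     echo "$dev: empty vendor/product string"
#   fi
# done
```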

I'm not sure where the problem is.
Thanks

  • byx
    7 years ago

    Tried:

    root@host1# mt -f /dev/rmt/3cbn forcereserve
    root@host1# mt -f /dev/rmt/3cbn status

    Now everything is OK and the drive is working. Thanks for the help!

  • If crawlreleasebyname doesn't release it, try power cycling the drive.  This needs to be a real power cycle, not some fluffy soft 'power cycle' from the library console ....

    • byx
      Level 2

      Thanks. I may not get a chance to power cycle the drive for another 2 weeks. What do you think: could the commands below help if run from all media servers, or are they equivalent to crawlreleasebyname?

      To break an SPC-2 reservation on Solaris

      1. Issue mt -f drive_path_name forcereserve.
      2. Issue mt -f drive_path_name release.

        See the mt(1) man page for more information.
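
The two steps quoted above could be wrapped as follows. This is a sketch, not a vetted procedure: the function name `break_spc2_reservation` and the `DRY_RUN` variable are made up for illustration, only the `mt` subcommands named in the quoted steps are used, and by default the commands are printed rather than run. Before running it for real, make sure no job is using the drive through another host.

```shell
# break_spc2_reservation: apply the two quoted steps to one device path.
# DRY_RUN (hypothetical switch) defaults to 1, which only prints the commands;
# set DRY_RUN=0 to actually execute them.
break_spc2_reservation() {
  dev=$1
  if [ -z "$dev" ]; then
    echo "usage: break_spc2_reservation /dev/rmt/<N>cbn" >&2
    return 2
  fi
  for op in forcereserve release; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "mt -f $dev $op"
    else
      # Stop at the first failing step so the error is not masked.
      mt -f "$dev" "$op" || return 1
    fi
  done
}

# Example (device path taken from the post):
# break_spc2_reservation /dev/rmt/3cbn             # prints the two commands
# DRY_RUN=0 break_spc2_reservation /dev/rmt/3cbn   # runs them
```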

      • mph999
        Level 6

        You can certainly try those.

        For SPC-2 reservations, only the HBA that made the reservation can release it, so if the reservation was made by, for example, a media server that is now down, removed or unwell, you cannot release it via commands.

        I recommend using persistent reservation. It uses a numeric key that is held in the drive's firmware/memory, so we know whether we made the reservation, and any HBA can release it.  If we do not recognize the key, we won't release it (for example, if it was made by some other application ... which would be bad, but ...)

        The golden rule:

        All devices (media servers, NDMP hosts, etc.) that share a drive MUST use the same type of reservation.  If some use SPC-2 and others use persistent reservation, you WILL get data loss at some point.

        Persistent reservation is more intelligent, and it also offers slightly more ways to troubleshoot (for example, using the third-party sg3_utils commands).
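
The golden rule can be checked against a hand-maintained inventory. A minimal sketch, assuming a made-up input format of one `<host> <reservation-type>` pair per line; the thread does not say how to read the configured reservation type from each host, so collecting that list is left to the reader, and `mixed_reservation_types` is a hypothetical helper name.

```shell
# mixed_reservation_types: read "<host> <type>" pairs on stdin (hypothetical
# inventory format) and succeed (exit 0) only if more than one distinct
# reservation type appears, i.e. the golden rule is violated.
mixed_reservation_types() {
  awk 'NF >= 2 && !seen[$2]++ { n++ } END { exit(n > 1 ? 0 : 1) }'
}

# Example with an illustrative inventory:
# printf 'host1 spc2\nhost2 persistent\n' | mixed_reservation_types \
#   && echo "WARNING: hosts sharing the drive use different reservation types"
```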