Solved: PEND-TLD always for all media servers and 1 drive ...

byx · ‎07-18-2018

Hi! Need help

We have 2 drives in NetBackup shared by 4 media servers. Several days ago all media servers were rebooted. Since then, one drive (HP.ULTRIUM6-SCSI.000) always goes to PEND-TLD on all 4 media servers, while the other drive (HP.ULTRIUM6-SCSI.001) is OK. I have restarted the NetBackup software several times, but no luck.
/dev/rmt/2cbn - HP.ULTRIUM6-SCSI.001
/dev/rmt/3cbn - HP.ULTRIUM6-SCSI.000

I'm not a NetBackup specialist, but I gathered some information:

root@host1# /opt/openv/volmgr/bin/vmoprcmd -crawlreleasebyname HP.ULTRIUM6-SCSI.000
Host host1 returned: SPC-2 RESERVED
Host host2 returned: SPC-2 RESERVED
Host host3 returned: SPC-2 RESERVED
Host host4 returned: SPC-2 RESERVED

Host host1 returned: SPC-2 RESERVED

root@host1# /opt/openv/volmgr/bin/vmoprcmd -crawlreleasebyname HP.ULTRIUM6-SCSI.001
Host host1 returned: No PERSISTENT REGISTRATIONS or SPC-2 RESERVATION found
Host host2 returned: No PERSISTENT REGISTRATIONS or SPC-2 RESERVATION found
Host host3 returned: No PERSISTENT REGISTRATIONS or SPC-2 RESERVATION found
Host host4 returned: No PERSISTENT REGISTRATIONS or SPC-2 RESERVATION found
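
As a rough sketch (not an official NetBackup tool), the crawl output above can be reduced to the list of hosts that still report an SPC-2 reservation. The function name reserved_hosts is made up for illustration, and it only recognises the exact "SPC-2 RESERVED" wording shown in the output above.

```shell
#!/bin/sh
# reserved_hosts (hypothetical helper): read the output of
#   vmoprcmd -crawlreleasebyname <drive>
# on stdin and print each host that reported "SPC-2 RESERVED", once per host.
reserved_hosts() {
    awk '/^Host .* returned: SPC-2 RESERVED/ { print $2 }' | sort -u
}

# Example use (drive name taken from this thread):
#   /opt/openv/volmgr/bin/vmoprcmd -crawlreleasebyname HP.ULTRIUM6-SCSI.000 | reserved_hosts
```

An empty result would mean no host returned the "SPC-2 RESERVED" line for that drive.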

In the Solaris logs:
Jul 19 08:15:16 host1 tldd[29666]: [ID 702911 daemon.notice] TLD(0) going to UP state
Jul 19 08:15:22 host1 avrd[29691]: [ID 702911 daemon.notice] st.conf configuration for HP.ULTRIUM6-SCSI.001 (device 0), name [HP Ultrium 6-SCSI ], vid [HP Ultrium 6-SCSI ], type 0x3b, block size 0, options 0x18619 (see st(7D) man page)
Jul 19 08:15:23 host1 avrd[29691]: [ID 702911 daemon.notice] st.conf configuration for HP.ULTRIUM6-SCSI.000 (device 1), name [], vid [], type 0x0, block size 0, options 0x440 (see st(7D) man page)
Jul 19 08:23:58 host1 ltid[1732]: [ID 702911 daemon.notice] Operator requested SCSI Release of Drive HP.ULTRIUM6-SCSI.000 returned RESERVATION CONFLICT
Jul 19 08:24:03 host1 ltid[1851]: [ID 702911 daemon.notice] Operator requested SCSI Release of Drive HP.ULTRIUM6-SCSI.000 returned RESERVATION CONFLICT

It seems strange to me that HP.ULTRIUM6-SCSI.001 and HP.ULTRIUM6-SCSI.000 show a different driver (or whatever this is) configuration.

I also tried this and got the same result:

root@host1# mt -f /dev/rmt/2cbn config
"HP Ultrium 6-SCSI", "HP Ultrium 6-SCSI ", "CFGHPULTRIUM6SCSI";
CFGHPULTRIUM6SCSI = 2,0x3B,0,0x18619,4,0x58,0x58,0x5A,0x5A,3,60,1200,600,1200,600,600,18000;
root@host1# mt -f /dev/rmt/3cbn config
"", "", "CFG";
CFG = 2,0x0,0,0x440,4,0x00,0x00,0x00,0x00,0,120,120,3600,3600,3600,3600,3600;
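
The difference between the two outputs above can be checked mechanically. This is a minimal sketch, assuming the first line of the config output always has the quoted layout seen here; has_inquiry_strings is a made-up name. It only flags that the identification strings are empty and says nothing about why.

```shell
#!/bin/sh
# has_inquiry_strings (hypothetical helper): read `mt -f <device> config`
# output on stdin and succeed only if the first quoted field on the first
# line is non-empty, i.e. the drive reported an identification string.
has_inquiry_strings() {
    head -n 1 | grep -q '^"[^"][^"]*"'
}

# Example use (device paths taken from this thread):
#   for dev in /dev/rmt/2cbn /dev/rmt/3cbn; do
#       mt -f "$dev" config | has_inquiry_strings || echo "$dev: empty strings"
#   done
```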

Not sure where the problem is.
Thanks

mph999 · ‎07-19-2018

You can certainly try those.

For SPC-2 reservations, only the HBA that made the reservation can release it. So if the reservation was made, for example, by a media server that is now down, removed, or unwell, you cannot release it via commands.

I recommend using persistent reservation. It uses a numeric key that is held in the drive firmware/memory, so we know whether we made the reservation, and any HBA can release it. If we do not recognize the key, we won't release it (for example, if it was made by some other application ... which would be bad, but ...)

The golden rule:

All devices (media servers, NDMP hosts, etc.) that share a drive MUST use the same type of reservation. If some use SPC-2 and others use persistent reservation, you WILL get data loss at some point.
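
To illustrate the rule, here is a minimal sketch that checks a hand-made list for mixed reservation types. The "host type" input format and the function name are assumptions for illustration only; how you collect the setting from each host is left open and is not shown here.

```shell
#!/bin/sh
# reservation_types_consistent (hypothetical helper): read lines of the form
#   <host> <reservation-type>
# on stdin (an assumed, hand-made format) and succeed only if every host
# lists the same type. Lines with fewer than two fields are ignored.
reservation_types_consistent() {
    awk 'NF >= 2 { seen[$2] = 1 }
         END { n = 0; for (t in seen) n++; exit (n > 1) }'
}

# Example use with a hand-written list:
#   printf 'host1 spc2\nhost2 persistent\n' | reservation_types_consistent \
#       || echo "mixed reservation types on a shared drive"
```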

Persistent reservation is more intelligent, and it also offers slightly more ways to troubleshoot (for example, using the third-party sg_utils commands).
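
As a sketch of that kind of troubleshooting, assuming the sg_persist tool from the sg3_utils package is installed: the wrapper below lists the registered keys and the current persistent reservation for one device. The function name and the SG_PERSIST override are made up for illustration, and the right device node to pass will depend on your platform. It reports persistent reservations only, not an SPC-2 reserve.

```shell
#!/bin/sh
# show_pr_state (hypothetical wrapper): query persistent-reservation state
# for the device given as $1. SG_PERSIST may be overridden (for example with
# "echo" for a dry run); it defaults to sg_persist from sg3_utils.
show_pr_state() {
    dev=$1
    # List the reservation keys currently registered with the device.
    "${SG_PERSIST:-sg_persist}" --in --read-keys "$dev"
    # Show which key, if any, currently holds the reservation.
    "${SG_PERSIST:-sg_persist}" --in --read-reservation "$dev"
}

# Example use (device path is a placeholder):
#   show_pr_state /dev/your-tape-device
```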

byx · ‎07-19-2018

I tried:

root@host1# mt -f /dev/rmt/3cbn forcereserve
root@host1# mt -f /dev/rmt/3cbn status

And now all is OK, the drive is working. Thanks for the help!