Frozen Media

H_Sharma
Level 6

Hello Experts,

We have a clustered Windows master server running NetBackup 7.6.0.3 (both cluster nodes use SCSI persistent reserve) and 6 media servers, all Windows (using SPC-2 SCSI reserve). Many tapes are being frozen with the error below. The tape drive is IBM TS3500, which supports Persistent Binding.

Logs from /installationpath/db/media/errors

05/08/16 22:33:45 O00463 -1 RESERVE_ERROR Drive037 0 1 0 0

I could see this reserve error from different drives.
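
To see whether the errors cluster on particular drives, the entries in the errors file can be tallied per drive. Below is a minimal Python sketch of that idea. The field positions (error type in the fifth column, drive name in the sixth) are an assumption inferred from the single sample line above, and `tally_reserve_errors` is a hypothetical helper, not a NetBackup tool.

```python
from collections import Counter


def tally_reserve_errors(lines):
    """Count RESERVE_ERROR entries per drive name."""
    per_drive = Counter()
    for line in lines:
        fields = line.split()
        # Assumed layout (inferred from one sample line only):
        # date, time, media ID, number, error type, drive name, ...
        if len(fields) >= 6 and fields[4] == "RESERVE_ERROR":
            per_drive[fields[5]] += 1
    return per_drive


# The one entry quoted in this post; in practice, pass the lines
# read from the errors file on each media server.
sample = [
    "05/08/16 22:33:45 O00463 -1 RESERVE_ERROR Drive037 0 1 0 0",
]
print(tally_reserve_errors(sample))
```

If one drive dominates the tally, that points more towards a drive or path problem; an even spread across drives fits better with something external taking reservations.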

What do you think is causing the problem here? We don't have frequent cluster failovers that might lead us to look into persistent binding etc.

Thanks,

 

2 ACCEPTED SOLUTIONS


mph999
Level 6
Employee Accredited

Persistent reserve is correct for a cluster, and is more intelligent than SPC-2.

The issue is most likely either something outside NBU having access to the drives (another server), or some software that 'polls' the drives. HBA issues can also cause this sort of thing, as can faulty drives (though faulty drives are unlikely if it is happening on multiple drives at the same time). Drive firmware is another consideration.

One thing it isn't is NBU. All we are doing is issuing a SCSI reservation, only to find that 'something' already holds one on the drive. With persistent reservation a 'reservation key' is used, which is held on the drive itself. Additionally, persistent reservation allows any machine to break the reservation (which is why it's used in a cluster), but an application should only break a reservation that it made in the first place. It works this out by checking the reservation key on the drive. The fact that we're not breaking it further suggests that we didn't make it.

I did state above that this won't be NBU, but there is one exception to that.

If there are many drives which are shared (SSO) between many media servers (say 40 media servers sharing around 50 drives, all SSO), I've seen this cause issues with drives going into PEND and being unusable. If you have a sensible number of drives per media server, it won't be this.

It is very difficult to find the cause of this. There is nothing we can do in NBU, and we have no tools available for it.

On Linux, there is a third-party toolset called sg3_utils, which can actually read the reservation key off a drive:

E.g.

In the bptm log we see the reservation key:

14:29:20.123 [14098] <2> io_open: SCSI PERSISTENT RESERVE (Verified reservation with key 0x01d00006 001e8488)

Using sg3_utils we can read the reservation key straight from the drive:

dr-media1:/usr/openv/netbackup/logs/bptm # sg_persist --read-reservation -d /dev/sg5
  IBM       ULT3580-TD5       0103
  Peripheral device type: tape
  PR generation=0xb, Reservation follows:
    Key=0x1d00006001e8488
    scope: LU_SCOPE,  type: Exclusive Access

I know, however, that you are on Windows.

I believe there is a utility now, I think called sc3_utils, which is a rewrite of sg3_utils for Windows. I've not used it yet, but it may be able to read the reservation key off the drive. If there is no key, it 100% wasn't NBU. If there is a key, it can be compared to those logged in the bptm log on each media server; again, if it doesn't match, it's not us.
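
The comparison is easy to get wrong by eye, because bptm prints the key with a space and a leading zero while sg_persist prints it as one hex number. Here is a small Python sketch that normalises both forms before comparing them. `normalize_key` is a hypothetical helper; it assumes the key is the only hex-looking text following `0x` on the line, as in the two sample lines above.

```python
import re


def normalize_key(text):
    """Return the reservation key in `text` as an integer.

    Spaces inside the key and leading zeros are ignored, so the bptm
    and sg_persist spellings of the same key compare equal.
    """
    match = re.search(r"0x([0-9a-fA-F ]+)", text)
    if match is None:
        raise ValueError("no hexadecimal key found in: " + text)
    return int(match.group(1).replace(" ", ""), 16)


# The two lines quoted earlier in this post.
bptm_line = ("14:29:20.123 [14098] <2> io_open: SCSI PERSISTENT RESERVE "
             "(Verified reservation with key 0x01d00006 001e8488)")
sg_line = "    Key=0x1d00006001e8488"

print(normalize_key(bptm_line) == normalize_key(sg_line))
```

A key read from the drive that matches none of the keys logged by bptm on any media server would suggest the reservation was not made by NBU.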

You can also try vmoprcmd -crawlreleasebyname, run from, say, the master.

This will go round each media server and try to release the drive. With SPC-2 it would tell you which media server made the reservation (but not which application). I'm not sure whether, with persistent reservation, vmoprcmd still displays the server that made the reservation, as any server can break it, but it is worth trying.


Marianne
Moderator
Moderator
Partner    VIP    Accredited Certified

It seems that the default SPC-2 is enabled on Solaris media servers and Persistent SCSI Reserve only enabled on clustered master server nodes.

If memory serves me right, in an SSO environment ALL media servers should have the same option selected.

So, my advice is to enable Persistent SCSI Reserve on all media servers in the SSO config.

One more thing - you seem to confuse Persistent SCSI Reserve with Persistent Binding.

These are totally different topics but both extremely important.

Have a look at these (very) old but still relevant TNs:

What is Persistent Binding? What are the advantages of Persistent Binding?
http://www.veritas.com/docs/000026930

DOCUMENTATION: Requirements for using persistent binding with the NetBackup Shared Storage Option
http://www.veritas.com/docs/000025953


10 REPLIES

mph999
Level 6
Employee Accredited

Just re-read the post ... (prompted by Marianne's comment)

(both cluster Using SCSI persistent Reserve) 6 Media Servers all windows (Using SP2 SCSI reserve). 

For any drive that is shared (SSO), each machine that sees it MUST MUST MUST MUST MUST use the same type of SCSI reservation.

If not, at some point, you will get data loss.

So, as per Marianne's suggestion, set everything to SCSI persistent and see if that improves things.

H_Sharma
Level 6

Hi Marianne,

The media servers are on Wintel and are not clustered. There are 6 different Windows media servers with SSO enabled, and the masters have no drives attached.

Marianne
Moderator
Moderator
Partner    VIP    Accredited Certified

So this is a different environment, and not the cluster environment where you have been experiencing problems for a while now?

Whether it is a different or the same environment, Windows or Solaris (or a combination of different O/Ss), the requirements in an SSO environment remain the same:

  1. Reservation type (SCSI-2 vs SCSI persistent reserve) must be the same on all SSO media servers.
  2. Persistent binding is a must for SAN-attached tape drives to ensure device names stay the same across reboots. It is not related to cluster failover (see the TNs in my previous post).
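
Requirement 1 can be checked mechanically once you have written down the setting of each server that sees the shared drives. The Python sketch below does only that comparison. The host names and the way the settings were collected are hypothetical; NetBackup does not provide this function.

```python
def reservation_mismatches(settings):
    """Group servers by reservation type.

    Returns an empty dict when every server uses the same type,
    otherwise a dict mapping each type to the servers that use it.
    """
    groups = {}
    for server, mode in settings.items():
        groups.setdefault(mode, []).append(server)
    return groups if len(groups) > 1 else {}


# Hypothetical inventory, noted by hand from each server's host properties.
example = {
    "media1": "SPC-2",
    "media2": "SPC-2",
    "media3": "persistent",
}
print(reservation_mismatches(example))
```

An empty result means all listed servers agree; anything else lists the servers whose setting needs to be changed so that they all match.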

 

From which media server did you copy the errors file entry?
(You have used a Unix path in your post???)
What type of SCSI reservation is used on this server?

H_Sharma
Level 6

Hi Marianne,

Yes, this is a different environment. Everything is on Windows.

The master cluster (configured as Persistent Binding/SCSI) and the 6 media servers (all configured as SPC-2 SCSI reserve) are on Windows.

Sorry, the path I mentioned was taken from one of the Windows media servers; it was a typo.

So do you recommend that the media servers be configured as Persistent Binding, since the masters are set as Persistent Binding? (The masters do not act as media servers; no drives are attached to them.)

Marianne
Moderator
Moderator
Partner    VIP    Accredited Certified
I have tried to explain that Persistent Binding and Persistent SCSI Reservation are 2 different concepts. You need to do BOTH on all media servers.

mph999
Level 6
Employee Accredited

Persistent binding means that the OS path to the tape drive stays the same after a reboot / SCSI bus reset.

If this is not set, the path to the drives can change (usually on reboot), which means the config of the drive in NBU is then incorrect. Persistent binding is set on the HBA, using the HBA utility (HBAnyware for Emulex or SANsurfer for QLogic, depending on the brand of the HBA).

SCSI persistent reserve is a type of SCSI reservation. NBU offers 3 options:

Off (really not recommended)

SPC-2 (default; basic but usually works OK)

Persistent (more intelligent and required for a cluster, but not always supported by tape drives, though most do support it these days, and I've seen Solaris have a problem with this type). Also, some VTLs don't like persistent reserve.

Marianne
Moderator
Moderator
Partner    VIP    Accredited Certified

I have also tried to explain in previous posts....

.... about to give up......

mph999
Level 6
Employee Accredited

:0)