05-08-2016 07:45 PM
Hello Experts,
We have a clustered master server on Windows running NetBackup 7.6.0.3 (both cluster nodes using SCSI persistent reserve) and 6 media servers, all Windows (using SPC-2 SCSI reserve). I can see many tapes getting frozen with the error below. The tape drive is an IBM TS3500, which supports Persistent Binding.
Logs from /installationpath/db/media/errors
05/08/16 22:33:45 O00463 -1 RESERVE_ERROR Drive037 0 1 0 0
I could see this reserve error from different drives.
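To see whether the errors cluster on particular drives, the errors file can be tallied per drive. This is only a minimal sketch: it assumes the fields are whitespace-separated, with the error type in the fifth field and the drive name in the sixth, as in the single line quoted above. The function name `reserve_errors_by_drive` is made up for illustration.

```python
from collections import Counter


def reserve_errors_by_drive(lines):
    """Count RESERVE_ERROR entries per drive name.

    Assumed layout (taken from the one sample line in this thread):
    date time media_id <number> error_type drive_name ...
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Skip blank or short lines and any other error type.
        if len(fields) >= 6 and fields[4] == "RESERVE_ERROR":
            counts[fields[5]] += 1
    return counts


if __name__ == "__main__":
    sample = ["05/08/16 22:33:45 O00463 -1 RESERVE_ERROR Drive037 0 1 0 0"]
    for drive, count in reserve_errors_by_drive(sample).most_common():
        print(drive, count)
```

In practice the list would come from reading the errors file on each media server; errors spread evenly across drives point away from a single faulty drive.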
What do you think is causing the problem here? We don't have frequent cluster failovers that would lead us to look into persistent binding etc.
Thanks,
05-08-2016 11:40 PM
Persistent reserve is correct for a cluster, and is more intelligent than SPC-2.
The issue is most likely either something outside NBU having access to the drives (another server), or some software you are running that 'polls' the drives. HBA issues can also cause this sort of thing, as can faulty drives (though faulty drives are unlikely if it is happening on multiple drives at the same time). Drive firmware is another consideration.
One thing it isn't is NBU. All we are doing is issuing a SCSI reservation, only to find that the drive already has one from 'something' else. With persistent reservation a 'reservation key' is used, which is held on the drive itself. Additionally, persistent reservation allows any machine to break the reservation (which is why it's used in a cluster), but the reservation should only be broken by an application if that application made the reservation in the first place. It works this out by checking the reservation key on the drive. The fact that we're not breaking it further suggests that we didn't make it.
I did state above that this won't be NBU. There is one exception to that.
If there are many drives which are shared via SSO between many media servers (say 40 media servers sharing around 50 drives, all SSO), I've seen this cause issues with drives going into PEND and being unusable. If you have a sensible number of drives per media server it won't be this.
It is very difficult to find the cause of this. There is nothing we can do in NBU, and we have no tools available for it.
On Linux, there is a third-party tool set called sg3_utils, which can actually read the reservation key off a drive:
Eg.
In the bptm log we see the reservation key:
14:29:20.123 [14098] <2> io_open: SCSI PERSISTENT RESERVE (Verified reservation with key 0x01d00006 001e8488)
Using sg3_utils we can read the reservation key straight from the drive:
dr-media1:/usr/openv/netbackup/logs/bptm # sg_persist --read-reservation -d /dev/sg5
IBM ULT3580-TD5 0103
Peripheral device type: tape
PR generation=0xb, Reservation follows:
Key=0x1d00006001e8488
scope: LU_SCOPE, type: Exclusive Access
I know, however, that you are on Windows.
I believe there is a utility now, I think called sc3_utils, which is a rewrite of sg3_utils for Windows. I've not used it yet, but it may be able to read the reservation key off the drive. If there is no key, it 100% wasn't NBU. If there is a key, it can be compared with those logged in the bptm log on each media server. Again, if it doesn't match, it's not us.
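Note that the two tools print the same key differently: bptm logs it as `0x01d00006 001e8488` (zero-padded, with a space), while sg_persist prints `Key=0x1d00006001e8488`. A minimal sketch of the comparison, assuming only those two formats as shown in this post (the function names are made up for illustration):

```python
def normalize_key(text: str) -> int:
    """Convert a reservation key, as printed by bptm or sg_persist, to an int."""
    cleaned = text.strip().lower().replace(" ", "")
    # sg_persist prefixes the value with "Key=".
    if cleaned.startswith("key="):
        cleaned = cleaned[len("key="):]
    # int(..., 16) accepts an optional "0x" prefix and ignores leading zeros.
    return int(cleaned, 16)


def same_reservation(bptm_key: str, drive_key: str) -> bool:
    """True if the key logged by bptm matches the key read from the drive."""
    return normalize_key(bptm_key) == normalize_key(drive_key)


if __name__ == "__main__":
    bptm_key = "0x01d00006 001e8488"      # from the bptm log line above
    drive_key = "Key=0x1d00006001e8488"   # from the sg_persist output above
    print(same_reservation(bptm_key, drive_key))
```

Comparing the numeric values rather than the raw strings avoids a false mismatch caused by the padding and spacing differences.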
You can also try vmoprcmd -crawlreleasebyname. Run this from, say, the master.
This will go round each media server and try to release the drive. If it were SPC-2 it would tell you which media server made the reservation (but not which application). With persistent reservation, I'm not sure whether vmoprcmd still displays the server that made the reservation, as any server can break it, but it is worth trying.
05-09-2016 12:15 AM
It seems that the default SPC-2 is enabled on Solaris media servers and Persistent SCSI Reserve only enabled on clustered master server nodes.
If memory serves me right, in an SSO environment ALL media servers should have the same option selected.
So, my advice is to enable Persistent SCSI Reserve on all media servers in the SSO config.
One more thing - you seem to confuse Persistent SCSI Reserve with Persistent Binding.
These are totally different topics but both extremely important.
Have a look at these (very) old but still relevant TNs:
What is Persistent Binding? What are the advantages of Persistent Binding?
http://www.veritas.com/docs/000026930
DOCUMENTATION: Requirements for using persistent binding with the NetBackup Shared Storage Option
http://www.veritas.com/docs/000025953
05-09-2016 03:08 AM
Just re-read the post ... (prompted by Marianne's comment)
(both cluster Using SCSI persistent Reserve) 6 Media Servers all windows (Using SP2 SCSI reserve).
For any drive that is shared (SSO), each machine that sees it MUST MUST MUST MUST MUST use the same type of SCSI reservation.
If not, at some point, you will get data loss.
So, as per Marianne's suggestion, set everything to SCSI persistent, and see if that improves things.
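As a rough illustration of that rule, a small check can flag any mix of reservation types among the hosts that share a drive. This is only a sketch: the host names and the `settings` dictionary are hypothetical, and the values would have to be collected by hand from each server's NetBackup host properties.

```python
from collections import defaultdict
from typing import Dict, List


def find_reservation_mismatch(settings: Dict[str, str]) -> Dict[str, List[str]]:
    """Group hosts by reservation type.

    Returns an empty dict when every host uses the same type,
    otherwise a dict mapping each type to the hosts that use it.
    """
    groups = defaultdict(list)
    for host, mode in settings.items():
        groups[mode.strip().lower()].append(host)
    return dict(groups) if len(groups) > 1 else {}


if __name__ == "__main__":
    # Hypothetical hosts that all see the same shared drive.
    settings = {
        "media1": "SPC-2",
        "media2": "SPC-2",
        "media3": "Persistent",
    }
    mismatch = find_reservation_mismatch(settings)
    if mismatch:
        for mode, hosts in mismatch.items():
            print(mode, "->", ", ".join(hosts))
    else:
        print("All hosts use the same reservation type")
```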
05-09-2016 03:08 AM
Hi Marianne,
The media servers are on Wintel and not in clusters. There are 6 different media servers on Windows with SSO enabled, and the masters are not configured with drives attached.
05-09-2016 03:29 AM
So this is a different environment and not the cluster environment where you have been experiencing problems for a while now?
Different or same environment, Windows or Solaris (or a combination of different OSes), the requirements in an SSO environment remain the same:
From which media server did you copy the errors file entry?
(You have used a Unix path in your post???)
What type of SCSI reservation is used on this server?
05-11-2016 07:42 PM
Hi Marianne,
Yes, this is a different environment. All are on Windows.
The master cluster (configured as Persistent Binding SCSI) and the 6 media servers (configured as SPC-2 SCSI Reserve on all 6 media servers) are on Windows.
I am sorry, the path I mentioned was taken from one of the Windows media servers. It was a typo.
So do you recommend that the media servers must be configured as Persistent binding, as the masters are set as Persistent binding (the masters are not working as media servers, and no drives are attached to them)?
05-11-2016 08:49 PM
05-12-2016 05:42 AM
Persistent binding means that the OS path to the tape drive stays the same after a reboot / SCSI bus reset.
If this is not set, the path to the drives can change (usually on reboot), which then means the config of the drive in NBU is incorrect. Persistent binding is set on the HBA, using the HBA utility (HBAnyware for Emulex or SANsurfer for QLogic, depending on the brand of the HBA).
SCSI persistent reserve is a type of SCSI reservation. NBU offers 3 options:
Off (Really not recommended)
SPC-2 (Default, basic but usually works ok)
Persistent (More intelligent, required for a cluster, not always supported by tape drives (most do these days though), and I've seen the Solaris OS have a problem with this type). Also, some VTLs don't like persistent reserve.
05-12-2016 06:08 AM
I have also tried to explain in previous posts....
.... about to give up......
05-12-2016 07:28 AM
:0)