Frozen Media
Hello Experts,
We have a clustered Windows master server running NetBackup 7.6.0.3 (both cluster nodes using SCSI persistent reserve) and 6 media servers, all Windows (using SPC-2 SCSI reserve). I am seeing many tapes being frozen with the error below. The tape library is an IBM TS3500, which supports persistent binding.
Logs from /installationpath/db/media/errors
05/08/16 22:33:45 O00463 -1 RESERVE_ERROR Drive037 0 1 0 0
I see this reserve error on several different drives.
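To check whether the errors cluster on particular drives, a rough Python sketch like the one below could tally them. The field positions (error type in the fifth column, drive name in the sixth) are an assumption read off the single log line above, not a documented format, and the function name is hypothetical.

```python
from collections import Counter


def count_reserve_errors(lines):
    """Tally RESERVE_ERROR entries per drive name."""
    per_drive = Counter()
    for line in lines:
        fields = line.split()
        # Assumed layout: date, time, media ID, number, error type, drive name, ...
        if len(fields) >= 6 and fields[4] == "RESERVE_ERROR":
            per_drive[fields[5]] += 1
    return per_drive


# The one line quoted above; in practice, pass the lines of the errors file.
sample = [
    "05/08/16 22:33:45 O00463 -1 RESERVE_ERROR Drive037 0 1 0 0",
]
print(count_reserve_errors(sample))
```

If one or two drives account for most of the entries, that points more towards a drive or path problem than towards something affecting every drive.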
What do you think is causing the problem here? We don't have frequent cluster failovers that might lead us to look into persistent binding etc.
Thanks,
Persistent reserve is correct for a cluster, and is more intelligent than SPC-2.
The issue is most likely either something outside NBU having access to the drives (another server), or some software that 'polls' the drives. HBA issues can also cause this sort of thing, as can faulty drives (though faulty drives are unlikely if it is happening on several drives at the same time). Drive firmware is another consideration.
One thing it isn't is NBU. All we are doing is issuing a SCSI reservation, only to find that the drive already has one from 'something' else. With persistent reservation a 'reservation key' is used, which is held on the drive itself. Additionally, persistent reservation allows any machine to break the reservation (which is why it's used in a cluster), but the reservation should only be broken by the application that made it in the first place. It works this out by checking the reservation key on the drive. The fact that we're not breaking it further suggests that we didn't make it.
I did state above that this won't be NBU; there is one exception to that.
If there are many drives which are shared via SSO between many media servers (say 40 media servers sharing around 50 drives, all SSO), I've seen this cause issues with drives going into PEND and being unusable. If you have a sensible number of drives per media server it won't be this.
It is very difficult to find the cause of this. There is nothing we can do in NBU, and we have no tools available for it.
On Linux, there is a third-party toolset called sg3_utils, which can actually read the reservation key off a drive:
Eg.
We see in bptm log, the reservation key:
14:29:20.123 [14098] <2> io_open: SCSI PERSISTENT RESERVE (Verified reservation with key 0x01d00006 001e8488)
Using sg3_utils we can read the reservation key straight from the drive:
dr-media1:/usr/openv/netbackup/logs/bptm # sg_persist --read-reservation -d /dev/sg5
IBM ULT3580-TD5 0103
Peripheral device type: tape
PR generation=0xb, Reservation follows:
Key=0x1d00006001e8488
scope: LU_SCOPE, type: Exclusive Access
I know, however, that you are on Windows.
I believe there is now a utility, I think called sc3_utils, which is a rewrite of sg3_utils for Windows. I've not used it yet, but it may be able to read the reservation key off the drive. If there is no key, it 100% wasn't NBU. If there is a key, it can be compared to those logged in the bptm log on each media server; again, if it doesn't match, it's not us.
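Note that the two tools print the same key differently: bptm logs it as 0x01d00006 001e8488 (with a space and a leading zero), while sg_persist prints 0x1d00006001e8488. A minimal Python sketch of the comparison follows, assuming the log and output formats look like the examples above; the function names are made up for illustration.

```python
import re


def normalize_key(text):
    """Return a reservation key as an integer, ignoring spaces and a 0x prefix."""
    cleaned = re.sub(r"\s+", "", text).lower()
    if cleaned.startswith("0x"):
        cleaned = cleaned[2:]
    return int(cleaned, 16)


def key_from_bptm(line):
    """Pull the key out of a bptm 'Verified reservation with key ...' line."""
    m = re.search(r"with key (0x[0-9a-fA-F ]+)\)", line)
    return normalize_key(m.group(1)) if m else None


def key_from_sg_persist(output):
    """Pull the key out of 'sg_persist --read-reservation' output."""
    m = re.search(r"Key=(0x[0-9a-fA-F]+)", output)
    return normalize_key(m.group(1)) if m else None


bptm_line = ("14:29:20.123 [14098] <2> io_open: SCSI PERSISTENT RESERVE "
             "(Verified reservation with key 0x01d00006 001e8488)")
sg_output = "PR generation=0xb, Reservation follows:\nKey=0x1d00006001e8488"

# Comparing as integers avoids false mismatches from formatting differences.
print(key_from_bptm(bptm_line) == key_from_sg_persist(sg_output))
```

Both helpers return None when no key is found, which corresponds to the "no key on the drive" case above.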
You can also try vmoprcmd -crawlreleasebyname; run this from, say, the master.
This will go round each media server and try to release the drive. With SPC-2 it would tell you which media server made the reservation (but not which application). With persistent reservation I'm not sure whether vmoprcmd still displays the server that made the reservation, as any server can break it, but it is worth trying.
It seems that the default SPC-2 is enabled on the media servers and Persistent SCSI Reserve is only enabled on the clustered master server nodes.
If memory serves me right, in an SSO environment ALL media servers should have the same option selected.
So, my advice is to enable Persistent SCSI Reserve on all media servers in SSO config.
One more thing - you seem to confuse Persistent SCSI Reserve with Persistent Binding.
These are totally different topics but both extremely important.
Have a look at these (very) old but still relevant TNs:
What is Persistent Binding? What are the advantages of Persistent Binding?
http://www.veritas.com/docs/000026930
DOCUMENTATION: Requirements for using persistent binding with the NetBackup Shared Storage Option
http://www.veritas.com/docs/000025953