Forum Discussion

AAlmroth
Level 6
14 years ago

I/O fencing on Windows = Persistent Group Reservation (PGR)

Hi all,

 

I have seen quite a few posts on VCS and I/O fencing support on Windows, and most threads end with the conclusion that it is not supported.

That is perhaps correct, at least if we look at how it is solved on UNIX/Linux. To my knowledge, on Windows, if the array is SCSI-3 compatible, SFW/HA can reserve the disk group by means of Persistent Group Reservation (PGR). This, in my book, provides the same functionality from a VCS perspective as vxfen. Am I wrong in this assumption?

All it takes is to enable it on Windows 2003; on Windows 2008 it is enabled by default. Also, by using PGR, we can use DMP with more than one path.
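
To keep the terminology straight, here is a small toy model of what SCSI-3 PGR provides (plain Python, no real SCSI commands are issued, and all class/method names are my own): initiators register a key, one registrant takes the reservation, and any initiator that has not registered is fenced off.

```python
# Toy model of SCSI-3 Persistent Group Reservation semantics.
# Nothing here talks to real storage; it only illustrates the rule that
# registered initiators may write while everyone else gets a reservation conflict.

class ToyLun:
    def __init__(self):
        self.registered_keys = set()   # keys added via PR OUT / REGISTER
        self.reservation_key = None    # key holding PR OUT / RESERVE

    def register(self, key):
        """PR OUT, service action REGISTER: add an initiator's key."""
        self.registered_keys.add(key)

    def reserve(self, key):
        """PR OUT, service action RESERVE (Write Exclusive - Registrants Only)."""
        if key not in self.registered_keys:
            raise PermissionError("initiator must register before reserving")
        self.reservation_key = key

    def write(self, key, data):
        """Writes succeed for registrants; unregistered initiators are fenced off."""
        if self.reservation_key is not None and key not in self.registered_keys:
            raise PermissionError("reservation conflict: initiator is fenced off")
        return f"wrote {len(data)} bytes"


lun = ToyLun()
lun.register(0xA1)                    # node A registers its key
lun.reserve(0xA1)                     # node A takes the reservation
print(lun.write(0xA1, b"ok"))         # node A (any of its paths) can write
try:
    lun.write(0xB2, b"blocked")       # node B never registered -> fenced off
except PermissionError as err:
    print(err)
```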

 

/A

  • Keep in mind that A is asking about SFW-HA (Windows): there is no coordinator disk group or configurable I/O fencing on Windows.

     

    SCSI-2 reservations place a reservation down a single path to the disk.  No other path can access the disk without breaking the reservation held by the path that initially reserved it.  By "other path" I mean any other path on the same or a different node - in other words, Active/Passive only while reservations are in use.

     

    SCSI-3 reservations place a reservation on the disks from one node down one path.  This process generates a PGR key, which the reserving node stores in its registry.  Any path to the storage that knows the PGR key can access the disk.  Since remote nodes do not know the PGR key, they cannot use the disk without breaking the reservation.  However, every path on the node that initiated the reservation has access to the PGR key and can therefore use the disk - in other words, Active/Active.

     

    Now, the process for preventing split brain is the same with SCSI-2 and SCSI-3, and it is the same process that Microsoft Cluster Server/Windows 2008 Failover Clustering uses.  It is called a Challenge/Defence process.

     

    The node that has the disk group imported (the defending node) runs a reservation thread that maintains the reservation on the disks in the disk group.  The reservation thread confirms, let's say once a second, that the disks are reserved; if a disk does not have the reservation on it, the thread places the reservation on the disk again.

    Another node that tries to import the disk group (the challenging node) sees that there is a reservation on the disks in the disk group, clears the reservation, and then waits, let's say, three seconds.  After the three seconds pass, it checks the disks again to see whether the reservation has returned.  If the reservation has returned, it knows the disks are under an active node's control and it stops the import of the disk group.  However, if the reservation was not replaced on the disk, the node knows there is a problem with the formerly active node, so it continues to import the disk group and starts its own reservation thread to maintain its reservations on the disks in the disk group.  (This challenge/defence loop is sketched in code at the end of this post.)

     

    As I mentioned before, by default SFW will not import a disk group unless more than 50% of its disks are available.  In addition, an imported disk group will stay imported as long as it has access to at least 50% of the disks in the disk group.

     

    Thanks,

    Wally
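
    A rough sketch of the challenge/defence loop described above, in plain Python. The Disk class and its methods are stand-ins rather than the actual calls SFW makes, and the one-second/three-second timings are simply the "let's say" values from the post:

    ```python
    import threading
    import time

    # Illustrative challenge/defence sketch. Disk.reserve()/clear()/is_reserved()
    # stand in for whatever SCSI-2 or SCSI-3 reservation operations SFW actually
    # issues; the timings follow the "let's say" figures above.

    DEFEND_INTERVAL = 1.0   # defender re-asserts the reservation every second
    CHALLENGE_WAIT = 3.0    # challenger waits before re-checking the disk

    class Disk:
        def __init__(self):
            self._reserved = False
        def reserve(self):      self._reserved = True
        def clear(self):        self._reserved = False
        def is_reserved(self):  return self._reserved

    def defend(disk, stop_event):
        """Defending node: keep the reservation alive while the DG is imported."""
        while not stop_event.is_set():
            if not disk.is_reserved():
                disk.reserve()              # put the reservation back
            time.sleep(DEFEND_INTERVAL)

    def challenge(disk):
        """Challenging node: decide whether it is safe to import the disk group."""
        if disk.is_reserved():
            disk.clear()                    # the challenge
            time.sleep(CHALLENGE_WAIT)
            if disk.is_reserved():          # a live defender answered
                return False                # abort the import
        disk.reserve()                      # no defender answered: take over
        return True

    # Demo: while a defender is alive, the challenger backs off.
    disk, stop = Disk(), threading.Event()
    threading.Thread(target=defend, args=(disk, stop), daemon=True).start()
    time.sleep(1.5)
    print("import allowed:", challenge(disk))   # False: the defender re-reserved
    stop.set()
    ```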

  • Hello,

    SCSI-3 PGR is the technology used by vxfen (on UNIX, of course) to place registrations and reservations on the disks.

    I do not believe this is the only reason I/O fencing is not supported on the Windows flavour; there is probably more to it. A developer can answer this better.

     

    Gaurav

  • Hi A,

     

    I'm not sure whether PGR = I/O fencing on Windows or not.

     

    SFW-HA on Windows 2003 and Windows 2008 can use SCSI-3 to do Active/Active DMP configurations.  With SFW-HA 5.1 we can enable support for SCSI-3 either in the VEA control panel, which turns on native SCSI-3 for all supported arrays, or via a registry entry that enables SCSI-2 to SCSI-3 translation for a specific DSM.  This works for both Windows 2003 and Windows 2008.  (A hypothetical example of the registry approach is sketched at the end of this post.)

     

    I hope this helps with your question

     

    Thanks,

    Wally
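
    Purely to illustrate the "registry entry for a specific DSM" option mentioned above, here is what flipping such a setting could look like from Python on Windows. The key path and value name below are placeholders I made up, not the ones documented for SFW-HA 5.1, so check the product documentation for the real location before changing anything:

    ```python
    import winreg

    # HYPOTHETICAL key path and value name -- shown only to illustrate enabling
    # SCSI-2 -> SCSI-3 translation for one specific DSM via the registry.
    # The real key/value for SFW-HA 5.1 is documented by the product, not here.
    DSM_KEY = r"SYSTEM\CurrentControlSet\Services\ExampleDSM\Parameters"
    VALUE_NAME = "EnableSCSI3Translation"   # placeholder value name

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, DSM_KEY, 0,
                        winreg.KEY_SET_VALUE) as key:
        winreg.SetValueEx(key, VALUE_NAME, 0, winreg.REG_DWORD, 1)
    print("Set (placeholder) SCSI-3 translation flag for ExampleDSM")
    ```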

  • On UNIX/Linux, when implementing I/O fencing we set up at least three dedicated LUNs which together provide the "fencing": if a majority of them is owned by another host, a second node cannot import the disk group. This is additional protection against split-brain scenarios; I'm sure we all agree on this.

    What I'm wondering is: why do we need dedicated LUNs on UNIX/Linux, when we "apparently" can do PGR on the actual volumes in Windows?

    Reading the 5.1SP2 guides, it is suggested that we only need to enable SCSI-3, and VCS can avoid split brain. Is this a correct understanding, or is SCSI-3 indeed only for managing DMP?

    I guess I would have to trigger this somehow in a cluster to actually see the behaviour, but if a host has registered a PGR on a LUN, no other host should be able to "just import by accident", right? It would take a manual override operation in VCS to import the disk group and clear the PGR?

    /A

  • Hi A,

     

    On Windows you still need a majority (more than 50%) of the disks in the disk group in order to import it, regardless of whether you are using SCSI-2 or SCSI-3 reservations (the check is sketched at the end of this post).  The SCSI-2 and SCSI-3 reservations prevent other servers from accessing disks that are imported on another node.

     

    Disk reservation is one method to prevent split brain.  Multiple redundant heartbeats are another.  Everything working together decreases the likelihood of split brain.

    The VMDg resource has some built-in processes to handle clearing the SCSI-2 and SCSI-3 reservations when needed.

     

    If you are asking about importing the DG with 50% or less of the disks, then yes, manual operations are needed, but the manual operations can be made persistent (not recommended).

     

    SFW-HA out of the box prevents split-brain situations pretty well.  The only times I've really seen problems have been when things were misconfigured, or in Replicated Data Clusters where the same disks are not seen by all nodes.

    What exactly are you trying to do or what are you having problems with?

     

    Thanks,

    Wally
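
    The majority rule mentioned above is simple enough to show directly; a small sketch (function name is mine) of the check:

    ```python
    # Sketch of the majority rule: importing a disk group requires strictly more
    # than 50% of its disks; per the earlier post, an already-imported group stays
    # imported while it can still reach at least 50% of its disks.

    def import_allowed(total_disks: int, accessible_disks: int) -> bool:
        """True when a strict majority (> 50%) of the disk group is available."""
        return accessible_disks * 2 > total_disks

    print(import_allowed(4, 2))   # False: exactly 50% is not enough to import
    print(import_allowed(5, 3))   # True: 3 of 5 is a strict majority
    ```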

  • The reason I started this thread was that when you search the archives there seems to be a lot of confusion about whether I/O fencing works or is supported on Windows. The main reason, I think, is that on UNIX/Linux we use dedicated LUNs for vxfen, while on Windows you just enable SCSI-3... so what is the magic on Windows that we cannot do on UNIX/Linux?

    I think it boils down to the fact that we do not really understand how SFW-HA implements the measures to avoid split brain. Yes, we can use multiple LLT links, as well as low-priority heartbeats, or just LLT over UDP (on multiple NICs, though) to avoid the most common scenarios. But if a cluster goes haywire, can we rely on the PGR reservation disallowing a node from importing and mounting the disks in Windows?

    I guess the question would be: why do we not need dedicated SCSI-3-capable LUNs in Windows to implement I/O fencing, while still achieving the same level of protection against split-brain scenarios?

    /A

  • SCSI-3 doesn't have a dependency on DMP, or vice versa... legacy products used raw mode to place reservations... DMP is an added advantage now, so registrations/reservations can be done via DMP paths rather than on individual raw disks, which may be very large in number.

    I believe the entire fencing concept is built around a race to win the coordinator disks... If we just enable PGR (no dedicated LUNs), let's think about how split brain would play out. For example, say we have a 5-node cluster and, in the event of a split brain, the cluster splits into two mini-clusters, one with 2 nodes and one with 3 nodes... and let's assume the service groups were active on one of the nodes of the 2-node mini-cluster (so the reservations were made by a node inside the 2-node mini-cluster).

    By default, by the algorithm set in the module, the mini-cluster with the larger number of nodes will survive... in this case the algorithm will fail to import the disk group (if there is no race)... I hope all agree. (A conceptual sketch of the race is at the end of this post.)

    Dedicated LUNs were intended as a precaution against accidents... so that someone doesn't re-initialize the coordinator LUNs or use them in a data disk group, which could defeat the split-brain protection... and from 5.0 onwards an extra flag was added for the coordinator disk group, i.e. you can flag a particular disk group as a coordinator disk group; in that case, even if someone tries to destroy the coordinator disk group, the operation is denied unless the flag is removed.

     

    Gaurav
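
    For readers coming from the Windows side, a conceptual sketch (all names are mine) of the coordinator-disk race described above; it is only meant to show why holding a majority of the dedicated coordinator disks decides which mini-cluster survives:

    ```python
    # Toy model of the UNIX/Linux vxfen race. After a split, one racer per
    # sub-cluster tries to eject the other side's registration keys from each
    # coordinator disk; the side holding a majority of the (typically three)
    # coordinator disks survives, the other side's nodes panic, so only one
    # side can ever import the data disk groups.

    COORDINATOR_DISKS = ("coord1", "coord2", "coord3")

    def race(delay_a: float, delay_b: float) -> str:
        """Toy race: the side whose racer starts earlier grabs each disk.
        (Per the post above, the module biases the race so that the larger
        mini-cluster normally wins.)"""
        winners = {d: ("A" if delay_a < delay_b else "B") for d in COORDINATOR_DISKS}
        a_wins = sum(w == "A" for w in winners.values())
        return "sub-cluster A survives" if a_wins * 2 > len(COORDINATOR_DISKS) \
               else "sub-cluster B survives"

    # 2+3 split from the example: the 3-node side (A) is favoured and wins,
    # even though the 2-node side (B) held the data disk group reservations.
    print(race(delay_a=0.0, delay_b=1.0))
    ```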

  • As far as DMP with SCSI-3 is concerned, SCSI-3 is a requirement for Active/Active array settings in a cluster.

    Extract from 5.1 Admin Guide:

    For DMP DSMs in a cluster environment, either Active/Active or Active/Passive load balance settings can be used. DMP DSMs automatically set the load balancing to Active/Passive for disks under SCSI-2 reservation. For Active/Active load balancing in a cluster environment, the array must be enabled for SCSI-3 Persistent Group Reservations (SCSI-3 PGR).
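
    Restated as a trivial check (the policy strings are mine, not an actual SFW API; it just encodes the rule from the extract):

    ```python
    # The Admin Guide extract as a rule: DMP DSMs force Active/Passive under
    # SCSI-2 reservations; Active/Active in a cluster requires SCSI-3 PGR.

    def allowed_load_balancing(reservation: str) -> str:
        """reservation is "SCSI-2" or "SCSI-3" (array enabled for SCSI-3 PGR)."""
        if reservation == "SCSI-3":
            return "Active/Active or Active/Passive"
        return "Active/Passive only"

    print(allowed_load_balancing("SCSI-2"))   # Active/Passive only
    print(allowed_load_balancing("SCSI-3"))   # Active/Active or Active/Passive
    ```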

  • I now have a very good understanding of how SFW-HA does this.

     

    /A