Hi Paul,
Let me check that I have understood. You want to fail over virtual machines from one ESX / vSphere host to another when the guest runs out of resources (e.g. memory or HBA queue depth)? And you are running SFW to give you multipathing on RDM devices within the guest?
The way you fail over the machines is to remove access to the VMDK file, so that MS Cluster detects a failed OS drive and should then bring up the guest on another ESX / vSphere host.
Assuming I have described the design correctly, is the problem that the virtual machine won't come up on the new host or, by the sound of it, that it won't fail on the old host? If the C: drive is not accessible to the guest OS, but you are saying I/O "starts queuing" on the vSphere host, how does this happen? The guest should have failed instantly, as it does on ESX.
I'm wondering why you think this may be a multipathing issue. I haven't seen this in practice, and assuming you are running fibre HBAs, I would expect the multipathing to be done at the ESX level. We do see some people lose their data disks on guest failover, but they are using a cluster disk group that depends on SCSI reservations, and so far the hardware is not capable of transferring the reservation from host to host without the guest knowing about it (i.e. the guest loses the reservation and the cluster disk resources fail). In your case I think you want I/O to be either queued to the disk or failed back to the application, so that when the guest comes up again the application recovers and continues.
It sounds like an interesting project. If you have any more details on what you're doing, I'd be interested in following along. Our dev and Product Management teams might have things going on too, as people's thinking about what goes into a virtual machine changes over time and they request more enhancements.
James.