02-06-2009 01:51 AM
Hi,
> both the VMDg and MountV resources were unable to come online successfully, with some error
Could you please explain why the VMDg and MountV resources fail to come online?
The MountV resources depend on the VMDg resources. A simplified resource dependency tree would look like this:
Application
|
MountV
|
VMDg
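In the main.cf configuration this dependency is expressed with `requires` statements inside the service group definition; the resource names below are placeholders, not ones from your configuration:

```
app_res requires mountv_res
mountv_res requires vmdg_res
```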
I hope this helps.
Regards
Manuel
02-06-2009 09:27 AM
Hi Manuel,
First of all, thanks for reply...
I am still trying to find out why the VMDg and MountV resources fail to come online on the passive node when the LLT heartbeat connection drops (simulating a faulty NIC). The only error I got says that the agent is calling clean for the resource because the resource is not up even after online completed; when that happens, the service group changes state to FAULTED.
I have tested cluster failover by switching the service group between nodes manually and by shutting down the active node to simulate a server failure on the active node. In both cases the cluster failed over successfully and brought up the service group on the other node without any issue.
I understand the resource dependency; the MountV resource probably failed because its VMDg dependency did not come online in a timely manner. My question is: when the LLT heartbeat is disconnected, the cluster should detect the fault and initiate a failover to bring the service group online on the passive node. It should work that way, but in my case it did not work as expected.
What workarounds could I try to troubleshoot the problem so that the service group fails over successfully to the other node when an LLT fault is detected? Please advise.
many thanks.
02-10-2009 01:11 PM
Hi,
I have asked some colleagues who are more VCS-on-Windows savvy for help.
Sorry I could not be of more help.
Regards
Manuel
02-10-2009 11:03 PM
Hello,
I apologize for the delay on responding. Could you please set the log level on the VMDg and MountV resources to the DBG_AGDEBUG level? This will enable additional log information into the %VCS_HOME%/log log files for the two resources.
Has a support case been opened for this issue? If not, that may be the best way to ensure a review of the configuration, attributes, and service group definition for your environment.
Please let us know the results of testing with the additional log level, and whether support is going to be pursued.
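If it helps, the agent debug level can usually be set from the command line with hatype (a sketch; verify the exact syntax against your VCS version's documentation):

```
haconf -makerw
hatype -modify VMDg LogDbg DBG_AGDEBUG
hatype -modify MountV LogDbg DBG_AGDEBUG
haconf -dump -makero
```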
Regards,
Paul
02-11-2009 04:53 AM
Hi Khlow,
Try setting these attributes:
VMDg attributes:
ForceDeport: true
ForceImport: true
MountV attribute:
ForceUnmount: ALL
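As a sketch, these attributes can be set with hares while the configuration is writable; the resource names vmdg_res and mountv_res below are placeholders for your own resource names:

```
haconf -makerw
hares -modify vmdg_res ForceDeport 1
hares -modify vmdg_res ForceImport 1
hares -modify mountv_res ForceUnmount ALL
haconf -dump -makero
```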
Joost
03-03-2009 08:09 PM
Note that by disconnecting all network interfaces simultaneously you are simulating "split-brain", or network partition, which is a condition you should never allow your cluster to get into. VCS uses the term "concurrency violation" for an application that ends up running simultaneously on more than one machine.
All clustering products hope to never get into this situation, because it risks data corruption if the application is allowed to run on both sides of the cluster simultaneously.
Some clustering products use quorum disks to help prevent this; however, quorum disks usually become points of failure themselves and often cause more downtime than they actually prevent.
Some clustering products use disk heartbeats, but again, depending on how this is implemented, it can also cause more downtime due to disk failure than it actually resolves.
Also, neither quorum disks nor disk heartbeats prevent certain types of split-brain, such as those caused by intermittent system hangs (or, on Solaris, a "Stop-A" followed by "go" a minute later).
The only feature I've seen that absolutely prevents split-brain in all these conditions is I/O Fencing, which is a VCS feature, but only on UNIX platforms, and only if you have modern disk arrays that support SCSI-3 Persistent Reservations.
I recommend having as many heartbeats as possible, specifically having low priority LLT (VCS) heartbeats on the public interface. The chance that all your private and public networks fail simultaneously (but the fiber cables and storage remain) is next to zero.
In my opinion, public low-priority heartbeats are the simplest thing you can do: they involve the least amount of work and risk, and provide the greatest benefit in preventing split-brain.
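On UNIX, for example, a low-priority heartbeat on the public interface is added in /etc/llttab with a link-lowpri directive; the interface names below are illustrative:

```
# /etc/llttab fragment (interface names are examples)
link        eth1 eth1 - ether - -   # private heartbeat link
link        eth2 eth2 - ether - -   # second private link
link-lowpri eth0 eth0 - ether - -   # low-priority heartbeat on the public NIC
```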