11-19-2012 03:03 PM
VCS v5.1 (Windows)
Hello,
We are using VCS for an NBU cluster. We have a mirrored catalog SAN disk across 2 sites in a campus cluster.
For testing, the SAN mirroring link was cut first, then 10 minutes later the IP network between the 2 sites was cut. The failover started after the IP network was cut, but failed because the 2nd node 'failed to import cluster dynamic disk group' (SFW logs).
I have heard this may have occurred because the active node's disk had the most up-to-date catalog information, and because the mirroring was cut, the secondary node's catalog disk was 'out of date' and so was not brought up, to guard against data loss. Basically we ended up with one node offline and the other faulted.
I hope that made sense? Has anyone heard of this happening before?
Is there a way in SFW to force an 'out of date' catalog disk online after a failover has faulted?
Any help would be appreciated.
11-19-2012 09:34 PM
Your problem is not with VCS. The problem is at the Volume Manager level, and with the way the "SAN mirroring link was cut".
Please tell us more about this step:
1. What exactly was done?
2. What was the purpose of this step?
3. What was the effect on both nodes?
4. What visibility did the 2nd node have of the SAN disks at this point?
Please save the Event Viewer Application and System logs as text files on both nodes and upload them here as file attachments. Let us know the date and time that the link was cut.
11-20-2012 01:14 AM
For this test to work you need to set the ForceImport attribute on the VmDg resource to 1 (it is 0 by default) - see extract from Bundled Agent guide:
Defines whether the agent forcibly imports the disk group when exactly half the disks are available. The value 1 indicates the agent imports the configured disk group when half the disks are available. The value 0 indicates it does not. Default is 0. This means that the disk group will be imported only when SFW acquires control over majority of the disks.
Note: Set this attribute to 1 only after verifying the integrity of your data. If due caution is not exercised before setting this attribute to 1, you risk a split-brain condition, leading to potential data loss.
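To make the rule in that extract concrete, here is a rough Python model of the import decision as the guide describes it. This is only an illustration, not the agent's actual code: the function name `should_import` and its parameters are made up for this sketch, and the "fewer than half" case is my assumption, since the extract only covers the majority and exactly-half cases.

```python
def should_import(total_disks: int, available_disks: int, force_import: bool = False) -> bool:
    """Illustrative model of the disk group import rule quoted from the guide.

    All names here are hypothetical; this is not the VMDg agent's implementation.
    """
    if total_disks <= 0 or not 0 <= available_disks <= total_disks:
        raise ValueError("disk counts must satisfy 0 <= available <= total, total > 0")

    if 2 * available_disks > total_disks:
        # A majority of the disks is visible: import regardless of ForceImport.
        return True
    if 2 * available_disks == total_disks:
        # Exactly half the disks are visible: import only if ForceImport is 1.
        return force_import
    # Fewer than half visible: assumed to never import (not stated in the extract).
    return False


# Assumed layout for illustration: one mirrored disk per site, so the
# surviving node sees 1 of 2 disks once the SAN link between sites is cut.
print(should_import(total_disks=2, available_disks=1))                     # ForceImport = 0
print(should_import(total_disks=2, available_disks=1, force_import=True))  # ForceImport = 1
```

With an even split of disks between the two sites, the surviving node can never see a majority after the inter-site link is cut, which would explain why the import failed with the default setting.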
Mike