I have a Microsoft Cluster (MSCS) on Windows 2003 R2 Enterprise SP2, with Storage Foundation for Windows (SFW) 5.1 SP2 installed. Several volumes in a single Volume Manager Disk Group have the "dirty bit" set. Before we installed SFW, MSCS would run chkdsk against any volume with the "dirty bit" set while bringing the volume online. Since installing SFW, chkdsk is no longer run against a dirty volume. Why is this?
Unfortunately, there is no way to have the VMDg resource run an automatic chkdsk in an MSCS cluster when the dirty bit is set. This option is available with Veritas Cluster Server (which will actually fail to bring dirty volumes online until a chkdsk is run), but it has never been implemented on the MSCS side.
I will raise this 'concern' internally and see what traction it gets. At the least, we can send up an enhancement request for a future version to provide this functionality if possible, but as of now, this does not exist.
I stand corrected! This is possible :)
See technote: http://www.symantec.com/business/support/index?page=content&id=TECH127796 which provides details on the registry change needed to get chkdsk to run. There are several chkdsk options that can be set; what each setting does is described in the Microsoft article referenced in the technote.
Please let me know if you have any additional questions.
I have Win2k8 (with 5.1 SP1), and if I go into cluadmin and check the properties of the VMDg resource, the 'Properties' tab shows a DiskRunChkDsk attribute (currently set to 0). I do not currently have a Win2k3 configuration to see whether this option is available in the GUI on that version as well, but I'd strongly recommend checking.
And per the Microsoft article the acceptable values for this are 0-6, with 0 being the default setting to mimic what a Physical Disk resource does. In my case, the resource was added automatically, but that may be only for Win2k8.
0 (Default) - Performs a quick check to verify volume is okay during online, runs chkdsk to recover from any corruption. Fails online if chkdsk is unable to repair.
1 - Performs a thorough check to verify volume is okay during online, runs chkdsk to recover from any corruption. Fails online if chkdsk is unable to repair.
2 - Always run chkdsk on volumes during online. Fail online if chkdsk fails due to any reason.
3 - Same behavior as option #0, except if the quick volume check returns okay, run chkdsk in readonly mode (i.e. chkdsk on a snapshot of the volume) and proceed with online.
4 - Don't perform any file system level checks during online. In this mode IsAlive or LooksAlive calls don't do any file system level verification.
5 - Perform a thorough check to verify volume during online. Fail online if the volume/FS checks fail due to any reason. Do not attempt recovery using chkdsk.
6 - Suppresses volume creation/online/mounting during disk resource online. Disk is in offline read write mode, i.e. the disk is readable/writable using raw block level IOs.
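For anyone scripting around this attribute, the list above can be kept as a small lookup table. This is only a sketch: the names DISK_RUN_CHKDSK, describe and may_run_chkdsk are made up for illustration, and the summaries are paraphrased from the list in this post, not taken from any Microsoft or Symantec API.

```python
# Hypothetical lookup table; summaries paraphrase the value list in this thread.
DISK_RUN_CHKDSK = {
    0: "Quick volume check; run chkdsk to repair corruption; fail online if repair fails (default).",
    1: "Thorough volume check; run chkdsk to repair corruption; fail online if repair fails.",
    2: "Always run chkdsk during online; fail online if chkdsk fails for any reason.",
    3: "As 0, but if the quick check passes, run chkdsk read-only and proceed with online.",
    4: "No file system checks during online, IsAlive, or LooksAlive.",
    5: "Thorough volume check; fail online on any failure; no chkdsk recovery attempted.",
    6: "Volume is not created, onlined, or mounted; disk is available for raw block I/O only.",
}

# Values whose listed behaviour mentions running chkdsk in some form.
MAY_RUN_CHKDSK = frozenset({0, 1, 2, 3})


def describe(value: int) -> str:
    """Return the summary for a DiskRunChkDsk value, or raise ValueError."""
    if value not in DISK_RUN_CHKDSK:
        raise ValueError(f"DiskRunChkDsk must be 0-6, got {value!r}")
    return DISK_RUN_CHKDSK[value]


def may_run_chkdsk(value: int) -> bool:
    """True if the listed behaviour for this value includes running chkdsk."""
    describe(value)  # validates the range first
    return value in MAY_RUN_CHKDSK
```

For example, may_run_chkdsk(5) returns False, because value 5 fails the online rather than attempting a repair.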
I just wanted to point this out as another option (as opposed to having to make the change in the registry). [Screenshot from Windows Failover Cluster]
We have Microsoft Clusters on Windows 2003 R2 Enterprise SP2 and on Windows 2008 R2 Enterprise, with Storage Foundation for Windows 5.1 SP2 installed.
We already have the property DiskRunChkDsk and it is set to 0. On 2003, we also created the property SkipChkdsk and set it to 0. If we set the dirty bit on the volume, take it offline, and bring it back online, no chkdsk is run.
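One way to make the test above repeatable is to check the dirty bit from a script before and after cycling the resource. The sketch below wraps the Windows `fsutil dirty query` command. The helper names are hypothetical, and the parser assumes the English output wording ("Volume - X: is Dirty" / "Volume - X: is NOT Dirty"), which may differ on localized systems.

```python
import subprocess


def parse_dirty_query(output: str) -> bool:
    """Interpret `fsutil dirty query` output; True means the dirty bit is set.

    Assumes the English wording "is Dirty" / "is NOT Dirty".
    """
    text = output.lower()
    if "is not dirty" in text:
        return False
    if "is dirty" in text:
        return True
    raise ValueError(f"unrecognised fsutil output: {output!r}")


def is_volume_dirty(volume: str) -> bool:
    """Run `fsutil dirty query <volume>` (Windows only, needs admin rights)."""
    result = subprocess.run(
        ["fsutil", "dirty", "query", volume],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_dirty_query(result.stdout)
```

On a cluster node you could call is_volume_dirty("X:") (X: being a placeholder drive letter) before taking the resource offline and again after bringing it online. If chkdsk had run and repaired the volume, the second call would be expected to return False.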
The Symantec tech note you reference appears only to address an error message in a wizard. It does not fix the dirty file system issue.
On 2008, if the volume is a drive letter and not a mount point, a prompt comes up asking if you want to repair the drive. A prompt is not an automatic repair.
For a failover cluster, we need to be sure that volumes can fail over, be mounted on the other node, and not present a dirty file system to our applications.
Do you have any other suggestions that can help us prevent corrupted data on our clustered volumes managed by SFW?
I too saw the issue where the volume remained dirty, even with the attribute configured in the registry, and was unable to get chkdsk to run. I will perform some additional tests, and I will also need to engage Engineering, as this may be an issue in the product.
Assuming you have a Support contract, it would be best to open a Support case at this time so that we can track this issue and resolution accordingly. If you do go this route, let the TSE know you worked with Robert Hanley so they can engage me immediately.
Thank you for your patience on this.
I have not seen a response from you on this. Assuming you and JJ are not co-workers, feel free to open a case concerning this as well. I also plan to keep this thread up to date on the progress for others interested in this functionality.
I'm sorry for not getting back to you sooner. JJ and I are co-workers and have been discussing this between us. I will be opening a case with Symantec to see if we can get a resolution.
What we are seeing is that on Windows 2003 Clustering, chkdsk does not run when a Symantec Dynamic volume is marked DIRTY. On Windows 2008 Failover Cluster, only the lettered drive will execute chkdsk, but only if you answer the prompt. For any mounted volumes (mounted through a directory), chkdsk will not run.
Basic disks will run chkdsk on Windows 2008 Failover Clustering, with the same prompt, but the prompt will time out. This includes both lettered drives and mounted volumes.
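To keep the combinations straight, here is the behaviour described in this post written out as a small table. It is only a summary of what we observed; the labels and the function name are ours, and combinations we did not describe (for example basic disks on 2003) return None.

```python
from typing import Optional

NOT_RUN = "chkdsk does not run"
PROMPT_ONLY = "chkdsk runs only if the prompt is answered"
AFTER_TIMEOUT = "chkdsk runs; the prompt times out"

# (cluster OS, disk type, mount type) -> observed behaviour on a dirty volume.
# "any" means the observation did not distinguish drive letters from mount points.
OBSERVED = {
    ("Windows 2003", "SFW dynamic", "any"): NOT_RUN,
    ("Windows 2008", "SFW dynamic", "drive letter"): PROMPT_ONLY,
    ("Windows 2008", "SFW dynamic", "mount point"): NOT_RUN,
    ("Windows 2008", "basic", "drive letter"): AFTER_TIMEOUT,
    ("Windows 2008", "basic", "mount point"): AFTER_TIMEOUT,
}


def observed_chkdsk(os_version: str, disk_type: str, mount: str) -> Optional[str]:
    """Look up the observed behaviour, falling back to an "any" mount entry."""
    exact = OBSERVED.get((os_version, disk_type, mount))
    if exact is not None:
        return exact
    return OBSERVED.get((os_version, disk_type, "any"))
```

Only the basic-disk rows end with chkdsk running without someone answering the prompt, which is the behaviour we need from the SFW dynamic volumes as well.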