switchbone
12 years ago

Application HA clustering has dropped a disk... :-/

Hi,

After I completed the Application HA clustering of SQL Server 2008 across two Windows 2008 R2 nodes, I found that the SQL installation's virtual Backup disk (M:) remained on the first node after I initiated a switch to the second node via the High Availability tab in the vSphere Client. The other disks re-attached to the second node fine.

On closer inspection in Cluster Explorer on one of the nodes I discovered that the Mount Point for that disk was completely missing!

I manually created the Mount Point and brought the resource online in the cluster on the first node, but when I attempted the switch again it failed, because the virtual disk would not re-attach to the second node.

How can I fix this without wiping the clustering or SQL installations?

Thanks!

  • OK, let me try this again. My screenshots did not get added to my last post. Here are two images to help you find the configuration information you need and where to enter it in the Cluster GUI.

    Capture1.PNG

    Capture2.PNG

    From your description, I'm also assuming that you already have the other resource needed to mount the drive as M:. If not, we will need to modify or add other resources so that the drive is fully accessible after failover.

    Thank you,

    Wally

  • Hi John (SB),

    Good to hear that you got around this issue. Have you tested MSDTC to make sure it is working as expected? If not, here is a quick way to test it.

    1. Online the MSDTC service group on node 1.

    2. Online the SQL service group on node 1.

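    3. Open a query window in SQL Server Management Studio (which replaces the older SQL Query Analyzer) and execute the following T-SQL statements:

      BEGIN DISTRIBUTED TRANSACTION;
      -- Roll back straight away so the test does not leave an open transaction.
      ROLLBACK TRANSACTION;

    If MSDTC is working correctly, it should simply return the following line:

       Command(s) completed successfully.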

    If it returns an error then MSDTC is not working correctly.

    Then fail the MSDTC and SQL groups over to different nodes in the cluster, testing every possible combination, to make sure that all nodes work as expected.
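    To make "every possible combination" concrete: each group can be online on any node, so with N nodes there are N × N placements to test, which is four for a two-node cluster. The minimal Python sketch below only prints that checklist; the node names are hypothetical placeholders and the script does not talk to VCS.

```python
from itertools import product

# Hypothetical node names; substitute the system names in your cluster.
NODES = ["node1", "node2"]


def failover_test_matrix(nodes):
    """Return every (MSDTC node, SQL node) placement worth testing.

    Both groups may also be online on the same node, so pairs such as
    ("node1", "node1") are included.
    """
    return list(product(nodes, repeat=2))


if __name__ == "__main__":
    for msdtc_node, sql_node in failover_test_matrix(NODES):
        print(f"MSDTC group on {msdtc_node}, SQL group on {sql_node}: "
              "run the distributed transaction test")
```

    For a three-node cluster the same function yields nine placements, which is why it is worth writing the list down before starting.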

    Thank you,

    Wally

  • Hi Wally,

    Actually - I found a workaround:

    1. Create the VMDKs for node 1 on SCSI ID 1:x (as you recommended)
    2. Run the SQL Config Wizard but add only node 1 and configure as normal
    3. Once the MSDTC resource is created, manually edit the configuration in Cluster Explorer to include node 2, its public MAC address, signatures, disk paths, VMWareDisks resource, etc.

    Once I'd configured this, I was able to switch the MSDTC resource back and forth between node 1 and node 2 successfully.

    Thanks for all of your pointers and advice - much appreciated!

    Regards,

    John (SB).

  • Hi Wally,

    This doesn't work. When I attempt to assign the same VMDK ([datastore] node1server/6.vmdk) to both nodes, I get the following error:

    Capture1.JPG

    The first node locks the file and when I attempt to attach it to the second node, it complains that the file is locked - which makes sense.

    Is there another way of creating the MSDTC resource?

    OR, since I can at least add one node using the SQL Config Wizard, is there a way of manually introducing the second node via Cluster Manager?

    Thanks again,

    SB. 

  • Hi SB,

    Assign the VMDK file to SCSI ID 1:0 on both nodes at the same time and try running the wizard. The wizards that run directly on the nodes expect the same disks to be available to both nodes at the same time.

    Again, if you would like direct help, please open a support case.

    Thank you,

    Wally

  • Hi Wally - thanks for the swift reply.

    I've followed most of the steps but am only able to select the first of the two nodes via the SQL Wizard, so I've configured the MSDTC group only on the first node. If I add the second node, I get the same failure.

    I did put the new VMDK on 1:0 on the first node, which seemed to work for that node.

    Can you clarify this step: Temporarily assign the disk to all servers in the cluster but only have the drive letter assigned on one system where you will run the wizard.

    Thanks,

    SB

  • Hi SB,

    The wizards that come with the base VCS product and run directly on the nodes are not VMware-aware, unlike the wizards that launch from the vCenter console plugin. As such, you will need a workaround: create the bulk of the group with the wizard, then make some manual configuration changes to the VMware components.

    Here are the basics of what you need to do.

    1. Temporarily assign the disk to all servers in the cluster, but only have the drive letter assigned on the one system where you will run the wizard. The disks need to be added to a virtual SCSI adapter other than the one the OS drive is on; SCSI ID 1:x, for example, would be fine.

    2. Run the wizard to create the MSDTC service group.

    3. Remove/unassign the disks from all systems except the one where the wizard was run.

    4. Create a VMwareDisks resource in the MSDTC service group and link it below the other disk resources in this cluster.

    5. Test online and failover to ensure that the MSDTC group works as expected.
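    Why steps 3 and 4 matter: as SB found, a VMDK can be attached to only one VM at a time, because the first VM holds a lock on the file. On failover the VMwareDisks resource therefore has to detach the disk from the old node before attaching it to the new one. The toy Python model below illustrates that ordering; it calls no VMware or VCS API, and the class and function names are hypothetical.

```python
class VmdkLockedError(Exception):
    """Raised when a VM tries to attach a VMDK that another VM still holds."""


class VirtualDisk:
    """Toy model of a VMDK that only one VM may have attached at a time."""

    def __init__(self, path):
        self.path = path
        self.owner = None  # VM that currently has the disk attached, if any

    def attach(self, vm):
        # Mirrors the "file is locked" error seen when a second VM attaches the disk.
        if self.owner is not None and self.owner != vm:
            raise VmdkLockedError(f"{self.path} is locked by {self.owner}")
        self.owner = vm

    def detach(self, vm):
        # Only the current owner releases the lock; otherwise nothing changes.
        if self.owner == vm:
            self.owner = None


def fail_over(disk, source_vm, target_vm):
    """Move a disk between VMs in the order a failover must use: detach, then attach."""
    disk.detach(source_vm)
    disk.attach(target_vm)


if __name__ == "__main__":
    disk = VirtualDisk("[datastore] node1server/6.vmdk")
    disk.attach("node1")
    try:
        disk.attach("node2")  # attaching on both nodes at once is rejected
    except VmdkLockedError as err:
        print("attach failed:", err)
    fail_over(disk, "node1", "node2")
    print("disk now attached to", disk.owner)
```

    The design point is simply that attach and detach are never allowed to overlap, which is the job the wizard cannot do for you on these nodes.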

    Note: The MSDTC service group will automatically show up in the vCenter's Symantec HA Console tab since you already have a SQL service group created on these nodes.

    If you have any problems and need more direct assistance, please open a case with Symantec Technical Support.  We will be able to assist you with this configuration as needed.

    Thank you,

    Wally

    Actually, Wally, I've hit another snag, which hopefully you can help with.

    Following the Veritas™ Cluster Server Implementation Guide for Microsoft SQL Server 2008 and 2008 R2, I'm attempting to create the MSDTC resource group as explained in Chapter 7.

    I have a separate disk (I: MSDTC) ready on the node I'm configuring the resource on, and I'm using the SQL Server Agent Configuration Wizard to create the resource group.

    After specifying the name of the group and selecting both nodes of the cluster to configure the group on, I click Next and receive the following error message:

    Capture.JPG

    I've done some research on the error but haven't found anything other than a patch: https://sort.symantec.com/patch/detail/7842 which includes the hotfix Hotfix_6_0_10011_3247363 and mentions the above error, but is intended to fix an issue with the "Windows FileShare" and "SQL Server 2008" agent configuration wizards.

    Can you offer any advice on the error, please?

    Thanks,

    SB.

  • Wally - thank you very much. That did the trick!!

    The path to the VM disk wasn't defined in the VMWareDisks resource, as you mentioned. So I took the disk mount offline, configured the path, brought it online again and probed the VMWareDisks resource.

    I used the Switch link from the HA Tab in the vSphere client and the SQL application + backup disk moved across to the second node and back again without issue.

    Regards,

    SB.

  • Hi switchbone,

    From your description, it sounds like the disk being used for the Backup drive is not part of the VMWareDisks resource. This resource is the one that assigns disks to, and unassigns them from, the VMs on failover. You need to add the details for this disk to the "Disk Paths" attribute of the VMWareDisks resource. You can get the details from the VM's settings page.

    The Java GUI should be installed on the Windows guest VM. You can use it to edit the "Disk Paths" attribute as needed.

    Don't worry about the UUID of the disk being used. The VMWareDisks resource will get that from VMware and update the resource configuration on its next probe/monitor operation.

    The 1:0 in my example is the SCSI ID that the disk should be assigned to on the VM. It is shown as "Virtual Device Node" on the VM's Edit Settings screen in vCenter.

    I would make these changes with the resource offline and then online it after the changes are done.  Then test failover as needed.
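    A small Python sketch of the SCSI ID rule from this thread: a Virtual Device Node such as 1:0 means controller 1, unit 0, and the shared disk should sit on a different controller than the OS disk. It assumes the OS disk is on 0:0 and that the vSphere Client may display the value as "SCSI (1:0)"; the function names and that label format are assumptions, not part of any VCS or VMware tool.

```python
import re

# Matches "1:0" or the assumed vSphere label form "SCSI (1:0)".
_DEVICE_NODE = re.compile(r"SCSI\s*\((\d+):(\d+)\)|(\d+):(\d+)")


def parse_device_node(text):
    """Split a Virtual Device Node such as '1:0' into (controller, unit)."""
    match = _DEVICE_NODE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a SCSI device node: {text!r}")
    # Groups 1-2 hold the labelled form, groups 3-4 the bare "x:y" form.
    if match.group(1) is not None:
        controller, unit = match.group(1), match.group(2)
    else:
        controller, unit = match.group(3), match.group(4)
    return int(controller), int(unit)


def on_separate_controller(disk_node, os_disk_node="0:0"):
    """True if the shared disk uses a different SCSI controller than the OS disk."""
    disk_controller, _ = parse_device_node(disk_node)
    os_controller, _ = parse_device_node(os_disk_node)
    return disk_controller != os_controller


if __name__ == "__main__":
    for node in ["1:0", "SCSI (1:2)", "0:1"]:
        print(node, "->", on_separate_controller(node))
```

    Only the controller number is compared, since any unit on controller 1 (1:x) satisfies the rule.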

    If you have any problems, please reach out to Symantec Technical Support. We are here to assist you 24 x 7 x 365.

    Thank you,

    Wally