Application HA clustering has dropped a disk... :-/

switchbone
Level 3

Hi,

After I completed the Application HA clustering of SQL2008 across two Windows 2008 R2 nodes, I found that the SQL installation's virtual Backup disk (M: for reference) remained on the first node after I'd initiated a switch to the second node via the High Availability tab in the vSphere Client. The other disks re-attached to the second node ok.

On closer inspection in Cluster Explorer on one of the nodes I discovered that the Mount Point for that disk was completely missing!

I attempted to manually create the Mount Point and brought the resource online in the cluster on the first node, but attempting the switch operation again failed as the virtual disk failed to re-attach to the second node.

How can I fix this without blatting the clustering / SQL installations?

Thanks!

 

 


Wally_Heim
Level 6
Employee

Hi switchbone,

From your description, it sounds like the disk being used for the Backup drive is not part of the VMwareDisks resource.  This is the resource that assigns and unassigns disks to the VMs on failover.  You need to add the details for this disk to the "Disk Paths" attribute of the VMwareDisks resource.  You can get the details from the VM server settings page.

 

The Java GUI should be installed on the Windows Guest VM.  You can use this to edit the "Disk Paths" attribute as needed.

 

Don't worry about the UUID for the disk being used.  The VMwareDisks resource will get that from VMware and update the resource configuration on its next resource probe/monitor operation.

The 1:0 in my example is the SCSI ID that the disk should be assigned to on the VM.  It is shown under "Virtual Device Node" in the VM's Edit Settings screen in vCenter.

I would make these changes with the resource offline and then online it after the changes are done.  Then test failover as needed.

If you have any problems please reach out to Symantec Technical Support.  We are here to assist you 24 x 7 x 365.

 

Thank you,

Wally

Wally_Heim
Level 6
Employee

Ok.  let me try this again.  My screenshots did not get added to my last post.  Here are two images to help you find the configuration information needed and where to put that information in the Cluster GUI.

 

Capture1.PNG

 

Capture2.PNG

 

From your description, I'm also assuming that you have the other needed resource to mount the drive to M:. If not then we will need to modify or add other resources to get the drive completely accessible during failover.

Thank you,

Wally

switchbone
Level 3

Wally - thank you very much. That did the trick!!

The path to the VM disk wasn't defined in the VMwareDisks resource, as you mentioned. So I offlined the disk mount, configured the path, onlined it again and probed the VMwareDisks resource.

I used the Switch link from the HA Tab in the vSphere client and the SQL application + backup disk moved across to the second node and back again without issue.

 

Regards,

SB.

switchbone
Level 3

Actually Wally - I've hit another snag, which hopefully you can help with.

Following the Veritas™ Cluster Server Implementation Guide for Microsoft SQL Server 2008 and 2008 R2, I'm attempting to create the MSDTC resource group as explained in Chapter 7.

I have a separate disk (I: MSDTC) ready on the node I'm configuring the resource on, and I'm using the SQL Server Agent Configuration Wizard to create the resource group.

After specifying the name of the group and selecting both nodes of the cluster to configure the group on, I click Next and then receive the following error message:

Capture.JPG

I've done some research on the error but haven't found anything other than a patch: https://sort.symantec.com/patch/detail/7842 which includes the hotfix Hotfix_6_0_10011_3247363 and mentions the above error, but is intended to fix an issue with the "Windows FileShare" and "SQL Server 2008" agent configuration wizards.

Can you offer any advice on the error, please?

Thanks,

SB.

 

Wally_Heim
Level 6
Employee

Hi SB,

The wizards that come with the base VCS product directly on the nodes are not VMware aware like the wizards that launch from the vCenter console plugin.  As such, you will need to use a workaround to get the bulk of the group created and then make some manual configuration changes to the VMware components.

Here are the basics of what you need to do.

1. Temporarily assign the disk to all servers in the cluster but only have the drive letter assigned on one system where you will run the wizard.  The disks need to be added to a virtual SCSI adapter other than the one that the OS drive is on.  For example, SCSI ID 1:x would be fine.

2. Run the wizard to create the MSDTC service group.

3. Remove/unassign the disks from all systems except the one where the wizard was run.

4. Create a VMwareDisks resource in the MSDTC service group and link it below the other disk resources in this cluster.

5. Test online and failover to ensure that the MSDTC group works as expected.

Note: The MSDTC service group will automatically show up in the vCenter's Symantec HA Console tab since you already have a SQL service group created on these nodes.

If you have any problems and need more direct assistance, please open a case with Symantec Technical Support.  We will be able to assist you with this configuration as needed.

Thank you,

Wally

 

switchbone
Level 3

Hi Wally - thanks for the swift reply.

I've followed most of the steps but am only able to select the first of the two nodes via the SQL Wizard, so I've configured the MSDTC group only on the first node. If I add the second node, I get the same failure.

I did put the new VMDK on 1:0 on the first node which seemed to work for that first node.

Can you clarify this step: Temporarily assign the disk to all servers in the cluster but only have the drive letter assigned on one system where you will run the wizard.

Thanks,

SB

 

Wally_Heim
Level 6
Employee

Hi SB,

Assign the VMDK file to SCSI ID 1:0 on both nodes at the same time and try running the wizard.  The wizards that run directly on the nodes expect the same disks to be available to both nodes at the same time.

Again, if you would like direct help, please open a support case.

Thank you,

Wally

switchbone
Level 3

Hi Wally,

This doesn't work - when I attempt to assign the same VMDK ([datastore] node1server/6.vmdk) to both nodes, I get the following error:

 

Capture1.JPG

The first node locks the file and when I attempt to attach it to the second node, it complains that the file is locked - which makes sense.

Is there another way of creating the MSDTC resource?

OR - as I can at least add 1 node using the SQL Config Wizard, is there a way of manually introducing the second node via Cluster Manager?

Thanks again,

SB. 

switchbone
Level 3

Hi Wally,

Actually - I found a workaround:

  1. Create the VMDKs for node 1 on SCSI ID 1:x (as you recommended)
  2. Run the SQL Config Wizard but add only node 1 and configure as normal
  3. Once the MSDTC resource is created, manually edit the configuration in Cluster Explorer to include node 2, its public MAC address, signatures, disk paths, VMWareDisks resource, etc.

Once I'd configured this I was successfully able to switch the MSDTC resource back and forth across node 1 and node 2.

Thanks for all of your pointers and advice - much appreciated!

Regards,

John (SB).

Wally_Heim
Level 6
Employee

Hi John (SB),

Good to hear that you got around this issue.  Have you tested MSDTC to ensure that it is working as expected?  If not here is a quick way to test.

1. Online the MSDTC service group on node 1.

2. Online the SQL service group on node 1.

3. Open SQL Query Analyzer (in SQL Management Studio) and execute the following T-SQL statement

  Begin Distributed Transaction

If MSDTC is working correctly it should simply return the following line

   Command(s) completed successfully.

If it returns an error then MSDTC is not working correctly.

Then fail the MSDTC and SQL groups over to different nodes in the cluster, testing every possible combination, to ensure that all nodes are working as expected.
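
For anyone who wants to make sure no combination is missed, the placements can be listed mechanically. The following is a minimal Python sketch, not part of the product: the node names (NODE1, NODE2) and service group names (MSDTC_SG, SQL_SG) are placeholders to replace with your own, and it only prints `hagrp -switch` command lines for review rather than running anything against the cluster.

```python
from itertools import product


def failover_matrix(nodes, groups):
    """Return every placement of the service groups across the nodes.

    Each placement is a dict mapping a group name to the node it should
    be online on for that test case.
    """
    return [
        dict(zip(groups, placement))
        for placement in product(nodes, repeat=len(groups))
    ]


def switch_commands(placement):
    """Build the 'hagrp -switch' command line for each group in a placement."""
    return [
        f"hagrp -switch {group} -to {node}"
        for group, node in placement.items()
    ]


if __name__ == "__main__":
    # Placeholder names: substitute the real system and group names.
    nodes = ["NODE1", "NODE2"]
    groups = ["MSDTC_SG", "SQL_SG"]

    for number, placement in enumerate(failover_matrix(nodes, groups), start=1):
        print(f"Test case {number}: {placement}")
        for command in switch_commands(placement):
            print(f"    {command}")
```

With two nodes and two groups this lists four test cases. Skip any printed command whose group is already online on the target node, and re-run the Begin Distributed Transaction check after each move.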

Thank you,

Wally