vxio: Cluster software communication timeout. Reservation refresh has been suspended
Hi,

We are experiencing this error on one of our clusters. It's a two-node campus cluster with the following specifications:

Site A: Node 1 is a Windows Server 2008 R2 virtual machine residing on an ESXi 5.1 host in this site. Disks 1 and 3 are LUNs in an enclosure in this site.

Site B: Node 2 is a Windows Server 2008 R2 virtual machine residing on an ESXi 5.1 host in this site. Disks 2 and 4 are LUNs in an enclosure in this site.

We have created two VMDGs: one contains Disks 1 and 2, the other contains Disks 3 and 4. On these VMDGs we have created mirrored dynamic volumes. The VMDGs are then presented to the failover cluster. The quorum type on the failover cluster is a file share witness on another server.

We also run Microsoft System Center Configuration Manager to install updates and patches on Nodes 1 and 2. Whenever patches are installed on a node, it gets restarted, and the cluster resource group fails over from Node 1 to Node 2. Everything seems to fail over just fine, and the VMDG is imported successfully (according to the log). But 10 minutes after the VMDG has been imported, the following error is logged on Node 2: http://s28.postimg.org/ubh8skfh9/vmdg2.png

If I check the status of the VMDGs in VEA, it is Deported for both VMDGs: http://s3.postimg.org/72ort9683/vmdg3.png

But even though the disks and VMDGs appear to be offline on the active node, failover does not occur, because in Failover Cluster Manager the VMDG is online, but no volumes are enumerated on it: http://s12.postimg.org/p31vncct9/vmdg1.png

Has anyone else experienced the same, and do you know why the status of the disks changes to Deported without a failover occurring?
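For what it's worth, when VEA and Failover Cluster Manager disagree, it can help to compare what VEA shows with what the SFW command line reports for the same disk groups. A minimal sketch only, assuming the SFW CLI (vxdg.exe) is on the path; the exact option syntax can differ between SFW releases, so check the SFW CLI documentation for your build, and <DiskGroupName> is a placeholder:

    rem List all dynamic disk groups and their state (Imported/Deported) as SFW sees them
    vxdg list

    rem Importing a group manually, outside the cluster, is only for troubleshooting --
    rem the Volume Manager Disk Group resource should normally own the import
    vxdg -g<DiskGroupName> import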
Windows 2012 R2 Failover cluster and SF Volume Manager Disk Group
Hi,

I just found this article, and it is very close to the problem I'm facing right now: https://www-secure.symantec.com/connect/forums/one-cooperation-issue-about-windows-2012-r2-failover-cluster-and-sf-cluster-shared-disk-group

I created a new Windows Server 2012 R2 failover cluster with several mirrored disks in dynamic disk groups in Storage Foundation. As a next step, I wanted to add the disk groups from SF as resources to my failover roles.

Actual scenario:
Cluster role name: C0002Z
Disk group name in SF: DG_C0002Z_1

I added a resource to the cluster role via Add Resource -> More Resources -> Volume Manager Disk Group. I changed the name of the VMDG under "General" to "DG_C0002Z_1", then Apply, OK. Then I wanted to enter the name of the disk group in Properties -> Properties -> DiskGroupName of the VMDG as "DG_C0002Z_1". And that's where I don't get any further. It doesn't matter which name I enter into DiskGroupName, I always get this error:

There was an error saving properties for 'DG_C0002Z_1'.
Failed to execute control code '20971654'.
Error Code: 0x800700a0
One or more arguments are not correct

I also tried entering names without special characters, but that doesn't help either. Do you have any idea how to fix or bypass this issue?

Thanks
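Not a confirmed fix, just a possible way around the GUI: on Windows Server 2012 R2 the same private property can be set from PowerShell with the FailoverClusters cmdlets, which sometimes sidesteps a property-sheet problem. If the 0x800700a0 error actually comes from the Volume Manager Disk Group resource DLL rather than from Failover Cluster Manager, this will fail with the same code. The resource and disk group names below are the ones from the post:

    # Set the DiskGroupName private property of the Volume Manager Disk Group resource
    Get-ClusterResource "DG_C0002Z_1" |
        Set-ClusterParameter -Name DiskGroupName -Value "DG_C0002Z_1"

    # Verify the private properties, then try to bring the resource online
    Get-ClusterResource "DG_C0002Z_1" | Get-ClusterParameter
    Start-ClusterResource "DG_C0002Z_1"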
Automatically running chkdsk on Microsoft Cluster
I have a Microsoft Cluster: Windows 2003 R2 Enterprise SP2, with Storage Foundation for Windows 5.1 SP2 installed. Several of the volumes in a single Volume Manager Disk Group have the "dirty bit" set. Before installing SFW, MSCS would cause chkdsk to be run against any volume with the "dirty bit" set while bringing the volume online. Since installing SFW, chkdsk is no longer being run against a dirty volume. Why is this?

Thanks
Lynn
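Whatever the reason SFW skips the automatic check, the dirty bit can be inspected and the check run by hand during a maintenance window. A minimal sketch; X: is a placeholder for one of the affected volumes, and chkdsk /f needs exclusive access to the volume (run it while the cluster resource is offline, or it will offer to schedule the check):

    rem Ask NTFS whether the volume is flagged dirty
    fsutil dirty query X:

    rem Run the consistency check and fix errors during a maintenance window
    chkdsk X: /f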
Veritas Storage Foundation for Windows 5.1 Volume Manager Disk Group fails to come online
We have been trying to implement a two-node SQL Server cluster on Windows 2008 R2 for the last few days. We have so far successfully completed the Windows cluster part and have tested quorum failover and single basic disk failover. Our challenge is the shared storage between the two nodes: we have 30 LUNs from the SAN but need to present them as a single drive. Since merging the LUNs at the SAN level is not an option for us, and MSCS does not understand dynamic disk groups created with Windows Disk Management, we have to use Veritas Storage Foundation for Windows 5.1.

We have successfully installed SFW 5.1 and created a cluster dynamic disk group and a volume on this disk group. The volume is visible in Windows Explorer without any issue. To add the disk group to the cluster we followed these steps:

1. In Failover Cluster Manager, we created an "Empty Service or Application".
2. We added a "Volume Manager Disk Group" resource to the application.
3. After right-clicking the resource and selecting "Bring this resource online", the disk group was brought online.

To test the failover we rebooted the first node. The disk group failed over to the second node without any issue. To bring the disk group back to node one, we restarted the second node; however, this time the disk group could not come online on the first node and its status was shown as "Failed". Since then we have tried rebooting both nodes alternately, refreshing, and rescanning disks, but nothing has brought the resource back online. We have followed all the steps again from the start, with the same result. Can anybody suggest a way forward?
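Purely as a way to take the GUI out of the equation while troubleshooting, the same resource can be created and brought online with cluster.exe on Windows 2008 R2. A sketch only; "SQL Group", "SQL_DG" and MyDynDG are placeholder names, and the DiskGroupName value must match the SFW cluster dynamic disk group name exactly:

    rem Create the Volume Manager Disk Group resource in an existing group
    cluster res "SQL_DG" /create /group:"SQL Group" /type:"Volume Manager Disk Group"

    rem Point it at the SFW cluster dynamic disk group and list the private properties
    cluster res "SQL_DG" /priv DiskGroupName=MyDynDG
    cluster res "SQL_DG" /priv

    rem Try to bring it online and watch the cluster log for the failure reason
    cluster res "SQL_DG" /online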
Volume drive letters are not enumerated
Hello,

I have a two-node failover cluster running SQL Server 2008 R2. The OS on the nodes is Server 2008 R2, and we are using Veritas Storage Foundation for Windows 6.0.1.

The issue is that at failover from Node01 to Node02, the dynamic disk groups (of which there are four) are imported just fine, but the volumes are not enumerated with drive letters. The SQL instance therefore fails, and no one is able to connect to the SQL database. In Failover Cluster Manager the VMDG is imported and online, but the volumes have no drive letter; instead it says "failed to obtain the partition information".

To rectify the issue, I installed CP3 for VSFW 6.0.1. This CP includes the following hotfix:

[11] Hotfix name: Hotfix_6_0_10012_308_3347495

Symptom: After a failover, VEA sometimes does not show the drive letter or mounted folder paths of a successfully-mounted volume.

Description: This issue may occur after a failover when VEA sometimes does not show the drive letter or mounted folder paths of a volume even though the volume is successfully mounted with the expected drive letter or folder paths. During a failover, when a disk group gets imported, SFW mounts all volumes of the disk group by querying the mount points using the Microsoft API GetVolumePathNamesForVolumeName(). Sometimes this API fails to return the correct drive letter or mounted folder paths, because of which VEA fails to update the same.

Resolution: NOTE: Using the following workaround has a performance impact on the service group offline and failover operations. This happens because, during the service group offline or failover operation, the performance of the disk group deport operation is impacted by up to "n/2" seconds, where "n" is the number of volumes in the disk group. To resolve this issue, the operation needs to be retried after a few milliseconds so that the Microsoft API GetVolumePathNamesForVolumeName() returns correct information. As a workaround, a new retry logic is added around the GetVolumePathNamesForVolumeName() call so that it retries the operation in case the mount path returned is empty. It will retry every 100 milliseconds for "n" attempts (5 by default), which can be configured using the registry. This retry logic is disabled by default. To use the workaround, do the following:

1. Enable the retry logic by changing the value of the registry entry "RetryEnumMountPoint" from 0 to 1 under the registry key HKEY_LOCAL_MACHINE\SOFTWARE\VERITAS\VxSvc\CurrentVersion\VolumeManager
2. Configure the number of retry attempts by changing the value of the registry entry "RetryEnumMPAttempts" under the same registry key.

Binary / Version: mount.dll / 6.0.10012.308

After installing the hotfix, two DWORD values were added to HKEY_LOCAL_MACHINE\SOFTWARE\VERITAS\VxSvc\CurrentVersion\VolumeManager. They are called "RetryEnumMountPoint" and "RetryEnumMPAttempts"; the value of the former is 0, while the value of the latter is 5. I did not change these values.

At the next failover (to Node02) the drive letters were enumerated just fine, so I thought I had fixed the issue, but at the subsequent failover (to Node02) I had the same problem again. So from what I can gather, I have to change the value of "RetryEnumMountPoint" from 0 to 1. Which leads to my question.
Has anyone ever experienced this issue, and if you have, could you please share your experience?
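For convenience, the two registry values described in the hotfix notes above can be set from an elevated command prompt; the key, value names, and defaults come straight from the hotfix text, and the change may not take effect until the Veritas provider restarts or the next failover (check the CP3 readme):

    rem Enable the retry logic described in Hotfix_6_0_10012_308_3347495
    reg add "HKLM\SOFTWARE\VERITAS\VxSvc\CurrentVersion\VolumeManager" /v RetryEnumMountPoint /t REG_DWORD /d 1 /f

    rem Number of retry attempts (5 is the documented default)
    reg add "HKLM\SOFTWARE\VERITAS\VxSvc\CurrentVersion\VolumeManager" /v RetryEnumMPAttempts /t REG_DWORD /d 5 /f

    rem Confirm the values
    reg query "HKLM\SOFTWARE\VERITAS\VxSvc\CurrentVersion\VolumeManager"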
Cluster service hangs on starting, 2nd attempt succeeds
Situation: Windows 2003 R2 SP2 x64 on a DL580 G5, using SFW 5.1 with the cluster extension. MSCS is installed per the SFW manual. The EVA is connected through HP-branded Brocades. SFW cluster validation passed with no errors; MSCS cluster validation passed with no errors.

Whenever I use LDM to create a quorum, the cluster boots and fails over with no problem. I created a dynamic quorum using the manual. The cluster service hangs on "Starting"; it succeeds when started manually or through the Recovery tab in the Services console. As a test I then created a simple quorum - no difference. When I create a quorum through LDM it works.

This is the relevant excerpt of my cluster.log:

00000e58.00000e70::2010/03/19-13:14:07.644 INFO Physical Disk <quorumtemp>: [DiskArb] DisksOpenResourceFileHandle: Attaching to disk with signature c74e7e9e
00000e58.00000e70::2010/03/19-13:14:07.644 INFO Physical Disk <quorumtemp>: [DiskArb] DisksOpenResourceFileHandle: Disk unique id present trying new attach
00000e58.00000e70::2010/03/19-13:14:07.927 INFO Physical Disk <quorumtemp>: [DiskArb] DisksOpenResourceFileHandle: Retrieving disk number from ClusDisk registry key
00000e58.00000e70::2010/03/19-13:14:07.927 ERR Physical Disk <quorumtemp>: Arbitrate: Unable to open ClusDisk signature key c74e7e9e. Error: 2.

Any pointers? The registry key exists under (off the top of my head, not sure, but you know what I mean) HKLM\SYSTEM\CurrentControlSet\Services\ClusDisk\Signatures\%disksignature%
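A quick way to see what ClusDisk actually has registered at the moment the service starts is to dump the signatures key and look for c74e7e9e. On Windows 2003 MSCS the signatures usually live under a Parameters subkey, though since the path in the post was quoted from memory it is worth confirming on the box itself:

    rem List the disk signatures ClusDisk knows about; look for c74e7e9e
    reg query "HKLM\SYSTEM\CurrentControlSet\Services\ClusDisk\Parameters\Signatures"

    rem Fall back to searching the whole ClusDisk key if the subkey name differs
    reg query "HKLM\SYSTEM\CurrentControlSet\Services\ClusDisk" /s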
Storage Foundation
Hi, I'm new to the product and the field, so really any advice would be great. Where exactly do you install VOM in your SAN environment, and does the SF software conflict with the storage vendor's management software? How do you manage heterogeneous storage with SF? Also, can someone explain the licensing?
File-Server | physical or over VMware vSphere? Need advice
Dear all,

maybe you have a suggestion for me. We are currently running SF 5.1 SP1 [Enterprise with the MS Cluster Enterprise option] on two MS Server 2003 Enterprise R2 x64 cluster nodes. There are some storage arrays [DAS] attached directly to the server machines - clustering is possible because of the two controllers in the disk arrays - and there is some real SAN storage over FC. My vision/plan for the future is to build up more storage in the SAN over FC or iSCSI, to get away from DAS storage, and to manage all of it with SF (again). We are also planning to virtualize our servers (6-7 servers: Exchange, DFS, DC/DNS, license manager server, etc.) to run all of them on one or two physical servers, maybe with a third standby ESX host.

So the question is: virtualize both of the Enterprise cluster nodes and let them run as virtual machines, or leave them as physical machines?

My pros:
• "Double failsafe": the Enterprise servers are cluster nodes already [failsafe 1], and if a physical ESX host goes down, its virtual machine(s) will switch automatically to the second ESX host [failsafe 2].
• Faster server hardware changes in the future: build up a new ESX host on more performant hardware, move the VM to it, and shut down the old ESX host that is no longer performant enough - no downtime. OK, I have a cluster, so changing hardware would be possible without downtime anyway.
• Maybe better possibilities to trunk the network ports. I'm not using vSphere right now; I'm only remembering some material from VMware webcasts.
• Better utilization of the physical hardware - that's why you virtualize servers in the first place.
• Taking snapshots before upgrading software on the servers, for faster "recovery" of the old state. "Never touch a running system" is the past.
• Better allocation of the physical RAM to the different servers.

My cons:
• Will SF run as well as it does in the current physical environment? Or are there restrictions - maybe because of drivers, the hypervisor, or anything else?
• Will SF be as performant as it is now (or more so), or will it slow everything down, especially if I use VxCache with 8 GB RAM or more?
• I need lots of RAM.

Would be nice to hear some comments or other ideas from you. Thanks in advance.
Blue Disk Icon in SFWHA against disk
I would like to know whether the blue information icon that I am seeing on the passive node of an SFW HA cluster will clear when a failover is initiated. The issue arose following a recent disk expansion, which all went well apart from the odd issue described. The procedure was as follows:

1. Break the mirror on the EMC CX4 MirrorView group in Unisphere.
2. Remove the secondary image LUN in Unisphere.
3. Delete the MirrorView group in Unisphere.
4. Expand the LUN attached to the live server on the CX4 within Unisphere.
5. Go into VEA on the active server and do a rescan.
6. Run the resize wizard to expand the disk, which then shows accordingly in Windows Explorer (done on the fly, without downtime to the production system).
7. Remove the secondary LUN from the storage group on the passive node of the CX4 array.
8. Expand the secondary LUN on the CX4 within Unisphere (we encountered an error on the CX4 and ended up deleting the LUN and recreating it at the new size).
9. Recreate the MirrorView group using the active LUN from the active node of the array in Unisphere.
10. Add the secondary LUN back into the MirrorView group in Unisphere; this starts the mirroring from the active LUN to the passive LUN.
11. Lastly, add the secondary LUN back into the storage group within Unisphere on the passive node of the array.

Once the above was done, a final rescan was run on the passive server within VEA to ensure the disk could be seen. It is visible, but with a blue informational icon against it, and the only option offered is Reactivate. The same icon shows against all the other disks in VEA on the passive node as well. The only thing I can think of is that we had an issue on the CX4 when trying to expand the LUN, so I ended up deleting the LUN and recreating it at the new size. It is not asking for a signature to be written, but I am wondering if the issue will resolve itself once a failover has been initiated in VCS from the active to the passive server, since there must be information about this disk within the dynamic cluster disk group on the live server. The relevant information from the administrator guides is shown in the attachment.

Does the passive node need a restart? I am assured there is no data loss, but I want to be sure that I can fail over in an emergency. Any guidance would be appreciated.

Thank you
Network connectivity loss solution
Hi all,

Some time ago we experienced a total network failure between our two datacenters. The failure only lasted a couple of seconds, but it resulted in one active node and one node that was trying to start all instances. When connectivity was restored, both nodes stopped working (to prevent split-brain).

My question: is there a way to increase the time interval that dictates when the nodes take action in such a case? E.g., so that the inactive node only starts when connectivity has been lost for more than 5 minutes.

Ivo
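A heavily hedged sketch, and only relevant if these nodes form a VCS/SFW HA cluster using LLT heartbeats (the post doesn't say which cluster stack is in use): LLT's peerinact timer controls how long a node waits after the last heartbeat before declaring a peer dead, expressed in hundredths of a second. A directive like the one below in llttab would stretch that window to roughly 5 minutes, but a value this large also delays every legitimate failover and is usually a worse trade-off than adding an independent low-priority heartbeat link over a separate network path, so treat it as an illustration rather than a recommendation:

    # llttab (file location differs by platform) -- assumption: VCS/LLT heartbeats, not a Microsoft cluster
    # peerinact is in 1/100 s; 30000 = 300 s = 5 minutes (the default is 1600 = 16 s)
    set-timer peerinact:30000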