Forum Discussion

Para
Level 2
13 years ago

Can't get file share working - DiskRes problem? (VCS6, HP 3PAR, MS MPIO)

I'm new to VCS so please bear with me.

I'm running VCS 6.0 on W2K8 Server Ent R2 with an HP 3PAR SAN using Microsoft Multipath I/O.  I'm trying to get a file share working, but I can't get it online.

I have a 3-node cluster and I ran the Add Resource Group wizard and chose the FileShareGroup template.  I am able to bring the NIC, IP and LANMAN resources online.  I can bring these resources up on any node, fail them over and back, and it works fine.  The problem is with DiskRes.

We are using an HP 3PAR SAN and the Microsoft Multipath I/O that comes with W2K8.  When I add the signature of one of the exported volumes, I can bring it online on a node, but less than a minute later the node reboots itself.  After the reboot, Cluster Explorer shows DISKRES as Offline on the first node but Faulted on the other two nodes.

There are no recent log entries in DiskRes_A.txt.

Here's the entry on engine_A.txt that shows that it was online at one point:
2012/06/19 14:11:32 VCS NOTICE V-16-1-10301 Initiating Online of Resource FS_DISKRES (Owner: Unspecified, Group: MYGROUP) on System MYNODE1
2012/06/19 14:11:32 VCS INFO V-16-1-10298 Resource FS_DISKRES (Owner: Unspecified, Group: MYGROUP) is online on MYNODE1 (VCS initiated)

I can post the logs after it rebooted if that will help.

On the node that just rebooted, running C:\Program Files\Veritas\cluster server\bin\getdrive.bat I get:
Could not gather all the disk info. Error : 170

Sure enough, the disk that was skipped was the one I was trying to bring online.  When I bring up the Windows Disk Manager, it says the disk has to be initialized, but when I try, it says the resource is in use.
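
For reference, decoding that error code with net helpmsg:

    C:\>net helpmsg 170

    The requested resource is in use.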

When I run getdrive.bat on the second node, the output for that drive is:
Harddisk Number  = 1
Harddisk Type    = Basic Disk
Disk Signature   = 2264237497
Valid Partitions = 1
Access Test      = FAILED
 

What am I doing wrong?
 


  • Hi Rim,

    The first thing that you might want to do is to run getdrive from all 3 nodes to ensure that each one sees the disks with the same Disk Signature.  If not, you should do a rescan on them in Disk Manager so that all three servers see the disks the same way.

    From there, it is a matter of how you want to proceed.  I would recommend simplifying the environment by going down to a single path on all nodes and removing the Microsoft MPIO feature (a sketch of the removal command is below).  Then test to make sure that you can mount the partition on each server, one at a time.  Do not mount the partition on more than one server at a time, as this can lead to data corruption.
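
    A minimal sketch of removing the feature from an elevated command prompt, assuming the built-in W2K8 R2 Multipath-IO feature and not a vendor DSM (a reboot is typically required afterward):

        ServerManagerCmd -remove Multipath-IO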

    Once you are able to mount the partition on each node manually, move to VCS and use the DiskRes and Mount resources to see if you can put a reservation on the drive (DiskRes) and mount the partition from VCS. 

    If everything is working so far, then move forward with adding Microsoft's MPIO back into the configuration.

    If all else fails, we are available in Symantec Technical Support to assist you 24x7x365.

    Thank you,

    Wally

  • I removed MS MPIO and rescanned the disks and getdrive results were consistent between the three nodes.  
    I remembered that I had set the SAN policy on each node to bring all disks online on boot, so I reverted that via DISKPART ("san policy=offlineshared"; transcript below). I also manually set all the SAN disks offline on each node.  On each node, I was then able to put the disk online (one node at a time) as both a drive letter (E:) and a local path (C:\MYPATH).
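
    For reference, the DISKPART session went roughly like this on each node (the disk number varies per node; use the number reported by getdrive):

        C:\>diskpart
        DISKPART> san policy=offlineshared
        DISKPART> select disk 5
        DISKPART> offline disk
        DISKPART> exit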

    I moved to VCS and I was able to mount the SAN disk with DiskRes on each node.  

    When I tried to online the Mount resource on NODE1, it FAULTED.  I offlined DiskRes on NODE1 and onlined it on NODE2, and then I was able to online Mount on NODE2 as well.  But when I tried to online FileShare on NODE2, it FAULTED:
    Resource FS_FILESHARE (Owner: Unspecified, Group: MYGROUP) is FAULTED on sys NODE2

    Now, when I try to online any of the resources, I get: "Cannot online: resource's group is frozen waiting for dependency to be satisfied".  But all the resources are Offline and Not Waiting (except NIC which is Online).

    How do I unfreeze the resource?
     

  • Hi Rim,

    Each resource type has a log that is stored in the %vcs_home%\log folder.  The most recent one is named <agent type>_A.txt.  For example, the Mount resource's log would be Mount_A.txt and the FileShare resource's log would be FileShare_A.txt.  These logs have debug information that will point you to why the resource was not able to come online during its online process.  Most of the messages are in a readable format, so in most cases you can determine what the problem is (example below).
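
    For example, to pull just the error lines out of the Mount agent's log (the path assumes the default install location mentioned earlier in this thread):

        C:\>findstr /c:"ERROR" "C:\Program Files\Veritas\cluster server\log\Mount_A.txt"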

    In the same location is a cluster-wide log called engine_A.txt.  It logs all cluster operations and would show the "Cannot online: resource group is frozen" type of message.

    You can also run "hastatus -sum" from a command prompt to get a summary of the current cluster state.  This can point you to a resource that is not probing, or to some other issue.

    To unfreeze a service group, you can right-click it in the Java GUI and select Unfreeze from the popup menu (the CLI equivalent is sketched below).
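
    From the command line, the equivalent is hagrp -unfreeze.  A persistent unfreeze requires the config to be writable first; for a temporary freeze, the haconf steps are not needed:

        C:\>haconf -makerw
        C:\>hagrp -unfreeze MYGROUP -persistent
        C:\>haconf -dump -makero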

    If you need more one on one assistance, please open a Symantec Technical Support case.  We are here to help you with any problems with our software that you may have.

    Thank you,

    Wally

    When I right-click on the Resource Group, the Unfreeze option is greyed out, which seems to contradict the error message I was getting.  Also, hastatus -sum shows all three nodes as running and not frozen, and the resource group as having been probed on all three nodes (but offline on all of them).

    Mount_A.txt shows:

    2012/06/19 19:03:22 VCS ERROR V-16-10051-8018 Mount:FS_MOUNT:monitor:Failed to create the Volume object for DiskNo = 5, PartitionNo = 1. Error : 110
    2012/06/19 19:03:22 VCS DBG_21 V-16-50-0 Mount:FS_MOUNT:monitor:*** Start of debug information dump for troubleshooting ***
        LibLogger.cpp:VLibThreadLogQueue::Dump[206]
    2012/06/19 19:03:22 VCS DBG_21 V-16-50-0 Mount:FS_MOUNT:monitor:Number of valid partition on Disk (5) are 1.
        LibDisk.cpp:VLibDisk::GetNumberOfValidPartitions[942]
    2012/06/19 19:03:22 VCS DBG_21 V-16-50-0 Mount:FS_MOUNT:monitor:Mount path C: is not a reparse point
        LibStorage.cpp:VLibStorage::IsSuitablePath[859]
    2012/06/19 19:03:22 VCS DBG_21 V-16-50-0 Mount:FS_MOUNT:monitor:(2) IOCTL_MOUNTMGR_QUERY_POINTS failed
        LibStorage.cpp:VLibStorage::QueryMountManager[660]
    2012/06/19 19:03:22 VCS DBG_21 V-16-50-0 Mount:FS_MOUNT:monitor:QueryMountManager() failed. Invalid volume information specified.
        LibVolume.cpp:VLibVolume::Open[268]
    2012/06/19 19:03:22 VCS DBG_21 V-16-50-0 Mount:FS_MOUNT:monitor:*** End of debug information dump for troubleshooting ***
        LibLogger.cpp:VLibThreadLogQueue::Dump[217]
    2012/06/19 19:11:42 VCS INFO V-16-10051-30003 Mount:FS_MOUNT:imf_register:Un-registering with IMF for offline monitoring
    2012/06/19 19:11:57 VCS ERROR V-16-10051-8018 Mount:FS_MOUNT:monitor:Failed to create the Volume object for DiskNo = 5, PartitionNo = 1. Error : 110
     

    Thanks for your help.  I will submit a case.
     

  • Hi Rim,

    Error 110 is a Windows error, which means:

       C:\>net helpmsg 110

       The system cannot open the device or file specified.
     

    Can you provide the Mount resource configuration from the main.cf?  The main.cf is in the %vcs_home%\conf\config\ folder.

    I'm thinking that you have the partition number incorrectly defined.  It's been a while since I've touched basic disk resources in a cluster, but I seem to remember that the partition numbers start at 0 and not 1.  So if you only have one partition on the drive, the PartitionNo attribute should be set to 0.  You can check and change the attribute from the command line (sketch below).
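
    A quick way to check and change the attribute without the GUI (FS_MOUNT is the resource name from your earlier log entries):

        C:\>hares -value FS_MOUNT PartitionNo
        C:\>haconf -makerw
        C:\>hares -modify FS_MOUNT PartitionNo 0
        C:\>haconf -dump -makero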

    Thank you,

    Wally

  • From main.cf:

        Mount FS_MOUNT (
            MountPath = "C:\\MY\\PATH"
            PartitionNo = 1
            Signature = 2264237497
            )
     

    According to the Veritas Cluster Server Bundled Agents Reference Guide:

    "The partition on the disk configured for mounting.  Note that the base index for the partition number is 1. Default is 0."

    I'm pretty sure I could not bring that resource online until I set the PartitionNo = 1.

  • Here's what I needed to do:

    In Windows Disk Management on each node (a rough DISKPART equivalent follows this list):

    • Put the SAN disk online
    • Set up the local mount point
    • Remove the drive letter
    • Make partition active if it isn't already
    • Take it offline - IMPORTANT!
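
    A rough DISKPART equivalent of the steps above, in order: online the disk, mount it to a local path instead of a letter, remove any auto-assigned drive letter, mark the partition active, and take the disk back offline (the disk/volume numbers and the path are from my setup):

        C:\>diskpart
        DISKPART> select disk 5
        DISKPART> online disk
        DISKPART> select volume 3
        DISKPART> assign mount=C:\MY\PATH
        DISKPART> remove letter=E
        DISKPART> select partition 1
        DISKPART> active
        DISKPART> select disk 5
        DISKPART> offline disk
        DISKPART> exit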

    For DiskRes:

    • Run C:\Program Files\Veritas\cluster server\bin\getdrive.bat
    • Note the signature and enter it in properties

    For Mount:

    • Since we are using MountPath (instead of drive letters), enter the mount path
    • Change PartitionNo from 0 to 1
    • Enter same signature as DiskRes

    For FileShare:

    • Since we are using MountPaths (instead of drive letters), use a subdirectory in the SAN drive as the PathName.  I was trying to use "\" as the PathName but that doesn't work for MountPaths.
    • Set the ShareName (a combined main.cf sketch follows below)
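
    Putting it all together, the relevant main.cf stanzas ended up looking roughly like this.  The Mount block is exactly what I posted earlier; for DiskRes I believe the attribute is a Signatures list, and the FileShare path and share names here are placeholders (the wizard fills in the remaining attributes and dependencies):

        DiskRes FS_DISKRES (
            Signatures = { 2264237497 }
            )

        Mount FS_MOUNT (
            MountPath = "C:\\MY\\PATH"
            PartitionNo = 1
            Signature = 2264237497
            )

        FileShare FS_FILESHARE (
            PathName = "\\SHAREDATA"
            ShareName = "MYSHARE"
            )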

     

    While troubleshooting, I ended up un-exporting the SAN volume from all nodes and then checked that the settings above were correct.  But it didn't seem to work until I restarted VCS on all nodes via (run command prompt as admin):

    hastop -all

    It wouldn't let me stop the nodes until I saved and closed the config, so I did that via (run command prompt as admin):

    haconf -dump -makero

    I was then able to stop the nodes and then restart them via (run command prompt as admin):

    hastart -all

     

    I'm not sure if it was the restart or the saving/closing of the config that got it to work, but it's working now.