
Unknown Dg appears in Disk Groups with some or all disks


Hello,
Since support from Symantec is slow (not in every case, but this time it is), I am asking this on the forum.
I am doing a new installation and suddenly I have this new DG called "Unknown Dg". It contains disks with status Foreign and type Dynamic. I cannot do anything with these disks (oh, powerful SFW). Usually I can destroy a DG or remove a disk from a DG, etc., but not in this case.
How I got into this situation:
- I made two cluster disk groups with the same name on different servers with different disks, and suddenly the disks from these disk groups were in disk group Unknown Dg
- I reinstalled SFW, and suddenly all disks from all dynamic disk groups were in Unknown Dg ;(

I searched the documentation and found this:
"If the cluster software attempts to fail over a disk group to another node that has a disk group with the same name or if you move a disk group to another node that has a disk group with the same name, unpredictable results can occur." (Well, now we know the result.)

I searched the internet and found this: http://seer.entsupport.symantec.com/docs/287444.htm (there is a part telling you to reinstall SFW, but this makes it even worse)

I also found this:
"vxtool is used to clear the SFW Private Region from a disk and would be used in instances when you were attempting to destroy a Disk Group or remove a problem Disk that has a corrupted private region on it. The resetmbr clears the private region and writes a new signature to the disk."

Do you think vxtool can help me with this?

My config: Windows Server 2003 R2 SP2, VSFW 5.1 SP1, QLogic QLE2460 Fibre Channel HBA, and IBM DS8300 array


11 Replies

Hi hraju,

UnknownDG is created when SFW detects a problem with the private region that it cannot handle. This can have many different causes, including a corrupt private region or, in your case, a duplicate disk group name with a different DGGUID.

If you know which disks you put in which disk group, you can simply offline (Windows 2008) or disable the disks of one disk group. A rescan should then show the other disk group as OK. Change the name of that disk group when importing it via VEA (and adjust any cluster resources to match). Now you should be able to online/enable the disks that you disabled before, and the disk group on those disks will be available with its original name.
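To see why disabling one group's disks and renaming the other resolves the conflict, here is a toy model. It is a hypothetical simplification, not SFW's actual logic: it assumes each disk's private region records a disk group name and a DGGUID, treats a name claimed by more than one DGGUID as a conflict that sends all of its disks to Unknown Dg, and then walks through the disable/rescan/rename/re-enable steps. The functions `rescan` and `rename_group`, and the disk names and GUIDs, are made up for illustration.

```python
from collections import defaultdict


def rescan(disks, disabled=frozenset()):
    """Hypothetical model of a rescan: rebuild the disk group view from scratch
    using the private-region records of every enabled disk.

    disks: list of (disk_id, dg_name, dg_guid) tuples.
    A disk group name claimed by more than one DGGUID is treated as a
    conflict, and every disk carrying that name is placed in "Unknown Dg".
    """
    enabled = [d for d in disks if d[0] not in disabled]

    guids_by_name = defaultdict(set)
    for _, name, guid in enabled:
        guids_by_name[name].add(guid)

    view = defaultdict(list)
    for disk_id, name, _ in enabled:
        group = "Unknown Dg" if len(guids_by_name[name]) > 1 else name
        view[group].append(disk_id)
    return dict(view)


def rename_group(disks, dg_guid, new_name):
    """Hypothetical model of renaming a disk group on import: rewrite the
    name stored on every disk that carries the given DGGUID."""
    return [(d, new_name if g == dg_guid else n, g) for d, n, g in disks]


if __name__ == "__main__":
    # Two disk groups created with the same name on different servers,
    # so their DGGUIDs differ.
    disks = [
        ("Harddisk1", "ClusterDG", "guid-A"),
        ("Harddisk2", "ClusterDG", "guid-A"),
        ("Harddisk3", "ClusterDG", "guid-B"),
    ]
    print(rescan(disks))
    # {'Unknown Dg': ['Harddisk1', 'Harddisk2', 'Harddisk3']}

    # 1. Disable the disk of the guid-B group and rescan.
    print(rescan(disks, disabled={"Harddisk3"}))
    # {'ClusterDG': ['Harddisk1', 'Harddisk2']}

    # 2. Import the visible group under a new name.
    disks = rename_group(disks, "guid-A", "ClusterDG_A")

    # 3. Re-enable all disks and rescan: both groups resolve.
    print(rescan(disks))
    # {'ClusterDG_A': ['Harddisk1', 'Harddisk2'], 'ClusterDG': ['Harddisk3']}
```

In this model, disabling disks only hides the conflict; it is the rename that removes it, which is why the disabled disks can come back afterwards with their original group name.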

Thanks,
Wally

FYI - if you are truly in a down situation, as you mentioned, with no access to your data/disk groups, open the case as a Severity 1 case and ask to be live-transferred to the next available TSE. This will place you at the top of the support queue. We have a 30-minute SLA for Sev1 live transfer cases.


I was able to "recover" one of those disks, but the second one is still in Unknown Dg, and it is the only disk in that group.

I have a different setup (different servers) where all disks are in Unknown Dg (this is after I reinstalled SFW). Each time I do a rescan, the displayed status of the disks changes randomly. For example, this is the same VEA view before and after a rescan:



Here is the list of disks inside Unknown Dg (the OS disk is the only usable disk in this case):



This is a new installation, so it is not that critical. I was thinking about clearing the private region of the disks, since there is no data on them yet.

Hi hraju,

The changes in how the disks are displayed are due to the "unpredictable results can occur" behavior that you mentioned before. When a rescan is issued, the Object Bus is cleared out and recreated based on what SFW finds when it scans the disks on the server. When there is conflicting information, as in your case, the results can differ depending on the order in which the scan is done and how fast each device responds.

The vxtool utility can be used to wipe the private region from the disks that you are having problems with. The issue is that vxtool is intended for Symantec's use only. Because of the power of this tool, we recommend that customers only use it when directed by Symantec Support.

Thanks,
Wally

I see. I found this tool on the Symantec website, but it is password protected (so I figured that only Symantec employees can use it). All I can do now is wait for Symantec support. Thank you for your help anyway.
Accepted Solution!

So I worked on this with Symantec support. We tried vxtool, but to no avail, because we got messages saying that the disk is not usable!

But afterwards I managed to get all disks back. What I did:

- installed all recommended Windows hotfixes (2 recommended by QLogic, the HBA manufacturer, and 2 recommended by IBM, the manufacturer of the SDD DSM)
- updated the HBA driver (STOR Miniport) to the recommended level
- updated the firmware, BIOS, FCode and EFI of the QLogic HBA (everything is in one download package; I did it via the SANsurfer FC HBA Manager GUI)
- installed Subsystem Device Driver DSM version 2.4.2.1-2 from IBM
- configured both HBAs from the SANsurfer FC HBA Manager GUI

This got all disks back except one: the quorum disk was still missing in VEA.

So I removed both nodes from the cluster with the command "cluster node /forcecleanup".

Then, when I did a rescan, I was able to import the disk group with no missing disks. Then I destroyed all disk groups (except BasicDG with the boot disk), and I am going to configure this cluster again.
So I think the problem was caused by the servers being part of an MSCS cluster.

But first I will create a quorum disk group and then create the MSCS cluster. I hope it will work out this way.

I will let you know about the outcome.


Being a cluster should be irrelevant for missing disks

Whether this is MSCS or any other cluster has no impact on missing disks in nearly all cases; where it does, it points to different cluster nodes importing/onlining the same disk group at the same time, each with only some of the disk group's disks. This should be fairly easy to determine, and assuming the cluster services are operational, it means the cluster nodes are operating independently (if they were communicating, the cluster logic would not allow the same resource to be online on different nodes).

As the reference you gave initially mentions, a missing disk is either physically missing (for example, due to zoning, deletion, or the disk not being reservable) or it logically can't be put into a disk group (a multipathing problem or mismatched configurations among the disks in the disk group).

If vxtool can't read the disk, it should mean that the disk is reserved on another node, or that it is a basic disk, which should be visible in the VEA GUI. It might also mean the LUN is not yet usable by the OS (though from the information here that is unlikely), and some people find the LUN is configured read-only.

In addition to the previous post, another possible reason for a rescan giving random results is that multipathing is not working correctly, so each rescan puts a different combination of duplicate disks into the disk group and leaves the others as missing. In this scenario, the course of action you took resolves many of these cases. It should be fairly easy to find out by checking that the number of disks on the system is the same as expected.
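The disk-count check can be sketched as a small script. This is a hypothetical helper, not a Symantec or IBM tool: it assumes you have already collected, for each disk device the OS enumerates, a LUN identifier such as the array's serial number (how you obtain it depends on your DSM and array tools and is not shown). If multipathing is collapsing paths correctly, each LUN appears exactly once and the device count matches the number of LUNs presented to the host.

```python
from collections import defaultdict


def check_paths(devices, expected_luns):
    """Hypothetical sanity check for multipathing.

    devices: list of (device_name, lun_serial) pairs as enumerated by the OS.
    expected_luns: number of LUNs the array presents to this host.
    Devices that share a LUN serial are the same LUN seen down more than one
    path, which a working MPIO/DSM should present as a single disk.
    """
    names_by_serial = defaultdict(list)
    for name, serial in devices:
        names_by_serial[serial].append(name)

    duplicates = {s: n for s, n in names_by_serial.items() if len(n) > 1}
    return {
        "devices_seen": len(devices),
        "distinct_luns": len(names_by_serial),
        "missing_luns": max(expected_luns - len(names_by_serial), 0),
        "duplicates": duplicates,
        "ok": len(devices) == expected_luns and not duplicates,
    }


if __name__ == "__main__":
    # Two HBAs, two LUNs, multipathing not working: each LUN shows up twice.
    devices = [
        ("Harddisk1", "LUN-1001"),
        ("Harddisk2", "LUN-1002"),
        ("Harddisk3", "LUN-1001"),
        ("Harddisk4", "LUN-1002"),
    ]
    report = check_paths(devices, expected_luns=2)
    for key, value in report.items():
        print(f"{key}: {value}")
```

In this sketch, a populated `duplicates` entry or a `devices_seen` count above `expected_luns` points at the multipathing layer rather than at SFW itself.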

Driver and firmware updates also help when nodes are unable to reserve or release reservations correctly.

As you rebuild this cluster, be aware of this issue, which some MSCS cluster users are coming across:
http://seer.entsupport.symantec.com/docs/340754.htm

James.





My experience is that if I put a dynamic DG as a resource into an MSCS cluster, it is no longer possible to operate on this DG inside VEA, only in the cluster (importing, deporting).
And if for some reason the cluster service stops working, you are unable to delete this resource from the cluster, and you still cannot work with the DG or the disks in this DG inside VEA or with vxtool. Therefore I had to remove the cluster nodes entirely and disable one path to the array in Device Manager. Then I was able to destroy one disk group. The second disk group I was able to delete from the second node with both paths enabled! This is the way I did it.

Anyway, I had to configure the MSCS cluster before installing SFW, because of the integration of SFW resources into MSCS. Then I configured a quorum dynamic disk group with a disk that is not used elsewhere in the cluster. Then I integrated it into the cluster and configured it as the quorum. I also had to apply two Symantec hotfixes (1959339, 1961245) to start the cluster.


Disk groups for use in a cluster are Cluster Disk Groups. Once a disk group is configured as a resource in a cluster (SFW HA or MSCS), the cluster should be used to online and offline it. If the relevant cluster is not running, you should be able to online and offline the disk group directly. You may also find that this works in later versions of SFW even while the cluster is running, but volume mount information will be handled via the cluster.

Destroying the cluster to regain access to the disks is unnecessary; if the VEA GUI and CLI are not working, you should have been able to use vxtool to access the disks (provided the disks are present and there are no reservation issues). Generally, though, the GUI and CLI will still respond with disk information of some sort. From what is posted here, I expect that updating the drivers etc. is the most likely reason the disks now appear and operate as expected.

Going to one path for disk operations is also unnecessary where the multipathing solution in use is working correctly.

Hope this makes sense,
James.

 


As you can see in the screenshot, when a dynamic DG is onlined in Cluster Administrator, almost all options for the DG are greyed out in VEA:


hraju - in MSCS, the disk groups that are created as a resource have most of the right-click options grayed out in VEA, and that is normal.

There are two ways to work with disk groups that are part of the cluster while the cluster service is not running:
1. Go to Device Manager, show hidden devices, and disable the Cluster Disk Driver (a reboot may be needed). After this, MSCS no longer has its hands on the disk groups, so you can do everything in the VEA GUI. Once done, you will need to re-enable the Cluster Disk Driver.
2. Just use the CLI. The vxdg command can still be used to import/deport the disk groups by using the -f (force) parameter.