Forum Discussion

mikebounds
11 years ago

udid_mismatch using RHEL5u5 SF6.1 on Virtual Box

For a lab environment for testing on my laptop, I have used Virtual Box with RHEL 5.5 and SFHA 6.1. I shared a disk by creating a disk of "fixed size" and setting its type to "Shareable", but I get udid_mismatch because the UDID seems to be host-specific rather than device-specific:

Host a:

[root@r55v61a ~]# vxdisk list sdc | egrep "guid|udid"
guid:      {1ac21a9c-a550-11e3-a7fd-30ba236498b6}
udid:      VBOX%5FHARDDISK%5FOTHER%5FDISKS%5Fr55v61a.localdomain%5F%2Fdev%2Fsdc
 
Host b:
[root@r55v61b ~]# vxdisk list sdc | egrep "guid|udid"
flags:     online ready private autoconfig udid_mismatch clone_disk
guid:      {1ac21a9c-a550-11e3-a7fd-30ba236498b6}
udid:      VBOX%5FHARDDISK%5FOTHER%5FDISKS%5Fr55v61b.localdomain%5F%2Fdev%2Fsdc
 
So here you can see that the UDID differs between the hosts because it contains the hostname (and the device path /dev/sdc, which could also differ).
It is the same disk: the guid is the same, the uuid is the same in the .vbox config file, and if I create a diskgroup on one node I can see the diskgroup, and mount and write to the filesystem, on both nodes.
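The UDIDs above are percent-encoded. As a quick check (a sketch, not part of the original post), Python's standard `urllib.parse.unquote` decodes them and shows that the only component that differs between the two nodes is the hostname:

```python
from urllib.parse import unquote

# UDIDs copied from the `vxdisk list sdc` output on each node above.
udid_a = "VBOX%5FHARDDISK%5FOTHER%5FDISKS%5Fr55v61a.localdomain%5F%2Fdev%2Fsdc"
udid_b = "VBOX%5FHARDDISK%5FOTHER%5FDISKS%5Fr55v61b.localdomain%5F%2Fdev%2Fsdc"

# unquote() turns %5F back into "_" and %2F into "/".
decoded_a = unquote(udid_a)
decoded_b = unquote(udid_b)
print(decoded_a)  # VBOX_HARDDISK_OTHER_DISKS_r55v61a.localdomain_/dev/sdc
print(decoded_b)  # VBOX_HARDDISK_OTHER_DISKS_r55v61b.localdomain_/dev/sdc

# Rough split on "_" (this also splits OTHER_DISKS), just to spot the difference.
differing = [
    (a, b) for a, b in zip(decoded_a.split("_"), decoded_b.split("_")) if a != b
]
print(differing)  # [('r55v61a.localdomain', 'r55v61b.localdomain')]
```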
 
As I understand it, "vxdisk list" shows the UDID from the Device Discovery Layer, and SF writes this UDID to the private region. So if, for example, I run "vxdisk updateudid" on one node, the udid_mismatch is fixed on that node, but then of course the other node gets the udid_mismatch.
 
Note that it is the hostname in the UDID, NOT the virtual machine name. If I change the hostnames so that both nodes have the same one, the issue goes away. This is a sort of workaround, since VCS uses /etc/VRTSvcs/conf/sysname rather than the hostname, but having duplicate hostnames may cause other issues.
 
 
The UDID is a 4-field tuple consisting of: Vendor ID, Product ID, Cabinet Serial Number, LUN Serial Number.
 
So it would seem the Cabinet Serial Number is resolving to the hostname and the LUN Serial Number is resolving to the O/S device path.
Vbox seems to emulate all disks, local AND shared, as being in a single enclosure within the host (i.e. internal disks). I guess this is why the issue arises: Vbox reports that a single shared disk is contained in both hosts, and sometimes reports a different LUN Serial Number if the device path for the disk is not the same on each host (which happens if you have a different number of disks on each system, or the controllers are not discovered in the same order).
I had a look at VMware Workstation and it does the same: it seems to emulate all disks, local AND shared, as being in a single enclosure within the host.
 
Does anyone know how to work around this?
 
Thanks
 
Mike

 

  • Thanks to Tony Griffiths and Carlos Carrero on post https://www-secure.symantec.com/connect/forums/cvm-wont-start-remote-node-fss-diskgroup for helping with this. They told me that for a disk on a virtual host which cannot present SCSI-3-compliant disks to show a properly constructed UDID (Vendor ID, Product ID, Cabinet Serial Number, LUN Serial Number):

    1. The UUID (not UDID) of the disk needs to be exposed to the virtual host
    2. There needs to be an ASL for the disk.

    For VMware you can use enableUUID=true to expose the UUID to the virtual host. In Virtual Box, disks attached to a SATA controller automatically have their UUID exposed to the virtual host, while disks attached to SCSI or SAS controllers do not, and cannot, have their UUID exposed.

    For VMware vmdk there is an ASL, but even if you use vmdk in Virtual Box, the vendor still shows as VBOX and so is not recognised by the ASL.

    If 1 and 2 are not met, which is the case with Virtual Box as there is no ASL for VBOX, the enclosure is discovered as OTHER_DISKS and a fake UDID value is generated using the hostname and device name.
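    The fallback format can be modelled from the UDIDs quoted in this thread. The sketch below is a hypothetical reconstruction inferred only from that output, not Veritas code: the helper names `encode_udid` and `fallback_udid` are made up, and percent-encoding everything except letters, digits and "." is an assumption that happens to match the %5F, %20 and %2F seen here.

```python
# Hypothetical model of the fake UDID generated for OTHER_DISKS disks,
# inferred from `vxdisk list` output in this thread. NOT Veritas code.


def encode_udid(text: str) -> str:
    """Percent-encode every character except ASCII letters, digits and '.'.

    Matches '_' -> %5F, ' ' -> %20 and '/' -> %2F as observed; how VxVM
    treats other characters is an assumption.
    """
    return "".join(
        ch if (ch.isascii() and ch.isalnum()) or ch == "." else f"%{ord(ch):02X}"
        for ch in text
    )


def fallback_udid(vendor: str, product: str, hostname: str, device: str) -> str:
    """Join vendor, product, enclosure, hostname and device path, then encode."""
    return encode_udid("_".join([vendor, product, "OTHER_DISKS", hostname, device]))


# The same shared disk as seen from each node:
udid_a = fallback_udid("VBOX", "HARDDISK", "r55v61a.localdomain", "/dev/sdc")
udid_b = fallback_udid("VBOX", "HARDDISK", "r55v61b.localdomain", "/dev/sdc")
print(udid_a)
print(udid_b)
print("match" if udid_a == udid_b else "udid_mismatch")  # udid_mismatch
```

    Because the hostname and device path are the only inputs that vary between nodes, giving both nodes the same hostname and the same device path for a shared disk makes the generated strings identical, which is what the workaround below relies on.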

    Most of the time the udid_mismatch is just a cosmetic issue, but it can cause problems, especially with FSS diskgroups (see https://www-secure.symantec.com/connect/forums/cvm-wont-start-remote-node-fss-diskgroup#comment-10139131), so the workaround I suggest for Virtual Box is:

    1. Make the hostname of each cluster node the same by editing the /etc/sysconfig/network file, for example make them both NodeA. The 2 nodes are still identified separately by their IP addresses, so you can ping NodeA and NodeB, and pinging NodeB will reach the node you know as NodeB, even though its hostname is NodeA. VCS uses /etc/VRTSvcs/conf/sysname, so this file will still reflect the names NodeA and NodeB.

    2. Ensure the disks have the same device paths on both nodes. To achieve this it is easier if the nodes have the same local and shared disks on the same ports, so they are discovered in the same order. I also found that on Vbox SATA is always discovered first, but SCSI and SAS can be discovered in either order, so it is best to use only one of SCSI or SAS, optionally together with SATA.

    The problem with the above is that it gives all disks with the same device path the same UDID: for example, sdc on one node has the same UDID as sdc on another. This is great for shared disks, as you won't get the udid_mismatch message and you will be able to export shared disks, but it also gives local disks matching UDIDs. This only seems to cause a problem if you export a local disk. For example, if sdc was local and you wanted to export it, VxVM would think the local sdc disk on the other system was the same disk, as it would have the same UDID. To get round this you would have to exclude sdc from VxVM's view on the remote system (use vxdiskadm option 17, then option 4).
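    Step 2 and the local-disk caveat can be checked before changing anything. A minimal sketch, assuming you have listed each node's device names against some per-disk identifier you trust (for example the HardDisk uuid from the .vbox file); the function name `check_nodes` and the identifiers in the example are hypothetical:

```python
from typing import Dict, List, Tuple

# Hypothetical inventory for one node: OS device name -> disk identifier.
# The identifier is assumed to be stable per physical disk,
# e.g. the HardDisk uuid from the .vbox file.
Inventory = Dict[str, str]


def check_nodes(node_a: Inventory, node_b: Inventory) -> Tuple[List[str], List[str]]:
    """Return (shared disks on different paths, device names backed by different disks)."""
    where_a = {disk: dev for dev, disk in node_a.items()}
    where_b = {disk: dev for dev, disk in node_b.items()}

    # A disk seen on both nodes is shared; if its device path differs, the
    # hostname workaround alone will not remove udid_mismatch for it.
    moved = sorted(
        f"{disk}: {where_a[disk]} on A, {where_b[disk]} on B"
        for disk in where_a.keys() & where_b.keys()
        if where_a[disk] != where_b[disk]
    )

    # The same device name backed by different disks: with identical hostnames
    # these get the same fake UDID, which matters if one of them is exported.
    clashes = sorted(
        dev for dev in node_a.keys() & node_b.keys() if node_a[dev] != node_b[dev]
    )
    return moved, clashes


# Made-up example: shared-2 is on sdc on node A but on sdd on node B.
node_a = {"sda": "boot-A", "sdb": "shared-1", "sdc": "shared-2"}
node_b = {"sda": "boot-B", "sdb": "shared-1", "sdc": "data-B", "sdd": "shared-2"}
moved, clashes = check_nodes(node_a, node_b)
print(moved)    # ['shared-2: sdc on A, sdd on B']
print(clashes)  # ['sda', 'sdc']
```

    Entries in `clashes` only matter for local disks you actually export; for those, excluding the device from VxVM's view on the remote system is still needed.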

    Mike

     

  • I did some investigation using "hdparm -I /dev/sdX" and "scsi_id -g -s /block/sdX". For disks attached to the SATA controller this gives me a disk serial number, but disks on the SCSI and SAS controllers yield no information. So for instance, one disk on my SATA controller gives:

    [root@r55v61b ~]# hdparm -I /dev/sdc | grep Serial
        Serial Number:      VB3bce1c2b-c3d02292 
    [root@r55v61b ~]# scsi_id -g -s /block/sdc
    SATA     VBOX HARDDISK  VB3bce1c2b-c3d02292 

    and this serial number corresponds to the UUID of the disk in the .vbox configuration file for the host:

    <HardDisk uuid="{3bce1c2b-7530-4e97-97b9-6ab49222d0c3}"

    So you can see the LUN serial number is "VB" followed by the first 4 hex bytes of the UUID, "3b ce 1c 2b", then "-" and the last 4 hex bytes of the UUID, "92 22 d0 c3", in reverse order, i.e. "c3 d0 22 92".
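    The mapping described above can be written as a small function, handy for matching a .vbox HardDisk uuid against hdparm or scsi_id output. This is a sketch based on the single disk shown here: one example cannot tell whether leading zero digits are kept, so the zero-padding below is an assumption, and `vbox_serial_from_uuid` is a hypothetical name.

```python
def vbox_serial_from_uuid(uuid: str) -> str:
    """Derive the 'VB...' disk serial from a VirtualBox HardDisk uuid.

    Pattern inferred from one disk: "VB" + first 4 bytes as written,
    then "-" + last 4 bytes in reverse order. Zero-padding is an assumption.
    """
    raw = bytes.fromhex(uuid.strip("{}").replace("-", ""))
    if len(raw) != 16:
        raise ValueError(f"not a 128-bit UUID: {uuid!r}")
    first4 = raw[:4].hex()
    last4_reversed = raw[-4:][::-1].hex()
    return f"VB{first4}-{last4_reversed}"


print(vbox_serial_from_uuid("{3bce1c2b-7530-4e97-97b9-6ab49222d0c3}"))
# VB3bce1c2b-c3d02292
```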

    But vxdisk list shows a UDID of:

    [root@r55v61b ~]# vxdisk list sdc | grep udid
    udid:      ATA%5FVBOX%20HARDDISK%5FOTHER%5FDISKS%5Fr55v61b.localdomain%5F%2Fdev%2Fsdc

    So why is the "LUN Serial Number" field of the UDID tuple showing the percent-encoded /dev/sdc, rather than the LUN serial number "VB3bce1c2b-c3d02292"?

    Mike