05-13-2014 04:29 AM
I am testing FSS (Flexible Shared Storage) with SF 6.1 on RHEL 5.5 in VirtualBox VMs, and when I try to start CVM on the remote node I get:
VCS ERROR V-16-20006-1005 (r55v61b) CVMCluster:cvm_clus:monitor:node - state: out of cluster reason: Disk for disk group not found: retry to add a node failed
Here is my setup:
Node A is the master, with a local disk (sdd) and a remote disk (B_sdd):
[root@r55v61a ~]# vxdctl -c mode
mode: enabled: cluster active - MASTER
master: r55v61a
[root@r55v61a ~]# vxdisk list
DEVICE       TYPE            DISK         GROUP        STATUS
B_sdd        auto:cdsdisk    -            -            online remote
sda          auto:none       -            -            online invalid
sdb          auto:none       -            -            online invalid
sdc          auto:cdsdisk    -            -            online
sdd          auto:cdsdisk    -            -            online exported
Node B is the slave, and sees a local disk (sdd) and a remote disk (A_sdd):
[root@r55v61b ~]# vxdisk list
DEVICE       TYPE            DISK         GROUP        STATUS
A_sdd        auto:cdsdisk    -            -            online remote
sda          auto:none       -            -            online invalid
sdb          auto:none       -            -            online invalid
sdc          auto:cdsdisk    -            -            online
sdd          auto:cdsdisk    -            -            online exported
On node A, I create an FSS diskgroup, so on node A the disk is local:
[root@r55v61a ~]# vxdg -s -o fss=on init fss-dg fd1_La=sdd
[root@r55v61a ~]# vxdisk list
DEVICE       TYPE            DISK         GROUP        STATUS
B_sdd        auto:cdsdisk    -            -            online remote
sda          auto:none       -            -            online invalid
sdb          auto:none       -            -            online invalid
sdc          auto:cdsdisk    -            -            online
sdd          auto:cdsdisk    fd1_La       fss-dg       online exported shared
And on node B the disk in fss-dg is remote
[root@r55v61b ~]# vxdisk list
DEVICE       TYPE            DISK         GROUP        STATUS
A_sdd        auto:cdsdisk    fd1_La       fss-dg       online shared remote
sda          auto:none       -            -            online invalid
sdb          auto:none       -            -            online invalid
sdc          auto:cdsdisk    -            -            online
sdd          auto:cdsdisk    -            -            online exported
I then stop and start VCS on node B which is when I see the issue:
2014/05/13 12:05:23 VCS INFO V-16-2-13716 (r55v61b) Resource(cvm_clus): Output of the completed operation (online)
==============================================
ERROR:
==============================================
2014/05/13 12:05:24 VCS ERROR V-16-20006-1005 (r55v61b) CVMCluster:cvm_clus:monitor:node - state: out of cluster
reason: Disk for disk group not found: retry to add a node failed
If I destroy the fss-dg diskgroup on node A, then CVM will start on node B. So the issue lies with the FSS diskgroup: it seems CVM cannot find the remote disk in the diskgroup.
I can also get round the issue by stopping VCS on node A, after which CVM will start on node B:
[root@r55v61b ~]# hagrp -online cvm -sys r55v61b
[root@r55v61b ~]# vxdisk -o alldgs list
DEVICE       TYPE            DISK         GROUP        STATUS
sda          auto:none       -            -            online invalid
sdb          auto:none       -            -            online invalid
sdc          auto:cdsdisk    -            -            online
sdd          auto:cdsdisk    -            -            online exported
If I then start VCS on node A, then B is able to see the FSS diskgroup:
[root@r55v61b ~]# vxdisk list
DEVICE       TYPE            DISK         GROUP        STATUS
A_sdd        auto:cdsdisk    fd1_La       fss-dg       online shared remote
sda          auto:none       -            -            online invalid
sdb          auto:none       -            -            online invalid
sdc          auto:cdsdisk    -            -            online
sdd          auto:cdsdisk    -            -            online exported
I can stop and start VCS on each node when the disks are just exported, and each node is able to see the disk from the other node; but once I create the FSS diskgroup, CVM won't start on the system that has the remote disk. Does anybody have any ideas as to why?
Mike
05-21-2014 05:15 AM
The issue in this post, which was that CVM would not start if an FSS diskgroup was present, giving the error message:
CVMCluster:cvm_clus:monitor:node - state: out of cluster reason: Disk for disk group not found: retry to add a node failed
was resolved by recreating a separate diskgroup that was purely CVM (no exported disks). The likely cause was UDID mismatches or conflicts. With non-FSS failover and CVM diskgroups, it would appear that all that is required is that VxVM can read the private region; but with FSS diskgroups, my theory is that the UDID must be used to ensure that an exported disk only shows as a remote disk on other systems if the same disk can NOT be seen on the SAN of the remote system, and VxVM needs the UDID to determine this.
Hence in VirtualBox, the same disk will normally show as having a different UDID when viewed from different systems, and when the disk was shared I did indeed see a single disk presented on one server BOTH via the SAN and as a remote disk. But when I made the UDIDs the same, by changing the hostname of one of the nodes so both nodes had the same hostname and hence the same constructed UDID, VxVM correctly identified that the remote disk was available via the SAN, and hence ONLY showed the disk as a SAN-attached disk and not also as a remote disk.
Although in my opening post I was not exporting any shared SAN disks (only local disks), I believe the UDID checking performed when autoimporting the diskgroups caused the issue.
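To illustrate the theory, here is a quick decode of the UDID string from the vxdisk output earlier in the thread. Note the second UDID is a hypothetical example of what node B might construct for the same VirtualBox disk, and the comparison rule is my assumption about VxVM's behaviour, not its actual code:

```shell
#!/bin/bash
# VxVM stores UDIDs percent-encoded; decode the real one from
# 'vxdisk list sdd' earlier in the thread for readability.
udid_a='ATA%5FVBOX%20HARDDISK%5FOTHER%5FDISKS%5Fr55v61a.localdomain%5F%2Fdev%2Fsdd'
decoded=$(printf '%b' "${udid_a//%/\\x}")
echo "$decoded"
# -> ATA_VBOX HARDDISK_OTHER_DISKS_r55v61a.localdomain_/dev/sdd

# Hypothetical UDID the same disk might get on node B: the constructed
# UDID embeds the local hostname, so the two nodes disagree.
udid_b='ATA%5FVBOX%20HARDDISK%5FOTHER%5FDISKS%5Fr55v61b.localdomain%5F%2Fdev%2Fsdd'
if [ "$udid_a" = "$udid_b" ]; then
    # Matching UDIDs: VxVM can tell it is the same disk, so it shows
    # only as SAN-attached and no remote disk is created.
    echo "same disk: SAN-attached only"
else
    # Mismatched UDIDs: the duplicate check fails and the one disk is
    # presented both via the SAN and as a remote disk.
    echo "UDID mismatch: disk also presented as a remote disk"
fi
```

You can see the hostname and device path sitting inside the decoded UDID, which is why changing the hostname was enough to make the two UDIDs collide (or differ).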
Mike
05-13-2014 04:46 AM
Can you please paste the output of the following commands?
From Node A :
# vxdisk list sdd
# lltstat -nvv|more
# vxprint
From Node B :
# vxdisk list A_sdd
Can you try deporting and then importing the dgs with VCS out of the picture:
# hastop -all -force
# vxdg deport fss-dg
# vxdg -s -o fss=on import fss-dg
05-13-2014 05:21 AM
Node A:
[root@r55v61a ~]# vxdctl -c mode
mode: enabled: cluster active - SLAVE
master: r55v61b
[root@r55v61a ~]# vxdisk list sdd
Device:    sdd
devicetag: sdd
type:      auto
clusterid: r55v61c1
disk:      name=fd1_La id=1399977098.91.r55v61a
group:     name=fss-dg id=1399978977.116.r55v61a
info:      format=cdsdisk,privoffset=256,pubslice=3,privslice=3
flags:     online ready private autoconfig exported shared autoimport imported
pubpaths:  block=/dev/vx/dmp/sdd3 char=/dev/vx/rdmp/sdd3
guid:      {c4025770-da89-11e3-a9f1-d5ec014b3593}
udid:      ATA%5FVBOX%20HARDDISK%5FOTHER%5FDISKS%5Fr55v61a.localdomain%5F%2Fdev%2Fsdd
site:      -
version:   3.1
iosize:    min=512 (bytes) max=1024 (blocks)
public:    slice=3 offset=2304 len=517888 disk_offset=0
private:   slice=3 offset=256 len=2048 disk_offset=0
update:    time=1399982408 seqno=0.32
ssb:       actual_seqno=0.0
headers:   0 240
configs:   count=1 len=1280
logs:      count=1 len=192
Defined regions:
 config   priv 000048-000239[000192]: copy=01 offset=000000 enabled
 config   priv 000256-001343[001088]: copy=01 offset=000192 enabled
 log      priv 001344-001535[000192]: copy=01 offset=000000 enabled
 lockrgn  priv 001536-001679[000144]: part=00 offset=000000
Multipathing information:
numpaths:   1
sdd  state=enabled
connectivity: r55v61a
[root@r55v61a ~]# lltstat -nvv
LLT node information:
    Node        State    Link  Status  Address
   * 0 r55v61a  OPEN     eth1  UP      08:00:27:C9:3A:39
                         eth2  UP      08:00:27:1A:E8:66
     1 r55v61b  OPEN     eth1  UP      08:00:27:2F:39:51
                         eth2  UP      08:00:27:15:CD:E8
     2          CONNWAIT eth1  DOWN
                         eth2  DOWN
     3          CONNWAIT eth1  DOWN
                         eth2  DOWN
[root@r55v61a ~]# vxprint -g fss-dg
TY NAME    ASSOC   KSTATE  LENGTH  PLOFFS  STATE  TUTIL0  PUTIL0
dg fss-dg  fss-dg  -       -       -       -      -       -
dm fd1_La  sdd     -       517888  -       -      -       -
[root@r55v61a ~]#
Node B:
[root@r55v61b config]# vxdctl -c mode
mode: enabled: cluster active - MASTER
master: r55v61b
[root@r55v61b config]# vxdisk list A_sdd
Device:    A_sdd
type:      auto
clusterid: r55v61c1
disk:      name=fd1_La id=1399977098.91.r55v61a
group:     name=fss-dg id=1399978977.116.r55v61a
info:      format=cdsdisk,privoffset=256,pubslice=3,privslice=3
flags:     online ready private autoconfig remote exported shared autoimport imported
guid:      {c4025770-da89-11e3-a9f1-d5ec014b3593}
udid:      ATA%5FVBOX%20HARDDISK%5FOTHER%5FDISKS%5Fr55v61a.localdomain%5F%2Fdev%2Fsdd
site:      -
version:   3.1
iosize:    min=512 (bytes) max=1024 (blocks)
public:    slice=3 offset=2304 len=517888 disk_offset=0
private:   slice=3 offset=256 len=2048 disk_offset=0
update:    time=1399982408 seqno=0.32
ssb:       actual_seqno=0.0
headers:   0 240
configs:   count=1 len=1280
logs:      count=1 len=192
Defined regions:
 config   priv 000048-000239[000192]: copy=01 offset=000000 enabled
 config   priv 000256-001343[001088]: copy=01 offset=000192 enabled
 log      priv 001344-001535[000192]: copy=01 offset=000000 enabled
 lockrgn  priv 001536-001679[000144]: part=00 offset=000000
connectivity: r55v61a
[root@r55v61b config]# hastop -all -force
[root@r55v61b config]# vxdg deport fss-dg
[root@r55v61b config]# vxdg -s -o fss=on import fss-dg
The import hangs, so after a minute I pressed <ctrl><c>, and then I see:
[root@r55v61b config]# vxdisk list
DEVICE       TYPE            DISK         GROUP        STATUS
A_sdd        auto:cdsdisk    fd1_La       fss-dg       online shared remote
sda          auto:none       -            -            online invalid
sdb          auto:none       -            -            online invalid
sdc          auto:cdsdisk    -            -            online
sdd          auto:cdsdisk    -            -            online exported
[root@r55v61b config]# vxprint -g fss-dg
VxVM vxprint ERROR V-5-1-582 Disk group fss-dg: No such disk group
[root@r55v61b config]# vxdg deport fss-dg
VxVM vxdg ERROR V-5-1-2275 vxdg: Disk group fss-dg: No such disk group
[root@r55v61b config]#
So vxdisk shows fss-dg is imported, but vxprint and vxdg think it is deported.
Node A shows:
[root@r55v61a ~]# vxdisk list
DEVICE       TYPE            DISK         GROUP        STATUS
B_sdd        auto:cdsdisk    -            -            online remote
sda          auto:none       -            -            online invalid
sdb          auto:none       -            -            online invalid
sdc          auto:cdsdisk    -            -            online
sdd          auto:cdsdisk    -            -            online exported
Mike
05-13-2014 09:08 AM
Mike ,
Did you observe any errors or messages in syslog while importing the dg?
What are the CVM protocol and DG versions? Please paste the output of the commands below.
# vxdctl protocolversion
# vxdg list fss-dg
05-13-2014 09:40 AM
In /var/log/messages on node B (there are no logs on node A at the time of the deport/import) I see:
May 13 16:39:53 r55v61b vxvm:vxconfigd: V-5-1-16252 Disk group deport of fss-dg succeeded.
May 13 16:40:07 r55v61b vxvm:vxconfigd: V-5-1-16765 Selecting configuration database copy from A_sdd from disks: A_sdd
May 13 16:40:07 r55v61b vxvm:vxconfigd: V-5-1-16766 Trying to import the disk group fss-dg using configuration database copy from A_sdd
This is a fresh install of 6.1 with newly created diskgroups, so the versions are the latest:
[root@r55v61b log]# vxdctl protocolversion
Cluster running at protocol 130
[root@r55v61b log]# vxdg list fss-dg
Group:     fss-dg
dgid:      1399978977.116.r55v61a
import-id: 33792.135
flags:     shared cds
version:   190
alignment: 8192 (bytes)
local-activation: shared-write
cluster-actv-modes: r55v61b=sw r55v61a=sw
ssb:            on
autotagging:    on
detach-policy: local
dg-fail-policy: obsolete
ioship: on
fss: on
storage-sources: r55v61a
copies:    nconfig=default nlog=default
config:    seqno=0.1039 permlen=1280 free=1277 templen=2 loglen=192
config disk A_sdd copy 1 len=1280 state=clean online
log disk A_sdd copy 1 len=192
[root@r55v61b log]#
I am using fencing in disabled mode and I have a normal CVM diskgroup (I truncated the output of vxdisk list). Before creating the FSS diskgroup, CVM would start normally and mount the CFS mount on both nodes.
Mike
05-13-2014 12:05 PM
I noticed that after I pressed <ctrl><c> on the import, the diskgroup eventually imported, so I checked the messages log, which showed:
May 13 16:39:53 r55v61b vxvm:vxconfigd: V-5-1-16252 Disk group deport of fss-dg succeeded.
May 13 16:40:07 r55v61b vxvm:vxconfigd: V-5-1-16765 Selecting configuration database copy from A_sdd from disks: A_sdd
May 13 16:40:07 r55v61b vxvm:vxconfigd: V-5-1-16766 Trying to import the disk group fss-dg using configuration database copy from A_sdd
May 13 16:41:02 r55v61b Had[624]: VCS CRITICAL V-16-1-50086 Mem usage on r55v61b is 91%
May 13 16:43:02 r55v61b Had[624]: VCS CRITICAL V-16-1-50086 CPU usage on r55v61b is 100%
May 13 16:43:38 r55v61b vxvm:vxconfigd: V-5-1-16254 Disk group import of fss-dg succeeded.
So it is eventually importing, and it may be taking so long due to insufficient CPU power. Having said that, I have installed a very lightweight O/S without X Windows, which runs at 99% idle without VCS running.
With VCS running (just CVM/CFS stuff) the CPU runs at 90% idle
If I online the cvm service group on node B first then it takes 20 seconds for cvm_clus resource to online (this would be importing the other normal diskgroup on just node B) and it also takes 20 seconds for cvm_clus resource to then online on Node A (this would be importing the other normal diskgroup on node A and the fss diskgroup on both systems)
I did an import again on node B and watched the CPU: it was maxed out for nearly 5 mins with vxconfigd taking 90%. To me this indicates that vxconfigd is doing something wrong, as normal operations complete in a reasonable time.
If it really does take this much CPU to import a diskgroup containing a remote disk, then are there any timeouts I can set for the cvm_clus resource? The only timeouts I can see are the type-level OnlineTimeout, which defaults to 400 seconds, and the resource-level CVMTimeout, which is 200 seconds; but the resource seems to be timing out a lot earlier than this:
2014/05/13 19:04:23 VCS NOTICE V-16-1-10301 Initiating Online of Resource cvm_clus (Owner: Unspecified, Group: cvm) on System r55v61b
2014/05/13 19:04:46 VCS WARNING V-16-20006-1002 (r55v61b) CVMCluster:cvm_clus:online:CVMCluster start failed on this node.
2014/05/13 19:04:47 VCS INFO V-16-2-13716 (r55v61b) Resource(cvm_clus): Output of the completed operation (online)
==============================================
ERROR:
==============================================
2014/05/13 19:04:48 VCS ERROR V-16-20006-1005 (r55v61b) CVMCluster:cvm_clus:monitor:node - state: out of cluster
reason: Disk for disk group not found: retry to add a node failed
2014/05/13 19:05:48 VCS ERROR V-16-20006-1005 (r55v61b) CVMCluster:cvm_clus:monitor:node - state: out of cluster
reason: Disk for disk group not found: retry to add a node failed
2014/05/13 19:06:48 VCS ERROR V-16-20006-1005 (r55v61b) CVMCluster:cvm_clus:monitor:node - state: out of cluster
reason: Disk for disk group not found: retry to add a node failed
2014/05/13 19:06:48 VCS ERROR V-16-2-13066 (r55v61b) Agent is calling clean for resource(cvm_clus) because the resource is not up even after online completed.
2014/05/13 19:06:49 VCS INFO V-16-2-13068 (r55v61b) Resource(cvm_clus) - clean completed successfully.
2014/05/13 19:06:50 VCS INFO V-16-2-13072 (r55v61b) Resource(cvm_clus): Agent is retrying online (attempt number 1 of 2).
2014/05/13 19:07:13 VCS WARNING V-16-20006-1002 (r55v61b) CVMCluster:cvm_clus:online:CVMCluster start failed on this node.
2014/05/13 19:07:13 VCS INFO V-16-2-13716 (r55v61b) Resource(cvm_clus): Output of the completed operation (online)
==============================================
ERROR:
==============================================
2014/05/13 19:07:14 VCS ERROR V-16-20006-1005 (r55v61b) CVMCluster:cvm_clus:monitor:node - state: out of cluster
reason: Disk for disk group not found: retry to add a node failed
2014/05/13 19:08:13 VCS ERROR V-16-20006-1005 (r55v61b) CVMCluster:cvm_clus:monitor:node - state: out of cluster
reason: Disk for disk group not found: retry to add a node failed
2014/05/13 19:09:10 VCS INFO V-16-10031-20903 (r55v61b) CFSfsckd:vxfsckd:imf_register:/opt/VRTSamf/bin/amfregister -ipf -ouid=0,euid=0,gid=0,egid=0 -r CFSfsckd -g vxfsckd "/usr/lib/fs/vxfs/vxfsckd" -- "-p /var/adm/cfs/vxfsckd-pid"
2014/05/13 19:09:13 VCS ERROR V-16-20006-1005 (r55v61b) CVMCluster:cvm_clus:monitor:node - state: out of cluster
reason: Disk for disk group not found: retry to add a node failed
2014/05/13 19:09:14 VCS ERROR V-16-2-13066 (r55v61b) Agent is calling clean for resource(cvm_clus) because the resource is not up even after online completed.
2014/05/13 19:09:15 VCS INFO V-16-2-13068 (r55v61b) Resource(cvm_clus) - clean completed successfully.
2014/05/13 19:09:15 VCS INFO V-16-2-13072 (r55v61b) Resource(cvm_clus): Agent is retrying online (attempt number 2 of 2).
2014/05/13 19:09:38 VCS WARNING V-16-20006-1002 (r55v61b) CVMCluster:cvm_clus:online:CVMCluster start failed on this node.
2014/05/13 19:09:39 VCS INFO V-16-2-13716 (r55v61b) Resource(cvm_clus): Output of the completed operation (online)
==============================================
ERROR:
==============================================
2014/05/13 19:09:39 VCS ERROR V-16-20006-1005 (r55v61b) CVMCluster:cvm_clus:monitor:node - state: out of cluster
reason: Disk for disk group not found: retry to add a node failed
2014/05/13 19:10:39 VCS ERROR V-16-20006-1005 (r55v61b) CVMCluster:cvm_clus:monitor:node - state: out of cluster
reason: Disk for disk group not found: retry to add a node failed
2014/05/13 19:11:39 VCS ERROR V-16-20006-1005 (r55v61b) CVMCluster:cvm_clus:monitor:node - state: out of cluster
reason: Disk for disk group not found: retry to add a node failed
2014/05/13 19:11:40 VCS ERROR V-16-2-13066 (r55v61b) Agent is calling clean for resource(cvm_clus) because the resource is not up even after online completed.
2014/05/13 19:11:41 VCS INFO V-16-2-13068 (r55v61b) Resource(cvm_clus) - clean completed successfully.
2014/05/13 19:11:41 VCS INFO V-16-2-13071 (r55v61b) Resource(cvm_clus): reached OnlineRetryLimit(2).
2014/05/13 19:11:42 VCS ERROR V-16-20006-1005 (r55v61b) CVMCluster:cvm_clus:monitor:node - state: out of cluster
reason: Disk for disk group not found: retry to add a node failed
2014/05/13 19:11:42 VCS ERROR V-16-1-54031 Resource cvm_clus (Owner: Unspecified, Group: cvm) is FAULTED on sys r55v61b
So you can see here that I get a "CVMCluster start failed on this node" error after 23 seconds, and then "node - state: out of cluster reason: Disk for disk group not found" repeated at 60-second intervals.
Mike
05-13-2014 01:55 PM
I have done some more investigation:
If I try to start CVM manually on node B with FSS diskgroup imported on node A, then I get:
# ./vxclustadm nodestate; ./vxclustadm -m vcs -t gab startnode; while true
> do
> date
> ./vxclustadm nodestate
> sleep 1
> done
state: out of cluster
reason: Disk for disk group not found: user initiated stop
VxVM vxclustadm INFO V-5-2-9687 vxclustadm: Fencing driver is in disabled mode
Tue May 13 20:40:50 BST 2014
state: joining
reconfig: initialized
Tue May 13 20:40:51 BST 2014
state: joining
reconfig: initialized
Tue May 13 20:40:52 BST 2014
state: joining
reconfig: initialized
Tue May 13 20:40:54 BST 2014
state: joining
reconfig: initialized
Tue May 13 20:40:56 BST 2014
state: joining
reconfig: vxconfigd in join
Tue May 13 20:40:58 BST 2014
state: joining
reconfig: vxconfigd in join
Tue May 13 20:40:59 BST 2014
state: joining
reconfig: vxconfigd in join
Tue May 13 20:41:00 BST 2014
state: joining
reconfig: vxconfigd in join
Tue May 13 20:41:01 BST 2014
state: out of cluster
reason: Disk for disk group not found: retry to add a node failed
Tue May 13 20:41:02 BST 2014
state: out of cluster
reason: Disk for disk group not found: retry to add a node failed
So it is in the joining state for just over 10 seconds and then complains "Disk for disk group not found".
If I then destroy FSS diskgroup on node A and rerun manual CVM start again as above I get:
[root@r55v61b bin]# ./vxclustadm nodestate; ./vxclustadm -m vcs -t gab startnode; while true; do date; ./vxclustadm nodestate; sleep 1; done
state: out of cluster
reason: Disk for disk group not found: retry to add a node failed
VxVM vxclustadm INFO V-5-2-9687 vxclustadm: Fencing driver is in disabled mode
Tue May 13 20:46:21 BST 2014
state: joining
reconfig: initialized
Tue May 13 20:46:22 BST 2014
state: joining
reconfig: initialized
Tue May 13 20:46:23 BST 2014
state: joining
reconfig: initialized
Tue May 13 20:46:24 BST 2014
state: joining
reconfig: initialized
Tue May 13 20:46:25 BST 2014
state: joining
reconfig: initialized
Tue May 13 20:46:27 BST 2014
state: joining
reconfig: vxconfigd in join
Tue May 13 20:46:28 BST 2014
state: joining
reconfig: vxconfigd in join
Tue May 13 20:46:29 BST 2014
state: joining
reconfig: vxconfigd in join
Tue May 13 20:46:30 BST 2014
state: joining
reconfig: vxconfigd in join
Tue May 13 20:46:31 BST 2014
state: cluster member
Tue May 13 20:46:33 BST 2014
state: cluster member
So this again is in joining state for about 10 seconds and then successfully joins CVM.
So it seems CVM quite quickly determines that, with an FSS diskgroup present, it can't see a disk (presumably the remote disk).
Do you know why this might be happening?
Thanks
Mike
05-13-2014 08:57 PM
Mike,
Please refer to the SFHA 6.1 Solutions Guide (http://www.symantec.com/docs/DOC6982) for the FSS limitations, and check whether your configuration complies with the FSS requirements. Is your storage SCSI-3 compliant?
Here are the limitations:
■ FSS is only supported on clusters of up to 8 nodes.
■ Disk initialization operations should be performed only on nodes with local
connectivity to the disk.
■ FSS does not support the use of boot disks, opaque disks, and non-VxVM disks
for network sharing.
■ Hot-relocation is disabled on FSS disk groups.
■ The vxresize operation is not supported on volumes and file systems from the
slave node.
■ FSS does not support non-SCSI3 disks connected to multiple hosts.
■ Dynamic LUN Expansion (DLE) is not supported.
■ FSS only supports instant data change objects (DCO), created using the vxsnap
operation or by specifying "logtype=dco dcoversion=20" attributes during volume
creation.
■ By default, creating a mirror between an SSD and an HDD is not supported through
vxassist, as the underlying mediatypes are different. To work around this issue,
you can create a volume with one mediatype, for instance the HDD, which is
the default mediatype, and then later add a mirror on the SSD.
For example:
# vxassist -g diskgroup make volume size init=none
# vxassist -g diskgroup mirror volume mediatype:ssd
# vxvol -g diskgroup init active volume
See the "Administering mirrored volumes using vxassist" section in the Symantec
Storage Foundation Cluster File System High Availability Administrator's Guide or
the Symantec Storage Foundation for Oracle RAC Administrator's Guide.
Also check your private interconnects, as all the metadata exchanges happen over them in an FSS configuration. I recommend you check that your LLT links and their settings (speed, autoneg, MTU, switch settings, etc.) are as desired.
Please paste the output of:
# ifconfig -a
05-14-2014 12:35 AM
Hi Novonil,
My disks are not SCSI3, but they are also not "connected to multiple hosts". I guess this is the point of FSS: you don't need disks in an array; you can use local disks (which probably are not going to be SCSI3).
Just to clarify, as in my opening post, I am testing FSS in VirtualBox, so the 2 VMs are running on my laptop using virtual networks and virtual disks. So I am not expecting this to be supported, as it is not supportable from a redundancy point of view (my laptop is a SPOF) or a data-protection point of view (I am using fencing in disabled mode, which gives me no protection against split brain, so in a real environment I should be using I/O fencing or CPS).
FSS was demonstrated to me at Vision running on virtual hardware, which I believe was VMware ESX, and which I presume was using vmdks, which are not SCSI3.
I installed SFCFS HA on Node A and then cloned it to create Node B, so they should be identically configured - below is output of ifconfig -a:
Node A:
[root@r55v61a log]# ifconfig -a
eth0      Link encap:Ethernet  HWaddr 08:00:27:7A:FF:1E
          inet addr:192.168.56.51  Bcast:192.168.56.255  Mask:255.255.255.0
          inet6 addr: fe80::a00:27ff:fe7a:ff1e/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:9744 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7611 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:806345 (787.4 KiB)  TX bytes:1783862 (1.7 MiB)

eth0:0    Link encap:Ethernet  HWaddr 08:00:27:7A:FF:1E
          inet addr:192.168.56.55  Bcast:192.168.56.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth1      Link encap:Ethernet  HWaddr 08:00:27:C9:3A:39
          inet6 addr: fe80::a00:27ff:fec9:3a39/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:113984 errors:0 dropped:0 overruns:0 frame:0
          TX packets:121992 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:15288510 (14.5 MiB)  TX bytes:18173014 (17.3 MiB)

eth2      Link encap:Ethernet  HWaddr 08:00:27:1A:E8:66
          inet6 addr: fe80::a00:27ff:fe1a:e866/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:114602 errors:0 dropped:0 overruns:0 frame:0
          TX packets:121423 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:15335104 (14.6 MiB)  TX bytes:17804072 (16.9 MiB)

eth3      Link encap:Ethernet  HWaddr 08:00:27:B8:54:AE
          inet addr:192.168.57.51  Bcast:192.168.57.255  Mask:255.255.255.0
          inet6 addr: fe80::a00:27ff:feb8:54ae/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:12 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:720 (720.0 b)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:17 errors:0 dropped:0 overruns:0 frame:0
          TX packets:17 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1796 (1.7 KiB)  TX bytes:1796 (1.7 KiB)

sit0      Link encap:IPv6-in-IPv4
          NOARP  MTU:1480  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
Node B:
[root@r55v61b bin]# ifconfig -a
eth0      Link encap:Ethernet  HWaddr 08:00:27:68:CC:26
          inet addr:192.168.56.52  Bcast:192.168.56.255  Mask:255.255.255.0
          inet6 addr: fe80::a00:27ff:fe68:cc26/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:23233 errors:0 dropped:0 overruns:0 frame:0
          TX packets:21826 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1899658 (1.8 MiB)  TX bytes:10908440 (10.4 MiB)

eth1      Link encap:Ethernet  HWaddr 08:00:27:2F:39:51
          inet6 addr: fe80::a00:27ff:fe2f:3951/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:140787 errors:0 dropped:0 overruns:0 frame:0
          TX packets:132047 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:21324780 (20.3 MiB)  TX bytes:19628051 (18.7 MiB)

eth2      Link encap:Ethernet  HWaddr 08:00:27:15:CD:E8
          inet6 addr: fe80::a00:27ff:fe15:cde8/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:140200 errors:0 dropped:0 overruns:0 frame:0
          TX packets:132810 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:21113691 (20.1 MiB)  TX bytes:19811643 (18.8 MiB)

eth3      Link encap:Ethernet  HWaddr 08:00:27:00:F7:01
          inet addr:192.168.57.52  Bcast:192.168.57.255  Mask:255.255.255.0
          inet6 addr: fe80::a00:27ff:fe00:f701/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:6 errors:0 dropped:0 overruns:0 frame:0
          TX packets:12 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:360 (360.0 b)  TX bytes:720 (720.0 b)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:38 errors:0 dropped:0 overruns:0 frame:0
          TX packets:38 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:4040 (3.9 KiB)  TX bytes:4040 (3.9 KiB)

sit0      Link encap:IPv6-in-IPv4
          NOARP  MTU:1480  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
FSS and CVM are working in the sense that I can mount a filesystem on both nodes and write to it from both nodes; it is just the starting of CVM that is not working for FSS. See below:
Node A:
[root@r55v61a ~]# df
Filesystem                    1K-blocks    Used Available Use% Mounted on
/dev/sda2                       8022104 1531008   6077024  21% /
/dev/sda1                        101086   11792     84075  13% /boot
tmpfs                            206120       0    206120   0% /dev/shm
/dev/vx/dsk/fss-dg/fssvol1        16384    5488     10478  35% /fss1
/dev/vx/dsk/cvm-dg/volshare1       5120    3820      1300  75% /share1
[root@r55v61a ~]# echo test > /fss1/created_from_a
[root@r55v61a ~]# echo test > /share1/created_from_a
Node B:
[root@r55v61b /]# df
Filesystem                    1K-blocks    Used Available Use% Mounted on
/dev/sda2                       8022104 1534184   6073848  21% /
/dev/sda1                        101086   11792     84075  13% /boot
tmpfs                            190632       0    190632   0% /dev/shm
/dev/vx/dsk/fss-dg/fssvol1        16384    5488     10478  35% /fss1
/dev/vx/dsk/cvm-dg/volshare1       5120    3820      1300  75% /share1
[root@r55v61b /]# echo test > /fss1/created_from_b
[root@r55v61b /]# echo test > /share1/created_from_b
[root@r55v61b /]# ls -l /fss1 /share1
/fss1:
total 2
-rw-r--r-- 1 root root  5 May 13 23:25 created_from_a
-rw-r--r-- 1 root root  5 May 13 23:26 created_from_b
drwxr-xr-x 2 root root 96 May 13 22:55 lost+found

/share1:
total 2
-rw-r--r-- 1 root root  5 May 13 23:25 created_from_a
-rw-r--r-- 1 root root  5 May 13 23:26 created_from_b
drwxr-xr-x 2 root root 96 May 10 14:13 lost+found
Node A:
[root@r55v61a ~]# ls -l /fss1 /share1
/fss1:
total 2
-rw-r--r-- 1 root root  5 May 13 23:25 created_from_a
-rw-r--r-- 1 root root  5 May 13  2014 created_from_b
drwxr-xr-x 2 root root 96 May 13 22:55 lost+found

/share1:
total 2
-rw-r--r-- 1 root root  5 May 13 23:25 created_from_a
-rw-r--r-- 1 root root  5 May 13  2014 created_from_b
drwxr-xr-x 2 root root 96 May 10 14:13 lost+found
Mike
05-14-2014 10:54 PM
Mike,
Please follow this deployment guide for configuring SFHA on VMware vmdks, just to ensure you are not missing anything:
https://www-secure.symantec.com/connect/articles/storage-foundation-cluster-file-system-ha-vmware-vmdk-deployment-guide
05-15-2014 01:32 AM
I have had a look at this, but it is mostly not applicable, because FSS, in terms of writing to the disks, is actually working too; what is not working is CVM membership. How is this meant to work with FSS? Here is my understanding of what happens when a system tries to join the membership:
In regular CVM, the joining system asks the CVM master what disks it has in its shared disk groups, and the join is only successful if the joining system sees all of these disks.
For FSS with exported disks on the master but NO shared diskgroups: when the joining system asks the CVM master what disks it has in its shared disk groups, the master replies that it has no shared diskgroups, so the joiner joins, and ONLY after it joins does it see the exported disks.
For FSS with exported disks on the master in a shared diskgroup: when the joining system asks the CVM master what disks it has in its shared disk groups, the master replies that it has a disk (an exported disk), but the joining system won't see this disk as it is not a member yet. So what appears to be happening is that the joining system reports it can't see this disk and won't join.
Clearly this can't be how FSS works, but this appears to be what is happening, as the joining system IS reporting "Disk for disk group not found". Can you elaborate on how the joining process works?
Mike
05-15-2014 03:47 AM
I have made some progress:
If you look at my opening post you will see I only have one disk in the fss-dg diskgroup, which is from node A; so I added the exported /dev/sdd disk on node B to the fss-dg diskgroup, so that the diskgroup now contains a disk from each system.
Initially this did not help: when restarting CVM on node B, it still couldn't join. I then stopped CVM on node A. When I restarted CVM on node A, I expected it not to start, because fss-dg had an exported disk from node B, but CVM did start without any errors; however it doesn't import fss-dg, as I guess it does not have all the disks (it is missing the exported disk from node B). Then I started CVM on node B and it started, as I guess fss-dg was not imported. After CVM started on node B, both systems could see all disks in fss-dg, and fss-dg auto-imported. I then tried restarting CVM on node B and it won't start, as before.
I then stopped CVM on both nodes and started CVM on node A; the diskgroup does not import, as above. I then tried to online the CFS service group (created using cfsmntadm) and this does not online, as I expected, since the CVMVolDg resource only activates the shared diskgroup; it does not import it. This is an issue for a campus cluster: if only one node comes up when the cluster starts, you won't be able to access your storage. But FSS is supposed to work with campus clusters, as per the Solutions Guide:
FSS lets you configure an Active/Active campus cluster configuration with nodes
across the site. Network sharing of local storage and mirroring across sites provides
a disaster recovery solution without requiring the cost and complexity of Fibre
Channel connectivity across sites.
So I need to understand how this is meant to work so I can figure out what is going wrong in my set-up. I'll start a new post to find out how campus clusters work with FSS.
I've opened a new post:
https://www-secure.symantec.com/connect/forums/how-does-fss-work-campus-cluster
Mike
05-15-2014 05:43 AM
Here is how FSS works in the case where node B is joining and it earlier had a remote disk formed from an exported disk on node A.
Step 1: During the join process on node B, node B receives the exported-disk list from node A. (If there are more nodes in the cluster, the master consolidates the lists from all other nodes and sends the result to the joiner.)
Step 2: Node B then forms remote disks on itself for the exported disks sent by node A.
Step 3: The other nodes in the cluster (apart from the joiner, i.e. node B) also form remote disks, if there are any pre-exported disks from node B.
Step 4: Node B then proceeds with the regular import process. As it has the remote disks by the time it reaches this stage, the requirement for the join is satisfied and the disk group is imported.
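The steps above can be sketched as a toy script (names are illustrative; the real work happens inside CVM/vxconfigd over the cluster interconnect):

```shell
#!/bin/sh
# Toy sketch of the FSS join flow in steps 1-4 above.
# Step 1: the exported-disk list the master (node A) sends the joiner.
master_host="A"
exported_list="sdd"

# Step 2: the joiner forms a remote disk for each exported disk,
# named <exporting-host>_<device>, matching the A_sdd device seen
# in "vxdisk list" on node B earlier in the thread.
remote_disks=""
for dev in $exported_list; do
    remote_disks="$remote_disks ${master_host}_${dev}"
done

# Step 4: with the remote disks present, the dg import can proceed.
echo "remote disks formed on joiner:$remote_disks"
```

If the join fails with "Disk for disk group not found", the implication is that step 2 (remote-disk formation) did not happen for some disk in the dg.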
----
What you are seeing on this setup does not look like expected behaviour; node B's join should succeed in your case.
The suspected reason the join is failing is that remote-disk creation on node B is somehow not happening.
05-16-2014 01:01 AM
This is now working, but not really sure why:
I took the system down, removed all shared disks and booted with just the local disks, and the problem went away. I then took the system down again and added back just the single shared disk containing the "normal" cvm disk group, and the problem came back.
I then destroyed the cvm disk group and the problem went away. I re-created the cvm disk group and the problem did not come back. I then took the system down, added all the disks back in, and the problem still did not come back.
So re-creating the "normal" cvm disk group (not the fss disk group) seems to have fixed the problem. I suspect it is related to https://www-secure.symantec.com/connect/forums/udidmismatch-using-rhel5u5-sf61-virtual-box where VirtualBox has a device-specific (not host-specific) udid, which can cause duplicate udids if the disk controllers are discovered in a different order after a reboot, so maybe the disk in the fss disk group got the same udid as the cvm disk group.
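My suspicion can be sketched as follows (illustrative only; the real udid is URL-encoded, e.g. %5F for "_", and the helper below is hypothetical):

```shell
#!/bin/sh
# Sketch of the suspected duplicate-udid scenario: OTHER_DISKS udids
# are built from hostname + device path, so the udid follows the
# /dev name, not the physical disk.
make_udid() {
    # mimics the VBOX_HARDDISK_OTHER_DISKS_<host>_<path> pattern
    # seen in the vxdisk output later in this thread
    echo "VBOX_HARDDISK_OTHER_DISKS_${1}_${2}"
}

# Before the reboot: the fss disk is /dev/sdd, and this udid gets
# stamped into its private region.
stamped_fss_udid=$(make_udid r55v61a.localdomain /dev/sdd)

# After the reboot the controllers enumerate in a different order and
# a cvm-dg disk now appears as /dev/sdd, so discovery constructs for
# it the very udid the fss disk already carries:
cvm_disk_udid=$(make_udid r55v61a.localdomain /dev/sdd)

[ "$stamped_fss_udid" = "$cvm_disk_udid" ] && echo "duplicate udid"
```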
Mike
05-16-2014 02:39 AM
Hi Mike,
I did some testing and have now read your latest post. When using VMware, one thing to do is to set the enableUUID flag to true. It is clear in your case it is not set, because the disks are listed as sdb instead of disk_0. Please set disk.EnableUUID to "TRUE" and give it another try.
I have been using that kind of environment very frequently with no issues. In fact, it is similar to what you used last week in the FSS lab at VISION. As you stated, there is no need for SCSI3 here, and no need for the multi-write flag at all. But there is a need for enableUUID.
I tried to reproduce the issue in my lab just in case, but I had no success. This is what I did:
Down is my CVM master node (this should be irrelevant, but just in case):
down~> vxdctl -c mode
mode: enabled: cluster active - MASTER
master: down
And I have a three node cluster:
down~> vxclustadm nidmap
Name      CVM Nid    CM Nid    State
down      1          0         Joined: Master
strange   2          2         Joined: Slave
up        0          1         Joined: Slave
I am going to export one of the local disks from the down server:
down~> vxdisk export disk_18
It is exported. I also have visibility of it from the other nodes:
down~> vxdisk list | grep disk_18
disk_18      auto:cdsdisk    -    -    online exported
up_disk_18   auto:cdsdisk    -    -    online remote
down~>
I am going to create a DG using only that disk as you did:
down~> vxdg -s -o fss init mikedg fd1_La=disk_18
down~>
Here the DG:
down~> vxdisk -g mikedg list
DEVICE       TYPE            DISK     GROUP    STATUS
disk_18      auto:cdsdisk    fd1_La   mikedg   online exported shared
down~>
I also have visibility from the up node:
up~> vxdisk -g mikedg list
DEVICE         TYPE            DISK     GROUP    STATUS
down_disk_18   auto:cdsdisk    fd1_La   mikedg   online shared remote
up~>
Now I stop VCS on the up node:
up~> hastop -local
This is the current situation:
down~> vxclustadm nidmap
Name      CVM Nid    CM Nid    State
down      1          0         Joined: Master
strange   2          2         Joined: Slave
up        0          1         Out of Cluster
down~>
Because up is now out of the cluster, it only has local storage visibility:
up~> vxdisk list
DEVICE        TYPE            DISK      GROUP    STATUS
disk_0        auto:cdsdisk    -         -        online exported shared
disk_1        auto:cdsdisk    -         -        online exported shared
disk_2        auto:cdsdisk    -         -        online exported shared
disk_3        auto:cdsdisk    -         -        online exported shared
disk_4        auto:cdsdisk    -         -        online exported shared
disk_5        auto:cdsdisk    -         -        online exported shared
disk_6        auto:cdsdisk    -         -        online exported shared
disk_7        auto:cdsdisk    -         -        online exported shared
disk_8        auto:cdsdisk    -         -        online exported shared
disk_9        auto:cdsdisk    -         -        online exported
disk_10       auto:cdsdisk    -         -        online exported
disk_11       auto:cdsdisk    -         -        online exported
disk_12       auto:cdsdisk    -         -        online exported
disk_13       auto:cdsdisk    -         -        online exported
disk_14       auto:cdsdisk    -         -        online exported
disk_15       auto:cdsdisk    -         -        online exported
disk_16       auto:cdsdisk    -         -        online exported
disk_17       auto:cdsdisk    -         -        online exported
disk_18       auto:cdsdisk    -         -        online exported
disk_19       auto:cdsdisk    -         -        online exported
disk_20       auto:cdsdisk    -         -        online exported
disk_21       auto:cdsdisk    disk_21   gold02   online
disk_22       auto:cdsdisk    disk_22   gold02   online
disk_24       auto:cdsdisk    disk_24   gold02   online
fusionio0_0   auto:cdsdisk    -         -        online ssdtrim exported
sda           auto:none       -         -        online invalid
up~>
Now we start the cluster again, and the up node gets visibility of mikedg again with no issues:
up~> vxdisk -g mikedg list
DEVICE         TYPE            DISK     GROUP    STATUS
down_disk_18   auto:cdsdisk    fd1_La   mikedg   online shared remote
up~>
And I can create a volume with no issues:
up~> vxassist -g mikedg make vol1 100m
up~>
If you can still reproduce the issue after setting the UUID, please send me an email so we can collect some debug logs from vxconfigd.
05-16-2014 04:02 AM
Hi Carlos,
I can't reproduce the issue since recreating the "normal" cvm disk group. I am using VirtualBox, not VMware, and I don't know if there is an equivalent to disk.EnableUUID in VirtualBox - I can't find one in the user manual. I am also confused by the different identifiers - I have created a shared disk for cvm as follows:
<HardDisk uuid="{5b05f304-093f-4631-b360-2989daee7985}" location="/media/mike/Data/VM_Images/Shared_disks/fixr61s8X_16MB.vmdk" format="VMDK" type="Shareable"/>
This disk is shown in the .vbox configuration file for the host (the equivalent of a .vmx file) for both cluster nodes:
<AttachedDevice type="HardDisk" port="8" device="0">
<Image uuid="{5b05f304-093f-4631-b360-2989daee7985}"/>
So this is showing a UUID and the vbox manual says:
VirtualBox assigns a unique identity number (UUID) to each disk image, which is also stored inside the image.
newer Linux distributions identify the boot hard disk from the ID of the drive. The ID VirtualBox reports for a drive is determined from the UUID of the virtual disk image
I am using RH 5.5, so I don't know if this counts as a "newer Linux distribution", as 5.5 is quite old.
In Linux, I can't find this UUID anywhere, but I do have a GUID and a UDID:
From Node A:
[root@r55v61a ~]# vxdisk list sdk | grep id
clusterid: r55v61c1
disk:      name=cd1_X id=1400188876.60.r55v61a
group:     name=cvm-dg id=1400194712.39.r55v61a
flags:     online ready private autoconfig udid_mismatch shared autoimport imported clone_disk
guid:      {d95c0d0c-dc76-11e3-a7b3-9a5a5203d998}
udid:      VBOX%5FHARDDISK%5FOTHER%5FDISKS%5Fr55v61a.localdomain%5F%2Fdev%2Fsdk
[root@r55v61a ~]# vxprint -l cd1_X
Disk group: cvm-dg
Disk:      cd1_X
info:      diskid=1400188876.60.r55v61a
assoc:     device=sdk type=auto
flags:     autoconfig
device:    path=/dev/vx/dmp/sdk3
devinfo:   publen=27392 privlen=1024
mediatype: hdd
udid:      VBOX%5FHARDDISK%5FOTHER%5FDISKS%5Fr55v61b.localdomain%5F%2Fdev%2Fsdk
Node B:
[root@r55v61b ~]# vxdisk list sdk | grep id
clusterid: r55v61c1
disk:      name=cd1_X id=1400188876.60.r55v61a
group:     name=cvm-dg id=1400194712.39.r55v61a
guid:      {d95c0d0c-dc76-11e3-a7b3-9a5a5203d998}
udid:      VBOX%5FHARDDISK%5FOTHER%5FDISKS%5Fr55v61b.localdomain%5F%2Fdev%2Fsdk
[root@r55v61b ~]# vxprint -l cd1_X
Disk group: cvm-dg
Disk:      cd1_X
info:      diskid=1400188876.60.r55v61a
assoc:     device=sdk type=auto
flags:     autoconfig
device:    path=/dev/vx/dmp/sdk3
devinfo:   publen=27392 privlen=1024
mediatype: hdd
udid:      VBOX%5FHARDDISK%5FOTHER%5FDISKS%5Fr55v61b.localdomain%5F%2Fdev%2Fsdk
So I have a UUID defined in the host vbox config file, which is the same for a given shared disk on both hosts, and I have a GUID shown on the host, which is also the same for a shared disk, but the UUID and GUID are different ids. The UDID is yet another identifier, but as in post https://www-secure.symantec.com/connect/forums/udidmismatch-using-rhel5u5-sf61-virtual-box this is a tuple consisting of: Vendor ID, Product ID, Cabinet Serial Number, LUN Serial Number, so I wouldn't expect it to be the same as the UUID. In the above output you can see that the UDID in the private region (shown by vxprint) is the UDID for node B, which is why node A has the "udid_mismatch" flag.
How is the UUID related to the UDID - i.e. how does enabling the UUID in VMware make the Cabinet Serial Number and LUN Serial Number unique for a given disk viewed from different hosts? I looked at a "vxdisk list" from VMware Workstation (not ESX) and this too had a UDID containing the hostname and the disk device path, so I don't understand how enabling a randomly generated UUID will change the Cabinet Serial Number from host-associated to enclosure-associated, and the LUN Serial Number from the disk device path to the LUN id.
Do you know of any Linux or Veritas commands to view the UUID (not UDID) for a disk on the host?
Thanks
Mike
05-16-2014 08:54 AM
I've found out that, at least for disks on the SATA controller in my VirtualBox VM, the disk UUID is being presented to the host, as it is used to construct the disk serial number on the host, but vxdisk list does not use this serial number in the UDID - see https://www-secure.symantec.com/connect/forums/udidmismatch-using-rhel5u5-sf61-virtual-box#comment-1...
Mike
05-19-2014 12:54 AM
Hi Mike,
I am not an expert on UUIDs and UDIDs at all. What I got from some folks is that with VirtualBox the disks are claimed in the OTHER_DISKS category, for which we generate a fake UDID value using the hostname and device name. Since on Linux the device name can change across reboots, this UDID is not reliable and can change.
When you set enableUUID=true on VMware ESXi, the virtual disk gets assigned a SCSI3 unique ID.
Carlos.
05-20-2014 12:45 PM
Hi Mike, Carlos
Regarding the UDID aspect, I think this is due to the OTHER_DISKS classification.
When VxVM does its discovery process, it tries to claim devices against the various ASLs and classify them under the relevant enclosure. If none of the ASLs are applicable (rare these days), it will claim any SCSI3 devices under the JBOD DISKS classification. If it cannot detect SCSI3 devices, it will use the OTHER_DISKS classification.
In the OTHER_DISKS classification, the UDID that is constructed uses the hostname of the system, as you indicated. As the device is being shared, each node will have a different perspective on it (different node names).
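That construction can be sketched like this (hypothetical helper; the pattern is taken from the vxdisk output earlier in the thread, where the real UDID is URL-encoded, e.g. VBOX%5FHARDDISK%5FOTHER%5FDISKS%5F...):

```shell
#!/bin/sh
# Sketch of the OTHER_DISKS udid construction: hostname + device path.
make_udid() {
    echo "VBOX_HARDDISK_OTHER_DISKS_${1}_${2}"
}

# The same shared disk, claimed on each node of the cluster:
udid_node_a=$(make_udid r55v61a.localdomain /dev/sdk)
udid_node_b=$(make_udid r55v61b.localdomain /dev/sdk)

echo "$udid_node_a"
echo "$udid_node_b"
# Different hostnames -> different udids for one physical disk, hence
# the udid_mismatch flag on one of the nodes.
```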
Is it possible to configure the virtual machines to have SCSI3-compliant disks?
cheers
tony
05-20-2014 01:30 PM
Thanks Tony - this is what I thought, as I have seen OTHER_DISKS on several live production servers for the internal disks. With VirtualBox, even if you use VMDKs, they get presented to the server as VBOX disks, and so they are not recognised by any ASL. I don't believe you can configure SCSI3-compliant virtual disks in VirtualBox or VMware. Anyway, despite the cosmetic udid_mismatch and clone-disk flags, everything seems to work, unless you get udid conflicts when you reboot and the disks come up in a different order.
One problem I have seen is that you can't export a shared disk properly if the udid doesn't match for a given disk that is shared between two hosts, so I had to change the hostname of a server so they were both the same and the disks got the same udid. As VCS uses /etc/VRTSvcs/conf/sysname, this doesn't affect VCS.
So it looks like FSS is an alternative to an RDC using VVR in synchronous mode, which is quite neat - Carlos, is this a valid use case?
Mike