
mikebounds
Level 6
11 years ago

CVM won't start on remote node with an FSS diskgroup

I am testing FSS (Flexible Shared Storage) with SF 6.1 on RHEL 5.5 in VirtualBox VMs, and when I try to start CVM on the remote node I get:

VCS ERROR V-16-20006-1005 (r55v61b) CVMCluster:cvm_clus:monitor:node - state: out of cluster reason: Disk for disk group not found: retry to add a node failed
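
The failure also shows up as the cvm group and the cvm_clus resource not coming online on node B; the standard VCS commands to check this are, for example:

[root@r55v61b ~]# hares -state cvm_clus -sys r55v61b
[root@r55v61b ~]# hagrp -state cvm -sys r55v61b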

Here is my setup:

Node A is the master, with a local disk (sdd) and a remote disk (B_sdd):

[root@r55v61a ~]# vxdctl -c mode
mode: enabled: cluster active - MASTER
master: r55v61a
[root@r55v61a ~]# vxdisk list
DEVICE       TYPE            DISK         GROUP        STATUS
B_sdd        auto:cdsdisk    -            -            online remote
sda          auto:none       -            -            online invalid
sdb          auto:none       -            -            online invalid
sdc          auto:cdsdisk    -            -            online
sdd          auto:cdsdisk    -            -            online exported

 

Node B is the slave, and sees its local disk (sdd) and a remote disk (A_sdd):

[root@r55v61b ~]# vxdisk list
DEVICE       TYPE            DISK         GROUP        STATUS
A_sdd        auto:cdsdisk    -            -            online remote
sda          auto:none       -            -            online invalid
sdb          auto:none       -            -            online invalid
sdc          auto:cdsdisk    -            -            online
sdd          auto:cdsdisk    -            -            online exported
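
For completeness, the "exported" status on each node's sdd is because I had already exported the local disks for FSS use; from memory (I don't have the exact session saved) that step on each node was simply:

[root@r55v61a ~]# vxdisk export sdd
[root@r55v61b ~]# vxdisk export sdd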

 

On node A, I create an FSS diskgroup from its local disk, so on node A the disk in the group is local:

[root@r55v61a ~]# vxdg -s -o fss=on init fss-dg fd1_La=sdd
[root@r55v61a ~]# vxdisk list
DEVICE       TYPE            DISK         GROUP        STATUS
B_sdd        auto:cdsdisk    -            -            online remote
sda          auto:none       -            -            online invalid
sdb          auto:none       -            -            online invalid
sdc          auto:cdsdisk    -            -            online
sdd          auto:cdsdisk    fd1_La       fss-dg       online exported shared

 

And on node B the disk in fss-dg is remote:

[root@r55v61b ~]# vxdisk list
DEVICE       TYPE            DISK         GROUP        STATUS
A_sdd        auto:cdsdisk    fd1_La       fss-dg       online shared remote
sda          auto:none       -            -            online invalid
sdb          auto:none       -            -            online invalid
sdc          auto:cdsdisk    -            -            online
sdd          auto:cdsdisk    -            -            online exported

 

I then stop and start VCS on node B, which is when I see the issue:

2014/05/13 12:05:23 VCS INFO V-16-2-13716 (r55v61b) Resource(cvm_clus): Output of the completed operation (online) 
==============================================
ERROR: 
==============================================

2014/05/13 12:05:24 VCS ERROR V-16-20006-1005 (r55v61b) CVMCluster:cvm_clus:monitor:node - state: out of cluster
reason: Disk for disk group not found: retry to add a node failed
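
The stop/start on node B was nothing special, just the normal VCS restart, i.e. something like:

[root@r55v61b ~]# hastop -local
[root@r55v61b ~]# hastart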

 

If I destroy the fss-dg diskgroup on node A, then CVM will start on node B, so the issue is with the FSS diskgroup: it seems CVM cannot find the remote disk in the diskgroup.
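
For reference, destroying it is just the usual command run on the master (node A):

[root@r55v61a ~]# vxdg destroy fss-dg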

I can also get round the issue by stopping VCS on node A, after which CVM will start on node B:

[root@r55v61b ~]# hagrp -online cvm -sys r55v61b
[root@r55v61b ~]# vxdisk -o alldgs list
DEVICE       TYPE            DISK         GROUP        STATUS
sda          auto:none       -            -            online invalid
sdb          auto:none       -            -            online invalid
sdc          auto:cdsdisk    -            -            online
sdd          auto:cdsdisk    -            -            online exported

 

 

If I then start VCS on node A, node B is able to see the FSS diskgroup:

[root@r55v61b ~]# vxdisk list
DEVICE       TYPE            DISK         GROUP        STATUS
A_sdd        auto:cdsdisk    fd1_La       fss-dg       online shared remote
sda          auto:none       -            -            online invalid
sdb          auto:none       -            -            online invalid
sdc          auto:cdsdisk    -            -            online
sdd          auto:cdsdisk    -            -            online exported

 

 

I can stop and start VCS on each node when the disks are merely exported, and each node can still see the disk from the other node; but once I create the FSS diskgroup, CVM won't start on the system for which the diskgroup's disk is remote. Does anybody have any ideas as to why?

 

Mike

  • The issue in this post, which was that CVM would not start when an FSS diskgroup was present, giving the error message:

    CVMCluster:cvm_clus:monitor:node - state: out of cluster reason: Disk for disk group not found: retry to add a node failed

    was resolved by recreating the diskgroup as a separate, purely CVM diskgroup (no exported disks). The likely issue was UDID mismatches or conflicts. With non-FSS failover and CVM diskgroups, it would appear that all that is required is for VxVM to read the private region; but with FSS diskgroups, my theory is that the UDID must be used, because an exported disk should only show as a remote disk on other systems if the same disk can NOT be seen on the SAN of the remote system, and VxVM needs the UDID to determine this.

    Hence in VirtualBox, the same disk will normally show as having a different UDID when viewed from different systems. With such a shared disk I did indeed see a single disk presented on one server BOTH via the SAN and as a remote disk. But when I made the UDIDs the same, by changing the hostname of one of the nodes so that both nodes had the same hostname and hence the same constructed UDID, VxVM correctly identified that the remote disk was available via the SAN, and so showed it ONLY as a SAN-attached disk and not also as a remote disk (see the sketch at the end of this reply for how to compare UDIDs).

    Although in my opening post I was not exporting any shared SAN disks (only local disks), I believe the UDID checking when auto-importing the diskgroups caused the issue.

    Mike
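
    For anyone hitting the same thing, here is a rough sketch of what I mean by comparing the UDIDs and by recreating the group as a plain CVM diskgroup. The device and diskgroup names are examples only, not taken from my actual session, and this assumes the same physical disk is seen as sdc on both nodes:

    [root@r55v61a ~]# vxdisk list sdc | grep -i udid     # the udid this node constructs for the disk...
    [root@r55v61b ~]# vxdisk list sdc | grep -i udid     # ...should match the udid the other node constructs for the same physical disk

    [root@r55v61a ~]# vxdg -s init cvm-dg cvmdg01=sdc    # plain shared CVM diskgroup: no "-o fss=on", no exported disks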
