
I/O fencing not working with VCS 5.1 on LDoms 2.0

Fugitive
Level 4

Hello,

 

I have:

1. A Sun T5240 server configured with 4 LDoms, each with 8 GB of memory and 12 vCPUs.

2. Solaris 10 u9 running on the primary domain and all the guest domains.

3. Two guest domains running VCS 5.1 and a couple of Oracle service groups.

 

Everything is running fine, but when I configured I/O fencing on the nodes it does not seem to work. The coordinator disks are 5 GB LUNs from a CLARiiON array. So my question is: does anyone have I/O fencing working on LDoms?

 

And if yes, what am I doing wrong?
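For context, each coordinator LUN is presented to the two guest domains from the primary domain as a virtual disk, roughly along these lines (a sketch only -- the backend device path, volume names, virtual disk service name, and guest domain names here are illustrative):

# ldm add-vdsdev /dev/dsk/c2t1d17s2 fendisk17a@primary-vds0
# ldm add-vdsdev -f /dev/dsk/c2t1d17s2 fendisk17b@primary-vds0    (-f lets the same backend be exported a second time)
# ldm add-vdisk fen_disk17 fendisk17a@primary-vds0 guest1
# ldm add-vdisk fen_disk17 fendisk17b@primary-vds0 guest2

The same kind of export is repeated for the other coordinator LUNs and for the oraDG data LUNs.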

 

Following is the output; I can provide the rest as asked.

 

 

vxdisk list
DEVICE       TYPE            DISK         GROUP        STATUS
c0d0s2       auto:none       -            -            online invalid
emc_clariion0_17 auto:cdsdisk    emc_clariion0_17  vxfendg      online
emc_clariion0_18 auto:cdsdisk    emc_clariion0_18  vxfendg      online
emc_clariion0_19 auto:cdsdisk    emc_clariion0_19  vxfendg      online
emc_clariion0_20 auto:cdsdisk    -            -            online
emc_clariion0_21 auto:cdsdisk    -            -            online
 
 
cat /etc/vxfentab
#
# /etc/vxfentab:
# DO NOT MODIFY this file as it is generated by the
# VXFEN rc script from the file /etc/vxfendg.
#
/dev/vx/rdmp/emc_clariion0_17s2
/dev/vx/rdmp/emc_clariion0_18s2
/dev/vx/rdmp/emc_clariion0_19s2
 
 
 
I have frozen the oraSG group:
 
hastatus -sum
 
-- SYSTEM STATE
-- System               State                Frozen
 
A  Node1       RUNNING              0
A  Node2       RUNNING              0
 
-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State
 
B  oraSG           Node1       Y          N               OFFLINE
B  oraSG           Node2       Y          N               ONLINE
 
-- GROUPS FROZEN
-- Group
 
C  oraSG
 
-- RESOURCES DISABLED
-- Group           Type            Resource
 
H  oraSG           DiskGroup       oraDG
H  oraSG           IP              oraIP
H  oraSG           Mount           oraMNT
H  oraSG           NIC             oraNIC
H  oraSG           Netlsnr         oraLSN
H  oraSG           Oracle          OraSER
H  oraSG           Volume          oraVOL
 
 

11 REPLIES

Gaurav_S
Moderator

Is this the first time you are trying to configure it, or was it working before? Is the PR bit enabled on the storage end?

Have you tried running the "vxfentsthdw" utility to see if all tests pass? Please note, this utility should be run on a blank disk (not on a data disk) as it may erase data. A typical invocation is sketched after the link below.

See the below post on how vxfentsthdw works:

 

https://www-secure.symantec.com/connect/forums/test-disk-io-fencing-vxfentsthdw
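For reference, a typical run looks something like this (a sketch only -- the exact options vary by release, so check the vxfentsthdw manual page; the file name is illustrative):

# vxfentsthdw -m                       (interactive mode: prompts for the two node names and the disk to test)
# vxfentsthdw -r -f /tmp/disk_list     (-r runs the non-destructive, read-only tests against the node/disk combinations listed in the file)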

 

Gaurav

Fugitive
Level 4

Yes, this is the first time I'm trying to configure fencing on LDoms.

I tried vxfentsthdw and got the following result:

 

########################################################################################

 

Testing Node1 /dev/vx/rdmp/emc_clariion0_17s2 Node2 /dev/vx/rdmp/emc_clariion0_17s2
 
Evaluate the disk before testing  ........................... Pre-existing keys
 
There are VERITAS I/O Fencing keys on the disk, please make sure that
I/O Fencing is shut down on all nodes of the cluster before continuing.
 
         ******** WARNING!!!!!!!! ********
 
THIS SCRIPT CAN ONLY BE USED IF THERE ARE NO OTHER ACTIVE NODES IN
THE CLUSTER!  VERIFY ALL OTHER NODES ARE POWERED OFF OR INCAPABLE OF
ACCESSING SHARED STORAGE.
 
If this is not the case, data corruption will result.
 
Do you still want to continue : [y/n] (default: n) y
RegisterIgnoreKeys on disk /dev/vx/rdmp/emc_clariion0_17s2 from node Node1  Passed
Clear PGR on node Node1 ....................................... Passed
RegisterIgnoreKeys on disk /dev/vx/rdmp/emc_clariion0_17s2 from node Node1  Passed
Verify registrations for disk /dev/vx/rdmp/emc_clariion0_17s2 on node Node1  Passed
RegisterIgnoreKeys on disk /dev/vx/rdmp/emc_clariion0_17s2 from node Node2  Passed
Verify registrations for disk /dev/vx/rdmp/emc_clariion0_17s2 on node Node2  Passed
Unregister keys on disk /dev/vx/rdmp/emc_clariion0_17s2 from node Node1  Passed
Verify registrations for disk /dev/vx/rdmp/emc_clariion0_17s2 on node Node2  Failed
 
Unregistration test for disk  failed on node Node2.
         Unregistration from one node is causing unregistration of keys from the other node.
        Disk  is not SCSI-3 compliant on node Node2.
        Execute the utility vxfentsthdw again and if failure persists contact
        the vendor for support in enabling SCSI-3 persistent reservations
 
diskpath.2011-01-27_09.50.12.Node1                                                                                                    100%   32     0.0KB/s   00:00
disklist.2011-01-27_09.50.12.Node1                                                                                                    100%   51     0.1KB/s   00:00
diskpathlist.2011-01-27_09.50.12.Node1                                                                                                100%   96     0.1KB/s   00:00
diskpath.2011-01-27_09.50.12.Node2                                                                                                    100%   32     0.0KB/s   00:00
 
Removing test keys and temporary files, if any...
 
 
########################################################################################
 
 
 
 
 
And the other thing is that after fencing is configured, it shows keys from only one node on the coordinator disks.
Do you know why that is?
 
 
 
 
vxfenadm -s all -f /etc/vxfentab
 
Device Name: /dev/vx/rdmp/emc_clariion0_19s2
Total Number Of Keys: 1
key[0]:
        [Numeric Format]:  86,70,48,48,48,49,48,48
        [Character Format]: VF000100
   *    [Node Format]: Cluster ID: 1     Node ID: 0   Node Name: Node1
 
Device Name: /dev/vx/rdmp/emc_clariion0_18s2
Total Number Of Keys: 1
key[0]:
        [Numeric Format]:  86,70,48,48,48,49,48,48
        [Character Format]: VF000100
   *    [Node Format]: Cluster ID: 1     Node ID: 0   Node Name: Node1
 
Device Name: /dev/vx/rdmp/emc_clariion0_17s2
Total Number Of Keys: 1
key[0]:
        [Numeric Format]:  86,70,48,48,48,49,48,48
        [Character Format]: VF000100
   *    [Node Format]: Cluster ID: 1     Node ID: 0   Node Name: Node1
 
 
 
 
 
 

Gaurav_S
Moderator

Hello,

To me it doesn't look like a supportability issue... you can see that keys already exist... I am not sure what -s is reading (will find out though). Can you give the output of these:

# vxfenadm -g all -f /etc/vxfentab    (-g reads registrations and -r reads reservations; coordinator disks will only ever have registrations)

Also, what is the output of:

# gabconfig -a   (from both nodes)

# cat /etc/vxfenmode (from both nodes)

# cat /etc/vxfendg (from both nodes)

# modinfo |grep -i vx
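It is also worth confirming that both nodes really see the same coordinator LUNs. Something along these lines on each node, comparing the output (device name taken from your /etc/vxfentab):

# vxfenadm -i /dev/vx/rdmp/emc_clariion0_17s2    (the Vendor id, Product id and Serial Number should match across both nodes)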

 

Gaurav

Fugitive
Level 4

 

Node1#  vxfenadm -g all -f /etc/vxfentab
VXFEN vxfenadm WARNING V-11-2-2414 This option is deprecated and would be removed with the next release.
Please use the -s option.
 
Device Name: /dev/vx/rdmp/emc_clariion0_19s2
Total Number Of Keys: 1
key[0]:
        Key Value [Numeric Format]:  86,70,48,48,48,49,48,49
        Key Value [Character Format]: VF000101
 
Device Name: /dev/vx/rdmp/emc_clariion0_18s2
Total Number Of Keys: 1
key[0]:
        Key Value [Numeric Format]:  86,70,48,48,48,49,48,49
        Key Value [Character Format]: VF000101
 
Device Name: /dev/vx/rdmp/emc_clariion0_17s2
Total Number Of Keys: 1
key[0]:
        Key Value [Numeric Format]:  86,70,48,48,48,49,48,49
        Key Value [Character Format]: VF000101
 
Node1# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen   9ae107 membership 01
Port b gen   9ae111 membership 01
Port h gen   9ae114 membership 01
Node1# cat /etc/vxfenmode
#
# vxfen_mode determines in what mode VCS I/O Fencing should work.
#
# available options:
# scsi3      - use scsi3 persistent reservation disks
# customized - use script based customized fencing
# sybase     - use scsi3 disks in kernel but coordinate membership with Sybase ASE
# disabled   - run the driver but don't do any actual fencing
#
vxfen_mode=scsi3
 
#
# scsi3_disk_policy determines the way in which I/O Fencing communicates with
# the coordination disks.
#
# available options:
# dmp - use dynamic multipathing
# raw - connect to disks using the native interface
#
scsi3_disk_policy=dmp
 
Node1# cat /etc/vxfendg
vxfendg
Node1# modinfo | grep -i vx
 47 7bf5c000  4d358 303   1  vxdmp (VxVM 5.1 DMP Driver)
 49 7ba00000 1f8df8 304   1  vxio (VxVM 5.1 I/O driver)
 51 7bfa1070    d38 305   1  vxspec (VxVM 5.1 control/status driver)
217 7bbf5210    bf0 306   1  vxportal (VxFS 5.1_REV-7Oct2009 portal dr)
218 7a600000 1ca6f0  21   1  vxfs (VxFS 5.1_REV-7Oct2009 SunOS 5.1)
221 7aaec000  64270 310   1  vxfen (VRTS Fence 5.1)
236 7a7c4000   a0a0 307   1  fdd (VxQIO 5.1_REV-7Oct2009 Quick I/)
Node1#
 
 
*************************************************************
*************************************************************
 
From Node2 
 
 
Node2# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen   9ae107 membership 01
Port b gen   9ae111 membership 01
Port h gen   9ae114 membership 01
Node2# cat /etc/vxfenmode
#
# vxfen_mode determines in what mode VCS I/O Fencing should work.
#
# available options:
# scsi3      - use scsi3 persistent reservation disks
# customized - use script based customized fencing
# sybase     - use scsi3 disks in kernel but coordinate membership with Sybase ASE
# disabled   - run the driver but don't do any actual fencing
#
vxfen_mode=scsi3
 
#
# scsi3_disk_policy determines the way in which I/O Fencing communicates with
# the coordination disks.
#
# available options:
# dmp - use dynamic multipathing
# raw - connect to disks using the native interface
#
scsi3_disk_policy=dmp
 
Node2# cat /etc/vxfendg
vxfendg
Node2#
# modinfo | grep -i vx
 47 7bf5c000  4d358 303   1  vxdmp (VxVM 5.1 DMP Driver)
 49 7ba00000 1f8df8 304   1  vxio (VxVM 5.1 I/O driver)
 51 7bfa1070    d38 305   1  vxspec (VxVM 5.1 control/status driver)
217 7bbf5210    bf0 306   1  vxportal (VxFS 5.1_REV-7Oct2009 portal dr)
218 7a600000 1ca6f0  21   1  vxfs (VxFS 5.1_REV-7Oct2009 SunOS 5.1)
221 7b722000  64270 310   1  vxfen (VRTS Fence 5.1)
236 7a7c4000   a0a0 307   1  fdd (VxQIO 5.1_REV-7Oct2009 Quick I/)
 
 

Gaurav_S
Moderator

This is more concerning from the above output:

Unregistration test for disk  failed on node Node2.
         Unregistration from one node is causing unregistration of keys from the other node.
        Disk  is not SCSI-3 compliant on node Node2.
 
Since keys from one node exist on the coordinator disks, it is hard to doubt the storage. The only thing to check then is that all the zoning from the SAN/switches and the settings are the same for both nodes. Still, I would suggest you double-check the storage settings for both nodes.
 
The outputs are OK except for the keys on the coordinator disks... What is the node ID as per /etc/llthosts?

# cat /etc/llthosts

It doesn't look right that registration is not even happening on the coordinator disks...

What I would suggest is to take the cluster services down (on the node whose keys are not registered), then stop and restart the fencing module. I want to see whether the fencing module throws any errors while starting up; registrations are placed on the coordinator disks when the fencing module starts. Once I have the llthosts output, I can tell you which node to try this on. A consolidated sequence is sketched after the individual commands below.

# hastop -local -force    (if you want applications to keep running) or else hastop -all (if you are OK to shut down all services)

# /etc/init.d/vxfen stop    (this should remove port b from the gabconfig -a output)

# /etc/init.d/vxfen start    (watch whether any errors appear here; the coordinator disks should then show keys from both nodes)
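Putting it together, the sequence on the affected node would look roughly like this (a sketch; the gabconfig check and the final hastart are just there to verify the state and rejoin the cluster):

# hastop -local -force               (stop VCS on this node, leave applications running)
# /etc/init.d/vxfen stop             (port b for this node should drop from gabconfig -a)
# gabconfig -a                       (confirm the port b membership change)
# /etc/init.d/vxfen start            (watch the console and system log for fencing errors)
# vxfenadm -s all -f /etc/vxfentab   (each coordinator disk should now show keys from both nodes)
# hastart                            (restart VCS so the node rejoins the cluster)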

 

 

Gaurav

Fugitive
Level 4

Gaurav,

 

Since this is an LDom environment, there is no zoning from the switches; the storage is shared to both guest domains from the primary domain, just like the oraDG disk group. After restarting fencing on both nodes, Node1's registration keys were kicked out, and Node2 now shows this:

 

 

vxfenadm -s all -f /etc/vxfentab
 
Device Name: /dev/vx/rdmp/emc_clariion0_17s2
Total Number Of Keys: 1
key[0]:
        [Numeric Format]:  86,70,48,48,48,49,48,49
        [Character Format]: VF000101
   *    [Node Format]: Cluster ID: 1     Node ID: 1   Node Name: Node2
 
Device Name: /dev/vx/rdmp/emc_clariion0_18s2
Total Number Of Keys: 1
key[0]:
        [Numeric Format]:  86,70,48,48,48,49,48,49
        [Character Format]: VF000101
   *    [Node Format]: Cluster ID: 1     Node ID: 1   Node Name: Node2
 
Device Name: /dev/vx/rdmp/emc_clariion0_19s2
Total Number Of Keys: 1
key[0]:
        [Numeric Format]:  86,70,48,48,48,49,48,49
        [Character Format]: VF000101
   *    [Node Format]: Cluster ID: 1     Node ID: 1   Node Name: Node2
 
 
cat /etc/llthosts
0 Node1
1 Node2
 
 
 
And Node1 panicked and rebooted.
 
 
 
And one more thing: do you know where I can get the lltlink_disable and lltlink_enable scripts? I have lost them, and I need them to test fencing and link failures in the LDom environment.
 
Thanks 

Fugitive
Level 4

 

Can anyone help me with this?

Gaurav_S
Moderator

I missed answering this one. This was seen as a known issue in 5.1:

 

http://sfdoccentral.symantec.com/sf/5.1/solaris/html/sfha_virtualization/ch04s15s01s02.htm

 

Gaurav

Gaurav_S
Moderator   (Accepted Solution)

This is documented in the 5.1SP1 guide too... so this doesn't seem to be fixed yet...

 

https://sort.symantec.com/public/documents/sfha/5.1sp1/solaris/productguides/pdf/sfha_virtualization_51sp1_sol.pdf

 

Guest LDom node shows only 1 PGR key instead of 2 after rejecting the other node in the cluster

(For configuration information, see Figure 5-4 on page 76 of the guide.)

This was observed while performing a series of reboots of the primary and alternate I/O domains on both the physical hosts housing the two guests. At some point one key is reported missing on the coordinator disk. This issue is under investigation. The vxfen driver can still function as long as there is 1 PGR key. This is a low severity issue as it will not cause any immediate interruption. Symantec will update this issue when the root cause is found for the missing key.

Fugitive
Level 4

 

 

Thanks for the update, Gaurav. Marking it as the solution. :)

 

And do you know where I can get the llt_disable/enable scripts?

Gaurav_S
Moderator

I don't have those scripts either... I don't remember, but are those scripts proprietary or intellectual property of Symantec? If yes, then they won't be shared...

 

Gaurav