
I/O fencing not working with VCS 5.1 on LDoms 2.0

Fugitive
Level 4

Hello,

 

I have:

1. A Sun T5240 server configured with 4 LDoms, each with 8 GB of memory and 12 vCPUs.

2. Solaris 10 u9 running on the primary domain and all the guest domains.

3. Two guest domains running VCS 5.1 and a couple of Oracle service groups.

 

Everything is running fine, but when I configured I/O fencing on the nodes it does not seem to work. The coordinator disks are 5 GB LUNs from a CLARiiON array. So my question is: does anyone have I/O fencing working on LDoms?

 

And if yes, what am I doing wrong?
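For context, each coordinator LUN is presented to the two guest domains from the primary domain as a virtual disk, roughly along these lines (a sketch only -- the backend device path, volume names, virtual disk service name, and guest domain names here are illustrative):

# ldm add-vdsdev /dev/dsk/c2t1d17s2 fendisk17a@primary-vds0
# ldm add-vdsdev -f /dev/dsk/c2t1d17s2 fendisk17b@primary-vds0    (-f lets the same backend be exported a second time)
# ldm add-vdisk fen_disk17 fendisk17a@primary-vds0 guest1
# ldm add-vdisk fen_disk17 fendisk17b@primary-vds0 guest2

The same kind of export is repeated for the other coordinator LUNs and for the oraDG data LUNs.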

 

Following is the output; I can provide the rest as asked.

 

 

vxdisk list
DEVICE       TYPE            DISK         GROUP        STATUS
c0d0s2       auto:none       -            -            online invalid
emc_clariion0_17 auto:cdsdisk    emc_clariion0_17  vxfendg      online
emc_clariion0_18 auto:cdsdisk    emc_clariion0_18  vxfendg      online
emc_clariion0_19 auto:cdsdisk    emc_clariion0_19  vxfendg      online
emc_clariion0_20 auto:cdsdisk    -            -            online
emc_clariion0_21 auto:cdsdisk    -            -            online
 
 
cat /etc/vxfentab
#
# /etc/vxfentab:
# DO NOT MODIFY this file as it is generated by the
# VXFEN rc script from the file /etc/vxfendg.
#
/dev/vx/rdmp/emc_clariion0_17s2
/dev/vx/rdmp/emc_clariion0_18s2
/dev/vx/rdmp/emc_clariion0_19s2
 
 
 
I have frozen the oraSG group:
 
hastatus -sum
 
-- SYSTEM STATE
-- System               State                Frozen
 
A  Node1       RUNNING              0
A  Node2       RUNNING              0
 
-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State
 
B  oraSG           Node1       Y          N               OFFLINE
B  oraSG           Node2       Y          N               ONLINE
 
-- GROUPS FROZEN
-- Group
 
C  oraSG
 
-- RESOURCES DISABLED
-- Group           Type            Resource
 
H  oraSG           DiskGroup       oraDG
H  oraSG           IP              oraIP
H  oraSG           Mount           oraMNT
H  oraSG           NIC             oraNIC
H  oraSG           Netlsnr         oraLSN
H  oraSG           Oracle          OraSER
H  oraSG           Volume          oraVOL
 
 

11 REPLIES

Gaurav_S
Moderator

Is this the first time you are trying to configure it, or was it working before? Is the PR bit enabled on the storage end?

Have you tried running the "vxfentsthdw" utility to see if all tests pass? Please note, this utility should be run on a blank disk (not on a data disk) as it may erase data. A typical invocation is sketched after the link below.

See the below post on how vxfentsthdw works:

 

https://www-secure.symantec.com/connect/forums/test-disk-io-fencing-vxfentsthdw
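For reference, a typical run looks something like this (a sketch only -- the exact options vary by release, so check the vxfentsthdw manual page; the file name is illustrative):

# vxfentsthdw -m                       (interactive mode: prompts for the two node names and the disk to test)
# vxfentsthdw -r -f /tmp/disk_list     (-r runs the non-destructive, read-only tests against the node/disk combinations listed in the file)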

 

Gaurav

Fugitive
Level 4

Yes, this is the first time I'm trying to configure fencing on LDoms.

I tried vxfentsthdw and got the following result:

 

########################################################################################

 

Testing Node1 /dev/vx/rdmp/emc_clariion0_17s2 Node2 /dev/vx/rdmp/emc_clariion0_17s2
 
Evaluate the disk before testing  ........................... Pre-existing keys
 
There are VERITAS I/O Fencing keys on the disk, please make sure that
I/O Fencing is shut down on all nodes of the cluster before continuing.
 
         ******** WARNING!!!!!!!! ********
 
THIS SCRIPT CAN ONLY BE USED IF THERE ARE NO OTHER ACTIVE NODES IN
THE CLUSTER!  VERIFY ALL OTHER NODES ARE POWERED OFF OR INCAPABLE OF
ACCESSING SHARED STORAGE.
 
If this is not the case, data corruption will result.
 
Do you still want to continue : [y/n] (default: n) y
RegisterIgnoreKeys on disk /dev/vx/rdmp/emc_clariion0_17s2 from node Node1  Passed
Clear PGR on node Node1 ....................................... Passed
RegisterIgnoreKeys on disk /dev/vx/rdmp/emc_clariion0_17s2 from node Node1  Passed
Verify registrations for disk /dev/vx/rdmp/emc_clariion0_17s2 on node Node1  Passed
RegisterIgnoreKeys on disk /dev/vx/rdmp/emc_clariion0_17s2 from node Node2  Passed
Verify registrations for disk /dev/vx/rdmp/emc_clariion0_17s2 on node Node2  Passed
Unregister keys on disk /dev/vx/rdmp/emc_clariion0_17s2 from node Node1  Passed
Verify registrations for disk /dev/vx/rdmp/emc_clariion0_17s2 on node Node2  Failed
 
Unregistration test for disk  failed on node Node2.
         Unregistration from one node is causing unregistration of keys from the other node.
        Disk  is not SCSI-3 compliant on node Node2.
        Execute the utility vxfentsthdw again and if failure persists contact
        the vendor for support in enabling SCSI-3 persistent reservations
 
diskpath.2011-01-27_09.50.12.Node1                                                                                                    100%   32     0.0KB/s   00:00
disklist.2011-01-27_09.50.12.Node1                                                                                                    100%   51     0.1KB/s   00:00
diskpathlist.2011-01-27_09.50.12.Node1                                                                                                100%   96     0.1KB/s   00:00
diskpath.2011-01-27_09.50.12.Node2                                                                                                    100%   32     0.0KB/s   00:00
 
Removing test keys and temporary files, if any...
 
 
########################################################################################
 
 
 
 
 
And the other thing is that after fencing is configured, it shows keys from only one node on the coordinator disks.
Do you know why that is?
 
 
 
 
vxfenadm -s all -f /etc/vxfentab
 
Device Name: /dev/vx/rdmp/emc_clariion0_19s2
Total Number Of Keys: 1
key[0]:
        [Numeric Format]:  86,70,48,48,48,49,48,48
        [Character Format]: VF000100
   *    [Node Format]: Cluster ID: 1     Node ID: 0   Node Name: Node1
 
Device Name: /dev/vx/rdmp/emc_clariion0_18s2
Total Number Of Keys: 1
key[0]:
        [Numeric Format]:  86,70,48,48,48,49,48,48
        [Character Format]: VF000100
   *    [Node Format]: Cluster ID: 1     Node ID: 0   Node Name: Node1
 
Device Name: /dev/vx/rdmp/emc_clariion0_17s2
Total Number Of Keys: 1
key[0]:
        [Numeric Format]:  86,70,48,48,48,49,48,48
        [Character Format]: VF000100
   *    [Node Format]: Cluster ID: 1     Node ID: 0   Node Name: Node1
 
 
 
 
 
 

Gaurav_S
Moderator

Hello,

To me it doesn't look like a supportability issue... you can see that keys already exist... I am not sure what -s is reading (will find out though). Can you give the output of these:

# vxfenadm -g all -f /etc/vxfentab    (-g reads registrations and -r reads reservations; coordinator disks will only ever have registrations)

Also, what is the output of:

# gabconfig -a   (from both nodes)

# cat /etc/vxfenmode (from both nodes)

# cat /etc/vxfendg (from both nodes)

# modinfo |grep -i vx
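It is also worth confirming that both nodes really see the same coordinator LUNs. Something along these lines on each node, comparing the output (device name taken from your /etc/vxfentab):

# vxfenadm -i /dev/vx/rdmp/emc_clariion0_17s2    (the Vendor id, Product id and Serial Number should match across both nodes)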

 

Gaurav

Fugitive
Level 4

 

Node1#  vxfenadm -g all -f /etc/vxfentab
VXFEN vxfenadm WARNING V-11-2-2414 This option is deprecated and would be removed with the next release.
Please use the -s option.
 
Device Name: /dev/vx/rdmp/emc_clariion0_19s2
Total Number Of Keys: 1
key[0]:
        Key Value [Numeric Format]:  86,70,48,48,48,49,48,49
        Key Value [Character Format]: VF000101
 
Device Name: /dev/vx/rdmp/emc_clariion0_18s2
Total Number Of Keys: 1
key[0]:
        Key Value [Numeric Format]:  86,70,48,48,48,49,48,49
        Key Value [Character Format]: VF000101
 
Device Name: /dev/vx/rdmp/emc_clariion0_17s2
Total Number Of Keys: 1
key[0]:
        Key Value [Numeric Format]:  86,70,48,48,48,49,48,49
        Key Value [Character Format]: VF000101
 
Node1# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen   9ae107 membership 01
Port b gen   9ae111 membership 01
Port h gen   9ae114 membership 01
Node1# cat /etc/vxfenmode
#
# vxfen_mode determines in what mode VCS I/O Fencing should work.
#
# available options:
# scsi3      - use scsi3 persistent reservation disks
# customized - use script based customized fencing
# sybase     - use scsi3 disks in kernel but coordinate membership with Sybase ASE
# disabled   - run the driver but don't do any actual fencing
#
vxfen_mode=scsi3
 
#
# scsi3_disk_policy determines the way in which I/O Fencing communicates with
# the coordination disks.
#
# available options:
# dmp - use dynamic multipathing
# raw - connect to disks using the native interface
#
scsi3_disk_policy=dmp
 
Node1# cat /etc/vxfendg
vxfendg
Node1# modinfo | grep -i vx
 47 7bf5c000  4d358 303   1  vxdmp (VxVM 5.1 DMP Driver)
 49 7ba00000 1f8df8 304   1  vxio (VxVM 5.1 I/O driver)
 51 7bfa1070    d38 305   1  vxspec (VxVM 5.1 control/status driver)
217 7bbf5210    bf0 306   1  vxportal (VxFS 5.1_REV-7Oct2009 portal dr)
218 7a600000 1ca6f0  21   1  vxfs (VxFS 5.1_REV-7Oct2009 SunOS 5.1)
221 7aaec000  64270 310   1  vxfen (VRTS Fence 5.1)
236 7a7c4000   a0a0 307   1  fdd (VxQIO 5.1_REV-7Oct2009 Quick I/)
Node1#
 
 
*************************************************************
*************************************************************
 
From Node2 
 
 
Node2# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen   9ae107 membership 01
Port b gen   9ae111 membership 01
Port h gen   9ae114 membership 01
Node2# cat /etc/vxfenmode
#
# vxfen_mode determines in what mode VCS I/O Fencing should work.
#
# available options:
# scsi3      - use scsi3 persistent reservation disks
# customized - use script based customized fencing
# sybase     - use scsi3 disks in kernel but coordinate membership with Sybase ASE
# disabled   - run the driver but don't do any actual fencing
#
vxfen_mode=scsi3
 
#
# scsi3_disk_policy determines the way in which I/O Fencing communicates with
# the coordination disks.
#
# available options:
# dmp - use dynamic multipathing
# raw - connect to disks using the native interface
#
scsi3_disk_policy=dmp
 
Node2# cat /etc/vxfendg
vxfendg
Node2#
# modinfo | grep -i vx
 47 7bf5c000  4d358 303   1  vxdmp (VxVM 5.1 DMP Driver)
 49 7ba00000 1f8df8 304   1  vxio (VxVM 5.1 I/O driver)
 51 7bfa1070    d38 305   1  vxspec (VxVM 5.1 control/status driver)
217 7bbf5210    bf0 306   1  vxportal (VxFS 5.1_REV-7Oct2009 portal dr)
218 7a600000 1ca6f0  21   1  vxfs (VxFS 5.1_REV-7Oct2009 SunOS 5.1)
221 7b722000  64270 310   1  vxfen (VRTS Fence 5.1)
236 7a7c4000   a0a0 307   1  fdd (VxQIO 5.1_REV-7Oct2009 Quick I/)
 
 

Gaurav_S
Moderator

This is more concerning from the above output:

Unregistration test for disk  failed on node Node2.
         Unregistration from one node is causing unregistration of keys from the other node.
        Disk  is not SCSI-3 compliant on node Node2.
 
Since keys from one node exist on the coordinator disks, it is hard to doubt the storage. The only thing to check then is that all the zoning from the SAN/switches and the settings are the same for both nodes. Still, I would suggest you double-check the storage settings for both nodes.
 
The outputs are OK except for the keys on the coordinator disks... What is the node ID as per /etc/llthosts?

# cat /etc/llthosts

It doesn't look right that registration is not even happening on the coordinator disks...

What I would suggest is to take the cluster services down (on the node whose keys are not registered), then stop and restart the fencing module. I want to see whether the fencing module throws any errors while starting up; registrations are placed on the coordinator disks when the fencing module starts. Once I have the llthosts output, I can tell you which node to try this on. A consolidated sequence is sketched after the individual commands below.

# hastop -local -force    (if you want applications to keep running) or else hastop -all (if you are OK to shut down all services)

# /etc/init.d/vxfen stop    (this should remove port b from the gabconfig -a output)

# /etc/init.d/vxfen start    (watch whether any errors appear here; the coordinator disks should then show keys from both nodes)
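Putting it together, the sequence on the affected node would look roughly like this (a sketch; the gabconfig check and the final hastart are just there to verify the state and rejoin the cluster):

# hastop -local -force               (stop VCS on this node, leave applications running)
# /etc/init.d/vxfen stop             (port b for this node should drop from gabconfig -a)
# gabconfig -a                       (confirm the port b membership change)
# /etc/init.d/vxfen start            (watch the console and system log for fencing errors)
# vxfenadm -s all -f /etc/vxfentab   (each coordinator disk should now show keys from both nodes)
# hastart                            (restart VCS so the node rejoins the cluster)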

 

 

Gaurav

Fugitive
Level 4

Gaurav,

 

Since this is an LDom environment, there is no zoning from the switches; the storage is shared to both guest domains from the primary domain, just like the oraDG disk group. After restarting fencing on both nodes, Node1's registration keys were kicked out, and Node2 now shows this:

 

 

vxfenadm -s all -f /etc/vxfentab
 
Device Name: /dev/vx/rdmp/emc_clariion0_17s2
Total Number Of Keys: 1
key[0]:
        [Numeric Format]:  86,70,48,48,48,49,48,49
        [Character Format]: VF000101
   *    [Node Format]: Cluster ID: 1     Node ID: 1   Node Name: Node2
 
Device Name: /dev/vx/rdmp/emc_clariion0_18s2
Total Number Of Keys: 1
key[0]:
        [Numeric Format]:  86,70,48,48,48,49,48,49
        [Character Format]: VF000101
   *    [Node Format]: Cluster ID: 1     Node ID: 1   Node Name: Node2
 
Device Name: /dev/vx/rdmp/emc_clariion0_19s2
Total Number Of Keys: 1
key[0]:
        [Numeric Format]:  86,70,48,48,48,49,48,49
        [Character Format]: VF000101
   *    [Node Format]: Cluster ID: 1     Node ID: 1   Node Name: Node2
 
 
cat /etc/llthosts
0 Node1
1 Node2
 
 
 
And Node1 panicked and rebooted.
 
 
 
And one more thing: do you know where I can get the lltlink_disable and lltlink_enable scripts? I have lost them, and I need them to test fencing and link failures in the LDom environment.
 
Thanks 

Fugitive
Level 4

 

Can anyone help me with this?

Gaurav_S
Moderator

I missed answering this one. This was seen as a known issue in 5.1:

 

http://sfdoccentral.symantec.com/sf/5.1/solaris/html/sfha_virtualization/ch04s15s01s02.htm

 

Gaurav

Gaurav_S
Moderator   (Accepted Solution)

This is documented in the 5.1SP1 guide too... so this doesn't seem to be fixed yet...

 

https://sort.symantec.com/public/documents/sfha/5.1sp1/solaris/productguides/pdf/sfha_virtualization_51sp1_sol.pdf

 

Guest LDom node shows only 1 PGR key instead of 2 after rejecting the other node in the cluster

(For configuration information, see Figure 5-4 on page 76 of the guide.)

This was observed while performing a series of reboots of the primary and alternate I/O domains on both the physical hosts housing the two guests. At some point one key is reported missing on the coordinator disk. This issue is under investigation. The vxfen driver can still function as long as there is 1 PGR key. This is a low severity issue as it will not cause any immediate interruption. Symantec will update this issue when the root cause is found for the missing key.

Fugitive
Level 4

 

 

Thanks for the update, Gaurav. Marking it as the solution. :)

 

And do you know where I can get the llt_disable/enable scripts?

Gaurav_S
Moderator

I don't have those scripts either... I don't remember, but are those scripts proprietary or intellectual property of Symantec? If yes, then they won't be shared...

 

Gaurav