01-27-2011 05:24 AM
Hello,
I have:
1. A Sun T5240 server configured with 4 LDoms, each with 8 GB memory and 12 vCPUs.
2. Solaris 10 u9 running on the primary domain and all the guest domains.
3. Two guest domains running VCS 5.1 and a couple of Oracle service groups.
Everything runs fine, but when I configured I/O fencing on the nodes it does not seem to work. The coordinator disks are from a CLARiiON array, 5 GB each. So my question is: does anyone have I/O fencing working on LDoms?
And if yes, what am I doing wrong?
The output follows; I can give the rest as asked.
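Roughly what I ran to check the keys (device paths trimmed; the exact form may differ on your setup):
# vxfenadm -d (shows the fencing mode and cluster membership on this node)
# vxfenadm -s all -f /etc/vxfentab (reads the SCSI-3 registration keys from the coordinator disks)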
01-27-2011 07:48 AM
Is this the first time you are trying to configure it, or was it working before? Is the PR bit enabled from the storage end?
Have you tried running the "vxfentsthdw" utility to see if all tests pass? Please note this utility should be run on a blank disk (not on a data disk), as it may erase the data.
See the post below on how vxfentsthdw works:
https://www-secure.symantec.com/connect/forums/test-disk-io-fencing-vxfentsthdw
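For a quick non-destructive check, you can also try something like this (from memory, so verify the options against the man page first):
# vxfentsthdw -r -m (read-only tests, manual mode: it prompts for the two node names and a raw disk path)
# vxfentsthdw -r -g <diskgroup> (read-only tests against all disks in a disk group)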
Gaurav
01-27-2011 08:37 AM
Yes, this is the first time I'm trying to configure fencing on LDoms.
I tried vxfentsthdw and got the following result:
########################################################################################
01-27-2011 08:44 AM
Hello,
To me it doesn't look like a supportability issue... you can see that keys already exist... I am not sure what -s is reading (I will find out, though). Can you give the output of these:
# vxfenadm -g all -f /etc/vxfentab (-g reads registrations and -r reads reservations; coordinator disks will always have registrations only)
Also, what is the output of the following (see the one-liner after the list to collect these in one shot):
# gabconfig -a (from both nodes)
# cat /etc/vxfenmode (from both nodes)
# cat /etc/vxfendg (from both nodes)
# modinfo | grep -i vx
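If it helps, something like this collects all of the above in one shot on each node (a sketch; it assumes fencing has been configured so /etc/vxfentab exists):
# for c in "vxfenadm -g all -f /etc/vxfentab" "gabconfig -a" "cat /etc/vxfenmode" "cat /etc/vxfendg" "modinfo | grep -i vx"; do echo "=== $c ==="; sh -c "$c"; done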
Gaurav
01-27-2011 09:25 AM
01-27-2011 09:35 AM
What's more concerning from the above output is that even registration is not happening on the coordinator disks; that doesn't look right...
Also, please send:
# cat /etc/llthosts
What I would suggest is to take down cluster services (on the node whose keys are not registered), then stop the fencing module and restart it. I want to see if the fencing module throws any error while starting up; registrations are made on the coordinator disks once the fencing module starts. Once I have the llthosts output, I can tell you which node to try this on.
# hastop -local -force (if you want applications to keep running) or hastop -all (if you are OK to shut down all services)
# /etc/init.d/vxfen stop (this should remove port b from the gabconfig -a output)
# /etc/init.d/vxfen start (want to see if any errors appear here; the coordinator disks should then show keys from both nodes)
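Once it's back up, verify from either node (same commands as above):
# gabconfig -a (port b should be listed again)
# vxfenadm -g all -f /etc/vxfentab (expect one registration key per node on each coordinator disk)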
Gaurav
01-27-2011 09:57 AM
Gaurav,
Since this is an LDom environment, there is no zoning from the switches; the storage is shared to both the guest domains from the primary domain, just like the oradg disk group. After restarting fencing on both nodes, node1's registration keys were kicked out, and node2 now shows this:
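For reference, each coordinator LUN is exported from the primary domain roughly like this (the device path and names below are placeholders, not my exact config):
# ldm add-vdsdev /dev/rdsk/c2tXXd0s2 coord1@primary-vds0 (full disk backend, slice 2, so SCSI-3 commands pass through to the guest)
# ldm add-vdisk coord1 coord1@primary-vds0 ldom1
(plus a second vdsdev entry for the same backend assigned to ldom2)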
02-02-2011 04:21 AM
Can anyone help me with this?
02-02-2011 04:31 AM
I missed answering this one. It was a known issue in 5.1:
http://sfdoccentral.symantec.com/sf/5.1/solaris/html/sfha_virtualization/ch04s15s01s02.htm
Gaurav
02-02-2011 04:40 AM
This is documented in the 5.1SP1 guide too, so it doesn't seem to be fixed yet:
https://sort.symantec.com/public/documents/sfha/5.1sp1/solaris/productguides/pdf/sfha_virtualization_51sp1_sol.pdf
Guest LDom node shows only 1 PGR key instead of 2 after rejecting the other node in the cluster
This was observed while performing a series of reboots of the primary and alternate I/O domains on both the physical hosts housing the two guests. At some point one key is reported missing on the coordinator disk.
This issue is under investigation. The vxfen driver can still function as long as there is 1 PGR key. This is a low severity issue as it will not cause any immediate interruption. Symantec will update this issue when the root cause is found for the missing key.
02-02-2011 04:46 AM
Thanks for the update, Gaurav. Marking it as the solution :)
And do you know where I can get the llt_disable/enable scripts?
02-02-2011 05:10 AM
I don't have those scripts either... I don't remember, but are those scripts proprietary / intellectual property of Symantec? If so, they won't be shared...
Gaurav