11-26-2013 07:50 AM
Hi all,
I have Red Hat Linux 5.9 64-bit with SFHA 5.1 SP1 RP4 with fencing enabled (our storage device is an IBM Storwize V3700 SFF, SCSI-3 compliant).
[root@mitoora1 ~]# vxfenadm -d
I/O Fencing Cluster Information:
================================
Fencing Protocol Version: 201
Fencing Mode: SCSI3
Fencing SCSI3 Disk Policy: dmp
Cluster Members:
* 0 (mitoora1)
1 (mitoora2)
RFSM State Information:
node 0 in state 8 (running)
node 1 in state 8 (running)
********************************************
In /etc/vxfenmode: scsi3_disk_policy=dmp and vxfen_mode=scsi3
vxdctl scsi3pr
scsi3pr: on
[root@mitoora1 etc]# more /etc/vxfentab
#
# /etc/vxfentab:
# DO NOT MODIFY this file as it is generated by the
# VXFEN rc script from the file /etc/vxfendg.
#
/dev/vx/rdmp/storwizev70000_000007
/dev/vx/rdmp/storwizev70000_000008
/dev/vx/rdmp/storwizev70000_000009
******************************************
[root@mitoora1 etc]# vxdmpadm listctlr all
CTLR-NAME ENCLR-TYPE STATE ENCLR-NAME
=====================================================
c0 Disk ENABLED disk
c10 StorwizeV7000 ENABLED storwizev70000
c7 StorwizeV7000 ENABLED storwizev70000
c8 StorwizeV7000 ENABLED storwizev70000
c9 StorwizeV7000 ENABLED storwizev70000
main.cf
cluster drdbonesales (
UserNames = { admin = hlmElgLimHmmKumGlj }
ClusterAddress = "10.90.15.30"
Administrators = { admin }
UseFence = SCSI3
)
**********************************************
I configured coordinator fencing, so I have 3 LUNs in a Veritas disk group (DMP coordinator).
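(For reference, the fencing configuration files behind this setup look roughly like this; the contents are a sketch based on the settings quoted above, and the coordinator disk group name is a placeholder:)
/etc/vxfenmode:
vxfen_mode=scsi3
scsi3_disk_policy=dmp
/etc/vxfendg:
<name of the coordinator disk group>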
Everything seems to work fine, but I noticed a lot of reservation conflicts in the messages on both nodes.
In the server log I constantly see these messages in /var/log/messages:
Nov 26 15:14:09 mitoora2 kernel: sd 7:0:1:1: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 8:0:0:1: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 8:0:1:1: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 10:0:0:1: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 10:0:1:1: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 9:0:1:1: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 9:0:0:1: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 7:0:1:3: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 8:0:0:3: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 8:0:1:3: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 10:0:1:3: reservation conflict
Do you have any idea?
Best Regards
Vincenzo
11-26-2013 07:52 PM
Hi Vincenzo,
Typically "reservation conflict" is SCSI3-PGR key write failure which could be caused by existing key or HW write failure.
Pls check the disk access capability by "dd" likely method and if there is existing key on disks by "vxfenclearpre"
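A minimal sketch of those two checks, assuming the DMP device names from the /etc/vxfentab above (block size and count are only illustrative):
# dd if=/dev/vx/rdmp/storwizev70000_000007 of=/dev/null bs=512 count=1   (read test on one coordinator LUN)
# vxfenadm -s all -f /etc/vxfentab   (list the SCSI-3 keys registered on the coordinator disks)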
11-27-2013 01:06 AM
But I only created the fencing disk group, and I have not installed Oracle RAC!
The problem also occurs both during server boot and when switching the cluster service group.
From the log, the problem seems to recur every 20 minutes in /var/log/messages.
11-27-2013 02:41 AM
Hi Vincenzo,
Are there any operational effects because of these errors? Did you see any other SCSI- or DMP-related messages around the reservation conflicts?
I saw a tech article
http://www.symantec.com/docs/TECH192940
The above states that 6.0.1 should have a fix; however, I have checked the release notes of 6.0.1 and 6.1, and neither has anything specific to reservation conflicts. There is an issue mentioned in the known issues section which appears if PowerPath is in use (I assume you are not using PowerPath).
If there are no operational issues, I would consider these messages ignorable, as I understand they are logged when a write operation is attempted on the disks.
On another thought, to isolate the problem, is it possible to try the "raw" mode of fencing instead of DMP mode?
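For reference, switching fencing to raw mode would look roughly like this (a sketch only, not the full documented procedure; stop VCS or freeze the service groups first):
# /etc/init.d/vxfen stop      (on each node)
edit /etc/vxfenmode on each node so that it contains:
vxfen_mode=scsi3
scsi3_disk_policy=raw
# /etc/init.d/vxfen start     (on each node)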
G
11-27-2013 03:33 AM
Hi,
I have no DMP or SCSI error messages.
I tried this procedure: http://www.symantec.com/docs/TECH192940
1. hastop -all
2. /etc/init.d/vxfen stop
3. vxdg -o groupreserve -o clearreserve -t import dgFence
4. /etc/init.d/vxfen start
Starting vxfen..
Loaded 2.6.18-128.el5 on kernel 2.6.18-348.el5
WARNING: No modules found for 2.6.18-348.el5, using compatible modules for 2.6.18-128.el5.
Starting vxfen.. Done
Please see the log file /var/VRTSvcs/log/vxfen/vxfen.log
in vxfen.log (VXFEN vxfenconfig NOTICE Driver will use SCSI-3 compliant disks)
5. hastart
6. reboot, but the problem remains:
Nov 26 12:39:54 mitoora1 kernel: LLT INFO V-14-1-10024 link 2 (bond0) node 1 active
Nov 26 12:39:55 mitoora1 rc: Starting xprtld: succeeded
Nov 26 12:39:55 mitoora1 rc: Starting vxodm: succeeded
Nov 26 12:39:57 mitoora1 kernel: GAB INFO V-15-1-20036 Port a gen 327d08 membership 01
Nov 26 12:39:57 mitoora1 kernel: GAB INFO V-15-1-20036 Port b gen 327d07 membership 01
Nov 26 12:39:57 mitoora1 kernel: sd 7:0:1:1: reservation conflict
Nov 26 12:39:57 mitoora1 kernel: sd 9:0:0:1: reservation conflict
Nov 26 12:39:57 mitoora1 kernel: sd 9:0:1:1: reservation conflict
Nov 26 12:39:58 mitoora1 kernel: sd 8:0:1:1: reservation conflict
Nov 26 12:39:58 mitoora1 kernel: sd 8:0:0:1: reservation conflict
Nov 26 12:39:58 mitoora1 kernel: sd 10:0:1:1: reservation conflict
Nov 26 12:39:58 mitoora1 kernel: sd 10:0:0:1: reservation conflict
Nov 26 12:39:58 mitoora1 kernel: sd 7:0:0:3: reservation conflict
Nov 26 12:39:58 mitoora1 kernel: sd 9:0:1:3: reservation conflict
Nov 26 12:39:58 mitoora1 kernel: sd 9:0:0:3: reservation conflict
Nov 26 12:39:58 mitoora1 kernel: sd 8:0:1:3: reservation conflict
Nov 26 12:39:58 mitoora1 kernel: sd 8:0:0:3: reservation conflict
Nov 26 12:39:58 mitoora1 kernel: sd 10:0:1:3: reservation conflict
Nov 26 12:39:58 mitoora1 kernel: sd 10:0:0:3: reservation conflict
Nov 26 12:39:58 mitoora1 kernel: sd 7:0:1:2: reservation conflict
Nov 26 12:39:58 mitoora1 kernel: sd 9:0:0:2: reservation conflict
Nov 26 12:39:58 mitoora1 kernel: sd 9:0:1:2: reservation conflict
Nov 26 12:39:58 mitoora1 kernel: sd 8:0:1:2: reservation conflict
Nov 26 12:39:58 mitoora1 kernel: sd 8:0:0:2: reservation conflict
Nov 26 12:39:58 mitoora1 kernel: sd 10:0:0:2: reservation conflict
Nov 26 12:39:58 mitoora1 kernel: sd 10:0:1:2: reservation conflict
Nov 26 12:39:58 mitoora1 kernel: VXFEN INFO V-11-1-35 Fencing driver going into RUNNING state
And I do not have PowerPath.
On another server, with the cluster up and the service group online, I have the same problem.
From the log, the problem seems to recur every 20 minutes in /var/log/messages!
Best regards
Vincenzo
11-27-2013 05:01 AM
Hi Enzo,
Did you check which VxVM disks and OS disks these errors correlate to?
From your vxfenadm output I can see that this is a 2-node cluster with 8 paths to the disks.
On each of the disks there are 16 keys, so this error shouldn't arise from the coordinator disks.
My best guess is that some other disks visible to the host have keys from another machine, hence you see the message on every discovery cycle.
Maybe some LUNs are zoned to another cluster as well and are in use there?
Can you run the command below to find out which devices generate the error? The output would look like:
#lsscsi
[3:0:0:0] disk IBM 2105 0113 /dev/sdb
[3:0:0:1] disk IBM 2105 0113 /dev/sdf
[3:0:0:2] disk IBM 2105 0113 /dev/sdh
[3:0:0:3] disk IBM 2105 0113 /dev/sdl
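(A quick way to pull out the conflicting targets from the log, just a sketch; the awk field position assumes the syslog format shown in your messages and may need adjusting:)
# grep 'reservation conflict' /var/log/messages | awk '{print $7}' | sort -u
This lists the unique H:C:T:L tuples, which you can then match against the lsscsi output above.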
Once you know the OS device names, see which VxVM devices they belong to:
# vxdisk path
SUBPATH DANAME DMNAME GROUP STATE
sde ibm_shark0_0 - - ENABLED
sdb ibm_shark0_0 - - ENABLED
sdd ibm_shark0_0 - - ENABLED
sdc ibm_shark0_0 - - ENABLED
Then run vxfenadm -s on these disks to see the keys.
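For example (the device name here is only taken from the /etc/vxfentab you posted; substitute whichever disks turn out to be involved):
# vxfenadm -s /dev/vx/rdmp/storwizev70000_000007
or, for all coordinator disks at once:
# vxfenadm -s all -f /etc/vxfentab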
As stated in the guide, such errors for the coordinator disks should only be seen at boot time, not every 20 minutes.
If it is any other disks (which I assume), then check whether they are in use by another cluster/machine, or whether the keys were left behind after an outage, for example.
If they are used by another machine, hide the LUNs; if keys are left over after an outage, remove the keys.
It might also be that another application is writing keys on the LUNs.
You might also see this kind of error if the LUN setting is not r/w; in that case please check with your SAN team or HW vendor.
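(If leftover keys from a previous outage of this cluster turn out to be the cause, the vxfenclearpre utility mentioned earlier in this thread can remove them. A sketch only; it must be run with VCS and fencing stopped on all nodes:)
# hastop -all
# /etc/init.d/vxfen stop      (on each node)
# vxfenclearpre
# /etc/init.d/vxfen start     (on each node)
# hastart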
regards,
Dan
11-27-2013 05:59 AM
Hi Dan,
The keys seem to be correct; attached is the output of the command "vxfenadm -s all -f /etc/vxfentab".
11-27-2013 11:58 PM
Hi.
Can you attach the dmpevents.log file from the server?
Also, do you know the firmware version of the storage here? It should be 4.2.1x or above.
G
11-28-2013 12:33 AM
Hi Gaurav
Attached is the dmpevents.log.
The Storwize V3700 firmware version is 7.1.0 Build 79.8.1307111000.
Best regards
Vincenzo
11-28-2013 01:00 AM
Hi,
The dmpevents log suggests that you are receiving reservation errors with DMP on the second set of highlighted devices, i.e.
sdy storwizev70000_000008 - - ENABLED
sdu storwizev70000_000008 - - ENABLED
sdag storwizev70000_000008 - - ENABLED
sdac storwizev70000_000008 - - ENABLED
sdi storwizev70000_000008 - - ENABLED
sde storwizev70000_000008 - - ENABLED
sdm storwizev70000_000008 - - ENABLED
sdq storwizev70000_000008 - - ENABLED
& if these all are paths to storwizev70000_000008
please give the details of this device
# vxdisk list
# vxdisk -e list
# vxddladm listsupport all
# vxddladm listexclude all
# vxddladm list devices
Along with the devices listed above, there are many others which are reporting reservation conflicts.
I would still recommend raising a support case to see if they have any fix/patch available for this. I believe they would have one.
G
11-28-2013 01:18 AM
Hi,
Attached are the output files of the commands you suggested (lists.txt).
Thanks!
11-28-2013 09:03 AM
All the outputs look OK, and the ASL is also claiming the devices; nothing wrong here.
Do you know what failover mode is set on the array? DMP is recommended to run best in ALUA mode (worth looking at this as well).
G
11-29-2013 08:22 AM
Hi Gaurav,
The vendor IBM has confirmed to me that the Storwize V3700 is an ALUA array.
Could the problem be the ASL library?
rpm -qa|grep VRTSaslapm
VRTSaslapm-5.1.134.000-SP1_RHEL5
vxddladm listsupport all | grep -i alua    (I don't see an IBM ALUA entry)
libvxhdsalua.so HITACHI DF600, DF600-V, DF600F, DF600F-V
libvxhpalua.so HP, COMPAQ HSV101, HSV111 (C)COMPAQ, HSV111, HSV200, HSV210, HSV300, HSV400, HSV450, HSV340, HSV360
vxdmpadm list dmpnode all |grep array-type
array-type = Disk
array-type = A/A-A-IBMSVC
array-type = A/A-A-IBMSVC
array-type = A/A-A-IBMSVC
array-type = A/A-A-IBMSVC
array-type = A/A-A-IBMSVC
array-type = A/A-A-IBMSVC
array-type = A/A-A-IBMSVC
array-type = A/A-A-IBMSVC
array-type = A/A-A-IBMSVC
array-type = A/A-A-IBMSVC
vxdmpadm listenclosure all
ENCLR_NAME ENCLR_TYPE ENCLR_SNO STATUS ARRAY_TYPE LUN_COUNT
=======================================================================================
disk Disk DISKS CONNECTED Disk 1
storwizev70000 StorwizeV7000 00c020207110XX00 CONNECTED A/A-A-IBMSVC 10
Best Regards
Vincenzo
11-30-2013 05:31 AM
Hi ,
Yep, it's worth asking support about this. As per Symantec in the article below:
http://www.symantec.com/business/support/index?page=content&id=TECH47728
page 35 says Storwize arrays are best supported by DMP in ALUA mode.
And as per the article below:
http://www.symantec.com/business/support/index?page=content&id=TECH77062 there is no addition of ALUA support in the change log.
Unfortunately, this is the last updated ASL/APM software package for Linux. Support or the backend teams can answer whether there is an upcoming plan to upgrade libvxibmsvc.so for ALUA support.
Also, whether there are any recently found known issues can be answered by support.
All the best,
G
12-06-2013 04:39 AM
Hi,
This is the answer from Symantec support:
"......As per the discussion with you because of these messages there will be no impact on the functionality of the product.
You may also refer :
http://www.symantec.com/docs/TECH170352
However will try to give the feedback internally so it get addressed in the newer releases."
Vincenzo
12-06-2013 04:46 AM
Hi,
As I mentioned in my first post on this thread, I was of the same opinion that these messages are ignorable (if there are no operational issues). I was expecting that support would say the same; however, it's good to have confirmation that it's an identified bug and will be fixed.
Thanks for the update.
G