
Fencing and Reservation Conflict

enzo68
Level 4

Hi to all

 

I have Red Hat Linux 5.9 64-bit with SFHA 5.1 SP1 RP4 with fencing enabled (our storage device is an IBM Storwize V3700 SFF, which is SCSI-3 compliant).

[root@mitoora1 ~]# vxfenadm -d

I/O Fencing Cluster Information:
================================

 Fencing Protocol Version: 201
 Fencing Mode: SCSI3
 Fencing SCSI3 Disk Policy: dmp
 Cluster Members: 

        * 0 (mitoora1)
          1 (mitoora2)

 RFSM State Information:
        node   0 in state  8 (running)
        node   1 in state  8 (running)

 

********************************************

In /etc/vxfenmode: scsi3_disk_policy=dmp and vxfen_mode=scsi3

vxdctl scsi3pr
scsi3pr: on

 [root@mitoora1 etc]# more /etc/vxfentab
#
# /etc/vxfentab:
# DO NOT MODIFY this file as it is generated by the
# VXFEN rc script from the file /etc/vxfendg.
#
/dev/vx/rdmp/storwizev70000_000007
/dev/vx/rdmp/storwizev70000_000008
/dev/vx/rdmp/storwizev70000_000009

******************************************

 [root@mitoora1 etc]# vxdmpadm listctlr all
CTLR-NAME       ENCLR-TYPE      STATE      ENCLR-NAME
=====================================================
c0              Disk            ENABLED      disk
c10             StorwizeV7000   ENABLED      storwizev70000
c7              StorwizeV7000   ENABLED      storwizev70000
c8              StorwizeV7000   ENABLED      storwizev70000
c9              StorwizeV7000   ENABLED      storwizev70000

main.cf

 

cluster drdbonesales (
        UserNames = { admin = hlmElgLimHmmKumGlj }
        ClusterAddress = "10.90.15.30"
        Administrators = { admin }
        UseFence = SCSI3
        )

**********************************************

I configured coordinator fencing, so I have 3 LUNs in a Veritas disk group (DMP coordinator).
All seems to work fine, but I noticed a lot of reservation conflicts in the messages on both nodes.

On both servers I constantly see these messages in /var/log/messages:

Nov 26 15:14:09 mitoora2 kernel: sd 7:0:1:1: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 8:0:0:1: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 8:0:1:1: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 10:0:0:1: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 10:0:1:1: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 9:0:1:1: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 9:0:0:1: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 7:0:1:3: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 8:0:0:3: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 8:0:1:3: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 10:0:1:3: reservation conflict

 

 

 

Do you have any idea?

 

Best Regards

 

Vincenzo

 

 

 

15 REPLIES

stinsong
Level 5

Hi Vincenzo,

Typically, a "reservation conflict" is a SCSI-3 PGR key write failure, which can be caused by an existing key on the disk or by a hardware write failure.

Please check disk access with a "dd"-style read test, and check for (and if necessary clear) existing keys on the disks with "vxfenclearpre".
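
For example, something along these lines (a sketch only; the disk names are taken from your /etc/vxfentab, and vxfenclearpre must only be run with the cluster and fencing stopped):

# dd if=/dev/vx/rdmp/storwizev70000_000007 of=/dev/null bs=512 count=4     (simple read test on one coordinator disk)
# vxfenadm -s all -f /etc/vxfentab                                         (show the registration keys on all coordinator disks)
# /opt/VRTSvcs/vxfen/bin/vxfenclearpre                                     (clears leftover keys; cluster must be down)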

enzo68
Level 4
Hi,

the keys seem to be correct; attached is the output of the commands "vxfenadm -s all -f /etc/vxfentab",
"lltstat -C" (cluster ID),
and /opt/VRTSvcs/vxfen/bin/vxfentsthdw -c dgFence (passed).

For the moment I have only created the disk group for fencing.

 
In the document "Veritas Storage Foundation and High Availability Solutions Release Notes
5.1 Service Pack 1 Rolling Patch 4 for Linux", page 146:
 
"SCSI reservation errors during bootup
If you reboot a node of an SF Oracle RAC cluster, SCSI reservation errors may be
observed during bootup. [255515]
For example:
Nov 23 13:18:28 galaxy kernel: scsi3 (0,0,6) : RESERVATION CONFLICT
This message is printed for each disk that is a member of any shared disk group
which is protected by SCSI-3 I/O fencing. The message may be safely ignored."
 
 
 

But I have only created the fencing disk group, and I have not installed Oracle RAC!

 



The problem also occurs both during server boot and when switching the cluster service group.

From the logs, the problem seems to recur every 20 minutes in /var/log/messages.
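
This can be counted roughly like this (standard grep/awk/uniq, nothing Veritas-specific; one bucket per minute):

# grep 'reservation conflict' /var/log/messages | awk '{print $1, $2, $3}' | cut -d: -f1,2 | uniq -c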
 

 
Best regards
Vincenzo
 
 
 

 

Gaurav_S
Moderator
   VIP    Certified

Hi Vincenzo,

Are there any operational effects because of these errors? Did you see any other SCSI- or DMP-related messages around the reservation conflicts?

I saw a tech article

http://www.symantec.com/docs/TECH192940

The above states that 6.0.1 should have a fix; however, I have checked the release notes of 6.0.1 and 6.1, and neither has anything specific to reservation conflicts. There is an issue mentioned in the known-issues section which appears when PowerPath is in use (I assume you are not using PowerPath).

If there are no operational issues, I would consider these messages ignorable, as I understand they are logged when a write operation is attempted on the disks.

Just another thought: to isolate the problem, is it possible to try the "raw" disk policy for fencing instead of DMP?
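
A rough sketch of the switch (please verify against the fencing admin guide for your release; the commands and file are the same ones already used in this thread, only the disk policy value changes):

# hastop -all
# /etc/init.d/vxfen stop            (on both nodes)
  edit /etc/vxfenmode on both nodes so that it contains:
      vxfen_mode=scsi3
      scsi3_disk_policy=raw
# /etc/init.d/vxfen start           (on both nodes)
# hastart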

 

G

enzo68
Level 4

Hi,

I have no DMP or SCSI error messages.

I tried this procedure  http://www.symantec.com/docs/TECH192940

1-hastop -all

2-/etc/init.d/vxfen stop

3-vxdg -o groupreserve -o clearreserve -t import dgFence

4-/etc/init.d/vxfen start

Starting vxfen..
Loaded 2.6.18-128.el5 on kernel 2.6.18-348.el5
WARNING:  No modules found for 2.6.18-348.el5, using compatible modules for 2.6.18-128.el5.
Starting vxfen.. Done
Please see the log file /var/VRTSvcs/log/vxfen/vxfen.log

in vxfen.log  (VXFEN vxfenconfig NOTICE Driver will use SCSI-3 compliant disks)

5-hastart

6-reboot, but the problem remains

Nov 26 12:39:54 mitoora1 kernel: LLT INFO V-14-1-10024 link 2 (bond0) node 1 active
Nov 26 12:39:55 mitoora1 rc: Starting xprtld:  succeeded
Nov 26 12:39:55 mitoora1 rc: Starting vxodm:  succeeded
Nov 26 12:39:57 mitoora1 kernel: GAB INFO V-15-1-20036 Port a gen   327d08 membership 01
Nov 26 12:39:57 mitoora1 kernel: GAB INFO V-15-1-20036 Port b gen   327d07 membership 01
Nov 26 12:39:57 mitoora1 kernel: sd 7:0:1:1: reservation conflict
Nov 26 12:39:57 mitoora1 kernel: sd 9:0:0:1: reservation conflict
Nov 26 12:39:57 mitoora1 kernel: sd 9:0:1:1: reservation conflict
Nov 26 12:39:58 mitoora1 kernel: sd 8:0:1:1: reservation conflict
Nov 26 12:39:58 mitoora1 kernel: sd 8:0:0:1: reservation conflict
Nov 26 12:39:58 mitoora1 kernel: sd 10:0:1:1: reservation conflict
Nov 26 12:39:58 mitoora1 kernel: sd 10:0:0:1: reservation conflict
Nov 26 12:39:58 mitoora1 kernel: sd 7:0:0:3: reservation conflict
Nov 26 12:39:58 mitoora1 kernel: sd 9:0:1:3: reservation conflict
Nov 26 12:39:58 mitoora1 kernel: sd 9:0:0:3: reservation conflict
Nov 26 12:39:58 mitoora1 kernel: sd 8:0:1:3: reservation conflict
Nov 26 12:39:58 mitoora1 kernel: sd 8:0:0:3: reservation conflict
Nov 26 12:39:58 mitoora1 kernel: sd 10:0:1:3: reservation conflict
Nov 26 12:39:58 mitoora1 kernel: sd 10:0:0:3: reservation conflict
Nov 26 12:39:58 mitoora1 kernel: sd 7:0:1:2: reservation conflict
Nov 26 12:39:58 mitoora1 kernel: sd 9:0:0:2: reservation conflict
Nov 26 12:39:58 mitoora1 kernel: sd 9:0:1:2: reservation conflict
Nov 26 12:39:58 mitoora1 kernel: sd 8:0:1:2: reservation conflict
Nov 26 12:39:58 mitoora1 kernel: sd 8:0:0:2: reservation conflict
Nov 26 12:39:58 mitoora1 kernel: sd 10:0:0:2: reservation conflict
Nov 26 12:39:58 mitoora1 kernel: sd 10:0:1:2: reservation conflict
Nov 26 12:39:58 mitoora1 kernel: VXFEN INFO V-11-1-35 Fencing driver going into RUNNING state

 

And I do not have PowerPath.

On the other server, with the cluster up and the service group up, I have the same problem.

From the logs, the problem seems to recur every 20 minutes in /var/log/messages!

 

 

Best regards

Vincenzo

 

 

 

 

Daniel_Matheus
Level 4
Employee Accredited Certified

Hi Enzo,

 

Did you check which VxVM disks and OS disks these errors correlate to?

From your vxfenadm output I can see that this is a 2-node cluster with 8 paths to the disks.

On each of the disks there are 16 keys (2 nodes x 8 paths), so this error shouldn't arise from the coordinator disks.

My best guess is that some other disks visible to the host hold keys from another machine, hence you see the message on every discovery cycle.

 

Maybe some LUNs are zoned to another cluster as well and are in use on this one?

 

Can you run the command below to find out which devices generate the error?

The output would look like this:

#lsscsi

[3:0:0:0]    disk    IBM      2105             0113  /dev/sdb
[3:0:0:1]    disk    IBM      2105             0113  /dev/sdf
[3:0:0:2]    disk    IBM      2105             0113  /dev/sdh
[3:0:0:3]    disk    IBM      2105             0113  /dev/sdl
 

Once you know the OS device names, see which VxVM devices they belong to:

# vxdisk path
SUBPATH                     DANAME               DMNAME       GROUP        STATE
sde                         ibm_shark0_0         -            -            ENABLED
sdb                         ibm_shark0_0         -            -            ENABLED
sdd                         ibm_shark0_0         -            -            ENABLED
sdc                         ibm_shark0_0         -            -            ENABLED
 

 

Then run vxfenadm -s on these disks to see the keys.
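
For example, to chase one address from the kernel messages (sdX and the DMP name below are placeholders, just to illustrate the chain):

# lsscsi | grep '7:0:1:1'                      (which /dev/sdX is behind "sd 7:0:1:1: reservation conflict")
# vxdisk path | grep -w sdX                    (which DMP device that path belongs to)
# vxfenadm -s /dev/vx/rdmp/<daname>            (read the SCSI-3 keys on that DMP device)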

As stated in the guide, such errors for the coordinator disks should only be seen during boot time, not every 20 minutes.

 

If it is other disks (which I assume), check whether they are in use by another cluster/machine, or whether the keys were left behind after an outage, for example.

If they are used by another machine, hide the LUNs; if keys are left over after an outage, remove the keys.
It might also be that another application is writing keys to the LUNs.

 

You might also see this kind of error if the LUN is not set to read/write; in that case, please check with your SAN team or HW vendor.
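
A quick host-side check for that (plain Linux tools; sdX is one of the affected paths):

# blockdev --getro /dev/sdX          (1 = kernel sees the path as read-only, 0 = read-write)
# cat /sys/block/sdX/ro              (same information from sysfs)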

 

regards,
Dan

 


enzo68
Level 4

Hi Dan,

the keys seem to be correct; attached is the output of the commands "vxfenadm -s all -f /etc/vxfentab"
and "lltstat -C" (cluster ID).
 
[root@mitoora1 ~]# lsscsi
[0:2:0:0]    disk    IBM      ServeRAID M5110e 3.24  /dev/sda
[2:0:0:0]    cd/dvd  IBM SATA DEVICE 81Y3674   IB01  /dev/sr0
[7:0:0:0]    disk    IBM      2145             0000  /dev/sdb
[7:0:0:1]    disk    IBM      2145             0000  /dev/sdc
[7:0:0:2]    disk    IBM      2145             0000  /dev/sdd
[7:0:0:3]    disk    IBM      2145             0000  /dev/sde
[7:0:1:0]    disk    IBM      2145             0000  /dev/sdf
[7:0:1:1]    disk    IBM      2145             0000  /dev/sdg
[7:0:1:2]    disk    IBM      2145             0000  /dev/sdh
[7:0:1:3]    disk    IBM      2145             0000  /dev/sdi

[8:0:0:0]    disk    IBM      2145             0000  /dev/sdj
[8:0:0:1]    disk    IBM      2145             0000  /dev/sdk
[8:0:0:2]    disk    IBM      2145             0000  /dev/sdl
[8:0:0:3]    disk    IBM      2145             0000  /dev/sdm
[8:0:1:0]    disk    IBM      2145             0000  /dev/sdn
[8:0:1:1]    disk    IBM      2145             0000  /dev/sdo
[8:0:1:2]    disk    IBM      2145             0000  /dev/sdp
[8:0:1:3]    disk    IBM      2145             0000  /dev/sdq
[9:0:0:0]    disk    IBM      2145             0000  /dev/sdr
[9:0:0:1]    disk    IBM      2145             0000  /dev/sds
[9:0:0:2]    disk    IBM      2145             0000  /dev/sdt
[9:0:0:3]    disk    IBM      2145             0000  /dev/sdu
[9:0:1:0]    disk    IBM      2145             0000  /dev/sdv
[9:0:1:1]    disk    IBM      2145             0000  /dev/sdw
[9:0:1:2]    disk    IBM      2145             0000  /dev/sdx
[9:0:1:3]    disk    IBM      2145             0000  /dev/sdy

[10:0:0:0]   disk    IBM      2145             0000  /dev/sdz
[10:0:0:1]   disk    IBM      2145             0000  /dev/sdaa
[10:0:0:2]   disk    IBM      2145             0000  /dev/sdab
[10:0:0:3]   disk    IBM      2145             0000  /dev/sdac
[10:0:1:0]   disk    IBM      2145             0000  /dev/sdad
[10:0:1:1]   disk    IBM      2145             0000  /dev/sdae
[10:0:1:2]   disk    IBM      2145             0000  /dev/sdaf
[10:0:1:3]   disk    IBM      2145             0000  /dev/sdag

[root@mitoora1 ~]# vxdisk path
SUBPATH                     DANAME               DMNAME       GROUP        STATE
sda                         disk_0               -            -            ENABLED
sdr                         storwizev70000_000005 -            -            ENABLED
sdv                         storwizev70000_000005 -            -            ENABLED
sdz                         storwizev70000_000005 -            -            ENABLED
sdad                        storwizev70000_000005 -            -            ENABLED
sdf                         storwizev70000_000005 -            -            ENABLED
sdb                         storwizev70000_000005 -            -            ENABLED
sdj                         storwizev70000_000005 -            -            ENABLED
sdn                         storwizev70000_000005 -            -            ENABLED

sds                         storwizev70000_000007 -            -            ENABLED
sdw                         storwizev70000_000007 -            -            ENABLED
sdae                        storwizev70000_000007 -            -            ENABLED
sdaa                        storwizev70000_000007 -            -            ENABLED
sdc                         storwizev70000_000007 -            -            ENABLED
sdg                         storwizev70000_000007 -            -            ENABLED
sdk                         storwizev70000_000007 -            -            ENABLED
sdo                         storwizev70000_000007 -            -            ENABLED
sdy                         storwizev70000_000008 -            -            ENABLED
sdu                         storwizev70000_000008 -            -            ENABLED
sdag                        storwizev70000_000008 -            -            ENABLED
sdac                        storwizev70000_000008 -            -            ENABLED
sdi                         storwizev70000_000008 -            -            ENABLED
sde                         storwizev70000_000008 -            -            ENABLED
sdm                         storwizev70000_000008 -            -            ENABLED
sdq                         storwizev70000_000008 -            -            ENABLED

sdx                         storwizev70000_000009 -            -            ENABLED
sdt                         storwizev70000_000009 -            -            ENABLED
sdaf                        storwizev70000_000009 -            -            ENABLED
sdab                        storwizev70000_000009 -            -            ENABLED
sdd                         storwizev70000_000009 -            -            ENABLED
sdh                         storwizev70000_000009 -            -            ENABLED
sdl                         storwizev70000_000009 -            -            ENABLED
sdp                         storwizev70000_000009 -            -            ENABLED

[root@mitoora1 ~]# vxdisk list
DEVICE       TYPE            DISK         GROUP        STATUS
disk_0                auto:none       -            -            online invalid   <-- internal boot disk
storwizev70000_000005 auto            -            -            error            <-- free, not used
storwizev70000_000007 auto:cdsdisk    -            -            online           <-- dgFence, 100 MB
storwizev70000_000008 auto:cdsdisk    -            -            online           <-- dgFence, 100 MB
storwizev70000_000009 auto:cdsdisk    -            -            online           <-- dgFence, 100 MB
 
Each LUN is seen 8 times, which is correct.
 
Thanks
Vincenzo

Gaurav_S
Moderator
   VIP    Certified

Hi.

Can you attach the dmpevents.log file from the server?

Also, do you know the firmware version of the storage here? It should be 4.2.1x or above.

 

 

G

enzo68
Level 4

Hi Gaurav

attached is the dmpevents.log.

 

The Storwize V3700 version is: 7.1.0 Build 79.8.1307111000

 

 

Best regards

 

Vincenzo

 

Gaurav_S
Moderator
   VIP    Certified

Hi,

the dmpevents log suggests that you are receiving reservation errors with DMP on the second set of highlighted devices, i.e.

sdy                         storwizev70000_000008 -            -            ENABLED
sdu                         storwizev70000_000008 -            -            ENABLED
sdag                        storwizev70000_000008 -            -            ENABLED
sdac                        storwizev70000_000008 -            -            ENABLED
sdi                         storwizev70000_000008 -            -            ENABLED
sde                         storwizev70000_000008 -            -            ENABLED
sdm                         storwizev70000_000008 -            -            ENABLED
sdq                         storwizev70000_000008 -            -            ENABLED

If these are all paths to storwizev70000_000008, please give the details of this device:

# vxdisk list

# vxdisk -e list

# vxddladm listsupport all

# vxddladm listexclude all

# vxddladm list devices

 

Along with the devices listed above, there are many others reporting reservation conflicts.

 

I would still recommend raising a support case to see if they have any fix/patch available for this. I believe they would have one.

 

G

 

enzo68
Level 4

Hi,

Attached are the output files for the commands you suggested (lists.txt).

 

Thanks!

 

 

Gaurav_S
Moderator
   VIP    Certified

All the outputs look OK; the ASL is also claiming the devices. Nothing wrong here.

Do you know what failover mode is set on the array? DMP is recommended to run best with the array in ALUA mode (worth looking at this as well).

 

G

 

enzo68
Level 4

Hi Gaurav,

The vendor, IBM, has confirmed to me that the Storwize V3700 array type is ALUA.

 

 

Could the problem be the ASL library?

 

rpm -qa|grep VRTSaslapm
VRTSaslapm-5.1.134.000-SP1_RHEL5

 

vxddladm listsupport all |grep -i alua (I don't see IBM alua)
libvxhdsalua.so     HITACHI             DF600, DF600-V, DF600F, DF600F-V
libvxhpalua.so      HP, COMPAQ          HSV101, HSV111 (C)COMPAQ, HSV111, HSV200, HSV210, HSV300, HSV400, HSV450, HSV340, HSV360

 

vxdmpadm list dmpnode all |grep array-type
array-type      = Disk
array-type      = A/A-A-IBMSVC
array-type      = A/A-A-IBMSVC
array-type      = A/A-A-IBMSVC
array-type      = A/A-A-IBMSVC
array-type      = A/A-A-IBMSVC
array-type      = A/A-A-IBMSVC
array-type      = A/A-A-IBMSVC
array-type      = A/A-A-IBMSVC
array-type      = A/A-A-IBMSVC
array-type      = A/A-A-IBMSVC

 vxdmpadm listenclosure all
ENCLR_NAME        ENCLR_TYPE     ENCLR_SNO      STATUS       ARRAY_TYPE     LUN_COUNT
=======================================================================================
disk              Disk           DISKS                CONNECTED    Disk        1
storwizev70000    StorwizeV7000  00c020207110XX00     CONNECTED    A/A-A-IBMSVC  10

 

Best Regards

Vincenzo

Gaurav_S
Moderator
   VIP    Certified

Hi,

Yes, it's worth asking support about this. As per Symantec in the article below,

http://www.symantec.com/business/support/index?page=content&id=TECH47728

page 35 says Storwize arrays are best supported by DMP in ALUA mode.

 

And as per the article below,

http://www.symantec.com/business/support/index?page=content&id=TECH77062

there is no mention of added ALUA support in the change log, and unfortunately this is the last updated ASL/APM software package for Linux. Support or the backend teams can answer whether there is an upcoming plan to upgrade libvxibmsvc.so for ALUA support.
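
If you want to see what the currently installed IBM SVC ASL claims on your system, something like this should show it (syntax from memory, please double-check on your box):

# vxddladm listsupport libname=libvxibmsvc.so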

Also, whether there are any recently found known issues is something support can answer.

 

All the best.

 

G

enzo68
Level 4

Hi,

this is the answer from Symantec support:

 

"......As per the discussion with you because of these messages there will be no impact on the functionality of the product.


You may also refer :

http://www.symantec.com/docs/TECH170352

However will try to give the feedback internally so it get addressed in the newer releases."

 

Thank you for the support.
 
Have a nice weekend!
 

Vincenzo

Gaurav_S
Moderator
   VIP    Certified

Hi

As I mentioned in my first post on this thread, I was of the same opinion that these messages are ignorable (if there are no operational issues), and I was expecting that support would say the same. However, it's good to have confirmation that it is an identified bug and will be fixed.

 

Thanks for the update.

 

G