Fencing and Reservation Conflict
Hi all, I have Red Hat Linux 5.9 64-bit with SFHA 5.1 SP1 RP4 and fencing enabled (our storage device is an IBM Storwize V3700 SFF, SCSI-3 compliant).

[root@mitoora1 ~]# vxfenadm -d

I/O Fencing Cluster Information:
================================

 Fencing Protocol Version: 201
 Fencing Mode: SCSI3
 Fencing SCSI3 Disk Policy: dmp
 Cluster Members:

   * 0 (mitoora1)
     1 (mitoora2)

 RFSM State Information:
   node 0 in state 8 (running)
   node 1 in state 8 (running)

In /etc/vxfenmode: scsi3_disk_policy=dmp and vxfen_mode=scsi3

vxdctl scsi3pr
scsi3pr: on

[root@mitoora1 etc]# more /etc/vxfentab
#
# /etc/vxfentab:
# DO NOT MODIFY this file as it is generated by the
# VXFEN rc script from the file /etc/vxfendg.
#
/dev/vx/rdmp/storwizev70000_000007
/dev/vx/rdmp/storwizev70000_000008
/dev/vx/rdmp/storwizev70000_000009

[root@mitoora1 etc]# vxdmpadm listctlr all
CTLR-NAME       ENCLR-TYPE      STATE      ENCLR-NAME
=====================================================
c0              Disk            ENABLED    disk
c10             StorwizeV7000   ENABLED    storwizev70000
c7              StorwizeV7000   ENABLED    storwizev70000
c8              StorwizeV7000   ENABLED    storwizev70000
c9              StorwizeV7000   ENABLED    storwizev70000

main.cf:

cluster drdbonesales (
        UserNames = { admin = hlmElgLimHmmKumGlj }
        ClusterAddress = "10.90.15.30"
        Administrators = { admin }
        UseFence = SCSI3
        )

I configured coordinator fencing, so I have 3 LUNs in a Veritas disk group (dmp coordinator). Everything seems to work fine, but I noticed a lot of reservation conflicts in the messages of both nodes. On the server log I constantly see these messages:

/var/log/messages
Nov 26 15:14:09 mitoora2 kernel: sd 7:0:1:1: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 8:0:0:1: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 8:0:1:1: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 10:0:0:1: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 10:0:1:1: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 9:0:1:1: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 9:0:0:1: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 7:0:1:3: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 8:0:0:3: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 8:0:1:3: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 10:0:1:3: reservation conflict

Do you have any idea?

Best Regards
Vincenzo
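A note on diagnosis (not from the thread): kernel "reservation conflict" messages are typically generated when something on a node issues I/O or certain SCSI commands to a LUN that carries a SCSI-3 reservation held through the other node; with UseFence = SCSI3 the data disks of an imported disk group are reserved, not only the three coordinator LUNs. A hedged first pass is to check which keys are actually registered on the coordinator and data disks and which DMP paths sit behind the complaining sd devices. The device and controller names below are taken from the output above and are only illustrations.

# Show the keys registered on the coordinator disks listed in /etc/vxfentab
vxfenadm -s all -f /etc/vxfentab

# Show the keys on one of the data LUNs (path name is an assumption --
# substitute a device from your own vxdisk list output)
vxfenadm -s /dev/vx/rdmp/storwizev70000_000010

# Map the complaining sd devices back to DMP paths on one controller
vxdmpadm getsubpaths ctlr=c7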
New disk "ERROR" in vxdisk list

I have 2 new SAN disks attached to a host. One looks normal and Veritas can see it and initialize it. The other shows an "ERROR" in vxdisk list.

# vxdisk list
DEVICE       TYPE            DISK         GROUP        STATUS
c0t0d0s2     auto:none       -            -            online invalid
c2t0d0s2     auto:none       -            -            online invalid
c2t6d0s2     auto:none       -            -            online invalid
fabric_50    auto:cdsdisk    fabric_11    ocsrawdg     online shared
.
.
.
fabric_78    auto:cdsdisk    fabric_28    ocsrawdg     online shared
fabric_79    auto            -            -            error
fabric_80    auto:cdsdisk    -            -            online

I left out a bunch of other disks between fabric_50 and fabric_78 as they are not relevant. Note that fabric_79 and fabric_80 are the two new disks. They both appear normal in Solaris format, and the NetApp host tools show them both as good.

# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c0t0d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
          /ssm@0,0/pci@18,600000/scsi@2/sd@0,0
       1. c2t0d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
          /ssm@0,0/pci@1c,600000/scsi@2/sd@0,0
       2. c2t6d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
          /ssm@0,0/pci@1c,600000/scsi@2/sd@6,0
       3. c8t60A98000486E5A71675A5A447168634Bd0 <NETAPP-LUN-7320 cyl 6526 alt 2 hd 16 sec 2048>
          /scsi_vhci/ssd@g60a98000486e5a71675a5a447168634b
       4. c8t60A98000486E5A7153345A4471373748d0 <NETAPP-LUN-7320 cyl 48820 alt 2 hd 255 sec 189>
          /scsi_vhci/ssd@g60a98000486e5a7153345a4471373748

# sanlun lun show
controller: lun-pathname                                        device filename                                      adapter  protocol  lun size               lun state
filer2: /vol/acqbiz_vis_prod_nona_nox_cluster_ebsfsdg/lun1      /dev/rdsk/c8t60A98000486E5A71675A5A447168634Bd0s2    qlc1     FCP       102g (109521666048)    GOOD
filer1: /vol/acqbiz_vis_prod_nona_nox_cluster_ocsrawdg/lun1     /dev/rdsk/c8t60A98000486E5A7153345A4471373748d0s2    qlc1     FCP       1.1t (1204738326528)   GOOD

I've left out a lot of excess output, but the interesting stuff should be here. Finally, issuing a vxdisk init gives an error:

# vxdisk init fabric_79
VxVM vxdisk ERROR V-5-1-5433 Device fabric_79: init failed:
        Device path not valid

I even tried dd'ing /dev/zero onto the first 4 blocks of the device and relabeling the disk with format. Still no joy. Does anyone have any idea what the problem might be? I'm going to have a hard time convincing the SAN folks it's a problem, since it looks fine with format and the NetApp tool, but there must be something I've missed. One thing I probably should mention is that the LUN is 1.1 TB in size.
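A hedged note (not from the thread): when a device shows "error" in vxdisk list but looks healthy to the OS, a common first step is to make VxVM forget the stale device record and rediscover it. The 1.1 TB size the poster mentions may also matter: on Solaris, LUNs over 1 TB get an EFI label, which some older VxVM releases could not initialize in the default format. A minimal rediscovery sketch, assuming the device keeps the name fabric_79 after rescanning:

# Remove the stale device record from VxVM's view, then rescan
vxdisk rm fabric_79
vxdctl enable

# Check whether the device now shows "online invalid" instead of "error"
vxdisk list

# If it comes back clean, initialize it
/etc/vx/bin/vxdisksetup -i fabric_79

# Detailed view of what VxVM thinks about this one device
vxdisk list fabric_79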
<!-- @page { margin: 0.79in } P { margin-bottom: 0.08in } A:link { so-language: zxx } --> there is a two node cluster and we split two node cluster for upgrade. The isolated node is not coming up as vxfen is not starting /02/01 11:52:05 VCS CRITICAL V-16-1-10037 VxFEN driver not configured. Retrying... 2013/02/01 11:52:20 VCS CRITICAL V-16-1-10037 VxFEN driver not configured. Retrying... 2013/02/01 11:52:35 VCS CRITICAL V-16-1-10037 VxFEN driver not configured. Retrying... 2013/02/01 11:52:50 VCS CRITICAL V-16-1-10037 VxFEN driver not configured. Retrying... 2013/02/01 11:53:05 VCS CRITICAL V-16-1-10037 VxFEN driver not configured. Retrying... 2013/02/01 11:53:20 VCS CRITICAL V-16-1-10031 VxFEN driver not configured. VCS Stopping. Manually restart VCS after configuring fencing ^C IOFENCING configuration seems okay on node 2 as /etc/vxfentab has the entry for co-ordinator disks and /etc/vxfendg has diskgroup entry vxfendg2 and these disks and diskgroup are visible too. DEVICE TYPE DISK GROUP STATUS c0t5006016047201339d0s2 auto:sliced - (ossdg) online c0t5006016047201339d1s2 auto:sliced - (sybasedg) online c0t5006016047201339d2s2 auto:sliced - (vxfendg1) online c0t5006016047201339d3s2 auto:sliced - (vxfendg1) online c0t5006016047201339d4s2 auto:sliced - (vxfendg1) online c0t5006016047201339d5s2 auto:sliced - (vxfendg2) online c0t5006016047201339d6s2 auto:sliced - (vxfendg2) online c0t5006016047201339d7s2 auto:sliced - - online c2t0d0s2 auto:SVM - - SVM c2t1d0s2 auto:SVM - - SVM On checking vxfen.log nvoked S97vxfen. Starting Fri Feb 1 11:50:37 CET 2013 starting vxfen.. Fri Feb 1 11:50:37 CET 2013 calling start_fun Fri Feb 1 11:50:38 CET 2013 found vxfenmode file Fri Feb 1 11:50:38 CET 2013 calling generate_vxfentab Fri Feb 1 11:50:38 CET 2013 checking for /etc/vxfendg Fri Feb 1 11:50:38 CET 2013 found /etc/vxfendg. Fri Feb 1 11:50:38 CET 2013 calling generate_disklist Fri Feb 1 11:50:38 CET 2013 Starting vxfen.. Done. Fri Feb 1 11:50:38 CET 2013 starting in vxfen-startup Fri Feb 1 11:50:38 CET 2013 calling regular vxfenconfig Fri Feb 1 11:50:38 CET 2013 return value from above operation is 1 Fri Feb 1 11:50:38 CET 2013 output was VXFEN vxfenconfig ERROR V-11-2-1003 At least three coordinator disks must be defined Log Buffer: 0xfffffffff4041090 refadm2-oss1{root} # cat /etc/vxfendg vxfendg2 and there are below mentioned two disks in vxfendg2 c0t5006016047201339d5s2 auto:sliced - (vxfendg2) online c0t5006016047201339d6s2 auto:sliced - (vxfendg2) online is it due to two disks in coordinator diskgroup? Is it a known issue ?Solved7.6KViews1like16CommentsVCS ERROR V-16-1-10600 Cannot connect to VCS engine
VCS ERROR V-16-1-10600 Cannot connect to VCS engine

I have installed Symantec Storage Foundation 4.0 for Linux and Veritas Cluster Server 4.1 with Oracle agents on Red Hat Enterprise Linux 4 update 8. I have installed two nodes in the cluster. Everything worked perfectly. Then there was a power failure, and afterwards one node changed status to UNKNOWN, as shown in the log file.

I tried to implement the recommendations in http://www.symantec.com/business/support/index?page=content&id=TECH54873 but it did not help. What should I do to start the other node (erpnode2)?
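A hedged first diagnostic pass (not from the thread): when a node shows UNKNOWN after a power loss, the usual first step is to confirm whether LLT, GAB, and had actually came back on that node before touching the configuration. These are standard VCS commands and nothing here changes the cluster config; run them on the UNKNOWN node.

lltstat -nvv            # are both LLT links up and both nodes visible?
gabconfig -a            # are port a (GAB) and port h (had) seated on this node?
hasys -state            # what does had itself report?

# If GAB membership looks fine but had is not running, a sketch of a restart:
hastop -local -force    # stop had locally, leave applications running
hastart                 # restart had so it rebuilds state from the running cluster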
unable to deport: VxVM vxdg ERROR V-5-1-584 Disk group: Some volumes in the disk group are in use

Hi all,

I am getting this error:

VxVM vxdg ERROR V-5-1-584 Disk group : Some volumes in the disk group are in use

None of the volumes are mounted. Even after stopping all the volumes in the disk group, it still fails to deport with the same error. Any idea how to fix this without a reboot?

VRTSvxvm-5.1.132.211-5.1SP1RP2P2HF11_RHEL5
Red Hat version is 5.8

Thanks in advance.

Regards,
ida
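A hedged note (not from the thread): a deport fails with V-5-1-584 while any volume in the group is still open, even if nothing is mounted; raw opens (databases, backup jobs, monitoring tools) count. A sketch of finding the opener without a reboot, where "mydg" is a placeholder for the real disk group name:

vxprint -g mydg -qht                 # confirm all volumes are DISABLED after stopping
vxvol -g mydg stopall                # stop any volume still ENABLED

# Look for processes holding the raw or block device nodes open
fuser /dev/vx/rdsk/mydg/* /dev/vx/dsk/mydg/* 2>/dev/null
lsof 2>/dev/null | grep /dev/vx | grep mydg   # if lsof is installed

vxdg deport mydg                     # retry once nothing holds the volumes open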
Agent failed in VCS

The agents are in failed status. Below are the messages from the engine_A.log file. Please let me know the cause of this issue.

VCS WARNING V-16-1-53025 Agent Script has faulted; ipm connection was lost; restarting the agent
VCS ERROR V-16-1-10015 Cannot start /opt/VRTSvcs/bin/Script/ScriptAgent please check file
VCS WARNING V-16-1-53025 Agent NIC has faulted; ipm connection was lost; restarting the agent
VCS ERROR V-16-1-10008 Agent NIC has faulted 6 times since
VCS ERROR V-16-1-10015 Cannot start /opt/VRTSvcs/bin/NIC/NICAgent please check file
VCS WARNING V-16-10001-4028 (unix) IP:Unix-G1-IP:monitor:Empty NetMask is supplied, default netmask will be used.
VCS WARNING V-16-1-10023 Agent DiskGroup not sending alive messages since
VCS WARNING V-16-1-53025 Agent DiskGroup has faulted; ipm connection was lost; restarting the agent
VCS ERROR V-16-1-10015 Cannot start /opt/VRTSvcs/bin/DiskGroup/DiskGroupAgent please check file
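A hedged note (not from the thread): "Cannot start ... please check file" usually points at the agent binary itself (missing, zero-length, wrong permissions, or a broken link) rather than at the resources, so checking the files and trying a manual agent start is a reasonable first step. The node name "node1" below is a placeholder.

# Check that the agent binaries exist, are non-empty and executable
ls -l /opt/VRTSvcs/bin/Script/ScriptAgent \
      /opt/VRTSvcs/bin/NIC/NICAgent \
      /opt/VRTSvcs/bin/DiskGroup/DiskGroupAgent

# Many bundled agents are links to a common script agent binary; confirm the
# target resolves (release-dependent, so treat this as a hint only)
file /opt/VRTSvcs/bin/DiskGroup/DiskGroupAgent

# Try restarting one agent by hand on the affected node
haagent -start NIC -sys node1
haagent -display NIC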
Hello everyone,

I am having a problem with the VCS heartbeat links. VCS 4.0 is running on Sun Fire V440 machines under Solaris 9. I know it's old and EOL; I'm just hoping to pinpoint the solution to this problem.

The VCS heartbeat links run on two separate VLANs, and this is a two-node cluster. Recently the old switch was taken out and a new Cisco 3750 switch was added. The switch shows the cables as connected and I can see link up from the switch side, yet the links on ce4 of both servers are not linking. Any ideas besides a faulty VLAN? How do I test the communications on that particular VLAN?

Here are the results of various commands. Any help is appreciated. Thank you!

On node2:

# lltstat -n
LLT node information:
  Node        State     Links
    0 node1   OPEN      1
  * 1 node2   OPEN      2

# lltstat -nvv | head
LLT node information:
  Node        State     Link   Status   Address
    0 node1   OPEN      ce4    DOWN
                        ce6    UP       00:03:BA:94:F8:61
  * 1 node2   OPEN      ce4    UP       00:03:BA:94:A4:6F
                        ce6    UP       00:03:BA:94:A4:71
    2         CONNWAIT  ce4    DOWN

On node1:

# lltstat -n
LLT node information:
  Node        State     Links
  * 0 node1   OPEN      2
    1 node2   OPEN      1

# lltstat -nvv | head
LLT node information:
  Node        State     Link   Status   Address
  * 0 node1   OPEN      ce4    UP       00:03:BA:94:F8:5F
                        ce6    UP       00:03:BA:94:F8:61
    1 node2   OPEN      ce4    DOWN
                        ce6    UP       00:03:BA:94:A4:71
    2         CONNWAIT  ce4    DOWN

# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen   49c917 membership 01
Port a gen   49c917 jeopardy   ;1
Port h gen   49c91e membership 01
Port h gen   49c91e jeopardy   ;
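A hedged note (not from the thread): to test whether the ce4 path actually carries frames across the new switch independently of LLT's own state, LLT ships a point-to-point test tool, dlpiping, under /opt/VRTSllt. The device syntax (/dev/ce:4) and the MAC address below are illustrations taken from the lltstat output above; check the syntax against your LLT release before running it.

# On node1, start a responder on the suspect link:
/opt/VRTSllt/dlpiping -s /dev/ce:4

# On node2, send test frames to node1's ce4 MAC over the same link:
/opt/VRTSllt/dlpiping -c /dev/ce:4 00:03:BA:94:F8:5F

# If frames do not come back, the problem is below LLT (cabling, switch port,
# VLAN tagging); if they do, look at lltconfig and /etc/llttab instead.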
VCS with VxVM - newgroup is auto-disabled in cluster

Hi guys, new to the Veritas world.

I am trying to create an active/passive service group on a two-node VCS cluster with VxVM. I installed the SFCFS filesets on two AIX nodes. I have a shared disk of 5 GB visible on both nodes, and I am able to deport the disk group, import it on the second node, and mount the filesystem. The error I am getting is:

# hagrp -online newgroup -sys tiefaphap603
VCS WARNING V-16-1-10159 Group newgroup is auto-disabled in cluster. This can happen if group is not probed on all alive nodes in group's SystemList or VCS engine is not running on all alive nodes in group's SystemList.

# vi main.cf
include "OracleASMTypes.cf"
include "types.cf"
include "CFSTypes.cf"
include "Db2udbTypes.cf"
include "OracleTypes.cf"
include "SybaseTypes.cf"

cluster tstcls (
        UserNames = { admin = bklJknKikEkqGfhFhh }
        ClusterAddress = "10.68.73.180"
        Administrators = { admin }
        )

system tiefaphap603 (
        )

system tiefaphap604 (
        )

group ClusterService (
        SystemList = { tiefaphap603 = 0, tiefaphap604 = 1 }
        AutoStartList = { tiefaphap603, tiefaphap604 }
        OnlineRetryLimit = 3
        OnlineRetryInterval = 120
        )

        IP webip (
                Device = en0
                Address = "10.68.73.180"
                NetMask = "255.255.255.0"
                )

        NIC csgnic (
                Device = en0
                NetworkHosts @tiefaphap603 = { "10.68.73.205" }
                NetworkHosts @tiefaphap604 = { "10.68.73.184" }
                )

        webip requires csgnic

        // resource dependency tree
        //
        //      group ClusterService
        //      {
        //      IP webip
        //          {
        //          NIC csgnic
        //          }
        //      }

group newgroup (
        SystemList = { tiefaphap603 = 0, tiefaphap604 = 1 }
        AutoStartList = { tiefaphap603 }
        )

        DiskGroup data_dg (
                DiskGroup = clshareddg
                )

        Mount mnt (
                MountPoint = "/clsfs02"
                BlockDevice = "/dev/vx/dsk/clshareddg/clslv02"
                FSType = vxfs
                )

        mnt requires data_dg

        // resource dependency tree
        //
        //      group newgroup
        //      {
        //      Mount mnt
        //          {
        //          DiskGroup data_dg
        //          }
        //      }
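A hedged note (not from the thread): the warning text itself lists the two usual causes — the group has not been probed on every alive node in its SystemList, or had is not running on one of them. A diagnostic sketch using the names from the main.cf above:

hastatus -sum                                     # is had RUNNING on both tiefaphap603 and tiefaphap604?
hagrp -display newgroup -attribute AutoDisabled   # which node still shows the group auto-disabled?
hagrp -display newgroup -attribute ProbesPending

# Force a probe of the resources on the node that has not reported them
hares -probe data_dg -sys tiefaphap604
hares -probe mnt -sys tiefaphap604

# Only if you are certain the group is not active anywhere else, the
# auto-disabled flag can be cleared by hand:
hagrp -autoenable newgroup -sys tiefaphap604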
cannot configure vxfen after reboot

Hello,

We physically moved a server, and after the reboot we cannot configure vxfen:

# vxfenconfig -c
VXFEN vxfenconfig ERROR V-11-2-1002 Open failed for device: /dev/vxfen with error 2

My vxfen.log:

Wed Aug 19 13:17:09 CEST 2015 Invoked vxfen. Starting
Wed Aug 19 13:17:23 CEST 2015 return value from above operation is 1
Wed Aug 19 13:17:23 CEST 2015 output was VXFEN vxfenconfig ERROR V-11-2-1041 Snapshot for this node is different from that of the running cluster.
Log Buffer: 0xffffffffa0c928a0
VXFEN vxfenconfig NOTICE Driver will use customized fencing - mechanism cps
Wed Aug 19 13:17:23 CEST 2015 exiting with 1

Engine version 6.0.10.0
RHEL 6.3

Any idea to help me get vxfen running (and had after that)?
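A hedged note (not from the thread): there are two separate symptoms here. "Open failed for device: /dev/vxfen with error 2" (ENOENT) suggests the vxfen kernel module is not loaded, and V-11-2-1041 means the fencing configuration this node would load (here: customized fencing with CPS) does not match what the running cluster members are using. A sketch of what one might compare:

# Is the vxfen module loaded at all?
lsmod | grep vxfen

# Compare the fencing configuration with a node still in the cluster:
# the mode (scsi3 vs customized/cps) and the coordination points must match.
cat /etc/vxfenmode
cat /etc/vxfentab

# On a running node, see what the cluster is actually using:
vxfenadm -d

# Once the files match the running cluster, restart fencing on this node:
/etc/init.d/vxfen start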
Service group does not fail over on another node on force power down.

VCS 6.0.1

Hi, I have configured a two-node cluster with local storage, running two service groups. They both run fine and I can switch them over to either node in the cluster, but when I forcibly power down the node where both service groups are active, only one service group fails over to the other node; the one running the Apache resource faults and does not fail over. Below are the contents of the main.cf file:

# cat /etc/VRTSvcs/conf/config/main.cf
include "OracleASMTypes.cf"
include "types.cf"
include "Db2udbTypes.cf"
include "OracleTypes.cf"
include "SybaseTypes.cf"

cluster mycluster (
        UserNames = { admin = IJKcJEjGKfKKiSKeJH, root = ejkEjiIhjKjeJh }
        ClusterAddress = "192.168.25.101"
        Administrators = { admin, root }
        )

system server3 (
        )

system server4 (
        )

group ClusterService (
        SystemList = { server3 = 0, server4 = 1 }
        AutoStartList = { server3, server4 }
        OnlineRetryLimit = 3
        OnlineRetryInterval = 120
        )

        IP webip (
                Device = eth0
                Address = "192.168.25.101"
                NetMask = "255.255.255.0"
                )

        NIC csgnic (
                Device = eth0
                )

        webip requires csgnic

        // resource dependency tree
        //
        //      group ClusterService
        //      {
        //      IP webip
        //          {
        //          NIC csgnic
        //          }
        //      }

group httpsg (
        SystemList = { server3 = 0, server4 = 1 }
        AutoStartList = { server3, server4 }
        OnlineRetryLimit = 3
        OnlineRetryInterval = 15
        )

        Apache apachenew (
                httpdDir = "/usr/sbin"
                ConfigFile = "/etc/httpd/conf/httpd.conf"
                )

        IP ipresource (
                Device = eth0
                Address = "192.168.25.102"
                NetMask = "255.255.255.0"
                )

        apachenew requires ipresource

        // resource dependency tree
        //
        //      group httpsg
        //      {
        //      Apache apachenew
        //          {
        //          IP ipresource
        //          }
        //      }

The engine log while the power-down occurs says:

2013/03/12 16:33:02 VCS INFO V-16-1-10077 Received new cluster membership
2013/03/12 16:33:02 VCS NOTICE V-16-1-10112 System (server3) - Membership: 0x1, DDNA: 0x0
2013/03/12 16:33:02 VCS ERROR V-16-1-10079 System server4 (Node '1') is in Down State - Membership: 0x1
2013/03/12 16:33:02 VCS ERROR V-16-1-10322 System server4 (Node '1') changed state from RUNNING to FAULTED
2013/03/12 16:33:02 VCS NOTICE V-16-1-10449 Group httpsg autodisabled on node server4 until it is probed
2013/03/12 16:33:02 VCS NOTICE V-16-1-10449 Group VCShmg autodisabled on node server4 until it is probed
2013/03/12 16:33:02 VCS NOTICE V-16-1-10446 Group ClusterService is offline on system server4
2013/03/12 16:33:02 VCS NOTICE V-16-1-10446 Group httpsg is offline on system server4
2013/03/12 16:33:02 VCS ERROR V-16-1-10205 Group ClusterService is faulted on system server4
2013/03/12 16:33:02 VCS NOTICE V-16-1-10446 Group ClusterService is offline on system server4
2013/03/12 16:33:02 VCS INFO V-16-1-10493 Evaluating server3 as potential target node for group ClusterService
2013/03/12 16:33:02 VCS INFO V-16-1-10493 Evaluating server4 as potential target node for group ClusterService
2013/03/12 16:33:02 VCS INFO V-16-1-10494 System server4 not in RUNNING state
2013/03/12 16:33:02 VCS NOTICE V-16-1-10301 Initiating Online of Resource webip (Owner: Unspecified, Group: ClusterService) on System server3
2013/03/12 16:33:02 VCS WARNING V-16-1-11141 LLT heartbeat link status changed. Previous status =eth1, UP; Current status =eth1, DOWN.
2013/03/12 16:33:02 VCS INFO V-16-6-15015 (server3) hatrigger:/opt/VRTSvcs/bin/triggers/sysoffline is not a trigger scripts directory or can not be executed
2013/03/12 16:33:14 VCS INFO V-16-1-10298 Resource webip (Owner: Unspecified, Group: ClusterService) is online on server3 (VCS initiated)
2013/03/12 16:33:14 VCS NOTICE V-16-1-10447 Group ClusterService is online on system server3

As per the above logs, the default ClusterService group failed over to the other node, but the httpsg service group failed. Please suggest.

Thanks.
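A hedged note (not from the thread): the excerpt only shows VCS evaluating failover targets for ClusterService; nothing in it shows httpsg being evaluated or the Apache resource being brought online on server3, so the first step is to see how httpsg actually ended up on the surviving node and why. A diagnostic sketch using the names from the main.cf above:

hastatus -sum                                     # current state of httpsg and its resources on server3
hagrp -display httpsg -attribute AutoFailOver AutoDisabled Frozen
hares -display apachenew -sys server3             # did the Apache resource fault during the online attempt?

# The engine and agent logs usually say why an online attempt failed:
grep -i apache /var/VRTSvcs/log/engine_A.log | tail -50
tail -100 /var/VRTSvcs/log/Apache_A.log

# After fixing the cause, clear the fault and retry:
hagrp -clear httpsg -sys server3
hagrp -online httpsg -sys server3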