Fencing and Reservation Conflict
Hi all,

I have Red Hat Linux 5.9 64-bit with SFHA 5.1 SP1 RP4 and fencing enabled (our storage device is an IBM Storwize V3700 SFF, SCSI-3 compliant).

[root@mitoora1 ~]# vxfenadm -d
I/O Fencing Cluster Information:
================================
Fencing Protocol Version: 201
Fencing Mode: SCSI3
Fencing SCSI3 Disk Policy: dmp
Cluster Members:
* 0 (mitoora1)
  1 (mitoora2)
RFSM State Information:
node 0 in state 8 (running)
node 1 in state 8 (running)

In /etc/vxfenmode: scsi3_disk_policy=dmp and vxfen_mode=scsi3

vxdctl scsi3pr
scsi3pr: on

[root@mitoora1 etc]# more /etc/vxfentab
#
# /etc/vxfentab:
# DO NOT MODIFY this file as it is generated by the
# VXFEN rc script from the file /etc/vxfendg.
#
/dev/vx/rdmp/storwizev70000_000007
/dev/vx/rdmp/storwizev70000_000008
/dev/vx/rdmp/storwizev70000_000009

[root@mitoora1 etc]# vxdmpadm listctlr all
CTLR-NAME   ENCLR-TYPE      STATE     ENCLR-NAME
=====================================================
c0          Disk            ENABLED   disk
c10         StorwizeV7000   ENABLED   storwizev70000
c7          StorwizeV7000   ENABLED   storwizev70000
c8          StorwizeV7000   ENABLED   storwizev70000
c9          StorwizeV7000   ENABLED   storwizev70000

main.cf:

cluster drdbonesales (
        UserNames = { admin = hlmElgLimHmmKumGlj }
        ClusterAddress = "10.90.15.30"
        Administrators = { admin }
        UseFence = SCSI3
        )

I configured coordinator fencing, so I have 3 LUNs in a Veritas disk group (dmp coordinator). Everything seems to work fine, but I noticed a lot of reservation conflicts in the messages on both nodes. In the server log I constantly see these messages:

/var/log/messages
Nov 26 15:14:09 mitoora2 kernel: sd 7:0:1:1: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 8:0:0:1: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 8:0:1:1: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 10:0:0:1: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 10:0:1:1: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 9:0:1:1: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 9:0:0:1: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 7:0:1:3: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 8:0:0:3: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 8:0:1:3: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 10:0:1:3: reservation conflict

Do you have any idea?

Best Regards,
Vincenzo
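One way to see which SCSI-3 keys are actually registered on the disks is vxfenadm in read-keys mode. This is only a minimal diagnostic sketch, assuming the 5.1SP1 vxfenadm syntax and using the DMP device paths from the /etc/vxfentab above:

# List the registration keys on all coordinator disks named in /etc/vxfentab
vxfenadm -s all -f /etc/vxfentab

# Inspect the keys on a single DMP device (path taken from vxfentab)
vxfenadm -s /dev/vx/rdmp/storwizev70000_000007

Comparing the keys each node has registered against the devices that log the conflicts can help show whether the reservation conflicts come from the coordinator LUNs or from LUNs in a data disk group.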
VCS ERROR V-16-1-10600 Cannot connect to VCS engine

I have installed Symantec Storage Foundation 4.0 for Linux and Veritas Cluster Server 4.1 with the Oracle agents on Red Hat Enterprise Linux 4 update 8. I have installed two nodes in the cluster and everything worked perfectly. Then there was a power failure, and after that one node changed status to UNKNOWN in the log file.

I tried to implement the recommendations in http://www.symantec.com/business/support/index?page=content&id=TECH54873 but it did not help.

What do I need to do to start the other node (erpnode2)?
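A minimal triage sketch for a node stuck in UNKNOWN after a power loss, assuming the standard LLT/GAB/VCS tools are installed in their usual locations:

# Check LLT links and GAB membership on the stuck node
lltstat -nvv
gabconfig -a          # port a = GAB, port h = had; both should show membership

# Check whether the VCS engine is actually running locally
ps -ef | grep had
hastatus -sum

# If GAB membership looks fine but had is not running, try restarting the engine
hastart

If GAB never forms membership, the problem is usually at the LLT link or seeding level rather than in had itself.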
VVR Replication configuration

Hi guys,

I am configuring replication on our 2-node cluster, but adding the secondary node gives the error below, and I am not able to find useful messages in engine.log or the RVG log either. Please help.

[root@AOSCEDA01 ~]# vradmin -g jceda_dg addsec jceda_rvg AOSCEDA01 AOSCEDA02 prlink=to_AOSCEDA01 srlink=to_AOSCEDA02
VxVM VVR vradmin ERROR V-5-52-417 RVG jceda_rvg already exists in disk group jceda_dg.
VxVM VVR vradmin ERROR V-5-52-802 Cannot start command execution on Secondary.
[root@AOSCEDA01 ~]#
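A hedged sketch of how the existing replication configuration could be inspected before retrying addsec; the disk group and RVG names are the ones from the command above, and the commands assume standard VVR tooling:

# Show the RVGs that already exist in the disk group, as vradmin sees them
vradmin -g jceda_dg -l printrvg

# Show replication status for the RVG, and the RLINK records at the VxVM level
vradmin -g jceda_dg repstatus jceda_rvg
vxprint -g jceda_dg -Pl

# Confirm the vradmind daemon is running on both primary and secondary
ps -ef | grep vradmind

The second error (cannot start command execution on Secondary) often points at vradmind or name resolution between the hosts, which is worth ruling out before touching the RVG itself.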
Service group does not fail over on another node on force power down.

VCS 6.0.1

Hi, I have configured a two-node cluster with local storage, running two service groups. Both run fine and I am able to switch them over to either node in the cluster, but when I forcibly power down the node where both service groups are active, only one service group fails over to the other node; the one running the Apache resource fails and does not fail over. Below are the contents of the main.cf file.

cat /etc/VRTSvcs/conf/config/main.cf
include "OracleASMTypes.cf"
include "types.cf"
include "Db2udbTypes.cf"
include "OracleTypes.cf"
include "SybaseTypes.cf"

cluster mycluster (
        UserNames = { admin = IJKcJEjGKfKKiSKeJH, root = ejkEjiIhjKjeJh }
        ClusterAddress = "192.168.25.101"
        Administrators = { admin, root }
        )

system server3 (
        )

system server4 (
        )

group ClusterService (
        SystemList = { server3 = 0, server4 = 1 }
        AutoStartList = { server3, server4 }
        OnlineRetryLimit = 3
        OnlineRetryInterval = 120
        )

        IP webip (
                Device = eth0
                Address = "192.168.25.101"
                NetMask = "255.255.255.0"
                )

        NIC csgnic (
                Device = eth0
                )

        webip requires csgnic

        // resource dependency tree
        //
        // group ClusterService
        // {
        // IP webip
        //     {
        //     NIC csgnic
        //     }
        // }

group httpsg (
        SystemList = { server3 = 0, server4 = 1 }
        AutoStartList = { server3, server4 }
        OnlineRetryLimit = 3
        OnlineRetryInterval = 15
        )

        Apache apachenew (
                httpdDir = "/usr/sbin"
                ConfigFile = "/etc/httpd/conf/httpd.conf"
                )

        IP ipresource (
                Device = eth0
                Address = "192.168.25.102"
                NetMask = "255.255.255.0"
                )

        apachenew requires ipresource

        // resource dependency tree
        //
        // group httpsg
        // {
        // Apache apachenew
        //     {
        //     IP ipresource
        //     }
        // }

The engine logs while the power down occurs say:

2013/03/12 16:33:02 VCS INFO V-16-1-10077 Received new cluster membership
2013/03/12 16:33:02 VCS NOTICE V-16-1-10112 System (server3) - Membership: 0x1, DDNA: 0x0
2013/03/12 16:33:02 VCS ERROR V-16-1-10079 System server4 (Node '1') is in Down State - Membership: 0x1
2013/03/12 16:33:02 VCS ERROR V-16-1-10322 System server4 (Node '1') changed state from RUNNING to FAULTED
2013/03/12 16:33:02 VCS NOTICE V-16-1-10449 Group httpsg autodisabled on node server4 until it is probed
2013/03/12 16:33:02 VCS NOTICE V-16-1-10449 Group VCShmg autodisabled on node server4 until it is probed
2013/03/12 16:33:02 VCS NOTICE V-16-1-10446 Group ClusterService is offline on system server4
2013/03/12 16:33:02 VCS NOTICE V-16-1-10446 Group httpsg is offline on system server4
2013/03/12 16:33:02 VCS ERROR V-16-1-10205 Group ClusterService is faulted on system server4
2013/03/12 16:33:02 VCS NOTICE V-16-1-10446 Group ClusterService is offline on system server4
2013/03/12 16:33:02 VCS INFO V-16-1-10493 Evaluating server3 as potential target node for group ClusterService
2013/03/12 16:33:02 VCS INFO V-16-1-10493 Evaluating server4 as potential target node for group ClusterService
2013/03/12 16:33:02 VCS INFO V-16-1-10494 System server4 not in RUNNING state
2013/03/12 16:33:02 VCS NOTICE V-16-1-10301 Initiating Online of Resource webip (Owner: Unspecified, Group: ClusterService) on System server3
2013/03/12 16:33:02 VCS WARNING V-16-1-11141 LLT heartbeat link status changed. Previous status =eth1, UP; Current status =eth1, DOWN.
2013/03/12 16:33:02 VCS INFO V-16-6-15015 (server3) hatrigger:/opt/VRTSvcs/bin/triggers/sysoffline is not a trigger scripts directory or can not be executed
2013/03/12 16:33:14 VCS INFO V-16-1-10298 Resource webip (Owner: Unspecified, Group: ClusterService) is online on server3 (VCS initiated)
2013/03/12 16:33:14 VCS NOTICE V-16-1-10447 Group ClusterService is online on system server3

As per the above logs, the default SG ClusterService failed over to the other node, but the SG httpsg did not. Please suggest.

Thanks.
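A hedged troubleshooting sketch for the httpsg group after such a power-down, using standard VCS 6.x commands; the group, resource, and system names are the ones from the main.cf above:

# See which groups/resources are faulted or autodisabled after the crash
hastatus -sum
hagrp -state httpsg
hares -state apachenew

# If the Apache resource faulted on the surviving node, check the agent log, then clear the fault
tail -100 /var/VRTSvcs/log/Apache_A.log
hares -clear apachenew -sys server3
hagrp -clear httpsg -sys server3

# Try bringing the group online manually on the surviving node
hagrp -online httpsg -sys server3

If the Apache online attempt keeps failing on server3, the agent log usually shows whether it is the IP plumb, the httpd config path, or the binary location that is rejected there.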
CVM won't start on remote node with an FSS diskgroup

I am testing FSS (Flexible Shared Storage) on SF 6.1 on RHEL 5.5 in a VirtualBox VM, and when I try to start CVM on the remote node I get:

VCS ERROR V-16-20006-1005 (r55v61b) CVMCluster:cvm_clus:monitor:node - state: out of cluster reason: Disk for disk group not found: retry to add a node failed

Here is my setup. Node A is the master, with a local disk (sdd) and a remote disk (B_sdd):

[root@r55v61a ~]# vxdctl -c mode
mode: enabled: cluster active - MASTER
master: r55v61a

[root@r55v61a ~]# vxdisk list
DEVICE       TYPE            DISK         GROUP        STATUS
B_sdd        auto:cdsdisk    -            -            online remote
sda          auto:none       -            -            online invalid
sdb          auto:none       -            -            online invalid
sdc          auto:cdsdisk    -            -            online
sdd          auto:cdsdisk    -            -            online exported

Node B is the slave, and sees a local disk (sdd) and a remote disk (A_sdd):

[root@r55v61b ~]# vxdisk list
DEVICE       TYPE            DISK         GROUP        STATUS
A_sdd        auto:cdsdisk    -            -            online remote
sda          auto:none       -            -            online invalid
sdb          auto:none       -            -            online invalid
sdc          auto:cdsdisk    -            -            online
sdd          auto:cdsdisk    -            -            online exported

On node A, I add an FSS disk group, so on node A the disk is local:

[root@r55v61a ~]# vxdg -s -o fss=on init fss-dg fd1_La=sdd
[root@r55v61a ~]# vxdisk list
DEVICE       TYPE            DISK         GROUP        STATUS
B_sdd        auto:cdsdisk    -            -            online remote
sda          auto:none       -            -            online invalid
sdb          auto:none       -            -            online invalid
sdc          auto:cdsdisk    -            -            online
sdd          auto:cdsdisk    fd1_La       fss-dg       online exported shared

And on node B the disk in fss-dg is remote:

[root@r55v61b ~]# vxdisk list
DEVICE       TYPE            DISK         GROUP        STATUS
A_sdd        auto:cdsdisk    fd1_La       fss-dg       online shared remote
sda          auto:none       -            -            online invalid
sdb          auto:none       -            -            online invalid
sdc          auto:cdsdisk    -            -            online
sdd          auto:cdsdisk    -            -            online exported

I then stop and start VCS on node B, which is when I see the issue:

2014/05/13 12:05:23 VCS INFO V-16-2-13716 (r55v61b) Resource(cvm_clus): Output of the completed operation (online)
==============================================
ERROR:
==============================================
2014/05/13 12:05:24 VCS ERROR V-16-20006-1005 (r55v61b) CVMCluster:cvm_clus:monitor:node - state: out of cluster reason: Disk for disk group not found: retry to add a node failed

If I destroy the fss-dg disk group on node A, then CVM will start on node B, so the issue is the FSS disk group, where it seems CVM cannot find the remote disk in the disk group.

I can also get around the issue by stopping VCS on node A, after which CVM will start on node B:

[root@r55v61b ~]# hagrp -online cvm -sys r55v61b
[root@r55v61b ~]# vxdisk -o alldgs list
DEVICE       TYPE            DISK         GROUP        STATUS
sda          auto:none       -            -            online invalid
sdb          auto:none       -            -            online invalid
sdc          auto:cdsdisk    -            -            online
sdd          auto:cdsdisk    -            -            online exported

If I then start VCS on node A, node B is able to see the FSS disk group:

[root@r55v61b ~]# vxdisk list
DEVICE       TYPE            DISK         GROUP        STATUS
A_sdd        auto:cdsdisk    fd1_La       fss-dg       online shared remote
sda          auto:none       -            -            online invalid
sdb          auto:none       -            -            online invalid
sdc          auto:cdsdisk    -            -            online
sdd          auto:cdsdisk    -            -            online exported

I can stop and start VCS on each node when the disks are just exported, and VCS is able to see the disk from the other node, but when I create the FSS disk group, CVM won't start on the system that has the remote disk. Does anybody have any ideas as to why?

Mike
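A hedged sketch of checks that could narrow this down, assuming standard SF 6.1 CVM/FSS tooling and the usual <Agent>_A.log naming under /var/VRTSvcs/log:

# Confirm CVM membership and node IDs on both nodes
vxclustadm -v nodestate
vxclustadm nidmap

# Confirm the local disk is still exported for FSS and visible remotely
vxdisk -o alldgs list
vxdisk list sdd          # on node A the status should still include "exported"

# Check what the engine and CVMCluster agent logged around the failed join
tail -100 /var/VRTSvcs/log/engine_A.log
tail -100 /var/VRTSvcs/log/CVMCluster_A.log

Running the nodestate and vxdisk checks on node B at the moment the cvm_clus resource fails should show whether node B has lost sight of the remotely exported disk before the join, or only during it.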
IP agent for same mac address interface

Hi all,

Our environment is as follows:

OS: Red Hat 6.5
VCS: 6.2.1

Our servers have two physical network ports, eth0 and eth1. We create tagged VLANs vlan515, vlan516, vlan518 and vlan520 based on eth0 and eth1. We are able to create an IP resource on vlan518 and fail it over between the two nodes. However, when we create an IP resource on vlan515, we are not able to bring it online.

According to https://support.symantec.com/en_US/article.TECH214469.html, a duplicate MAC address can cause this problem. However, we can't figure out where the "MACAddress" attribute mentioned in that solution is in the VCS Java Console. I manually added a "MACAddress" attribute to main.cf on both the NIC and the IP resource, but it comes back as not supported when I verify with the haconf -verify command.

Any hints or solutions for configuring an IP agent resource on interfaces that share the same MAC address?

Thanks,
Xentar
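For reference, a hedged sketch of how a NIC/IP pair on a tagged VLAN interface is commonly defined with the ha commands. This is not a fix for the duplicate-MAC question itself, just the usual shape of such a definition: the resource and group names (vlan515nic, vlan515ip, vlan515_sg) and the address are hypothetical, and the Device value should be whatever the VLAN interface is actually called on the hosts (eth0.515, vlan515, or similar):

haconf -makerw

# NIC resource monitoring the tagged VLAN interface itself
# (assumes a service group, here called vlan515_sg, already exists with the right SystemList)
hares -add vlan515nic NIC vlan515_sg
hares -modify vlan515nic Device vlan515
hares -modify vlan515nic Enabled 1

# IP resource plumbed on the same VLAN interface
hares -add vlan515ip IP vlan515_sg
hares -modify vlan515ip Device vlan515
hares -modify vlan515ip Address "192.168.15.10"
hares -modify vlan515ip NetMask "255.255.255.0"
hares -modify vlan515ip Enabled 1
hares -link vlan515ip vlan515nic

haconf -dump -makero

Pointing Device at the VLAN interface rather than the underlying eth0/eth1 keeps each IP resource tied to one logical interface even though the tagged VLANs share the physical port's MAC address.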
Agent exiting in vcs 6.1 on rhel6.6

Hi All,

I have a 10-node VCS cluster set up on RHEL 6.6, with 28 service groups running. Some are parallel, but most are failover service groups. I am getting the errors below:

Jun 19 05:37:20 HTNDPUEDSVC01 AgentFramework[10121]: VCS ERROR V-16-2-13120 Thread(4151479088) Error receiving from the engine. Agent(LVMVolumeGroup) is exiting
Jun 19 05:36:50 HTNDPUEDSVC01 out of memory [5967]
Jun 19 05:36:50 HTNDPUEDSVC01 malloc() FAILED at file Memory.C, line 376, memory could not be allocated. Process ID 5967 dumping core deliberately!
Jun 19 05:37:12 HTNDPUEDSVC01 kernel: GAB WARNING V-15-1-20057 Port h[GAB_USER_CLIENT (refcount 0)] process 5967 inactive 22 sec
Jun 19 05:37:13 HTNDPUEDSVC01 kernel: GAB WARNING V-15-1-20057 Port h[GAB_USER_CLIENT (refcount 0)] process 5967 inactive 23 sec
Jun 19 05:37:14 HTNDPUEDSVC01 kernel: GAB WARNING V-15-1-20057 Port h[GAB_USER_CLIENT (refcount 0)] process 5967 inactive 24 sec
Jun 19 05:37:15 HTNDPUEDSVC01 kernel: GAB WARNING V-15-1-20057 Port h[GAB_USER_CLIENT (refcount 0)] process 5967 inactive 25 sec
Jun 19 05:37:16 HTNDPUEDSVC01 kernel: GAB WARNING V-15-1-20057 Port h[GAB_USER_CLIENT (refcount 0)] process 5967 inactive 26 sec
Jun 19 05:37:17 HTNDPUEDSVC01 kernel: GAB WARNING V-15-1-20057 Port h[GAB_USER_CLIENT (refcount 0)] process 5967 inactive 27 sec
Jun 19 05:37:18 HTNDPUEDSVC01 kernel: GAB WARNING V-15-1-20057 Port h[GAB_USER_CLIENT (refcount 0)] process 5967 inactive 28 sec
Jun 19 05:37:19 HTNDPUEDSVC01 abrt[14811]: Write error: No space left on device
Jun 19 05:37:19 HTNDPUEDSVC01 kernel: GAB WARNING V-15-1-20057 Port h[GAB_USER_CLIENT (refcount 0)] process 5967 inactive 29 sec
Jun 19 05:37:19 HTNDPUEDSVC01 abrt[14811]: Error writing '/var/spool/abrt/ccpp-2016-06-19-05:36:50-5967.new/coredump'
Jun 19 05:37:20 HTNDPUEDSVC01 kernel: GAB WARNING V-15-1-20058 Port h[GAB_USER_CLIENT (refcount 0)] process 5967: heartbeat failed, killing process
Jun 19 05:37:20 HTNDPUEDSVC01 kernel: GAB INFO V-15-1-20059 Port h[GAB_USER_CLIENT (refcount 0)] heartbeat interval 30000 msec. Statistics:
Jun 19 05:37:20 HTNDPUEDSVC01 kernel: GAB INFO V-15-1-20129 Port h: heartbeats in 0 ~ 6000 msec: 75483871
Jun 19 05:37:20 HTNDPUEDSVC01 kernel: GAB INFO V-15-1-20129 Port h: heartbeats in 6000 ~ 12000 msec: 0
Jun 19 05:37:20 HTNDPUEDSVC01 kernel: GAB INFO V-15-1-20129 Port h: heartbeats in 12000 ~ 18000 msec: 0
Jun 19 05:37:20 HTNDPUEDSVC01 kernel: GAB INFO V-15-1-20129 Port h: heartbeats in 18000 ~ 24000 msec: 0
Jun 19 05:37:20 HTNDPUEDSVC01 kernel: GAB INFO V-15-1-20129 Port h: heartbeats in 24000 ~ 30000 msec: 0
Jun 19 05:37:20 HTNDPUEDSVC01 kernel: GAB INFO V-15-1-20088 System information:
Jun 19 05:37:20 HTNDPUEDSVC01 kernel: GAB INFO V-15-1-20089 number of cpu: 16
Jun 19 05:37:20 HTNDPUEDSVC01 kernel: GAB INFO V-15-1-20090 physical memory: 49426504 K
Jun 19 05:37:20 HTNDPUEDSVC01 kernel: GAB INFO V-15-1-20091 free memory: 5399360 K
Jun 19 05:37:20 HTNDPUEDSVC01 kernel: GAB INFO V-15-1-20041 Port h: client process failure: killing process
Jun 19 05:37:20 HTNDPUEDSVC01 kernel: GAB WARNING V-15-1-20161 Port h client process killed, GAB will initiate regmon action syslog after 200 sec
Jun 19 05:37:20 HTNDPUEDSVC01 kernel: GAB INFO V-15-1-20032 Port h closed
Jun 19 05:37:20 HTNDPUEDSVC01 AgentFramework[10130]: VCS ERROR V-16-2-13120 Thread(4151511776) Error receiving from the engine. Agent(HostMonitor) is exiting.
Jun 19 05:37:20 HTNDPUEDSVC01 AgentFramework[10121]: VCS ERROR V-16-2-13120 Thread(4151479088) Error receiving from the engine. Agent(LVMVolumeGroup) is exiting.
Jun 19 05:37:20 HTNDPUEDSVC01 AgentFramework[10133]: VCS ERROR V-16-2-13120 Thread(4151663328) Error receiving from the engine. Agent(Proxy) is exiting.
Jun 19 05:37:20 HTNDPUEDSVC01 AgentFramework[10116]: VCS ERROR V-16-2-13120 Thread(4151875280) Error receiving from the engine. Agent(Application) is exiting.
Jun 19 05:37:20 HTNDPUEDSVC01 AgentFramework[10128]: VCS ERROR V-16-2-13120 Thread(4151543504) Error receiving from the engine. Agent(VMwareDisks) is exiting.
Jun 19 05:37:20 HTNDPUEDSVC01 AgentFramework[10120]: VCS ERROR V-16-2-13120 Thread(4151569120) Error receiving from the engine. Agent(IP) is exiting.
Jun 19 05:37:20 HTNDPUEDSVC01 AgentFramework[10122]: VCS ERROR V-16-2-13120 Thread(4151507680) Error receiving from the engine. Agent(LVMLogicalVolume) is exiting.
Jun 19 05:37:20 HTNDPUEDSVC01 AgentFramework[10124]: VCS ERROR V-16-2-13120 Thread(4151524144) Error receiving from the engine. Agent(Phantom) is exiting.
Jun 19 05:37:20 HTNDPUEDSVC01 AgentFramework[10126]: VCS ERROR V-16-2-13120 Thread(4152023856) Error receiving from the engine. Agent(NIC) is exiting.
Jun 19 05:37:20 HTNDPUEDSVC01 hashadow[4005]: VCS ERROR V-16-1-11103 VCS exited. It will restart

Please suggest what action I should take for this error.
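The log shows two things happening together: process 5967 hits a memory allocation failure and dumps core deliberately, and abrt then fails to write that core dump because the target filesystem is full. A hedged first-pass check, using plain RHEL commands plus the usual VCS log location:

# How big has the VCS engine (had) grown, and how much memory is free overall?
ps -o pid,vsz,rss,comm -p $(pidof had)
free -m

# Is the filesystem holding the abrt core dumps (and /var in general) full?
df -h /var /var/spool/abrt
du -sh /var/spool/abrt/* 2>/dev/null | sort -h | tail

# Look for earlier signs of memory pressure in the engine log
grep -i "memory" /var/VRTSvcs/log/engine_A.log | tail -20

Freeing space under /var/spool/abrt and watching the resident size of had over time would show whether this is a one-off spike or a steady growth that needs a support case.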
Veritas cluster is not starting

Hi,

We have a Veritas cluster set up for an Oracle database.

Engine version: 5.1.10.0
OS: Red Hat Linux 5 (2 servers)

The servers are undergoing a domain migration; what steps need to be done on the VCS end? After the server rebooted, VCS is not starting and hastart did not respond. Below are the logs. I am not that acquainted with VCS, please advise.

$ /opt/VRTSvcs/bin/hastatus -sum
VCS WARNING V-16-1-10641 IpmHandle::open Cannot create AF_INET6 socket. errno = 97
VCS ERROR V-16-1-10600 Cannot connect to VCS engine
VCS WARNING V-16-1-11046 Local system not available

Log at /var/VRTSvcs/log/engine_A.log:

2016/01/24 19:21:15 VCS NOTICE V-16-1-11022 VCS engine (had) started
2016/01/24 19:21:15 VCS NOTICE V-16-1-11050 VCS engine version=5.1
2016/01/24 19:21:15 VCS NOTICE V-16-1-11051 VCS engine join version=5.1.10.0
2016/01/24 19:21:15 VCS NOTICE V-16-1-11052 VCS engine pstamp=5.1.100.000-5.1SP1GA-2010-09-30_23.30.00
2016/01/24 19:21:15 VCS INFO V-16-1-10196 Cluster logger started
2016/01/24 19:21:15 VCS NOTICE V-16-1-10114 Opening GAB library
2016/01/24 19:21:15 VCS NOTICE V-16-1-10619 'HAD' starting on: vdalpxorap002
2016/01/24 19:21:15 VCS INFO V-16-1-51138 Number of processors configured on this system are 32
2016/01/24 19:21:15 VCS WARNING V-16-1-51140 In a multi-CPU system, configure an adequately high value for the ShutdownTimeout attribute. This ensures that when a system panics, its service groups successfully fail over to other systems. For more information, refer to the VCS Administrator's Guide
2016/01/24 19:21:15 VCS WARNING V-16-1-10543 IpmServer::open Cannot create socket errno = 97
2016/01/24 19:21:16 VCS INFO V-16-1-10125 GAB timeout set to 30000 ms
2016/01/24 19:21:16 VCS NOTICE V-16-1-11057 GAB registration monitoring timeout set to 200000 ms
2016/01/24 19:21:16 VCS NOTICE V-16-1-11059 GAB registration monitoring action set to log system message
2016/01/24 19:21:30 VCS CRITICAL V-16-1-11306 Did not receive cluster membership, manual intervention may be needed for seeding
2016/01/24 20:23:59 VCS INFO V-16-1-10196 Cluster logger started
2016/01/24 20:23:59 VCS NOTICE V-16-1-11022 VCS engine (had) started
2016/01/24 20:23:59 VCS NOTICE V-16-1-11050 VCS engine version=5.1
2016/01/24 20:23:59 VCS NOTICE V-16-1-11051 VCS engine join version=5.1.10.0
2016/01/24 20:23:59 VCS NOTICE V-16-1-11052 VCS engine pstamp=5.1.100.000-5.1SP1GA-2010-09-30_23.30.00
2016/01/24 20:23:59 VCS NOTICE V-16-1-10114 Opening GAB library
2016/01/24 20:23:59 VCS NOTICE V-16-1-10619 'HAD' starting on: vdalpxorap002
2016/01/24 20:23:59 VCS INFO V-16-1-51138 Number of processors configured on this system are 32
2016/01/24 20:23:59 VCS WARNING V-16-1-51140 In a multi-CPU system, configure an adequately high value for the ShutdownTimeout attribute. This ensures that when a system panics, its service groups successfully fail over to other systems. For more information, refer to the VCS Administrator's Guide
2016/01/24 20:23:59 VCS WARNING V-16-1-10543 IpmServer::open Cannot create socket errno = 97
2016/01/24 20:23:59 VCS INFO V-16-1-10125 GAB timeout set to 30000 ms
2016/01/24 20:23:59 VCS NOTICE V-16-1-11057 GAB registration monitoring timeout set to 200000 ms
2016/01/24 20:23:59 VCS NOTICE V-16-1-11059 GAB registration monitoring action set to log system message
2016/01/24 20:24:14 VCS CRITICAL V-16-1-11306 Did not receive cluster membership, manual intervention may be needed for seeding
2016/01/25 00:18:45 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2016/01/25 04:18:46 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2016/01/25 08:18:49 VCS INFO V-16-1-53504 VCS Engine Alive message!!
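The repeated "Did not receive cluster membership, manual intervention may be needed for seeding" message indicates that GAB never seeded after the reboot, so had comes up but cannot join a cluster. A hedged checklist of standard LLT/GAB commands; only force-seed if you are certain the other node is genuinely down, otherwise you risk running two independent halves of the cluster:

# Are the LLT links up, and can this node see its peer?
lltstat -nvv

# Has GAB seeded? Port a (GAB) and port h (had) should both list members
gabconfig -a

# If the peer node really is down and will stay down, seed this node manually
gabconfig -x

# Then start the engine and re-check
hastart
hastatus -sum

If both nodes are up but still do not see each other, the LLT private links (cabling, switch/VLAN changes from the domain migration) are the first place to look before any manual seeding.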