Fencing and Reservation Conflict
Hi to all,

I have Red Hat Linux 5.9 64-bit with SFHA 5.1 SP1 RP4 with fencing enabled (our storage device is an IBM Storwize V3700 SFF, which is SCSI-3 compliant).

[root@mitoora1 ~]# vxfenadm -d

I/O Fencing Cluster Information:
================================
Fencing Protocol Version: 201
Fencing Mode: SCSI3
Fencing SCSI3 Disk Policy: dmp
Cluster Members:
  * 0 (mitoora1)
    1 (mitoora2)
RFSM State Information:
    node 0 in state 8 (running)
    node 1 in state 8 (running)

In /etc/vxfenmode: scsi3_disk_policy=dmp and vxfen_mode=scsi3

# vxdctl scsi3pr
scsi3pr: on

[root@mitoora1 etc]# more /etc/vxfentab
#
# /etc/vxfentab:
# DO NOT MODIFY this file as it is generated by the
# VXFEN rc script from the file /etc/vxfendg.
#
/dev/vx/rdmp/storwizev70000_000007
/dev/vx/rdmp/storwizev70000_000008
/dev/vx/rdmp/storwizev70000_000009

[root@mitoora1 etc]# vxdmpadm listctlr all
CTLR-NAME   ENCLR-TYPE      STATE     ENCLR-NAME
=====================================================
c0          Disk            ENABLED   disk
c10         StorwizeV7000   ENABLED   storwizev70000
c7          StorwizeV7000   ENABLED   storwizev70000
c8          StorwizeV7000   ENABLED   storwizev70000
c9          StorwizeV7000   ENABLED   storwizev70000

main.cf:

cluster drdbonesales (
        UserNames = { admin = hlmElgLimHmmKumGlj }
        ClusterAddress = "10.90.15.30"
        Administrators = { admin }
        UseFence = SCSI3
        )

I configured coordinator fencing, so I have 3 LUNs in a Veritas disk group (dmp coordinator). Everything seems to work fine, but I noticed a lot of reservation conflicts in the messages on both nodes. In the server log (/var/log/messages) I constantly see these messages:

Nov 26 15:14:09 mitoora2 kernel: sd 7:0:1:1: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 8:0:0:1: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 8:0:1:1: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 10:0:0:1: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 10:0:1:1: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 9:0:1:1: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 9:0:0:1: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 7:0:1:3: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 8:0:0:3: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 8:0:1:3: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 10:0:1:3: reservation conflict

Do you have any idea?

Best Regards,
Vincenzo
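For reference, the keys actually registered on the coordinator disks can be listed with vxfenadm; a quick sketch, assuming the generated /etc/vxfentab shown above:

# vxfenadm -s all -f /etc/vxfentab      # print the SCSI-3 registration keys on each coordinator disk
# vxfenadm -d                           # re-check fencing mode and RFSM state afterwards

With UseFence=SCSI3 each cluster member should hold one registration key per coordinator disk, so any unexpected or missing keys are worth noting alongside the reservation-conflict messages.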
Veritas Cluster Server heartbeat link down, jeopardy state

Hello Everyone,

I am having a problem with the VCS heartbeat links. VCS is running on Sun V440 machines, version 4.0 on Solaris 9; I know it's old and EOL. I'm just hoping to pinpoint the solution to this problem.

The VCS heartbeat links run on 2 separate VLANs, and this is a 2-node cluster. Recently the old switch was taken out and a new Cisco 3750 switch was added. The switch shows the cables are connected and I am able to see link up from the switch side. The links on ce4 of both servers are not coming up. Any ideas besides a faulty VLAN? How do I test the communications on that particular VLAN?

Here are the results of various commands; any help is appreciated. Thank you!

On node2 (the * marks the local node):

# lltstat -n
LLT node information:
    Node          State   Links
      0 node1     OPEN        1
    * 1 node2     OPEN        2

# lltstat -nvv | head
LLT node information:
    Node          State   Link   Status   Address
      0 node1     OPEN    ce4    DOWN
                          ce6    UP       00:03:BA:94:F8:61
    * 1 node2     OPEN    ce4    UP       00:03:BA:94:A4:6F
                          ce6    UP       00:03:BA:94:A4:71
      2           CONNWAIT
                          ce4    DOWN

On node1:

# lltstat -n
LLT node information:
    Node          State   Links
    * 0 node1     OPEN        2
      1 node2     OPEN        1

# lltstat -nvv | head
LLT node information:
    Node          State   Link   Status   Address
    * 0 node1     OPEN    ce4    UP       00:03:BA:94:F8:5F
                          ce6    UP       00:03:BA:94:F8:61
      1 node2     OPEN    ce4    DOWN
                          ce6    UP       00:03:BA:94:A4:71
      2           CONNWAIT
                          ce4    DOWN

# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen   49c917 membership 01
Port a gen   49c917 jeopardy   ;1
Port h gen   49c91e membership 01
Port h gen   49c91e jeopardy   ;
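For testing whether the VLAN actually passes LLT traffic on ce4, LLT ships a small DLPI test utility, dlpiping, under /opt/VRTSllt; a sketch (the device argument should match what your llttab uses for ce4, often /dev/ce:4 on Solaris 9, and the MAC addresses are the ones lltstat -nvv printed above):

On node1, start a dlpiping server on the suspect link:
# /opt/VRTSllt/dlpiping -s /dev/ce:4

On node2, send test packets to node1's ce4 MAC address:
# /opt/VRTSllt/dlpiping -c /dev/ce:4 00:03:BA:94:F8:5F

If the client gets no response, the switch/VLAN is not passing the LLT ethernet frames (SAP 0xcafe by default) between the two ce4 ports, regardless of what the link LEDs say.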
Service group does not fail over to another node on forced power down

VCS 6.0.1

Hi, I have configured a two-node cluster with local storage, running two service groups. They both run fine and I am able to switch them over to any node in the cluster, but when I forcibly power down the node where both service groups are active, just one service group fails over to the other node; the one running the Apache resource gets faulted and does not fail over. The contents of the main.cf file are pasted below.

cat /etc/VRTSvcs/conf/config/main.cf

include "OracleASMTypes.cf"
include "types.cf"
include "Db2udbTypes.cf"
include "OracleTypes.cf"
include "SybaseTypes.cf"

cluster mycluster (
        UserNames = { admin = IJKcJEjGKfKKiSKeJH, root = ejkEjiIhjKjeJh }
        ClusterAddress = "192.168.25.101"
        Administrators = { admin, root }
        )

system server3 (
        )

system server4 (
        )

group ClusterService (
        SystemList = { server3 = 0, server4 = 1 }
        AutoStartList = { server3, server4 }
        OnlineRetryLimit = 3
        OnlineRetryInterval = 120
        )

        IP webip (
                Device = eth0
                Address = "192.168.25.101"
                NetMask = "255.255.255.0"
                )

        NIC csgnic (
                Device = eth0
                )

        webip requires csgnic

        // resource dependency tree
        //
        // group ClusterService
        // {
        // IP webip
        //     {
        //     NIC csgnic
        //     }
        // }

group httpsg (
        SystemList = { server3 = 0, server4 = 1 }
        AutoStartList = { server3, server4 }
        OnlineRetryLimit = 3
        OnlineRetryInterval = 15
        )

        Apache apachenew (
                httpdDir = "/usr/sbin"
                ConfigFile = "/etc/httpd/conf/httpd.conf"
                )

        IP ipresource (
                Device = eth0
                Address = "192.168.25.102"
                NetMask = "255.255.255.0"
                )

        apachenew requires ipresource

        // resource dependency tree
        //
        // group httpsg
        // {
        // Apache apachenew
        //     {
        //     IP ipresource
        //     }
        // }

Engine logs while the power-down occurs:

2013/03/12 16:33:02 VCS INFO V-16-1-10077 Received new cluster membership
2013/03/12 16:33:02 VCS NOTICE V-16-1-10112 System (server3) - Membership: 0x1, DDNA: 0x0
2013/03/12 16:33:02 VCS ERROR V-16-1-10079 System server4 (Node '1') is in Down State - Membership: 0x1
2013/03/12 16:33:02 VCS ERROR V-16-1-10322 System server4 (Node '1') changed state from RUNNING to FAULTED
2013/03/12 16:33:02 VCS NOTICE V-16-1-10449 Group httpsg autodisabled on node server4 until it is probed
2013/03/12 16:33:02 VCS NOTICE V-16-1-10449 Group VCShmg autodisabled on node server4 until it is probed
2013/03/12 16:33:02 VCS NOTICE V-16-1-10446 Group ClusterService is offline on system server4
2013/03/12 16:33:02 VCS NOTICE V-16-1-10446 Group httpsg is offline on system server4
2013/03/12 16:33:02 VCS ERROR V-16-1-10205 Group ClusterService is faulted on system server4
2013/03/12 16:33:02 VCS NOTICE V-16-1-10446 Group ClusterService is offline on system server4
2013/03/12 16:33:02 VCS INFO V-16-1-10493 Evaluating server3 as potential target node for group ClusterService
2013/03/12 16:33:02 VCS INFO V-16-1-10493 Evaluating server4 as potential target node for group ClusterService
2013/03/12 16:33:02 VCS INFO V-16-1-10494 System server4 not in RUNNING state
2013/03/12 16:33:02 VCS NOTICE V-16-1-10301 Initiating Online of Resource webip (Owner: Unspecified, Group: ClusterService) on System server3
2013/03/12 16:33:02 VCS WARNING V-16-1-11141 LLT heartbeat link status changed. Previous status =eth1, UP; Current status =eth1, DOWN.
2013/03/12 16:33:02 VCS INFO V-16-6-15015 (server3) hatrigger:/opt/VRTSvcs/bin/triggers/sysoffline is not a trigger scripts directory or can not be executed
2013/03/12 16:33:14 VCS INFO V-16-1-10298 Resource webip (Owner: Unspecified, Group: ClusterService) is online on server3 (VCS initiated)
2013/03/12 16:33:14 VCS NOTICE V-16-1-10447 Group ClusterService is online on system server3

As per the above logs, the default SG ClusterService failed over to the other node, but the SG httpsg faulted. Please suggest.

Thanks.
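When only one of the two groups comes back after the crash, the quickest way to see why is to check the group and resource state on the surviving node and clear the fault before retrying; a small sketch using the names from the main.cf above:

# hastatus -sum                        # overall group/resource state after the power-down
# hagrp -state httpsg                  # per-system state of the Apache group
# hares -state apachenew               # did the Apache resource itself fault on server3?
# grep -i apachenew /var/VRTSvcs/log/engine_A.log | tail -20    # why the online attempt failed
# hagrp -clear httpsg                  # clear the FAULTED flag
# hagrp -online httpsg -sys server3    # retry on the surviving node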
Error During VCS configuration

Hi All,

I have set up a VCS lab on my own laptop. The environment is as follows:

VMware Workstation 8.0
2 x VMs with 1.5 GB RAM each, running Solaris 10 U9 64-bit.
All prerequisites are in place, such as 3 NICs per VM: 1 for the public interface and 2 for the private interconnect (for heartbeats/I/O fencing).
Authentication without the root password is configured.
For shared storage I created one more VM with Solaris 10, and a few disks are shared to the two-node cluster using the iSCSI protocol.

I was able to install VCS 5.1 (x86) on this two-node cluster successfully. But when I try to configure the cluster using ./installvcs -configure, I see the following error:

Veritas Cluster Server 5.1 SP1 Configure Program
        prod dev

    1)  Configure heartbeat links using LLT over Ethernet
    2)  Configure heartbeat links using LLT over UDP
    3)  Automatically detect configuration for LLT over Ethernet
    b)  Back to previous menu

How would you like to configure heartbeat links? [1-3,b,q,?] (3) 1

Discovering NICs on prod ........ Discovered e1000g0 e1000g1 e1000g2 e1000g3

Enter the NIC for the first private heartbeat link on prod: [b,q,?] (e1000g2)
Would you like to configure a second private heartbeat link? [y,n,q,b,?] (n) y
Enter the NIC for the second private heartbeat link on prod: [b,q,?] (e1000g3)
Would you like to configure a third private heartbeat link? [y,n,q,b,?] (n)
Do you want to configure an additional low-priority heartbeat link? [y,n,q,b,?] (n)
Are you using the same NICs for private heartbeat links on all systems? [y,n,q,b,?] (y)

Can't use string ("VCS51") as a HASH ref while "strict refs" in use at ../scripts/CPIP/Prod/VCS51.pm line 3870, <STDIN> line 10.

I tried to Google the above problem but could not find the answer. If anybody has encountered such an error and resolved it, please help me resolve it. If any further information is required, please reply.

Thanks & Regards,
Anish
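The configure step writes a log of the session as well; if the Perl error needs to be matched against a known installer defect, pulling the exact trace out of the log directory helps (a rough sketch; /opt/VRTS/install/logs is the usual default location, so adjust the glob if your layout differs):

# ls -lrt /opt/VRTS/install/logs | tail -3
# grep -i "HASH ref" /opt/VRTS/install/logs/installvcs*/*.log*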
How to recover from a failed attempt to switch to a different node in the cluster

Hello everyone. I have a two-node cluster that serves as an Oracle database server. I have the Oracle binaries installed on disks local to each of the nodes (so they are outside the control of the Cluster Manager). I have a disk group made of three 1 TB LUNs from my SAN, six volumes on the disk group (u02 through u07), six mount points (/u02 through /u07), a database listener and the actual Oracle database.

I was able to bring up these individual components manually and confirmed that the database was up and running. I then tried a "Switch To" operation to see if everything would move to the other node of the cluster. It turns out this was a bad idea.

Within the Cluster Manager GUI, the disk group has a State of Online, an IState of "Waiting to go offline propagate" and a Flag of "Unable to offline". The volumes show as "Offline on all systems", but the mounts still show as online with "Status Unknown". When I try to take the mount points offline, I get the message:

VCS ERROR V-16-1-10277 The Service Group i1025prd to which Resource Mnt_scratch belongs has failed or switch, online, and offline operations are prohibited.

Can anyone tell me how I can fix this?

Ken
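When a switch attempt leaves resources stuck in waiting states like this, the usual first step is to flush the pending transitions and clear any fault before retrying; a rough sketch using the group name from the error message above (node names are placeholders):

# hastatus -sum                         # shows which groups/resources are still transitioning
# hagrp -flush i1025prd -sys <node>     # discard the stuck online/offline operations on that node
# hagrp -clear i1025prd -sys <node>     # clear the FAULTED flag once nothing is transitioning
# hagrp -online i1025prd -sys <node>    # then retry the online (or the Switch To)

If the DiskGroup resource still refuses to go offline after the flush, it is worth checking with vxdg list on the owning node whether the disk group is genuinely still imported outside of VCS control.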
VCS dependency question - Service group is not running on the intended node upon cluster startup

Dear All,

I have a question regarding service group dependency. I have two parent service groups, "Group1" and "Group2". They are dependent on a child service group, "ServerGroup1_DG", which has been configured as a parallel service group.

In the main.cf file, I have configured the service group "Group1" to be started on Node1 and the service group "Group2" to be started on Node2. However, I don't know why the cluster does not start the service group "ServerGroup1_DG" on Node2 before starting the service group "Group2". During cluster startup, the cluster evaluated Node1 to be a target node for service group "Group2", and it went on to online the service group on Node1. This is not the wanted behaviour.

I would like to have the service group "Group1" running on Node1 and the service group "Group2" running on Node2 when the cluster starts up. Can anyone help to shed some light on how I can solve this problem? Thanks.

p/s: The VCS version is 5.0 on the Solaris platform.

main.cf:

group Group1 (
        SystemList = { Node1 = 1, Node2 = 2 }
        AutoStartList = { Node1, Node2 }
        )

requires group ServerGroup1_DG online local firm

group Group2 (
        SystemList = { Node1 = 2, Node2 = 1 }
        AutoStartList = { Node2, Node1 }
        )

requires group ServerGroup1_DG online local firm

group ServerGroup1_DG (
        SystemList = { Node1 = 0, Node2 = 1 }
        AutoFailOver = 0
        Parallel = 1
        AutoStartList = { Node1, Node2 }
        )

        CFSMount cfsmount2 (
                Critical = 0
                MountPoint = "/var/opt/xxxx/ServerGroup1"
                BlockDevice = "/dev/vx/dsk/xxxxdg/vol01"
                MountOpt @Node1 = "cluster"
                MountOpt @Node2 = "cluster"
                NodeList = { Node1, Node2 }
                )

        CVMVolDg cvmvoldg2 (
                Critical = 0
                CVMDiskGroup = xxxxdg
                CVMActivation @Node1 = sw
                CVMActivation @Node2 = sw
                )

requires group cvm online local firm
cfsmount2 requires cvmvoldg2

Engine log:

2010/06/25 16:05:47 VCS NOTICE V-16-1-10447 Group ServerGroup1_DG is online on system Node1
2010/06/25 16:05:47 VCS WARNING V-16-1-50045 Initiating online of parent group Group3, PM will select the best node
2010/06/25 16:05:47 VCS WARNING V-16-1-50045 Initiating online of parent group Group2, PM will select the best node
2010/06/25 16:05:47 VCS WARNING V-16-1-50045 Initiating online of parent group Group1, PM will select the best node
2010/06/25 16:05:47 VCS INFO V-16-1-10493 Evaluating Node2 as potential target node for group Group3
2010/06/25 16:05:47 VCS INFO V-16-1-10163 Group dependency is not met if group Group3 goes online on system Node2
2010/06/25 16:05:47 VCS INFO V-16-1-10493 Evaluating Node1 as potential target node for group Group3
2010/06/25 16:05:47 VCS INFO V-16-1-10493 Evaluating Node2 as potential target node for group Group2
2010/06/25 16:05:47 VCS INFO V-16-1-10163 Group dependency is not met if group Group2 goes online on system Node2
2010/06/25 16:05:47 VCS INFO V-16-1-10493 Evaluating Node1 as potential target node for group Group2
2010/06/25 16:06:15 VCS NOTICE V-16-1-10447 Group ServerGroup1_DG is online on system Node2

Regards,
Ryan
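For comparison, the dependency and start-order configuration can be dumped from the running cluster like this (a quick sketch; group names are the ones from the main.cf above):

# hagrp -dep Group2                     # parent/child dependency and its type (online local firm)
# hagrp -value Group2 AutoStartList     # the node order HAD considers at autostart
# hagrp -value Group2 SystemList        # priorities: a lower value means a more preferred system
# hagrp -state ServerGroup1_DG          # whether the parallel child is already online on each node

With an "online local firm" dependency, Group2 can only come online on a node where ServerGroup1_DG is already online, so the order in which the parallel child comes up on each node (Node1 at 16:05:47, Node2 only at 16:06:15 in the log above) feeds directly into the target the policy module picks.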
lltconfig ERROR V-14-2-15121 LLT unconfigure aborted

Hello,

I have a 2-node cluster with one CP server and 2 fencing disks, but I think something is wrong in my fencing configuration.

OS: RHEL 6.6
VCS: 6.2.1

# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen   816e15 membership 01
Port a gen   816e15 jeopardy   ;1
Port b gen   816e1b membership 01
Port b gen   816e1b jeopardy   ;1
Port h gen   816e1e membership 01
Port h gen   816e1e jeopardy   ;1

# lltconfig -a list
Link 0 (bond0):
    Node 0 Node01 : 00:XX:XX:XX:XX:YY
    Node 1 Node02 : 00:XX:XX:XX:XX:YX permanent

# cat /etc/llttab
set-node Node02
set-cluster 229
link bond0 bond0 - ether - -
link eth2 eth-00:11:AA:77:BB:22 - ether - -

I don't have an eth2 on my server, so I tried to remove the second line of my /etc/llttab and stop and restart LLT:

# /etc/init.d/llt stop
Stopping LLT:
LLT lltconfig ERROR V-14-2-15121 LLT unconfigure aborted, unregister 4 port(s)
LLT:Warning: lltconfig failed. Retrying [1]
LLT lltconfig ERROR V-14-2-15121 LLT unconfigure aborted, unregister 4 port(s)
LLT:Warning: lltconfig failed. Retrying [2]
LLT lltconfig ERROR V-14-2-15121 LLT unconfigure aborted, unregister 4 port(s)
LLT:Warning: lltconfig failed. Retrying [3]
LLT lltconfig ERROR V-14-2-15121 LLT unconfigure aborted, unregister 4 port(s)
LLT:Warning: lltconfig failed. Retrying [4]
LLT lltconfig ERROR V-14-2-15121 LLT unconfigure aborted, unregister 4 port(s)
LLT:Warning: lltconfig failed. Retrying [5]
LLT:Error: lltconfig failed

Any suggestions will be welcome. Thanks,
Cédric
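LLT refuses to unconfigure while the upper layers (HAD on port h, fencing on port b, GAB on port a) still have ports registered, which matches the "unregister 4 port(s)" message; a rough sketch of the usual stop order before editing /etc/llttab (assuming the stock RHEL init scripts for this stack):

# hastop -local                 # stop HAD on this node; hastop -local -evacuate moves the groups first
# /etc/init.d/vxfen stop        # stop I/O fencing (port b)
# /etc/init.d/gab stop          # stop GAB (port a)
# /etc/init.d/llt stop          # LLT now has no registered clients and can unconfigure
# /etc/init.d/llt start         # after fixing llttab, bring the stack back up in reverse order
# /etc/init.d/gab start
# /etc/init.d/vxfen start
# hastart

Depending on what else is loaded (AMF, CFS/CVM), additional modules may also need stopping before GAB will go down cleanly.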
Reset Password of VCS User in Secure Cluster

Dear All,

Greetings! Recently I added a VCS user to one of our VCS clusters, which is running in secure mode, so that it can access the Java Console. However, I did not find any option to set a password for it. After going through some articles I tried

# hauser -update <user name>

but this option is not available for the hauser command. I need your support to resolve the issue.

VCS version: 6.1
OS: RHEL 5.8 x86_64

I have followed the steps below so far:

# haconf -makerw
# hauser -add Admin -priv Administrator
# hauser -update Admin
VCS WARNING V-16-1-10603 Unknown option: -update
VCS INFO V-16-1-10601 Usage:
    hauser -add <name> [-priv <Administrator|Operator|Guest> [-group <group(s)>]]
    hauser -addpriv <name> <Administrator|Operator|Guest> [-group <group(s)>]
    hauser -delpriv <name> <Administrator|Operator|Guest> [-group <group(s)>]
    hauser -addpriv <name> <AdministratorGroup|OperatorGroup> [-group <group(s)>]
    hauser -delpriv <name> <AdministratorGroup|OperatorGroup> [-group <group(s)>]
    hauser -delete <name>
    hauser -display [<name>]
    hauser -list
    hauser -help

Thanks.
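A couple of command-line cross-checks may help here (a sketch; the user name is the one created above). As far as I understand secure mode, the Java Console authenticates through the authentication broker with OS account credentials rather than a VCS-internal password, which is why hauser offers no password option:

# haclus -value SecureClus      # 1 means the cluster is running in secure mode
# hauser -display Admin         # privileges that hauser -add assigned to the new user
# hauser -list                  # all configured VCS users / privilege mappings

If that understanding is right, the password to reset for Java Console access is the matching OS (or domain) account's password, not anything stored by hauser.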
VCS Setup - "Install A Clustered Master Server" Greyed Out

I originally posted this in the NetBackup forum, but it seems a VCS issue is at the root of the problem: https://www-secure.symantec.com/connect/forums/install-clustered-master-server-greyed-out

I am trying to set up a test NBU 7.6.0.1 master server cluster with VCS on Windows Server 2008 R2, but the option to "install a clustered master server" is greyed out. I have built 2 new servers, installed SFHA/VCS 6.0.1 and configured disks, volumes, disk groups and VVR replication. The VCS installation has not been modified or customised in any way, as from my understanding the NBU installation will create its own service group. I have carried out the initial configuration and added the remote cluster node.

Within Cluster Explorer, the default ClusterService service group is all up and running, the global cluster option is enabled and the remote cluster status also shows that the other cluster is up.

Marianne has already asked me to run two commands, with the following output:

gabconfig -a
    GAB gabconfig ERROR V-15-2-25022

lltconfig -nvv
    LLT lltconfig ERROR V-14-2-15000 open \\.\llt failed: error number: 0x2

I've probably just made a silly mistake somewhere, but I can't figure out what piece of the puzzle I have missed - any ideas?
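The lltconfig error in particular (0x2 is the Windows "file not found" code for the \\.\llt device) suggests the LLT driver is not configured or loaded on the node where the commands were run, even though the GUI looks healthy; a few cross-checks that also work in a Windows command prompt (a sketch, nothing NetBackup-specific):

C:\> hasys -state              # does HAD report the cluster systems as RUNNING?
C:\> hastatus -summary         # group states plus the global-cluster (GCO) view
C:\> lltstat -nvv              # only returns link details once the llt driver is loaded

If the ha* commands respond but lltstat and gabconfig still error out, the cluster communication stack itself is probably what the NBU installer is failing to detect.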
NetApp SnapMirror resource not getting online with VCS integration

Dear All,

I would like some help identifying the following error while onlining a NetApp SnapMirror resource that has been integrated with VCS. The filer name and its credentials were given properly, and I can even ssh to it from the servers, but the resource is not getting online and shows the following error in engine.log:

2014/09/30 15:02:44 VCS WARNING V-16-20059-1002 (node) NetAppSnapMirror:testSM:online:Encountered errors while decrypting password!
2014/09/30 15:02:44 VCS ERROR V-16-20059-1000 (node11) NetAppSnapMirror:testSM:online:ONTAPI 'system-get-version' failed on filer node1.sys Error : in Zapi::invoke, cannot connect to socket

Actually, I am doing a global clustering configuration where I have a two-node cluster and a filer at the primary site; at the DR site I have a single-node cluster and a filer. The replication is async. Any support is highly appreciated.

Thanks,
Uvi.
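The "Encountered errors while decrypting password" warning is typically what an agent logs when the password attribute in the resource is not a string generated by vcsencrypt; a hedged sketch of regenerating it (the attribute name below is a placeholder, so check hares -display testSM for the agent's actual password attribute):

# /opt/VRTSvcs/bin/vcsencrypt -agent      # prompts for the password twice, prints the encrypted string
# haconf -makerw
# hares -modify testSM <PasswordAttribute> <encrypted string from vcsencrypt>
# haconf -dump -makero
# hares -probe testSM -sys <node>
# hares -online testSM -sys <node>

The separate "cannot connect to socket" ONTAPI error concerns the HTTP/HTTPS (ZAPI) connection to the filer rather than ssh, so confirming that the node can reach the filer's management interface on port 80/443 is a worthwhile second check.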