Fencing and Reservation Conflict
Hi to all,

I have Red Hat Linux 5.9 64-bit with SFHA 5.1 SP1 RP4 with fencing enabled (our storage device is an IBM Storwize V3700 SFF, SCSI-3 compliant).

[root@mitoora1 ~]# vxfenadm -d

I/O Fencing Cluster Information:
================================
 Fencing Protocol Version: 201
 Fencing Mode: SCSI3
 Fencing SCSI3 Disk Policy: dmp
 Cluster Members:
   * 0 (mitoora1)
     1 (mitoora2)
 RFSM State Information:
   node 0 in state 8 (running)
   node 1 in state 8 (running)

In /etc/vxfenmode: scsi3_disk_policy=dmp and vxfen_mode=scsi3

vxdctl scsi3pr
scsi3pr: on

[root@mitoora1 etc]# more /etc/vxfentab
#
# /etc/vxfentab:
# DO NOT MODIFY this file as it is generated by the
# VXFEN rc script from the file /etc/vxfendg.
#
/dev/vx/rdmp/storwizev70000_000007
/dev/vx/rdmp/storwizev70000_000008
/dev/vx/rdmp/storwizev70000_000009

[root@mitoora1 etc]# vxdmpadm listctlr all
CTLR-NAME    ENCLR-TYPE      STATE      ENCLR-NAME
=====================================================
c0           Disk            ENABLED    disk
c10          StorwizeV7000   ENABLED    storwizev70000
c7           StorwizeV7000   ENABLED    storwizev70000
c8           StorwizeV7000   ENABLED    storwizev70000
c9           StorwizeV7000   ENABLED    storwizev70000

main.cf:
cluster drdbonesales (
        UserNames = { admin = hlmElgLimHmmKumGlj }
        ClusterAddress = "10.90.15.30"
        Administrators = { admin }
        UseFence = SCSI3
        )

I configured coordinator fencing, so I have 3 LUNs in a Veritas disk group (dmp coordinator). Everything seems to work fine, but I noticed a lot of reservation conflicts in the messages file on both nodes. On the server log I constantly see these messages:

/var/log/messages
Nov 26 15:14:09 mitoora2 kernel: sd 7:0:1:1: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 8:0:0:1: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 8:0:1:1: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 10:0:0:1: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 10:0:1:1: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 9:0:1:1: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 9:0:0:1: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 7:0:1:3: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 8:0:0:3: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 8:0:1:3: reservation conflict
Nov 26 15:14:09 mitoora2 kernel: sd 10:0:1:3: reservation conflict

Do you have any idea?

Best Regards,
Vincenzo
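One thing worth establishing is whether the conflicts are being reported against the coordinator LUNs or against data LUNs in a shared disk group. A sketch of the checks, assuming the 5.1-style vxfenadm syntax and the device names from the vxfentab shown above:

# print the SCSI-3 registration keys on each coordinator disk listed in /etc/vxfentab
vxfenadm -s all -f /etc/vxfentab

# identify which LUN sits behind the sd devices reporting conflicts
vxdisk -o alldgs list
vxdmpadm getsubpaths dmpnodename=storwizev70000_000007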
VCS Error Codes for all platforms
Hello Gents,

Do we have a list of all error codes for VCS? Also, are the error codes generic and common across all platforms (Linux, Solaris, Windows, AIX)?

I need this confirmation urgently, as I am planning to design a common monitoring agent.

Best Regards,
Nimish
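If the agent ends up reading logs anyway, one practical approach (a sketch; the path assumes a default install, and grep -o assumes GNU grep on Linux, so adjust on Solaris/AIX) is to key off the UMI codes (V-16-...) rather than the message text, since the codes are meant to be the stable part of each message:

# collect ERROR/CRITICAL entries and count their V-16-* message IDs from the engine log
grep -E "ERROR|CRITICAL" /var/VRTSvcs/log/engine_A.log | grep -o "V-16-[0-9]*-[0-9]*" | sort | uniq -c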
Freeze SG or Freeze Node ?
Hi there,

I have a scenario where the applications are under cluster control (a 2-node cluster). I had an issue in the past where hastop -local was executed before bringing down the applications manually, which caused the app SG to go into a FAULTED state.

My question is: should I freeze the service groups individually, or should I freeze the nodes while performing OS patch maintenance? Does freezing a node serve the same purpose as freezing all the SGs on that node?

Regards,
Saurabh
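For reference, the two operations look roughly like this (a sketch; appsg and node1 are placeholder names, and -persistent requires the configuration to be writable):

# freeze one service group (persistent, so it survives a VCS or node restart)
haconf -makerw
hagrp -freeze appsg -persistent
haconf -dump -makero

# or freeze the whole system, so VCS will not online groups on it or fail them over
haconf -makerw
hasys -freeze -persistent node1
haconf -dump -makero

# reverse with hagrp -unfreeze / hasys -unfreeze after the maintenance window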
Problem with VRTS Oracle alarm
Hi all,

Recently I ran into an Oracle issue; I am getting the following error message:

VCS ERROR V-16-2-13027 (mdsu1a) Resource(mdsuOracleLog_lv) - monitor procedure did not complete within the expected time.

My question is: why can this error appear? It is also causing a failover of the servers, and the database changes to a FAULTED state. The DBA's solution was to adjust the interval/timeout values, and that might help, but I want to analyze this problem and understand why it is happening on my system.

I have attached logs and useful information. Can someone help me with this?

Thx,
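While investigating, the timing attributes can be inspected and, if really needed, overridden on just this resource rather than on the whole type. A sketch; the 120-second value is only an example, and <type_from_above> is a placeholder for whatever type the first command returns:

# find the resource type, then look at its monitor timing attributes
hares -value mdsuOracleLog_lv Type
hatype -display <type_from_above> -attribute MonitorTimeout
hatype -display <type_from_above> -attribute MonitorInterval

# average monitor times recorded by VCS for this resource
hares -display mdsuOracleLog_lv -attribute MonitorTimeStats

# override the timeout only for this resource
haconf -makerw
hares -override mdsuOracleLog_lv MonitorTimeout
hares -modify mdsuOracleLog_lv MonitorTimeout 120
haconf -dump -makero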
Need a solution
We have a 2-node cluster on version 5.1. We experienced an outage and I think it was due to the error messages below. Can someone shed some light on them?

qlc: [ID 630585 kern.info] NOTICE: Qlogic qlc(1): Loop OFFLINE
qlc: [ID 630585 kern.info] NOTICE: Qlogic qlc(1): Loop ONLINE
fctl: [ID 999315 kern.warning] WARNING: fctl(4): AL_PA=0xe8 doesn't exist in LILP map
scsi: [ID 107833 kern.warning] WARNING: /pci@0,600000/pci@0/pci@9/SUNW,qlc@0/fp@0,0/ssd@w203400a0b875f9d9,0 (ssd3): Command failed to complete...Device is gone
scsi: [ID 107833 kern.warning] WARNING: /pci@0,600000/pci@0/pci@9/SUNW,qlc@0/fp@0,0/ssd@w203400a0b875f9d9,0 (ssd3): Command failed to complete...Device is gone
scsi: [ID 107833 kern.warning] WARNING: /pci@0,600000/pci@0/pci@9/SUNW,qlc@0/fp@0,0/ssd@w203400a0b875f9d9,0 (ssd3): Command failed to complete...Device is gone
scsi: [ID 243001 kern.info] /pci@0,600000/pci@0/pci@9/SUNW,qlc@0/fp@0,0 (fcp4): offlining lun=0 (trace=0), target=e8 (trace=2800004)
vxdmp: [ID 631182 kern.notice] NOTICE: VxVM vxdmp V-5-0-0 removed disk array 600A0B800075F9D9000000004D2334F5, datype = ST2540-
vxdmp: [ID 443116 kern.notice] NOTICE: VxVM vxdmp V-5-0-0 i/o error occured (errno=0x6) on dmpnode 334/0x2c
last message repeated 59 times
vxdmp: [ID 480808 kern.notice] NOTICE: VxVM vxdmp V-5-0-112 disabled path 118/0x18 belonging to the dmpnode 334/0x28 due to open failure
vxdmp: [ID 824220 kern.notice] NOTICE: VxVM vxdmp V-5-0-111 disabled dmpnode 334/0x28

What does this "dmpnode 334/0x28" signify? I forget how to map it to a device; I only remember that the number is in hexadecimal. Also, what could be the cause? Is it due to the HBA, since the issue starts with messages like these?

qlc: [ID 630585 kern.info] NOTICE: Qlogic qlc(1): Loop OFFLINE
qlc: [ID 630585 kern.info] NOTICE: Qlogic qlc(1): Loop ONLINE
fctl: [ID 999315 kern.warning] WARNING: fctl(4): AL_PA=0xe8 doesn't exist in LILP map
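On the mapping question: 334/0x28 is the major/minor pair of the DMP device node, and 0x28 hex is 40 decimal. A sketch of one way to tie that back to a device name (the actual name found is a placeholder here, since it cannot be guessed from the logs):

# list DMP device nodes with their major/minor numbers and look for major 334, minor 40
ls -lLR /dev/vx/rdmp | grep "334, *40 "

# then show the enclosure, state and paths behind that DMP node
vxdmpadm getdmpnode dmpnodename=<name_found_above>
vxdmpadm getsubpaths dmpnodename=<name_found_above>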
LLT: node1 in trouble
Hello All,

Recently this message started appearing on the server:

Oct 14 08:38:15 db1 llt: [ID 140958 kern.notice] LLT INFO V-14-1-10205 link 0 (ce1) node 1 in trouble
Oct 14 08:38:15 db1 llt: [ID 860062 kern.notice] LLT INFO V-14-1-10024 link 0 (ce1) node 1 active
Oct 14 08:38:42 db1 llt: [ID 140958 kern.notice] LLT INFO V-14-1-10205 link 0 (ce1) node 1 in trouble
Oct 14 08:38:43 db1 llt: [ID 860062 kern.notice] LLT INFO V-14-1-10024 link 0 (ce1) node 1 active
Oct 14 08:38:45 db1 llt: [ID 140958 kern.notice] LLT INFO V-14-1-10205 link 0 (ce1) node 1 in trouble
Oct 14 08:38:45 db1 llt: [ID 860062 kern.notice] LLT INFO V-14-1-10024 link 0 (ce1) node 1 active
Oct 14 08:38:55 db1 llt: [ID 140958 kern.notice] LLT INFO V-14-1-10205 link 0 (ce1) node 1 in trouble
Oct 14 08:38:56 db1 llt: [ID 860062 kern.notice] LLT INFO V-14-1-10024 link 0 (ce1) node 1 active
Oct 14 08:39:01 db1 llt: [ID 140958 kern.notice] LLT INFO V-14-1-10205 link 0 (ce1) node 1 in trouble
Oct 14 08:39:05 db1 llt: [ID 860062 kern.notice] LLT INFO V-14-1-10024 link 0 (ce1) node 1 active
Oct 14 08:39:05 db1 llt: [ID 794702 kern.notice] LLT INFO V-14-1-10019 delayed hb 600 ticks from 1 link 0 (ce1)
Oct 14 08:39:05 db1 llt: [ID 602713 kern.notice] LLT INFO V-14-1-10023 lost 11 hb seq 19344285 from 1 link 0 (ce1)

The messages date back to Sept 20 and continue up to today (the sample above is from Oct 14).

bash-2.05$ lltstat -nvv|head
LLT node information:
    Node       State     Link   Status   Address
   * 0 db1     OPEN      ce1    UP       00:03:BA:93:
                         ce6    UP       00:03:BA:85:
     1 db2     OPEN      ce1    UP       00:03:BA:93:
                         ce6    UP       00:03:BA:95:
     2         CONNWAIT  ce1    DOWN

Any advice is greatly appreciated, thank you.
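The "in trouble" / "active" pairs typically mean heartbeats on ce1 are arriving late but still within the peerinact window, which matches the "delayed hb 600 ticks" line. A couple of things that can be compared from either node (a sketch, nothing exotic):

# current LLT timer settings (peertrouble, peerinact, heartbeat) on this node
lltconfig -T query

# per-link statistics, to see whether ce1 is delaying or dropping traffic compared to ce6
lltstat -l
lltstat -nvv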
SG is not switching to next node.
Hi All,

I am new to VCS but good with HACMP. In our environment we are using VCS 6.0. On one server we found that the SG does not move from one node to another when we try a manual failover using the command below:

hagrp -switch <SGname> -to <sysname>

We can see that the SG goes offline on the current node, but it does not come online on the secondary node. There is no error logged in engine_A.log except the entry below:

cpus load more than 60% <secondary node name>

Can anyone help me find a solution for this? I will provide the output of any commands if you need more info to help me troubleshoot this. :)

Thanks,
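A few things worth capturing on both nodes before retrying (a sketch; <SGname> and <sysname> are placeholders as above):

# overall view: group state per system, frozen flags, and the configured system list
hastatus -sum
hagrp -value <SGname> Frozen
hagrp -value <SGname> TFrozen
hagrp -value <SGname> SystemList
hagrp -display <SGname> -attribute State

# clear any FAULTED resources on the target node, then retry the switch
hagrp -clear <SGname> -sys <sysname>
hagrp -switch <SGname> -to <sysname>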
Heartbeat timeout value
Hi,

Recently one of the cluster nodes got rebooted because all heartbeat networks went down (due to some changes on the switch; the outage lasted about 60 seconds). We informed the network team about the reboot, and in turn they suggested changing the heartbeat timeout value to 60 seconds.

Requesting your help: is it advisable to change the heartbeat timeout value to 60 seconds? I think the default value is 15 seconds. If we change the value from the default, what are the consequences?

Please advise.

Divakar
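For reference, the relevant LLT tunable is peerinact, set in hundredths of a second (the default is 1600 ticks, i.e. about 16 seconds). A sketch of how it is usually changed; the obvious trade-off is that a genuine node failure then also takes that much longer to detect:

# show the current LLT timer values on this node
lltconfig -T query

# change peerinact on the fly on every node (6000 ticks = 60 seconds)
lltconfig -T peerinact:6000

# to make it persistent, add the following line to /etc/llttab on each node:
# set-timer peerinact:6000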
Is there a way to query VCS to see if monitor timeouts are occurring?
Without attempting to parse the output of engine_A.log, is there a way to use the ha* command set to query VCS to see if monitor timeouts are occurring? We would like to be able to create an alert when monitor timeouts occur.

-Seann
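Not a direct counter of timeouts, but the MonitorTimeStats resource attribute records the average time the monitor entry point has been taking, which a script could compare against the type-level MonitorTimeout to raise an alert as monitors approach the limit. A sketch, using the Oracle type only as an example:

# average monitor duration (and timestamp) recorded per resource
hares -display -attribute MonitorTimeStats

# the timeout the agent is being held to, at the type level
hatype -display Oracle -attribute MonitorTimeout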