Veritas Cluster LLT link failure
I'm having an issue I'm unable to identify with Veritas Cluster 4.0 MP1 on Solaris 9. The cluster supports an instance of Oracle 9i. The node falls out of membership and then, a short while later, reconnects. Below are snippets from /var/adm/messages and the logs from the Cisco switch.

Jul 4 04:02:50 jfkdbsp1 llt: [ID 140958 kern.notice] LLT INFO V-14-1-10205 link 0 (ce1) node 1 in trouble
Jul 4 04:02:51 jfkdbsp1 llt: [ID 860062 kern.notice] LLT INFO V-14-1-10024 link 0 (ce1) node 1 active
Jul 4 04:02:53 jfkdbsp1 llt: [ID 140958 kern.notice] LLT INFO V-14-1-10205 link 0 (ce1) node 1 in trouble
Jul 4 04:02:58 jfkdbsp1 llt: [ID 860062 kern.notice] LLT INFO V-14-1-10024 link 0 (ce1) node 1 active
Jul 4 04:02:58 jfkdbsp1 llt: [ID 794702 kern.notice] LLT INFO V-14-1-10019 delayed hb 650 ticks from 1 link 0 (ce1)
Jul 4 04:02:58 jfkdbsp1 llt: [ID 602713 kern.notice] LLT INFO V-14-1-10023 lost 12 hb seq 35184900 from 1 link 0 (ce1)
Jul 4 18:34:07 jfkdbsp1 llt: [ID 140958 kern.notice] LLT INFO V-14-1-10205 link 0 (ce1) node 1 in trouble
Jul 4 18:34:11 jfkdbsp1 llt: [ID 860062 kern.notice] LLT INFO V-14-1-10024 link 0 (ce1) node 1 active
Jul 4 18:34:11 jfkdbsp1 llt: [ID 602713 kern.notice] LLT INFO V-14-1-10023 lost 10 hb seq 35289448 from 1 link 0 (ce1)
Jul 4 18:34:13 jfkdbsp1 llt: [ID 140958 kern.notice] LLT INFO V-14-1-10205 link 0 (ce1) node 1 in trouble
Jul 4 18:34:19 jfkdbsp1 llt: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (ce1) node 1 inactive 8 sec (36427270)
Jul 4 18:34:20 jfkdbsp1 llt: [ID 860062 kern.notice] LLT INFO V-14-1-10024 link 0 (ce1) node 1 active
Jul 4 18:34:20 jfkdbsp1 llt: [ID 794702 kern.notice] LLT INFO V-14-1-10019 delayed hb 850 ticks from 1 link 0 (ce1)
Jul 4 18:34:20 jfkdbsp1 llt: [ID 602713 kern.notice] LLT INFO V-14-1-10023 lost 16 hb seq 35289466 from 1 link 0 (ce1)
Jul 4 18:34:22 jfkdbsp1 llt: [ID 140958 kern.notice] LLT INFO V-14-1-10205 link 0 (ce1) node 1 in trouble
Jul 4 18:34:24 jfkdbsp1 llt: [ID 860062 kern.notice] LLT INFO V-14-1-10024 link 0 (ce1) node 1 active
Jul 4 18:34:24 jfkdbsp1 llt: [ID 602713 kern.notice] LLT INFO V-14-1-10023 lost 8 hb seq 35289475 from 1 link 0 (ce1)
Jul 4 18:34:35 jfkdbsp1 llt: [ID 140958 kern.notice] LLT INFO V-14-1-10205 link 0 (ce1) node 1 in trouble
Jul 4 18:34:41 jfkdbsp1 llt: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (ce1) node 1 inactive 8 sec (36427294)
Jul 4 18:34:42 jfkdbsp1 llt: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (ce1) node 1 inactive 9 sec (36427294)
Jul 4 18:34:43 jfkdbsp1 llt: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (ce1) node 1 inactive 10 sec (36427294)
Jul 4 18:34:44 jfkdbsp1 llt: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (ce1) node 1 inactive 11 sec (36427294)
Jul 4 18:34:45 jfkdbsp1 llt: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (ce1) node 1 inactive 12 sec (36427294)
Jul 4 18:34:46 jfkdbsp1 llt: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (ce1) node 1 inactive 13 sec (36427294)
Jul 4 18:34:47 jfkdbsp1 llt: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (ce1) node 1 inactive 14 sec (36427294)
Jul 4 18:34:48 jfkdbsp1 llt: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (ce1) node 1 inactive 15 sec (36427294)
Jul 4 18:34:49 jfkdbsp1 llt: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (ce1) node 1 inactive 16 sec (36427294)
Jul 4 18:34:49 jfkdbsp1 llt: [ID 911753 kern.notice] LLT INFO V-14-1-10033 link 0 (ce1) node 1 expired
Jul 4 18:34:54 jfkdbsp1 gab: [ID 316943 kern.notice] GAB INFO V-15-1-20036 Port a gen 53bd93 membership 01
Jul 4 18:34:54 jfkdbsp1 gab: [ID 608499 kern.notice] GAB INFO V-15-1-20037 Port a gen 53bd93 jeopardy ;1
Jul 4 18:34:54 jfkdbsp1 gab: [ID 316943 kern.notice] GAB INFO V-15-1-20036 Port h gen 53bd9a membership 01
Jul 4 18:34:54 jfkdbsp1 gab: [ID 608499 kern.notice] GAB INFO V-15-1-20037 Port h gen 53bd9a jeopardy ;1
Jul 4 18:34:54 jfkdbsp1 Had[2025]: [ID 702911 daemon.notice] VCS INFO V-16-1-10077 Received new cluster membership
Jul 4 18:34:54 jfkdbsp1 Had[2025]: [ID 702911 daemon.notice] VCS ERROR V-16-1-10087 System jfkdbsf1 (Node '1') is in Regardy Membership - Membership: 0x3, Jeopardy: 0x2
Jul 4 18:34:55 jfkdbsp1 genunix: [ID 408789 kern.warning] WARNING: ce1: fault detected external to device; service degraded
Jul 4 18:34:55 jfkdbsp1 genunix: [ID 451854 kern.warning] WARNING: ce1: xcvr addr:0x00 - link down
Jul 4 18:34:56: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet0/12, changed state to down
Jul 4 18:34:57: %LINK-3-UPDOWN: Interface GigabitEthernet0/12, changed state to down
Jul 4 18:35:54 jfkdbsp1 genunix: [ID 408789 kern.notice] NOTICE: ce1: fault cleared external to device; service available
Jul 4 18:35:54 jfkdbsp1 genunix: [ID 451854 kern.notice] NOTICE: ce1: xcvr addr:0x00 - link up 1000 Mbps full duplex
Jul 4 18:35:56: %LINK-3-UPDOWN: Interface GigabitEthernet0/12, changed state to up
Jul 4 18:35:58: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet0/12, changed state to up

I've replaced the patch cable between the server and the switch, with no change. engine_a.log goes back to the original install in 2008, and this issue has occurred 700+ times. Since these interfaces aren't plumbed by the OS, is there any way to get diagnostic information from LLT that can shed some light on the cause? I have three sites with an identical configuration. Site two shows these errors too, but fewer than half as many, and site three has none. Any help is appreciated. Thanks.

When I switch over Oracle via hagui, the Oracle home owner changes
Dear all,

I have a Solaris cluster running an Oracle database with VCS version 5. When I switch over Oracle (actually a service group that includes Oracle), the database is not able to come up on the other node. When I look, I see that the owner of the Oracle home directory has changed. I change the owner back manually and bring the group up via hagui; once everything is OK, I switch over again, and on the other node the owner appears to have changed again. What could be the reason, Veritas or Oracle? Can you please help?

2012/04/26 09:39:29 VCS NOTICE V-16-20002-210 Oracle:qipdb:monitor:Setting cookie for proc = ora_pmon_QIPDB, PID = /proc/1107/psinfo
2012/04/26 09:39:29 VCS NOTICE V-16-20002-210 Oracle:qipdb:monitor:Setting cookie for proc = ora_lgwr_QIPDB, PID = /proc/1116/psinfo
2012/04/26 09:39:29 VCS NOTICE V-16-20002-210 Oracle:qipdb:monitor:Setting cookie for proc = ora_dbw0_QIPDB, PID = /proc/1114/psinfo
2012/04/26 09:39:29 VCS NOTICE V-16-20002-210 Oracle:qipdb:monitor:Setting cookie for proc = ora_smon_QIPDB, PID = /proc/1120/psinfo
2012/04/26 09:41:22 VCS WARNING V-16-20002-207 Oracle:qipdb:monitor:Open for ora_pmon failed, setting cookie to null
2012/04/26 10:15:37 VCS ERROR V-16-2-13066 Thread(3) Agent is calling clean for resource(qipdb) because the resource is not up even after online completed.
2012/04/26 10:15:38 VCS ERROR V-16-2-13068 Thread(3) Resource(qipdb) - clean completed successfully.
2012/04/26 10:15:38 VCS ERROR V-16-2-13071 Thread(3) Resource(qipdb): reached OnlineRetryLimit(0).

Thanks,
Halit

SPFILE Autobackup fails after DB moved to SFHA cluster
Last night I migrated my production database to an SFHA cluster made up of two Sun T3-1B blade servers. The nodes of the cluster are st31bbl01 (Sun T3-1B blade 01) and st31bbl02 (Sun T3-1B blade 02). The name of the virtual database server that can fail over between these nodes is st31bora01 (Sun T3-1B oracle 01). Database backups report as successful in the NetBackup GUI, but when I check the log file from the RMAN backup script I see the list of database files successfully backed up, followed by this:

Starting Control File and SPFILE Autobackup at 29-SEP-11
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of Control File and SPFILE Autobackup command on ORA_SBT_TAPE_1 channel at 09/29/2011 01:47:42
ORA-19506: failed to create sequential file, name="PASPROD_c-1968089396-20110929-01", parms=""
ORA-27027: sbtremove2 returned error
ORA-19511: Error received from media manager layer, error text:
Failed to remove, PASPROD_c-1968089396-20110929-01, from image catalog.
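The autobackup piece names in this post look like RMAN's %F substitution (c-IIIIIIIIII-YYYYMMDD-QQ: the DBID, the date, and a two-digit hexadecimal sequence) behind a PASPROD_ prefix. That prefix, and therefore the exact FORMAT string used by the backup script, is an assumption inferred from the names in the log. A minimal Python sketch that splits such a name into its parts (decode_autobackup_name is a hypothetical helper, not part of RMAN or NetBackup):

```python
import re
from datetime import date

# Matches the %F portion c-<DBID>-<YYYYMMDD>-<QQ>; anything before it
# (such as the assumed "PASPROD_" prefix) is ignored.
AUTOBACKUP_RE = re.compile(r"c-(\d+)-(\d{4})(\d{2})(\d{2})-([0-9A-Fa-f]{2})$")


def decode_autobackup_name(name):
    """Return DBID, date and sequence for a %F-style piece name, or None."""
    m = AUTOBACKUP_RE.search(name)
    if m is None:
        return None
    dbid, year, month, day, seq = m.groups()
    return {
        "dbid": int(dbid),
        "date": date(int(year), int(month), int(day)),
        "seq": int(seq, 16),  # QQ is hexadecimal: 00 through FF
    }


if __name__ == "__main__":
    for piece in ("PASPROD_c-1968089396-20110929-01",
                  "PASPROD_c-1968089396-20110928-04"):
        print(piece, decode_autobackup_name(piece))
```

Run against the two names in this post (the ...20110929-01 piece from the RMAN output and the ...20110928-04 piece from the dbclient log), it reports the same DBID, 1968089396, with different dates and sequence numbers, so the two excerpts refer to different autobackup pieces.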
When I check the log file under /usr/openv/netbackup/logs/dbclient, I find the following:

21:46:54.822 [9177] <2> sbtremove2: INF - entering
21:46:54.822 [9177] <2> int_RemoveImage: INF - entering
21:46:54.822 [9177] <2> int_RemoveImage: INF - Removing backup image => <PASPROD_c-1968089396-20110928-04>
21:46:54.822 [9177] <2> xbsa_ValidateFeatureId: INF - entering
21:46:54.822 [9177] <2> xbsa_ValidateFeatureId: INF - leaving (0)
21:46:54.822 [9177] <2> int_FindBackupImage: INF - entering
21:46:54.822 [9177] <2> int_GetBfsDateRange: INF - entering
21:46:54.823 [9177] <2> int_GetBfsDateRange: INF - RMAN file name = PASPROD_c-1968089396-20110928-04
21:46:54.823 [9177] <2> int_GetBfsDateRange: INF - probable auto controlfile
21:46:54.823 [9177] <2> int_GetBfsDateRange: INF - Start range check: bfs_time=1317232800, start=819936000, end=1317354414
21:46:54.823 [9177] <2> int_logDateRange: INF - Start Time = 09/27/11 12:00:00
21:46:54.823 [9177] <2> int_logDateRange: INF - End Time = 09/29/11 12:00:00
21:46:54.823 [9177] <2> int_logDateRange: INF - leaving
21:46:54.823 [9177] <2> xbsa_BeginTransaction: INF - entering
21:46:54.823 [9177] <4> VxBSABeginTxn: INF - entering VxBSABeginTxn.
21:46:54.823 [9177] <4> VxBSAGetEnv: INF - entering GetEnv - NBBSA_DB_TYPE
21:46:54.823 [9177] <4> VxBSAGetEnv: INF - returning - Oracle
21:46:54.823 [9177] <4> VxBSAGetEnv: INF - entering GetEnv - NBBSA_NTUPLE_RESTORE
21:46:54.823 [9177] <4> VxBSAGetEnv: INF - returning -
21:46:54.824 [9177] <4> VxBSAGetEnv: INF - entering GetEnv - NBBSA_COPY_NUMBER
21:46:54.824 [9177] <4> VxBSAGetEnv: INF - returning -
21:46:54.824 [9177] <2> xbsa_BeginTransaction: INF - leaving (0)
21:46:54.824 [9177] <2> xbsa_QueryObject: INF - entering
21:46:54.824 [9177] <4> VxBSAQueryObject: INF - entering QueryObject.
21:46:54.824 [9177] <4> dbc_GetServerClientConfig: entering dbc_GetServerClientConfig.
21:46:54.824 [9177] <4> dbc_GetServerClientConfig: ServerName: <inf-srv17.apacorp.net>, ClientName: <st31bora01>
21:46:54.824 [9177] <4> VxBSAGetEnv: INF - entering GetEnv - BSA_SERVICE_HOST
21:46:54.824 [9177] <4> VxBSAGetEnv: INF - returning - inf-srv17.apacorp.net
21:46:54.824 [9177] <4> VxBSAGetEnv: INF - entering GetEnv - NBBSA_CLIENT_HOST
21:46:54.824 [9177] <4> VxBSAGetEnv: INF - returning - st31bora01
21:46:54.824 [9177] <4> VxBSAGetEnv: INF - entering GetEnv - NBBSA_POLICY
21:46:54.824 [9177] <4> VxBSAGetEnv: INF - returning - ORA_PASPROD
21:46:54.825 [9177] <4> VxBSAGetEnv: INF - entering GetEnv - NBBSA_KEYWORD
21:46:54.825 [9177] <4> VxBSAGetEnv: INF - returning - pasprod
21:46:54.825 [9177] <4> bsa_bplist: entering bsa_bplist
21:46:54.828 [9177] <2> vnet_async_connect: vnet_vnetd.c.3983: connect in progress: 0 0x00000000
21:46:54.865 [9177] <2> vnet_vnetd_service_socket: vnet_vnetd.c.2043: VN_REQUEST_SERVICE_SOCKET: 6 0x00000006
21:46:54.865 [9177] <2> vnet_vnetd_service_socket: vnet_vnetd.c.2057: service: bprd
21:46:55.075 [9177] <2> vnet_async_connect: vnet_vnetd.c.4169: in progress connect: 0 0x00000000
21:46:55.076 [9177] <2> vnet_async_connect: vnet_vnetd.c.4172: connect: async CONNECT FROM 192.168.84.211.63677 TO 192.168.86.11.13724 fd = 14
21:46:55.076 [9177] <2> logconnections: BPRD CONNECT FROM 192.168.84.211.63677 TO 192.168.86.11.13724
21:46:55.076 [9177] <2> vauth_authentication_required: vauth_comm.c.749: no methods for address: no authentication required
21:46:55.076 [9177] <2> vauth_connector: vauth_comm.c.182: no methods for address: no authentication required
21:46:55.076 [9177] <2> bprd_connect: no authentication required
21:46:55.076 [9177] <2> vnet_dlopen_vxss_client_magic: vnet_vxss.c.1766: Assuming no VxSS for DB Agents: 0 0x00000000
21:46:55.076 [9177] <2> bsa_bplist: start_date = Tue Sep 27 12:00:00 2011
21:46:55.077 [9177] <2> bsa_bplist: end_date = Thu Sep 29 12:00:00 2011
21:46:55.077 [9177] <2> bsa_bplist: Request = oraprod dbaprod st31bora01 st31bora01 st31bora01 ORA_PASPROD 7 pasprod 3 999 1317146400 1317319200 4 4 1 1 1 0 4 1017 2007 4 0 C C C C C 0 2 0 0 0
21:46:55.077 [9177] <4> bsa_bplist: Filepath = /PASPROD_c-1968089396-20110928-04
21:46:55.273 [9177] <2> dbc_get_string: Output = EXIT STATUS 131
21:46:55.273 [9177] <16> VxBSAQueryObject: ERR - dbc_get_string() failed 131
21:46:55.273 [9177] <2> xbsa_ProcessError: INF - entering
21:46:55.273 [9177] <2> xbsa_ProcessError: INF - leaving
21:46:55.273 [9177] <16> xbsa_QueryObject: ERR - VxBSAQueryObject: Failed with error: Server Status: client is not validated to use the server
21:46:55.273 [9177] <2> xbsa_QueryObject: INF - leaving (3)
21:46:55.273 [9177] <2> xbsa_EndTransaction: INF - entering
21:46:55.273 [9177] <4> VxBSAEndTxn: INF - entering VxBSAEndTxn.
21:46:55.273 [9177] <4> VxBSAEndTxn: INF - Transaction being COMMITED.
21:46:55.273 [9177] <4> VxBSAGetEnv: INF - entering GetEnv - NBBSA_LOG_DIRECTORY
21:46:55.273 [9177] <4> VxBSAGetEnv: INF - returning - dbclient
21:46:55.273 [9177] <4> VxBSAEndTxn: INF - Cleaning directory: </usr/openv/netbackup/logs/dbclient>
21:46:55.274 [9177] <4> delete_old_files: entering delete_old_files.
21:46:55.274 [9177] <2> xbsa_EndTransaction: INF - leaving (0)
21:46:55.274 [9177] <2> int_FindBackupImage: INF - leaving
21:46:55.274 [9177] <16> int_RemoveImage: ERR - Failed to remove, PASPROD_c-1968089396-20110928-04, from image catalog.
21:46:55.274 [9177] <2> int_RemoveImage: INF - leaving
21:46:55.274 [9177] <2> sbtremove2: INF - leaving
21:46:55.274 [9177] <2> sbterror: INF - entering
21:46:55.274 [9177] <2> sbterror: INF - Error=7501: Failed to remove, PASPROD_c-1968089396-20110928-04, from image catalog. .
21:46:55.274 [9177] <2> sbterror: INF - leaving

I suspect the message "client is not validated to use the server" indicates a misconfiguration between the node names, the virtual database server name, and the cluster name.
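To check that suspicion against the log itself, it helps to pull out exactly which server and client names the NetBackup client sent and which lines were errors. A minimal Python sketch, assuming the dbclient line layout seen in this excerpt (time, [pid], <level>, function: message) and that <16> marks the error lines as it does here; parse_dbclient is a hypothetical helper that only reads log text and does not query NetBackup:

```python
import re

# Assumed line layout, inferred from the dbclient excerpt in this post:
# "HH:MM:SS.mmm [pid] <level> function: message"
LINE_RE = re.compile(r"^(\d\d:\d\d:\d\d\.\d{3}) \[(\d+)\] <(\d+)> (\w+): (.*)$")
NAMES_RE = re.compile(r"ServerName: <([^>]*)>, ClientName: <([^>]*)>")
ENTER = "INF - entering GetEnv - "
RETURN = "INF - returning -"


def parse_dbclient(lines):
    """Summarise the server/client names, GetEnv values and <16> errors."""
    summary = {"server": None, "client": None, "env": {}, "errors": []}
    pending_key = None  # GetEnv key still waiting for its "returning" line
    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if m is None:
            continue  # not a dbclient log line (e.g. pasted prose)
        _time, _pid, level, func, msg = m.groups()
        names = NAMES_RE.search(msg)
        if names:
            summary["server"], summary["client"] = names.groups()
        if func == "VxBSAGetEnv":
            if msg.startswith(ENTER):
                pending_key = msg[len(ENTER):]
            elif msg.startswith(RETURN) and pending_key is not None:
                summary["env"][pending_key] = msg[len(RETURN):].strip()
                pending_key = None
        if level == "16":  # in this excerpt, <16> marks the ERR lines
            summary["errors"].append(f"{func}: {msg}")
    return summary


if __name__ == "__main__":
    sample = [
        "21:46:54.824 [9177] <4> dbc_GetServerClientConfig: ServerName: <inf-srv17.apacorp.net>, ClientName: <st31bora01>",
        "21:46:54.824 [9177] <4> VxBSAGetEnv: INF - entering GetEnv - NBBSA_POLICY",
        "21:46:54.824 [9177] <4> VxBSAGetEnv: INF - returning - ORA_PASPROD",
        "21:46:55.273 [9177] <16> VxBSAQueryObject: ERR - dbc_get_string() failed 131",
    ]
    info = parse_dbclient(sample)
    print(info["server"], info["client"])  # inf-srv17.apacorp.net st31bora01
    print(info["env"])  # {'NBBSA_POLICY': 'ORA_PASPROD'}
    for err in info["errors"]:
        print(err)  # VxBSAQueryObject: ERR - dbc_get_string() failed 131
```

Fed the full excerpt above, it reports inf-srv17.apacorp.net as the server and st31bora01 (the virtual name, not st31bbl01 or st31bbl02) as the client, with NBBSA_CLIENT_HOST=st31bora01 and NBBSA_POLICY=ORA_PASPROD; those are the values to compare with the client entries configured on the master server.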
Does anyone have any recommendations on where to look to resolve this?

Thanks,
Ken

SFHA on Linux: Oracle DB corrupted
Hi all,

I installed SFHA 5.1 on Red Hat Linux 5.3 on a two-node cluster by following the steps in the Veritas documentation. VCS installed successfully, and switching the service group to the other node completes successfully. The problem is that when the service group is switched or fails over to the other node, the database becomes corrupted, and RMAN detects that the Oracle database files are corrupted. On my last try I got the following error:

[oracle@RERDDB1 bin]$ ./rman target system/admin nocatalog
Recovery Manager: Release 10.2.0.5.0 - Production on Wed Dec 8 21:55:35 2010
Copyright (c) 1982, 2007, Oracle. All rights reserved.
ORACLE error from target database:
ORA-00604: error occurred at recursive SQL level 1
ORA-01578: ORACLE data block corrupted (file # 1, block # 11855)
ORA-01110: data file 1: '/oradata/RERD/system01.dbf'
error executing package DBMS_RCVMAN in TARGET database
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-00554: initialization of internal recovery manager package failed
RMAN-06429: TARGET database is not compatible with this version of RMAN
[oracle@RERDDB1 bin]$

I need a quick response because I have already lost a lot of time trying to fix this problem.

Regards

Problem with VRTS: Oracle alarm
Hi all,

I recently ran into an Oracle issue. I'm getting the following error message:

VCS ERROR V-16-2-13027 (mdsu1a) Resource(mdsuOracleLog_lv) - monitor procedure did not complete within the expected time.

My question is: why can this error appear? It is also causing a failover of the servers, and the database changes to the FAULTED state. The DBA's solution was to adjust the interval/timeout values, which might help, but I want to analyze this problem and understand why it is happening on my system. I have attached logs and other useful information. If someone can help me with this, thanks.

Problem with VRTS ONG resource
I've got a problem. I have two servers in a VRTS cluster. Although the VRTS ONG process is online, I'm getting this message very often:

MIG_MPM_1a mpm1a (Veritas_Cluster_Server): ONG (ONG): Resource state is unknown

and also this message:

VCS INFO V-16-2-13001 (mpm1a) Resource(ONG): Output of the completed operation (monitor) /opt/VRTSvcs/bin/ONG/monitor: test: unknown operator 2300

I have compared both configuration files and nothing is missing. What could be the problem here?

Thanks,

Oracle RAC and Split Brain Situation
Folks, any thoughts on how Oracle RAC handles a split-brain situation? This may be a comparative question spanning two different technologies, but I assume this is the best place to ask, since Veritas is a master at handling split-brain situations.

H2H,
Arvind