Can LLT heartbeats communicate between NICs with different device names?
One 2-node VCS cluster: the heartbeat NICs are eth2 and eth3 on each node. If eth2 on node1 is down and eth3 on node2 is down, does that mean both heartbeat links are down and the cluster is in a split-brain situation? In other words, can LLT heartbeats communicate between NIC eth2 and NIC eth3?

The "VCS Installation Guide" requires the two heartbeat links to be on different networks, so we should put eth2 of both nodes in one VLAN (VLAN1) and eth3 of both nodes in another VLAN (VLAN2). In that layout, heartbeats cannot pass between eth2 and eth3. But in a production cluster we found that all four NICs (eth2 and eth3 of both nodes) are in the same VLAN, which led me to post this discussion thread: if eth2 on node1 is down and eth3 on node2 is down, what will happen to the cluster (which is in active-standby mode)? Thanks!
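For context: LLT identifies peers by the cluster ID and node ID carried in its heartbeat packets, not by NIC device names, so the device names need not match across nodes; what matters is which links share a broadcast domain. A minimal /etc/llttab sketch (the cluster ID, node name, and link tags below are illustrative):

```
# /etc/llttab on node1 -- values are illustrative
set-node node1
set-cluster 100
link link1 eth2 - ether - -
link link2 eth3 - ether - -
```

With both links in the same VLAN, heartbeats from eth2 can reach eth3 on the peer, so the two "links" are no longer independent failure domains, which is exactly why the installation guide asks for separate networks.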
NFS share doesn't failover due to being busy

Hello! We are trying to implement a failover cluster which hosts a database and files on a clustered NFS share. The files are used by the clustered application itself and by several other hosts. The problem is that when the active node fails (an ungraceful server shutdown or a clustered service stop), the other hosts continue to use files on our cluster-hosted NFS share. That leads to the NFS share "hanging": it no longer works on the first node and still cannot be brought online on the second node. The other hosts also experience hanging requests to that NFS share. Later I will attach logs where the problem can be observed. The only corrective action we have found so far is a total shutdown and sequential restart of all cluster nodes and other hosts. Please recommend best-practice actions for using an NFS share on Veritas Cluster Server (perhaps some start/stop/clean scripts included as a cluster resource, or additional cluster configuration options). Thank you in advance! Best regards, Maxim Semenov.
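A common starting point for NFS service groups in VCS combines NFS, Share, NFSRestart, and IP resources, with NFSRestart handling NFS lock recovery so clients retry against the new node instead of hanging. The sketch below is illustrative only; the group name, paths, device, and address are assumptions, not taken from the thread:

```
group nfs_sg (
    SystemList = { node1 = 0, node2 = 1 }
    AutoStartList = { node1 }
    )

    DiskGroup nfs_dg (
        DiskGroup = datadg
        )

    Mount nfs_mnt (
        MountPoint = "/export/data"
        BlockDevice = "/dev/vx/dsk/datadg/datavol"
        FSType = vxfs
        FsckOpt = "-y"
        )

    NFS nfs_server (
        Nservers = 16
        )

    Share nfs_share (
        PathName = "/export/data"
        )

    NFSRestart nfs_restart (
        NFSRes = nfs_server
        NFSLockFailover = 1
        LocksPathName = "/export/locks"
        )

    IP nfs_ip (
        Device = eth0
        Address = "10.0.0.100"
        NetMask = "255.255.255.0"
        )

    nfs_mnt requires nfs_dg
    nfs_share requires nfs_mnt
    nfs_share requires nfs_server
    nfs_ip requires nfs_share
    nfs_restart requires nfs_ip
```

The key point is that clients must mount through the failover IP (nfs_ip), never a node's own address, so that after a switch their retransmitted requests land on the node that now owns the share.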
Error while installing VCS in solaris 11

The following warnings were discovered on the systems:

CPI WARNING V-9-40-4923 To avoid a potential reboot after installation, you should modify the /etc/system file on solaris with the appropriate values, and reboot prior to package installation. Appropriate /etc/system file entries are shown below:

set lwp_default_stksize=0x8000
set rpcmod:svc_default_stksize=0x8000

CPI WARNING V-9-40-4923 To avoid a potential reboot after installation, you should modify the /etc/system file on solaris11 with the appropriate values, and reboot prior to package installation. Appropriate /etc/system file entries are shown below:

set lwp_default_stksize=0x8000
set rpcmod:svc_default_stksize=0x8000

installer log files and summary file are saved at: /opt/VRTS/install/logs/installer-201301250058fNC
Is it recommended to configure coordinator DG for failover SG

Dears,

Is it mandatory or recommended to implement I/O fencing using coordinator disks if I have only failover service groups with non-shared disk groups? The disk headers will carry the information about which node the DG is imported on, and the DG is not shared.
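For what it's worth, fencing is not mandatory for purely failover service groups, but it is generally recommended even with non-shared disk groups: the import flag in the disk header does not stop a partitioned node from force-importing the DG during a split brain. If you do configure disk-based fencing, the two config files look roughly like this (the coordinator disk group name is illustrative):

```
# /etc/vxfendg -- name of the coordinator disk group (illustrative)
vxfencoorddg

# /etc/vxfenmode -- disk-based SCSI-3 fencing
vxfen_mode=scsi3
scsi3_disk_policy=dmp
```

The coordinator disk group typically holds three small LUNs used only for SCSI-3 registrations, never for data.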
Reg application service returning "The program exited with return code <0>"

Hi All,

I configured my application in VCS, and when I try to online the service it actually comes online (I can check via the service status command), but VCS reports the error "The program exited with return code <0>". Here are my main.cf parameters for the specific service:

Application dfm-ocie (
    StartProgram = "/etc/init.d/ocie start"
    StopProgram = "/etc/init.d/ocie stop"
    MonitorProcesses = { classpath, "/opt/netapp/essentials/jboss/lib/jboss-logmanager.jar" }
    )

2014/12/12 11:57:18 VCS INFO V-16-10031-509 (vmlnx64-xyz) Application:dfm-ocie:online:Executed </etc/init.d/ocie> as user <root>. The program exited with return code <0>.
2014/12/12 11:57:19 VCS INFO V-16-2-13716 (vmlnx64-xyz) Resource(dfm-ocie): Output of the completed operation (online)
==============================================
Starting NetApp OnCommand Insight Essentials Server service. This may take couple of minutes
Successfully started NetApp OnCommand Insight Essentials Server service
==============================================
2014/12/12 11:59:20 VCS ERROR V-16-2-13066 (vmlnx64-xyz) Agent is calling clean for resource(dfm-ocie) because the resource is not up even after online completed.
2014/12/12 11:59:21 VCS INFO V-16-2-13068 (vmlnx64-xyz) Resource(dfm-ocie) - clean completed successfully.
2014/12/12 11:59:21 VCS INFO V-16-2-13071 (vmlnx64-xyz) Resource(dfm-ocie): reached OnlineRetryLimit(0).
2014/12/12 11:59:23 VCS ERROR V-16-1-54031 Resource dfm-ocie (Owner: Unspecified, Group: dfmgrpkjag) is FAULTED on sys vmlnx64-xyz

Is there anything I am missing here? Also, can someone explain what MonitorProcesses does?

-Jaga
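On MonitorProcesses: the Application agent searches the process table for processes whose full command line matches each listed string, and declares the resource online only if all of them are found. A bare token like "classpath" or a jar path is unlikely to match any process's command line, which would explain why online completes (return code 0 is success) yet the monitor then reports the resource offline and clean is called. A hedged sketch of the fix; the command line below is an assumption for illustration, the real value must be taken verbatim from ps -ef:

```
Application dfm-ocie (
    StartProgram = "/etc/init.d/ocie start"
    StopProgram = "/etc/init.d/ocie stop"
    MonitorProcesses = { "/opt/netapp/essentials/jboss/bin/run.sh" }
    )
```

Running the start script manually and then copying the exact surviving process line from ps output into MonitorProcesses is the usual way to get this attribute right.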
SG is not switching to next node.

Hi All,

I am new to VCS but good in HACMP. In our environment we are using VCS 6.0. On one server we found that the SG is not moving from one node to another when we tried a manual failover using the command below:

hagrp -switch <SGname> -to <sysname>

We can see that the SG goes offline on the current node, but it does not come online on the secondary node. There is no error logged in engine_A.log except the entry below:

cpus load more than 60% <secondary node name>

Can anyone help me find a solution for this? I will provide the output of any commands if you need more info to help get this troubleshooted :)

Thanks,
VCS failovers and copies the crontabs

Hello,

I am using VCS on Oracle M9000 machines in a three-node cluster. The question is: when I fail over the services from one node to another, I want all the crontabs to be copied to the other live node as well, which does not seem to be working in my domain at the moment. Can you please help me out with where to define this "copy cron" procedure, so that every time one environment fails over to another node it also copies the crontabs from the previous system? Alternatively, is there a procedure which copies the crontabs of every user to all cluster nodes daily? I need to know whether this can be configured in VCS. All useful replies are welcome.

Best Regards, Mohammad Ali Sarwar
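VCS has no built-in crontab replication, but a small sync step can be hooked into a failover, for example from a postonline trigger script or a periodic cron job of its own. The helper below is a hypothetical sketch (paths and usage are assumptions, not VCS features); in production the destination would typically be a remote target reached with rsync over ssh rather than a local path:

```shell
#!/bin/sh
# Hypothetical helper: mirror crontab files from one spool directory
# to another, preserving ownership and permissions. In a real cluster
# the destination would be remote, e.g.:
#   rsync -a /var/spool/cron/ node2:/var/spool/cron/
sync_crontabs() {
    src="$1"
    dst="$2"
    mkdir -p "$dst"
    # Copy the directory contents (each user's crontab file)
    cp -Rp "$src/." "$dst/"
}
```

Calling such a helper from a postonline trigger pushes the crontabs at failover time; running it from cron on every node keeps them continuously in sync, which is usually the more robust choice.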
cluster behavior needed, which cfg vars to modify

Hello,

I wish to have the following behavior from a Veritas cluster monitoring a resource (an application): when the resource fails, first attempt to restart it on the same node; if that fails, migrate it to the second node. Beyond that, is there another monitor setting which forces the resource to migrate directly if it fails too many times in a given timeframe, instead of being restarted on the same node?

When testing, I see different behaviors depending on how long I wait between manually killing the app, and I do not know exactly which configuration values I have to edit. Basically, the question is: how much time do I have between manually failing the resource, so that the cluster restarts it on the _same_ node?

Config so far:
ToleranceLimit = 0
RestartLimit = 1
OnlineTimeout = 300
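The timing window the question asks about is governed by ConfInterval: the agent's failure counters (for both RestartLimit and ToleranceLimit) reset only after the resource has stayed online for ConfInterval seconds (default 600). So with RestartLimit = 1, a second kill within ConfInterval of the first finds the single allowed restart already consumed and the group fails over; a kill after the counter has reset is restarted locally again. Detection itself waits up to MonitorInterval seconds (default 60). A types.cf-style sketch of the relevant attributes (values shown are the common defaults, and this is a fragment, not a complete type definition):

```
type Application (
    static int MonitorInterval = 60
    static int ToleranceLimit = 0
    static int RestartLimit = 1
    static int ConfInterval = 600
)
```

To get "migrate immediately after repeated quick failures", shortening ConfInterval narrows the window in which failures accumulate, while setting RestartLimit = 0 removes local restarts entirely.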
CompositeFileShare subfolders not avail on HA node when created outside of VCS

Windows 2008R2, VCS 6.0.1. Shared subfolders are not available after switching to the second node when using CompositeFileShare with ShareSubdirectories enabled. On the first node, using Windows tools, a new subdirectory share becomes visible and the subdirectory is available. However, after switching to the second node (hagrp -switch), the subfolder is no longer visible. These are not hidden shares. It failed with AccessBasedEnumeration set to either 0 or 1.
VCS ERROR V-16-2-13067 SERVER01 Agent is calling clean for resource because the resource became OFFLINE unexpectedly, on its own.

Hi,

I have an 8-node Veritas cluster running on RHEL 5.5 in which we are using FireDrill resources. For some time now, the FireDrill resources have faulted every day with the errors below.

LOGS:

2011/01/21 20:38:06 VCS INFO V-16-20054-101 SERVER01 MirrorViewSnap:mirrorviewsnap_ora:monitor:Ping output: PING XX.XX.XX.XX (XX.XX.XX.XX) 56(84) bytes of data.
64 bytes from XX.XX.XX.XX: icmp_seq=1 ttl=125 time=0.289 ms
--- XX.XX.XX.XX ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.289/0.289/0.289/0.000 ms
2011/01/21 20:38:22 VCS ERROR V-16-2-13067 SERVER01 Agent is calling clean for resource(fd_mnt_oradata2) because the resource became OFFLINE unexpectedly, on its own.
2011/01/21 20:38:23 VCS NOTICE V-16-10031-5512 SERVER01 Mount:fd_mnt_oradata2:clean:Trying force umount with signal 9...
2011/01/21 20:38:23 VCS INFO V-16-2-13716 SERVER01 Resource(fd_mnt_oradata2): Output of the completed operation (clean)
==============================================
Cannot stat /oradata2: Input/output error
Cannot stat /oradata2: Input/output error
Cannot stat /oradata2: Input/output error
==============================================
2011/01/21 20:38:23 VCS INFO V-16-2-13068 SERVER01 Resource(fd_mnt_oradata2) - clean completed successfully.
2011/01/21 20:38:24 VCS INFO V-16-1-10307 Resource fd_mnt_oradata2 (Owner: unknown, Group: fd_oracle) is offline on SERVER01 (Not initiated by VCS)
2011/01/21 20:38:24 VCS NOTICE V-16-1-10300 Initiating Offline of Resource fd_LISTENER (Owner: unknown, Group: fd_oracle) on System SERVER01
2011/01/21 20:38:24 VCS INFO V-16-20002-40 SERVER01 Netlsnr:fd_LISTENER:offline:lsnrctl returned the following output
+--------------------------------------------------------------------+
LD_LIBRARY_PATH - /usr/lib:
LSNRCTL for Linux: Version 11.2.0.1.0 - Production on 24-APR-2012 20:38:24
Copyright (c) 1991, 2009, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=xx.xx.xx.xx)(PORT=1530)))
The command completed successfully
+====================================================================+
2011/01/21 20:38:26 VCS INFO V-16-1-10305 Resource fd_LISTENER (Owner: unknown, Group: fd_oracle) is offline on SERVER01 (VCS initiated)
2011/01/21 20:38:26 VCS NOTICE V-16-1-10300 Initiating Offline of Resource fd_PDDCOTC (Owner: unknown, Group: fd_oracle) on System SERVER01
2011/01/21 20:38:26 VCS WARNING V-16-20002-23 SERVER01 Oracle:fd_PDDCOTC:offline:Oracle database PDDCOTCF not running
2011/01/21 20:38:27 VCS ERROR V-16-2-13067 SERVER01 Agent is calling clean for resource(fd_mnt_oradata1) because the resource became OFFLINE unexpectedly, on its own.
2011/01/21 20:38:28 VCS NOTICE V-16-10031-5512 SERVER01 Mount:fd_mnt_oradata1:clean:Trying force umount with signal 9...
2011/01/21 20:38:28 VCS INFO V-16-2-13716 SERVER01 Resource(fd_mnt_oradata1): Output of the completed operation (clean)
==============================================
Cannot stat /oradata1: Input/output error
Cannot stat /oradata1: Input/output error
Cannot stat /oradata1: Input/output error
==============================================
2011/01/21 20:38:28 VCS INFO V-16-1-10305 Resource fd_PDDCOTC (Owner: unknown, Group: fd_oracle) is offline on SERVER01 (VCS initiated)
2011/01/21 20:38:28 VCS NOTICE V-16-1-10300 Initiating Offline of Resource fd_ip_listener (Owner: unknown, Group: fd_oracle) on System SERVER01
2011/01/21 20:38:28 VCS NOTICE V-16-1-10300 Initiating Offline of Resource fd_mnt_oradata1 (Owner: unknown, Group: fd_oracle) on System SERVER01
2011/01/21 20:38:28 VCS NOTICE V-16-1-10300 Initiating Offline of Resource fd_oradata2 (Owner: unknown, Group: fd_oracle) on System SERVER01
2011/01/21 20:38:28 VCS INFO V-16-2-13068 SERVER01 Resource(fd_mnt_oradata1) - clean completed successfully.
2011/01/21 20:38:29 VCS INFO V-16-1-10305 Resource fd_mnt_oradata1 (Owner: unknown, Group: fd_oracle) is offline on SERVER01 (VCS initiated)
2011/01/21 20:38:29 VCS NOTICE V-16-1-10300 Initiating Offline of Resource fd_oradata1 (Owner: unknown, Group: fd_oracle) on System SERVER01
2011/01/21 20:38:29 VCS INFO V-16-1-10306 Resource fd_mnt_oradata1 (Owner: unknown, Group: fd_oracle) is offline on SERVER01 (Previous State = OFFLINE)
2011/01/21 20:38:30 VCS INFO V-16-2-13716 SERVER01 Resource(fd_oradata2): Output of the completed operation (offline)
==============================================
VxVM vxprint ERROR V-5-1-582 Disk group ora_dg_fd: No such disk group
==============================================
2011/01/21 20:38:30 VCS INFO V-16-1-10305 Resource fd_ip_listener (Owner: unknown, Group: fd_oracle) is offline on SERVER01 (VCS initiated)
2011/01/21 20:38:31 VCS INFO V-16-2-13716 SERVER01 Resource(fd_oradata1): Output of the completed operation (offline)
==============================================
VxVM vxprint ERROR V-5-1-582 Disk group ora_dg_fd: No such disk group
==============================================
2011/01/21 20:38:31 VCS INFO V-16-1-10305 Resource fd_oradata2 (Owner: unknown, Group: fd_oracle) is offline on SERVER01 (VCS initiated)
2011/01/21 20:38:31 VCS INFO V-16-1-10305 Resource fd_oradata1 (Owner: unknown, Group: fd_oracle) is offline on SERVER01 (VCS initiated)
2011/01/21 20:38:31 VCS NOTICE V-16-1-10300 Initiating Offline of Resource fd_ora_dg (Owner: unknown, Group: fd_oracle) on System SERVER01
2011/01/21 20:38:31 VCS WARNING V-16-10031-1521 SERVER01 DiskGroup:fd_ora_dg:offline:The command *vxvol -g ora_dg_fd stopall* failed. Doing a forced stop.
2011/01/21 20:38:32 VCS INFO V-16-2-13716 SERVER01 Resource(fd_ora_dg): Output of the completed operation (offline)
==============================================
VxVM vxvol ERROR V-5-1-607 Diskgroup ora_dg_fd not found
VxVM vxvol ERROR V-5-1-607 Diskgroup ora_dg_fd not found
VxVM vxdg ERROR V-5-1-580 Disk group ora_dg_fd: Flush failed: Disk group is disabled
==============================================
2011/01/21 20:38:33 VCS INFO V-16-1-10305 Resource fd_ora_dg (Owner: unknown, Group: fd_oracle) is offline on SERVER01 (VCS initiated)
2011/01/21 20:38:33 VCS NOTICE V-16-1-10300 Initiating Offline of Resource mirrorviewsnap_ora (Owner: unknown, Group: fd_oracle) on System SERVER01
2011/01/21 20:38:41 VCS INFO V-16-20054-101 SERVER01 MirrorViewSnap:mirrorviewsnap_ora:offline:Ping output: PING XX.XX.XX.XX (XX.XX.XX.XX) 56(84) bytes of data.
64 bytes from XX.XX.XX.XX: icmp_seq=1 ttl=125 time=0.271 ms