Can LLT heartbeats communicate between NICs with different device names?
On a 2-node VCS cluster, the heartbeat NICs are eth2 and eth3 on each node. If eth2 on node1 goes down and eth3 on node2 goes down, does that mean both heartbeat links are down and the cluster is in a split-brain situation? Can LLT heartbeats communicate between NIC eth2 and NIC eth3? The VCS Installation Guide requires the two heartbeat links to be on different networks, so we should put eth2 of both nodes in one VLAN (VLAN1) and eth3 of both nodes in another VLAN (VLAN2). In that layout, heartbeats cannot travel between eth2 and eth3. However, in a production cluster we found that all four NICs (eth2 and eth3 on both nodes) are in the same VLAN, and that led me to post this discussion thread to ask: if eth2 on node1 is down and eth3 on node2 is down, what will happen to the cluster (which is in active-standby mode)? Thanks!
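In case it helps anyone checking a similar setup, here is a quick way to see what LLT itself thinks about each link on each node; the interface names are just the ones from this example, and the exact output format varies by version:

    # Run on each node; shows every configured LLT link and its state (UP/DOWN) for every peer node
    lltstat -nvv | more

    # Summarize the links configured on the local node
    lltstat -l

    # The link definitions come from /etc/llttab, for example (Linux-style syntax, names illustrative):
    #   link eth2 eth2 - ether - -
    #   link eth3 eth3 - ether - -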
SmartIO blueprint and deployment guide for Solaris platform
SmartIO for Solaris was introduced in Storage Foundation HA 6.2. SmartIO enables data efficiency on your SSDs through I/O caching. Using SmartIO to improve efficiency, you can optimize the cost per Input/Output Operations Per Second (IOPS). SmartIO supports both read and write-back caching for VxFS file systems that are mounted on VxVM volumes, in multiple caching modes and configurations. SmartIO also supports block-level read caching for applications running on VxVM volumes. The SmartIO Blueprint for Solaris gives an overview of the benefits of using SmartIO technology, the underlying technology, and the essential configuration steps to configure it. The SmartIO Deployment Guide for Solaris covers multiple deployment scenarios of SmartIO, and how to manage them, in detail. Let us know if you have any questions or feedback!
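As a quick illustration of the kind of administration the guides walk through, the sfcache utility is the SmartIO command-line front end; the cache area name below is made up, and the exact creation syntax and options are covered in the deployment guide:

    # List existing cache areas, their type (VxFS or VxVM), and their state
    sfcache list

    # Show cache hit/miss statistics for a cache area (name is illustrative)
    sfcache stat sfcachearea_1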
NFS share doesn't failover due to being busy
Hello! We are trying to implement a failover cluster that hosts a database and files on a clustered NFS share. The files are used by the clustered application itself and by several other hosts. The problem is that when the active node fails (an ungraceful server shutdown or a clustered service stop), the other hosts continue to use files on the cluster-hosted NFS share. That leads to the NFS share "hanging": it no longer works on the first node, and it cannot be brought online on the second node. The other hosts also see their requests to that NFS share hang. Later, I will attach logs where the problem can be observed. The only corrective action we have found is a total shutdown and sequential start of all cluster nodes and the other hosts. Please recommend best-practice actions for using an NFS share on Veritas Cluster Server (maybe some start/stop/clean scripts included as a cluster resource, or additional cluster configuration options). Thank you in advance! Best regards, Maxim Semenov.
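Not a definitive answer, but for comparison, here is a minimal sketch of the usual VCS pattern for a highly available NFS share (DiskGroup -> Mount -> NFS/Share -> virtual IP, with an NFSRestart resource for lock recovery). All names, devices, and paths below are invented placeholders, and the NFSRestart/lock-recovery details for your VCS version are described in the Bundled Agents Reference Guide. The key operational point is that clients should mount the share only through the virtual IP, so their mounts follow the service group on failover:

    group nfs_sg (
        SystemList = { node1 = 0, node2 = 1 }
        AutoStartList = { node1 }
        )

        DiskGroup nfs_dg (
            DiskGroup = nfsdg
            )

        Mount nfs_mnt (
            MountPoint = "/export/data"
            BlockDevice = "/dev/vx/dsk/nfsdg/datavol"
            FSType = vxfs
            FsckOpt = "-y"
            )

        NFS nfs_server (
            Nservers = 16
            )

        Share nfs_share (
            PathName = "/export/data"
            )

        IP nfs_ip (
            Device = eth0
            Address = "10.0.0.100"
            NetMask = "255.255.255.0"
            )

        NFSRestart nfs_restart (
            NFSRes = nfs_server
            LocksPathName = "/export/data"
            )

        nfs_mnt requires nfs_dg
        nfs_share requires nfs_mnt
        nfs_share requires nfs_server
        nfs_ip requires nfs_share
        nfs_restart requires nfs_ip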
Disable AutoFailOver from stopping services
I have several services that I am monitoring that are set to automatically fail over to a second system. It has come up that occasionally I need to restart a service without HA failing over to another system. I can disable the failover with the following command:
hagrp -modify App_Cluster AutoFailOver 0
However, if the service is stopped, HA still shuts down all the other services that are up. I researched this and came across disabling Evacuate in HA, but even with it disabled, the other services are still shut down:
hagrp -modify App_Cluster Evacuate 0
I want the other services to continue to run even if one goes down for some reason. What is the best way to accomplish this?
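For what it's worth, the approach usually recommended for planned maintenance is to freeze the group (VCS then stops taking the group offline or failing it over while you work), or to mark the individual resource as non-critical so its fault does not pull the rest of the group down. App_Cluster is the group name from the post above; the resource name is a placeholder:

    # Temporarily stop VCS from reacting to faults in the group (add -persistent to survive HAD restarts)
    hagrp -freeze App_Cluster
    # ... restart the service manually, then ...
    hagrp -unfreeze App_Cluster

    # Alternatively, make one resource non-critical so its fault does not offline the whole group
    haconf -makerw
    hares -modify <resource_name> Critical 0
    haconf -dump -makero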
Failed to get the MSDTC Security configuration for VCS from registry
Hi, I created a campus cluster with 2 servers (operating system: Windows Server 2012 R2, SQL Server 2012, Symantec Storage Foundation 6.1). The service group comes online only some of the time. The first time it faulted. I cleared the fault on that server and tried to bring the service group up again, and it came online. I took it offline, then tried to bring it up, and it faulted. Then online, faulted, and so on... The SQL Server agent faulted. In the SQL Server logs I did not find any errors. In C:\Program Files\Veritas\cluster server\log\SQLserver_A.txt I got these errors:
2015/05/16 20:41:33 VCS NOTICE V-16-20093-75 SQLServer:SQLServer-SQL1:online:Failed to get the MSDTC Security configuration for VCS from registry.Error : [2, 2]
2015/05/16 20:47:50 VCS ERROR V-16-20093-11 SQLServer:SQLServer-SQL1:online:Failed to wait for the service 'MSSQL$SQL1' to start. Error = [2 ,258]
2015/05/16 20:47:50 VCS DBG_21 V-16-50-0 SQLServer:SQLServer-SQL1:online:*** Start of debug information dump for troubleshooting *** LibLogger.cpp:VLibThreadLogQueue::Dump[206]
2015/05/16 20:47:50 VCS DBG_21 V-16-50-0 SQLServer:SQLServer-SQL1:online:(2) CRegKey::Open failed for Software\Veritas\VCS\EnterpriseAgents\SQLServer\SQLServer-SQL1. LibVcsHive.cpp:VLibVcsHive::_GetDWORDValue[435]
2015/05/16 20:47:50 VCS DBG_21 V-16-50-0 SQLServer:SQLServer-SQL1:online:(2) CRegKey::Open failed for Software\Veritas\VCS\EnterpriseAgents\SQLServer\__Global__. LibVcsHive.cpp:VLibVcsHive::_GetDWORDValue[435]
2015/05/16 20:47:50 VCS DBG_21 V-16-50-0 SQLServer:SQLServer-SQL1:online:(2) _GetDWORDValue failed. Subkey = Software\Veritas\VCS\EnterpriseAgents\SQLServer\__Global__, Name = IgnoreMSDTCSecurity LibVcsHive.cpp:VLibVcsHive::GetValue[401]
2015/05/16 20:47:50 VCS DBG_21 V-16-50-0 SQLServer:SQLServer-SQL1:online:Wait timed out for service MSSQL$SQL1 LibService.cpp:VLibService::WaitForServiceStatus[275]
2015/05/16 20:47:50 VCS DBG_21 V-16-50-0 SQLServer:SQLServer-SQL1:online:*** End of debug information dump for troubleshooting *** LibLogger.cpp:VLibThreadLogQueue::Dump[217]
2015/05/16 20:47:50 VCS WARNING V-16-2-13140 Thread(10516) Could not find timer entry with id 274
2015/05/16 20:47:50 VCS INFO V-16-20093-29 SQLServer:SQLServer-SQL1:monitor:The 'MSSQL$SQL1' service is not in stopped or running state. State = 2.
2015/05/16 20:48:50 VCS INFO V-16-20093-29 SQLServer:SQLServer-SQL1:monitor:The 'MSSQL$SQL1' service is not in stopped or running state. State = 2.
2015/05/16 20:49:50 VCS INFO V-16-20093-29 SQLServer:SQLServer-SQL1:monitor:The 'MSSQL$SQL1' service is not in stopped or running state. State = 2.
2015/05/16 20:49:50 VCS ERROR V-16-2-13066 Thread(9380) Agent is calling clean for resource(SQLServer-SQL1) because the resource is not up even after online completed.
2015/05/16 20:49:50 VCS WARNING V-16-20093-55 SQLServer:SQLServer-SQL1:clean:The service 'MSSQL$SQL1' is not in running state. Attempt to stop it might be unsuccessful.
2015/05/16 20:53:32 VCS WARNING V-16-2-13140 Thread(9380) Could not find timer entry with id 279
2015/05/16 20:53:32 VCS ERROR V-16-2-13068 Thread(9380) Resource(SQLServer-SQL1) - clean completed successfully.
2015/05/16 20:53:32 VCS ERROR V-16-2-13071 Thread(9380) Resource(SQLServer-SQL1): reached OnlineRetryLimit(0).
2015/05/16 20:56:41 VCS NOTICE V-16-20093-75 SQLServer:SQLServer-SQL1:online:Failed to get the MSDTC Security configuration for VCS from registry.Error : [2, 2]
There are two messages when the service group comes online:
2015/05/16 20:58:58 VCS INFO V-16-20093-30002 SQLServer:SQLServer-SQL1:imf_register:Registering with IMF for online monitoring
2015/05/16 21:08:28 VCS INFO V-16-20093-30001 SQLServer:SQLServer-SQL1:imf_register:Registering with IMF for offline monitoring
Can you help me resolve this issue? Thanks in advance.
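The log shows the agent timing out while waiting for the MSSQL$SQL1 service, so (just a suggestion) it may help to rule out a slow or hung service start outside of VCS first; these are standard Windows commands, run on the node where the group is trying to come online:

    rem Check the current state of the SQL Server instance service
    sc query MSSQL$SQL1

    rem Try starting it manually to see how long it takes and whether it reports an error
    net start MSSQL$SQL1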
Symantec ApplicationHA 6.2: Monitoring applications with Intelligent Monitoring Framework
Introduced in this release, the Intelligent Monitoring Framework (IMF) feature improves ApplicationHA efficiency with:
- Faster detection of application faults
- The ability to monitor a large number of application components, with minimal effect on performance
IMF is automatically enabled if you use the Symantec High Availability Wizard to configure an application for monitoring. The feature was introduced in ApplicationHA 6.1 for Windows. In ApplicationHA 6.2, it is extended to AIX, Linux, and Solaris. For details, see the following topics:
- How intelligent monitoring works: AIX, Linux (KVM), Linux (VMware), and Solaris.
- Enabling debug logs for IMF: AIX, Linux (KVM), Linux (VMware), and Solaris.
- Gathering IMF information for support analysis: AIX, Linux (KVM), Linux (VMware), and Solaris.
This release introduces IMF support for the following ApplicationHA agents:
- Apache HTTP Server
- DB2 Database (not applicable to the Oracle VM Server for SPARC environment)
- Oracle Database
- Generic (custom) applications
The following topics describe how to use the Symantec High Availability wizard to configure each supported application for IMF-enabled monitoring:
- Configuring application monitoring for Apache: AIX, Linux (KVM), Linux (VMware), and Solaris.
- Configuring application monitoring for DB2: AIX, Linux (KVM), and Linux (VMware).
- Configuring application monitoring for Oracle: AIX, Linux (KVM), Linux (VMware), and Solaris.
- Configuring application monitoring for generic applications: AIX, Linux (KVM), Linux (VMware), and Solaris.
You can use Symantec Cluster Server (VCS) commands to perform more advanced IMF actions. ApplicationHA and VCS documentation is available on the SORT website.
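As one example of the "more advanced IMF actions" mentioned above, the type-level IMF attribute can be inspected and tuned with standard VCS commands; the agent type name and the values below are illustrative, and the exact keys for your agent and platform are listed in the topics linked above:

    # Show the current IMF settings for an agent type (type name is illustrative)
    hatype -display Apache -attribute IMF

    # Enable IMF monitoring of both online and offline resources (Mode 3); values are examples only
    haconf -makerw
    hatype -modify Apache IMF -update Mode 3 MonitorFreq 5 RegisterRetryLimit 3
    haconf -dump -makero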
SFHA Solutions 6.1: Using AdaptiveHA to select the largest system for failover
Symantec Cluster Server (VCS) service groups are virtual containers that manage the groups of resources required to run a managed application. The FailOverPolicy service group attribute governs how VCS determines the target system for failover. For more information, see:
- About service groups
- Service group attributes
- Cluster attributes
- About defining failover policies
When you set FailOverPolicy to BiggestAvailable, AdaptiveHA enables VCS to dynamically select the cluster node with the most available resources to fail over an application. VCS monitors and forecasts the unused capacity of systems in terms of CPU, Memory, and Swap to select the largest available system. If you set FailOverPolicy to BiggestAvailable for a service group, you must specify the load values, in terms such as 1 CPU, 1 GB RAM, and 1 GB swap, in the Load service group attribute. You only need to specify the resources that the service group actually uses. For example, if the service group does not use swap, specify only the CPU and Memory resources in the Load attribute.
Note: The Load FailOverPolicy is being deprecated after this release. Symantec recommends that you change to the BiggestAvailable FailOverPolicy to enable AdaptiveHA. For more information, see:
- About AdaptiveHA
- Enabling AdaptiveHA for a service group
If you upgrade VCS manually, ensure that you update the VCS configuration file (main.cf) to enable AdaptiveHA. When you upgrade from an older version of VCS using the installer, the main.cf file is upgraded automatically. For more information, see Manually upgrading the VCS configuration file to the latest version.
VCS documentation for other platforms and releases can be found on the SORT website.
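A minimal sketch of what this looks like from the command line; the group name, load figures, and units are illustrative, so check the Administrator's Guide topics above for the exact keys and units your release expects:

    haconf -makerw

    # Let VCS pick the node with the most available capacity
    hagrp -modify appsg FailOverPolicy BiggestAvailable

    # Declare what the service group consumes (values illustrative; Mem/Swap are typically in MB)
    hagrp -modify appsg Load CPU 1 Mem 1024 Swap 1024

    haconf -dump -makero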
vxio: Cluster software communication timeout. Reservation refresh has been suspended
Hi, We are experiencing this error on one of our clusters. It's a two-node campus cluster with the following specifications:
Site A
- Node 1 is a Windows Server 2008 R2 virtual machine residing on an ESXi 5.1 host in this site
- Disks 1 and 3 are LUNs in an enclosure in this site
Site B
- Node 2 is a Windows Server 2008 R2 virtual machine residing on an ESXi 5.1 host in this site
- Disks 2 and 4 are LUNs in an enclosure in this site
We have created two VMDGs: one contains Disks 1 and 2, while the other contains Disks 3 and 4. On these VMDGs we have created mirrored dynamic volumes. The VMDGs are then presented to the failover cluster. The quorum type on the failover cluster is a file share witness on another server. We are also running Microsoft System Center Configuration Manager to install updates and patches on Nodes 1 and 2. Whenever patches are installed on a node, it gets restarted. Whenever that occurs, the cluster resource group fails over from Node 1 to Node 2. Everything seems to fail over just fine, and the VMDG is imported successfully (according to the log). But 10 minutes after the VMDG has been imported, the following error is logged on Node 2: http://s28.postimg.org/ubh8skfh9/vmdg2.png If I check the status of the VMDGs in VEA, it is Deported for both VMDGs: http://s3.postimg.org/72ort9683/vmdg3.png But even though the disks and VMDGs seem to be offline on the active node, failover does not occur; in Failover Cluster Manager the VMDG is online, but there are no volumes enumerated on it: http://s12.postimg.org/p31vncct9/vmdg1.png Has anyone else experienced the same, and knows why the status of the disks changes to Deported without failover occurring?
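Not an answer, but when troubleshooting similar symptoms it helps to capture what SFW itself reports at the moment the VEA console shows Deported; these are the standard Storage Foundation for Windows command-line utilities, run from an elevated prompt on the active node (output can then be compared with what Failover Cluster Manager shows):

    rem List dynamic disk groups and their import state as SFW sees them
    vxdg list

    rem List the disks and the disk group each belongs to
    vxdisk list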
Error while installing VCS in Solaris 11
The following warnings were discovered on the systems:
CPI WARNING V-9-40-4923 To avoid a potential reboot after installation, you should modify the /etc/system file on solaris with the appropriate values, and reboot prior to package installation. Appropriate /etc/system file entries are shown below:
set lwp_default_stksize=0x8000
set rpcmod:svc_default_stksize=0x8000
CPI WARNING V-9-40-4923 To avoid a potential reboot after installation, you should modify the /etc/system file on solaris11 with the appropriate values, and reboot prior to package installation. Appropriate /etc/system file entries are shown below:
set lwp_default_stksize=0x8000
set rpcmod:svc_default_stksize=0x8000
installer log files and summary file are saved at: /opt/VRTS/install/logs/installer-201301250058fNC
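A hedged sketch of how to apply the change the installer is asking for, run on each node named in the warning before retrying the installation; back up /etc/system first and check that the entries are not already present:

    # Append the recommended stack-size settings to /etc/system, then reboot the node
    cp /etc/system /etc/system.pre-vcs
    echo "set lwp_default_stksize=0x8000" >> /etc/system
    echo "set rpcmod:svc_default_stksize=0x8000" >> /etc/system
    init 6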
SFHA Solutions 6.0.1: About GAB seeding and its role in VCS and other SFHA products
Group Membership and Atomic Broadcast (GAB) is a kernel component of Veritas Cluster Server (VCS) that provides globally-ordered messages that keep nodes synchronized. GAB maintains the cluster state information and the correct membership on the cluster. However, GAB needs another kernel component, Low Latency Transport (LLT), to send messages to the nodes and keep the cluster nodes connected.
How do GAB and LLT function together in a VCS cluster?
VCS uses GAB and LLT to share data among nodes over private networks. LLT is the transport protocol responsible for fast kernel-to-kernel communications. GAB carries the state of the cluster and the cluster configuration to all the nodes in the cluster. These components provide the performance and reliability that VCS requires. In a cluster, nodes must share groups, resources, and resource states. LLT and GAB help the nodes communicate. For information on LLT, GAB, and private networks, see:
- About LLT and GAB
- About network channels for heartbeating
GAB seeding
The GAB seeding function ensures that a new cluster starts with an accurate membership count of the number of nodes in the cluster. It prevents your cluster from a preexisting network partition upon initial start-up. A preexisting network partition refers to a failure in the communication channels that occurs while the nodes are down and VCS cannot respond. When the nodes start, GAB seeding reduces the vulnerability to network partitioning, regardless of the cause of the failure. GAB services are used by all Veritas Storage Foundation and High Availability (SFHA) products. For information about preexisting network partitions, and how seeding functions in VCS, see:
- About preexisting network partitions
- About VCS seeding
Enabling automatic seeding of GAB
If I/O fencing is configured in the enabled mode, you can edit the /etc/vxfenmode file to enable automatic seeding of GAB. If the cluster is stuck with a preexisting split-brain condition, I/O fencing allows automatic seeding of GAB. You can set the minimum number of nodes required to form a cluster for GAB to seed by configuring the control port seed and the Quorum flag parameters in the /etc/gabtab file. Quorum is the number of nodes that need to join a cluster for GAB to complete seeding.
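To make the two tunables mentioned above concrete, this is roughly what they look like on disk; the node count and timeout value are illustrative and should match your own cluster:

    # /etc/gabtab -- seed the cluster once at least 2 nodes have joined (the -n value is the quorum)
    /sbin/gabconfig -c -n2

    # /etc/vxfenmode (excerpt) -- let I/O fencing auto-seed GAB if some nodes are down at startup
    autoseed_gab_timeout=15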
For information on configuring the autoseed_gab_timeout parameter in the /etc/vxfenmode file, see:
- About I/O fencing configuration files
For information on configuring the control port seed and the Quorum flag parameters in GAB, see:
- About GAB run-time or dynamic tunable parameters
For information on split-brain conditions, see:
- About the Steward process: Split-brain in two-cluster global clusters
- How I/O fencing works in different event scenarios
- Example of a preexisting network partition (split-brain)
Role of GAB seeding in cluster membership
For information on how the nodes gain cluster membership, seeding a cluster, and manual seeding of a cluster, see:
- About cluster membership
- Initial joining of systems to cluster membership
- Seeding a new cluster
- Seeding a cluster using the GAB auto-seed parameter through I/O fencing
- Manual seeding of a cluster
Troubleshooting issues that are related to GAB seeding and preexisting network partitions
For information on the issues that you may encounter when GAB seeds a cluster and preexisting network partitions, see:
- Examining GAB seed membership
- Manual GAB membership seeding
- Waiting for cluster membership after VCS start-up
- Summary of best practices for cluster communications
- System panics to prevent potential data corruption
- Fencing startup reports preexisting split-brain
- Clearing preexisting split-brain condition
- Recovering from a preexisting network partition (split-brain)
- Example Scenario I – Recovering from a preexisting network partition
- Example Scenario II – Recovering from a preexisting network partition
- Example Scenario III – Recovering from a preexisting network partition
gabconfig (1M) 6.0.1 manual pages: AIX, Solaris
For more information on seeding clusters to prevent preexisting network partitions, see:
- Veritas Cluster Server Administrator's Guide
- Veritas Cluster Server Installation Guide
Veritas Cluster Server documentation for other releases and platforms can be found on the SORT website.
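As a quick companion to the troubleshooting topics above, the membership can be examined and, with care, seeded by hand; only seed manually when you are certain the unreachable nodes are really down, otherwise you risk the split-brain scenarios described in the documents linked above:

    # Examine current GAB port membership; "Port a" listing all node IDs means GAB has seeded
    gabconfig -a

    # Manually seed the membership on the nodes that are up (only after verifying the other nodes are down)
    gabconfig -x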