Solved: Cluster configuration - RDC or GCO?

cmoreno1978 · 02-07-2011

Hi all, and thanks again for your help.

In other discussions on the forum I asked about the possibility of setting up a single-node cluster with the GCO option. As the analysis progressed, we realized that an RDC configuration might also be possible. I will try to summarize our current setup below, and after that maybe you can help.

The customer informed us that the ISP will expand the bandwidth to 3 Mbps, and the distance between the sites is about 110 km. The scheme will be "easier" because both sites reach the Internet through the same component, so we can move the same service IP between the sites. The key factors for this implementation are therefore latency, bandwidth, and the application's write behavior.

I ran a ping to get an idea of the latency between the sites. I understand that other factors (replication, networking components, etc.) will add further latency.

bash-3.00# ping -s XXX.XXX.99.161 1472
PING XXX.XXX.99.161: 1472 data bytes
1480 bytes from XXX.XXX.99.161: icmp_seq=0. time=28.9 ms
1480 bytes from XXX.XXX.99.161: icmp_seq=1. time=36.2 ms
1480 bytes from XXX.XXX.99.161: icmp_seq=2. time=41.1 ms
1480 bytes from XXX.XXX.99.161: icmp_seq=3. time=21.7 ms
1480 bytes from XXX.XXX.99.161: icmp_seq=4. time=34.6 ms
1480 bytes from XXX.XXX.99.161: icmp_seq=5. time=20.2 ms
1480 bytes from XXX.XXX.99.161: icmp_seq=6. time=25.6 ms
1480 bytes from XXX.XXX.99.161: icmp_seq=7. time=24.1 ms
1480 bytes from XXX.XXX.99.161: icmp_seq=8. time=44.4 ms
1480 bytes from XXX.XXX.99.161: icmp_seq=9. time=44.8 ms
1480 bytes from XXX.XXX.99.161: icmp_seq=10. time=47.6 ms
1480 bytes from XXX.XXX.99.161: icmp_seq=11. time=27.9 ms
1480 bytes from XXX.XXX.99.161: icmp_seq=12. time=45.3 ms
1480 bytes from XXX.XXX.99.161: icmp_seq=13. time=34.1 ms
1480 bytes from XXX.XXX.99.161: icmp_seq=14. time=17.5 ms
1480 bytes from XXX.XXX.99.161: icmp_seq=15. time=24.7 ms
1480 bytes from XXX.XXX.99.161: icmp_seq=16. time=17.5 ms
1480 bytes from XXX.XXX.99.161: icmp_seq=17. time=24.5 ms
1480 bytes from XXX.XXX.99.161: icmp_seq=18. time=18.3 ms
1480 bytes from XXX.XXX.99.161: icmp_seq=19. time=36.1 ms
1480 bytes from XXX.XXX.99.161: icmp_seq=20. time=18.4 ms
1480 bytes from XXX.XXX.99.161: icmp_seq=21. time=20.5 ms
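To turn the 22 samples above into numbers that can be compared against a heartbeat or replication budget, a small Python sketch like the following could be used. It only summarizes the RTTs shown in the output; using the population standard deviation as a rough jitter figure is my own assumption, and a short ping run on an idle link is no substitute for measuring under replication load.

```python
import statistics

# RTT samples in milliseconds, copied from the ping output above (icmp_seq 0..21).
rtts_ms = [
    28.9, 36.2, 41.1, 21.7, 34.6, 20.2, 25.6, 24.1, 44.4, 44.8, 47.6,
    27.9, 45.3, 34.1, 17.5, 24.7, 17.5, 24.5, 18.3, 36.1, 18.4, 20.5,
]


def summarize(samples):
    """Return min, mean, max and population standard deviation (rough jitter proxy)."""
    return {
        "min": min(samples),
        "mean": statistics.mean(samples),
        "max": max(samples),
        "stdev": statistics.pstdev(samples),
    }


if __name__ == "__main__":
    s = summarize(rtts_ms)
    print(
        f"n={len(rtts_ms)} min={s['min']:.1f} ms mean={s['mean']:.1f} ms "
        f"max={s['max']:.1f} ms stdev={s['stdev']:.1f} ms"
    )
```

For these samples the mean is about 29.7 ms, with a range of 17.5 to 47.6 ms, so the spread is roughly as large as the average itself.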

The ping probes and replies are carried by ICMP, which is itself carried within an IP packet. The IP header adds 20 bytes and the ICMP header adds 8 bytes, for a total of 28 bytes of overhead within a maximum packet size (the standard Ethernet MTU) of 1500 bytes. This leaves 1500 - 28 = 1472 bytes as the largest ping payload that can be sent without fragmentation.
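The same arithmetic also explains why each reply line reports 1480 bytes: ping prints the ICMP portion (8-byte header plus the 1472-byte payload) and leaves out the IP header. A minimal sketch, assuming IPv4 headers without options and a 1500-byte MTU:

```python
# Header sizes in bytes: IPv4 without options, and an ICMP echo header.
IPV4_HEADER = 20
ICMP_HEADER = 8


def max_unfragmented_payload(mtu: int = 1500) -> int:
    """Largest ICMP echo payload that fits in a single IPv4 packet of the given MTU."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def reported_reply_size(payload: int) -> int:
    """Size shown on each reply line: ICMP header plus payload (IP header excluded)."""
    return payload + ICMP_HEADER


if __name__ == "__main__":
    payload = max_unfragmented_payload()
    print(f"payload={payload} bytes, reply shown as {reported_reply_size(payload)} bytes")
```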

We plan to replicate about 40 GB of data initially, and the deltas do not exceed 5 GB. In fact, the average write rate (kw/s) does not exceed 95.90 KB per second. We will have two NICs for heartbeats, one NIC for replication, and one NIC for the application service.
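A quick back-of-the-envelope check of those figures against the planned 3 Mbps link can be done in a few lines of Python. This is a sketch under simplifying assumptions: decimal units (1 Mbps = 1,000,000 bit/s, 1 KB = 1000 bytes), the whole link available to replication, and no protocol overhead. Real VVR throughput will be lower, and short write bursts above the link rate would have to be absorbed by the replication log rather than by the link.

```python
LINK_MBPS = 3.0          # planned WAN bandwidth from the post
AVG_WRITE_KBPS = 95.90   # average kw/s (kilobytes per second) from the post
INITIAL_SYNC_GB = 40     # initial data set from the post
DELTA_GB = 5             # largest delta mentioned in the post


def link_bytes_per_sec(mbps: float) -> float:
    # Assumption: decimal units, 1 Mbps = 1,000,000 bits per second.
    return mbps * 1_000_000 / 8


def transfer_hours(gigabytes: float, mbps: float) -> float:
    # Ideal time to push the data over the link: no overhead, no competing traffic.
    return gigabytes * 1_000_000_000 / link_bytes_per_sec(mbps) / 3600


def utilization(write_kbytes_per_sec: float, mbps: float) -> float:
    # Fraction of the link consumed by a steady write stream.
    return write_kbytes_per_sec * 1000 / link_bytes_per_sec(mbps)


if __name__ == "__main__":
    print(f"Link capacity: {link_bytes_per_sec(LINK_MBPS) / 1000:.0f} KB/s")
    print(f"Initial sync of {INITIAL_SYNC_GB} GB: ~{transfer_hours(INITIAL_SYNC_GB, LINK_MBPS):.1f} h")
    print(f"Delta of {DELTA_GB} GB: ~{transfer_hours(DELTA_GB, LINK_MBPS):.1f} h")
    print(f"Average write load uses ~{utilization(AVG_WRITE_KBPS, LINK_MBPS):.0%} of the link")
```

Under these assumptions the initial 40 GB synchronization takes roughly 30 hours at full link speed, a 5 GB delta a little under 4 hours, and the average write rate uses about a quarter of the link.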

People here on the forum have helped me understand the possibilities that SF + VVR provides. With the basic information above, and considering that the replication will be asynchronous, can we deploy RDC in this scenario?

Wally_Heim · 02-07-2011

Hi cmoreno1978,

It looks like you can deploy either an RDC or a GCO configuration if you want, although it is recommended to have multiple heartbeat links when configuring an RDC.

The main difference you will need to keep in mind is the way the two of them fail over between the sites.

An RDC is configured as a single cluster and will fail the service group over between the two sites automatically. With this configuration and the default settings, the remote site will always try to bring up the service group if communication with the primary site is lost. The default settings of the RVGPrimary resource will perform a takeover in this case, which may cause data loss if the replication link is not up to date.

GCO, by default, does not fail over between sites automatically. Site failover requires administrative intervention, so you will be aware of it before it happens. As a result, the takeover is only done if you give GCO the authority to move the group while the link is down.
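The difference between the two defaults can be summarized as a small decision model. This is only an illustration of the behavior described above, not how VCS or the RVGPrimary agent is implemented; the class and function names are hypothetical, and the `auto_failover` flag stands in for whichever setting makes cross-site failover automatic.

```python
from dataclasses import dataclass


@dataclass
class SiteFailoverConfig:
    topology: str                 # "RDC" or "GCO"
    auto_failover: bool           # hypothetical flag: cross-site failover without an admin
    replication_up_to_date: bool  # is the secondary fully caught up?


def default_config(topology: str, replication_up_to_date: bool) -> SiteFailoverConfig:
    # Defaults as described above: RDC fails over automatically, GCO waits for an admin.
    if topology not in ("RDC", "GCO"):
        raise ValueError(f"unknown topology: {topology}")
    return SiteFailoverConfig(topology, topology == "RDC", replication_up_to_date)


def remote_site_action(cfg: SiteFailoverConfig, admin_confirmed: bool = False) -> str:
    """What the surviving site does after losing contact with the primary site."""
    if not (cfg.auto_failover or admin_confirmed):
        return "wait for administrator"
    if cfg.replication_up_to_date:
        return "take over"
    # Taking over while the secondary is behind promotes stale data.
    return "take over (possible data loss)"


if __name__ == "__main__":
    for topo in ("RDC", "GCO"):
        cfg = default_config(topo, replication_up_to_date=False)
        print(topo, "->", remote_site_action(cfg))
```

The point of the model is that the data-loss risk comes from combining automatic takeover with a lagging secondary; GCO's default simply inserts a human decision before that takeover.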

The other item you need to be aware of with an RDC configuration is TCP/IP related. You have mentioned that you are running 4 NICs and 1 WAN link, but not how you are going to set up IP on the 4 NICs. If all 4 NICs are on the same IP subnet, Windows will only use 1 NIC for outbound traffic, based on the order in which they appear in the routing table. This is a function of how Microsoft implemented its TCP/IP stack, and it cannot be changed by SFW-HA. Multiple IP subnets can be configured if desired.
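When planning the IP layout, one way to spot NICs that would end up in the same subnet is a quick check with Python's standard `ipaddress` module. The NIC names and addresses below are hypothetical placeholders taken from documentation ranges, not the poster's layout; the check only flags shared subnets and does not model how Windows actually orders its routes.

```python
import ipaddress
from collections import defaultdict

# Hypothetical NIC layout: names and addresses are placeholders, not from the post.
nics = {
    "heartbeat1": "192.0.2.11/24",
    "heartbeat2": "198.51.100.11/24",
    "replication": "203.0.113.11/24",
    "public": "192.0.2.21/24",
}


def nics_sharing_subnet(nic_map):
    """Return {subnet: [nic names]} for every subnet used by more than one NIC."""
    by_net = defaultdict(list)
    for name, cidr in nic_map.items():
        net = ipaddress.ip_interface(cidr).network
        by_net[net].append(name)
    return {str(net): sorted(names) for net, names in by_net.items() if len(names) > 1}


if __name__ == "__main__":
    for subnet, names in nics_sharing_subnet(nics).items():
        print(f"{subnet} is shared by: {', '.join(names)}")
```

In this placeholder layout, `heartbeat1` and `public` share 192.0.2.0/24, so only one of them would be used for outbound traffic on that subnet; moving one to its own subnet removes the overlap.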

Thanks,

Wally

mikebounds · 02-07-2011

As I said in my previous post, even though the default for automatic/manual failover differs between the two, both can be configured either way, but GCO will take a lot longer to alert you that a site is down.

Historically, VCS could not be configured for manual failover between sites, so async VVR was not supported (although I could never find an official statement of this in the product guides). Now (from 5.1), VCS can be configured for manual failover between sites, so async VVR should be supported (but you may want to clarify this with Symantec). Note that you can change gabtimeout for heartbeats if the latency between the sites is too high, so using RDC should be OK from a technical point of view.

Some other differences I didn't mention in my previous post:

For GCO, you have to manually keep two main.cf files in sync, whereas for an RDC there is only one main.cf.
You need fewer NICs for GCO, as you don't need private heartbeats. Note that for RDC you could configure 1 private heartbeat, use the VVR network as a low-priority heartbeat, and have a dual network for public. Also note that you probably shouldn't configure more than 2 heartbeats unless you have more than 2 independent links between the 2 sites.

If you have not configured GCO/RDC before, I would recommend getting a consultant to help you with the design and implementation. Symantec should be able to put you in touch with consultants in your area.

Mike

RiaanBadenhorst · 02-09-2011

Hi,

If you want your failover to be automatic and you use GCO, you could consider implementing a Steward somewhere (maybe at the ISP), depending on the network configuration.

When all communication links between any two clusters are lost, each cluster contacts the Steward with an inquiry message. The Steward sends an ICMP ping to the cluster in question and responds with a negative inquiry if the cluster is running or with a positive inquiry if the cluster is down. The Steward can also be used in configurations with more than two clusters.

A Steward is effective only if there are independent paths from each cluster to the host that runs the Steward. If there is only one path between the two clusters, you must prevent split-brain by confirming manually via telephone or some messaging system with administrators at the remote site if a failure has occurred. By default, VCS global clusters fail over an application across cluster boundaries with administrator confirmation. You can configure automatic failover by setting the ClusterFailOverPolicy attribute to Auto.
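The Steward logic described in the two paragraphs above can be sketched as a small decision function. This is a simplified model, not the Steward's real protocol: the function names are hypothetical, "Manual" stands in for the default behavior that requires administrator confirmation, only the "Manual" and "Auto" policies are modeled, and `peer_answers_ping` assumes the Steward reaches the peer over a path that is independent of the failed inter-cluster links.

```python
def steward_reply(peer_answers_ping: bool) -> str:
    # As described above: "negative" if the peer cluster is running, "positive" if it is down.
    return "negative" if peer_answers_ping else "positive"


def on_all_links_lost(policy: str, peer_answers_ping: bool) -> str:
    """Decision of one cluster after losing every link to its peer (simplified)."""
    if policy == "Manual":
        # Default behavior: an administrator confirms the remote failure first.
        return "wait for administrator confirmation"
    if policy == "Auto":
        # Only fail over when the Steward confirms the peer is really down.
        if steward_reply(peer_answers_ping) == "positive":
            return "fail over"
        return "do not fail over"
    raise ValueError(f"policy not modeled in this sketch: {policy}")


if __name__ == "__main__":
    for policy in ("Manual", "Auto"):
        for peer_up in (True, False):
            print(policy, "peer up" if peer_up else "peer down", "->",
                  on_all_links_lost(policy, peer_up))
```

The model also shows why a single shared path defeats the Steward: if the Steward's ping travels over the same failed link, a running peer looks "down" and an Auto policy would fail over into a split-brain.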

mikebounds · 02-09-2011

I would not configure automatic failover in GCO or RDC if you are in async mode and you only have one node at each site (as per my previous post), because if a node reboots, the other node will take over, probably with old data.
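How "old" that data could be depends on the replication backlog at the moment of failure. A minimal queue model makes this concrete: each second the backlog grows by the data written and shrinks by what the link can send. The 3 Mbps link and the 95.90 KB/s average write rate come from the original post; the 5-second burst of 1000 KB/s is a hypothetical value chosen only for illustration, and decimal units and zero protocol overhead are assumed.

```python
# 3 Mbps expressed in decimal kilobytes per second (assumption: no protocol overhead).
LINK_KB_PER_SEC = 3_000_000 / 8 / 1000  # = 375.0


def backlog_series(writes_kb_per_sec, link_kb_per_sec=LINK_KB_PER_SEC):
    """Unreplicated data (KB) at the end of each second for the given write rates."""
    backlog = 0.0
    series = []
    for written in writes_kb_per_sec:
        # New writes join the queue; the link drains up to its capacity.
        backlog = max(0.0, backlog + written - link_kb_per_sec)
        series.append(backlog)
    return series


if __name__ == "__main__":
    # Average load from the post, then a hypothetical 5 s burst, then average load again.
    writes = [95.9] * 10 + [1000.0] * 5 + [95.9] * 10
    for second, kb in enumerate(backlog_series(writes), start=1):
        print(f"t={second:2d}s backlog={kb:8.1f} KB")
```

In this example the backlog peaks at 3125 KB right after the burst and is still about 334 KB ten seconds later. If the primary node rebooted at that moment, an automatic takeover would start from data missing those writes.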

Mike

Zahid_Haseeb · 03-08-2011

First of all, you must know where Symantec offers RDC and where it offers GCO. I think RDC is what is called campus clustering, which is good for a scenario where the secondary site is nearby, whereas GCO is designed for big disasters like earthquakes.

Note: In an RDC scenario the heartbeats run over LLT. Of course, routers will be involved in your wide-area scenario. Can the routers handle the LLT packets if you create the RDC environment?

Wally_Heim · 03-09-2011

Hi Zahid,

LLT is a non-routable protocol. You would have to use LLT over UDP or configure the routers as bridged (not recommended) for an RDC configuration to work with routers.

GCO sends TCP/IP (ICMP) heartbeats, which are routable.

Thanks,

Wally

joseph_dangelo · 03-21-2011

One more thing to consider here is that an RDC configuration can be achieved without using the HA/DR version of Storage Foundation. When you apply a VVR key, it enables the agents in VCS to support RDCs but not GCO. Therefore, if you do choose the GCO route, you will want to verify that you have the correct license. Hope this helps.