VCS Heartbeat over Fibre Optic Cables

tme
Level 2

Hello,

I have a 2 node Veritas Cluster running on Solaris 10 that has been having problems with split-brain that results in corruption of my Oracle databases.  The heartbeat connections are configured through switches and I want to change that to be direct connect between the two cluster members.  I realize that ideally, I would have the network interfaces on each node (2 per node, different NICs, different busses) to use ethernet cross-over cables as the best practice.  Unfortunately, I do not have those ports available unless I procure another 2 or 4 port NIC and do some network reconfiguration.

I do, however, have two independent Fiber interfaces available on both cluster members, and I am wondering if I can use fiber optic cables for the heartbeat network. And if I can, what should the LLT configuration file (/etc/llttab) look like?

I'm pretty new to this stuff and by no means a SME on Veritas Clustering. So anything folks can offer up to assist here would really be appreciated.

Thanks - Tom E.

1 ACCEPTED SOLUTION

joseph_dangelo
Level 6
Employee Accredited

Tom,

There are a number of fail-safes and preventative tools embedded in Storage Foundation HA/VCS to mitigate split brain.  My first suggestion would be to avoid using cross-over cables, as doing so precludes you from growing the cluster beyond 2 nodes. The network medium used (Fiber LC vs. Copper RJ45) is irrelevant as long as it supports Ethernet, although I must say I have never configured an LC Fiber crossover connection before.  That being said, I am more curious as to why you are experiencing split brain as often as it sounds.

With your "High Priority" LLT links configured, you should be able to sustain the loss of a single connection without creating a split brain scenario.  Is it possible that you have a single point of failure somewhere along the heartbeat network path?

You will probably want to add what is called a "Low Priority" LLT link as well.  This link runs over the production interface and sends LLT packets at a much lower frequency so as not to interfere with data throughput.
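As a rough illustration of the two high-priority links plus one low-priority link described above, here is a minimal Python sketch that renders the text of an /etc/llttab file for one node. The node name, cluster ID, and interface names (`nxge0`, `nxge1`, `e1000g0`) are placeholders, not taken from this thread, and the `/dev/<driver>:<instance>` form assumes the usual Solaris device naming. Check the result against the VCS installation guide for your release before using it.

```python
def render_llttab(node, cluster_id, hi_pri, low_pri=()):
    """Return /etc/llttab text for one node (Solaris-style device paths).

    All names passed in are placeholders chosen for illustration.
    """
    def dev(iface):
        # Split "nxge0" into driver "nxge" and instance "0" -> "/dev/nxge:0"
        driver = iface.rstrip("0123456789")
        instance = iface[len(driver):]
        return f"/dev/{driver}:{instance}"

    lines = [f"set-node {node}", f"set-cluster {cluster_id}"]
    # Dedicated heartbeat links (high priority)
    lines += [f"link {i} {dev(i)} - ether - -" for i in hi_pri]
    # Heartbeat over the production interface (low priority)
    lines += [f"link-lowpri {i} {dev(i)} - ether - -" for i in low_pri]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_llttab("node1", 2, ["nxge0", "nxge1"], ["e1000g0"]))
```

The second node would get the same file with only the `set-node` value changed, assuming its interfaces carry the same names.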

Your best option, however, is to configure I/O fencing as a means to prevent data corruption altogether. I/O fencing uses disk-based arbitration in the event of a split brain to ensure that only one node has access to the data volumes.  You will want to consult the admin guides for SFHA to better acquaint yourself with I/O fencing.

https://sort.symantec.com/public/documents/sfha/5.1sp1/solaris/productguides/pdf/vcs_admin_51sp1_sol...

Starting on page 303.
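
To make the idea of disk-based arbitration concrete, here is a toy Python model. It is not how the fencing driver is implemented, only a sketch of the principle: after the heartbeat is lost, both nodes race for an odd number of coordinator disks, and the node that wins the majority survives. The disk and node names are hypothetical.

```python
from collections import Counter


def fencing_race(first_to_reach):
    """Toy model of coordinator-disk arbitration between two nodes.

    first_to_reach maps each coordinator disk to the node whose request
    reaches that disk first. Returns the node holding a strict majority,
    or None if nobody does.
    """
    if len(first_to_reach) % 2 == 0:
        raise ValueError("use an odd number of coordinator disks")
    wins = Counter(first_to_reach.values())
    winner, count = wins.most_common(1)[0]
    # A strict majority is required to survive the race
    return winner if count > len(first_to_reach) / 2 else None


if __name__ == "__main__":
    race = {"coord1": "node1", "coord2": "node2", "coord3": "node1"}
    print("survivor:", fencing_race(race))
```

The losing node would be expected to stop touching the data disks, which is what prevents two nodes from writing to the same volumes at once.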

Hope this helps,

Joe D

7 REPLIES

Yasuhisa_Ishika
Level 6
Partner Accredited Certified

If those are Fibre Channel (FC) interfaces, it is impossible to configure native LLT on them. Native LLT can be configured on Ethernet interfaces only.

Have you considered configuring LLT over UDP on the existing NICs? Or have you been running native LLT on non-dedicated NICs until now, and need to change to dedicated NICs?

I'm not sure, but it might be possible to configure LLT over UDP on IPFC (Internet Protocol over Fibre Channel).

Reference:

Solaris SAN Configuration and Multipathing Guide
http://download.oracle.com/docs/cd/E19253-01/820-1931/820-1931.pdf

Veritas Cluster Server 5.1 Installation Guide
https://sort.symantec.com/public/documents/sf/5.1/solaris/pdf/vcs_install.pdf

Gaurav_S
Moderator
VIP Certified

I fully agree with Joe's recommendations. FC is not an option for LLT; however, I/O fencing is an excellent technology for preventing split brain and data corruption.

Have a look at the VCS User's Guide; it should help you.

G

tme
Level 2

Good point, but it is not a Fibre Channel interface; it's an Ethernet interface, just on a different medium.  It should use the same TCP/IP stack and the same Ethernet device driver.

Thanks - Tom

tme
Level 2

I really appreciate your reply.  We are temporarily going to configure the heartbeat cables using SC to SC multimode fibre on the fibre ethernet interfaces that are available.

We are going to procure additional copper ethernet cards and after our peak season (sometime in May 2012) reconfigure our network slightly so that all 5 of our clusters use copper cross-over cables for the heartbeat network.  Two of the five are currently configured this way.

We are confident that the trouble in the one problem cluster comes from a fault or single point of failure (SPOF) in the heartbeat network.  Most likely both heartbeat links are connected to the same switch, because when we did the switch OS upgrades, we had the problem in just that one cluster.  And it's always been just that one cluster.
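
For anyone auditing their own heartbeat paths for this kind of SPOF, the check can be sketched in a few lines of Python: list every component each heartbeat link passes through and look for anything common to all of them. The link and component names below are made up for illustration and would need to come from your own cabling records.

```python
def shared_components(paths):
    """Return the components that every heartbeat link passes through.

    paths maps a link name to the list of components (NICs, cables,
    switches) on that link. Anything in the result is a single point
    of failure for the heartbeat network.
    """
    sets = [set(components) for components in paths.values()]
    return set.intersection(*sets) if sets else set()


if __name__ == "__main__":
    # Hypothetical layout: both links pass through the same switch
    heartbeat = {
        "link1": ["nodeA-nic1", "switch1", "nodeB-nic1"],
        "link2": ["nodeA-nic2", "switch1", "nodeB-nic2"],
    }
    print("shared:", shared_components(heartbeat))
```

An empty result means the links share no component in the inventory, which is only as reliable as the inventory itself.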

Our project timeline and capacity planning indicate that our needs won't require additional cluster members.

We are also planning on implementing I/O Fencing at our earliest convenience.

Thanks again for your reply - Tom

joseph_dangelo
Level 6
Employee Accredited

Tom,

You are very welcome.  Let us know if you need any other assistance.  If you end up configuring multiple clusters with I/O fencing and find it cumbersome to provision so many coordinator disks/LUNs, you can alternatively implement network-based fencing with Coordination Point Server.

Joe D 

Noel_Victor
Not applicable

Hi Tom,

Have you tried FCoE (Fibre Channel over Ethernet)? It should let you transport FC frames over an Ethernet medium. You may want to use CNAs (converged network adapters) to fix this issue.

Noel Victor