VCS Heartbeat over Fibre Optic Cables
Hello,
I have a two-node Veritas Cluster running on Solaris 10 that has been having problems with split brain, which results in corruption of my Oracle databases. The heartbeat connections are configured through switches, and I want to change that to a direct connection between the two cluster members. I realize that, ideally, the heartbeat interfaces on each node (2 per node, different NICs, different buses) would use Ethernet crossover cables as the best practice. Unfortunately, I do not have those ports available unless I procure another 2- or 4-port NIC and do some network reconfiguration.
I do, however, have two independent fiber interfaces available on both cluster members, and I am wondering whether I can use fiber optic cables for the heartbeat network. If I can, what should the llttab.conf file look like?
I'm pretty new to this stuff and by no means a SME on Veritas Clustering. So anything folks can offer up to assist here would really be appreciated.
Thanks - Tom E.
Tom,
There are a number of fail-safes and preventative tools embedded in Storage Foundation HA/VCS to mitigate split brain. My first suggestion would be to avoid using crossover cables, as that precludes you from growing the cluster beyond 2 nodes. The network medium used (fiber LC vs. copper RJ45) is irrelevant as long as it supports Ethernet, although I must say I have never configured an LC fiber crossover connection before. That being said, I am more curious as to why you are experiencing split brain as often as it sounds.
With your "High Priority" LLT links configured, you should be able to sustain the loss of a single connection without creating a split brain scenario. Is it possible that you have a single point of failure somewhere along the heartbeat network path, such as both links passing through the same switch?
You will probably want to add what is called a "Low Priority" LLT link as well. This link runs over the production interface and sends LLT heartbeat packets at a much lower rate so as not to interfere with data throughput.
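For what it's worth, the LLT configuration normally lives in /etc/llttab on each node. Below is a minimal sketch of what it might look like for one node, with two high-priority links on the fiber ports and one low-priority link on the public interface. The node name, cluster ID, and device names (ce0, ce1, bge0) are placeholders I made up for illustration; substitute whatever your fiber and public NICs actually show up as on your systems.

```
# /etc/llttab on the first node (all names and IDs below are placeholders)
set-node node1
set-cluster 100
# two high-priority heartbeat links on the dedicated fiber ports
link ce0 /dev/ce:0 - ether - -
link ce1 /dev/ce:1 - ether - -
# one low-priority link over the public/production interface
link-lowpri bge0 /dev/bge:0 - ether - -
```

The second node would carry the same link lines with its own set-node value. The set-node name has to match an entry in /etc/llthosts, and the cluster ID has to be identical on both nodes and should not be shared with any other cluster on the same network, which matters more once a link runs over the public interface.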
Your best option, however, is to configure I/O fencing as a means to prevent data corruption altogether. I/O fencing uses disk-based arbitration in the event of a split brain to ensure that only one node has access to the data volumes. You will want to consult the admin guides for SFHA to better acquaint yourself with I/O fencing.
Starting on page 303.
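As a rough idea of what is involved, disk-based fencing is typically driven by a few small files. This is only a sketch, assuming SCSI-3 capable shared storage and a coordinator disk group (normally an odd number of small disks, usually three). The disk group name vxfencoorddg and the cluster name mycluster are example names only, and the lines starting with # are my annotations to separate the three files, not something to paste in as-is. Follow the guide for the actual setup and testing steps.

```
# --- /etc/vxfenmode: SCSI-3 disk-based fencing using DMP devices ---
vxfen_mode=scsi3
scsi3_disk_policy=dmp

# --- /etc/vxfendg: a single line naming the coordinator disk group ---
vxfencoorddg

# --- main.cf, cluster definition: tell VCS to use fencing ---
cluster mycluster (
        UseFence = SCSI3
        )
```

With this in place, a node that loses the race for the coordinator disks during a split brain is ejected from the data disks, so it cannot write to the Oracle volumes even if it still believes it owns them.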
Hope this helps,
Joe D