The Evolution of Clustering

H__Shannon · ‎07-15-2008

Many of today’s clustering solutions were devised in the early to mid-1990s and were predicated on the SCSI technology of the day. In the SCSI world, you couldn’t have more than two servers connected to a single storage path. With an active-passive configuration, however, you could connect two dual-channel SCSI hosts to the same external storage so that when one server failed, the standby server would take over control of the storage. From this scenario came the paradigm of two-node active-passive, or sometimes active-active, clustering.

When storage moved to SAN technology, which allows even hundreds of hosts to be connected to the same storage, two-node active-passive clustering could not keep up. After all, who can afford to have half of the servers in a cluster active and the other half passive (read: idle) just in case?

Enter N-to-1 clustering. N represents the number of active servers and 1 represents the standby or spare node. Sometimes referred to as dedicated spare clustering, this makes it possible to have just one passive, standby server in the cluster, dedicated to acting as a spare for each of the active servers in the same cluster.
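The server-count argument can be made concrete with a little arithmetic. The sketch below is only an illustration; the function name idle_fraction is made up for this example and the node counts are arbitrary.

```python
def idle_fraction(active: int, spares: int) -> float:
    """Share of servers in the cluster that sit idle as standbys."""
    return spares / (active + spares)


# Two-node active-passive: one active server, one passive server.
two_node = idle_fraction(active=1, spares=1)   # 0.5

# N-to-1 with seven active servers sharing one dedicated spare.
n_to_one = idle_fraction(active=7, spares=1)   # 0.125
```

With seven active servers, one server in eight is idle rather than one in two.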

The trouble with the N-to-1 configuration, however, is that the dedicated spare (the passive server) has to return to its original state so that it is available for any subsequent failover from the other servers. In the N-to-1 world, once a failed server is brought back online, its workload must be moved back to it from the spare to which it failed over. In other words, the cluster must be restored to the way it was, with the dedicated spare always returning to the passive role it was originally configured for. For administrators, that’s more downtime and another hoop to jump through. For the application and its users, it’s a second outage just to restore the cluster to its original configuration.
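A toy model makes the double outage easy to see. This is a minimal sketch, not any product's behavior; the class NToOneCluster, its fields, and the node and workload names are all invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class NToOneCluster:
    """Toy N-to-1 cluster: N active nodes plus one dedicated spare."""
    home: Dict[str, str]                   # node name -> workload it normally hosts
    spare_workload: Optional[str] = None   # workload currently parked on the spare
    outages: List[str] = field(default_factory=list)

    def fail(self, node: str) -> None:
        """A node fails and its workload restarts on the dedicated spare."""
        if self.spare_workload is not None:
            raise RuntimeError("the spare is busy; no failover target is left")
        self.spare_workload = self.home[node]
        self.outages.append(f"failover of {self.spare_workload} from {node}")

    def repair(self, node: str) -> None:
        """The node returns, so the workload must fail back to free the spare."""
        workload = self.home[node]
        if self.spare_workload != workload:
            raise RuntimeError(f"{workload} is not running on the spare")
        self.spare_workload = None
        self.outages.append(f"failback of {workload} to {node}")


cluster = NToOneCluster(home={"node1": "db", "node2": "web"})
cluster.fail("node1")
cluster.repair("node1")
print(len(cluster.outages))  # 2
```

One hardware failure costs the application two interruptions: the failover itself and the failback that frees the spare.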

Enter N+1 clustering, or so-called “roaming spare” clustering. With this arrangement, any server in the cluster can act as the spare for any other server, and failed applications do not have to fail back to the server from which they came. No failback outage, and no extra hassle. N+1 clustering is much more cost-effective from a server perspective than the active/passive or even active/active configurations of the past, and it does not have the downsides of the N-to-1 cluster described above.
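The roaming spare idea can be sketched in the same toy style. Again, this is an assumption-laden illustration: the class RoamingSpareCluster and the node and workload names are hypothetical, and a real cluster tracks far more state.

```python
from typing import Dict, List, Optional


class RoamingSpareCluster:
    """Toy N+1 cluster: whichever node is idle acts as the spare."""

    def __init__(self, placement: Dict[str, str], spare: str) -> None:
        self.placement = dict(placement)     # node name -> workload running there
        self.spare: Optional[str] = spare    # the currently idle node, if any
        self.outages: List[str] = []

    def fail(self, node: str) -> None:
        """Move the failed node's workload onto the current spare."""
        if self.spare is None:
            raise RuntimeError("no spare is available")
        workload = self.placement.pop(node)
        self.placement[self.spare] = workload
        self.outages.append(f"failover of {workload} from {node} to {self.spare}")
        self.spare = None

    def repair(self, node: str) -> None:
        """The repaired node becomes the new spare; workloads stay where they are."""
        if self.spare is None:
            self.spare = node


cluster = RoamingSpareCluster({"node1": "db", "node2": "web"}, spare="node3")
cluster.fail("node1")
cluster.repair("node1")
print(cluster.spare)         # node1
print(len(cluster.outages))  # 1
```

The spare role simply moves to the repaired node, so the only interruption is the original failover.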

Veritas Cluster Server takes N+1 a step further and provides N-to-N clustering. A smarter way to cluster, it determines which server to fail over to based on application needs at the time of failure and the current state of resources in the cluster. This is a more granular active/active approach, and it uses spare capacity instead of dedicated spares. The result? Improved server utilization and hardware savings.
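To illustrate the idea of picking a target from spare capacity, here is a minimal sketch. It is not the Veritas Cluster Server algorithm, which this post does not spell out; the function choose_failover_target, the capacity units, and the "most free capacity wins" rule are all assumptions made for the example.

```python
from typing import Dict, Optional


def choose_failover_target(
    capacity: Dict[str, int],   # node name -> total capacity (arbitrary units)
    load: Dict[str, int],       # node name -> capacity currently in use
    failed_node: str,
    required: int,              # capacity the failed application needs
) -> Optional[str]:
    """Return the surviving node with the most free capacity that fits the application."""
    best_node: Optional[str] = None
    best_free = -1
    for node, total in capacity.items():
        if node == failed_node:
            continue
        free = total - load.get(node, 0)
        if free >= required and free > best_free:
            best_node, best_free = node, free
    return best_node


capacity = {"node1": 100, "node2": 100, "node3": 100}
load = {"node1": 60, "node2": 80, "node3": 30}

# node1 fails while running an application that needs 60 units.
print(choose_failover_target(capacity, load, "node1", required=60))  # node3
```

No node is reserved as a spare here: every server does useful work, and failover lands wherever enough headroom exists at that moment. If no surviving node has room, the function returns None.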

A subsequent post on Service Group Workload Management will discuss intelligent failover in more detail.

Message Edited by hshannon on 07-15-2008 11:01 AM
