Business interrupted
Author’s Note: It’s come to our attention that while this is primarily a technical blog geared toward current users of Veritas Cluster Server (VCS), some of our readers have little background in clustering for availability. With that in mind, we are beginning a series of posts that we hope will provide some foundational knowledge for those aspiring to HA Gurudom.

Nod if any of this sounds familiar …

Your company demands IT services that pretty much never go down. After all, business availability and IT service availability are inextricably linked today.

But things happen. Like hardware failures. Power outages. Upgrades. Patches. Shrinking budgets. And other natural and man-made disasters. In short, your basic Career Limiting Event™: the moment the revenue-generating or customer-facing application goes down.

Given such an environment, is it even possible to achieve application “dial tone,” the kind of always-there availability people expect when they pick up a phone?

Sure. It starts by clustering.

Clustering solutions have been around for quite some time now, but IT administrators who are new to the technology can likely benefit from an overview of what clustering is and why it was created.

Basically, a cluster is a group of computers that work together to run a set of applications and provide the appearance of a single system to the client and application. While cables connect the computers together physically, cluster software connects them from an application perspective. Within the cluster, the servers are in constant communication with one another.

The objective of clustering is to protect against server, application, and database downtime by eliminating the single point of failure that a standalone server represents. If a server in the cluster, or a resource running on one of those servers, becomes unavailable because of failure or maintenance, another server begins providing that service automatically, without user or administrator intervention.
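To make the failover idea concrete, here is a minimal Python sketch of a toy cluster. It is an illustration only, not how VCS is implemented: the `Node` and `ToyCluster` classes, the node and service names, and the "least-loaded survivor" placement policy are all assumptions invented for this example. A real cluster also has to detect failures over its interconnect and start and stop real applications, which this sketch skips.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical cluster member (not a VCS object)."""
    name: str
    healthy: bool = True
    services: set = field(default_factory=set)


class ToyCluster:
    """Toy model: each service runs on exactly one node at a time."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def owner_of(self, service):
        """Return the node currently running the service, or None."""
        for node in self.nodes:
            if service in node.services:
                return node
        return None

    def check_and_fail_over(self):
        """One monitoring pass: move services off any node that is down.

        Returns a list of (service, from_node, to_node) moves.
        """
        moves = []
        for node in self.nodes:
            if node.healthy or not node.services:
                continue
            survivors = [n for n in self.nodes if n.healthy]
            if not survivors:
                raise RuntimeError("no healthy node left to take over")
            for service in sorted(node.services):
                # Assumed policy: pick the survivor running the fewest services.
                target = min(survivors, key=lambda n: len(n.services))
                target.services.add(service)
                moves.append((service, node.name, target.name))
            node.services.clear()
        return moves


cluster = ToyCluster([
    Node("node-a", services={"web", "db"}),
    Node("node-b"),
])
cluster.nodes[0].healthy = False        # simulate a hardware failure
moves = cluster.check_and_fail_over()   # no administrator involved
print(moves)
```

After the simulated failure, both services are owned by `node-b`, and nothing outside the cluster had to ask for the move. That is the essence of automatic failover, stripped of everything that makes it hard in practice.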

The beauty of clustering is that users can keep using the service, often without realizing that it is now being provided from a different physical server. After a brief interruption, the user simply reconnects to the service (for example, by reloading a web page) and is off and running again.
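From the client’s side, "simply reconnecting" amounts to retrying until the service answers again. The Python sketch below simulates that under stated assumptions: `FlakyService` is a hypothetical stand-in for a service that is unavailable for a couple of requests while it fails over, and `call_with_reconnect` is an illustrative helper, not part of any real client library.

```python
import time


def call_with_reconnect(request, attempts=5, delay_seconds=0.01):
    """Call `request` until it succeeds or the attempts run out.

    Returns a (result, attempts_used) tuple.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return request(), attempt
        except ConnectionError as error:
            last_error = error
            time.sleep(delay_seconds)  # brief pause before reconnecting
    raise last_error


class FlakyService:
    """Hypothetical service that is down for its first `outage` requests."""

    def __init__(self, outage):
        self.remaining_outage = outage

    def fetch_page(self):
        if self.remaining_outage > 0:
            self.remaining_outage -= 1
            raise ConnectionError("service is failing over")
        return "page content"


service = FlakyService(outage=2)
page, attempts_used = call_with_reconnect(service.fetch_page)
print(page, attempts_used)
```

The client never learns which server answered. It only sees two failed requests followed by a successful one, which is the "brief interruption" described above.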

The clustering technologies of yesterday differ significantly from today’s robust, flexible solutions. Better yet, when used in combination with data replication and other technologies, these solutions go a long way toward keeping an IT environment up and running, even in the event of a disaster.

Check back as we take a deeper dive into high availability, clustering, and more in subsequent blog posts.