
How to configure a VCS cluster with a roaming spare?

Hello all!

We are looking to add some capacity (2 more servers) to an existing VCS cluster. One server will be for growth; the other new server will be our new N+1 or roaming spare. How do I configure VCS to keep that N+1 node "dark" except in the event of a system failure? For "regular" service group failures, I want them to relocate within the "N" other nodes; but if any of those nodes fails, the service groups should relocate to this N+1/spare/dark node. Has anyone done this?

I've looked through the Google search results for "roaming spare" (finding only Eileen's summary article) and through the admin docs, but found only graphical representations of an N+1 cluster, or a reference to a "service group workload management policy module", which looks to me to do only variations of load-based failover decisions for service groups (not for system-level failures).

Thanks!

-jeff




Hi Jeff,

Two options I could think of

1. Control the failover behavior with the right ordering of the SystemList attribute. SystemList defines the order in which systems are picked for failover ... read below:

https://sort.symantec.com/public/documents/sfha/6.1/solaris/productguides/html/vcs_admin/ch03s03s01.htm
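For reference, a prioritized SystemList looks roughly like this in main.cf (node and group names here are made up; lower numbers are higher priority):

```
group app_sg (
    SystemList = { sys1 = 0, sys2 = 1, sys3 = 2, sys4 = 3 }
    AutoStartList = { sys1 }
    )
```

With the default Priority FailOverPolicy, VCS walks this list in order, so putting the spare last means it is chosen only when every system before it is unavailable.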

 

2. Opt for capacity management for service groups, aka service group workload management. Read about the concept here:

https://sort.symantec.com/public/documents/sfha/6.1/solaris/productguides/html/vcs_admin/ch11s06.htm

Option 2 is a more complex configuration, but it works pretty well ...
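For the record, the workload-management attributes look something like this in main.cf (all names and numbers here are illustrative only):

```
system sys1 (
    Capacity = 100
    )

group db_sg (
    SystemList = { sys1 = 0, sys2 = 1, sys3 = 2 }
    Load = 40
    FailOverPolicy = Load
    )
```

With FailOverPolicy = Load, VCS picks the running system with the highest AvailableCapacity (its Capacity minus the Load of the groups it already hosts).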

 

G



Hi Gaurav!

Thanks for writing back so quickly! I have looked into each of those options, but I'm not sure that they solve my problem.

Option 1 - use a SystemList attribute. If I set each SG's SystemList to include systems 1 through 4, for example, then they would never fail over to system 5. The problem with this solution is that I would really like the SG's to fail over to system 5 if any of the other systems fail! If I instead set the SystemLists to include system 5 as the last item in the list, then I would have to manually update each SG after a system failure. For example, if system 2 fails and all of the SG's move to system 5, then I need to edit the SystemLists to put system 2 at the end.

Option 2 - service group workload management. I did look into this, and while I still like the idea for other reasons (balancing workload more intelligently across our systems than the haphazard "method" now), I'm not sure that it really implements a dark-node solution. With manual configuration changes (like in option 1), I could set the capacity of system 5 to some low number (zero?), and as long as the other systems in the cluster had capacity, the SG's would stay away from system 5. But in the event of a system failure (say system 2), system 2's SG's would be redistributed among all of the remaining systems, presumably including system 5, if that takes their capacities into negative numbers. That just seems like a little too much manual configuration -- and updating -- of the SG load and system capacity numbers to be really workable.

Maybe I've been unintentionally misled by the literature out there; it seems to claim that VCS is built from the ground up with support for N-to-1, N-N, and N+1 configurations. I just can't yet see the configuration that would make N+1 happen.

-jeff



You should be able to achieve what you want with a combination of:

  1. System Zones
  2. System limits and group prereqs
  3. System capacity and group load

So configure your N nodes in SystemZone 0 and your N+1 node in SystemZone 1; then groups will only fail over to the N+1 node if necessary (I'll describe what I mean by "necessary" below). Then configure system Capacity and group Load so that when a group fails, VCS will choose the best system for that group within the N nodes. Now, your N nodes can only handle so much, and if you have too many failures (system or service group failures), system Capacity is allowed to go negative. That is where you need System Limits and group Prerequisites to determine how much each system can handle, either as a single metric (like Capacity and Load) or, with Limits and Prerequisites, as multiple metrics (CPU, memory, semaphores, etc.). Only when a group's Prerequisites cannot be met within the Limits of the N systems will the N+1 node be chosen.
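A minimal main.cf sketch of this combination, assuming four nodes (sys1-sys3 active, sys4 the spare) and a made-up "GroupUnits" limit of 3 groups per system:

```
system sys1 (
    Capacity = 100
    Limits = { GroupUnits = 3 }
    )
// ...repeat for sys2, sys3, sys4...

group app1 (
    SystemList = { sys1 = 0, sys2 = 1, sys3 = 2, sys4 = 3 }
    SystemZones = { sys1 = 0, sys2 = 0, sys3 = 0, sys4 = 1 }
    FailOverPolicy = Load
    Load = 40
    Prerequisites = { GroupUnits = 1 }
    )
```

SystemZones makes VCS prefer failover targets within the group's current zone; Prerequisites against the system Limits cap how many groups each node will accept.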

So this actually doesn't distinguish between system and group failover, but it would roughly follow what you require. Suppose N=3, each system can handle 3 groups, and by default there are 2 groups on each node (so you have 6 groups running across 3 nodes). If a group fails on sys1, it may fail over to sys2, and if another group fails on sys1, it would fail over to sys3. The same would happen if sys1 itself failed: the groups would fail over to sys2 and sys3. I don't know why you would NOT want this to happen; if sys2 and sys3 can handle the load, I don't see why you would want to use your N+1 (4th) node.

But then, after either of the events just described, suppose a group fails on sys2 (or sys2 fails). If sys1 is still up (because groups failed on sys1, not the system itself), groups on sys2 will fail over to sys1; but if sys1 is down, then since sys3's Limits would be exceeded (it can only run 3 groups), the N+1 node will be chosen.
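The walkthrough above can be made concrete with a toy model. All names (sys1-sys4, g1-g6) are hypothetical, and this only mimics the selection order; it is not VCS code:

```python
# Toy model of SystemZones + Limits target selection, mirroring the
# scenario above: three active nodes (zone 0) plus spare sys4 (zone 1),
# each system limited to hosting 3 groups.

LIMIT = 3                                              # max groups per system
ZONES = {"sys1": 0, "sys2": 0, "sys3": 0, "sys4": 1}   # sys4 is the N+1 spare

def pick_target(hosting, failed):
    """Prefer running zone-0 systems with a free slot; fall back to the spare zone."""
    for zone in (0, 1):
        ok = [s for s, groups in hosting.items()
              if s != failed and groups is not None
              and ZONES[s] == zone and len(groups) < LIMIT]
        if ok:
            return min(ok, key=lambda s: len(hosting[s]))  # least-loaded first
    return None

def fail_system(hosting, failed):
    """Evacuate a failed system's groups one at a time, then mark it down."""
    moved = {}
    for g in hosting[failed]:
        target = pick_target(hosting, failed)
        hosting[target].append(g)
        moved[g] = target
    hosting[failed] = None
    return moved

hosting = {"sys1": ["g1", "g4"], "sys2": ["g2", "g5"],
           "sys3": ["g3", "g6"], "sys4": []}

print(fail_system(hosting, "sys1"))  # both groups land on sys2/sys3; spare stays dark
print(fail_system(hosting, "sys2"))  # sys3 is now at its limit, so groups go to sys4
```

The first failure is absorbed entirely inside zone 0; only the second failure, which would exceed sys3's limit, spills onto the spare.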

Mike



If you really want the N+1 node to be chosen only on system failure, then you could use triggers:

sysjoin trigger: Use this so that when a system joins the cluster, if it is the last to join, it freezes system N+1 to prevent it from being used as a failover target.

sysoffline trigger: Use this to unfreeze system N+1 when a system goes offline. Note that since the system state is passed to this trigger, you can distinguish between a system faulting and being cleanly shut down, if you want to treat these differently.

You would need to test this to make sure the sysoffline trigger is able to unfreeze the system before VCS makes its decision on where to fail the groups over.

Alternatively, you could use the preonline trigger, so that if a group tries to come online on a system, your script can determine that other nodes are down and have VCS online the group on the N+1 node instead.
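The sysoffline half of this could be sketched as below. This is untested pseudocode rather than a drop-in trigger; the spare's name (sys4) and the trigger path are assumptions to check against your install:

```sh
#!/bin/sh
# Sketch of /opt/VRTSvcs/bin/triggers/sysoffline
# VCS invokes this with the offlined system's name and its state as arguments.
SPARE=sys4
FAILED_SYS=$1
SYS_STATE=$2

# If any node other than the spare went down, unfreeze the spare so
# VCS is allowed to fail service groups over to it.
if [ "$FAILED_SYS" != "$SPARE" ]; then
    /opt/VRTSvcs/bin/hasys -unfreeze "$SPARE"
fi
```

A matching sysjoin trigger would re-freeze the spare (`hasys -freeze sys4`) once all N primary nodes are back in the cluster.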

Mike



Thank you, Mike! You've given me a bunch to think about. I'm a little relieved (in an odd way) that the options so far in this thread are fairly cumbersome; I was worried that I was missing some easy/obvious solution.

I should have prefaced this whole thread with more information about the cluster -- it's full of databases.

The trigger idea sounded the best to me _theoretically_, but the difficulty of testing it makes me anxious about it. We'd also have to script it very carefully to account for which systems are "live" and which is the N+1.

I really like your idea of using Load/Capacity and SystemZones, partly because we're just using Priority failover now, and Load/Capacity seems like a much smarter way to do it. What worries me about this solution is that for a system failure, my understanding is that VCS would essentially "fill up" SystemZone 0 and only then move the remaining SG's to the N+1 node, resulting in a fairly unbalanced RG load.

Pushing a little harder on this idea, we could probably make it work better by setting (and re-setting) system Capacities as we put new SG's into the cluster, keeping each system's Capacity just barely above the evenly-spread load total per system. Then, in the case of one or two RG failures, the groups could move to other systems (they'd have spare Capacity available); but if a whole system went down, the remaining systems would not get too overloaded, and the bulk of the RG's would move to the N+1 system. That would result in a more balanced endgame, which is nice. The hard part of this solution is constantly readjusting the loads and capacities to keep them in line with growth.

Apologies for thinking out loud here, but in considering all of this all morning, I realized that we're looking at two basic failure scenarios:

1. the failure of one or a couple RG's

2. the failure of an entire system

Case #1 is, for us, rarer than case #2. As a result, I'm thinking more along the lines of a RoundRobin FailOverPolicy. For case #1, we'd manually move the RG back to one of the "main" systems at the next convenient time; case #2 would populate the N+1 system until things were evenly balanced (as far as numbers of SG's go).

What I like about RR failover is that it keeps our configuration much more stable and hands-off. It auto-balances the RG's in a reasonable manner -- not perfect, but better than our current Priority setting. As long as our DBA's are OK with dealing with the one-off failures and subsequent relocations, we might have a winner. 
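For anyone following along, switching a group to round-robin selection is a one-line change per group (the group name here is hypothetical):

```
haconf -makerw
hagrp -modify app_sg FailOverPolicy RoundRobin
haconf -dump -makero
```

RoundRobin picks the running system in the group's SystemList that hosts the fewest active service groups, which is what gives the rough auto-balancing described above.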

If anyone else has any more suggestions (or solutions!) or questions about how we end up doing this, please feel free! 

-jeff

Accepted Solution!


You can also use dynamic load. On the N+1 system, use "hasys -load" to feed in a high value while all the other nodes are available, so the other nodes will be used because they have less load. That is, you write your own external script, which would normally run periodically and feed load into VCS; if that script on the N+1 node runs VCS commands to detect which nodes are available, it can lower the load when a server within the N nodes goes down. You could combine this with triggers, so that rather than freezing/unfreezing, the triggers run "hasys -load" to raise and lower the load on the N+1 node; then, instead of running "hasys -load" periodically (in cron, for example), you would only run it in trigger scripts. One issue in doing this is that the sysoffline trigger runs on the lowest-numbered system in the RUNNING state, and "hasys -load" needs to run locally, but you could use hacli (or ssh) for this.
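The moving parts of this suggestion, as hedged examples (the node name sys4 and the load values are arbitrary, and the hacli syntax should be checked against your VCS version):

```
# Keep the spare unattractive while all N nodes are up:
hasys -load sys4 1000

# From a sysoffline trigger, make the spare the best target. The trigger
# runs on the lowest-numbered RUNNING node, and hasys -load must run
# locally, so push the command to the spare with hacli (or ssh):
hacli -cmd "hasys -load sys4 0" -sys sys4
```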

VCS is very flexible - I have been using it for 14 years and have never found anything it can't do - there is always some workaround with all the tools that are available.

Mike




Mike,

Thanks again for another option. I've marked the latest post as a solution, mainly to thank you for your ideas. We have not yet integrated the new nodes, but your leads give us options to mull over. If I remember, I'll follow up on this post with the solution we decide on. Hopefully this thread will help some others as well!

-jeff