Solved: SQL memory management in active/active configurati...

scogeb · ‎11-07-2012

Hi, I will have 4 nodes in an active/active/active/active configuration. Each node will have 3 SQL instances installed on it. Each node has 256GB RAM. I know I can set the limit each instance can use, but the more memory SQL gets, the better it runs. Ideally, I would set each instance to use 80GB or so. Roughly 1/3 of the node's memory, leaving some for the OS and Veritas. What happens if one of the nodes goes down? Where would the newly failed over instance get it's memory from? Is there a way to manage this? Should I let SQL manage it? I don't want a run-away query on one instance to hog all the node's memory though and affect the other instances on that node. What are my options?

mikebounds · ‎11-08-2012

There is load balancing built in using System Capacity and service group Load. So for example you could have 3 SQL service groups with loads of around 80 to represent the memory they use if they are not quite all equal and then you can define either static or dynamic Capacity. Easiest is static capacity so if your 4 servers have the same memory you could define these as 256 if this is how much memory they have so initially service groups may reside as follows as (you can set startup policy as the normal priority to specify where they startup or select "Load" to let VCS decide):

Sys1, SGs: 70, 80, 90, Avail Cap = 16 (256 - 70 - 80 - 90)

Sys2, SGs: 65, 75, 85, Avail Cap = 31

Sys3 SGs: 70, 75, 80, Avail Cap = 31

Sys4 SGs: 75, 80, 85, Avail Cap = 16

If a system fails then VCS will decide where they go, but note Available Cap can go negative, so if first system failed then you would have something like:

Sys2, SGs: 65, 75, 85, NEW 80, Avail Cap = -49 (31-80)

Sys3 SGs: 70, 75, 80, NEW 90, Avail Cap = -59

Sys4 SGs: 75, 80, 85, NEW 70, Avail Cap = -54

If you want to avoid or minimse negative capacity, then you can use System Limits and Service group Prereqs, but note VCS will never offline or move a service group to make space - if you want to do this you would need to use triggers (like preonline and loadwarning trigger) - or use VCS One product.

System Capacity and service group Load defines one number as VCS can only use one number to determine where to fail to, but with System Limits and Service group Prereqs you can define mulitple Limits (for systems) and Prereqs (for service groups) for Memory, CPU, NumSQLInst etc and these are hard limits which cannot go negative.

You can also use Dynamic Capacity where you have an external program which feeds VCS about available capacity on a server.

For more info on above, see "Service group workload management" section in VCS user guide.

Above is all how you manage VCS, but managing how SQL uses memory is another matter. I would guess if you tell SQL to use 80GB per instance and you have 3 running on a 256GB node, then you won't be able to start another instance that requests 80GB and it will probably fail, or only be able to use a very small amount of available memory. If you can dynamically reduce memory in SQL, then you could do this in a preonline trigger script for a service group that is trying to fail to the system, but even if this is possible it would delay failover while the preonline script tries to claim some memory back.

Therefore I would configure each SQL instance to use about 1/4 of the available memory to allow for one node to fail without having to dynamically reclaim memory. Some SQL instances, as in my example, can have different amounts of memory reserved and so if you match servicegroup load with what you allocate in SQL, then VCS will decide the best place to go and you can enforce this with SG PreReqs or dynamically reallocate memory using a PreOnline trigger if this is possible in SQL.

Mike

View solution in original post

mikebounds · ‎11-08-2012

There is load balancing built in using System Capacity and service group Load. So for example you could have 3 SQL service groups with loads of around 80 to represent the memory they use if they are not quite all equal and then you can define either static or dynamic Capacity. Easiest is static capacity so if your 4 servers have the same memory you could define these as 256 if this is how much memory they have so initially service groups may reside as follows as (you can set startup policy as the normal priority to specify where they startup or select "Load" to let VCS decide):

Sys1, SGs: 70, 80, 90, Avail Cap = 16 (256 - 70 - 80 - 90)

Sys2, SGs: 65, 75, 85, Avail Cap = 31

Sys3 SGs: 70, 75, 80, Avail Cap = 31

Sys4 SGs: 75, 80, 85, Avail Cap = 16

If a system fails then VCS will decide where they go, but note Available Cap can go negative, so if first system failed then you would have something like:

Sys2, SGs: 65, 75, 85, NEW 80, Avail Cap = -49 (31-80)

Sys3 SGs: 70, 75, 80, NEW 90, Avail Cap = -59

Sys4 SGs: 75, 80, 85, NEW 70, Avail Cap = -54

If you want to avoid or minimse negative capacity, then you can use System Limits and Service group Prereqs, but note VCS will never offline or move a service group to make space - if you want to do this you would need to use triggers (like preonline and loadwarning trigger) - or use VCS One product.

System Capacity and service group Load defines one number as VCS can only use one number to determine where to fail to, but with System Limits and Service group Prereqs you can define mulitple Limits (for systems) and Prereqs (for service groups) for Memory, CPU, NumSQLInst etc and these are hard limits which cannot go negative.

You can also use Dynamic Capacity where you have an external program which feeds VCS about available capacity on a server.

For more info on above, see "Service group workload management" section in VCS user guide.

Above is all how you manage VCS, but managing how SQL uses memory is another matter. I would guess if you tell SQL to use 80GB per instance and you have 3 running on a 256GB node, then you won't be able to start another instance that requests 80GB and it will probably fail, or only be able to use a very small amount of available memory. If you can dynamically reduce memory in SQL, then you could do this in a preonline trigger script for a service group that is trying to fail to the system, but even if this is possible it would delay failover while the preonline script tries to claim some memory back.

Therefore I would configure each SQL instance to use about 1/4 of the available memory to allow for one node to fail without having to dynamically reclaim memory. Some SQL instances, as in my example, can have different amounts of memory reserved and so if you match servicegroup load with what you allocate in SQL, then VCS will decide the best place to go and you can enforce this with SG PreReqs or dynamically reallocate memory using a PreOnline trigger if this is possible in SQL.

Mike

scogeb · ‎11-16-2012

Mike, thanks for the post. I think I will either leave enough memory free so the instances that failover to the other nodes won't have issues or do an active/active/active/passive setup. Thanks again!

VOX

SQL memory management in active/active configuration