Solved: Service group dependency

Karel_Bollen · ‎11-13-2012

I have a 3 node cluster with VCS 6.0.1 and RHEL6

On the first two nodes by default different production applications are active and on the third node the acceptance applications are active.

The acceptance applications should be brought offline before the production applications fail over to the third node.

The acceptance applications should never fail-over.

I've solved this with a preonline trigger script, but I wonder whether there is another, easier way of doing this that is less script-dependent.

Karel

mikebounds · ‎11-13-2012

You need an offline local dependency, so make both production service groups parents of the child acceptance service group, so you have:

prod1_grp depends on accept_grp offline local

prod2_grp depends on accept_grp offline local

This means accept_grp must be offline on node3 before prod1_grp or prod2_grp can be brought online on, or switched to, node3. If you try to do this manually while accept_grp is online there, you will get a 'group dependency violation error', because the expectation is that you manually offline accept_grp first. BUT if prod1_grp or prod2_grp faults and tries to fail over to node3, then the service group dependency means VCS will automatically offline accept_grp so that the prod group can come online on node3. In that case, if accept_grp can only run on node3, it will stay down.
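As a rough sketch, the two dependencies above might look like this in main.cf. The group names come from this thread; the node names, SystemList priorities and AutoStartList values are assumptions for illustration, and all resource definitions are omitted.

```
// Sketch only: node names, priorities and AutoStartList are assumed.
// Resource definitions are omitted.

group accept_grp (
    SystemList = { node3 = 0 }
    AutoStartList = { node3 }
    )

group prod1_grp (
    SystemList = { node1 = 0, node2 = 1, node3 = 2 }
    AutoStartList = { node1 }
    )

    // prod1_grp is the parent, accept_grp is the child
    requires group accept_grp offline local

group prod2_grp (
    SystemList = { node2 = 0, node1 = 1, node3 = 2 }
    AutoStartList = { node2 }
    )

    // prod2_grp is also a parent of the same child
    requires group accept_grp offline local
```

On a running cluster the same links can be created with hagrp -link, naming the parent group first and the child second (for example, hagrp -link prod1_grp accept_grp offline local).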

Mike

g_lee · ‎11-13-2012

Assuming your nodes are node1, node2, node3

When you say the acceptance application should never failover, does this mean this can only run on node3 (as opposed to the 2 prod applications which can run on any of the three nodes)?

From what you describe it sounds like online remote soft/online remote firm may be the best place to start (online remote firm will restart the parent if the child faults, so it will depend how you want the groups to behave) - where the acceptance app group is the parent, and the prod app group(s) are the child(ren).

See the following from the VCS 6.0.1 Administrator's Guide for more details/limitations, etc.

The role of service group dependencies:
https://sort.symantec.com/public/documents/sfha/6.0.1/linux/productguides/html/vcs_admin/ch12.htm

About failover parent / failover child:
https://sort.symantec.com/public/documents/sfha/6.0.1/linux/productguides/html/vcs_admin/ch12s02s01.htm

Note: If you had one prod group, offline local may also have been an option; however as you have 2 groups this cannot be used as you can't have a parent where multiple child service groups are offline local - see below for more details:

Dependencies not supported for multiple child service groups
https://sort.symantec.com/public/documents/sfha/6.0.1/linux/productguides/html/vcs_admin/ch12s05s02.htm

Karel_Bollen · ‎11-13-2012

Acceptance can only be active on node3.

Production applications can run on all 3 nodes, but in case of node3, only if the corresponding acceptance application is brought Offline first.

With clusters I've worked with in the past, you could give a value to a kind of LicenseToKill attribute in an application, which automatically offlined applications with the same value in an AutoBreak attribute before going online on that node.

I've tried online remote (soft / hard) but I get a 'group dependency violation error' when I switch the production application to the node where acceptance is online.

Karel

mikebounds · ‎11-13-2012

I am guessing you may have:
prod1_grp - SystemList = node1, node3
prod2_grp - SystemList = node2, node3
accept_grp - SystemList = node3

However, this is quite restrictive, and I would configure all 3 service groups to run on all 3 nodes, i.e. all service groups have SystemList = node1, node2, node3. Then configure the offline local dependency as above, and configure Limits = {NumProd=1} on each system and Prerequisites = {NumProd=1}.

With the offline local dependency configured, this will prevent conflicts between prod and acceptance. If, for example, prod1_grp faults (as opposed to node1 faulting), then accept_grp will be switched to node1 rather than being left offline; or, if node1 does fail, then accept_grp will come online on node1 once node1 is back up. Now, with prod1_grp running on node3, if node2 failed then prod2_grp would fail over to node1, which would offline accept_grp and leave it down (as there is nowhere else for accept_grp to go). This is because you have set a Limit of NumProd=1 on the systems and the group has used up this limit on node3. If you want node3 to be able to run both prod groups, then you can set Limits = {NumProd=2} there, and you can set this value to 1 or 2 on node1 or node2 depending on what you require.
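A minimal main.cf sketch of the Limits and Prerequisites settings described above. Placing Prerequisites on the prod groups is an assumption (the post does not say which groups carry it), as are the SystemList priorities; resources and the other groups are omitted.

```
// Sketch only: Limits go on systems, Prerequisites on groups.
// Assumed: Prerequisites are set on the prod groups; priorities are made up.

system node1 (
    Limits = { NumProd = 1 }
    )

system node2 (
    Limits = { NumProd = 1 }
    )

// Raise NumProd to 2 here if node3 should be able to host both prod groups.
system node3 (
    Limits = { NumProd = 1 }
    )

group prod1_grp (
    SystemList = { node1 = 0, node2 = 1, node3 = 2 }
    AutoStartList = { node1 }
    Prerequisites = { NumProd = 1 }
    )

    requires group accept_grp offline local
```

With these values a system that already hosts one prod group has no NumProd left, so a second prod group cannot come online there.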

You may not want it this flexible, but you can configure something in between, so you could just use Prerequisites and Limits to prevent both prod groups failing over to node3.

For more info on above, see "Service group workload management" section in VCS user guide where you may also want to use Load based failover (you may also want to look at https://www-secure.symantec.com/connect/forums/sql-memory-management-activeactive-configuration for an example of this)

Mike

g_lee · ‎11-14-2012

Karel,

As Mike points out, it is expected that you get the group dependency violation error if you manually try to switch the groups - the dependency will offline the acceptance group when the prod group faults and tries to failover (this is different to running hagrp -switch or hagrp -online manually).

To test, you would need to manually trigger a fault in one of the prod groups and check that VCS takes the accept group offline before bringing that prod group up on node3.

As mentioned in the earlier post, if you only had one prod group you could use offline local; as you have 2 prod groups you cannot make them both offline local as this is a known limitation (per 3rd link, "A configuration in which multiple child service groups are offline local" is not supported); so it may be worthwhile looking into the suggestions in Mike's second post to see if this meets your requirements (using Limits and Prerequisites in combination with online remote <soft|firm> dependencies).

Karel_Bollen · ‎11-14-2012

Mike, g_lee,

Thanks for giving the solution to my problem. I was indeed always trying a manual switch rather than simulating an error.

The offline local dependency behaves as you described when an error occurs.

I'll try Limits and Prerequisites, as the final configuration will have at least 4 nodes with 10 prod applications and 10 corresponding acceptance applications on one node.

Karel

mikebounds · ‎11-15-2012

If you have more service groups than nodes, then this is a good candidate for load-based failover using the system Capacity and service group Load attributes. These are soft limits (load can go negative) that are used to determine where to fail over to, and they can be used in conjunction with Limits and Prerequisites, which are hard limits (a limit cannot go negative, so it can prevent a service group from onlining). See https://www-secure.symantec.com/connect/forums/sql-memory-management-activeactive-configuration#comm... for more info.
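A hedged main.cf sketch of combining the two mechanisms. The Capacity and Load numbers and the SystemList priorities are purely illustrative assumptions; only the attribute names (Capacity, Load, FailOverPolicy, Limits, Prerequisites) are taken from VCS, and resources are omitted.

```
// Sketch only: all numeric values are hypothetical.

system node3 (
    // Soft limit: used to choose a failover target
    Capacity = 200
    // Hard limit: can block a group from onlining
    Limits = { NumProd = 2 }
    )

group prod1_grp (
    SystemList = { node1 = 0, node2 = 1, node3 = 2 }
    AutoStartList = { node1 }
    // Choose the target system by available capacity
    FailOverPolicy = Load
    Load = 100
    Prerequisites = { NumProd = 1 }
    )
```

The other systems and groups would carry their own Capacity and Load values in the same way.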

Mike
