Solved: [VCS 5.0 or under] - failover for a SG if and only if the common remote NIC is reachable

krisofe · 03-29-2012

Hi,

I have inherited a strange legacy architecture on "my" information system, where a single remote filer serves both of my VCS nodes.

If my first node loses its connection to the filer's IP address, VCS switches over to the second node.

But when the common filer IP address is unreachable from both nodes, there is no need for VCS to switch, because the resource will not be available on the other node either.

So I want VCS to test the filer's IP address before switching:

If the IP address is alive from the second node, then switch; otherwise, keep the current status.

The configuration is:

I have a NIC resource for the filer's IP address, with the attribute "NetworkHosts" set globally (the same on both systems) and the filer's IP address as its value:

NIC_Filer NetworkHosts global @IP

Any ideas, please?

Regards,

Christophe, FR

mikebounds · 03-29-2012

The normal rule in VCS is that if a resource fails and it is critical, VCS fails the group over; if it is non-critical, it doesn't. If the resource is critical, the service group is still taken offline when the resource fails, even if the group has nowhere to go. You could argue there is no point in offlining a service group when there is nowhere to fail over to, but this is what VCS does.

One way to get round this is as follows:

Make the NIC resource non-critical, do not link it to any other resource, and use the resfault trigger:

To enable the resfault trigger, copy the resfault sample from /opt/VRTSvcs/bin/sample_triggers to /opt/VRTSvcs/bin/triggers and amend it. The trigger is called for EVERY resource that fails, so your code first needs to make sure it only acts on the NIC resource. The resfault trigger is passed the resource name and the system, so the logic should be something like:

If resource_name matches the NIC resource monitoring the filer, then:

Probe the NIC resource on the other system(s) using hares -probe.

Sleep to give the probe time to monitor the NIC (you will have to experiment to find out how long the monitor takes to detect that the filer is down), and then determine the state of the NIC resource using hares -state. I would do this in a loop, something like:

maxtime=30   (30 is just an example)
elapsed=0
state=UP

while elapsed < maxtime
  check state of resource and if down then
    state=DOWN
    break loop

  sleep 5
  elapsed=elapsed+5
done

If the NIC resource is UP on another system, switch using hagrp -switch <group> -any (you will need to get the group name using "hares -value res-name Group"); BUT if the NIC resource is down on all systems, do nothing.
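Put together, the resfault logic described above might look roughly like the following Python sketch. It is a minimal illustration under several assumptions, not a finished trigger: it assumes Python 3.7+ is available on the nodes and that your platform lets you write the trigger in Python (otherwise port the logic into the sample trigger); that `hares -state <res> -sys <sys>` and `hares -value <res> Group` print just the value; and that a healthy NIC reports ONLINE. The names WATCHED_RESOURCE, run_vcs, wait_for_down and handle_resfault are hypothetical. One deliberate deviation from the advice above: it switches with `hagrp -switch <group> -to <system>` to the first system where the NIC stayed UP, rather than using `-any`.

```python
import subprocess
import time
from typing import Callable, List, Optional

# Hypothetical: name of the NIC resource that monitors the filer (from the post).
WATCHED_RESOURCE = "NIC_Filer"


def run_vcs(args: List[str]) -> str:
    """Run a VCS CLI command (hares, hagrp, ...) and return its trimmed stdout."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout.strip()


def wait_for_down(get_state: Callable[[], str], max_time: int = 30, interval: int = 5,
                  sleep: Callable[[float], None] = time.sleep) -> str:
    """Polling loop: "DOWN" as soon as a poll is not ONLINE, else "UP" after max_time."""
    elapsed = 0
    while elapsed < max_time:
        if get_state() != "ONLINE":  # assumption: a healthy NIC reports ONLINE
            return "DOWN"
        sleep(interval)
        elapsed += interval
    return "UP"


def handle_resfault(system: str, resource: str, systems: List[str],
                    run: Callable[[List[str]], str] = run_vcs,
                    sleep: Callable[[float], None] = time.sleep) -> Optional[List[str]]:
    """Switch the group only if the NIC is UP on a system other than `system`.

    Returns the hagrp command that was run, or None if nothing was done.
    """
    if resource != WATCHED_RESOURCE:
        return None  # resfault fires for every resource; ignore the others

    others = [s for s in systems if s != system]

    # Ask VCS to re-monitor the NIC on the other systems straight away.
    for other in others:
        run(["hares", "-probe", resource, "-sys", other])

    for other in others:
        verdict = wait_for_down(
            lambda: run(["hares", "-state", resource, "-sys", other]), sleep=sleep)
        if verdict == "UP":
            group = run(["hares", "-value", resource, "Group"])
            command = ["hagrp", "-switch", group, "-to", other]
            run(command)
            return command

    return None  # filer unreachable from every node: leave the group where it is
```

A trigger entry point might call `handle_resfault(sys.argv[1], sys.argv[2], systems)`, assuming resfault receives the system and resource name as its first two arguments and that you build `systems` yourself (for example from `hasys -list`). The `run` and `sleep` parameters exist only so the logic can be exercised without a cluster.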

Mike

Venkata_Reddy_C · 04-24-2012

If you want to test a condition before you fail the service group over to the target node (in this case, checking the connection to the filer from the target node during a service group switch or failover), you could use the preonline trigger at the group level by enabling the service group's PreOnline attribute (PreOnline=1). Put the logic for all the checks you want to perform before the group can go online on a node in /opt/VRTSvcs/bin/triggers/preonline. A sample preonline trigger is provided in the /opt/VRTSvcs/bin/sample_triggers directory.

For example, if you have a three-node cluster with systems SysA, SysB and SysC, and the filer IP is not reachable from SysA, you can call 'hagrp -online <SG> -sys SysB' in the preonline trigger on SysA. This invokes the preonline trigger on SysB, which checks whether the filer IP is reachable from SysB. If it is, you can call 'hagrp -online -nopre <SG> -sys SysB'. If not, you can call 'hagrp -online <SG> -sys SysC', which invokes the preonline trigger on SysC to check whether the filer IP is reachable from SysC. This can be extended to any number of nodes in the cluster. If there is no suitable node from which the filer IP is reachable, simply exit the preonline script with 0 without calling 'hagrp -online'.
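The chained preonline approach comes down to a small per-node decision, sketched below in Python. This is an illustration under assumptions, not a tested trigger: it assumes Python 3.7+ on the nodes, that the ordered list of systems is passed in (in practice you might derive it from the group's SystemList), and that a ping-style command exiting 0 means the filer is reachable. The names filer_reachable and preonline_action are hypothetical, and the ping command is a parameter because ping options differ between operating systems.

```python
import subprocess
from typing import List, Optional


def filer_reachable(ping_cmd: List[str]) -> bool:
    """Run a ping-style command; exit status 0 is taken to mean reachable."""
    return subprocess.call(ping_cmd, stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL) == 0


def preonline_action(local_sys: str, systems: List[str], group: str,
                     reachable: bool) -> Optional[List[str]]:
    """Decide what the preonline trigger on local_sys should run.

    - filer reachable here: online the group here, skipping preonline (-nopre)
    - not reachable: ask VCS to online it on the next system in the list,
      which fires that system's preonline trigger
    - last system in the list: return None, i.e. exit 0 and do nothing
    """
    if reachable:
        return ["hagrp", "-online", "-nopre", group, "-sys", local_sys]
    position = systems.index(local_sys)
    if position + 1 < len(systems):
        return ["hagrp", "-online", group, "-sys", systems[position + 1]]
    return None
```

Because this sketch only walks forward through the list, the chain always ends; the trade-off is that a failover starting on the last system is never retried on an earlier one.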

Regards,

Venkat

krisofe · 05-22-2012

Hi,

I'm so sorry for not posting any feedback here. I have had to work on other projects, none of them involving VCS, so I have been away from the forum since.

When I am able to get back to this side work (unfortunately it is a low-priority task), I will post some feedback here.

Thanks a lot,

Christophe, FR, who now has to help other people within my scope with another VCS setup, so see you soon, I hope
