03-29-2012 01:39 AM
Hi,
I have a strange legacy architecture on "my" IT system: there is a single remote filer shared by both of my VCS nodes.
If I lose the connection to the filer's IP address from my first node, VCS switches to the second one.
But when the shared filer's IP address isn't reachable from either node, there is no point in VCS switching, because the resource will not be available on the other node either.
So I want to make VCS test the filer's IP address before switching:
If the IP address is alive from the second node, then switch; otherwise keep the current state.
The configuration is:
I have a NIC resource for the filer's IP address, with the attribute "NetworkHosts" set globally (same value on both systems) and the filer's IP address as its value:
NIC_Filer NetworkHosts global <filer IP address>
Any ideas, please?
Regards,
Christophe, FR
03-29-2012 04:15 AM
The normal rule in VCS is that if a resource fails and it is critical, the group fails over; if it is non-critical, it doesn't. If the resource is critical, the service group will still go offline when the resource fails, even if the group has nowhere to go. You could argue there is no point in offlining the service group if there is nowhere to fail over to, but this is what VCS does.
One way to get round this is as follows:
Make the NIC resource non-critical, do not link it to any other resource, and use the resfault trigger:
To enable the resfault trigger, copy the resfault sample from /opt/VRTSvcs/bin/sample_triggers to /opt/VRTSvcs/bin/triggers and amend it. The trigger is called for EVERY resource that fails, so you first need code that only takes action for the NIC resource. The resfault trigger is passed the resource name and the system, so the logic of the code should be something like:
If resource_name matches the NIC resource monitoring the filer, then:
Probe the NIC resource on the other system(s) using hares -probe.
Sleep to give the probe time to monitor the NIC. You will have to experiment to see how long the monitor takes to detect that the filer is down. Then determine the state of the NIC resource using hares -state. I would do this in a loop, something like:
  maxtime=30   (30 is just an example)
  time=0
  state=UP
  while time < maxtime
    check state of resource
    if down then
      state=DOWN
      break loop
    sleep 5
    time=time+5
  done
If the NIC resource is UP on another system, then switch using hagrp -switch <group> -any (you can get the group name using "hares -value <res-name> Group"), BUT if the NIC resource is down on all systems, then do nothing.
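The logic above could be sketched as a shell resfault trigger along the following lines. This is only a rough, untested sketch under several assumptions: that the trigger receives the system name as its first argument and the resource name as its second, that the resource is called NIC_Filer as in the original post, and that "hasys -list" is used to enumerate the cluster nodes. The function names, MAXTIME and INTERVAL are illustrative. It also switches with "-to <system>" to target the node that was just checked, rather than "-any".

```shell
#!/bin/sh
# Sketch of /opt/VRTSvcs/bin/triggers/resfault (assumed args: <system> <resource> ...)

NIC_RES="NIC_Filer"           # resource name taken from the original post
MAXTIME="${MAXTIME:-30}"      # seconds to wait for the monitor (example value)
INTERVAL="${INTERVAL:-5}"     # polling interval in seconds (example value)

# Return 0 if the NIC resource still looks healthy on system $1
# after being probed and polled for up to MAXTIME seconds.
nic_up_on() {
    target=$1
    hares -probe "$NIC_RES" -sys "$target"
    elapsed=0
    while [ "$elapsed" -lt "$MAXTIME" ]; do
        state=$(hares -state "$NIC_RES" -sys "$target")
        case "$state" in
            *FAULTED*|*OFFLINE*) return 1 ;;   # filer not reachable from there
        esac
        sleep "$INTERVAL"
        elapsed=$((elapsed + INTERVAL))
    done
    return 0
}

resfault_main() {
    faulted_sys=$1
    res=$2
    # The trigger fires for every faulted resource; act only on the filer NIC.
    [ "$res" = "$NIC_RES" ] || return 0
    for sys in $(hasys -list); do
        [ "$sys" = "$faulted_sys" ] && continue
        if nic_up_on "$sys"; then
            group=$(hares -value "$res" Group)
            hagrp -switch "$group" -to "$sys"
            return 0
        fi
    done
    # NIC is down everywhere: leave the group where it is.
    return 0
}

# Run only when invoked with trigger arguments.
if [ "$#" -ge 2 ]; then
    resfault_main "$@"
fi
```

The values of MAXTIME and INTERVAL need tuning against how long the NIC monitor really takes to notice that the filer is gone, as mentioned above.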
Mike
04-24-2012 03:12 AM
If you want to test some condition before failing the service group over to the target node (in this case, checking the connection to the filer from the target node during a service group switch/failover), you can use the preonline trigger at the group level by enabling the PreOnline attribute for the service group (PreOnline=1). You can write your logic in /opt/VRTSvcs/bin/triggers/preonline for all the checks you want to perform before the group can go online on this node. A sample preonline trigger is provided in the /opt/VRTSvcs/bin/sample_triggers directory.
For example, suppose you have a three-node cluster with systems SysA, SysB and SysC. If the filer IP is not reachable from node SysA, you can call 'hagrp -online <SG> -sys SysB' in the preonline trigger on SysA. This invokes the preonline trigger on SysB, which checks whether the filer IP is reachable from SysB. If yes, you can call 'hagrp -online -nopre <SG> -sys SysB'. If no, you can call 'hagrp -online <SG> -sys SysC', which invokes the preonline trigger on SysC to check whether the filer IP is reachable from SysC. This can be extended to any number of nodes in the cluster. If there is no suitable node from which you can reach the filer IP, you can simply exit the preonline script with 0 without calling 'hagrp -online'.
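As a rough illustration of the chain described above, a preonline trigger could look something like the sketch below. It is untested and rests on assumptions: that the trigger receives the system name and the group name as its first two arguments, that a Linux-style "ping -c 1" is available, and that the node order is fixed in a NODES list. FILER_IP, NODES and the function names are placeholders, not values from this thread. Handing off only to nodes later in the list avoids looping forever when the filer is unreachable from everywhere.

```shell
#!/bin/sh
# Sketch of /opt/VRTSvcs/bin/triggers/preonline (assumed args: <system> <group> ...)

FILER_IP="${FILER_IP:-192.0.2.10}"    # placeholder address, replace with the filer IP
NODES="${NODES:-SysA SysB SysC}"      # ordered node list from the example above

# Return 0 if the filer answers from this node.
# Linux-style ping flags are an assumption; adjust for your platform.
filer_reachable() {
    ping -c 1 "$FILER_IP" >/dev/null 2>&1
}

# Print the node that follows $1 in NODES, or nothing if $1 is the last one.
next_node() {
    found=0
    for n in $NODES; do
        if [ "$found" -eq 1 ]; then
            echo "$n"
            return 0
        fi
        [ "$n" = "$1" ] && found=1
    done
    return 0
}

preonline_main() {
    this_sys=$1
    group=$2
    if filer_reachable; then
        # Filer is reachable here: bring the group online, skipping the trigger.
        hagrp -online -nopre "$group" -sys "$this_sys"
        return 0
    fi
    next=$(next_node "$this_sys")
    if [ -n "$next" ]; then
        # Hand over to the next node; its own preonline trigger repeats the check.
        hagrp -online "$group" -sys "$next"
    fi
    # No node left to try: exit 0 without onlining the group anywhere.
    return 0
}

# Run only when invoked with trigger arguments.
if [ "$#" -ge 2 ]; then
    preonline_main "$@"
fi
```

The trigger is only called once PreOnline is set on the group, for example with 'hagrp -modify <SG> PreOnline 1' while the configuration is writable.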
Regards,
Venkat
05-22-2012 08:18 AM
Hi,
I'm sorry for not having posted any feedback here. I have had to work on other, non-VCS projects, so I have been away from the forum since then.
Once I can get back to this side work (unfortunately it is a low-priority task),
I will post some feedback here.
Thanks a lot,
Christophe, FR, who has to help other people within my scope in another VCS context, so see you soon, I hope