VCS can react not only to node failure but also to application failure, so it can take action when an Oracle database or listener (or any other process) fails. If an application component fails, VCS can, configurable per resource, restart it, fail over to the other node, or a combination of the two. For instance, most DBAs configure the listener to restart 1-3 times and, if that does not work, fail over the whole group (database, listener, storage etc.) to the other node; if the database fails, VCS does not restart it but fails over straight away. In terms of detecting database failure, from 5.1SP1 you have 3 methods:
- Check processes (pmon etc.)
- Use the Oracle Health Check API
- Use a SQL script (an example is provided, which I think checks that you can write a row to a table, but you can use your own if you want).
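As a rough sketch of the restart-then-failover policy described above, the shell function below lets a listener resource restart in place up to 3 times before it faults (and the group fails over), while the database resource gets no local restarts. The resource names are hypothetical, and it assumes RestartLimit is overridden per resource with hares -override rather than changed for the whole Netlsnr/Oracle type. By default (DRY_RUN=1) it only prints the commands.

```shell
#!/bin/sh
# Print the VCS commands when DRY_RUN=1 (the default); run them otherwise
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# set_restart_policy LISTENER_RES ORACLE_RES (resource names are hypothetical)
#   listener: restart in place up to 3 times, then fault and fail over the group
#   database: no local restart, fail over immediately
set_restart_policy() {
    lsnr_res=$1
    ora_res=$2
    run haconf -makerw
    # RestartLimit is a static type attribute, so override it per resource first
    run hares -override "$lsnr_res" RestartLimit
    run hares -modify "$lsnr_res" RestartLimit 3
    run hares -override "$ora_res" RestartLimit
    run hares -modify "$ora_res" RestartLimit 0
    # A fault on a Critical resource is what triggers failover of the group
    run hares -modify "$lsnr_res" Critical 1
    run hares -modify "$ora_res" Critical 1
    run haconf -dump -makero
}
```

Review the printed commands first, then apply them with something like DRY_RUN=0 set_restart_policy lsnr_res ora_res.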
In terms of adding Oracle to VCS, there are at least 3 options:
- Add as an Oracle resource per instance
- Add as an Application resource per instance
- Add as a single application resource (per service group with independent storage) controlling many Oracle instances
If you don't want to change the VCS configuration every time you add, remove or rename an Oracle instance, then option 3 should achieve this. The Application resource has these attributes:
- StartProgram
- StopProgram
- MonitorProcesses (list of processes to monitor) and/or MonitorProgram and/or PidFiles
Optionally you can set:
- CleanProgram: script to forcibly stop the application
- User: the user to run the programs as, if not root
From 5.1SP1 you also have the UseSUDash and EnvFile attributes.
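For comparison, option 2 (one Application resource per instance) can use MonitorProcesses instead of a MonitorProgram. The sketch below prints (or, with DRY_RUN=0, runs) the commands to add such a resource; the resource naming, the start/stop scripts and the choice of pmon/smon as the monitored processes are assumptions for illustration only.

```shell
#!/bin/sh
# Print the VCS commands when DRY_RUN=1 (the default); run them otherwise
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# add_app_instance GROUP SID OWNER START_SCRIPT STOP_SCRIPT
# Resource name app_<SID> and both scripts are hypothetical placeholders.
add_app_instance() {
    grp=$1 sid=$2 owner=$3 start=$4 stop=$5
    res="app_$sid"
    run haconf -makerw
    run hares -add "$res" Application "$grp"
    # Run the programs as the Oracle owner instead of root
    run hares -modify "$res" User "$owner"
    run hares -modify "$res" StartProgram "$start $sid"
    run hares -modify "$res" StopProgram "$stop $sid"
    # Process names the agent looks for instead of calling a MonitorProgram
    run hares -modify "$res" MonitorProcesses "ora_pmon_$sid" "ora_smon_$sid"
    run hares -modify "$res" Enabled 1
    run haconf -dump -makero
}
```

The catch, and the reason option 3 exists, is that every new instance means running this again.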
I would write your script something like this:
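#!/bin/sh
# oragrp: start, stop or monitor every database flagged Y in an oratab file.
# Usage: oragrp {start|stop|monitor} oratab_path
# Runs sqlplus as SYSDBA, so run it as the Oracle software owner (User attribute).
oratab_path=$2

# Print "SID ORACLE_HOME" for each oratab entry (SID:ORACLE_HOME:Y|N) flagged Y
list_dbs() {
    awk -F: '!/^#/ && $3 == "Y" { print $1, $2 }' "$oratab_path"
}

# run_sql SID ORACLE_HOME STATEMENT: run one statement as SYSDBA on one instance
run_sql() {
    echo "$3" | ORACLE_SID=$1 ORACLE_HOME=$2 "$2/bin/sqlplus" -S "/ as sysdba"
}

case "$1" in
start)
    list_dbs | while read -r sid home; do run_sql "$sid" "$home" "startup"; done
    ;;
stop)
    list_dbs | while read -r sid home; do run_sql "$sid" "$home" "shutdown immediate"; done
    ;;
monitor)
    # 110 = online, 100 = offline (a database that should be running is down)
    for sid in $(list_dbs | awk '{ print $1 }'); do
        ps -e -o args | grep -q "^ora_pmon_${sid}\$" || exit 100
    done
    exit 110
    ;;
*)
    echo "Usage: $0 {start|stop|monitor} oratab_path"
    exit 1
    ;;
esac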
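Before handing the script to VCS, it is worth running the monitor action by hand and checking the exit code, since VCS reads 110 as online and 100 as offline rather than the usual 0 for success. A small hypothetical helper that translates the codes the script above uses:

```shell
#!/bin/sh
# vcs_state CMD [ARGS...]: run a MonitorProgram-style command and translate
# its exit code (110 = online, 100 = offline, anything else reported as-is)
vcs_state() {
    rc=0
    "$@" || rc=$?
    case $rc in
        110) echo ONLINE ;;
        100) echo OFFLINE ;;
        *)   echo "OTHER (exit $rc)" ;;
    esac
}
```

For example, vcs_state /opt/VRTSvcs/bin/oragrp monitor /etc/oratab should print ONLINE with all flagged databases up. A monitor that returns 0 shows up here as OTHER, which is a common mistake.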
Then configure it in VCS (assuming the script above is saved as /opt/VRTSvcs/bin/oragrp):
Application OraGrp (
    StartProgram = "/opt/VRTSvcs/bin/oragrp start /etc/oratab"
    StopProgram = "/opt/VRTSvcs/bin/oragrp stop /etc/oratab"
    MonitorProgram = "/opt/VRTSvcs/bin/oragrp monitor /etc/oratab"
    )
Note that, as written, the MonitorProgram reports offline (return code 100) if any one database is down. If you have critical and non-critical databases, you could make it report offline only when one of the critical databases is down, so the group only fails over for a critical database. Alternatively, use 2 Application resources, one for the critical databases and one (with Critical set to 0) for the non-critical ones. If you want to ignore the state of the Oracle databases completely, have StartProgram/StopProgram create/remove a lockfile and have MonitorProgram just check that the lockfile exists.
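The lockfile variant can be sketched as a single shell function. The lockfile path is hypothetical, and only the lockfile handling is shown; the real StartProgram/StopProgram would still start and stop the databases. In a standalone MonitorProgram you would use exit rather than return.

```shell
#!/bin/sh
# Hypothetical lockfile path: pick a directory that is cleared at boot,
# otherwise a rebooted node may probe the resource as online
LOCKFILE=${LOCKFILE:-/var/run/oragrp.lock}

# oragrp_lock {start|stop|monitor}
oragrp_lock() {
    case "$1" in
        start)   touch "$LOCKFILE" ;;
        stop)    rm -f "$LOCKFILE" ;;
        monitor)
            # 110 = online, 100 = offline, regardless of database state
            if [ -f "$LOCKFILE" ]; then return 110; else return 100; fi ;;
        *)       echo "Usage: oragrp_lock {start|stop|monitor}" >&2; return 1 ;;
    esac
}
```

With this approach VCS only fails over on node failure or a fault in another resource (storage, IP, listener); a crashed database is left for the DBAs to deal with.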
Another alternative is to write a script that creates/removes an Oracle resource, and set up a VCS user for the DBAs (e.g. "oracle") so they can run it. The script could ask them for the SID, owner, Oracle home and listener name and then create the appropriate Oracle and listener resources. This is a fairly simple script to write.
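A minimal sketch of such a script, assuming the Oracle agent's Sid/Owner/Home attributes and the Netlsnr agent's Owner/Home/Listener attributes. The service group argument and the ora_<SID>/lsnr_<LISTENER> naming are hypothetical, and by default (DRY_RUN=1) it only prints the commands.

```shell
#!/bin/sh
# Print the VCS commands when DRY_RUN=1 (the default); run them otherwise
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Prompt on stderr so only the answer is captured on stdout
ask() {
    printf '%s: ' "$1" >&2
    read -r answer
    echo "$answer"
}

# add_oracle_resources [GROUP SID OWNER ORACLE_HOME LISTENER]
# Any value not given on the command line is prompted for.
add_oracle_resources() {
    grp=${1:-$(ask "Service group")}
    sid=${2:-$(ask "Oracle SID")}
    owner=${3:-$(ask "Oracle owner")}
    home=${4:-$(ask "Oracle home")}
    lsnr=${5:-$(ask "Listener name")}
    run haconf -makerw
    run hares -add "ora_$sid" Oracle "$grp"
    run hares -modify "ora_$sid" Sid "$sid"
    run hares -modify "ora_$sid" Owner "$owner"
    run hares -modify "ora_$sid" Home "$home"
    run hares -add "lsnr_$lsnr" Netlsnr "$grp"
    run hares -modify "lsnr_$lsnr" Owner "$owner"
    run hares -modify "lsnr_$lsnr" Home "$home"
    run hares -modify "lsnr_$lsnr" Listener "$lsnr"
    # Links to the group's storage/IP resources (hares -link) are not handled here
    run hares -modify "ora_$sid" Enabled 1
    run hares -modify "lsnr_$lsnr" Enabled 1
    run haconf -dump -makero
}

# remove_oracle_resources SID LISTENER (take the resources offline first)
remove_oracle_resources() {
    run haconf -makerw
    run hares -delete "ora_$1"
    run hares -delete "lsnr_$2"
    run haconf -dump -makero
}
```

If several databases share one listener, the listener part would need to be skipped when the resource already exists.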
Mike