I have a support case open now, and I've narrowed the problem down to specific, repeatable behavior. It has to do with the clean script... but the clean script is written correctly.
I've simplified things and am only dealing with apache now. Scripts are below. I've modified the clean script to be a little smarter, so it doesn't try to remove pids that are already gone and things like that.
If I offline apache with the init script, or by causing a fault (rm the pid file), Veritas complains that the clean script failed. If I run the clean script manually, I get a return code of 0.
Veritas does run the clean script successfully in the sense that apache actually ends up offline, but it doesn't like the return code it's receiving (I'm assuming 9, from the 0x9). If I change the clean script to exit immediately (echo hi; exit 0), VCS marks the node as faulted and starts apache on the other node.
I don't know what's wrong with the clean script. In fact, the original script worked fine with 5.0.3. Support told me that 5.1 changed the Application agent quite a bit, which probably accounts for the new behavior. I'm also a little suspicious of the pid that gets written in the logs below (it changes on every run); I'm not sure whether that's causing any issues.
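For illustration only (this is not the clean script discussed here, which is a shell script), the "skip pids that are already gone" check can be sketched in Python as below. The helper names `read_pidfile`, `pid_is_alive` and `pids_to_kill` are hypothetical; the pid-file path would be whatever the resource uses, e.g. /var/run/httpd.pid as in the log.

```python
import os
from typing import List, Optional


def read_pidfile(path: str) -> Optional[int]:
    """Return the pid stored in `path`, or None if the file is missing or bogus."""
    try:
        with open(path) as f:
            pid = int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return None
    # A pid <= 0 would make kill() target a process group, so treat it as invalid.
    return pid if pid > 0 else None


def pid_is_alive(pid: int) -> bool:
    """Signal 0 only performs error checking; nothing is delivered to the process."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # ESRCH: no such process
    except PermissionError:
        return True  # EPERM: process exists but is owned by another user
    return True


def pids_to_kill(pidfile: str) -> List[int]:
    """Hypothetical helper: return the pid only if the file exists and the process runs."""
    pid = read_pidfile(pidfile)
    if pid is None or not pid_is_alive(pid):
        return []
    return [pid]
```

Checking liveness first keeps a clean run from failing just because apache already died and took its pid with it.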
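One thing worth ruling out before assuming the script returned 9: on POSIX systems a process either exits with a status or is terminated by a signal, and signal 9 is SIGKILL. The wording "abnormally terminated" may point to the second case (an assumption, not confirmed by the log). This small Python sketch, with a hypothetical helper `describe`, shows how a parent process can tell the two apart.

```python
import subprocess


def describe(cmd: str) -> str:
    """Run `cmd` under /bin/sh and report how it ended."""
    rc = subprocess.run(["/bin/sh", "-c", cmd]).returncode
    if rc < 0:
        # On POSIX, subprocess reports termination by signal N as returncode -N.
        return f"killed by signal {-rc}"
    return f"exited with status {rc}"


if __name__ == "__main__":
    print(describe("exit 9"))      # a script that really returns 9
    print(describe("kill -9 $$"))  # a shell that gets SIGKILLed (here, by itself)
```

Run directly, this prints "exited with status 9" followed by "killed by signal 9": the same number, but two very different situations.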
2011/04/01 15:24:57 VCS ERROR V-16-2-13040 (linopstfg02.prod.domain.com) Resource(teamforge_app_server): Program(/opt/VRTSvcs/bin/Application/clean) was abnormally terminated with the exit code(0x9).
2011/04/01 15:24:57 VCS ERROR V-16-2-13069 (linopstfg02.prod.domain.com) Resource(teamforge_app_server) - clean failed.
2011/04/01 15:25:58 VCS INFO V-16-2-13716 (linopstfg02.prod.domain.com) Resource(teamforge_app_server): Output of the completed operation (clean)
==============================================
25595
0
==============================================
2011/04/01 15:25:58 VCS ERROR V-16-2-13040 (linopstfg02.prod.domain.com) Resource(teamforge_app_server): Program(/opt/VRTSvcs/bin/Application/clean) was abnormally terminated with the exit code(0x9).
2011/04/01 15:26:59 VCS INFO V-16-2-13716 (linopstfg02.prod.domain.com) Resource(teamforge_app_server): Output of the completed operation (clean)
==============================================
25642
0
==============================================
2011/04/01 15:26:59 VCS ERROR V-16-2-13040 (linopstfg02.prod.domain.com) Resource(teamforge_app_server): Program(/opt/VRTSvcs/bin/Application/clean) was abnormally terminated with the exit code(0x9).
2011/04/01 15:28:00 VCS INFO V-16-2-13716 (linopstfg02.prod.domain.com) Resource(teamforge_app_server): Output of the completed operation (clean)
==============================================
25739
0
==============================================
2011/04/01 15:28:00 VCS ERROR V-16-2-13040 (linopstfg02.prod.domain.com) Resource(teamforge_app_server): Program(/opt/VRTSvcs/bin/Application/clean) was abnormally terminated with the exit code(0x9).
2011/04/01 15:29:02 VCS INFO V-16-2-13716 (linopstfg02.prod.domain.com) Resource(teamforge_app_server): Output of the completed operation (clean)
==============================================
25847
0
==============================================
2011/04/01 15:29:02 VCS ERROR V-16-2-13040 (linopstfg02.prod.domain.com) Resource(teamforge_app_server): Program(/opt/VRTSvcs/bin/Application/clean) was abnormally terminated with the exit code(0x9).
2011/04/01 15:30:03 VCS INFO V-16-2-13716 (linopstfg02.prod.domain.com) Resource(teamforge_app_server): Output of the completed operation (clean)
==============================================
25910
0
==============================================
2011/04/01 15:30:03 VCS ERROR V-16-2-13040 (linopstfg02.prod.domain.com) Resource(teamforge_app_server): Program(/opt/VRTSvcs/bin/Application/clean) was abnormally terminated with the exit code(0x9).
2011/04/01 15:30:12 VCS INFO V-16-1-50135 User vcsadmin fired command: MSG_RES_PROBE teamforge_app_server linopstfg02.prod.domain.com from 10.17.32.130
2011/04/01 15:30:12 VCS INFO V-16-10031-504 (linopstfg02.prod.domain.com) Application:teamforge_app_server:clean:Executed /usr/local/bin/apachekill as user root
2011/04/01 15:30:22 VCS WARNING V-16-10031-542 (linopstfg02.prod.domain.com) Application:teamforge_app_server:clean:PidFile </var/run/httpd.pid> does not exist, process will not be killed
2011/04/01 15:30:23 VCS INFO V-16-2-13716 (linopstfg02.prod.domain.com) Resource(teamforge_app_server): Output of the completed operation (clean)
==============================================
hi <---- this is me echoing "hi" and exiting 0 immediately, which basically stops the clean script from doing anything it's supposed to do
==============================================
2011/04/01 15:30:23 VCS INFO V-16-2-13078 (linopstfg02.prod.domain.com) Resource(teamforge_app_server) - clean completed successfully after 6 failed attempts.
2011/04/01 15:30:23 VCS INFO V-16-1-10307 Resource teamforge_app_server (Owner: Unspecified, Group: VM_VCS_Teamforge) is offline on linopstfg02.prod.domain.com (Not initiated by VCS)
2011/04/01 15:30:23 VCS NOTICE V-16-1-10300 Initiating Offline of Resource vcs_teamforge_vip (Owner: Unspecified, Group: VM_VCS_Teamforge) on System linopstfg02.prod.domain.com
2011/04/01 15:30:24 VCS INFO V-16-1-10305 Resource vcs_teamforge_vip (Owner: Unspecified, Group: VM_VCS_Teamforge) is offline on linopstfg02.prod.domain.com (VCS initiated)
2011/04/01 15:30:24 VCS ERROR V-16-1-10205 Group VM_VCS_Teamforge is faulted on system linopstfg02.prod.domain.com
2011/04/01 15:30:24 VCS NOTICE V-16-1-10446 Group VM_VCS_Teamforge is offline on system linopstfg02.prod.domain.com
2011/04/01 15:30:24 VCS INFO V-16-1-10493 Evaluating linopstfg01.prod.domain.com as potential target node for group VM_VCS_Teamforge
2011/04/01 15:30:24 VCS INFO V-16-1-10493 Evaluating linopstfg02.prod.domain.com as potential target node for group VM_VCS_Teamforge
2011/04/01 15:30:24 VCS INFO V-16-1-50010 Group VM_VCS_Teamforge is online or faulted on system linopstfg02.prod.domain.com
2011/04/01 15:30:24 VCS NOTICE V-16-1-10301 Initiating Online of Resource vcs_teamforge_vip (Owner: Unspecified, Group: VM_VCS_Teamforge) on System linopstfg01.prod.domain.com
2011/04/01 15:30:32 VCS INFO V-16-1-10298 Resource vcs_teamforge_vip (Owner: Unspecified, Group: VM_VCS_Teamforge) is online on linopstfg01.prod.domain.com (VCS initiated)
2011/04/01 15:30:32 VCS NOTICE V-16-1-10301 Initiating Online of Resource teamforge_app_server (Owner: Unspecified, Group: VM_VCS_Teamforge) on System linopstfg01.prod.domain.com
2011/04/01 15:30:32 VCS INFO V-16-10031-504 (linopstfg01.prod.domain.com) Application:teamforge_app_server:online:Executed /usr/local/bin/apachestart as user root
2011/04/01 15:30:33 VCS INFO V-16-2-13716 (linopstfg01.prod.domain.com) Resource(teamforge_app_server): Output of the completed operation (online)
==============================================
Starting httpd: [ OK ]
==============================================
2011/04/01 15:30:33 VCS INFO V-16-1-10298 Resource teamforge_app_server (Owner: Unspecified, Group: VM_VCS_Teamforge) is online on linopstfg01.prod.domain.com (VCS initiated)
2011/04/01 15:30:33 VCS NOTICE V-16-1-10447 Group VM_VCS_Teamforge is online on system linopstfg01.prod.domain.com
2011/04/01 15:30:33 VCS NOTICE V-16-1-10448 Group VM_VCS_Teamforge failed over to system linopstfg01.prod.domain.com