Kevin_Helmut
19 years ago, Level 2
The hastatus shows OFFLINE|FAULTED
Hi Folks,
We recently installed VCS 4.1 on a two-node cluster (mars / venus). We implemented the clean, monitor, offline and online scripts in Perl.
So far so good. We created the Service Group (APP-RG) with the following resources (IP, DiskGroup, Volume, MountPoints), plus another Application resource whose type looks like this:
# cat AppSrvTypes.cf
type APPSrv (
    static int MonitorInterval = 180
    static int MonitorTimeout = 180
    static int OnlineRetryLimit = 1
    static int OnlineWaitLimit = 1
    static int RestartLimit = 2
    static str ArgList[] = { State, InstanceName, LogHostName, PrtStatus, DebugMode }
    str InstanceName
    str LogHostName
    str PrtStatus
    str DebugMode
)
Then I created the required dependencies. Now over to testing:
Very Simple Test:
Switch the Service Group APP-RG from mars to venus and vice versa:
./hagrp -switch APP-RG -to mars
./hagrp -switch APP-RG -to venus
Everything works fine: I was able to switch the group over and verify this via ./hagui and ./hastatus.
Second Test: (Forced Failover)
This is where the problem arises. As you can see above, our AppSrvTypes.cf has
static int RestartLimit = 2. So to force a failover, do we have to kill a monitored process more than twice?
We tried to induce a failover by killing one of the monitored processes more than twice. The failover is triggered :). But :( the sad thing is that when I check ./hastatus -sum, the node on which the process was killed shows the state OFFLINE|FAULTED.
For example, the Service Group is currently on mars and we force the failover by killing the process on mars more than twice. This results in a failover to node venus. Even though the Service Group is ONLINE on venus, its status on mars is OFFLINE|FAULTED.
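To make the question about RestartLimit concrete, here is a toy model of the behaviour as I understand it. It is only a sketch under an assumption: the agent restarts the resource in place up to RestartLimit times, and only the next failure marks the resource FAULTED and triggers the failover. The function name simulate_faults is hypothetical, and the model ignores any time window over which VCS may reset the restart count.

```python
def simulate_faults(restart_limit, kills):
    """Toy model: return the resource state after `kills` consecutive failures.

    Assumption (not verified against VCS 4.1): the agent restarts the
    resource locally up to `restart_limit` times; the next failure after
    that marks the resource FAULTED.
    """
    restarts_used = 0
    for _ in range(kills):
        if restarts_used < restart_limit:
            restarts_used += 1   # agent restarts the resource on the same node
        else:
            return "FAULTED"     # limit exhausted: resource faults, group fails over
    return "ONLINE"


if __name__ == "__main__":
    for kills in (1, 2, 3):
        print(kills, simulate_faults(restart_limit=2, kills=kills))
```

Under this assumption, with RestartLimit = 2 the first two kills are absorbed by local restarts and the third one faults the resource on that node.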
Please help me, as we're planning to upgrade to VCS 5.0 eventually.
Are there any other parameters that need to be set? Or is our application busted?
# /opt/VRTSvcs/bin/hastatus -sum
-- SYSTEM STATE
-- System          State      Frozen
A  mars            RUNNING    0
A  venus           RUNNING    0
-- GROUP STATE
-- Group           System     Probed     AutoDisabled    State
B  ClusterService  mars       Y          N               OFFLINE
B  ClusterService  venus      Y          N               ONLINE
B  APP-RG          mars       Y          N               OFFLINE|FAULTED
B  APP-RG          venus      Y          N               ONLINE
-- RESOURCES FAILED
-- Group           Type       Resource                       System
C  APP-RG          AppSrv     application_server_resource    mars
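For anyone who wants to check this state from a script, a minimal sketch of picking the faulted groups out of the summary follows. It assumes the group rows keep the layout shown above (a leading "B", then group, system, probed, auto-disabled and state as whitespace-separated fields). The function name faulted_groups and the SAMPLE text are illustrative only, not part of VCS.

```python
def faulted_groups(summary):
    """Return (group, system) pairs whose state includes FAULTED.

    Assumes group rows look like:
    B <group> <system> <probed> <autodisabled> <state>
    """
    result = []
    for line in summary.splitlines():
        fields = line.split()
        if len(fields) == 6 and fields[0] == "B":
            # State may combine several flags, e.g. OFFLINE|FAULTED
            if "FAULTED" in fields[5].split("|"):
                result.append((fields[1], fields[2]))
    return result


# Sample text copied from the GROUP STATE part of the output above
SAMPLE = """\
-- GROUP STATE
-- Group System Probed AutoDisabled State
B ClusterService mars Y N OFFLINE
B ClusterService venus Y N ONLINE
B APP-RG mars Y N OFFLINE|FAULTED
B APP-RG venus Y N ONLINE
"""

if __name__ == "__main__":
    print(faulted_groups(SAMPLE))
```

On the sample above this reports APP-RG on mars as the only faulted group, which matches what the RESOURCES FAILED section shows.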
Thanks very much in advance.
-Message was edited by:
DURGA TIRUNAGARI