system state FAULTED

allaboutunix
Level 6
> Hi,
>
> I am running VCS 6.0 on Solaris 10 on a 2-node cluster. All the nodes
> were rebooted as part of a scheduled job, and during the reboot I got a
> number of messages in the engine log that I am trying to understand. The
> sequence of events in the log is:
>
> 1) All nodes showed a jeopardy state for a moment after boot. Why? Was
> one of the links down for a moment just after booting?
> 2) After that, the log says the system changed state from RUNNING to
> FAULTED. I have never seen a system go to the FAULTED state. Why did it
> go to that state?
> 3) After this, the log shows that service groups became autodisabled on
> all these nodes.
> 4) After this: System (hostname) is in Down State - Membership: 0x4a
> 5) VCS:10451:Cleared attribute-'autodisabled' for Group  on node. Does
> the autodisabled flag get cleared on its own? Many times I have faced
> situations where I had to clear the autodisabled flag for a service
> group manually.
>
> Does anybody know about these messages and what could be the reason
> behind them, specifically the system going to the FAULTED state and the
> service groups getting autodisabled?
>
> Thanks
>
1 ACCEPTED SOLUTION

Gaurav_S
Moderator
   VIP    Certified

Hi,

Simply put, a system goes to the FAULTED state when the "had" process on that system faults. In this case GAB membership was formed, but port "h" membership could not form, which indicates that the "had" process failed to start.

You are correct about jeopardy: it indicates that one of the heartbeat links was down.
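
The port "h" and jeopardy checks described above can be sketched as a small shell helper. This is only an illustration: it assumes the usual `gabconfig -a` layout, in which each membership line starts with `Port <letter> gen` and a degraded port adds a line containing the word `jeopardy`. The function name `gab_summary` is made up for this example.

```shell
# Summarize GAB port status from `gabconfig -a` output read on stdin.
# Assumption: lines look like "Port h gen fd570002 membership 01".
gab_summary() {
    out=$(cat)

    # Port h is the port HAD registers on; no line means HAD has not joined.
    if printf '%s\n' "$out" | grep '^Port h gen .* membership' >/dev/null; then
        echo "port_h=up"
    else
        echo "port_h=missing"
    fi

    # Any port reporting jeopardy means a heartbeat link is down.
    if printf '%s\n' "$out" | grep '^Port . gen .* jeopardy' >/dev/null; then
        echo "jeopardy=yes"
    else
        echo "jeopardy=no"
    fi
}

# On a cluster node (not run here):
#   gabconfig -a | gab_summary
```

If jeopardy is reported, `lltstat -nvv` on the same node shows which LLT link is down.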

When the system went from RUNNING to FAULTED, it means the "had" process faulted. engine_A.log should tell you what happened during that period. There can be many reasons for HAD faulting.
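
One way to pull the relevant lines out of the engine log is sketched below. It assumes the log is in the usual location, /var/VRTSvcs/log/engine_A.log, and that your grep supports -E (on Solaris 10 that may mean /usr/xpg4/bin/grep). The function name `engine_events` and the search phrases are illustrative, taken from the messages quoted in the question.

```shell
# Print engine-log lines about state changes, jeopardy, and autodisable.
# $1 is the log file; the default is an assumed standard VCS log path.
engine_events() {
    log=${1:-/var/VRTSvcs/log/engine_A.log}
    grep -i -E 'changed state from|jeopardy|autodisabled' "$log"
}

# On a cluster node (not run here):
#   engine_events | tail -50
```

Reading these lines in timestamp order around the reboot should show what happened just before the system was marked FAULTED.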

Because HAD went down on one node, the resources on that system were not probed and the groups went into the autodisabled state, so what you are seeing is a result of what happened above.

As per the VCS guide, here is the description of the autodisabled flag:

When VCS does not know the status of a service group on a particular system, it
autodisables the service group on that system. Autodisabling occurs under the
following conditions:
■ When the VCS engine, HAD, is not running on the system.
Under these conditions, all service groups that include the system in their
SystemList attribute are autodisabled. This does not apply to systems that are
powered off.
■ When all resources within the service group are not probed on the system.
Recommended Action: Use the output of the command
# hagrp -display service_group
to verify the value of the AutoDisabled attribute.
Warning: To bring a group online manually after VCS has autodisabled the group,
make sure that the group is not fully or partially active on any system that has the
AutoDisabled attribute set to 1 by VCS. Specifically, verify that all resources that
may be corrupted by being active on multiple systems are brought down on the
designated systems. Then, clear the AutoDisabled attribute for each system:
# hagrp -autoenable service_group -sys system
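
The check-then-clear sequence from the guide can be sketched as follows. This is a rough illustration, assuming that `hagrp -display` prints rows in the form "#Group Attribute System Value"; the helper name `autodisabled_systems`, the group name `appsg`, and the system name `sysA` are hypothetical.

```shell
# List the systems on which a group is autodisabled.
# Reads `hagrp -display <group> -attribute AutoDisabled` output on stdin.
# Assumption: columns are "#Group Attribute System Value".
autodisabled_systems() {
    awk '$2 == "AutoDisabled" && $4 == 1 { print $3 }'
}

# On a cluster node (not run here), with a hypothetical group "appsg":
#   hagrp -display appsg -attribute AutoDisabled | autodisabled_systems
#   hagrp -state appsg                  # confirm it is not active anywhere
#   hagrp -autoenable appsg -sys sysA   # then clear the flag per system
```

The state check is kept as a separate manual step on purpose, since the guide's warning is about clearing the flag while the group may still be active on another system.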

 

G

 

