Forum Discussion

Sundar_Rajan
16 years ago

unexpected offline of a resource is not logged as FAULTED in the engine log

 
Here is a simple test.
I have a FileOnOff resource. I delete the file, and the resource becomes faulted.
The engine log has no mention of the resource being faulted, although the clean action is taken.

It does not matter whether the resource is critical or not, nor what type the resource is.
This happens in 4.0, 4.1 and 5.0 as well.

But when the DBG_TRACE tag is added, the log shows RESOURCE FAULTED.

Steps to reproduce the issue:
1. Create a service group (SG) with one FileOnOff resource.
2. Online the SG.
3. rm the configured file.
4. Check the engine log.

Add the tag: halog -addtags DBG_TRACE

Now perform the same operation and you can see the difference.
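The two runs can be scripted so that they are repeated identically. The following is only a sketch in Python, assuming a service group and a FileOnOff resource already exist; the names `repro_commands` and `run`, the example values `testgrp`, `fileon` and `/tmp/file1`, and the 130-second wait are illustrative assumptions, not part of VCS. It uses only commands that appear in this thread and, by default, just prints them instead of executing anything.

```python
import subprocess
import time


def repro_commands(group, resource, path, debug_trace=False):
    """Return the command lines for one reproduction run, in order."""
    cmds = []
    if debug_trace:
        # Second run only: enable the extra log tag first.
        cmds.append(["halog", "-addtags", "DBG_TRACE"])
    # Clear any fault left over from a previous run.
    cmds.append(["hares", "-clear", resource])
    # Bring the group online, then remove the file behind VCS's back.
    cmds.append(["hagrp", "-online", group, "-any"])
    cmds.append(["rm", path])
    # Show the resource state afterwards.
    cmds.append(["hares", "-display", resource])
    return cmds


def run(cmds, dry_run=True, wait_seconds=130):
    """Print each command; execute it only when dry_run is False."""
    for cmd in cmds:
        print(" ".join(cmd))
        if dry_run:
            continue
        subprocess.run(cmd, check=False)
        if cmd[0] in ("hagrp", "rm"):
            # Assumption: give the agent time for at least one monitor
            # cycle; the right value depends on the cluster settings.
            time.sleep(wait_seconds)


if __name__ == "__main__":
    # Hypothetical names; adjust to the actual configuration.
    run(repro_commands("testgrp", "fileon", "/tmp/file1"))
    run(repro_commands("testgrp", "fileon", "/tmp/file1", debug_trace=True))
```

After each real run (dry_run=False), compare the new entries in the engine log by hand.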

I noticed that the faulted message appears in the engine log only when the monitor times out.



From the user's guide it is not that clear whether it will log the FAULTED message or not:

VCS considers a resource faulted in the following situations:
■ When the resource state changes unexpectedly. For example, an online
resource going offline. <<<< 

■ When a required state change does not occur. For example, a resource failing
to go online or offline when commanded to do so.
In many situations, VCS agents take predefined actions to correct the issue
before reporting resource failure to the engine. For example, the agent may try
to bring a resource online several times before declaring a fault.
When a resource faults, VCS takes automated actions to “clean up” the faulted
resource. The Clean function makes sure the resource is completely shut down
before bringing it online on another node. This prevents concurrency violations.
When a resource faults, VCS takes all resources dependent on the faulted
resource offline. The fault is thus propagated in the service group.



  • Hello,

    Just to clarify, the VCS agent will surely take all the actions before declaring it faulted.

    So if the file is deleted, the agent will detect that something happened outside of VCS (here you should see a message in the log like "resource became OFFLINE unexpectedly, on its own", followed by "resource is offline (Not initiated by VCS)"). I wouldn't expect VCS to declare the fault until the next step mentioned below is completed.

    As soon as VCS detects it, it should first complete 4 monitor cycles (by default) and then call for a clean action. If 2 attempts of clean also fail, the resource should be declared faulted.

    Are you saying that even after clean is called, the agent is not faulting the resource?

    Gaurav
  • Thanks, Gaurav, for your reply.
    I could not copy the entire log.
    This is what happens.

    1. The agent detects that the resource went offline unexpectedly.
    2. Then it calls the clean action.

    When we do a hares -display, it shows the resource state as "FAULTED"
    and the SG state goes to "ONLINE|PARTIAL".

    What is not happening is that the message "RESOURCE FAULTED" is not logged
    in the engine log. It appears only when "halog -addtags DBG_TRACE" is set.

    Everything works as designed, except that the logging does not happen at the default log level
    when the resource goes from ONLINE to UNEXPECTED OFFLINE.
  • [root@localhost ~]# halog -info
    Log on sundar:
    path = /var/VRTSvcs/log/engine_A.log
    maxsize = 33554432 bytes
    tags =
    flushtags =

    [root@localhost ~]# cat /etc/VRTSvcs/conf/config/main.cf
    include "vcsApacheTypes.cf"
    include "types.cf"

    cluster sun (
    UserNames = { admin = gNOgNInKOjOOmWOiNL }
    Administrators = { admin }
    CounterInterval = 5
    )

    system sundar (
    )

    group testgrp (
    SystemList = { sundar = 0 }
    AutoStartList = { sundar }
    )

    FileOnOff fileon (
    Critical = 0
    PathName = "/tmp/file1"
    )



    // resource dependency tree
    //
    // group testgrp
    // {
    // FileOnOff fileon
    // }

    2009/06/12 18:22:22 VCS NOTICE V-16-1-10446 Group testgrp is offline on system sundar
    2009/06/12 18:22:22 VCS NOTICE V-16-1-10301 Initiating Online of Resource fileon (Owner: unknown, Group: testgrp) on System sundar
    2009/06/12 18:22:22 VCS INFO V-16-1-10298 Resource fileon (Owner: unknown, Group: testgrp) is online on sundar (VCS initiated)
    2009/06/12 18:22:22 VCS NOTICE V-16-1-10447 Group testgrp is online on system sundar
    2009/06/12 18:22:24 VCS INFO V-16-6-15004 (sundar) hatrigger:Failed to send trigger for postoffline; script doesn't exist
    #########START unexpected offline #####################################

    2009/06/12 18:24:23 VCS ERROR V-16-2-13067 (sundar) Agent is calling clean for resource(fileon) because the resource became OFFLINE unexpectedly, on its own.
    2009/06/12 18:24:23 VCS INFO V-16-2-13068 (sundar) Resource(fileon) - clean completed successfully.
    2009/06/12 18:24:23 VCS INFO V-16-1-10307 Resource fileon (Owner: unknown, Group: testgrp) is offline on sundar (Not initiated by VCS)
    2009/06/12 18:24:23 VCS ERROR V-16-1-10212 TargetCount dropped below zero; setting to zero
    2009/06/12 18:24:23 VCS NOTICE V-16-1-10446 Group testgrp is offline on system sundar
    2009/06/12 18:24:23 VCS INFO V-16-6-15004 (sundar) hatrigger:Failed to send trigger for resfault; script doesn't exist
    2009/06/12 18:24:23 VCS INFO V-16-6-15004 (sundar) hatrigger:Failed to send trigger for postoffline; script doesn't exist

    [root@localhost tmp]# hares -display fileon
    #Resource Attribute System Value
    fileon Group global testgrp
    fileon Type global FileOnOff
    fileon AutoStart global 1
    fileon Critical global 0
    fileon Enabled global 1
    fileon LastOnline global sundar
    fileon MonitorOnly global 0
    fileon ResourceOwner global unknown
    fileon TriggerEvent global 0
    fileon ArgListValues sundar /tmp/file1
    fileon ConfidenceLevel sundar 0
    fileon Flags sundar
    fileon IState sundar not waiting
    fileon Probed sundar 1
    fileon Start sundar 1
    fileon State sundar FAULTED  <<<<<<<<<<<<<<<<<<<<<<<<<<<
    fileon ComputeStats global 0
    fileon PathName global /tmp/file1
    fileon ResourceInfo global State Stale Msg TS
    fileon MonitorTimeStats sundar Avg 0 TS
    [root@localhost tmp]#

    After adding the tags:
    [root@localhost tmp]# hares -clear fileon
    [root@localhost tmp]# halog -addtags DBG_TRACE
    [root@localhost tmp]# hagrp -online testgrp -any
    VCS NOTICE V-16-1-50735 Attempting to online group on system sundar
    [root@localhost tmp]#

    ############## After adding the tag DBG_TRACE #################################

    2009/06/12 18:30:21 VCS ERROR V-16-2-13067 (sundar) Agent is calling clean for resource(fileon) because the resource became OFFLINE unexpectedly, on its own.
    2009/06/12 18:30:21 VCS INFO V-16-2-13068 (sundar) Resource(fileon) - clean completed successfully.
    2009/06/12 18:30:21 VCS INFO V-16-1-10307 Resource fileon (Owner: unknown, Group: testgrp) is offline on sundar (Not initiated by VCS)
    2009/06/12 18:30:21 VCS DBG_TRACE V-16-50-0 *** RESOURCE FAULTED (unexpected offline): fileon (node: sundar)
    Resource.C:perform_is_offline[7383]
    2009/06/12 18:30:21 VCS DBG_TRACE V-16-50-0 Trigger sent on node sundar; '"/opt/VRTSvcs/bin/hatrigger" -resfault 0 sundar fileon ONLINE'
    System.C:invoke_trigger[6631]
    2009/06/12 18:30:21 VCS DBG_TRACE V-16-50-0 Received message 6=Resource has faulted in state 11
    Note.C:fill_notifier_trap[1163]
    2009/06/12 18:30:21 VCS DBG_TRACE V-16-50-0 fileon::state transition from ONLINE to FAULTED

    Resource.C:set_local_state[5666]
    2009/06/12 18:30:21 VCS DBG_TRACE V-16-50-0 Decrementing ActiveCount (prevval=1) by 1 for resource fileon on node sundar
    Resource.C:set_local_state[5724]
    2009/06/12 18:30:21 VCS DBG_TRACE V-16-50-0 Modifying CurrentCount (prevval=1) by -1 for testgrp
    Group.C:update_notify[11244]
    2009/06/12 18:30:21 VCS DBG_TRACE V-16-50-0 Modifying TargetCount (prevval=1) by -1 for testgrp
    Group.C:update_notify[11244]
    2009/06/12 18:30:21 VCS DBG_TRACE V-16-50-0 Trigger sent on node sundar; '"/opt/VRTSvcs/bin/hatrigger" -postoffline 0 sundar testgrp'
    System.C:invoke_trigger[6631]
    2009/06/12 18:30:21 VCS DBG_TRACE V-16-50-0 Received message 10=Service group is offline in state 11
    Note.C:fill_notifier_trap[1163]
    2009/06/12 18:30:21 VCS NOTICE V-16-1-10446 Group testgrp is offline on system sundar
    2009/06/12 18:30:21 VCS INFO V-16-6-15004 (sundar) hatrigger:Failed to send trigger for resfault; script doesn't exist
    2009/06/12 18:30:21 VCS INFO V-16-6-15004 (sundar) hatrigger:Failed to send trigger for postoffline; script doesn't exist


    As you can see from the above, it looks like a bug in reporting the state of the resource in engine_A.log.
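
Until the logging is changed, the unexpected offline can still be recognised at the default log level from the two messages that do get written. This is a minimal sketch in Python, assuming the message IDs V-16-2-13067 and V-16-1-10307 and the line layout seen in the excerpts above hold for other versions as well; the function names are illustrative and not part of VCS.

```python
import re

# Layout of an engine log line as it appears in the excerpts above:
# date, time, "VCS", severity, message ID, free text.
LINE_RE = re.compile(
    r"^(?P<date>\d{4}/\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"VCS (?P<severity>\S+) (?P<msgid>V-\d+-\d+-\d+) (?P<text>.*)$"
)


def parse_line(line):
    """Split one engine log line into its fields, or return None."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def find_unexpected_offline(lines):
    """Return the parsed entries that indicate an unexpected offline."""
    hits = []
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue  # e.g. the source-file lines printed under DBG_TRACE
        if entry["msgid"] == "V-16-2-13067":
            hits.append(entry)
        elif (entry["msgid"] == "V-16-1-10307"
              and "(Not initiated by VCS)" in entry["text"]):
            hits.append(entry)
    return hits


def has_fault_line(lines):
    """True if an explicit RESOURCE FAULTED message is present."""
    return any("RESOURCE FAULTED" in line for line in lines)


if __name__ == "__main__":
    # Path as reported by halog -info above.
    with open("/var/VRTSvcs/log/engine_A.log") as log:
        lines = log.readlines()
    for hit in find_unexpected_offline(lines):
        print(hit["date"], hit["time"], hit["msgid"], hit["text"])
    print("explicit RESOURCE FAULTED line present:", has_fault_line(lines))
```

The check on V-16-1-10307 also looks for the "(Not initiated by VCS)" text because it is not confirmed here whether the same message ID is used for other kinds of offline.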

    -Sundar
  • Hi Sundar,

    Well, it looks to be a bug. It can be reported to Technical Support by raising a case.

    I don't see any obvious reason for it being reported as an unexpected offline. Just a very rough guess: can you try creating the file in some other directory (not /tmp)? It shouldn't make a difference, but the /tmp directory in Solaris has the sticky bit set, in case that is affecting the agent somewhere.

    Gaurav
  • Thanks, Gaurav. The type of agent you use does not matter; the behaviour is the same.

    Thanks for your time.