Forum Discussion

Elvis_L_
Level 3
14 years ago

VCS Warning for Unknown State

Hi,

I am just curious why I received a Warning notification for the Netlsnr resource group when the error was not logged in engine_A.log.

I have read the VCS documentation, but the only hint I found is this:

Resource state is unknown. Warning VCS cannot identify the state of the
resource.

Can anyone provide a better explanation of what could have caused VCS to send the warning email?

  • Hi Elvis L,

    The monitor entry point of every resource has basically 3 return values for the state of a resource: Online, Offline, or Unknown.  If the monitor entry point is not able to determine whether a resource is Online or Offline, it returns Unknown.

    The Unknown warning is just to let you know that VCS was unable to determine the state of the resource.  Because the state of the resource is unknown to VCS, VCS will not be able to control the resource.  In other words, VCS cannot online or offline a resource that is in an Unknown state.

    Most of the time nothing is reported in the engine log for this.  You might find something in the agent-specific log on the node that was having problems probing the resource, but you might not.  If you don't, try increasing the LogDbg settings on that resource to increase logging and wait for the issue to reoccur.

    Thanks,

    Wally

     

    FYI - There are actually 10 return codes for the Online state, with increasing levels of confidence in the Online state of a given resource.  The 3 return states that I mentioned are for simplicity.

  • Hi Elvis,

    This information is documented in the VCS 5.1 SP2 Agent Developer's Guide (for Windows), which can be downloaded from the Symantec Operations Readiness site, https://sort.symantec.com/documents/doc_details/sfha/5.1%20SP2/Windows/ProductGuides/. The name of the guide (if you search on the page) is "Veritas Cluster Server 5.1 SP2 Agent Developer's Guide". The information you are looking for is on page 34 (please excuse the formatting as this is a copy-and-paste):

    Entry Point: Monitor

    Return Values, C++ based (returns ResStateValues):

    • VCSAgResOnline
    • VCSAgResOffline
    • VCSAgResUnknown
    • VCSAgResIntentionalOffline

    Return Values, script based (exit values):

    • 99 - Unknown
    • 100 - Offline
    • 101-110 - Online
    • 200 - Intentional Offline
    • Other values - Unknown

    I hope that this answers your question.

    Daniel
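
    For readers who find the exit-value list easier to follow as code, here is a minimal Python sketch of that mapping. It is only an illustration, not agent code: the function name `classify_monitor_exit` is made up for this example, and the 10%-per-step confidence scale for 101-110 is an assumption based on the "increasing levels of confidence" remark earlier in the thread.

```python
from typing import Optional, Tuple


def classify_monitor_exit(code: int) -> Tuple[str, Optional[int]]:
    """Map a script-based monitor exit value to (state, confidence_percent).

    Hypothetical helper for illustration; VCS does this inside the agent
    framework, not through a function with this name.
    """
    if code == 100:
        return ("Offline", None)
    if 101 <= code <= 110:
        # Assumption: confidence rises in 10% steps, 101 -> 10 ... 110 -> 100.
        return ("Online", (code - 100) * 10)
    if code == 200:
        return ("Intentional Offline", None)
    # 99 and every value not listed above are reported as Unknown.
    return ("Unknown", None)


if __name__ == "__main__":
    for code in (99, 100, 101, 110, 200, 1):
        print(code, classify_monitor_exit(code))
```

    The point to notice is the last branch: any exit value the framework does not recognise ends up as Unknown, which is one way a warning can appear without an obvious error message next to it.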

     

  • Can you post your Netlsnr resource config from main.cf?  If you have set it up with the defaults, then all the agent does is look for the lsnr process, so the monitor should not have any problems; and if the CPU had spiked so severely that the system couldn't get a process listing, I would expect a lot of agents to have monitor problems.  If you have configured advanced monitoring, then a slow database can cause the monitor to time out.  On UNIX, a monitor timeout should always be reported in the engine log, and I am surprised that this is not true on Windows also.  If the monitor times out 4 times in a row, the resource will fault.

     

    Mike
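
    The "4 times in a row" rule can be pictured with a small Python sketch. This is an illustration under stated assumptions, not VCS code: the class name `TimeoutTracker` is hypothetical, the threshold of 4 is taken from the post above, and the sketch assumes that a monitor cycle which completes in time resets the count.

```python
class TimeoutTracker:
    """Count consecutive monitor timeouts for one resource (hypothetical helper)."""

    def __init__(self, threshold: int = 4) -> None:
        self.threshold = threshold  # 4 is the figure quoted in the post
        self.consecutive = 0

    def record(self, timed_out: bool) -> bool:
        """Record one monitor cycle; return True once the resource should fault."""
        if timed_out:
            self.consecutive += 1
        else:
            # Assumption: a monitor cycle that finishes in time resets the streak.
            self.consecutive = 0
        return self.consecutive >= self.threshold


if __name__ == "__main__":
    tracker = TimeoutTracker()
    for cycle, timed_out in enumerate([True, True, False, True, True, True, True], 1):
        print(cycle, timed_out, tracker.record(timed_out))
```

    Under these assumptions a single isolated timeout would surface as an Unknown state for that cycle without faulting the resource, which matches a one-off warning email.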

  • Mike,

    Netlsnr config as posted below.

    Netlsnr OracleLSNR (
    Critical = 0
    Owner = oracle
    Home = "/opt/oracle/product/8.1.7.4.0"
    TnsAdmin = "/var/opt/oracle/network/admin"
    MonScript = "./bin/Netlsnr/LsnrTest.pl"
    )

  • Looking at the agent documentation, it appears the Netlsnr agent changed from 5.0 to 5.1 so that it now does detailed monitoring by default (the Oracle agent still does basic monitoring by default).  So the agent is running the script specified by MonScript, which runs some command to check the health of the listener.  This command probably timed out, in which case the error may be logged to NetLsnr_A.log rather than engine_A.log.

    Mike
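
    To make the timeout path concrete, here is a rough Python sketch of a monitor that runs a health-check command under a time limit and reports Unknown (99) when the command does not finish. It is not the Netlsnr agent's code: `run_monitor` is a hypothetical name, the command passed in stands in for whatever the MonScript actually runs, and mapping a zero exit status to 110 and non-zero to 100 is an assumption made for the example.

```python
import subprocess
import sys

# Script-based exit values quoted earlier in the thread.
RES_UNKNOWN = 99
RES_OFFLINE = 100
RES_ONLINE = 110


def run_monitor(cmd, timeout_s):
    """Run a health-check command and translate the outcome to an exit value."""
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # The check did not finish in time, so the state cannot be determined.
        return RES_UNKNOWN
    except OSError:
        # The command itself could not be started (for example, binary missing).
        return RES_UNKNOWN
    # Assumption: exit status 0 means healthy, anything else means down.
    return RES_ONLINE if result.returncode == 0 else RES_OFFLINE


if __name__ == "__main__":
    slow_check = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_monitor(slow_check, timeout_s=0.5))
```

    A slow listener and a missing binary both come out as Unknown here, so the exit value alone does not say which one happened; only the agent log message would.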

  • Mike,

    I am fairly confident that this was caused by a monitor timeout. But can we assume the logger would also fail to record the event in the Netlsnr log or engine_A.log?

  • If you have no messages in your engine log or Netlsnr log, then, unless /var was full, I would consider this to be a bug and would log a call with Symantec Support.  If you look at the code of the Netlsnr monitor script, there are 3 reasons you could get "Resource State unknown" (this is from the 5.1 agent code, so it may be different for 4.1):

    1. VCSAG_LOG_MSG ("E","Oracle home directory $Home does not exist",1,$Home);
      exit $VCSAG_RES_UNKNOWN;
       
    2. VCSAG_LOG_MSG ( "E" ,"lsnrctl not found in $Home/bin", 4,$Home);
      exit $VCSAG_RES_UNKNOWN;               # Resource state is UNKNOWN
       
    3. VCSAG_LOG_MSG ( "E" , "lsnrctl operation timed out",14);
      exit $VCSAG_RES_UNKNOWN;

     

    I am guessing 1 and 2 are unlikely, so the issue is more than likely 3.  Logging the message to the agent log (Netlsnr log) should be the first thing that happens; the Notifier daemon then gets the message from the queue and sends a notification depending on the severity.
     
    Mike
     
     

  • I believe this is the Perl equivalent of the same script for VCS 5.1:

    sub catch_alrm {
        if ( $AgentDebug == 1 ) {
            VCSAG_LOG_MSG ( "E", "lsnrctl operation timed out", 14 );
        }
        exit (99); # Resource state is UNKNOWN
    }
    #
    $ret = netlsnrlib::check_oracle_client ( $Home, $LSNRMGR );

    if ($ret) {
        VCSAG_LOG_MSG ( "E", "lsnrctl not found in $Home/bin", 4, $Home );
        exit 99; # Resource state is UNKNOWN
    }

    I will certainly log a case with Symantec, as VCS has never exhibited this issue before in our environment, rather than beating around the bush guessing whether it was a timeout or a monitoring lapse.