VCS Warning for Unknown State

Hi,

I'm just curious why I received a Warning notification for the Netlsnr resource group when no error was logged in engine_A.log.

I have read the VCS documentation but the only hint I have is this.

Resource state is unknown. Warning VCS cannot identify the state of the resource.

Can anyone provide a better explanation of what could have caused VCS to send the warning email?

Wally_Heim
14 years ago
Hi Elvis L,

The monitor entry point of every resource has basically 3 return values for the state of a resource. The states are Online, Offline, or Unknown. If the monitor entry point is not able to determine whether a resource is Online or Offline, it returns Unknown.

The Unknown warning is just to let you know that VCS was unable to determine the state of the resource. Given that the state of the resource is unknown to VCS, VCS will not be able to control the resource. In other words, VCS cannot online or offline a resource that is in an Unknown state.

Most of the time nothing is really reported in the engine log for this. You might find something in the agent-specific log on the node that was having problems probing the resource, but you might not. If you don't, then try increasing the LogDbg settings on that resource to increase logging and wait for the issue to reoccur.

Thanks,

Wally

FYI - There are actually 10 return codes for the Online state, with increasing levels of confidence in the Online state of a given resource. The 3 return states that I mentioned are for simplicity.

17 Replies

Daniel_Schnack
Level 4
14 years ago
Hi Elvis,

This information is documented in the VCS 5.1 SP2 Agent Developer's Guide (for Windows), which can be downloaded from the Symantec Operations Readiness site, https://sort.symantec.com/documents/doc_details/sfha/5.1%20SP2/Windows/ProductGuides/. The name of the guide (if you search on the page) is "Veritas Cluster Server 5.1 SP2 Agent Developer's Guide". The information you are looking for is on page 34 (please excuse the formatting as this is a copy-and-paste):

Entry Point    Return Values

Monitor        C++ Based Returns ResStateValues:
                 VCSAgResOnline
                 VCSAgResOffline
                 VCSAgResUnknown
                 VCSAgResIntentionalOffline

               Script-Based Exit values:
                 99 - Unknown
                 100 - Offline
                 101-110 - Online
                 200 - Intentional Offline
                 Other values - Unknown
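
To make the script-based exit values in the table concrete, here is a minimal Python sketch that classifies an exit code. The function name `classify_monitor_exit` and the "level" number are hypothetical and are not part of VCS; the level simply reflects the ten Online codes mentioned earlier in this thread, on the assumption that a higher code means more confidence.

```python
from typing import Optional, Tuple


def classify_monitor_exit(code: int) -> Tuple[str, Optional[int]]:
    """Map a script-based monitor exit code to (state, online_level).

    online_level is 1..10 for Online codes and None otherwise.
    """
    if code == 100:
        return ("Offline", None)
    if 101 <= code <= 110:
        # Ten Online codes; a higher code is assumed to mean higher confidence.
        return ("Online", code - 100)
    if code == 200:
        return ("Intentional Offline", None)
    # 99 and every value not listed in the table count as Unknown.
    return ("Unknown", None)
```

Note that, per the table, Unknown is the catch-all: any exit value the agent framework does not recognize ends up there, not only an explicit 99.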

I hope that this answers your question.

Daniel
mikebounds
Level 6
14 years ago
Can you post your Netlsnr resource config from main.cf? If you have it set up as default, then all the agent is doing is looking for the lsnr process, so the monitor should not have any problems, and if the CPU spiked so severely that the system couldn't get a process listing, then I would expect a lot of agents to have monitor problems. If you have configured advanced monitoring, then a slow database can cause the monitor to time out. In UNIX, a monitor timeout should always be reported to the engine log and I am surprised that this is not true on Windows also. If the monitor times out 4 times in a row then the resource will fault.
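
The "4 times in a row" rule can be sketched as a simple consecutive counter. This is only an illustration in Python, not VCS code: the function name, the string labels, and the reset on any non-timeout result are assumptions that follow from the phrase "in a row".

```python
def faults_after_timeouts(results, threshold=4):
    """Return the index of the monitor cycle at which the resource
    would fault, or None if it never reaches the threshold.

    results: sequence of per-cycle outcomes, e.g. "ok" or "timeout".
    threshold: consecutive timeouts needed to fault (4 per the post).
    """
    consecutive = 0
    for index, outcome in enumerate(results):
        if outcome == "timeout":
            consecutive += 1
            if consecutive >= threshold:
                return index
        else:
            # Any completed monitor cycle breaks the run (assumption).
            consecutive = 0
    return None
```

Under this model a single timed-out monitor cycle, like the one suspected here, would leave the resource well short of a fault.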

Mike
Elvis_L_
Level 3
14 years ago
Mike,

Netlsnr config as posted below.

Netlsnr OracleLSNR (
    Critical = 0
    Owner = oracle
    Home = "/opt/oracle/product/8.1.7.4.0"
    TnsAdmin = "/var/opt/oracle/network/admin"
    MonScript = "./bin/Netlsnr/LsnrTest.pl"
    )
mikebounds
Level 6
14 years ago
Looking at agent documentation, it appears the Netlsnr agent changed from 5.0 to 5.1 so that it now does detailed monitoring by default (but the Oracle agent still does basic monitoring by default). So the agent is running the script specified by MonScript which runs some command to detect the health of the listener. So this command probably timed out, in which case the error may be logged to NetLsnr_A.log rather than engine_A.log.

Mike
Elvis_L_
Level 3
14 years ago
Mike,

I am fairly confident it was caused by the monitoring timeout. But can we assume the logger would also have failed to record the event in the Netlsnr or engine_A log?
mikebounds
Level 6
14 years ago
If you have no messages in your engine log or Netlsnr log, then unless /var was full I would consider this to be a bug and I would log a call with Symantec Support. If you look at the code of the Netlsnr monitor script, there are 3 reasons you could get "Resource State unknown" (this is from 5.1 agent code, so it may be different for 4.1):

VCSAG_LOG_MSG ("E","Oracle home directory $Home does not exist",1,$Home);

exit $VCSAG_RES_UNKNOWN;

VCSAG_LOG_MSG ( "E" ,"lsnrctl not found in $Home/bin", 4,$Home);

exit $VCSAG_RES_UNKNOWN; # Resource state is UNKNOWN

VCSAG_LOG_MSG ( "E" , "lsnrctl operation timed out",14);

exit $VCSAG_RES_UNKNOWN;

I am guessing 1 and 2 are unlikely, so the issue is more than likely 3. Logging the message to the agent log (Netlsnr log) should be the first thing that happens; then the Notifier daemon gets the message from the queue and sends a notification depending on the severity.

Mike
Elvis_L_
Level 3
14 years ago
I believe this is the Perl equivalent of the same script for VCS 5.1:

sub catch_alrm {
    if ( $AgentDebug == 1) {
        VCSAG_LOG_MSG ( "E" , "lsnrctl operation timed out",14);
    }
    exit (99); # Resource state is UNKNOWN
}
#
$ret = netlsnrlib::check_oracle_client ($Home,$LSNRMGR );

if ($ret) {
    VCSAG_LOG_MSG ( "E" ,"lsnrctl not found in $Home/bin", 4,$Home);
    exit 99; # Resource state is UNKNOWN
}
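
One detail in the excerpt above is worth noting: in `catch_alrm` the "lsnrctl operation timed out" message is only logged when `$AgentDebug` is 1, while the script exits with 99 either way. That would be consistent with an Unknown warning and an empty log. The following Python sketch mirrors that control flow under stated assumptions: `probe`, its parameters, and the mapping of the command's return code to Online or Offline are all hypothetical, and a stand-in command is used instead of a real `lsnrctl` call.

```python
import subprocess
import sys

RES_UNKNOWN = 99   # script-based exit value for Unknown
RES_OFFLINE = 100  # script-based exit value for Offline
RES_ONLINE = 110   # highest of the Online values 101-110


def probe(cmd, timeout_s, debug=False, log=print):
    """Run a health-check command and return a monitor exit value."""
    try:
        result = subprocess.run(
            cmd,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # As in catch_alrm: log only in debug mode, but always
        # report Unknown.
        if debug:
            log("lsnrctl operation timed out")
        return RES_UNKNOWN
    # Assumption: a zero return code means the listener is healthy.
    return RES_ONLINE if result.returncode == 0 else RES_OFFLINE


if __name__ == "__main__":
    # Stand-in for a hung listener check: sleeps longer than the timeout.
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(probe(slow, timeout_s=0.5))
```

With `debug=False` the slow command yields 99 and nothing is logged, which is the combination reported at the start of this thread.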

I will no doubt log the case with Symantec, as VCS has never exhibited this issue before in our environment, rather than beating around the bush guessing whether it was a timeout or a monitoring lapse.
