Veritas Cluster Resource Failures

london_cluster
Not applicable

Hi All,

I am new to this forum and could really do with some help. We have Veritas Cluster running on 2 database servers (Solaris 8). Over the past 2 weeks I have seen a few resource failures. Can someone explain why these may be happening? I have checked the Veritas logs and got this one this morning:

 

TAG_E 2011/04/30 00:54:41 (uk-crmdbs002) VCS:15002:hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/nfs_restart CRMdbs    successfully
TAG_E 2011/04/30 00:54:41 (uk-crmdbs002) VCS:15002:hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/postonline uk-crmdbs002 CRMdbs   successfully
TAG_D 2011/05/03 01:01:38 (uk-crmdbs001) VCS:13067:Agent is calling clean for resource(VCSweb) because the resource became OFFLINE unexpectedly, on its own.
TAG_E 2011/05/03 01:01:51 (uk-crmdbs001) VCS:150023:VRTSWebApp:VCSweb:clean:Output of completed operation - 'Context "/vcs" removed.'
TAG_D 2011/05/03 01:01:51 (uk-crmdbs001) VCS:13068:Resource(VCSweb) - clean completed successfully.
TAG_E 2011/05/03 01:01:52 VCS:10307:Resource VCSweb (Owner: unknown, Group: ClusterService) is offline on uk-crmdbs001
        (Not initiated by VCS.)
TAG_E 2011/05/03 10:40:30 VCS:50135:User root fired command: hares -clear VCSweb  from 127.0.0.1
TAG_E 2011/05/03 10:40:30 VCS:10307:Resource VCSweb (Owner: unknown, Group: ClusterService) is offline on uk-crmdbs001


I have cleared the fault by running

/opt/VRTS/bin/hares -clear VCSweb uk-crmdbs001

I now have hastatus -summary like this:

-- SYSTEM STATE
-- System               State                Frozen

A  uk-crmdbs001         RUNNING              0
A  uk-crmdbs002         RUNNING              0

-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State

B  CRMdbs          uk-crmdbs001         Y          N               OFFLINE
B  CRMdbs          uk-crmdbs002         Y          N               ONLINE
B  CRMnic          uk-crmdbs001         Y          N               ONLINE
B  CRMnic          uk-crmdbs002         Y          N               ONLINE
B  ClusterService  uk-crmdbs001         Y          N               PARTIAL
B  ClusterService  uk-crmdbs002         Y          N               OFFLINE

Can someone please explain why it's now saying PARTIAL, and also why these resources are failing? The listener resource failed a few times last week.

 

Thanks

Amreek

5 REPLIES

mikebounds
Level 6
Partner Accredited

VRTSweb is not an important resource: it is the backend for the VCS web interface, which is rarely used (it has no effect on the Java interface). The VCS web interface is not supported in 5.1, so if you have 5.1, that could be why it is failing. Support for the web interface started to be removed in 5.0MP3: if you installed 5.0MP3 from scratch, the VRTSweb component would not be installed, and it was only supported if you upgraded from a previous version.

The logs are telling you the process is dying. If you are also seeing the Listener resource faulting, that is a completely different issue; it would suggest either a problem with the Oracle Listener, or that VCS is incorrectly determining that the listener resource is down when it is not.

Clearing a resource only removes the fault; the resource itself remains down. So in the ClusterService service group at least one resource is offline and at least one other resource is online, which is why the group reports a PARTIAL state.
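A minimal sketch of how to pick out groups in this state from the command line, assuming the `hastatus -summary` column layout shown in the question. The function name `partial_groups` is made up for illustration and is not part of VCS.

```shell
# Sketch: list groups reported as PARTIAL in `hastatus -summary` output.
# Reads the summary text on stdin so it can be tried on saved output.
partial_groups() {
    # Group-state rows start with "B"; the fields are assumed to be:
    # B  Group  System  Probed  AutoDisabled  State
    awk '$1 == "B" && $6 == "PARTIAL" { print $2 " on " $3 }'
}

# Example using rows copied from the summary in the question:
partial_groups <<'EOF'
B  CRMdbs          uk-crmdbs001         Y          N               OFFLINE
B  CRMdbs          uk-crmdbs002         Y          N               ONLINE
B  ClusterService  uk-crmdbs001         Y          N               PARTIAL
B  ClusterService  uk-crmdbs002         Y          N               OFFLINE
EOF
```

With the sample rows this prints `ClusterService on uk-crmdbs001`. On a live node the same function could be fed directly with `hastatus -summary | partial_groups`.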

If you don't use the web interface then I would delete this resource.

Mike

Marianne
Moderator
Partner    VIP    Accredited Certified

If the OS is Solaris 8, I guess the VCS version will be just as old....

Please check version with one of these commands and let us know:

hasys -display |grep -i version
had -version
haclus -value EngineVersion

This error says that someone/something other than VCS killed the process:

 
the resource became OFFLINE unexpectedly, on its own.
....
(Not initiated by VCS.)
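
To see how often this has happened, the engine log can be filtered for these two messages. This is only a sketch: the helper name `unexpected_offline` is hypothetical, and it reads log text on stdin rather than assuming where your log lives.

```shell
# Sketch: keep only engine-log lines that report a resource going
# offline outside VCS control. Reads log text on stdin.
unexpected_offline() {
    awk '/became OFFLINE unexpectedly/ || /Not initiated by VCS/'
}

# Example using two lines copied from the log in the question;
# only the first line (message 13067) is printed.
unexpected_offline <<'EOF'
TAG_D 2011/05/03 01:01:38 (uk-crmdbs001) VCS:13067:Agent is calling clean for resource(VCSweb) because the resource became OFFLINE unexpectedly, on its own.
TAG_D 2011/05/03 01:01:51 (uk-crmdbs001) VCS:13068:Resource(VCSweb) - clean completed successfully.
EOF
```

On a live node it would be run against the engine log, for example `unexpected_offline < /var/VRTSvcs/log/engine_A.log`. That path is the usual default but is an assumption here, so check where your installation writes its logs.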

 

VCSweb was used in older versions to provide a web interface to monitor/manage clusters in addition to (or as an alternative to) the Java Console. The resource does not seem to be configured as Critical, which means the rest of the ClusterService service group was neither taken offline nor failed over to the second node; this leaves the SG in a PARTIAL state.

B__Havey
Level 3
Partner Accredited

After you clear a fault, you must bring the resource online using the "hares -online VCSweb -sys uk-crmdbs001" command.
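
The clear-then-online sequence can be sketched as a small script. This is an illustration under assumptions, not a tested procedure for your cluster: the helper names `run` and `recover_resource` and the `DRY_RUN` switch are made up here, and the resource name VCSweb is taken from the log in the question. With `DRY_RUN=1` the `hares` commands are only printed, so the sequence can be reviewed before anything is run.

```shell
# Sketch only: clear a faulted resource, bring it online, then show its state.
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

recover_resource() {
    res=$1   # resource name, e.g. VCSweb
    sys=$2   # system (node) name, e.g. uk-crmdbs001
    run hares -clear "$res" -sys "$sys" &&
    run hares -online "$res" -sys "$sys" &&
    run hares -state "$res"
}

# Print the commands for the resource and node from this thread.
DRY_RUN=1
recover_resource VCSweb uk-crmdbs001
```

The online request returns before the resource has finished starting, so the state shown straight afterwards may not yet be ONLINE; rerun `hares -state` a little later to confirm.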

Gaurav_S
Moderator
   VIP    Certified

I think you now have a fair idea about the VRTSweb resource (which would be VCSweb in your case) and the probable causes of the fault. A few things I would add:

-- VRTSweb is a web server provided by Veritas for the Web GUI (not the Java GUI), as explained by everyone above. If you run "ps -ef |grep -i VRTSweb", you should be able to see the process in the process list. If the resource keeps faulting, check whether the process is getting killed. If it is, you will need to dig further on your server to find out what is killing it.
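
A small sketch of that process check, written so that the `grep` does not match its own entry in the process list. The function name `vrtsweb_procs` and the sample process lines are invented for illustration; the real command line of the VRTSweb process on your server may look different.

```shell
# Sketch: filter a process listing (on stdin) for VRTSweb entries.
vrtsweb_procs() {
    # The [V] bracket keeps the grep command line itself from matching.
    grep -i '[V]RTSweb'
}

# Example with made-up `ps -ef` style lines; only the first is printed.
vrtsweb_procs <<'EOF'
root  1234     1  0 01:00:00 ?  0:05 /opt/VRTSweb/bin/example-process
root  5678  4321  0 10:41:00 pts/1  0:00 grep -i [V]RTSweb
EOF
```

On the server itself this would be used as `ps -ef | vrtsweb_procs`. No output means no matching process is running at that moment.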

 

-- Coming to the Listener fault: can you paste the logs from around the time the listener faults?

-- Have you created any dependencies between groups? (Paste the "hagrp -dep" output.)

 

G

ktandel
Level 4
Partner Accredited Certified

As explained by others, VRTSweb is a web server provided by Veritas for the Web GUI.

The listener's involvement is summarized as:

  • the listener catches the request
  • spawns or requests a database process/thread
  • redirects or passes the connection to the process/thread, usually on a different port
  • gets out of the way

On the Oracle Listener in VCS: the database would depend on a number of resources, mostly storage related, whereas the Oracle Listener would be dependent on the Oracle Database.

First check the VCS dependencies: as per the previous post, paste the "hagrp -dep" output.