05-03-2011 08:17 AM
Hi All,
I am new to this forum and could really do with some help from you out there. We have Veritas Cluster running on 2 database servers (Solaris 8). Over the past 2 weeks I have seen a few resource failures. Can someone explain why these may be happening? I have checked the Veritas logs and got this one this morning:
TAG_E 2011/04/30 00:54:41 (uk-crmdbs002) VCS:15002:hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/nfs_restart CRMdbs successfully
TAG_E 2011/04/30 00:54:41 (uk-crmdbs002) VCS:15002:hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/postonline uk-crmdbs002 CRMdbs successfully
TAG_D 2011/05/03 01:01:38 (uk-crmdbs001) VCS:13067:Agent is calling clean for resource(VCSweb) because the resource became OFFLINE unexpectedly, on its own.
TAG_E 2011/05/03 01:01:51 (uk-crmdbs001) VCS:150023:VRTSWebApp:VCSweb:clean:Output of completed operation - 'Context "/vcs" removed.'
TAG_D 2011/05/03 01:01:51 (uk-crmdbs001) VCS:13068:Resource(VCSweb) - clean completed successfully.
TAG_E 2011/05/03 01:01:52 VCS:10307:Resource VCSweb (Owner: unknown, Group: ClusterService) is offline on uk-crmdbs001
(Not initiated by VCS.)
TAG_E 2011/05/03 10:40:30 VCS:50135:User root fired command: hares -clear VCSweb from 127.0.0.1
TAG_E 2011/05/03 10:40:30 VCS:10307:Resource VCSweb (Owner: unknown, Group: ClusterService) is offline on uk-crmdbs001
I have cleared the fault by running
/opt/VRTS/bin/hares -clear VCSweb uk-crmdbs001
I now have hastatus -summary like this:
-- SYSTEM STATE
-- System State Frozen
A uk-crmdbs001 RUNNING 0
A uk-crmdbs002 RUNNING 0
-- GROUP STATE
-- Group System Probed AutoDisabled State
B CRMdbs uk-crmdbs001 Y N OFFLINE
B CRMdbs uk-crmdbs002 Y N ONLINE
B CRMnic uk-crmdbs001 Y N ONLINE
B CRMnic uk-crmdbs002 Y N ONLINE
B ClusterService uk-crmdbs001 Y N PARTIAL
B ClusterService uk-crmdbs002 Y N OFFLINE
Can someone please explain why it is now saying PARTIAL, and also why these resources are failing? The listener resource failed a few times last week.
Thanks
Amreek
05-03-2011 09:22 AM
VRTSweb is not an important resource. It is the backend for the VCS Web interface, which is rarely used (it has no effect on using the Java interface). The VCS web interface is not supported in 5.1, so if you have 5.1, that could be why it is failing. Support for the web interface started to be removed in 5.0MP3: if you installed 5.0MP3 from scratch, the VRTSweb component would not be installed, and it was only supported if you upgraded from a previous version. The logs are telling you the process is dying. If you are also seeing the Listener resource faulting, that is a completely different issue, and it suggests either a problem with the Oracle Listener or that VCS is incorrectly determining that the listener resource is down when it is not.
When you clear the fault, the resource remains down. This means that, of the resources in the ClusterService service group, at least one is offline and at least one other is online, so the group reports a PARTIAL state.
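To illustrate the rule above, here is a minimal Python sketch of how a group state could be derived from the states of its resources. It is a deliberate simplification for illustration, not how the VCS engine actually computes group state. The function name and all resource names except VCSweb are hypothetical.

```python
def group_state(resource_states):
    """Simplified rule: all online -> ONLINE, none online -> OFFLINE, else PARTIAL."""
    online = sum(1 for state in resource_states.values() if state == "ONLINE")
    if online == 0:
        return "OFFLINE"
    if online == len(resource_states):
        return "ONLINE"
    return "PARTIAL"


# Hypothetical resource names, except VCSweb, which is the resource from this thread.
cluster_service = {"csg_nic": "ONLINE", "webip": "ONLINE", "VCSweb": "OFFLINE"}
print(group_state(cluster_service))
```

With VCSweb down and the other resources up, the sketch reports PARTIAL, which matches what hastatus shows for ClusterService on uk-crmdbs001.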
If you don't use the web interface then I would delete this resource.
Mike
05-03-2011 11:49 AM
If OS is Solaris 8, I guess the VCS version will be just as old....
Please check version with one of these commands and let us know:
hasys -display |grep -i version
had -version
haclus -value EngineVersion
This error says that someone/something other than VCS killed the process:
the resource became OFFLINE unexpectedly, on its own.
.... (Not initiated by VCS.)
VCSweb was used in older versions to enable a Web interface to monitor/manage clusters in addition (or as alternative) to the Java Console. The resource does not seem to be configured as Critical, meaning the rest of the ClusterService Service Group was not taken offline and failed over to the second node, leaving the SG in PARTIAL state.
05-03-2011 02:10 PM
After you clear a fault, you must online the resource using the "hares -online VCSweb -sys uk-crmdbs001" command (the resource is named VCSweb; VRTSWebApp is its type).
05-03-2011 06:23 PM
I think you now have a fair idea about the VRTSweb resource (which is VCSweb in your case) and the probable causes of the fault. A few things I would add:
-- VRTSweb is a web server provided by Veritas for the Web GUI (not the Java GUI), as explained by everyone above. If you run "ps -ef |grep -i VRTSweb", you should be able to see the process in the process list. If the resource keeps faulting, check the process to see whether it is being killed. If it is, you will need to dig further on your server to find out what is killing it.
-- Coming to the Listener fault: can you paste the logs from around the time the listener faults?
-- Have you created any dependencies between groups? (Paste the "hagrp -dep" output.)
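To see how often a resource goes offline on its own, one option is to scan the engine log for message 13067, the "became OFFLINE unexpectedly" line quoted in the first post. The Python sketch below is only an illustration: the function name is hypothetical, the regular expression assumes the log format shown in this thread, and the sample lines are copied from it. On a real system you would pass in the lines of the engine log instead (commonly /var/VRTSvcs/log/engine_A.log, but check your own installation).

```python
import re
from collections import Counter

# Assumes lines shaped like the ones quoted in this thread, e.g.
# TAG_D 2011/05/03 01:01:38 (uk-crmdbs001) VCS:13067:Agent is calling clean for resource(VCSweb) because ...
PATTERN = re.compile(r"\((?P<host>[^)]+)\) VCS:13067:.*?resource\((?P<res>[^)]+)\)")


def count_unexpected_offline(lines):
    """Count message 13067 occurrences per (resource, system) pair."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[(match.group("res"), match.group("host"))] += 1
    return counts


# Sample lines copied from the log excerpt in the first post.
sample = [
    "TAG_D 2011/05/03 01:01:38 (uk-crmdbs001) VCS:13067:Agent is calling clean for "
    "resource(VCSweb) because the resource became OFFLINE unexpectedly, on its own.",
    "TAG_D 2011/05/03 01:01:51 (uk-crmdbs001) VCS:13068:Resource(VCSweb) - clean completed successfully.",
]

for (resource, host), n in count_unexpected_offline(sample).items():
    print(resource, host, n)
```

If the same resource shows up repeatedly, compare the timestamps against cron jobs or other scheduled activity on that node.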
G
06-19-2011 10:37 PM
As explained by others, VRTSweb is a web server provided by Veritas for the Web GUI.
As for the Oracle Listener: the database would depend on a number of resources, mostly storage related, whereas the Oracle Listener would depend on the Oracle Database.
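The dependency chain described here implies an order in which resources come online. As a rough Python sketch (requires Python 3.9+ for graphlib), the chain can be written as a graph and sorted. The resource names are hypothetical placeholders, not taken from this cluster's configuration, and the helper name is made up for illustration.

```python
from graphlib import TopologicalSorter

# Maps each resource to the resources it depends on.
# All names are hypothetical placeholders for illustration.
depends_on = {
    "listener": {"oracle_db"},   # listener depends on the database
    "oracle_db": {"mount"},      # database depends on storage
    "mount": {"diskgroup"},
}


def online_order(deps):
    """Return resources in an order where dependencies come first."""
    return list(TopologicalSorter(deps).static_order())


print(online_order(depends_on))
```

A fault low in the chain (storage, for example) would affect everything above it, which is why it helps to know the real dependencies before chasing a listener fault.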
First check the dependencies in VCS: as per the previous post, paste the "hagrp -dep" output.