11-24-2011 10:06 PM
Environment
OS = Win2008R2
SFHA version = 5.1SP2
Cluster = Two Nodes
Clustered application = SQL Server 2008
Is there any tool that can analyze the VRTSexplorer logs and produce a result or a detailed/basic report? My goal is to monitor the health of my Storage Foundation (SF) and Veritas Cluster (HA) on a regular basis (at least once a month).
OR
Is there a proper way to do a health check of Storage Foundation (SF) and Veritas Cluster (HA)?
11-24-2011 10:13 PM
Not to my knowledge; VRTSexplorer logs are analyzed manually.
Regarding health checks, there are many custom scripts developed by people. You can have a look at the downloads section of Connect for health check scripts.
G
11-24-2011 10:28 PM
"you can have a look at downloads section of connect for health check scripts.."
I am checking. Meanwhile, is there any script that you use and would recommend? :)
11-25-2011 01:14 AM
Hi Zahid,
You can take a look at Veritas Operations Manager (VOM). It is a single pane of glass for monitoring and managing operations of both SF and HA.
In the beginning you can use it only for monitoring, and also schedule the periodic health checks that are built into the tool (you can have the reports emailed to the addresses that you specify).
Later you might want to use it for managing SFHA as well.
Link: http://go.symantec.com/vom
Regards,
Amit.
11-25-2011 01:53 AM
Have a look at https://sort.symantec.com/welcome/reports. I believe this uses a "data collector" and then you can run reports; for example, the Risk Assessment report looks at critical resources in VCS. If you click on "View Reports", it also says on the right hand side:
Compare configurations
The Compare Configurations feature lets you compare different system scans by the data collector. When you sign in, you can choose a target system, compare reports run at different times, and easily see how the system's configuration has changed.
You may also be able to get more sophisticated offerings by contacting Symantec Global Consulting. I used to work in Consulting in the UK and I developed a tool to examine explorers from UNIX and Windows to provide a 100+ Healthcheck list. Local Consulting was outsourced last year and so Consultants were made redundant world-wide, but the healthcheck tool was taken on by Global Consulting and I believe this department may still exist at Symantec.
Mike
11-25-2011 05:02 AM
@Ashirodk and Mike: thanks for your kind replies.
Is there any video to help understand this, please? I got the guide but it is too detailed. Please share any video/tutorial that explains how to use VOM for health checks.
========================================================================================
Second:
What is the difference between Custom Reports and the Risk Assessment Checklist? Would you kindly throw some light on this?
11-26-2011 10:36 PM
TRY them both and see!
'Custom report' needs you to download and run the tool on the cluster nodes and will examine installed components and hardware. It will provide a report at the end.
'Risk assessment' will give you a 'general' report based on the information that you supply.
11-28-2011 04:34 AM
I used the data collector and uploaded the XML file. It gave me useful information.
Is it possible that a resource faults and comes back UP within just a few seconds, so the cluster does not recognize the fault and no failover occurs? So my question is: is there any way to get a record of the faults from the last 3 or 4 days, or a custom number of days?
11-28-2011 06:28 AM
The collectors collect information from VCS, so if VCS did not recognise fault, then neither will collector.
A few other points:
If a resource stops working (process dies, filesystem unmounted, IP unplumbed, etc.) and is restarted outside of VCS control before VCS runs a monitor routine (by default every 60 seconds), then VCS will not know it was down, unless you are using IMF (Intelligent Monitoring Framework, introduced in 5.1SP1). If VCS does run a monitor while the resource is not working and RestartLimit is set to greater than zero, then VCS will restart the resource. This is NOT classed as a VCS "Fault", and I THINK the severity of this event is just "Info" in the logs. This event may not show in any reports; if it does, it should show in a "Resource restarting" category and SHOULD NOT be shown in a "VCS Faults" category.
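To illustrate the point about short outages slipping between monitor cycles, here is a minimal Python sketch. It is not VCS code: it only models a fixed polling interval (60 seconds, as mentioned above) and checks whether any poll lands inside an outage window. The function names and the `offset` parameter are hypothetical, chosen for illustration.

```python
def monitor_times(interval, horizon, offset=0):
    """Yield the instants (in seconds) at which a periodic monitor would run."""
    t = offset
    while t <= horizon:
        yield t
        t += interval


def outage_detected(outage_start, outage_end, interval=60, offset=0):
    """Return True if at least one monitor cycle falls inside the outage.

    The outage is the half-open window [outage_start, outage_end): the
    resource is down from outage_start and healthy again at outage_end.
    """
    return any(
        outage_start <= t < outage_end
        for t in monitor_times(interval, outage_end, offset)
    )


if __name__ == "__main__":
    # Down from t=65s to t=75s: the polls at 60s and 120s both miss it.
    print(outage_detected(65, 75))   # False
    # Down from t=50s to t=70s: the poll at 60s sees the resource offline.
    print(outage_detected(50, 70))   # True
```

With polling alone, any outage shorter than the interval can go unnoticed depending on when it starts, which is the gap that event-based monitoring such as IMF is meant to close.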
Mike
11-28-2011 07:25 AM
Thanks for the info, Mike. I think this would be a good option if VCS keeps a record of faults/errors and they show up in the report produced by the Data Collector. (I think faults/errors may be kept in the VCS logs.)
11-28-2011 09:32 PM
One more point
VCS keeps a record in its logs of faults/errors that occurred earlier (maybe yesterday or the day before). Does the report produced by the Data Collector show them or notify us about them?
Otherwise I think it may not be a 100% fit for health checks, because of incidents which happened in the past, or which happen at a specific time and then clear up, so that when we run the Data Collector it fails to collect them. Because of this we can go in a big disaster.
Any input that may help overcome this problem will be appreciated.
11-29-2011 10:11 AM
"Because of this we can go in a big disaster."
Your imagination is really running away with you.... You have VCS installed so that action can be taken (failover/restart).
Go through the installation and/or Admin Guide to see how to set up notification.
Also see in Admin Guide how VCS responds to resource and/or system faults.
You may also want to look into IMF (Intelligent Monitoring Framework) that is available in your VCS version.
11-29-2011 08:40 PM
I agree with your points about how VCS responds, but what if the storage had a temporary problem (it is a single point of failure)?
As far as IMF (Intelligent Monitoring Framework) is concerned, this functionality will come in version 6.0 for Windows.
Anyway, thanks a lot for your usual kind support :)
11-30-2011 02:17 AM
Dear Zahid,
The VOS data collector may not be 100% reliable, but even then it is a very useful tool that can give you a detailed report about the health of your cluster and systems.
Secondly, if you want to analyze the VRTSexplorer log, it is just a compressed archive of the cluster configuration and log files. You can extract the VRTSexplorer output and you will see a directory structure similar to your installation, containing the log files and config files.
You can then analyze the required file in case of any issue.
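As a rough illustration of that manual analysis, here is a minimal Python sketch that walks an extracted VRTSexplorer directory and lists recent error-level log lines. It rests on several assumptions that are not confirmed in this thread: that the engine log files match the glob `engine_A*`, that each line starts with `YYYY/MM/DD HH:MM:SS VCS <SEVERITY>`, and that the severities of interest are `ERROR` and `CRITICAL`. Adjust all of these to what you actually find in your extract. The function name `recent_events` is hypothetical.

```python
import re
from datetime import datetime, timedelta
from pathlib import Path

# Assumed line layout (verify against your own logs):
#   "YYYY/MM/DD HH:MM:SS VCS <SEVERITY> <message>"
LINE_RE = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) VCS (\w+) (.*)$")


def recent_events(root, days, now=None, pattern="engine_A*",
                  severities=("ERROR", "CRITICAL")):
    """Return (timestamp, severity, message, file name) tuples, oldest first.

    root       -- directory where the VRTSexplorer archive was extracted
    days       -- how far back to look, counted from `now`
    pattern    -- glob for the log files (an assumption, see above)
    severities -- severity labels to keep (also an assumption)
    """
    now = now or datetime.now()
    cutoff = now - timedelta(days=days)
    events = []
    for path in sorted(Path(root).rglob(pattern)):
        if not path.is_file():
            continue
        with path.open(errors="replace") as handle:
            for line in handle:
                match = LINE_RE.match(line.rstrip("\n"))
                if not match:
                    continue  # skip continuation lines and other formats
                when = datetime.strptime(match.group(1), "%Y/%m/%d %H:%M:%S")
                if when >= cutoff and match.group(2).upper() in severities:
                    events.append(
                        (when, match.group(2), match.group(3), path.name)
                    )
    return sorted(events)


if __name__ == "__main__":
    # Hypothetical path; point this at your own extracted archive.
    for when, severity, message, name in recent_events("./VRTSexplorer_extract", days=4):
        print(when, severity, name, message)
```

Note that, as discussed earlier in the thread, a resource restart may be logged at an informational severity rather than as an error, so you may want to widen `severities` when looking for brief outages.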
Regards,
Mohsin Mansoor
Technical Consultant
11-30-2011 02:23 AM
Thanks, Mohsin, for your kind reply. I agree with you, but that's not my question :) I think that for now we have to go through the VRTSexplorer output to check the past errors/faults that occurred in the Veritas Cluster, and this is of course a time-consuming task.
11-30-2011 09:21 AM
I THINK VOM (not VOS) will show you historical errors - from what I remember, there is an "uptime" report which will show you when your service groups have been down over a user-specified period.
For more help on VOM and VOS you should post a query to the "SFHA Management (VOM, SORT)" forum.
Mike
12-02-2011 08:12 PM
Just one more thing about IMF - it is already available in 5.1 as from SP1.
Please see this blog: https://www-secure.symantec.com/connect/blogs/vcs-instantaneous-notification-and-fast-failover-imfamf