VCS Resource Fault

Level 0

I'm new to VCS. Usually we stop and start VCS early in the morning, and on some days, when we do this, some resources go into a faulted state. We then bring the faulted resources online manually.

My basic question is: why are the resources moving to a faulted state? What could be the potential reasons?

Thanks in advance.


Level 4


Restarting a cluster (VCS) is not a common activity. You should not need to restart VCS; it is very uncommon to have to restart the cluster. The service groups, which relate to the application, might need to be restarted, but that is really just an indication that something is wrong with the application. Remember that VCS doesn't do anything except start, monitor and stop the application. If there is nothing wrong with the app, VCS won't ever need to take any action.

The fact that some resources don't come online, or fault, needs to be investigated. A resource is just one component of the application, e.g. a file system, an IP address, or a process. The fault could have many causes.

Which resources are failing? 

You can post engine_A.log and the log for the specific agent for review.
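Before posting, it can help to cut the log down to the lines that matter. The following is only a rough sketch: it assumes the engine log is at /var/VRTSvcs/logs/engine_A.log, that relevant lines contain the word "fault" in some form, and it uses a hypothetical resource name `app_ip`. The exact wording of VCS log messages is not assumed here, so adjust the keywords to what you actually see in your log.

```python
from pathlib import Path


def find_fault_lines(log_text, resource=None, keywords=("fault",)):
    """Return the lines that contain any keyword (case-insensitive).

    If a resource name is given, only lines that also mention that
    exact name are kept.
    """
    matches = []
    for line in log_text.splitlines():
        lowered = line.lower()
        if not any(keyword.lower() in lowered for keyword in keywords):
            continue
        if resource is not None and resource not in line:
            continue
        matches.append(line)
    return matches


if __name__ == "__main__":
    # Assumed log location; "app_ip" is a made-up resource name.
    log_path = Path("/var/VRTSvcs/logs/engine_A.log")
    if log_path.exists():
        text = log_path.read_text(errors="replace")
        for line in find_fault_lines(text, resource="app_ip"):
            print(line)
```

Adding a keyword such as "timeout" to `keywords` is an easy way to check whether the fault coincides with a timeout message.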

Level 6

"Usually we will try to stop and start the VCS early morning "

How exactly are you trying to stop and start VCS? 

More important - WHY? 


To find out why a resource faulted, you need to do the following:

1. Check /var/VRTSvcs/logs/engine_A.log to find why the resource faulted

If the resource faulted because of an online timeout, you need to consider tuning VCS (for example, the OnlineTimeout attribute of the resource type).

2. Check the service group's resource dependencies to make sure they are set correctly, so that the resources in the service group are brought online in a logical sequence.
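To illustrate what a "logical sequence" means in step 2, here is a small sketch that works out an online order from a dependency map. It is not how VCS computes the order internally; it just uses Python's standard `graphlib` (Python 3.9+) to show that every resource must come online after the resources it requires. All resource names below are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def online_order(requires):
    """Return an order in which resources can be brought online.

    `requires` maps each resource to the set of resources it depends on.
    Dependencies always appear before the resources that need them.
    Raises CycleError if the dependencies form a loop.
    """
    return list(TopologicalSorter(requires).static_order())


if __name__ == "__main__":
    # Hypothetical service group: the process needs an IP address and a
    # mounted file system; those in turn need a NIC and a disk group.
    requires = {
        "app_proc": {"app_ip", "app_mount"},
        "app_ip": {"app_nic"},
        "app_mount": {"app_dg"},
    }
    try:
        print(online_order(requires))
    except CycleError as exc:
        print("Dependency loop found:", exc.args[1])
```

If a resource in your group has no dependency on something it actually needs (say, the process is not linked to the mount), it may be started too early and fault, which is the kind of problem this check is meant to catch.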

Also, as others have pointed out, VCS is an HA solution, meaning application high availability; there is no need to restart it frequently (except after applying an OS kernel patch, a firmware/software upgrade, etc.). It is quite common to keep a cluster up and running for over a year for mission-critical applications.