Forum Discussion

allaboutunix
9 years ago

DB failures in VCS node

Hi Team,

 

We have two servers in a VCS cluster (Planet and Space). The databases on Planet are experiencing failures, and the DB team has asked the Unix team to investigate from our side.

I am not able to paste the error output, but could you please suggest the best steps to take to resolve the issue?

  • Hi Allaboutunix,

     

    The information provided is too little to suggest anything. We need more details to narrow down the issue and suggest steps to resolve it.

     

    Thanks & Regards,
    Sunil Y

  • Kindly check whether the system is heavily loaded. If the Oracle monitors are continuously timing out and the number of monitor timeouts reaches the value set in the FaultOnMonitorTimeouts attribute, VCS may mark the resource as faulted and bring it down. Also check whether SecondLevelMonitor is enabled for the Oracle resource; if it is, the output of the SQL script it runs will appear in the agent/engine log. Enable an appropriate debug log level.

    Also, if a dependent child resource has faulted, it may cause all of its parent resources to be taken offline.

    These are some of the possible causes. To find the exact issue, you will need to check the VCS engine/agent logs and the Oracle database alert log. Kindly provide the log files if you need any help locating the issue.

     

    Regards,

    Sudhir

  • Hi,

    Are you seeing the failures in VCS itself, or failures at the DB layer with VCS just reacting to them?

    I would suggest working with the DB team to find out why the DBs are failing. VCS will react to failures as expected or as configured.

    If you see that the DBs are failing outside of VCS, then get them validated outside of VCS (freeze the VCS service groups) and ensure the DBs are set up correctly.

    I wouldn't suggest touching any VCS configuration unless you are sure about the cause of the failure.


    G
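
To make the monitor-timeout threshold from Sudhir's reply a bit more concrete, here is a rough Python sketch of how such a counter could behave. It is only an illustration, not VCS code: the function name, the "online"/"timeout" result strings, and the default threshold of 4 are assumptions made for this example, and it assumes the count applies to consecutive timeouts and resets after a successful monitor cycle.

```python
def first_fault_cycle(monitor_results, fault_on_monitor_timeouts=4):
    """Return the 1-based monitor cycle at which the resource would be
    faulted, or None if the threshold is never reached.

    Hypothetical model: every name and the default of 4 are assumptions
    for illustration, not taken from the VCS agent source.
    """
    consecutive = 0
    for cycle, result in enumerate(monitor_results, start=1):
        if result == "timeout":
            consecutive += 1
            # Threshold reached: in this model the resource is faulted here.
            if consecutive >= fault_on_monitor_timeouts:
                return cycle
        else:
            # A monitor cycle that completes resets the counter (assumption).
            consecutive = 0
    return None


history = ["online", "timeout", "timeout", "online",
           "timeout", "timeout", "timeout", "timeout"]
print(first_fault_cycle(history))
```

In this model, a heavily loaded host that occasionally misses one monitor cycle never faults the resource, while a sustained run of timeouts does. That is why checking system load and the agent log before changing any attribute is the safer first step.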
