Are we monitoring the right object?

Just some quick questions. We use Veritas Clustering to provide an HA solution for database servers. We have set up the VCS monitoring to try to connect to the database every 3 minutes. After 4 failed attempts, VCS kills the primary pid and fails over to the other side. The box itself is fine; VCS was just unable to connect for whatever reason. The theory is that if the database is unresponsive for 12 minutes it must be in a hung state. Correct or not, there are many reasons that a database might be unresponsive other than being hung.

My question is this: should we be checking the host (hardware) rather than the database to decide whether or not to fail over? What are other companies that use VCS for HA doing to ensure that failovers do not happen for unnecessary or unintended reasons? Is VCS intended for checking application availability?

 

Jim

2 Solutions

Accepted Solutions

Hi,

 

VCS is supposed to check all your resources: disk, IP, mounts, and the database (Oracle, Sybase, etc.).

 

It is not required to enable second level monitoring; that is purely up to you and your organization. The thing is, as you stated, a database might be "online" (for instance, in the case of Oracle, the pmon process is running) but not OPEN and ready for transactions.

 

The point of the second level monitoring is to inform you of such instances.

 

Why would your database not be available for the 2nd level monitor to work?

 

You could set the restart / tolerance limits so that VCS can handle these situations, but as you said, why is your database not able to respond for 12 minutes?

 

Is it really "hung" or is the monitor failing for some reason?
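To make the tolerance idea a bit more concrete, here is a rough Python sketch of how a tolerance limit could delay a fault. It is only a simplified model, not VCS code: the function name `cycles_until_fault` and the result strings are made up for illustration, and it assumes that ToleranceLimit means the number of consecutive OFFLINE reports that are tolerated before the resource is declared faulted. Please check the agent documentation for your version before relying on that.

```python
def cycles_until_fault(results, tolerance_limit=0):
    """Return the 1-based monitor cycle at which a fault would be declared.

    results: sequence of "online" / "offline" strings, one per monitor cycle.
    tolerance_limit: assumed number of consecutive "offline" reports that
        are tolerated before faulting (hypothetical model of ToleranceLimit).
    Returns None if no fault is declared for the given sequence.
    """
    consecutive_offline = 0
    for cycle, result in enumerate(results, start=1):
        if result == "offline":
            consecutive_offline += 1
            # Fault only once the tolerated number of failures is exceeded.
            if consecutive_offline > tolerance_limit:
                return cycle
        else:
            # Any healthy report resets the count in this model.
            consecutive_offline = 0
    return None


# With no tolerance, the first failed check faults the resource.
print(cycles_until_fault(["offline"], tolerance_limit=0))
# With a tolerance of 1, a single blip followed by recovery does not fault.
print(cycles_until_fault(["online", "offline", "online"], tolerance_limit=1))
```

In this model a higher tolerance rides out short blips, at the cost of detecting a genuinely dead database later.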


Agree totally with Riann. 2nd level monitoring is optional and the default is a process check only, so if you don't want this behaviour, set it back to the default (MonitorOption = 0).

Just to give a little more info:

  1. The default script provided for 2nd level monitoring (which you may amend if you wish) updates a row in a dummy table.
     
  2. The 2nd level monitor can run at a different interval to the primary monitor by setting LevelTwoMonitorFreq.  So for example, if your MonitorInterval is 60 seconds and LevelTwoMonitorFreq is set to 3, then VCS will check processes every minute and do a level 2 check every 3 minutes.
     
  3. If the 2nd level check fails (as opposed to timing out), then the resource will fault (or restart) unless ToleranceLimit is set.
     
  4. If the 2nd level check times out, then it is not seen as a fault until it times out 4 times in a row (4 is the default; this can be amended by setting FaultOnMonitorTimeouts).
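
As a quick check of the arithmetic in points 2 and 4, the sketch below works out how long a hung database would go unnoticed before a timeout-driven fault. It is a simplified model, not anything shipped with VCS: the function names are made up for illustration, and it assumes only the level 2 checks time out and ignores the time each check spends waiting before it is counted as a timeout.

```python
def level_two_interval(monitor_interval_s, level_two_monitor_freq):
    """Seconds between level 2 checks: every Nth monitor cycle."""
    return monitor_interval_s * level_two_monitor_freq


def seconds_until_timeout_fault(monitor_interval_s,
                                level_two_monitor_freq,
                                fault_on_monitor_timeouts):
    """Approximate seconds of consecutive level 2 timeouts before a fault.

    Simplification: counts only the spacing between level 2 checks and
    ignores the duration of each timed-out check itself.
    """
    interval = level_two_interval(monitor_interval_s, level_two_monitor_freq)
    return interval * fault_on_monitor_timeouts


# Values from the example above: MonitorInterval = 60,
# LevelTwoMonitorFreq = 3, FaultOnMonitorTimeouts = 4 (the default).
print(level_two_interval(60, 3))                   # 180 seconds = 3 minutes
print(seconds_until_timeout_fault(60, 3, 4) / 60)  # 12.0 minutes
```

These numbers line up with the 3 minute check and 12 minute window described in the original question.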

If Oracle cannot update a row for 12 minutes, then many applications cannot survive such a scenario, as the application will time out. But if this is OK in your environment, then, as mentioned earlier, disable this check.

Mike

Please reply or mark the solution if it helped you.