VCS Cluster freezing at specific time of day

kingcap3
We have VCS version 5.0 and Veritas Storage Foundation 5.1 on a 2-node cluster running on x64 Windows Server 2003.

We have a very strange issue where our cluster freezes at a specific time of day: 4:20 to 4:30 pm on Tuesdays.

The servers themselves stay up, but we can't log in to the cluster, and all access to the cluster drives freezes (this cluster is a file share environment). No backups run during this window (we use Symantec NetBackup 6.0 MP7), and we have also disabled the NetBackup agents. Our storage is an EMC CX3-40, and no other server connected to it has any problems at this time.

We are puzzled as to why this happens at the same time each week, and why it freezes for exactly the same length of time and then recovers so we can carry on. We have checked whether anything runs on the server at this time but can't see anything noticeable. It's as if something performs a freeze of some sort, or a process runs and then either finishes or times out.

Has anybody seen anything similar?

Thanks

Andy
Accepted Solution

Marianne
Level 6
Partner    VIP    Accredited Certified
Is there a possibility that system resources (CPU and/or memory) are sitting at 100% utilization during this period?

Have a look at the cluster engine log, located at %VCS_HOME%\log\engine_A.txt.
Look at the entries for the period during which you see the frozen state.
There should also be evidence of what is happening in the Event Viewer Application and/or System logs.
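To pull out only the entries for the freeze window, a small filter like the one below might help. It is a minimal sketch that assumes each engine_A.txt entry starts with a `YYYY/MM/DD HH:MM:SS` timestamp; check that against your own log first, because any line not starting with that pattern is simply skipped. The function name `lines_in_window` is hypothetical, and the sample lines are made up for illustration.

```python
from datetime import datetime, time

# Assumed engine_A.txt line layout (verify against your own log):
#   "YYYY/MM/DD HH:MM:SS VCS <severity> <message>"
TS_FORMAT = "%Y/%m/%d %H:%M:%S"
TS_LENGTH = 19  # len("2010/03/09 16:21:45")

def lines_in_window(lines, start, end, weekday=None):
    """Yield lines whose timestamp's time of day lies in [start, end].

    start/end are datetime.time values. weekday, if given, uses Python's
    numbering (Monday=0 ... Sunday=6), so Tuesday is 1.
    """
    for line in lines:
        try:
            stamp = datetime.strptime(line[:TS_LENGTH], TS_FORMAT)
        except ValueError:
            continue  # no leading timestamp (e.g. a wrapped continuation line)
        if weekday is not None and stamp.weekday() != weekday:
            continue
        if start <= stamp.time() <= end:
            yield line.rstrip("\n")

# Made-up sample entries; 2010/03/09 was a Tuesday, 2010/03/10 a Wednesday.
sample = [
    "2010/03/09 16:19:59 VCS INFO example entry before the window",
    "2010/03/09 16:21:45 VCS NOTICE example entry inside the window",
    "2010/03/10 16:25:00 VCS INFO example entry on a Wednesday",
]
for entry in lines_in_window(sample, time(16, 20), time(16, 30), weekday=1):
    print(entry)
```

In practice you would pass the lines of %VCS_HOME%\log\engine_A.txt instead of `sample`, keeping `weekday=1` to look only at Tuesdays.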

Gaurav_S
Moderator   VIP   Certified
Hi,

This sounds like a performance impact only. Since you can't even log in to the server, that suggests the server is becoming unresponsive.

As Marianne indicated above, check system resource usage.

Also, do you use any sort of snapshots or BCVs (Business Continuance Volumes) that are being taken at the same time?
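To check for the resource saturation mentioned above, one option is to capture counters across the Tuesday window with Windows' built-in typeperf tool (for example `typeperf "\Processor(_Total)\% Processor Time" -si 5 -o perf.csv`) and then scan the resulting CSV for samples at or above a threshold. The sketch below assumes typeperf's default CSV layout (a header row of counter paths, then rows whose first column is a timestamp such as `03/09/2010 16:20:05.123`); adjust `TS_FORMAT` if yours differs. `saturated_samples` is a hypothetical helper, not part of any tool.

```python
import csv
from datetime import datetime

# Assumed typeperf CSV timestamp layout, e.g. "03/09/2010 16:20:05.123".
TS_FORMAT = "%m/%d/%Y %H:%M:%S.%f"

def saturated_samples(csv_lines, counter_suffix, threshold):
    """Return (timestamp, value) pairs where the counter whose column header
    ends with counter_suffix is at or above threshold."""
    reader = csv.reader(csv_lines)
    header = next(reader)
    matches = [i for i, name in enumerate(header) if name.endswith(counter_suffix)]
    if not matches:
        raise ValueError(f"no column ending with {counter_suffix!r}")
    col = matches[0]
    hits = []
    for row in reader:
        if len(row) <= col:
            continue  # short or truncated row
        try:
            value = float(row[col])
        except ValueError:
            continue  # blank or non-numeric sample
        if value >= threshold:
            hits.append((datetime.strptime(row[0], TS_FORMAT), value))
    return hits
```

You would open perf.csv with `newline=""` and pass the file object as `csv_lines`, e.g. with `"% Processor Time"` and a threshold of 95. A run of hits starting around 16:20 would point toward resource exhaustion; no hits would suggest looking elsewhere, such as at snapshot activity on the array.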

Gaurav

Wally_Heim
Level 6
Employee

Hi Andy,

To add to what Marianne mentioned (perfmon and log review), here are some things to look at.

1. Are there any scheduled jobs that run on the server at any time on Tuesday afternoons? Even if one does not run right at the time of the event, it might be involved. If there are scheduled tasks between, say, noon and 4:20 PM, try moving them an hour earlier. If they are related, the event should also move an hour earlier; if not, the time the event happens should not change.

2. Simplify.  Stop any services that you can to see if they affect the event at all. 

3.  In the extreme case, get a full memory dump when the issue happens.  Then have Microsoft review it to determine what is going on at that time. 
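Point 1 is essentially a controlled experiment: shift the suspect jobs and see whether the freeze shifts with them. The sketch below only encodes that reasoning. The task names and times are hypothetical; you would fill in the list by hand from the server's scheduled tasks (for example, from the output of `schtasks /query /fo CSV /v`).

```python
from datetime import date, datetime, time, timedelta

# Arbitrary reference date so that times of day can be subtracted.
_REF = date(2000, 1, 1)

def candidate_tasks(tasks, window_start, window_end):
    """Return names of tasks whose start time falls in [window_start, window_end]."""
    return [name for name, start in tasks if window_start <= start <= window_end]

def shifted_start(start, hours):
    """Start time moved `hours` earlier (assumes it does not cross midnight)."""
    return (datetime.combine(_REF, start) - timedelta(hours=hours)).time()

def freeze_moved(before, after, shift_hours, tolerance_minutes=5):
    """True if the freeze now starts roughly `shift_hours` earlier than before."""
    moved = datetime.combine(_REF, before) - datetime.combine(_REF, after)
    expected = timedelta(hours=shift_hours)
    return abs(moved - expected) <= timedelta(minutes=tolerance_minutes)

# Hypothetical Tuesday-afternoon tasks, entered by hand.
tasks = [("defrag_job", time(12, 0)), ("av_scan", time(15, 45)), ("nightly_report", time(18, 0))]
starts = dict(tasks)
for name in candidate_tasks(tasks, time(12, 0), time(16, 30)):
    print(f"{name}: move from {starts[name]:%H:%M} to {shifted_start(starts[name], 1):%H:%M}")
```

If, after moving the jobs, the next freeze starts about an hour earlier, `freeze_moved` returns True and one of those jobs is a likely trigger; if the freeze still starts at 4:20 PM, they can probably be ruled out.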

Thanks,
Wally