Failover is not happening when any of the volumes are kept open

Sridhar_sri
Level 5
Hi All,
I am using VSF HA 5.0 for both HA and DR setups on Windows and Solaris platforms. Take one of my replicated volumes as an example: a volume named PRODUCT, mounted as E: on Windows or at /opt/PRODUCT on Solaris. When one of my key processes running on this volume goes down on the primary server, my customised agent resource that monitors the process treats this as a fault and initiates a failover from the primary server. During the failover, all of my resources on the primary server go offline except the MountV resource (Windows) / the Mount and Volume resources (Solaris). Because these mount resources never go completely offline on the primary server, even though a few of the resources start coming up on the secondary server, the failover ends in a Faulted state on both the primary and secondary servers.
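
For reference, this is how I observe the state on both nodes (hastatus -sum is the standard VCS status summary command):

hastatus -sum

After the attempted failover, the summary on both nodes shows the resources as FAULTED.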

Is this a known issue? If so, is it fixed in the latest releases?

Can anyone guide me on how to solve this problem?

Thanks in Advance.

With Regards,

Sri.


3 REPLIES

rhanley
Level 4
Hi Sri,
   On Windows, I would check the %VCS_HOME%\log\MountV_A.txt log file and see what error it reports when the resource fails to go offline. Normally when we see this, it is because an application or process outside of the cluster has a lock on the volume. SFW needs an exclusive lock on the volume in order for it to go down properly. In these cases, you will see the error 'Failed to lock volume' in the MountV_A.txt log file.
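
For example, a quick way to pull those lines out of the agent log (assuming the default log location) is:

findstr /i /c:"Failed to lock volume" "%VCS_HOME%\log\MountV_A.txt"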

On the MountV resource, you can set the ForceUnmount attribute to READ_ONLY if it is not already set. This will allow the volume to go offline if only read access is occurring on the volume. If this does not address the issue (or if it is already set), then it is recommended to use a utility such as Handle (handle.exe from Sysinternals) to determine which process has an open handle to the volume. Once determined, that process will either need to be brought into the cluster or handled in some other way to prevent it from locking the volume.
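
As a rough sketch of both steps (PRODUCT_MountV is a placeholder for your actual MountV resource name):

haconf -makerw                                        (make the cluster configuration writable)
hares -modify PRODUCT_MountV ForceUnmount READ_ONLY
haconf -dump -makero                                  (save and re-protect the configuration)
handle.exe E:\                                        (Sysinternals Handle: list processes with open handles under E:\)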

I'm guessing you mean that the MountV and possibly VMDg resources fail to come online on the secondary server? Again, I would take a look at the MountV_A.txt log file (and also the VMDg_A.txt log file if you're having issues with the VMDg resource) on the secondary server.

Once you know the specific errors, I would recommend searching the Symantec Knowledgebase to see if you can find any additional information. Otherwise, please post the errors you see back to this thread and I'll see if I can provide any further assistance.

I hope this helps,
Robert

Sridhar_sri
Level 5

Hi Robert,
   I know the reason why it is not failing over properly. These are the situations in which the failover does not complete:

* On Windows, I installed my PRODUCT on the E: drive and created my cluster. After completing the setup, I kept a window open on the E: drive (either a command window or a GUI/Explorer window) displaying the drive's contents. In this situation, I hit the issue. The same applies on Solaris: if a telnet session sitting in the mounted directory is kept open, failover does not happen.
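
For what it's worth, on Solaris I can see the session holding the mount point busy with fuser (using the mount point from my example above):

fuser -cu /opt/PRODUCT     (lists the PIDs and the owning users of processes using the mount point)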

Do you have any suggestions on this?

Regarding the error statements, I believe those are exactly the errors I am seeing.



Regards,
Sri.

 

rhanley
Level 4
Hi Sri,
   Unfortunately, if an Explorer window is open on the server to the drive controlled by the MountV resource, that alone is enough to cause the MountV resource to fail when attempting to lock the volume during the offline operation. When this occurs, your best option is to close the Explorer window on the drive, then restart had (the VCS engine):

From a command prompt:

haconf -dump -makero     (save the running configuration to disk and make it read-only)
hastop -local -force     (stop had on this node without taking the applications offline)
hastart                  (restart had; resource states are re-probed)

That will clear the 'offline pending' state on the MountV resource and it will show properly as 'online' again. At that point, an offline would successfully unmount the volume and allow a failover to occur. Normally this is only an issue during testing, provided users don't stay logged into the server browsing the drives locally.

The only way around this would be to set the ForceUnmount attribute to ALL, but if an application is actively writing to that volume when it goes down, you could cause NTFS corruption. It is strongly recommended never to use the ALL option with databases, but customers do occasionally choose it when working with file shares and other items where they're willing to risk data corruption in order to ensure the volume can always be unmounted in case of a failover.
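
If you do choose to accept that risk, the change itself is the same one-line attribute modification shown earlier (the resource name is again a placeholder):

haconf -makerw
hares -modify PRODUCT_MountV ForceUnmount ALL
haconf -dump -makero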

rjhanley