Re: Veritas Cluster for Netapp Snapmirror for Windows/Exchange

Daniel_Bass · ‎04-25-2006

Hi All,
Does anybody have any experience with the subject?
I'm interested in more details about the functionality of the NetApp agents, plus any tips/tricks for implementation.
The official documentation really sucks :(

Carsten_Hennig · ‎04-25-2006

Hi Daniel,

I just recently installed such a beast for a customer. Now, with 4.3MP1 plus the private patch, it really works...

What exactly do you want to know?

Regards,
Carsten
(Carsten.Hennig@StorConcepts.com)

Tom_Maher · ‎04-25-2006

Hi Daniel,

We've just implemented this on both our Exchange and SQL platforms. My tips would be:

1) Reverse the documented dependencies so SnapDrive depends on SnapMirror; otherwise the SnapDrive agent will always make SnapDrive perform an SFSR (Single File SnapRestore) on failover to your remote site.
2) Make sure your filer names are lowercase or SnapDrive will never perform an SFSR.
3) As Carsten says, make sure your revisions are good (4.3MP1+private), ONTAP 7.0+, iSCSI 2.0 and SnapDrive 3.2R1.
4) Be prepared to work long and hard getting NetApp to acknowledge this product.
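
For tip 2, a quick pre-flight check can flag mixed-case filer names before they go into the cluster configuration. This is only a minimal Python sketch: the function name and the sample names are hypothetical, and how you collect the names from your own configuration is left out.

```python
def non_lowercase_filers(filer_names):
    """Return the filer names that contain any uppercase characters."""
    return [name for name in filer_names if name != name.lower()]


# Hypothetical sample names, not taken from a real configuration.
suspects = non_lowercase_filers(["filer1", "Filer3", "FILER2"])
print("Rename before configuring:", suspects)
```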

Daniel_Bass · ‎04-26-2006

Tom,
Thanks for your response.
1) I didn't fully understand your comment about the resource dependencies. The SnapDrive agent performs the SFSR in order to get a consistent Exchange image on the DR site. What's wrong with that? Could you please explain?
2) What version of SnapManager for Exchange are you running?

Thanks in advance,
Daniel. (daniel@mbi.co.il)

Tom_Maher · ‎04-26-2006

Hi Daniel,

That's correct. During the VCS Exchange Setup wizard, there is a point where the wizard needs to mount the Information Store on a DR node. The wizard manages this by getting SnapDrive to break off SnapMirror and perform a Single File SnapRestore (SFSR) from the last consistent snapshots of your Exchange LUNs. This in turn makes your DR LUNs writeable and hence the Exchange IS can mount.

It's this process that will likely time out and fail unless you're patched up to SnapDrive 3.2R1 and the VCS private patch here - http://seer.support.veritas.com/docs/281668.htm - I'd let support know you've applied the private patch to keep their records straight.

My comment about the resource dependencies really only applies AFTER you've got Exchange highly available under VCS control through the setup wizard. You'll see in the sample resource dependency tree (page 123 of the "VCS for NetApp SnapMirror Installation and Configuration Guide for Microsoft Exchange" - http://seer.support.veritas.com/docs/277937.htm) that SnapMirror depends on SnapDrive.

As you've rightly said, the resources set this way will perform an SFSR. In a DR scenario where your source filers are unavailable because of a site outage this is desirable as the SFSR will break the mirror, make the vol writeable, roll you back to your last verified snapshot and mount it. That's great (although SFSRs take a hideously long time to complete).

If, however, like us, you want the capability to perform an administrative failover to your DR site and subsequently fail back without data loss, reverse the dependencies so SnapDrive depends on SnapMirror. In this instance the SnapMirror agent breaks the mirror, resyncs the data, reverses the mirror and makes your original source filer the mirror destination. SnapDrive onlines next in the resource tree and simply mounts the now writeable LUN (rather than messing about with an SFSR). You're looking at a tenth of the time to recover and zero data loss, versus the pain of an SFSR and data loss back to your last consistent snap.
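
To make the ordering concrete: a resource only onlines after everything it depends on is online, so flipping the dependency flips which agent acts first. The sketch below is plain Python, not VCS syntax; it models the two resources by agent name only and uses the standard library's topological sorter to print the resulting online order. The dictionary layout and the function name are assumptions for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def online_order(requires):
    """Given {resource: set of resources it depends on}, return the order
    in which the resources come online (dependencies first)."""
    return list(TopologicalSorter(requires).static_order())


# As documented in the guide: SnapMirror depends on SnapDrive.
documented = {"SnapMirror": {"SnapDrive"}, "SnapDrive": set()}

# Reversed, as described above: SnapDrive depends on SnapMirror.
reversed_deps = {"SnapDrive": {"SnapMirror"}, "SnapMirror": set()}

print("documented:", online_order(documented))
print("reversed:  ", online_order(reversed_deps))
```

With the reversed tree SnapMirror is first in the list, which is why it gets the chance to resync and reverse the mirror before SnapDrive touches the LUN.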

Basically VCS is doing this - http://now.netapp.com/NOW/knowledge/docs/ontap/rel7001_gf/html/ontap/onlinebk/mirror22.htm (you'll need a NOW account to get to this; if you don't have one, I'll mail it.)

Tom_Maher · ‎04-26-2006

Sorry, SME is version 3.1

Carsten_Hennig · ‎04-26-2006

Hi Tom,

If you change the dependencies of the SnapDrive and SnapMirror resources, does that have a negative impact on a real DR scenario?

Regards,
Carsten

Tom_Maher · ‎04-27-2006

Hi Carsten,

We've lab tested a real DR scenario (killing the power to our production source filer) and found no detrimental impact with the dependencies reversed, i.e. SnapDrive depending on SnapMirror so that SnapMirror onlines first.

First off, SnapDrive on your production app servers will start letting you know you've lost the filer:

Could not retrieve configuration information from the filer for LUN (\\?\scsi#disk&ven_netapp__&prod_lun_____________&rev_0.2_#1&2afd7d61&0&000000#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}) at Port number 3, Bus number 0, Target number 0, LUN number 0.
Error code : The specified network resource or device is no longer available.

along with errors from your production apps (SQL in this case):

17053 :
LogWriter: Operating system error 2(error not found) encountered.

The monitor routine of the NetAppFiler agent will pick up the fact that the source filer is dead and then attempt to fail over to the DR cluster. As SnapMirror is onlining first, you'll see it attempt to contact the source production filer for a migrate, fail, and initiate a mirror break-off instead:

2006/04/27 10:27:19 VCS ERROR V-16-20031-1001 NetAppSnapMirror:MIRROR_DB:online:Error 13011: RPC Error - The RPC server is unavailable.
2006/04/27 10:27:19 VCS NOTICE V-16-20031-55 NetAppSnapMirror:MIRROR_DB:online:Unable to connect to remote filer filer1. Will attempt to takeover
2006/04/27 10:27:21 VCS INFO V-16-20031-57 NetAppSnapMirror:MIRROR_DB:online:Takeover successful. Snapmirror volume DB on filer filer3 is now broken-off

SnapDrive onlines next, sees the volume is broken-off and unmaps the iSCSI mappings previously held by the now dead production filer:

2006/04/27 10:28:53 VCS INFO V-16-20031-96 NetAppSnapDrive:LUN_DB:online:Volume DB is in a 'broken-off' state
2006/04/27 10:28:53 VCS NOTICE V-16-20031-35 NetAppSnapDrive:LUN_DB:online:Lun /vol/DB/db.lun is mapped to initiator group viaRPC.iqn.1991-05.com.microsoft:prodsql01.testlab.local. Will attempt to unmap it

and initiates a SnapDrive SFSR from the last (most up-to-date) successfully mirrored snapshot:

Starting Single File SnapRestore of the virtual disk from the snapshot( sql_db_snap.0 ).

This is the bit that takes the time, depending on the size of the snap from which the SFSR is being performed. Once this is complete, the rest of the service group should online and start serving the application again. In terms of data, you're good up to that last snapshot - anything that was written to the filer after that is obviously lost.
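
If you want to time the individual steps from the agent log, the VCS lines quoted above share a regular layout that is easy to pull apart. The following is a minimal Python sketch; the layout is inferred only from the lines in this post, and the pattern, field names and function name are my own assumptions rather than anything documented.

```python
import re
from datetime import datetime

# Layout assumed from the lines quoted in this thread only:
# <date> <time> VCS <SEVERITY> <message id> <agent>:<resource>:<entry point>:<text>
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) VCS "
    r"(?P<severity>[A-Z]+) (?P<msg_id>V-\d+-\d+-\d+) "
    r"(?P<agent>[^:]+):(?P<resource>[^:]+):(?P<entry_point>[^:]+):(?P<text>.*)$"
)


def parse_line(line):
    """Split one agent log line into named fields, or return None if the
    line does not match the assumed layout."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["ts"] = datetime.strptime(fields["ts"], "%Y/%m/%d %H:%M:%S")
    return fields


# Two of the lines from the failover above.
mirror_break = parse_line(
    "2006/04/27 10:27:21 VCS INFO V-16-20031-57 NetAppSnapMirror:MIRROR_DB:online:"
    "Takeover successful. Snapmirror volume DB on filer filer3 is now broken-off"
)
lun_online = parse_line(
    "2006/04/27 10:28:53 VCS INFO V-16-20031-96 NetAppSnapDrive:LUN_DB:online:"
    "Volume DB is in a 'broken-off' state"
)

gap = lun_online["ts"] - mirror_break["ts"]
print(mirror_break["resource"], "->", lun_online["resource"],
      "took", gap.total_seconds(), "seconds")
```

Lines that do not fit the pattern (the SnapDrive and SQL event text, for example) simply come back as None, so they can be skipped when scanning a whole log.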

You'll need to safely recover and failback to your production filers when they're back online but that's relatively straightforward.

I must stress at this stage that this config isn't supported AFAIK. We're hoping to see a technote on the issue soon, but until then I would strongly recommend leaving this to the lab and well away from production.

Tom_Maher · ‎04-18-2007

Apologies for resurrecting such an old thread, but for anyone with any interest, VCS 5.0 appears to have resolved the issue of the SnapDrive - SnapMirror dependencies. Funny we never saw the technote though...
