NetApp snapmirror resource not getting online with VCS integration

uvahpux
Level 4
Partner

Dear All,

I would like some help identifying the following error, which appears while onlining a NetApp SnapMirror agent resource that was integrated with VCS.

The filer name and its credentials were entered correctly, and I am able to ssh to the filer from the servers, but for some reason the resource does not come online and the following error is shown in the engine.log:

2014/09/30 15:02:44 VCS WARNING V-16-20059-1002 (node)NetAppSnapMirror:testSM:online:Encountered errors while decrypting password!
2014/09/30 15:02:44 VCS ERROR V-16-20059-1000 (node11) NetAppSnapMirror:testSM:online:ONTAPI 'system-get-version' failed on filer node1.sys  Error : in Zapi::invoke, cannot connect to socket

I am setting up a Global Cluster configuration where I have a two-node cluster and a filer at the primary site. At DR I have a single-node cluster and a filer. The replication is async.

Any support is highly appreciated.

Thanks.

Uvi.

 

12 REPLIES

RiaanBadenhorst
Moderator
Partner    VIP    Accredited Certified

Hi,

 

Explain the environment more. What are the nodes called? Have you onlined it in prod and it's not working in DR, or does it work nowhere at all?

It's complaining about encryption. Have you encrypted any of the passwords used in the cluster configuration?

uvahpux
Level 4
Partner

Hi Riaan,

Thanks for the response. This is a new deployment for a Global Cluster. We have two sites, Primary and DR. At the primary site we have a two-node VCS cluster connected to NetApp filer1. At the DR site we have a single-node VCS cluster connected to a separate filer.

The replication is SnapMirror over the SAN dark fibre connectivity between the sites. The replication mode is async.

Now the SnapMirror integration is pending. SnapMirror agent for VCS is installed on all nodes.

The following is the software information:

OS : Linux 6.x

SFHA : SFHA5.1SP2RP4

H/W : HP DL380Gen8

During the integration we entered the password from the VCS Java console, so the password is not encrypted. I am not sure whether the password should be encrypted or not. Moreover, the vcsencrypt command to encrypt the password is not available.

Also, are there any ports to be opened between the VCS nodes and the filer apart from the SSH port? Earlier it was complaining that it could not connect to the filer, so we opened the SSH port. Now it is complaining about the encrypted password.

Any hint on this issue is highly appreciated.

Thanks in advance.

Uvi

RiaanBadenhorst
Moderator
Partner    VIP    Accredited Certified

Hi,

 

The passwords in the configuration should be encrypted. It should look like this in main.cf:

 

NetAppSnapMirror sm (
                          VolumeName = unixvol
                          SnapMirrorSchedule = "0-59/5 * * *"
                          LocalFilerName = netapp1
                          LocalFilerUserName = root
                          LocalFilerPword = aoaMboDodOhmPocMe
                          RemoteFilerName = netapp2
                          RemoteFilerUserName = root
                          RemoteFilerPword = aoaMboDodOhmPocMe
                          )

 

vcsencrypt should be located in $VCS_HOME/bin.

uvahpux
Level 4
Partner

Hi Riaan,

Thanks for the support. I have encrypted the password using the vcsencrypt -agent command. However, the LocalFilerUserName attribute does not seem to be available. My config now looks like this:

NetAppSnapMirror testSM (
                VolumeName = testvol
                SnapMirrorSchedule = "0 12 * *"
                LocalFilerName = "netapp1"
                LocalFilerPword = BLwerdPBjBPeHGpE
                RemoteFilerName = "netapp2"
                RemoteFilerPword = BLkdskBjBPeHGpE
                )

So now there is no password decryption error. Is there any TCP port that needs to be opened on the firewall? The logs are now showing a socket error.

2014/10/01 10:27:58 VCS ERROR V-16-20059-1000 (node1) NetAppSnapMirror:testSM:online:ONTAPI 'system-get-version' failed on filer ddci-netapp1-ctrl2.qidc.sys. Error : in Zapi::invoke, cannot connect to socket

2014/10/01 10:27:58 VCS ERROR V-16-20059-2002 (node1) NetAppSnapMirror:testSM:online:Unable to connect to filer netapp1

Thanks in advance.

Uv

 

RiaanBadenhorst
Moderator
Partner    VIP    Accredited Certified

OK, well, that is progress. Can you check/post what is reported in the NetAppSnapMirror logs and not just the engineA.log?

 

The cluster is going to try to connect to the NetApp filer, so it needs whatever ports you would need when doing that yourself. Can you connect manually and perform the 'system-get-version' command as it is trying to?
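
The 'cannot connect to socket' error comes from the ONTAPI (ZAPI) call, which is carried over HTTP or HTTPS rather than SSH, so an open SSH port on its own does not prove the agent can reach the filer. Below is a minimal Python sketch for checking from a cluster node whether the filer accepts TCP connections. The function names are made up for illustration, and ports 80 and 443 are an assumption about which transport the agent is using in this setup.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_filer(host, ports=(80, 443)):
    """Map each candidate port to whether it accepted a connection.

    The default ports are an assumption (HTTP/HTTPS); adjust to your setup.
    """
    return {port: can_connect(host, port) for port in ports}
```

Running `check_filer("netapp1")` on each node (with the filer name from this thread) returns a dictionary of port to True/False. If every port comes back False, a firewall or a disabled HTTP service on the filer would be a reasonable next thing to look at.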

uvahpux
Level 4
Partner

Hi Riaan,

Thank you for the response.

The following is the message I get when I log in to the NetApp:

netapp1> system-get-version

system-get-version not found.  Type '?' for a list of commands

 

The following is the engine log message I am still getting:

2014/10/02 10:17:28 VCS ERROR V-16-20059-1000 (node1) NetAppSnapMirror:testSM:online:ONTAPI 'system-get-version' failed on filer netapp1. Error : in Zapi::invoke, cannot connect to socket
2014/10/02 10:17:28 VCS ERROR V-16-20059-2002 (node1) NetAppSnapMirror:testSM:online:Unable to connect to filer netapp1

I am trying to get some help from support.

Thanks

Uv

 

RiaanBadenhorst
Moderator
Partner    VIP    Accredited Certified

Cool, let me know what the problem is.

uvahpux
Level 4
Partner

Hi Riaan,

After a long-running case with NetApp, the issue has been resolved. Symantec support provided a tool, apitest.pl, to validate the communication between the nodes and the filers. The communication was failing, so we logged a case with NetApp. NetApp support and our storage team tried many options on the filer, and at last it worked.

Now my service group is coming online properly at both the Primary and DR sites.

I have one doubt. The data is visible at the DR filer, as expected with async replication. However, after a failover to the DR site, the DR filer is in write mode and the service group is online, but the data written while running at DR is not visible at the PR filer.

Could somebody please explain whether this is normal behaviour, or whether I need to configure SnapMirror on the DR filer as well? I am not sure how NetApp SnapMirror works, but with HP EVA CA it works.

Thanks.

Uv

RiaanBadenhorst
Moderator
Partner    VIP    Accredited Certified

Hi,

 

NetApp must configure it for you so that if you write in prod, it's replicated (written) to DR. If you switch to DR, then it should replicate all the writes to PR. Once you fail over to PR it should all be there.

 

I usually do a simple test

In PR make test_file

echo 1 >> test_file

switch to DR (using cluster or just netapp commands if you're not sure the netapp is working)

cat test_file (should show 1)

echo 2 >> test_file

switch to PR (using cluster or just netapp commands if you're not sure the netapp is working)

cat test_file (should show 1 and 2)
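
The write test above can also be scripted so that the same check is run at each site. This is a minimal Python sketch, assuming the replicated volume is mounted at a path you pass in. The helper names `append_marker`, `read_markers` and `missing_markers` are made up for illustration, and the switch between sites is still done through the cluster or the NetApp commands.

```python
from pathlib import Path


def append_marker(test_file, marker):
    """Append one marker line, like `echo 1 >> test_file`."""
    with open(test_file, "a") as fh:
        fh.write(f"{marker}\n")


def read_markers(test_file):
    """Return the markers currently in the file, like `cat test_file`."""
    path = Path(test_file)
    if not path.exists():
        return []
    return path.read_text().split()


def missing_markers(test_file, expected):
    """List the expected markers that have not arrived at this site."""
    present = set(read_markers(test_file))
    return [m for m in expected if m not in present]
```

At PR, call `append_marker(path, "1")`. After switching to DR, `missing_markers(path, ["1"])` should be empty, and then `append_marker(path, "2")` is run there. After switching back to PR, `missing_markers(path, ["1", "2"])` should again be empty. If it reports "2", the writes made at DR did not replicate back to PR.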

 

uvahpux
Level 4
Partner

Hi Riaan,

Let me try it tomorrow and will update you.

 

Thanks.

Clifford_Barcli
Level 3
Employee Accredited Certified

uvahpux:

Check the netappsnapmirror agent log.

 

Normally, on a graceful switch, after the application is shut down and the volumes are unmounted, the snapmirror agent reverses the roles of the snapmirror pair. From the netapp console, you will see the status as "snapmirror" but the source/destination will be reversed.

If, however, the filers are not talking to each other properly, then the remote site will assume that the primary is down, so it forcefully breaks the relationship. We call this "take over". If you were to fail back, then you would not see any updates at the primary.

Check the status from netapp console.  Most likely you will see the status as "broken" for that pair.   To fix, right-click on the pair and perform a resync.  Make sure you select the correct direction :)

 

I am working with another customer who is having a similar issue.   You mention that NetApp support fixed a communications problem.  Can you give some details on what they found and what they did to fix it?

 

I will update this post if I fix my own problem.

 

Cheers

uvahpux
Level 4
Partner

Hi Clifford,

The snapmirror configuration file at the DR filer is similar to the following ( fas1:vol1 fas2:vol2 ), and with that in place replication works properly after the resync from the console. However, after the switchover to the DR site, the snapmirror resource did not come online, although onlining the service group from the DR site works. So I am assuming that replication is not happening from DR to Primary, and that I need to add an entry similar to the following to the Primary filer's snapmirror.conf ( fas2:vol2 fas1:vol1 ) in order to replicate the changes being written at the DR site.

 

2014/11/17 14:40:46 VCS ERROR V-16-20059-1109 (node3) NetAppSnapMirror:NNMHA-SnapMirror:online:Failed to get details for connection name -netapp1-ctrl2, SnapMirror error 13001 :: Unable to get connection from registry.
2014/11/17 14:40:46 VCS ERROR V-16-20059-1106 (node3) NetAppSnapMirror:NNMHA-SnapMirror:online:SnapMirror over multiple paths is configured but failed to get details of connection -netapp1-ctrl2
2014/11/17 14:40:46 VCS ERROR V-16-20059-1100 (node3) NetAppSnapMirror:NNMHA-SnapMirror:online:Failed to configure connection  -netapp1-ctrl2:vol_fc on filer -netapp1-ctrl2.
2014/11/17 14:40:46 VCS WARNING V-16-20059-2004 (node3) NetAppSnapMirror:NNMHA-SnapMirror:online:Migrate failed for volume vol_fc; administrative intervention required
2014/11/17 14:40:47 VCS INFO V-16-2-13716 (node3) Resource(NNMHA-SnapMirror): Output of the completed operation (online)
==============================================

2014/11/17 14:40:09 VCS INFO V-16-20059-2007 (node3) NetAppSnapMirror:NNMHA-SnapMirror:online:Remote filer netapp1-ctrl2 is alive. Will attempt to migrate
2014/11/17 14:40:11 VCS NOTICE V-16-20059-1028 (node3) NetAppSnapMirror:NNMHA-SnapMirror:online:snapmirror update initiated for volume netapp4-ctrl1:vol_fc_SnapMirrored
2014/11/17 14:40:20 VCS INFO V-16-1-10298 Resource NNMHA-ip (Owner: Unspecified, Group: NNMHA) is online on node3 (VCS initiated)
2014/11/17 14:40:42 VCS NOTICE V-16-20059-1043 (node3) NetAppSnapMirror:NNMHA-SnapMirror:online:Attempting to quiesce snapmirror location netapp4-ctrl1:vol_fc_SnapMirrored
2014/11/17 14:40:42 VCS INFO V-16-20059-1047 (node3) NetAppSnapMirror:NNMHA-SnapMirror:online:snapmirrored location netapp4-ctrl1:vol_fc_SnapMirrored has been successfully quiesced
2014/11/17 14:40:42 VCS NOTICE V-16-20059-1023 (node3) NetAppSnapMirror:NNMHA-SnapMirror:online:Attempting to break snapmirror location netapp4-ctrl1:vol_fc_SnapMirrored
2014/11/17 14:40:46 VCS INFO V-16-20059-1030 (node3) NetAppSnapMirror:NNMHA-SnapMirror:online:snapmirrored location netapp4-ctrl1:vol_fc_SnapMirrored has been successfully broken-off
2014/11/17 14:40:46 VCS ERROR V-16-20059-1109 (node3) NetAppSnapMirror:NNMHA-SnapMirror:online:Failed to get details for connection name netapp1-ctrl2, SnapMirror error 13001 :: Unable to get connection from registry.
2014/11/17 14:40:46 VCS ERROR V-16-20059-1106 (node3) NetAppSnapMirror:NNMHA-SnapMirror:online:SnapMirror over multiple paths is configured but failed to get details of connection netapp1-ctrl2
2014/11/17 14:40:46 VCS ERROR V-16-20059-1100 (node3) NetAppSnapMirror:NNMHA-SnapMirror:online:Failed to configure connection  netapp1-ctrl2:vol_fc on filer netapp1-ctrl2..
2014/11/17 14:40:46 VCS WARNING V-16-20059-2004 (node3) NetAppSnapMirror:NNMHA-SnapMirror:online:Migrate failed for volume vol_fc; administrative intervention required
2014/11/17 14:40:47 VCS INFO V-16-2-13716 (node3) Resource(NNMHA-SnapMirror): Output of the completed operation (online)
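
When an online attempt produces this many lines, it can help to pull out only the errors and warnings together with their message IDs. Below is a minimal Python sketch, assuming the line layout seen in the excerpts above (date, time, `VCS`, severity, message ID, then free text). The function names are illustrative and are not part of VCS.

```python
import re

# Layout assumed from the excerpts in this thread, e.g.
# 2014/11/17 14:40:46 VCS ERROR V-16-20059-1109 (node3) ...
LINE_RE = re.compile(
    r"^(?P<date>\d{4}/\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"VCS (?P<severity>[A-Z]+) (?P<msgid>V-\d+-\d+-\d+) ?(?P<text>.*)$"
)


def parse_line(line):
    """Split one engine log line into fields, or return None if it does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def problems(lines, severities=("ERROR", "WARNING")):
    """Return the parsed records whose severity is in `severities`."""
    records = []
    for line in lines:
        record = parse_line(line)
        if record and record["severity"] in severities:
            records.append(record)
    return records
```

Feeding the lines of the engine log to `problems(...)` and printing `msgid` and `text` for each record gives a short list of IDs (for example V-16-20059-1109) to look up or to quote in a support case.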