Netapp Filer resource stuck in an odd state

Ashon1
Level 4
Partner

Hi,

I'm trying to set up a SQL cluster running VCS.  It has two servers and a NetApp for storage.  The cluster seems to be happy, but I cannot get the SQL service group to come online.  When I run hares -state NETAPP -sys CLUSTER it says online|state unknown.  I have probed it, tried to online it, offline it, and reboot it, and I can't change this state.  What am I missing? Is there something else that I am supposed to do?

Also, I have a bunch of Agentframework 99 error messages in the event viewer.  The filer is configured on the correct IP address with the correct name.  The DNS name is registered to the correct address.  I can ping it.  I have dumped the DNS cache and successfully tried to ping it again.  Still get the same message.


Wally_Heim
Level 6
Employee

Hi Ashon1,

The NetAppFiler resource (or any other resource for that matter) reports an "Unknown" state when it is not able to probe correctly. 

In a new installation, this is typically a configuration issue.  However, the NetAppFiler resource is very simple to configure and it sounds like you have already checked its two attributes for correct values.

You might get more details on what the NetAppFiler resource is having problems with by checking the %vcs_home%\log\NetAppFiler_A.txt log.  This might tell you exactly what the issue is.

From the Bundled Agents Guide for 5.1 SP2, the NetAppFiler resource does two things during a Monitor. 

1. It pings the NetApp Filer

2. It connects to the filer and checks the ONTAP version to see if it is supported.

Since you said you can ping the Filers from the server, my guess would be that you are running an unsupported version of ONTAP on the Filer.  Please check the ONTAP version to see if it is supported.

Supported versions of ONTAP and the other required NetApp software are listed in the VCS Installation and Admin Guide on page 21.  Here is the excerpt from that page:

Supported applications
The supported versions of Network Appliance applications and other applications are as follows:
■ Network Appliance SnapManager for Exchange 3.2 with Exchange Server 2003
■ Network Appliance SnapManager for Exchange 4.0, 5.0, 6.0 with Exchange Server 2007
■ Network Appliance SnapManager for SQL 2.0, 2.1, and 5.0
■ Network Appliance Data ONTAP 7.3, 7.3.3
■ Network Appliance SnapDrive 4.1, 4.2.1, 5.0, 6.0, 6.1, and 6.2
  When installing SnapDrive, you must specify a user account in the SnapDrive Service Credentials dialog box. The user account must be a domain user and part of the Administrators group of the local system and the filer.
■ Data ONTAP DSM for Windows MPIO 3.1, 3.2, 3.3, 3.3.1
■ Microsoft iSCSI software initiator version 2.03 or later versions

 

Please check the VCS Installation and Upgrade guide for the specific version of VCS that you are running, as the required NetApp software versions have changed between VCS releases.

Thanks,

Wally

Ashon1
Level 4
Partner

No dice.  We have this working on one of our clusters... and it isn't working on the secondary cluster.  Both NetApps are running ONTAP 7.3.3P1.  Both sets of servers are now running SnapDrive 6.2.  The SnapDrive user is a domain admin and a NetApp admin.  There is no MPIO.  The iSCSI initiator is version 2.08.  Everything has been rebooted.  The connections and service group were deleted and recreated.

Ashon1
Level 4
Partner

Literally hundreds of messages like this.  I have checked the DNS server, the DNS cache, and pinged it.  It does resolve.

2011/07/11 13:37:05 VCS ERROR V-16-20031-99 NetAppFiler:HBSQL2008_SERVICEGROUP-NetAppFiler:monitor:The filer name hbnetapps.HAPPYBANK.local does not resolve to the assigned storage IP XX.YY.3.ZZ
2011/07/11 13:37:35 VCS ERROR V-16-20031-99 NetAppFiler:HBSQL2008_SERVICEGROUP-NetAppFiler:monitor:The filer name hbnetapps.HAPPYBANK.local does not resolve to the assigned storage IP XX.YY.3.ZZ

Wally_Heim
Level 6
Employee

Hi Ashon1,

Have you checked to see what is reported in the %vcs_home%\log\NetAppFiler_A.txt log? 

This is the agent log for the NetAppFiler resource and it should have captured the error that it had when doing the probe.  If nothing is there then you might need to increase the logging level or open a case with Symantec Technical Support to troubleshoot this issue further.

Thanks,

Wally

Wally_Heim
Level 6
Employee

Hi Ashon1,

It looks like we were both posting at the same time.

You say that it does resolve in DNS.  Have you checked both forward and reverse lookup zones?  VCS sometimes double checks the reverse lookup zone to ensure that the IP resolves back to the correct name.
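For anyone who wants to script the forward and reverse check, here is a minimal Python sketch. It is not how the VCS agent does its lookup (that is not documented in this thread); it only compares what the node's own resolver returns in both directions. The function name `check_filer_dns` and the injectable `forward`/`reverse` parameters are my own, added so the logic can be tried without a live DNS server.

```python
import socket


def check_filer_dns(filer_name, storage_ip,
                    forward=socket.gethostbyname_ex,
                    reverse=socket.gethostbyaddr):
    """Return a list of problems; an empty list means both lookups agree.

    forward/reverse default to the system resolver and are parameters
    only so that the check can be exercised with stand-in functions.
    """
    problems = []

    # Forward lookup: the name should resolve to the storage IP.
    try:
        _, _, addresses = forward(filer_name)
    except OSError as exc:  # socket.gaierror/herror are OSError subclasses
        return [f"forward lookup of {filer_name} failed: {exc}"]
    if storage_ip not in addresses:
        problems.append(
            f"{filer_name} resolves to {addresses}, not {storage_ip}")

    # Reverse lookup: the IP should resolve back to the same name.
    try:
        primary, aliases, _ = reverse(storage_ip)
    except OSError as exc:
        problems.append(f"reverse lookup of {storage_ip} failed: {exc}")
        return problems
    names = {primary.lower(), *(alias.lower() for alias in aliases)}
    if filer_name.lower() not in names:
        problems.append(
            f"{storage_ip} resolves back to {sorted(names)}, "
            f"not {filer_name}")
    return problems
```

Run it on each cluster node with the FilerName and StorageIP values from the resource. If the working cluster returns an empty list and the non-working one does not, the difference is on the DNS side rather than in VCS.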

Other than that I would recommend opening a support case and providing hagetcf outputs from both the working and non-working clusters.

Thanks,

Wally

Ashon1
Level 4
Partner

Posted separately earlier... bunches of messages just like this.

2011/07/11 13:37:05 VCS ERROR V-16-20031-99 NetAppFiler:HBSQL2008_SERVICEGROUP-NetAppFiler:monitor:The filer name hbnetapps.HAPPYBANK.local does not resolve to the assigned storage IP XX.YY.3.ZZ
2011/07/11 13:37:35 VCS ERROR V-16-20031-99 NetAppFiler:HBSQL2008_SERVICEGROUP-NetAppFiler:monitor:The filer name hbnetapps.HAPPYBANK.local does not resolve to the assigned storage IP XX.YY.3.ZZ

Ashon1
Level 4
Partner

Running hagetcf now...  didn't check reverse.  Will check now.

Ashon1
Level 4
Partner

any particular file that would be interesting?

Wally_Heim
Level 6
Employee

Hi Ashon1,

I would check the NetAppFiler_A.txt log that you have already checked.  In addition, I would compare the entries in the error message with the NetAppFiler resource to make sure that the message and resource are for the same server and IP addresses.
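With hundreds of these messages, that comparison is easier to do with a script. The sketch below is only an illustration: the regular expression is built from the V-16-20031-99 message text quoted earlier in this thread, and the function names are made up for the example. It pulls every name/IP pair out of the log lines and reports any pair that differs from what the resource is configured with.

```python
import re

# Pattern based on the message wording quoted in this thread.
MISMATCH = re.compile(
    r"The filer name (?P<name>\S+) does not resolve to "
    r"the assigned storage IP (?P<ip>\S+)"
)


def mismatches(log_lines):
    """Collect the distinct (filer name, storage IP) pairs in the log."""
    found = set()
    for line in log_lines:
        match = MISMATCH.search(line)
        if match:
            found.add((match.group("name"), match.group("ip")))
    return found


def differs_from_resource(log_lines, filer_name, storage_ip):
    """Return logged pairs that do not match the configured resource.

    Names are compared case-insensitively; IPs are compared as text.
    """
    expected = (filer_name.lower(), storage_ip)
    return {
        (name, ip)
        for name, ip in mismatches(log_lines)
        if (name.lower(), ip) != expected
    }
```

An empty result from `differs_from_resource` means the agent is complaining about exactly the name and IP that the resource holds, so the attributes themselves are not mistyped and the lookup is what needs attention.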

From there, I would compare the network.txt files from the working and non-working nodes to see if they are using different DNS servers.

After that I would try increasing logging for the NetAppFiler resource type to see if I can get more details of why it thinks the name and IP don't match.

Thanks,

Wally

Ashon1
Level 4
Partner

Checked the name and IP address from NetAppFiler_A.txt before.  They are correct.  Everyone is using the same DNS servers.  I am going to increase the logging to see what information I get.

Ashon1
Level 4
Partner

IP and server names and DNS configured correctly.

two things

1) logging doesn't seem to be adding anything.  maybe i did this in the wrong place?

2) looking at hares -display for both clusters... these things are different

  • the arglistvalues are different.  the cluster that works only has "".  the one that doesn't has FilerName 1 <<storage device name>> StorageIP
  • flags are different. the cluster that works is empty.  the one that doesn't has |STATE UNKNOWN|
  • state is also different.  the cluster that works says ONLINE.  the one that doesn't says ONLINE|STATE UNKNOWN
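A by-eye comparison like the one above can also be done with a short script. This Python sketch assumes the hares -display output has been saved to text from each cluster and that each row has four whitespace-separated columns (resource, attribute, system, value) under a header line starting with #. That layout is an assumption, so check it against your own output first. The function names are made up for the example.

```python
def parse_display(text):
    """Map attribute name -> value from saved display output.

    Assumes rows of: resource, attribute, system, value. If the same
    attribute is listed for several systems, the last row wins.
    """
    attrs = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the header row
        parts = line.split(None, 3)
        if len(parts) < 3:
            continue  # not a recognizable attribute row
        value = parts[3].strip() if len(parts) == 4 else ""
        attrs[parts[1]] = value
    return attrs


def diff_attrs(working, broken):
    """Return {attribute: (working value, broken value)} where they differ."""
    keys = sorted(set(working) | set(broken))
    return {
        key: (working.get(key), broken.get(key))
        for key in keys
        if working.get(key) != broken.get(key)
    }
```

Attributes that legitimately differ between clusters (names, IP addresses) will show up too, so the output is a list of things to look at, not a list of faults.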

Wally_Heim
Level 6
Employee

Hi Ashon1,

The logging should be the LogDbg attribute and you should add DBG_1, DBG_2, DBG_20 and DBG_21 to it on separate lines.

I'm not sure if the arglistvalues that you are looking at are a problem or not.  The other two items are not a problem; they are just the difference in the resource states and are not unexpected in your current state.

You should open a case with support so that we can work more directly with you to resolve this issue.

Thanks,

Wally

joseph_dangelo
Level 6
Employee Accredited

Ashon,

I recently encountered the same error when deploying VCSW 5.1 SP2 for one of my customers.  We eventually configured the Filer Agent using an alternate IP address for the StorageIP attribute. We also had to use the FQDN for the FilerName attribute. This resolved the UNKNOWN state. 

Hope this helps.

Joe D