07-11-2011 10:47 AM
Hi,
I'm trying to set up a SQL cluster running VCS. It has two servers and a NetApp for storage. The cluster seems to be happy, but I cannot get the SQL service group to come online. When I run hares -state NETAPP -sys CLUSTER it says online|state unknown. I have probed it, tried to online it, offline it, and reboot it, and I can't change this state. What am I missing? Is there something else that I am supposed to do?
Also, I have a bunch of Agentframework 99 error messages in the event viewer. The filer is configured on the correct IP address with the correct name. The DNS name is registered to the correct address. I can ping it. I have dumped the DNS cache and successfully tried to ping it again. Still get the same message.
07-11-2011 12:12 PM
Hi Ashon1,
The NetAppFiler resource (or any other resource for that matter) reports an "Unknown" state when it is not able to probe correctly.
In a new installation, this is typically a configuration issue. However, the NetAppFiler resource is very simple to configure and it sounds like you have already checked its two attributes for correct values.
You might get more details on what the NetAppFiler resource is having problems with by checking the %vcs_home%\log\NetAppFiler_A.txt log. This might tell you exactly what the issue is.
From the Bundled Agents Guide for 5.1 SP2, the NetAppFiler resource does two things during a Monitor.
1. It pings the NetApp Filer
2. It connects to the filer and checks the ONTAP version to see if it is supported.
Since you said you can ping the Filers from the server, my guess would be that you are running an unsupported version of ONTAP on the Filer. Please check the ONTAP version to see if it is supported.
Supported versions of the ONTAP software and other required NetApp software are listed in the VCS Installation and Admin Guide on page 21. Here is the excerpt from that page:
Supported applications
The supported versions of Network Appliance applications and other
applications are as follows:
■ Network Appliance SnapManager for Exchange 3.2 with Exchange Server 2003
■ Network Appliance SnapManager for Exchange 4.0, 5.0, 6.0 with Exchange
Server 2007
■ Network Appliance SnapManager for SQL 2.0, 2.1, and 5.0
■ Network Appliance Data ONTAP 7.3, 7.3.3
■ Network Appliance SnapDrive 4.1, 4.2.1, 5.0, 6.0, 6.1, and 6.2
When installing SnapDrive, you must specify a user account in the SnapDrive
Service Credentials dialog box. The user account must be a domain user and
part of the Administrators group of the local system and the filer.
■ Data ONTAP DSM for Windows MPIO 3.1, 3.2, 3.3, 3.3.1
■ Microsoft iSCSI software initiator version 2.03 or later versions
Please check the VCS Installation and Upgrade guide for the specific version of VCS that you are running, as the required NetApp software versions have changed with different versions of VCS.
Thanks,
Wally
07-11-2011 01:36 PM
No dice. We have this working on one of our clusters... and it isn't working on the secondary cluster. Both netapps are running 7.3.3P1 ONTAP. Both sets of servers are now running snapdrive 6.2. User for snapdrive is domain admin and netapp admin. There is no MPIO. iSCSI initiator is version 2.08. Everything has been rebooted. connections and service group were deleted and recreated.
07-11-2011 01:43 PM
Literally hundreds of messages like this. I have checked the DNS server, the DNS cache, and pinged it. It does resolve.
2011/07/11 13:37:05 VCS ERROR V-16-20031-99 NetAppFiler:HBSQL2008_SERVICEGROUP-NetAppFiler:monitor:The filer name hbnetapps.HAPPYBANK.local does not resolve to the assigned storage IP XX.YY.3.ZZ
2011/07/11 13:37:35 VCS ERROR V-16-20031-99 NetAppFiler:HBSQL2008_SERVICEGROUP-NetAppFiler:monitor:The filer name hbnetapps.HAPPYBANK.local does not resolve to the assigned storage IP XX.YY.3.ZZ
07-11-2011 01:45 PM
Hi Ashon1,
Have you checked to see what is reported in the %vcs_home%\log\NetAppFiler_A.txt log?
This is the agent log for the NetAppFiler resource and it should have captured the error that it had when doing the probe. If nothing is there then you might need to increase the logging level or open a case with Symantec Technical Support to troubleshoot this issue further.
Thanks,
Wally
07-11-2011 01:49 PM
Hi Ashon1,
It looks like we were both posting at the same time.
You say that it does resolve in DNS. Have you checked both forward and reverse lookup zones? VCS sometimes double checks the reverse lookup zone to ensure that the IP resolves back to the correct name.
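The forward and reverse check can also be scripted from the cluster node itself. The following is only a rough sketch of that manual check, not the agent's actual logic (which is not documented in this thread). The function name `check_forward_reverse` and the shape of the returned dictionary are made up for illustration; the `forward` and `reverse` parameters exist only so the resolvers can be swapped out.

```python
import socket


def check_forward_reverse(filer_name, storage_ip,
                          forward=socket.gethostbyname_ex,
                          reverse=socket.gethostbyaddr):
    """Compare name -> IP and IP -> name lookups for a filer.

    Hypothetical helper: it mirrors the manual DNS check described above
    and makes no claim about how the NetAppFiler agent resolves names.
    """
    result = {
        "forward_ips": [],
        "reverse_names": [],
        "forward_ok": False,
        "reverse_ok": False,
    }

    # Forward lookup: the filer name should resolve to the storage IP.
    try:
        _, _, ips = forward(filer_name)
        result["forward_ips"] = list(ips)
        result["forward_ok"] = storage_ip in ips
    except OSError as exc:  # gaierror and herror are OSError subclasses
        result["forward_error"] = str(exc)

    # Reverse lookup: the storage IP should resolve back to the filer name.
    try:
        primary, aliases, _ = reverse(storage_ip)
        names = [primary] + list(aliases)
        result["reverse_names"] = names
        wanted = filer_name.lower().rstrip(".")
        result["reverse_ok"] = any(
            name.lower().rstrip(".") == wanted for name in names
        )
    except OSError as exc:
        result["reverse_error"] = str(exc)

    return result
```

Call it with the FilerName and StorageIP values from the resource, for example `check_forward_reverse("hbnetapps.HAPPYBANK.local", storage_ip)`, and run it on both the working and non-working nodes. If `reverse_ok` is false on only one of them, the reverse lookup zone is the first place to look.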
Other than that I would recommend opening a support case and providing hagetcf outputs from both the working and non-working clusters.
Thanks,
Wally
07-11-2011 01:49 PM
Posted separately earlier... bunches of messages just like the two quoted in my 01:43 PM post above.
07-11-2011 01:57 PM
Running hagetcf now... didn't check reverse. Will check now.
07-11-2011 02:08 PM
any particular file that would be interesting?
07-11-2011 02:18 PM
Hi Ashon1,
I would check the NetAppFiler_A.txt log that you have already checked. In addition, I would compare the entries in the error message with the NetAppFiler resource to make sure that the message and resource are for the same server and IP addresses.
From there, I would compare the network.txt files from the working and non-working nodes to see if they are using different DNS servers.
After that I would try increasing logging for the NetAppFiler resource type to see if I can get more details of why it thinks the name and IP don't match.
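With hundreds of these messages, the first comparison above (error-message entries versus the resource's attributes) is easier with a small script. This is a sketch only: the regular expression is modelled on the two V-16-20031-99 lines quoted earlier in this thread and may not match other message formats, and the names `summarize` and `mismatches` are invented for illustration.

```python
import re
from collections import Counter

# Pattern modelled on the log lines quoted in this thread (an assumption:
# other agent messages may be laid out differently).
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) VCS (?P<sev>\w+) "
    r"(?P<id>V-\d+-\d+-\d+) "
    r"(?P<agent>[^:]+):(?P<resource>[^:]+):(?P<entry>[^:]+):"
    r"The filer name (?P<filer>\S+) does not resolve to "
    r"the assigned storage IP (?P<ip>\S+)$"
)


def summarize(lines):
    """Count messages per (resource, filer name, storage IP) triple."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            counts[(m["resource"], m["filer"], m["ip"])] += 1
    return counts


def mismatches(counts, filer_name, storage_ip):
    """Return the logged triples that differ from the configured values."""
    return [
        key for key in counts
        if key[1].lower() != filer_name.lower() or key[2] != storage_ip
    ]
```

Feed `summarize` the lines of NetAppFiler_A.txt, then pass the result to `mismatches` along with the FilerName and StorageIP values shown by hares -display. An empty list means the log and the resource agree on the name and IP, so the problem is in how the name resolves rather than in a typo in the resource.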
Thanks,
Wally
07-11-2011 03:16 PM
checked name and ip address from NetAppFiler_A.txt before. They are correct. Everyone is using the same DNS servers. I am going to increase the logging to see what information I get.
07-11-2011 04:56 PM
IP and server names and DNS configured correctly.
two things
1) logging doesn't seem to be adding anything. maybe i did this in the wrong place?
2) looking at hares -display for both clusters... these things are different
07-12-2011 06:10 AM
Hi Ashon1,
The logging should be the LogDbg attribute and you should add DBG_1, DBG_2, DBG_20 and DBG_21 to it on separate lines.
I'm not sure if the ArgListValues that you are looking at are a problem or not. The other two items are not a problem; they are just differences in the resource states and are not unexpected in your current state.
You should open a case with support so that we can work more directly with you to resolve this issue.
Thanks,
Wally
07-12-2011 05:52 PM
Ashon,
I recently encountered the same error when deploying VCSW 5.1 SP2 for one of my customers. We eventually configured the Filer Agent using an alternate IP address for the StorageIP attribute. We also had to use the FQDN for the FilerName attribute. This resolved the UNKNOWN state.
Hope this helps.
Joe D