VCS SQL Anonymous Logon?

phil_e
Level 2

Hi,

I have a two-node SQL cluster running under VCS (Storage Foundation HA 5.1 SP1 for Windows). The cluster comprises two service groups, each hosting a named SQL 2005 instance.

I am seeing anonymous login failures in the SQL logs for each instance. They occur every 6 hours (pretty much to the second), and the source IP address is that of the opposing cluster node.

For example, instance A, hosted on node1 logs the error:

"Login failed for user 'NT AUTHORITY\ANONYMOUS LOGON'. [CLIENT: x.x.x.x]"

where x.x.x.x is the IP address of physical node2

 

instance B, hosted on node2 logs the error:

"Login failed for user 'NT AUTHORITY\ANONYMOUS LOGON'. [CLIENT: y.y.y.y]"

where y.y.y.y is the IP address of physical node1
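
To confirm the interval and the client address from the logs rather than by eye, something like the following could be used. It is only a sketch: the timestamp and source-column layout of the log line, the sample lines, and the 192.0.2.x addresses are assumptions for illustration, not taken from the actual logs.

```python
import re
from datetime import datetime, timedelta

# Assumed log layout: "YYYY-MM-DD HH:MM:SS.ff <source> <message>".
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+\s+\S+\s+"
    r"Login failed for user 'NT AUTHORITY\\ANONYMOUS LOGON'\.\s+"
    r"\[CLIENT: (?P<client>[^\]]+)\]"
)


def anonymous_failures(lines):
    """Return (timestamp, client) pairs for anonymous logon failures."""
    hits = []
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            stamp = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S")
            hits.append((stamp, match.group("client")))
    return hits


def gaps(hits):
    """Return the time differences between consecutive failures."""
    times = sorted(stamp for stamp, _ in hits)
    return [later - earlier for earlier, later in zip(times, times[1:])]


# Hypothetical sample lines; real logs may differ in layout.
sample = [
    r"2011-03-01 06:00:00.12 Logon       Login failed for user 'NT AUTHORITY\ANONYMOUS LOGON'. [CLIENT: 192.0.2.11]",
    r"2011-03-01 09:15:42.50 spid51      Some unrelated message",
    r"2011-03-01 12:00:00.14 Logon       Login failed for user 'NT AUTHORITY\ANONYMOUS LOGON'. [CLIENT: 192.0.2.11]",
]

hits = anonymous_failures(sample)
for gap in gaps(hits):
    print(gap)
```

Running this over the logs of both instances would show whether the gap is a steady 6 hours and whether the client address is always the opposing physical node.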

 

A couple of points to note:

- The source IP addresses belong to the physical VCS hosts and not to the SQL virtual servers

- With the exception of the SQL Browser, all SQL services run as named user accounts and not as system accounts

- If I switch the service groups (move instance A to node2 and instance B to node1) the errors reverse

- There are no scheduled SQL jobs, and a SQL Profiler trace does not pick up any additional information

- The errors do not occur with both service groups online on the same node

 

Whilst I appreciate this appears to be more of a SQL-related issue than a VCS one, I wanted to be sure before potentially logging this with MS.

Does anyone recognise these symptoms or have any idea whether this could be VCS/SFHA related?

These login failures do not appear to be causing any further errors. 

Any advice/suggestions greatly appreciated....

 

Phil

1 ACCEPTED SOLUTION

Wally_Heim
Level 6
Employee

Hi Phil,

 

VCS would be monitoring SQL every 60 seconds on the online node and every 300 seconds on the offline node.  If you have detailed monitoring enabled then VCS will log into SQL during the monitor interval and run the SQL script that was specified to run.

 

Other than that, VCS does not interact with SQL, and there is nothing in VCS that happens at a 6-hour interval.

 

I would look more toward SQL for this event.  Are your SQL instances linked?  If so, I wonder if the link is what is causing the failed login attempt?

 

If you can afford the downtime, you might want to see if you still get the login failures with one of the SQL instances offline for over 6 hours. I know that this may not be possible in production, but it is a simple way to check whether SQL is the cause, provided this configuration is still in pre-production and you can take the extended downtime on one of the instances.

 

Thanks,

Wally

phil_e
Level 2

Hi Wally,

 

Thank you for your reply. It confirms what I thought to be the case with regard to VCS accessing SQL.

 

For the record, I am not using detailed monitoring or linked servers. It would also seem that the login errors do not occur if the SQL instance on the originating node is offline: because the errors follow a 6-hour schedule, I can now predict them, so I was able to take the instance offline just before the next error was due on the opposing instance. I still find it odd that the source IP address for the error is that of the physical node as opposed to the SQL virtual server; I guess this could be attributed to the SQL install. That aside, I am now confident enough that a VCS component is not causing the errors.
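
Predicting the next error from the last one seen is simple arithmetic. A minimal sketch follows, assuming the period stays at exactly 6 hours; the function name and the timestamps are made up for illustration.

```python
from datetime import datetime, timedelta


def next_expected(last_seen, now, period=timedelta(hours=6)):
    """Return the first expected occurrence strictly after `now`."""
    if now < last_seen:
        return last_seen
    # Whole periods that have already elapsed since the last sighting.
    elapsed_periods = (now - last_seen) // period
    return last_seen + (elapsed_periods + 1) * period


# Hypothetical timestamps.
last_error = datetime(2011, 3, 1, 6, 0, 0)
current_time = datetime(2011, 3, 1, 13, 30, 0)
upcoming = next_expected(last_error, current_time)
print(upcoming)
```

Taking the instance (or a suspect service) offline a minute or two before that time and bringing it back afterwards keeps the test window short.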

 

The hosting servers are also NetBackup media servers; however, the error still occurs with the NetBackup services stopped, so I guess all angles now point towards SQL...

Looks like it's over to MS for this one...

Thanks again,

Phil

Wally_Heim
Level 6
Employee

Hi Phil,

 

The source IP being the physical node IP is not odd. This is typical of the way that Microsoft Windows handles outbound IP traffic. The source IP could be any one of the IP addresses on the server for the outbound network. It is typically the first bound IP address, which is always the physical node IP in a cluster configuration.

All I can say is start shutting down services on the server at the 6-hour window until you find out which one is causing the failed login.
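
A quick way to see which local address the operating system would choose for an outbound connection is to connect a UDP socket and read back its local address. This is a sketch only: the function name is made up, the default port 1433 is an assumption (named instances often listen elsewhere), and it shows the route-based default rather than what any particular client process actually binds to.

```python
import socket


def outbound_source_ip(dest_host, dest_port=1433):
    """Return the local IPv4 address the OS would use to reach dest_host."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # connect() on a UDP socket sends no packets; it only selects
        # the route and therefore the local source address.
        sock.connect((dest_host, dest_port))
        return sock.getsockname()[0]
    finally:
        sock.close()


# Loopback is used here so the example runs anywhere; on a cluster node
# the destination would be the other node or the SQL virtual server.
print(outbound_source_ip("127.0.0.1"))
```

Run on node1 against the address of the instance hosted on node2, this would be expected to return node1's physical IP rather than a virtual IP, matching what the SQL log reports.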

The other way to look at this is from a login security standpoint. I would suggest checking whether you have Kerberos errors at the same time as the login failures. If you do, check your Lanman resources to see whether the DNS and AD update options are checked. These are needed for Kerberos to function correctly, but Kerberos would not be needed when connecting between groups running on the same physical server.
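
Once the timestamps of the login failures and of any Kerberos errors have been collected, lining them up can be done with a few lines of code. This is a rough sketch: the function name, the 30-second window, and the sample timestamps are assumptions, and how the Kerberos event times are exported from the event log is left open.

```python
from datetime import datetime, timedelta


def correlated(failures, kerberos_events, window=timedelta(seconds=30)):
    """Return login-failure times that have a Kerberos event within `window`."""
    return [
        failure
        for failure in failures
        if any(abs(failure - event) <= window for event in kerberos_events)
    ]


# Hypothetical timestamps for illustration only.
login_failures = [
    datetime(2011, 3, 1, 6, 0, 0),
    datetime(2011, 3, 1, 12, 0, 0),
]
kerberos_errors = [
    datetime(2011, 3, 1, 5, 59, 58),
    datetime(2011, 3, 1, 9, 0, 0),
]

matches = correlated(login_failures, kerberos_errors)
print(matches)
```

If most failures have a matching Kerberos error, the Lanman DNS and AD update settings are the first thing to check; if none do, Kerberos is less likely to be involved.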

Thanks,

Wally