I have a strange thing going and can't figure out where to look.
I have a SQL agent that starts manually but not when under control of VCS. When VCS starts the service, it changes the state to "Starting". After a couple of seconds it stops and generates an error message. The rest of the service group (including the SQLServer service) works just fine under VCS.
Event Type: Error
Event Source: SQLAgent$SAPSQL07
Event Category: Service Control
Event ID: 103
Time: 8:14:49 AM
SQLServerAgent could not be started (reason: Unable to connect to server 'XXXXXXXXXSQL07\SAPSQL07'; SQLServerAgent cannot start).
For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
This happens on only one of the two nodes.
I'm using SFWHA for Windows on a W2K3 SP2 Enterprise x64 server and have installed SQL 2005 instances.
When you try to start the agent manually, note how long it takes to start, then check the OnlineTimeout value defined for this resource.
It is quite possible that VCS times out the resource (default 300 sec) if the resource has not started by then.
If it's happening on only one node, I would suggest comparing the OnlineTimeout value on both nodes:
hatype -display <SQL-agent>
VCS may attempt to restart the service, depending on the value of the retrylimit variable.
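One way to do the comparison suggested above is to save the `hatype -display` output from each node and diff the attribute values. This is only a rough Python sketch: it assumes the output is whitespace-separated columns of type, attribute and value with `#` header lines, which may not match your VCS version, and the sample text and values below are made up for illustration.

```python
def parse_display(text):
    """Parse 'Type Attribute Value' rows into {attribute: value}."""
    attrs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        parts = line.split(None, 2)
        if len(parts) < 2:
            continue  # not an attribute row
        attrs[parts[1]] = parts[2].strip() if len(parts) == 3 else ""
    return attrs


def diff_attrs(a, b):
    """Return {attribute: (value_a, value_b)} for attributes that differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Hypothetical captured output from each node (not real data).
node1 = """#Type             Attribute         Value
SQLAgService2005  OnlineTimeout     300
SQLAgService2005  OnlineRetryLimit  0
"""
node2 = """#Type             Attribute         Value
SQLAgService2005  OnlineTimeout     60
SQLAgService2005  OnlineRetryLimit  0
"""

print(diff_attrs(parse_display(node1), parse_display(node2)))
```

Anything the diff reports is a candidate for the "something is different on one node" question.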
Can you paste the online and monitor scripts of the SQL agent here? Also, I would suggest comparing them between the two nodes.
As you say, it connects fine manually, so I trust the connectivity part is OK, and one node works fine. So something is different on one node, and I'm trying to see what that is.
The red question mark says to me that the resource could not be fully probed.
This is normally due to some missing required attribute(s).
SQL Server 2005 Agent service agent required attributes:
SQLServer2005ResName : The name of the SQLServer2005 resource on which the SQL Server 2005 Agent service resource depends.
LanmanResName : The Lanman resource name on which the SQL Server 2005 resource depends.
Please post your main.cf (normally in C:\Program Files\VERITAS\cluster server\conf\config).
This is likely to be an issue with the connection parameters specified in the SQLAgent resource.
Have you checked the credentials supplied for the VCS resource to ensure that they have the correct permissions etc? Is the service set to start as LocalSystem under Service Control Manager or does it have user account details specified?
You could also check the GenericService_A log for additional information on why the SQLAgent failed to online.
Hi Dave and eu22106,
You are getting your versions of SQL confused. With SQL 2005 the SQL Agent service is controlled by the SQLAgService2005 resource and not the GenericService resource.
The SQLAgService2005 resource only has two attributes, SQLServer2005ResName and LanmanResName. These two attributes should point to the names of the SQLServer2005 resource and the Lanman resource. In eu22106's example, these attributes should be set to "SG_SAPSQL07-SQLServer2005" and "SG_SAPSQL07_Lanman" respectively.
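A mistyped resource name (a hyphen where the real name has an underscore, say) is easy to miss by eye. Here is a small Python sketch of the cross-check described above. It assumes you have already copied the resource names and types out of main.cf by hand into a dictionary; the `check_agent_refs` helper is hypothetical and is not part of VCS.

```python
# Attribute name -> resource type it is expected to point at (from the post above).
EXPECTED = {
    "SQLServer2005ResName": "SQLServer2005",
    "LanmanResName": "Lanman",
}


def check_agent_refs(agent_attrs, resources):
    """Return a list of problems with the agent's two reference attributes.

    agent_attrs: {attribute_name: resource_name} for the SQLAgService2005 resource
    resources:   {resource_name: type_name} for the whole service group
    """
    problems = []
    for attr, want_type in EXPECTED.items():
        target = agent_attrs.get(attr)
        if not target:
            problems.append(f"{attr} is not set")
        elif target not in resources:
            problems.append(f"{attr} points to unknown resource {target!r}")
        elif resources[target] != want_type:
            problems.append(
                f"{attr} points to {target!r}, which is a {resources[target]} resource"
            )
    return problems


# Resource names taken from the example in this thread.
resources = {
    "SG_SAPSQL07-SQLServer2005": "SQLServer2005",
    "SG_SAPSQL07_Lanman": "Lanman",
}
# Deliberately mistyped Lanman name (hyphen instead of underscore) for illustration.
agent = {
    "SQLServer2005ResName": "SG_SAPSQL07-SQLServer2005",
    "LanmanResName": "SG_SAPSQL07-Lanman",
}

for problem in check_agent_refs(agent, resources):
    print(problem)
```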
The SQLServerAg2005_A.txt log and the event logs should help with more details on what the problem is. Since the SQL Agent service can be started manually, I would think that VCS is getting an access-denied error when it tries to start it up in the cluster.
The main difference between VCS starting the service and starting it manually through SCM is the server context the service starts in. VCS will try to start the service in the virtual server context, whereas SCM will start it in the physical node name context.
Check the Lanman resource to ensure that the DNSUpdateRequired, DNSCriticalForOnline, ADUpdateRequired and ADCriticalForOnline attributes are all set to True and that the Lanman resource can be onlined with these settings. This sets up the virtual server to perform Kerberos security, which should be what the SQL Agent service needs to connect to SQL.
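If you want to check those four Lanman flags on several resources or nodes, a few lines of Python can do it once you have the attribute values in hand. This is a sketch only: it assumes the values were copied into a dictionary by hand, and that a true flag shows up as either "1" or "True", which is an assumption about how your configuration prints booleans. The sample values are hypothetical.

```python
REQUIRED_TRUE = (
    "DNSUpdateRequired",
    "DNSCriticalForOnline",
    "ADUpdateRequired",
    "ADCriticalForOnline",
)

# Assumed spellings of "true"; adjust to what your configuration shows.
TRUTHY = {"1", "true"}


def flags_not_true(attrs):
    """Return the required Lanman flags that are missing or not set to true."""
    return [
        name
        for name in REQUIRED_TRUE
        if str(attrs.get(name, "")).strip().lower() not in TRUTHY
    ]


# Hypothetical Lanman attribute values (not taken from a real main.cf).
lanman = {
    "VirtualName": "EXAMPLEVS",
    "DNSUpdateRequired": "1",
    "DNSCriticalForOnline": "1",
    "ADUpdateRequired": "0",
}

print(flags_not_true(lanman))
```

A missing attribute is reported the same way as one set to false, since either case leaves the flag not true.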
If the Lanman attributes are already set, or setting them does not resolve the issue, check the SQL Server client and server network settings to see whether the network connection security settings have been altered from their defaults. If needed, try restoring the settings to their default values until the SQL Agent service is able to start and connect to the running SQL instance. It is sometimes a bit of trial and error with these settings until you find out exactly what is causing the problem.
In rare instances I have replaced the SQLAgService2005 resource with the GenericService resource so that I can set up the user account details as mentioned by Dave. But this should be considered a last resort.
Let us know if you make any progress or if you still need further assistance.
Your main.cf configuration looks good. I would recommend that you open a support case at this time. They will want to look at your agent logs and the event logs to see if they can determine why the agent service is not starting.
You might need to increase the logging level for the SQLAgService2005 agent. I'm sure they will want to see those logs before recommending additional logging levels.
Could you please elaborate on this registry entry? If this is done outside of VCS, you will probably have the same issue when the cluster is failed over to the 2nd node.
It should not be necessary to perform any non-VCS workarounds - the SQL Server needs to be able to connect to the SQL Server virtual name that is specified in the Lanman resource:
Lanman SG_SAPSQL01-Lanman ( VirtualName = WNTSV520SQL01
As per Wally's post - verify that the Service Account has the necessary permissions to update DNS & AD.