Several days ago I was asked by a user as to whether I knew of any issues regarding high I/O on the SQL Server after upgrading Enterprise Vault, the simple answer is that I don’t know of any issues, but I do know how to help identify any issues.
The easiest way to find out what is actually happening is to use a combination of SQL Server Profiler and Performance Monitor, the servers we will be concentrating on are Windows Server 2008 and SQL Server 2008.
So first of all, when we’re talking about I/O, for the purposes of this demonstration will be Disk Queue Lengths as opposed to Network Traffic. First if we are using Windows Server 2008 to host the SQL Servers, we will check the Performance Monitor to see exactly what files are heavily in use.
Task Manager and the Performance Monitor
Here you can now see which files the sqlserver.exe has open and in the File category, you can see exactly which file is being used the most, so if you are seeing something other than an EV Database (such as EnterpriseVaultDirectory, vault store or fingerprint databases), then you should investigate those files first.
Figure 1: Resource Monitor showing Files in Use on the Disk
The resource monitor is useful for seeing whether files are heavily being used by Anti-Virus, Backup software (such as BPKAR for netbackup) or some other disk intensive process other than SQL Server.
Again for the purposes of this demonstration, it is an Enterprise Vault database that has been identified, so we can zero on it on these databases and find out what queries or stored procedures are to blame.
Setting up the Performance Monitor
Before setting up the performance monitor, there are a couple of things to note, best practices dictate that you run the performance monitor from a server other than the one you wish to monitor.
This is because the performance monitor can have an adverse affect on CPU, Network and Disk Use due to the fact that the perfmon will be taking its own resources and writing out to disk. If you wish to run the Performance Monitor on the local server that is fine, if you do it remote, it has to be in the same time zone for it to be accurate
To start the performance monitor, you would then Right Click the counter that you just created and then click “Start” and then right click and press “Stop” when you are ready to, however do not start the counter just yet as we need to set up the SQL Profiler
Figure 2: Performance Monitor showing the “DiskQueues” monitor we just set up
Setting up the SQL Profiler Trace
SQL Profilers can be deceptively difficult to set up properly when you first start to get to grips with this tool, as there are just a multitude of options, templates, and ways to filter results, it is however one of the most important tools with in SQL to master.
For this demonstration we will not be including Execution plans or anything like that, we will be keeping to as Vanilla a trace as possible.
If you do start to build up complex traces, I would highly suggest saving it out as a template once you have completed setting it up just the way you want.
The reason we took out the ExistingConnection and Audit Login/Logout is that they’re useful for a specific set of troubleshooting maybe when you have an application that is exhausting the amount of allowed connections and you’re not sure whether its logging out properly. However in this circumstance its needless noise.
Figure 3: SQL Server Profiler set up to run the traces we need
Running the Traces and Performance Collector
Figure 4: Selecting the Performance Counters to load
Analyzing the Combined Trace and Performance Counters
Now that we have loaded up the Trace file and imported the performance data, we will now see a graph of each disks that has been collected along with its peaks that you would normally see in a regular performance monitor.
The real value in this however is that you can select a peak in the graph and based on the times matching up it will show you which SQL Query was being run at that time. This is why it was important to make sure the server you ran the PerfMon from has the same time stamps as the SQL Server itself, if they are off by a couple of minutes, this can skew the results and give you some false positives.
Note that the more counters you load up, the messier the graph will be and also the slower performance you will see, so if you’re seeing CPU spikes, then make sure you only select CPU, if you’re seeing Network Traffic then use the network adapters, also note that in this test it contains 268 counters, which will be a mess if you load them all up, so we’re concentrating only on Avg Disk Queue Lengths.
Figure 5: Combined SQL Profiler Trace and Performance Monitor Counters
From the above we can see the different spikes where the Disk Queue Lengths have grown, also note that you can see the Minimum and Maximum values for each disk listed. The really cool thing is though, that when you select a peak in the perfmon, it will highlight the SQL Command that coincides with that timeline. So for this demonstration, we are going to select the middle tall blue peak
Figure 6: Selecting one of the peaks as shown by the Performance Monitor graph
Now that we’ve selected the peak, the very next thing that will occur is that in the top half of the screen we will then see the coinciding SQL event, in this case it is a large query that I used to simulate a large read that would cause a bump in the avg disk queue lengths.
Figure 7: Profiler highlighting the SQL Query that coincides with the Peak
A couple of things to note about the above image, is not only can you see what query was performed, in this case it was a SELECT TOP 5000 query going across multiple databases, but you can also see the Application Name and which user ran it.
This is useful for a couple of things, if you have people running queries manually, you can see the user running it and more importantly where they are running it from? Is it from a script? From the management studio etc?
When Enterprise Vault is running queries and commands, the ApplicationName will be shown as “Enterprise Vault” and the user will be the EVAdmin, so this is useful for knowing that it really is EV that is causing the bottlenecks.
That being said, the SQL Profiler and Perfmon combination can be misleading or yield few results if you have had the processes and queues at a particular high the entire time, so if you have a system where the disks are being highly utilized and you start this, you will see that nearly every query could be accountable.
In this scenario, you would be best to look in the Resource Manager, determine the files that are heavily utilized, and then filter your sql profiler to only include those databases, and then to stop the EV Services, and then start them and wait for it to become highly active again and then you can see what particular query or set of queries is initiating the high bandwidth.
Figure 8: SQL Profiler showing the “Enterprise Vault” Application Name
From the above image we can see that the Application Name is now Enterprise Vault and that it is doing a fairly common queries for getting tasks and their ID’s etc and that it is running under the context of the EVAdmin.
Another notable aspect is that when you select a query any where in the profiler, it will also move the line in the graph to show exactly where in time that query is executed and the effect it has on the disk queues, in this instance it shows a small peak but nothing too drastic.
The last trick that we have, is that in your list of counters at the bottom, it will show a minimum count, max count, average count, and the selected count. When you select a peak the “Selected Count” will change to what ever we have selected.
But if you right click the drive, you can then select “Go to Max Value”, when you do this, you will then be taken to the SQL Command executed and also in the graph what that peak represents
Figure 9: Selecting the Max Value on a particular drive