Solved: DA scheduled searches best practices

NaturesRevenge · ‎09-20-2013

EV/DA 904. SQL 2008.

We have a separate Discovery Accelerator instance for our internal General Counsel group. They have nearly 100 open DA cases in their Litigation instance. The paralegals insist on running DAILY scheduled searches on each one of these cases. Needless to say, this much processing cripples their DA instance. All of these scheduled searches stall and do not move. Immediate searches can be issued, but they, too, sit there with zero progress. I have created three different windows of time that they can assign their scheduled searches to, but this does not seem to help.

Where do I start? Is this going to be a combination of modifying the way they perform their searches and perhaps adding a second (or third) Discovery Accelerator instance? IMHO, daily scheduled searches are excessive, but I'm just the technology nerd. :)

I appreciate any insight!

AJ

Kenneth_Adams · ‎09-23-2013

Hello, AJ;

The maximum number of scheduled searches that can be successfully run at any given time really depends upon the DA environment as a whole. This includes things such as (but not limited to) the DA customer database health, SQL Server hardware configuration, SQL Server software configuration, EV server hardware configuration, EV server recommended optimizations implemented, number of index volumes being searched, health and size of the index volumes being searched, DA server hardware configuration, DA server recommended optimizations implemented, DA Customer recommended optimizations implemented, and complexity of the search criteria.

If you've not already done so, please review and implement our recommended optimizations on the EV, DA, and SQL Servers as noted in article TECH56172, "Recommended steps to optimize performance on Enterprise Vault (EV), Compliance Accelerator (CA), Discovery Accelerator (DA), and SQL Servers in an EV environment", available at http://www.symantec.com/docs/TECH56172. Note that the recommendations are written for 32-bit Windows servers, and we identify which ones are not needed for 64-bit Windows servers.

The search criteria complexity and the size of the index volumes being searched are important because EV 9 still uses the 32-bit search engine provided with the Alta Vista indexing engine. The IndexServer process is the process that conducts the searches. Each IndexServer process that is spawned to conduct a search is limited to one index volume at a time AND a maximum of 2 GB of RAM to use. When the amount of RAM used by an IndexServer process gets to 1.5 GB or larger, the process can become unstable (this is a Windows limitation, not an IndexServer process limitation).

Into the 2 GB of RAM for each IndexServer process must fit:
1) the search criteria
2) the index volume contents
3) the hits matching the search criteria.

We've found that index volumes larger than 3 GB do not do as well during DA searches, because the index volume contents must be paged in and out of memory more often than if the index volume were under 3 GB. You can check the physical size of any index volume by right-clicking its folder in Windows Explorer and selecting the Properties option. The result will tell you the size on disk of the folder, which is the size of the index volume that needs to be loaded into memory.
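If there are too many index volume folders to check by hand in Explorer, the same size check can be scripted. The following is only a rough sketch, not an EV tool: it assumes you point it at a folder whose immediate subfolders are the index volume folders (the path in the usage note is hypothetical), and it sums file sizes, which approximates but may not exactly match the "size on disk" figure Explorer reports.

```python
import os

THRESHOLD_BYTES = 3 * 1024**3  # the ~3 GB guideline mentioned above


def folder_size_bytes(folder):
    """Sum the sizes of all files under `folder`, including subfolders."""
    total = 0
    for root, _dirs, files in os.walk(folder):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # file vanished or is locked; skip it
    return total


def oversized_volumes(index_root, threshold=THRESHOLD_BYTES):
    """Return (folder, size) pairs for immediate subfolders of `index_root`
    whose total size exceeds `threshold`, largest first.

    Assumption: each immediate subfolder is one index volume folder.
    """
    results = []
    for entry in os.scandir(index_root):
        if entry.is_dir():
            size = folder_size_bytes(entry.path)
            if size > threshold:
                results.append((entry.path, size))
    return sorted(results, key=lambda pair: pair[1], reverse=True)
```

For example, oversized_volumes(r"E:\EVIndexes\SomeArchive") would list the subfolders over 3 GB, where the path is just a placeholder for wherever your index volumes actually live.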

The health of the index volume can play a large part in the amount of time needed to search against it. A fragmented index volume can take significantly longer to load into memory. The 32-bit index engine uses the concept of buckets and tiers for containing the indexing information. As items are added to an index volume, they are placed into the buckets and tiers, starting with the top-most buckets and tiers and working down toward the lowest. An optimal index volume has all of its data in the top-most buckets and tiers. The 32-bit index engine routinely performs minor and major compactions of this data for active index volumes, but older index volumes may be left in a fragmented condition if too little activity occurred before the volumes rolled over. Creating an empty file named 'Compact.task' in an index volume's folder and then running the "Update Index Volume" option against the volume will cause a major compaction to occur. Note that no search can be run against an index volume while it is undergoing a major compaction, but the end result will allow much more efficient searching.
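If you need to flag several index volumes for compaction, creating the empty 'Compact.task' file can be scripted. This is a minimal sketch that covers only the file creation; the function name and the folder you pass in are hypothetical, and you would still need to run the "Update Index Volume" option against each volume yourself afterwards.

```python
from pathlib import Path


def request_major_compaction(index_volume_folder):
    """Create an empty 'Compact.task' file in the given index volume folder.

    This only drops the marker file; it does not start the compaction.
    """
    folder = Path(index_volume_folder)
    if not folder.is_dir():
        raise NotADirectoryError(f"Not a folder: {folder}")
    task_file = folder / "Compact.task"
    task_file.touch(exist_ok=True)  # empty file; no contents are written
    return task_file
```

Remember that searches cannot run against a volume while it is being compacted, so it is probably wise to flag only a few volumes at a time and to do so outside the scheduled search windows.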

Also, searches with several wildcards cause the search complexity to grow quickly, so the search criteria take up more of the 2 GB of RAM. To check the complexity of any search's criteria, obtain a dtrace of the IndexServer process on the EV indexing server while a search is running. Let the dtrace run for about 10 minutes, or until you see index volumes show in the status pane of the running search. Look in the dtrace log for 'Parsed Query' to see the search criteria. Any search that has multiple 'Parsed Query' lines could be considered a complex query. The more 'Parsed Query' lines a search has for a single index volume, the more complex the search criteria.
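Once you have the dtrace log saved to a file, a quick count of the 'Parsed Query' lines gives a first impression of complexity. The sketch below is an assumption-laden helper, not part of dtrace: the function name is made up, the text encoding of the log is a guess you may need to change, and it counts matches across the whole log rather than per index volume, because the exact log layout is not described here.

```python
def count_parsed_query_lines(log_path, marker="Parsed Query", encoding="utf-8"):
    """Count the lines in a saved dtrace log that contain `marker`.

    Assumptions: the log is a plain text file, and `encoding` matches how
    it was saved. Undecodable bytes are replaced rather than raising.
    """
    count = 0
    with open(log_path, "r", encoding=encoding, errors="replace") as log:
        for line in log:
            if marker in line:
                count += 1
    return count
```

A log captured while only one search was running makes the number easier to interpret, since every match then belongs to that search.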

Please go through the optimizations and let us know if you had to change anything. Of particular note for optimizations are:
1) 'Optimize Searches based on oldest and youngest items' DA configuration setting will allow the search to skip any index volumes that we know do not contain items within the date range of the search criteria, thereby making the DA search more efficient by only searching the index volumes that do contain data within the search date span.
2) 'Maximum number of consecutive searches against the same index volume' option prevents multiple searches from hitting any given index volume at one time. Multiple searches running against the same index volume can cause timeouts that could lead to the index volume being marked as failed.
3) The 'AVSMaxLog' registry value on the EV Indexing Server, when set to 500 million (decimal), will keep the index volume size to about 3 GB maximum.
4) The SQL Server's maximum memory configuration setting needs to be set to some value under the amount of RAM installed on the SQL server. We recommend having this set to 75% of the maximum amount of physical RAM if only 1 instance of SQL is installed. If multiple instances of SQL are installed on the same physical machine, the total maximum memory settings for all instances should be 75% of the physical RAM.
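The arithmetic behind item 4 can be sketched as follows. The 75% figure comes from the recommendation above; splitting the total evenly between instances is an assumption made here for illustration only, since the recommendation only says the combined total should be 75% of physical RAM. The result is in MB because SQL Server's "max server memory (MB)" setting is expressed in megabytes.

```python
def sql_max_memory_mb(physical_ram_gb, instance_count=1, fraction=0.75):
    """Suggested max memory per SQL instance, in MB.

    `fraction` of physical RAM is shared by all instances on the machine.
    The even split between instances is an illustrative assumption.
    """
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    total_mb = int(physical_ram_gb * 1024 * fraction)
    return total_mb // instance_count


# One instance on a 64 GB server
print(sql_max_memory_mb(64))      # 49152
# Two instances sharing the same 64 GB server
print(sql_max_memory_mb(64, 2))   # 24576
```

If one instance is much busier than the others, an uneven split may suit you better, as long as the total stays at or under the 75% mark.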

I hope all of this information helps. I know it's a lot of information, but it has been collected over the last 6 years that I've been working with DA. Please let us know if any of this helps to get your scheduled searches working again.

