Discovery Accelerator search results inaccurate

ImAlwaysSmiling
Level 4

We continue to get inaccurate search results that include items for a user we did not select and who does not meet the search criteria.  We have tested two different searches that produce very different results.  The first search returns the selected user's hits for the day, but it also includes another user who appears to meet only the date criterion and not the To or From criterion.  That search uses a single-day date range (Oct. 3rd to Oct. 3rd), and for the To or From search terms we select a single user name by clicking the target and custodian icon on the right.  Under Miscellaneous, Message type is set to all content sources.

On the other hand, when we run the same search by typing the name "first last" in the To and From field, the additional user is not listed in the hits; however, we do not get the Instant Messages either.

Has anyone heard of this?

Any assistance is greatly appreciated.

Thanks. 


TonySterling
Moderator
Moderator
Partner    VIP    Accredited Certified

What version of EV and DA?

Is this a target or custodian?

Have you verified the properties of the custodians\targets that are in the results?

You can also enable DA to save the search criteria.  This is very useful in troubleshooting issues like this.

ImAlwaysSmiling
Level 4

EV 9.0.1.1073

DA 9.0123

Custodian - The results include a user whom the administrator has not allowed to be searched.  Basically, that person does not even show up as a custodian.  Yes, we verified the To and From, since that and the date are the only criteria that were set for the search.  It is a simple search, and the requested custodian shows up along with one that should not and that does not match those criteria.

We ran two searches with the same Oct. 3 date: one choosing the custodian by checking the box c:first last, and the other by typing the name "first last".  Only the latter returns just the correct user, but it does not find any of the Instant Messages even though all content sources is checked.

Thank you for your time.

 

 

Kenneth_Adams
Level 6
Employee Accredited Certified

A couple of suggestions:

  1. When you create the search, ensure the option to include historic information is not checked.  This option is checked by default and can cause searches to include custodians that you don't expect.
  2. Enable the 'Save Search Criteria' option as Tony recommended so you can obtain a copy of the search criteria that is actually sent to the search engine.  To enable this option:
  • Click on the Configuration tab.
  • Click on the Settings sub-tab.
  • Expand the Diagnostics section.
  • Click on the check box in the Value column of the 'Save Search Criteria' option.
  • Click the Save button.
  • Click the OK button to acknowledge the pop-up stating the need to restart the Customer Background Tasks.
  • Close the DA Client.
  • On the DA server, restart the Enterprise Vault Accelerator Manager Service (we call this EVAMS for short).
  • When EVAMS has finished the restart, the DA server should have a new folder named "SearchCriterias" in the DA installation folder.
  • Launch the DA Client.
  • Run the search.
  • When the search has started processing the index volumes, check on the DA server in the "SearchCriterias" folder for a new folder named after the DA Customer.
  • Look in the folder named after the DA Customer for 3 files that were created (2 .txt and 1 .xml).
  • Look in either of the .txt files for the 'Native Query' section.
  • Look at the information in this section to see the actual search criteria passed to the search engine.

I suspect you'll see something like "Smith" OR "John", which would return results for every Smith and every John in your Custodian Manager database.  Of course, what you'll actually see will be showing the names you specified instead of "Smith, John".

Actually seeing the search criteria in this manner can help you determine the criteria specifications that you'll need to make to fine tune the search results.
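
For anyone who would rather not open the files by hand, here is a minimal Python sketch that pulls the 'Native Query' section out of the text of one saved criteria file.  It assumes the file layout seen in this thread (a "Native Query:" header, a dashed underline, then the query up to the next blank line).  The function name is made up for illustration, and the sample text is an abbreviated stand-in for a real file, not actual DA output.

```python
def extract_native_query(text):
    """Return the non-blank lines that follow the 'Native Query:' header."""
    section = []
    in_section = False
    for line in text.splitlines():
        stripped = line.strip()
        if not in_section:
            # Skip everything until the section header is found.
            if stripped == "Native Query:":
                in_section = True
            continue
        if set(stripped) == {"-"}:
            continue  # the dashed underline below the header
        if not stripped:
            if section:
                break  # a blank line ends the section
            continue
        section.append(stripped)
    return section


# Abbreviated sample in the assumed layout (hypothetical content).
sample = """name (ESQany - 0): 0 jdoe@domain.org
***** The last 5 are And

Native Query:
-------------
((name:0 OR name:jdoe.domain.org) AND date:[03/10/2013-03/10/2013])
#rank  #rankend

Raw Query:
----------
"name" 0 "0 jdoe@domain.org"
"""

native = extract_native_query(sample)
print(native[0])
```

To use it on a real file, read the file's text first (for example with `open(path).read()`, where `path` points at one of the .txt files in the customer's folder) and pass that text to the function.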

 

ImAlwaysSmiling
Level 4

Thank you Ken.  I enabled the save search feature and I am still baffled as to why, when I request a search for one user for one day, I get that info and more.  Am I missing something?  It is a simple search.  I have to add that the custodian that should never come up in any searches sometimes comes up in other searches as well.  It is always this same person.  In this case, I can duplicate the problem over and over.  I attached the txt files from my search.  I would appreciate a second set of eyes.

Thanks.

name (ESQany - 0): 0 jdoe@domain.org
date (0): 10/3/2013 12:00:00 AM to 10/3/2013 11:59:59 PM
anum (0): 0 to 0
Vault.PolicyAction (ESQall): -EXCLUDE
***** The last 5 are And

Native Query:
-------------
((name:0 OR name:jdoe.domain.org) AND date:[03/10/2013-03/10/2013] AND anum:[anum:0-0] AND (ssid:{**} AND NOT qeaultzkxolicykiction:exclude) AND snum:[snum:1-4294967294]) AND (anum:[anum:0-0])
#rank  #rankend

Raw Query:
----------
"name" 0 "0 jdoe@domain.org"
"date" 0 [20131003:000000-20131003:235959]
"anum" 0 0-0
"VaultPolicyAction" 1 "-EXCLUDE"
"snum" 0 1-fffffffe
0,5

ImAlwaysSmiling
Level 4

I completed another search, but instead of selecting the custodian from the right, I typed in the email address.  It basically came up with this txt file.  However, I see a line that differs and I am not sure what that means.

((name:0 OR name:jdoe.domain.org)

(name:jdoe.domain.org AND date:[03/10/2013-03/10/2013] AND anum:[anum:0-0] AND (ssid:{**} AND NOT qeaultzkxolicykiction:exclude)

 

 

name (ESQany - 0): jdoe@domain.org
date (0): 10/3/2013 12:00:00 AM to 10/3/2013 11:59:59 PM
anum (0): 0 to 0
Vault.PolicyAction (ESQall): -EXCLUDE
***** The last 5 are And

Native Query:
-------------
(name:jdoe.domain.org AND date:[03/10/2013-03/10/2013] AND anum:[anum:0-0] AND (ssid:{**} AND NOT qeaultzkxolicykiction:exclude) AND snum:[snum:1-4294967294]) AND (anum:[anum:0-0])
#rank  #rankend

Raw Query:
----------
"name" 0 "jdoe@domain.org"
"date" 0 [20131003:000000-20131003:235959]
"anum" 0 0-0
"VaultPolicyAction" 1 "-EXCLUDE"
"snum" 0 1-fffffffe
0,5

Kenneth_Adams
Level 6
Employee Accredited Certified

You are correct in that the "(name:0 OR name:jdoe.domain.org)" is what is causing the unexpected results to your search.  If you've only selected one Custodian for that search, I recommend looking at the properties of that Custodian in Custodian Manager to see what is going on.  Look closely at the Display Name as well as all SMTP addresses that are present for that Custodian.  We're getting that 0 somewhere in those properties.

I suspect you'll find something similar in your other searches using Custodians where you receive unexpected results.  Something in Custodian Manager for those Custodians is causing this.  As Custodian Manager synchronizes with Active Directory or the Lotus Notes Directory, OR can be updated manually, you'll need to look closely at the entries for which you're seeing unexpected results returned.  Check their AD or LD properties to ensure nothing is out of the ordinary (i.e., all SMTP addresses are properly formatted with the user identifying name and domain name).
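
As a rough aid for that check, the Python sketch below flags address entries that do not have the basic shape of an SMTP address.  It is only a loose shape test (one "@", a non-empty local part, a dotted domain), not full address validation, and the function name and sample entries are made up for illustration.  A flagged entry is only a candidate for review, not proof of a problem.

```python
def looks_like_smtp(address):
    """Loose shape check: one '@', a local part, and a dotted domain."""
    local, sep, domain = address.partition("@")
    if not sep or "@" in domain:
        return False  # no '@' at all, or more than one
    if not local or " " in address:
        return False  # empty local part, or embedded spaces
    labels = domain.split(".")
    return len(labels) >= 2 and all(labels)


# Hypothetical entries as they might appear in a Custodian's properties.
entries = ["0", "jdoe@domain.org", "jdoe@domain", "John Doe"]
flagged = [e for e in entries if not looks_like_smtp(e)]
print(flagged)
```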

Please continue to let us know what you find.

FYI, to explain what you see in the search criteria file, we'll actually look at the 'Raw Query' portion to provide details.

Raw Query:
----------
"name" 0 "0 jdoe@domain.org"
"date" 0 [20131003:000000-20131003:235959]
"anum" 0 0-0
"VaultPolicyAction" 1 "-EXCLUDE"
"snum" 0 1-fffffffe
0,5

The "name" parameter is the indexed item name for the From, To, Cc and Bcc fields combined into one.  The query is looking for any instance of the number 0 or the SMTP address of jdoe@domain.org.

The "date" parameter is the indexed item name for the message date.  This should be the message creation date, but could be a few other dates if the creation date is somehow missing from the message.  In your query, you're looking for a message date between midnight (00:00:00) on 03 Oct 2013 and 11:59:59 PM on 03 Oct 2013 (a full, single day).

The "anum" parameter is the indexed item name for the number of attachments.  The 0-0 means any number of attachments.

The "VaultPolicyAction" is the policy tagging that can be obtained through our Automatic Classification Engine (ACE) for EV 7, 8, or 9, or Data Classification Service (DCS) for EV 10.  The "-EXCLUDE" means to ignore any exclusion tagging that would be used to automatically exclude items from our Compliance Accelerator Random Sampling processing.

The "snum" is the index name for the Item Sequence Number (ISN) that is used in our database to identify items in the index.  Your query looks for any possible ISN.

The trailing "0,5" on the last line is just an identifier used to note the end of the query.

One last thing.  The "Native Query" contains the Raw Query data, sometimes with different-looking labels.  It also includes "AND (ssid:{**}".  The "ssid:{**}" just states that all SavesetID values are included.  We do have the ability to look for a specific item based on the SavesetID that we assign at the time of archiving.  Part of a native DA query includes the ability to specify a specific SavesetID (ssid) through the use of a custom attribute.  Until that custom attribute is specified, our query includes all SavesetIDs as possible hits when combined with the other search criteria.
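
To make the Raw Query walkthrough concrete, here is a small Python sketch that splits the Raw Query lines into fields and picks out any "name" term that is not an e-mail address.  It assumes each criterion line has the shape "field", a number, then a value, as in the file above.  The meaning of the middle number is not explained in this thread, so the sketch just carries it along as a flag, and the function name is made up for illustration.

```python
import re


def parse_raw_query(text):
    """Map each quoted field name to a (flag, value) pair."""
    criteria = {}
    for line in text.splitlines():
        match = re.match(r'^"([^"]+)"\s+(\d+)\s+(.*)$', line.strip())
        if not match:
            continue  # e.g. the trailing "0,5" line
        field, flag, value = match.groups()
        criteria[field] = (int(flag), value.strip().strip('"'))
    return criteria


raw = '''"name" 0 "0 jdoe@domain.org"
"date" 0 [20131003:000000-20131003:235959]
"anum" 0 0-0
"VaultPolicyAction" 1 "-EXCLUDE"
"snum" 0 1-fffffffe
0,5'''

criteria = parse_raw_query(raw)

# The "name" value is a space-separated list of terms.
name_terms = criteria["name"][1].split()
suspicious = [term for term in name_terms if "@" not in term]

# The upper snum bound is hexadecimal in the Raw Query.
snum_upper = int(criteria["snum"][1].split("-")[1], 16)

print(name_terms, suspicious, snum_upper)
```

The hexadecimal upper bound fffffffe converts to 4294967294, which is the same upper bound that appears as snum:[snum:1-4294967294] in the Native Query.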

 

ImAlwaysSmiling
Level 4

Mr. Adams,

You are absolutely correct...  There was a "0" in the email address field for that particular Custodian.  Once I removed the "0" from the Custodian email address field in Custodian Manager, we were good to go.  I did not see anything in AD that would have put that "0" there.  This has happened to us on a couple occasions so I just have to look back and see who else may have that "0" as well.  Can you tell me if that field has anything to do with the Historical Data option when creating a search?


Raw Query:
----------
"name" 0 "jdoe@domain.org"
"date" 0 [20131003:000000-20131003:235959]
"anum" 0 0-0
"VaultPolicyAction" 1 "-EXCLUDE"
"snum" 0 1-fffffffe
0,5
 

ImAlwaysSmiling
Level 4

Mr. Adams,

Can I add that every time there has been a "0" in that field, the results have pulled up users who have been excluded from Discovery searches.  The "0" has never resulted in a custodian that is searchable.  Hope this info is of some use.

Thanks

Kenneth_Adams
Level 6
Employee Accredited Certified

Mr. Adams?  Please call me Ken.  Mr. Adams was my Dad.  wink

The "Use Historical Information for Custodians and Custodian Groups" option can have an effect on what is included in the search criteria.  To provide some background to answer your question about this option, let me explain a little of how a DA search using Custodians is processed on the DA server prior to being submitted.

Each Custodian is referenced to their SMTP address or addresses by the use of an AddressOwnerID that is unique to each Custodian.  The SMTP address or addresses are kept in an address table.  The Custodian names are kept in an address user table and a principal table, with the address user table containing each Custodian's display name, last name, first name, etc.  When certain administrative actions are applied to a Custodian - such as removal of an SMTP address, deactivation or deletion - information for the custodian will be placed into an address history table.

When a DA search using a Custodian is created, the Custodian is referenced by their AddressOwnerID.  That AddressOwnerID is translated into the SMTP address or addresses that we have for that Custodian in the Custodian Manager database.  The SMTP address or addresses, which are obtained from the address table, are then configured into the search criteria before being passed to the search engine on the EV indexing server(s).

Now, when a search is being configured with the "Use Historical Information for Custodians and Custodian Groups" option enabled (which is enabled by default), the search creation process will also look in the address history table for any SMTP address or addresses that are associated with the Custodian's AddressOwnerID.  So, if your selected Custodian has an address history table entry of "0" for an address, you'll get this "0" address included in the search criteria.
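
The lookup described above can be pictured with a short Python sketch.  This is only an illustrative model, not how DA is actually implemented: the two dictionaries are hypothetical stand-ins for the address and address history tables, the owner ID 101 is made up, and skipping duplicate addresses is an assumption of the sketch.

```python
# Hypothetical stand-ins for the address and address history tables,
# keyed by AddressOwnerID.
ADDRESS = {101: ["jdoe@domain.org"]}
ADDRESS_HISTORY = {101: ["0"]}


def resolve_name_terms(owner_id, use_history=True):
    """Collect the addresses that would go into the search criteria."""
    terms = list(ADDRESS.get(owner_id, []))
    if use_history:
        # With the historical option on, history entries are added too.
        for address in ADDRESS_HISTORY.get(owner_id, []):
            if address not in terms:
                terms.append(address)
    return terms


with_history = resolve_name_terms(101, use_history=True)
without_history = resolve_name_terms(101, use_history=False)
print(with_history)
print(without_history)
```

With the option on, the stray "0" from the history data ends up in the criteria alongside the real address; with it off, only the current address is used.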

It sounded like your Custodian had the "0" address in the address table, not the address history table.  That "0" address could have been manually added, as the SMTP Address field in Custodian Manager is configured to take any text.  We occasionally recommend adding a Custodian's Display Name and variations of it into the SMTP Address field to allow for more matches to messages.  This would be in the event that an e-mail was sent with only a Display Name indexed for the author or recipient (as in an archived message from the user's Drafts folder before the message was configured with any recipients).

I know checking all of the Custodians for a Display Name or Address of 0 can be very time consuming.  I also know that you may have other instances of a Display Name or Address of other numbers.  Here is a SQL query to run against your Custodian Manager database to see all instances of an address or display name of 0, 1, 2, 3, 4 or 5.  You can adjust the WHERE clause condition to include more numbers if you need by just adding a comma, space, and the next number surrounded by single quotes as you can see in the query.

SELECT ta.Address AS 'Address or Display Name'
     , tat.Name AS 'Address Type'
     , ta.AddressOwnerID
     , tau.DisplayName AS 'Custodian Display Name'
     , tau.FirstName AS 'Custodian First Name'
     , tau.Surname AS 'Custodian Last Name'
FROM tblAddress AS ta
JOIN tblAddressUser AS tau
  ON ta.AddressOwnerID = tau.AddressOwnerID
JOIN tblAddressType AS tat
  ON ta.AddressTypeID = tat.AddressTypeID
WHERE ta.Address IN ('0', '1', '2', '3', '4', '5')
ORDER BY ta.Address

You can run the same query against the address history table by changing the FROM line's table name from tblAddress to tblAddressHistory.

You can also change the WHERE statement to list all entries that have a 0 in the Address column to be
WHERE ta.Address LIKE '%0%'

I offer the above change to the query so you can see all Custodians that have a 0 in their address or display name.
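
One thing to keep in mind when choosing between the two WHERE clauses: the LIKE '%0%' form matches any address that contains a 0 anywhere, including perfectly valid ones.  The Python sketch below shows the difference using an in-memory SQLite database.  It is only a demonstration: the table is a cut-down, hypothetical version of tblAddress with two columns, the rows are invented, and SQLite stands in for the real database server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical stand-in for the real address table.
conn.execute("CREATE TABLE tblAddress (AddressOwnerID INTEGER, Address TEXT)")
conn.executemany(
    "INSERT INTO tblAddress VALUES (?, ?)",
    [
        (1, "0"),
        (1, "jdoe@domain.org"),
        (2, "user10@domain.org"),
        (3, "asmith@domain.org"),
    ],
)

# Exact match: only addresses that are exactly one of the listed values.
exact = [row[0] for row in conn.execute(
    "SELECT Address FROM tblAddress "
    "WHERE Address IN ('0', '1', '2', '3', '4', '5') ORDER BY Address"
)]

# Pattern match: every address containing a 0 anywhere in it.
broad = [row[0] for row in conn.execute(
    "SELECT Address FROM tblAddress WHERE Address LIKE '%0%' ORDER BY Address"
)]
conn.close()

print(exact)
print(broad)
```

So the LIKE version is useful for casting a wide net, but expect to sift out legitimate addresses that simply contain a zero.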

You can use the query to quickly identify any Custodian(s) that may need to be accessed through Custodian Manager to remove the number entry from the Address and / or Display Name field.

Let us know if you find this information useful or if you need more clarification, please.  We'll be glad to help as we are able.

Ken

 

ImAlwaysSmiling
Level 4

Wonderful...  Ken it is. laugh

Thank you for all the helpful info.  The query worked like a charm and actually shed additional light on what we believe to be the origin of the problem.  We migrated from GroupWise to Exchange a while ago.  The SMTP field in some cases had a "0" along with the email address listed in the EMC.  We have cleaned most of them up when we see them.  However, the Custodian Manager has that "0" listed even on those that we corrected in EMC.  (Historical)  Hope that makes sense.  So after running the script you so graciously provided, 152 results showed up in the database.  We are in the process now of cleaning up those remaining "0" SMTP entries in both areas.  I appreciate all your help and the time invested in the explanations you provided me.

Thanks so much!

Juli

 

Kenneth_Adams
Level 6
Employee Accredited Certified

You are welcome, Juli.  I'm glad to have been of assistance.

A couple of new things to note:

  1. Changing Custodian information in EMC does not immediately synchronize to Custodian Manager.  That synchronization process occurs when the Enterprise Vault Accelerator Manager Service (EVAMS) is started and then every 8 hours thereafter by default.  If a synchronization has not completed within 8 hours, the next synchronization is delayed until after the previous synchronization completes and a waiting time has expired.  These synchronization and wait times are configurable through the Configuration tab, Settings sub-tab, Profile Synchronization section.
  2. When you remove the 152 results through either the next profile synchronization or manually through the Custodian Manager web site, the addresses that you remove will be moved to the address history table.  Please be aware of this, as the default selection of "Use Historical Information for Custodians and Custodian Groups" will want to include these addresses for the affected Custodians.  You'll want to uncheck that option to prevent those addresses from being included in your searches.
  3. When you've finished removing these addresses from EMC, you can go into the Custodian Manager web site and initiate an immediate synchronization so you won't have to wait for the next automatic synchronization to occur.  Just remember that the Custodian Manager web site won't show all of the changes immediately, as the synchronization processing still needs time to complete.  For small Active Directory or Domino domains, that synchronization is typically fast (i.e., within a few minutes).  For very large domains, that synchronization can take several hours.  Just be patient and you should eventually see these addresses removed from the current Custodian properties (but they will be in their history).

Ken

ImAlwaysSmiling
Level 4

Great Ken.  Thank you.  I completed the immediate synchronization in Custodian Manager.  Results look great when I did the final run with the query you provided.  We are good now.

Thanks again!

Juli smiley