Forum Discussion

Bilgore
10 years ago

Deduplication in Discovery Accelerator

Hi-

I am running Discovery Accelerator 9.0. I am searching for 4 email addresses in the To/From fields, and the search is limited to a journal archive. The search returned 109,000 items. My question is: when the search results are presented, have they been deduplicated automatically? In other words, does Discovery Accelerator perform deduplication behind the scenes during the search process? I've read a ton of material, but it's not clear to me whether deduplication is something I have to elect to do or whether it is done by default.

 

  • Deduplication can happen on export (settings), but in review all the items are presented. 

  • Wit Send is partially correct.  Deduplication in Discovery Accelerator does not occur during the search or accepting operations.  Deduplication will occur during the Review processing IF - and only IF - the Stacking option is set to Similar.  If the case has had Analytics enabled and enough time has been allowed for the Analytics processing to complete, the Stacking option will also have a Duplicate option to go along with the Similar option.  Similar uses specific metadata information and the first 125 or so characters of the message body to determine a hash value for similarity.  Analytics uses that same metadata hash and adds to it a content hash of the entire contents of the message, which is used to compare items and determine a true duplicate condition.  So, using just the metadata allows for similar determinations, while including the content allows for duplicate determinations.

    So, during a review, the total item count you see includes the duplicates.  Selecting a Stacking option of Similar or Duplicate will decrease the number of top-level messages you see, but the duplicates remain available: click the '+' sign to the left of the randomly selected 'primary' message of each duplicate set to expand it and view them.

    Now, an export also allows for deduplication, much as the review does.  Without Analytics enabled, you'll have the option to exclude similar items.  With Analytics enabled, you'll have both the similar and the duplicate options available to select for excluding items.

    Kind regards,

    Ken
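
To make the Similar vs. Duplicate distinction in the reply above a bit more concrete, here is a rough Python sketch of the idea. It is not Discovery Accelerator's actual implementation: the metadata fields (sender, recipients, subject), the SHA-256 hash, and the helper names `similar_key`, `duplicate_key`, `stack`, and `primaries` are all assumptions made for illustration. Only the general scheme (metadata plus roughly the first 125 body characters for Similar, with a full-content hash added for Duplicate) comes from the reply.

```python
import hashlib
from collections import defaultdict


def similar_key(msg, prefix_len=125):
    """Hash of selected metadata plus the first ~125 body characters.

    The choice of metadata fields here is an assumption, not the
    product's documented field list.
    """
    parts = [
        msg["sender"],
        ",".join(sorted(msg["recipients"])),
        msg["subject"],
        msg["body"][:prefix_len],
    ]
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()


def duplicate_key(msg):
    """Similar key extended with a hash of the entire message body."""
    content_hash = hashlib.sha256(msg["body"].encode("utf-8")).hexdigest()
    return similar_key(msg) + ":" + content_hash


def stack(items, key_fn):
    """Group items into stacks; items sharing a key land in one stack."""
    stacks = defaultdict(list)
    for item in items:
        stacks[key_fn(item)].append(item)
    return list(stacks.values())


def primaries(stacks):
    """Keep one item per stack, as an export with exclusion would.

    The reply says the primary is picked randomly; the first item is
    used here only to keep the sketch deterministic.
    """
    return [s[0] for s in stacks]


# Three hypothetical journal items: msg_a and msg_b are identical copies,
# msg_c matches them in metadata and opening text but differs at the end.
shared = "Please review the attached contract. " * 4  # longer than 125 chars
msg_a = {"sender": "a@example.com", "recipients": ["b@example.com"],
         "subject": "Contract", "body": shared + "Regards, A"}
msg_b = dict(msg_a)
msg_c = dict(msg_a, body=shared + "Regards, A (edited)")
items = [msg_a, msg_b, msg_c]

similar_stacks = stack(items, similar_key)      # one stack holding all three
duplicate_stacks = stack(items, duplicate_key)  # two stacks: [a, b] and [c]
```

With no stacking, a reviewer would see all three items. Stacking by Similar collapses them to one top-level item, while stacking by Duplicate leaves two, because `msg_c` differs past the first 125 characters. Calling `primaries(...)` on either result mimics an export that excludes the stacked items.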

     
