
Deduplication in Discovery Accelerator

Bilgore
Level 2

Hi-

I am running Discovery Accelerator 9.0.  I am searching for 4 email addresses in the To/From fields.  The search is limited to a journal archive.  My question is, when the search results are presented, are they deduplicated automatically?  The search returned 109,000 items.  Does Discovery Accelerator, behind the scenes, perform deduplication during the search process?  I've read a ton of material, but it's not clear to me whether deduplication is something that I have to elect to do, or if it is done by default.

 

2 ACCEPTED SOLUTIONS

Accepted Solutions

WiTSend
Level 6
Partner

Deduplication can happen on export (settings), but in review all the items are presented. 


Kenneth_Adams
Level 6
Employee Accredited Certified

WiTSend is partially correct.  Deduplication in Discovery Accelerator does not occur during the search or accepting operations.  Deduplication will occur during Review processing IF - and only IF - the Stacking option is set to Similar.  If the case has had Analytics enabled and enough time has been allowed for the Analytics processing to complete, the Stacking option will also offer a Duplicate option alongside the Similar option.  Similar uses specific metadata and the first 125 or so characters of the message body to compute a hash value for similarity.  Analytics uses that same metadata hash and adds to it a content hash of the entire message, which is used when comparing items to determine a true duplicate condition.  So, using just the metadata allows for similar determinations, but including the content allows for duplicate determinations.
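
To make the Similar versus Duplicate distinction concrete, here is a minimal Python sketch. It is not Discovery Accelerator's actual algorithm: the metadata fields (`sender`, `recipients`, `subject`), the SHA-256 hash, and the function names are all assumptions chosen for illustration. Only the general idea (metadata plus roughly the first 125 body characters for Similar, plus a hash of the full content for Duplicate) comes from the description above.

```python
import hashlib

def similar_key(msg, prefix_len=125):
    """Hash of assumed metadata fields plus the first ~125 body characters."""
    parts = [
        msg["sender"],
        ",".join(sorted(msg["recipients"])),
        msg["subject"],
        msg["body"][:prefix_len],
    ]
    # Join with a separator unlikely to appear in the fields themselves.
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()

def duplicate_key(msg):
    """Similarity hash combined with a hash of the entire message body."""
    content_hash = hashlib.sha256(msg["body"].encode("utf-8")).hexdigest()
    return similar_key(msg) + ":" + content_hash

# Two hypothetical messages that share metadata and their first 125 characters.
a = {"sender": "a@example.com", "recipients": ["b@example.com"],
     "subject": "Q3 numbers", "body": "x" * 125 + " first ending"}
b = dict(a, body="x" * 125 + " second ending")

print(similar_key(a) == similar_key(b))      # the two are "similar"
print(duplicate_key(a) == duplicate_key(b))  # but not "duplicates"
```

Under these assumptions, the two messages land in the same Similar set but in different Duplicate sets, because only the full-content hash sees the differing endings.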

So, during a review, the total item count you see includes the duplicates.  Selecting a Stacking option of Similar or Duplicate will decrease the number of top-level messages you see, but any duplicates will be expandable for viewing by clicking on the '+' sign to the left of the randomly selected 'primary' message of each duplicate set.
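
The stacking behaviour can be pictured with a short Python sketch. This is only an illustration of grouping by a key, not how the product implements it: the `stack` function and the sample keys are hypothetical, and the first item seen is used as the 'primary' for simplicity, whereas the product picks it randomly.

```python
from collections import defaultdict

def stack(items, key_fn):
    """Group items by key; return (primary, other_members) per group."""
    groups = defaultdict(list)
    for item in items:
        groups[key_fn(item)].append(item)
    # Simplification: the first item seen in each group acts as the primary.
    return [(members[0], members[1:]) for members in groups.values()]

# Five hypothetical items, three of which share the same key.
items = ["msg-A", "msg-B", "msg-A", "msg-C", "msg-A"]
stacks = stack(items, key_fn=lambda m: m)

top_level = len(stacks)                                  # rows shown when stacked
total = sum(1 + len(rest) for _primary, rest in stacks)  # total item count
print(top_level, total)
```

The number of top-level rows drops when stacking is on, while the total count is unchanged because the other members of each set remain available under their primary.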

Now, an export also allows for deduplication similar to the review.  Without Analytics enabled, you'll have the option to exclude similar items.  With Analytics enabled, you'll have both the similar and duplicate options available to select for excluding items.

Kind regards,

Ken

 

