Forum Discussion

ANDREY_FYODOROV
13 years ago

DA. Analytics. Can we enumerate Duplicates BEFORE Export?

Hi all.

We have a very large case in Discovery Accelerator. Analytics was enabled for the case and has completed.

We also have SQL reporting services installed, and the DA report templates have been uploaded.

I can see that we can use the option to exclude duplicates from an export, and that we can then run a report that shows the duplicates AFTER the export.

I hope there is a way to find out how many duplicates there are BEFORE we run an export. The export will be huge and will take many days, so it would be nice to know this sooner, because many people are asking: "How many items are going to be exported?"

 

Is it possible to enumerate duplicates via a SQL query?  Or can it be done via a custom report?

 

Thanks for any advice.

  • Do you plan to export all items in the case?  And are you going to choose Similar and Duplicate or Duplicate only?

    If doing Duplicate only this should give you a good idea:

    Run it against your DA Customer database:

    SELECT CaseName, COUNT(DISTINCT KVSSaveSetID) AS UniqueSavesets
    FROM view_DiscoveredItems
    GROUP BY CaseName

  • Thanks. But this is not quite what I am after.

     

    This query is just going to display the count of all the unique savesets that were search hits for each case.

    Well, all the savesets are unique  :)

    It is returning the number of all discovered items, which is the same number that the DA GUI already shows me.

     

    I would like to run something against the case's analytics index that will tell me what items are duplicates (according to the analytics).

     

    The program knows how to do it while it is running an export.

     

    I wish there was a dry run for an export or something like this where it would show the dupes quickly without actually exporting the items just yet.

  • Actually not all savesets in a DA case will be unique if Single Instancing has occurred.  Are you searching Journal and Mailbox archives or only Journal?

    Regardless, since the query returned the same number, I guess you will have to run the export, but I would suggest you post this request in the Ideas section.  It would be good if there were a Report Mode on the Export so you could make sure you have enough free storage for the export.

     

  • We do have single instancing, but we have seven vault stores in different geographic locations. And single instancing only applies within each vault store (not across vault stores).

    We don't have Journal archives, only mailbox ones.
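
For anyone who wants to see what the saveset query can and cannot tell you, here is a minimal sketch. It uses an in-memory SQLite table as a stand-in for view_DiscoveredItems, with only the two columns named in the query above and made-up rows, so it is an illustration and not a tested query against a real DA Customer database. It puts the total item count next to the distinct saveset count per case. The difference only reflects hits that share a saveset ID; it says nothing about items that analytics would treat as duplicates, which is why the two numbers can match exactly even when duplicates exist.

```python
import sqlite3

# Toy stand-in for the DA Customer database. Only the two columns used in
# the thread's query are modelled, and every row here is invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE view_DiscoveredItems (CaseName TEXT, KVSSaveSetID TEXT)"
)
conn.executemany(
    "INSERT INTO view_DiscoveredItems VALUES (?, ?)",
    [
        ("CaseA", "S1"),
        ("CaseA", "S1"),  # two hits that share one saveset ID
        ("CaseA", "S2"),
        ("CaseB", "S3"),
        ("CaseB", "S4"),  # every saveset distinct, as in the original poster's case
    ],
)

# Same grouping as the query in the thread, with the total count added
# so the two numbers can be compared side by side.
query = """
SELECT CaseName,
       COUNT(*)                                AS TotalItems,
       COUNT(DISTINCT KVSSaveSetID)            AS UniqueSavesets,
       COUNT(*) - COUNT(DISTINCT KVSSaveSetID) AS SharedSavesetRows
FROM view_DiscoveredItems
GROUP BY CaseName
ORDER BY CaseName
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

In this toy data CaseA has one row that shares a saveset ID and CaseB has none. A result like CaseB's (zero shared rows) is what the original poster saw, and it does not rule out duplicates that only the analytics comparison would find.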