SIS and DA dedupe

Vikes_2
Level 6
Hello,
I have seen some info about SIS and DA deduping, but I am still not sure about what I am seeing in my EV 2007 SP2 environment.
I have a single Vault Store with two partitions for mailbox journal archiving. I just ran a DA search and saw a bunch of duplicates. My understanding was that if all mail is on the new partition and is over 50k in size, it should only show up once, and that any items that did show up more than once would be deduped by DA when you accept the results. If anyone has some details about this I would greatly appreciate it :)

thanks,
Travis
1 ACCEPTED SOLUTION

Accepted Solutions

Wayne_Humphrey
Level 6
Partner Accredited Certified
http://www.axisdiscovery.com/products.html


5 REPLIES 5

Joseph_Rodgers
Level 6
Partner
Travis,

The Vault Store will SIS; however, when you use DA, it will search against all archives. Each archive (journal, mailbox, FSA, etc.) has its own index.

In your case, if you have 2 journals going to 2 archives, then any message captured by both journals will be archived and single-instanced in the Vault Store, BUT each archive's index will have a reference to the item, and a search of both archives will return 2 results.

If both journal mailboxes were writing to the same archive, this would not occur.

To my knowledge, DA does not dedupe, but Symantec has a tool available. I couldn't find a quick reference to it online, so you might have to dig.

Symantec PST De-Duplicator powered by Axis Discovery

-Joe
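To make the store-versus-index distinction above concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions: the `VaultStore` class, the archive names, and the use of a SHA-256 content hash as the single-instance key are all hypothetical and are not how Enterprise Vault or DA is actually implemented. It only illustrates why one stored copy can still surface as two search hits.

```python
import hashlib
from collections import defaultdict


class VaultStore:
    """Toy model (hypothetical): one stored copy per unique message."""

    def __init__(self):
        self.items = {}                   # content hash -> message text
        self.indexes = defaultdict(list)  # archive name -> content hashes

    def archive(self, archive_name, message):
        # Assumption: identical content maps to the same key.
        key = hashlib.sha256(message.encode("utf-8")).hexdigest()
        self.items.setdefault(key, message)     # stored once (SIS)
        self.indexes[archive_name].append(key)  # but indexed per archive
        return key

    def search(self, term, archive_names):
        # Each archive's index is searched separately, so a shared item
        # produces one hit per archive that references it.
        hits = []
        for name in archive_names:
            for key in self.indexes[name]:
                if term in self.items[key]:
                    hits.append((name, key))
        return hits


def dedupe_hits(hits):
    """Keep the first hit for each stored item, preserving order."""
    seen = set()
    unique = []
    for name, key in hits:
        if key not in seen:
            seen.add(key)
            unique.append((name, key))
    return unique


store = VaultStore()
msg = "Quarterly results attached"
store.archive("journal-1", msg)
store.archive("journal-2", msg)
hits = store.search("Quarterly", ["journal-1", "journal-2"])
```

In this model `store.items` holds a single copy, `hits` contains one entry per archive, and `dedupe_hits(hits)` collapses them back to one.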

Liam_Finn1
Level 6
Employee Accredited Certified

Joe,

The tool you are speaking of is Axis dedup. It is provided free if you have a license for DA.

Axis dedup will dedupe the PST files after they have been produced or exported from DA.
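For anyone curious what post-export deduplication can look like in principle, here is a rough Python sketch. It is not how Axis dedup works internally (that is not described here), and it operates on raw RFC 822 message text rather than on PST files. The function names and the choice of the `Message-ID` header as the fingerprint are assumptions made for illustration.

```python
import hashlib
from email import message_from_string


def fingerprint(raw_message):
    """Return a key identifying a message (assumption: Message-ID is reliable)."""
    msg = message_from_string(raw_message)
    message_id = msg.get("Message-ID")
    if message_id:
        return message_id.strip()
    # Fallback when the header is missing: hash the full raw text.
    return hashlib.sha256(raw_message.encode("utf-8")).hexdigest()


def dedupe_export(raw_messages):
    """Keep the first copy of each message, preserving export order."""
    seen = set()
    kept = []
    for raw in raw_messages:
        key = fingerprint(raw)
        if key not in seen:
            seen.add(key)
            kept.append(raw)
    return kept


copy_a = "Message-ID: <1@example.com>\nSubject: Hello\n\nBody text\n"
copy_b = "Message-ID: <1@example.com>\nSubject: Hello\n\nBody text\n"
other = "Message-ID: <2@example.com>\nSubject: Other\n\nDifferent body\n"
result = dedupe_export([copy_a, copy_b, other])
```

Here `result` keeps `copy_a` and `other` and drops the second copy that carries the same `Message-ID`.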

 

Wayne_Humphrey
Level 6
Partner Accredited Certified
http://www.axisdiscovery.com/products.html

Vikes_2
Level 6
Thanks guys! I will check the tool out. I also wanted to note that there is only one journal, although there are two partitions. The first partition is closed, and the data in question was archived only on the second (open) partition, which should mean SIS applies to all items over 50k, right?
Once again thanks!!
Travis

Joseph_Rodgers
Level 6
Partner
Travis,

Note: In EV 2007, SIS is on a per-message basis (unless using Centera), so each message, regardless of size, will be SIS'ed (i.e., if a journal archive shares space with a mailbox archive and the same message is written by both tasks). EV 8.0 replaces this with SIS parts and is much more efficient on storage. I am not familiar with the 50K you mention unless you have a Centera (but I believe the default is 20K).

If I read you correctly, you have 1 Vault Store with 2 partitions (1 closed, 1 open) and you are journaling?

Are you using collections? Collections can reduce SIS effectiveness.

If all data is being written by 1 EV server to 1 Vault Store and collections are disabled, then you should not be seeing duplicates.

Journaling will only send 1 copy. Can you elaborate on your configuration? How many journal mailboxes, VSP storage type, etc.?

Regards
Joe