
Journal Message De-Duplication

Simon_Butler
Level 5
Certified
Hi,

Just a quick poll to see if anyone has implemented any work-arounds to the duplication caused by having multiple journal mailboxes in Exchange.

An idea that has been passed around is to point Exchange to a single mailbox and auto-forward the mail from this mailbox on a round-robin basis to multiple journal mailboxes in turn.  This does mean that the active journal mailbox will be receiving more mail than EV can handle, but when the forwarding moves on to the next mailbox, EV can finish the backlog before that mailbox's next turn.
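A rough way to check whether the backlog really does clear before a mailbox's next turn is sketched below. This is only back-of-envelope arithmetic: the function names, the per-server ingest rate and the one-hour turn length are all assumptions for illustration, not measured EV figures.

```python
def backlog_after_turn(arrival_per_hour, ingest_per_hour, turn_hours):
    """Messages left in a mailbox once its forwarding turn ends.

    Assumes EV keeps ingesting from the mailbox while mail is arriving.
    """
    return max(0.0, (arrival_per_hour - ingest_per_hour) * turn_hours)


def drains_before_next_turn(arrival_per_hour, ingest_per_hour,
                            mailboxes, turn_hours=1.0):
    """True if the backlog can be archived while the other mailboxes take their turns."""
    backlog = backlog_after_turn(arrival_per_hour, ingest_per_hour, turn_hours)
    idle_hours = (mailboxes - 1) * turn_hours  # time until this mailbox is active again
    return backlog <= ingest_per_hour * idle_hours


if __name__ == "__main__":
    # Hypothetical figures: 100,000 msgs/hour arriving, 30,000 msgs/hour per EV server.
    for n in (3, 4):
        print(n, "mailboxes:", drains_before_next_turn(100_000, 30_000, n))
```

Under these assumptions the condition reduces to the total ingest rate across all mailboxes being at least the arrival rate, so the turn length itself only affects how large the temporary backlog gets.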

As far as we can tell the only drawback is that the journal mailbox appears as a recipient in the journal envelope and is subsequently indexed, displayed etc. And if you're using APM or CA and have rules based on number of recipients the results can be skewed.  Benefits are great - reducing from even just 3 active to 1 round-robin can reduce Index and Storage burn rates by half, and in turn reduce the number of EV servers required.  DA/CA searches have no duplicates which cuts investigation costs by half.

If there's a way to retrospectively remove duplication that would be great....but that's a bit harder....


Cheers,
Si
11 REPLIES 11

MichelZ
Level 6
Partner Accredited Certified
Hi there, I don't really understand what you are trying to do. EV will deduplicate messages as long as you have the stores in the same vault store group... ? Maybe you can elaborate further on what environment/config you have. Cheers

cloudficient - EV Migration, creators of EVComplete.

Liam_Finn1
Level 6
Employee Accredited Certified
 I agree with Michel.

All you need to do is have each of the journal Vaults in the same Vault Store group and EV will take care of the de-dupe for you.


GertjanA
Moderator
Partner    VIP    Accredited Certified
We have 9 journal mailboxes. These are handled by 9 journal archiving servers. These servers are placed in 1 group (VSG), and are set to 'share within group'.

Gertjan
Regards. Gertjan

GregRountree
Level 4
Partner
Centera will allow duplicates, but NTFS will now use SIS to keep from duplicating items. If you want to get rid of duplicates from DA then you will need to export them to PST and then run a program to delete the duplicates.

9 Journal Archive servers is a lot. How much email are you journaling per day? 2 Quad Core processors with 8 GB RAM can handle 60,000 messages per hour. The fewer journal mailboxes, the better.

JesusWept3
Level 6
Partner Accredited Certified
I think you've all missed the point. He's trying to say that items are technically being treated as different because they take different routes to get to different Exchange servers, so even though the message is the same, from the same person to the same people, it's not single instanced because the message headers are different.
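To illustrate the point, here is a minimal sketch of why two copies of one message can fail to match when routing headers are part of the comparison. It is not EV's actual single-instancing algorithm; the hashing scheme and the choice of `Received` as the ignored header are assumptions for illustration only, and it handles single-part messages only.

```python
import hashlib
from email import message_from_string


def fingerprint(raw_message, ignore_headers=()):
    """Hash the headers and body of a single-part message, skipping ignored headers."""
    msg = message_from_string(raw_message)
    ignored = {name.lower() for name in ignore_headers}
    digest = hashlib.sha256()
    for name, value in msg.items():
        if name.lower() in ignored:
            continue
        digest.update(f"{name.lower()}:{value.strip()}\n".encode("utf-8"))
    digest.update(str(msg.get_payload()).encode("utf-8"))
    return digest.hexdigest()


# Two copies of the same mail that travelled via different (made-up) hub servers.
COMMON = "From: a@example.com\nTo: b@example.com\nSubject: Hi\n\nHello\n"
copy_a = "Received: from hub1.example.com\n" + COMMON
copy_b = "Received: from hub2.example.com\n" + COMMON

if __name__ == "__main__":
    print(fingerprint(copy_a) == fingerprint(copy_b))
    print(fingerprint(copy_a, ["Received"]) == fingerprint(copy_b, ["Received"]))
```

With the routing header included the two copies hash differently; with it ignored they match, which is the kind of comparison that would be needed to treat them as one record.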
https://www.linkedin.com/in/alex-allen-turl-07370146

MichelZ
Level 6
Partner Accredited Certified
OK, maybe not at the message level.
But at least at the attachment level, as EV8 stores them separately, and even if the messages do not match because of the headers, the attachments still get single instanced.

Roughly how many mails are we actually talking about here?
And how many EV Servers?

Maybe you are able to tune EV to get it to successfully archive everything from one Journal mailbox...

Cheers
Michel


Batmanfail
Level 4
Is this an EV issue or an Exchange issue?  Looks more like Exchange to me.  Are you in a mixed environment like this post on Technet.....

http://blogs.technet.com/ramm/archive/2009/10/15/exchange-2003-exchange-2007-mixed-mode-journaling.a...

more info would help Simon...

Simon_Butler
Level 5
Certified
Just to clarify:

I have over 100,000 messages per hour to journal and must use 3 to 4 active EV servers to ingest, to ensure sufficient resources are available for daytime searches.  We have a high usage of DA, so concentrating ingest on 1 to 2 servers is not possible - searches will grind to a halt.

Multiple end-points (even with EV's own de-dup tech) will give you duplicate records even though the data may be de-duped.  Multiple records cost $$££ to review.

Using a single mailbox and forwarding that to many mailboxes in round-robin allows the load to be spread and prevents any dups from being created.


Batmanfail - There's a fix available from MS for the mixed mode dup issue - also fixed in SP2 RU1 I believe.

Simon_Butler
Level 5
Certified
I guess I confused everyone,...or everyone with larger environments is happy with duplicate records... :) 

Michael_Bilsbor
Level 6
Accredited
Hi Simon,

This is from a large corporation that wanted to resolve the issue as well:

"
We have 1 journal mailbox defined within Exchange. Once that destination journal mailbox reaches 30,000 msgs we change the "alternate recipient" on that same journal mailbox and so on every 30,000 msgs....So since Exchange single instance is per journal mailbox, that journal mailbox is always the same and therefore maintains single instance before EV even gets it.
"
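The rotation logic in that quote could be sketched roughly as follows. How the "alternate recipient" is actually changed in Exchange is not described in the quote, so this only models the counting and switching; the class name, mailbox names and `route` method are hypothetical.

```python
from itertools import cycle


class ThresholdRotator:
    """Switch the forwarding target after every `threshold` messages."""

    def __init__(self, targets, threshold=30_000):
        if not targets:
            raise ValueError("at least one target mailbox is required")
        self._targets = cycle(targets)
        self.current = next(self._targets)
        self.threshold = threshold
        self.count = 0  # messages sent to the current target so far

    def route(self):
        """Return the mailbox that should receive the next message."""
        if self.count >= self.threshold:
            self.current = next(self._targets)
            self.count = 0
        self.count += 1
        return self.current


if __name__ == "__main__":
    # Tiny threshold so the rotation is visible; the quote uses 30,000.
    rotator = ThresholdRotator(["journal1", "journal2", "journal3"], threshold=2)
    print([rotator.route() for _ in range(7)])
```

Because every message passes through the one journal mailbox first and is then sent to exactly one target, no message ends up in two journal archives.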

I'm not sure of the details of exactly how they go about this, but hopefully this will put you on the right path.

Let me know if you need more information.

Cheers,
Mike

BigAnvil
Level 5
We have a product that takes Blackberry SMS data, in the form of CSV files, and converts it to message files for ingestion into EV.  Because the BB SMS data is duplicated in the CSV files (a single SMS message is typically broken into up to four parts - one each for tx_pending-add, tx_sending-update, tx_sent-add, and tx_sent-update), the exact same message is created four times within the mailbox being used for archiving this data.

When EV archives this data, it's my understanding that it will de-duplicate it.  My question is whether these duplicated messages will be returned as results when searches are performed against this archive?
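One way to avoid depending on the answer would be to collapse the repeated rows before the message files are created. The sketch below assumes a CSV layout with `sender`, `recipient`, `body` and `state` columns; those column names are hypothetical and would need to match the real export.

```python
import csv
import io


def collapse_sms_rows(csv_text, key_fields=("sender", "recipient", "body")):
    """Keep the first row for each distinct message, ignoring the state column."""
    unique = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = tuple(row[field] for field in key_fields)
        unique.setdefault(key, row)  # later state rows for the same SMS are dropped
    return list(unique.values())


SAMPLE = """sender,recipient,body,state
555-0100,555-0199,Running late,tx_pending-add
555-0100,555-0199,Running late,tx_sending-update
555-0100,555-0199,Running late,tx_sent-add
555-0100,555-0199,Running late,tx_sent-update
555-0199,555-0100,No problem,tx_sent-add
"""

if __name__ == "__main__":
    for row in collapse_sms_rows(SAMPLE):
        print(row["sender"], "->", row["recipient"], ":", row["body"])
```

Keying on sender, recipient and body alone would also merge genuinely repeated texts, so a real version would probably want a timestamp or message identifier in the key if the export provides one.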