Question on Compression
Hi all, hoping somebody might be able to shed some light here.
I'm basically wondering how compression works in Enterprise Vault. I'm looking to use this solution to 1) save space on the Exchange server and 2) take away the PST nightmare. However, on a trial run I've noticed that the space saved in the mailboxes is more than made up for by the space taken by EV storage.
As an example, I archived a portion of one of our development mailboxes. While I managed to shave 112MB from the total mailbox size, the drive that EVStorage is located on grew by 276MB.
Obviously, as this is only a trial run, I am going with mostly default settings, so I have no idea whether I'm getting the best out of this setup. However, I would like to get to the bottom of how this works, because so far this isn't going to get to the purchase stage: I won't be saving any disk space, in fact I'll be needing more!
Thanks in advance.
OK, so the EV storage in EV8 and beyond is made up of three parts for large emails.
You have:
1. The DVS File
2. The DVSSP File
3. The DVSCC File
The DVS file is typically the main part of the message: body, senders, recipients, date sent, received, modified, message class, who archived the email, when it was archived, what folder it was in, what mailbox, etc. In other words, all the main properties of the email.
The DVSSP file is usually the attachment, so it's just a compressed form of whatever attachment was there.
The DVSCC file is the Converted Content of the attachment. When the item is archived, EV stores the attachment as a DVSSP file and then converts the attachment (if it is a valid file that is available for conversion) into either HTML or text. Not all attachments can be converted to HTML, so you may find some that just go to raw text instead.
The DVSCC file is what is passed to the indexing engine so that it can add elements (keywords and phrases) to the index and the item can be searched correctly. The DVSCC is also typically what is displayed when you preview a message. For instance, if you go to the search.asp page and click the link to a message to preview it, you see an HTML rendering of the email or the attachment; this is what is in the DVSCC.
So let's say you send an email to 100 people with the same PDF file that's 2MB.
In Exchange 2010, for instance, which has no SIS, you are most likely going to have 200MB of email overall from this one email.
EV then goes ahead and archives it and turns each copy into a 4K shortcut, so now you have 400K of shortcuts overall in people's mailboxes.
On the EV side of things, you will have:
100 x DVS files (each person who archived that email will have their own DVS, as it will contain all their user information)
1 x DVSSP file (this is the PDF file that was sent to all of them; because it's SIS'd we only need to store the attachment once)
1 x DVSCC file (this is the PDF in HTML format, minus elements such as embedded images that cannot be converted)
So you may end up with:
100 x 8KB DVS files (800KB)
1 x DVSSP (let's say 2MB and it's not compressed, just to make it easier)
1 x DVSCC (80KB, the rest may just be images)
So now with EV you have 2,928KB on disk (800KB + 2,048KB + 80KB),
and on Exchange you have 400KB of shortcuts; altogether that's 3,328KB.
Compared with the 200MB that you originally had across the mailboxes, it's now a lot smaller.
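If you want to rerun that arithmetic with your own numbers, here is a rough Python sketch. The function name and parameters are made up for illustration, the sizes are the example figures from this post rather than measured values, and it assumes (as above) that the attachment is stored uncompressed and single instanced once.

```python
def footprint_kb(recipients, attachment_kb, shortcut_kb, dvs_kb, dvscc_kb):
    """Return (exchange_before, exchange_after, ev_store) in KB
    for one email with one attachment sent to many recipients."""
    # Without SIS in Exchange, every mailbox holds its own copy of the attachment.
    exchange_before = recipients * attachment_kb
    # After archiving, each mailbox holds only a shortcut.
    exchange_after = recipients * shortcut_kb
    # EV keeps one DVS per recipient, but only one DVSSP and one DVSCC.
    ev_store = recipients * dvs_kb + attachment_kb + dvscc_kb
    return exchange_before, exchange_after, ev_store


before, after, ev = footprint_kb(
    recipients=100,
    attachment_kb=2048,  # the 2MB PDF
    shortcut_kb=4,       # assumed shortcut size, per the 4K figure above
    dvs_kb=8,
    dvscc_kb=80,
)
print("Exchange before archiving (KB):", before)
print("Exchange after archiving (KB):", after)
print("EV store (KB):", ev)
```

Try it with recipients=1 and you will see the saving shrinks to almost nothing, which is closer to what a trial on a single mailbox looks like.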
(Note: small email will not be single instanced, as it would take more room to create these separate parts than you would theoretically save. So a small email, say 20KB in size, will have all these elements in one single DVS file; only large email will have the DVS/DVSSP/DVSCC files.)
The problem, however, can be if you have lots of small email: depending on how you have your shortcuts set up, the email can become a lot larger.
For instance, let's say you set the custom properties to add a link to the archived email (the banner) and to keep the entire message text.
If you send a small email that just says "test" and nothing else, the HTML added for the link may make that email go from 2K unarchived to 3K once it is turned into a shortcut.
Also, if you choose to keep the entire message body you will not save much space. You may have 100KB of text with a 10K attachment and find your space saving is only 5K or so, because you're leaving behind the whole message body, which will still be 100KB.
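The two cases above can be put into a quick Python sketch. This is only a back-of-the-envelope model: the function and its parameters are hypothetical, and the overhead values (1KB for the banner on the "test" email, 5KB for the larger one) are picked purely to match the rough figures in this post, not taken from EV itself.

```python
def shortcut_saving_kb(body_kb, attachments_kb, keep_body, overhead_kb):
    """Estimate KB saved in the mailbox when an item becomes a shortcut.
    A negative result means the shortcut is bigger than the original."""
    original = body_kb + attachments_kb
    # The shortcut carries its own overhead (banner link etc.) plus
    # the full body if the policy is set to keep it.
    shortcut = overhead_kb + (body_kb if keep_body else 0)
    return original - shortcut


# The tiny "test" email: 2K grows to 3K once the banner is added.
print(shortcut_saving_kb(body_kb=2, attachments_kb=0, keep_body=True, overhead_kb=1))

# 100KB of text plus a 10K attachment, keeping the whole body.
print(shortcut_saving_kb(body_kb=100, attachments_kb=10, keep_body=True, overhead_kb=5))

# Same email, but without keeping the body in the shortcut.
print(shortcut_saving_kb(body_kb=100, attachments_kb=10, keep_body=False, overhead_kb=5))
```

The point is just that with keep_body turned on, the saving can never be more than the attachment size, however big the message is.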
And, related to the points I made above, you have to watch out for white space. When you archive small emails, you may turn a 4K email into a 3K shortcut. The problem is the way that Exchange allocates space: although the shortcut appears to be smaller, it may still occupy a 4K chunk, so it's 3K of email and another 1K left as white space. That is why you need to do offline defrags of the Exchange database every once in a while.
Once you start archiving more and more people, you roll in shortcut expiry so that you only keep newer archived shortcuts, and have people use Archive Explorer or Virtual Vault for legacy email. That alone lets you keep the Exchange stores smaller, and hopefully OSIS is doing its job and everything rolls along.