
Question on Compression

vivavilla
Level 3

Hi all, hoping somebody might be able to shed some light here.

I'm basically wondering how the compression works on Enterprise Vault. I'm looking to use this solution to 1) save space on the Exchange server and 2) take away the PST nightmare. However, on a trial run I've noticed that the space saved in mailboxes is more than made up for by the space taken by EV storage.

As an example, I archived a portion of one of our development mailboxes. While I managed to shave 112MB from the total mailbox size, the drive that EVStorage is located on expanded by 276MB.

Obviously, as this is only a trial run, I am going with mostly default settings, so I have no idea whether I'm getting the best out of this setup. However, I would like to get to the bottom of how this works, as so far this isn't going to get to the purchase stage: I won't be saving any disk space, in fact I'll be needing more!

Thanks in advance.


JesusWept3
Level 6
Partner Accredited Certified

Enterprise Vault compresses using zlib, which does offer pretty decent compression. The real space savings come through OSIS though, where attachments are shared out, which typically saves quite a lot of space.
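As a rough illustration of what zlib does to mail-sized data, here is a minimal sketch using Python's standard `zlib` module. It is not Enterprise Vault's code: the compression level, the sample text and the `compression_ratio` helper are all assumptions made for the example.

```python
import zlib


def compression_ratio(data: bytes, level: int = 6) -> float:
    """Return compressed size divided by original size (below 1.0 means smaller)."""
    compressed = zlib.compress(data, level)
    # Round-trip check: zlib is lossless, so decompressing must give the input back.
    assert zlib.decompress(compressed) == data
    return len(compressed) / len(data)


# Hypothetical sample data: a repetitive text body and a tiny "test" message.
repetitive_body = ("Hi team, please find the weekly status update below.\n" * 200).encode("utf-8")
tiny_body = b"test"

print(f"repetitive text body: ratio {compression_ratio(repetitive_body):.2f}")
print(f"tiny body:            ratio {compression_ratio(tiny_body):.2f}")
```

Repetitive text shrinks a lot, while a few bytes of content come out larger than they went in because of the fixed zlib header and checksum. That is one reason very small items gain little from compression alone.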

The real issue, though, is if you are mostly archiving lots and lots of small, mostly-text email. None of this will be shared out; the items will be 8k or 10k in size, which will itself be compressed, but Enterprise Vault will also make an HTML or text rendering for display in things like Search.asp or Archive Explorer, as well as to make the contents easy to add to the index.

Also, depending on the version of Exchange you have, Exchange does some form of SIS itself (except for Exchange 2010, where SIS is not present at all for performance reasons). You may also need to do offline maintenance on your Exchange mailbox stores, as a lot of space may be taken up by whitespace, so you may have shaved off more than 112MB; it's just that that's all the space that could be reclaimed in an online maintenance plan.

https://www.linkedin.com/in/alex-allen-turl-07370146

vivavilla
Level 3

This isn't an issue with Exchange storage space, this is an issue with Enterprise Vault storage space. I left my development mailbox to finish archiving over the weekend, and was a little alarmed when I checked the numbers this morning:

Mailbox space saved: 368MB

EVstorage space taken: 1,120MB

How can I make this solution viable? It just seems like it's giving with one hand but taking a whole lot more with another. Are you saying I should archive JUST mail with attachments? Or mail over a certain size?

Please shed some more light on this, as unless I can find a way to dramatically change the ratio of space saved to space required, I can no longer pursue Enterprise Vault as a viable option.

Maverik
Level 6

How are you gathering the above figures?

vivavilla
Level 3

As you would imagine - by checking the properties of the EVStorage folder and the size on disk. Same for the mailbox, checking the total size of the folder.

Windows compression has helped massively, it's almost halved the disk space taken by the EVStorage folder, but it's still taking up more space than has been saved on the mailbox.

JesusWept3
Level 6
Partner Accredited Certified

Are you doing an online defrag?
Looking at the physical size of an Exchange database to pinpoint where your space savings come from is wholly inaccurate. Like I said before, it's more likely than not an issue with reclaiming space in Exchange, for which you will *really* need to do an *offline* defrag of Exchange.

https://www.linkedin.com/in/alex-allen-turl-07370146

vivavilla
Level 3

I'm not checking the size of the Exchange database, I'm checking within Outlook itself.

For example, I archive an email and it reduces from 1MB to 6KB - thus, my overall mailbox size has reduced from 20MB to 19MB.

This is what produced the numbers I have noted above.

JesusWept3
Level 6
Partner Accredited Certified

OK, so quick question: you've only enabled your own mailbox for archiving thus far, right?
What happens if you export your archive through the Vault Admin Console to a PST file?
What size does the PST file become at that point?

Again though, I think it's a bit difficult to judge with a single mailbox, as you will have little to no sharing being done for the one mailbox. The real space savings come in when lots of people share the same large items, which is quite often the case.

https://www.linkedin.com/in/alex-allen-turl-07370146

vivavilla
Level 3

Ok, this is beginning to make more sense now. I will continue testing with more mailboxes to see if I can gain more of a benefit that way.

Quick question - why does the EV storage contain so many files? I've begun to archive my own mailbox now, and although I've only archived a dozen or so emails so far, there are over 200 items stored on the server?

JesusWept3
Level 6
Partner Accredited Certified

OK, so the EV storage in EV8 and beyond is made up of three parts for large emails.

You have
1. The DVS File
2. The DVSSP File
3. The DVSCC File

The DVS file is typically the main part of the message: the body, senders, recipients, dates sent, received and modified, message class, who archived the email, when it was archived, what folder it was in, what mailbox, etc. All the main properties of the email.

The DVSSP file is usually the attachments, so it's just a compressed form of whatever attachment was there.

The DVSCC file is the Converted Content of the attachment. When the item is archived, EV will store the attachment as a DVSSP file and then convert the attachment (if it is a valid file that is available for conversion) into either HTML or text (not all attachments can be converted to HTML, so you may find some that just go to raw text instead).

The DVSCC file is what is passed to the indexing engine so that it can add elements (keywords and phrases) to the index so the item can be searched correctly. The DVSCC is also typically what is displayed when you preview a message. For instance, if you go to the search.asp page and click the link to a message to preview it, you see an HTML rendering of the email or the attachment; this is what is in the DVSCC.

So let's say you send an email to 100 people with the same PDF file that's 2MB.
In Exchange 2010, for instance, which has no SIS, you are most likely going to have 200MB of email there overall from this one email.
EV then goes ahead and archives it and turns each copy into a 4k shortcut, so now you have 400K of shortcuts overall in people's mailboxes.

On the EV side of things, you will have

100 x DVS files (each person who archived that email will have their own DVS, as it will contain all their user information)
1 x DVSSP file (this is the PDF file that was sent to all of them; because it's SIS'd we only need to store the attachment once)
1 x DVSCC file (this is the PDF in HTML format, minus elements such as embedded images that cannot be converted)

So you may end up with

100 x 8KB DVS files (800KB)
1 x DVSSP (let's say 2MB, i.e. 2,048KB, and it's not compressed, just to make it easier)
1 x DVSCC (80KB, the rest may just be images)

So now with EV you have 2,928KB on disk
and on Exchange you have 400KB of shortcuts; altogether that's 3,328KB

From the 200MB that you originally had in Exchange, it's now a lot, lot smaller.
(Note: small email will not be single instanced, as it would take more room to create these separate parts than you would theoretically save. So a small email, say 20KB in size, will have all these elements in one single DVS file; only large email will have the DVS/DVSSP/DVSCC files.)
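The arithmetic in the example above can be sketched in a few lines of Python. This is only a back-of-the-envelope model using the round numbers from this post (8KB DVS, 80KB DVSCC, 4KB shortcut, uncompressed 2MB attachment); the function name and parameters are made up for illustration and are not anything EV exposes.

```python
KB_PER_MB = 1024


def single_instance_footprint(recipients=100, attachment_mb=2, dvs_kb=8,
                              dvscc_kb=80, shortcut_kb=4):
    """Estimate storage (in KB) before and after archiving one shared email."""
    # Without SIS in Exchange, every recipient holds a full copy of the attachment.
    exchange_before_kb = recipients * attachment_mb * KB_PER_MB
    # EV keeps one DVS per recipient, but only one DVSSP and one DVSCC in total.
    ev_on_disk_kb = recipients * dvs_kb + attachment_mb * KB_PER_MB + dvscc_kb
    # Each mailbox is left with a small shortcut.
    shortcuts_kb = recipients * shortcut_kb
    return {
        "exchange_before_kb": exchange_before_kb,
        "ev_on_disk_kb": ev_on_disk_kb,
        "shortcuts_kb": shortcuts_kb,
        "total_after_kb": ev_on_disk_kb + shortcuts_kb,
    }


print(single_instance_footprint())              # 100 recipients sharing one PDF
print(single_instance_footprint(recipients=1))  # a single mailbox, nothing to share
```

With 100 recipients the total drops from 204,800KB to 3,328KB. With a single recipient the same model gives 2,140KB after archiving against 2,048KB before, which matches the point that one test mailbox on its own shows little or no saving.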


The problem, however, can be if you have just lots of small email: depending on how you have your shortcuts set up, the email can become a lot larger.

For instance, let's say you set the custom properties to add a link to the archived email (the banner) and to keep the entire message text.
If you send just a small email called "test" and nothing else, the HTML added for the link may make that email go from 2k unarchived to 3k once it is turned into a shortcut.

Also, if you choose to keep the entire message body you will save very little space. You may have 100KB of text with a 10k attachment and find your space saving is only 5k or so, because you're leaving behind the whole message body, which will still be 100KB.

And to the points I made above, you have to watch out for the white space. When you archive small emails, you may turn a 4K email into a 3K shortcut. The problem is the way that Exchange allocates space: although the shortcut appears to be smaller, it may still take up the next larger chunk allocated, so it's 3K of email and another 1K set to white space, which is why you need to do offline defrags of the Exchange database every once in a while.
Once you start archiving more and more people, you roll in shortcut expiry so that you only keep newer archived shortcuts and have people use Archive Explorer or Virtual Vault for legacy email. Then you can keep the Exchange stores smaller from that very fact, and hopefully OSIS is doing its job and everything rolls along.
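To picture the white space point, here is a small sketch of rounding an item up to a fixed allocation unit. The 4KB unit is purely a hypothetical number chosen to match the 4K/3K example above, not Exchange's actual page size, and the helper names are made up.

```python
import math


def allocated_kb(item_kb, unit_kb=4):
    """Space actually reserved when items are stored in whole allocation units."""
    return math.ceil(item_kb / unit_kb) * unit_kb


def whitespace_kb(item_kb, unit_kb=4):
    """Reserved space that the item itself does not use."""
    return allocated_kb(item_kb, unit_kb) - item_kb


for size_kb in (4, 3):
    print(f"{size_kb}KB item: allocated {allocated_kb(size_kb)}KB, "
          f"whitespace {whitespace_kb(size_kb)}KB")
```

Under that assumption the 3KB shortcut still occupies a full 4KB unit, so the database does not shrink even though the item did.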

https://www.linkedin.com/in/alex-allen-turl-07370146