PST Migrations

Donnal_Spence
Level 4
When you import a PST file to a user's vault, is it stored as a single instance or are the attachments in the messages all stored separately?
22 REPLIES 22

David_Messeng1
Level 6
Attachments are single-instanced across a store partition (unless you use the HSM element, in which case only within a .CAB within a partition). One of the main reasons for using EV is its ability to identify messages that are the same across multiple PST files or Exchange databases and re-single-instance them. Lots of other products can't (or couldn't, anyway) do this.

Donnal_Spence
Level 4
If I use HSM (Enterprise Vault) to do collections, then it does not single-instance the attachments?

David_Messeng1
Level 6
My understanding is that it only single-instances within the .CAB file. The "sweet spot" for .CAB collections is many thousands of files, so you will still have some single instancing -- and (of course) they are doubly compressed.

You need to test to figure out which is best for you. Remember .CABs are read only and to return a single file EV has to open the whole .CAB, decompress and cache it and then extract the .DVS you want. Tread carefully with .CABbing.

You can almost certainly work out the SI ratio from SQL.

Tremaine
Level 6
Employee Certified
Hiya David

You got it the wrong way round there, mate. Single instancing is done outside of the CAB and within the saveset before it is collected into a CAB file. Once in the CAB, any newly archived item that may match one already in a CAB will create a new DVS file.

The idea of the CAB file is only to provide better performance for folder-level backups, so that there are fewer files that need to be read by the backup product. Ideally you only want to implement a policy that collects files older than 3+ months. This should normally allow for any shared (SIS) messages to be caught and saved appropriately before being collected.

There is no additional compression taking place within the CAB file. The Saveset is already optimally compressed.

The performance overhead of extracting the saveset from the CAB is negligible.
However, having said that, if you are using offline vault or rebuilding an index... well, you can imagine (as we found out the hard way by filling up a closed partition with temp extracted savesets; we kinda changed our strategy after that to block-level backups and no collections).

David_Messeng1
Level 6
Kinda what I said (although you are right - it's really badly explained!)

Compression is totally the wrong word for .CABs. Apologies. The space gain is getting back the wasted NTFS blocks and that's only a theory (although we'd anticipate 25% space reclaimed).
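To put a rough number on the "wasted NTFS blocks" idea, here is a minimal sketch of the arithmetic. It assumes a 4096-byte cluster size and made-up file sizes, and it ignores CAB header overhead and the fact that NTFS can keep very small files inside the MFT, so treat it as an estimate of the theory rather than a measurement. The function names are hypothetical.

```python
def allocated_size(file_size: int, cluster_size: int = 4096) -> int:
    """Bytes occupied on disk when space is handed out in whole clusters."""
    clusters = -(-file_size // cluster_size)  # integer ceiling division
    return clusters * cluster_size


def slack_saving(file_sizes, cluster_size: int = 4096):
    """Compare many small files stored separately against one combined file.

    Returns (bytes_if_separate, bytes_if_combined, fraction_saved).
    Assumption: the combined file is exactly the sum of the inputs,
    i.e. no container overhead and no extra compression.
    """
    separate = sum(allocated_size(s, cluster_size) for s in file_sizes)
    combined = allocated_size(sum(file_sizes), cluster_size)
    saved = 1 - combined / separate if separate else 0.0
    return separate, combined, saved


# Three hypothetical saveset sizes in bytes
separate, combined, saved = slack_saving([1000, 5000, 9000])
print(separate, combined, round(saved, 3))
```

The smaller the average file is relative to the cluster size, the larger the saving, so the real figure depends entirely on the size distribution of the .DVS files in the partition.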

__Drache__
Level 3
Just to address the original question, David, I don't believe that single-instancing quite works the way you mentioned.

Single-instancing happens on the message level, but not the attachment level. So yes, whether it's a PST or mailbox, only one instance of a message is ever saved across a Vault Store.

However, if the same attachment is placed within two different messages and archived into one Vault Store, then two instances of that attachment are saved.

If you're using a Centera, then of course the story is different since it does its own single-instancing and does it down to the attachment level. Last I heard, however, this was the only way to get that sort of SIS.

David_Messeng1
Level 6
Drache,

Try reading it again and you'll see that is pretty much exactly what I've said.

But if we want to be pedantic, I think you are wrong on 2 points:

- You get SIS across Partitions not Stores.
- You get SIS on attachments not messages.

Either way, the consensus is you get strong SIS within an EV partition, which could well join back up a lot of split SIS from multiple .pst files and Exchange databases. Let us all rejoice in this.



David

__Drache__
Level 3
David,

Perhaps we should find some sources for clarification then, since I'm fairly sure of my statements.

It has always been my understanding that, in Enterprise Vault, SIS is handled within each Vault Store database. Since the Vault Store database handles all partitions within the Vault Store, it doesn't make sense that SIS would break down at the Partition level.

Furthermore, messages are saved within a Vault Store partition as entire messages. Since there is no breakdown of the message in a standard NTFS Vault Store such as what you see in a Centera, there is no way that Enterprise Vault can single-instance attachments within said files.

Is there any material you can refer to that shows otherwise? I know that the training material up to V5, at least, has supported my point.

Cheers,
D

David_Messeng1
Level 6
I'll take a look next time I have 10 mins. Not something I've ever really bothered about as it'll be a long time before we close any partitions.

You agree about the NTFS blocks then?

David_Messeng1
Level 6
By the way, I assume the suggestion that we look for documentation was a joke?

Tremaine
Level 6
Employee Certified
On the SIS side of things again. SIS is only at the message level for NTFS partitions, and then also only within the current open partition for that vault store.

If the partition is closed, then no new per-user information is added to the saveset in the closed partition. A new saveset is created in the open partition. The reasoning behind this is that a closed partition is static and only available for reads/deletes.
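The behaviour described above can be sketched as a toy model. This is not how Enterprise Vault is implemented; the class, the method names and the use of a "fingerprint" string to stand for a message's identity are all assumptions made for illustration. It only shows the rule: share within the open partition, create a fresh saveset once the earlier partition is closed.

```python
class ToyVaultStore:
    """Toy model: message-level SIS that applies only to the open partition."""

    def __init__(self):
        # Each partition maps a message fingerprint to the set of users
        # sharing that saveset. The last partition in the list is the open one.
        self.partitions = [{}]

    def close_partition(self):
        """Freeze the current partition and open a new, empty one."""
        self.partitions.append({})

    def archive(self, fingerprint: str, user: str) -> bool:
        """Archive a message for a user.

        Returns True if a new saveset had to be created, False if the user
        was simply added to an existing saveset in the open partition.
        Closed partitions are never consulted or modified.
        """
        open_partition = self.partitions[-1]
        created = fingerprint not in open_partition
        open_partition.setdefault(fingerprint, set()).add(user)
        return created

    def saveset_count(self) -> int:
        return sum(len(p) for p in self.partitions)


store = ToyVaultStore()
store.archive("msg-1", "alice")   # new saveset
store.archive("msg-1", "bob")     # shared, same open partition
store.close_partition()
store.archive("msg-1", "carol")   # new saveset, old partition is closed
print(store.saveset_count())
```

In this model the same message ends up stored twice, once per partition, which is the loss of sharing being discussed.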

__Drache__
Level 3
David,

With regards to the NTFS blocks, your theory seems to be a good one. I'd really be interested to see if you actually recover any space using collections. I haven't heard of any companies over in the States recovering space with collections, but I'm not sure if anyone has been looking at it that closely.

The developers were always telling us that, theoretically, the overhead of making the CAB files would actually increase the size, even if it was by the slightest amount, so we should never expect any recovery of space. If someone could prove otherwise, I think there'd be much rejoicing!

With regards to the documentation, I wasn't trying to be funny, no. There is documentation out there; it's just a matter of figuring out where it is and how to get it. Granted, KVS never made it easy, but hopefully Symantec will be changing that soon.

D

__Drache__
Level 3
> On the SIS side of things again. SIS is only at the
> message level for NTFS partitions, and then also only
> within the current open partition for that vault
> store.

*edit*

I did a little checking, and yeah, I was wrong. The SIS is definitely based on the open partition. Quite interesting that the saveset is modified that way after it's been archived; I was under the impression that, once an item was archived, it was left unchanged.

A shame that they can't extend that ability to changing the retention category within the saveset, eh?

D

Message was edited by: Drache

David_Messeng1
Level 6
Good stuff, but I'm disappointed about the CABs... I'll take a peek sometime (just blown away all my test data, so it'll be a wee while to get enough back to make it viable, sigh). I'm guessing that the NBU integration in v6 must use some sort of compilation technology though? NBU hates lots of little files, doesn't it?

Anyone ever tried to work out the SIS ratio in EV using a SQL query?
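For what it's worth, the ratio itself is simple once you can pull out one row per (owner, saveset) reference. The sketch below does the arithmetic in Python on made-up data; it does not use the real EV schema, and the row layout and function name are assumptions. The equivalent SQL would be a count of references divided by a count of distinct savesets.

```python
from collections import Counter


def sis_ratio(references):
    """Average number of references per stored saveset.

    `references` is an iterable of (owner, saveset_id) pairs, one per
    archived item as a user sees it. A ratio of 1.0 means no sharing.
    """
    per_saveset = Counter(saveset_id for _owner, saveset_id in references)
    if not per_saveset:
        return 0.0
    return sum(per_saveset.values()) / len(per_saveset)


# Hypothetical rows: three users share one saveset, one item is unshared
rows = [("alice", "s1"), ("bob", "s1"), ("carol", "s1"), ("alice", "s2")]
print(sis_ratio(rows))
```

A count-based ratio like this says nothing about bytes saved; weighting each saveset by its size would be needed for a storage figure.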

Michael_Bilsbor
Level 6
Accredited
Doesn't EV store per-user info in the files? So I'm guessing it will need to update the DVS file to add this. That would make sense as well with it dealing with open partitions only, since you wouldn't want it updating a closed partition because by definition you don't want it to change.

David_Messeng1
Level 6
I thought there was a link - saveset to user (for ACLs if nothing else). Also I'd assumed SIS passed thru Retention Categories (i.e. same .DVS could have > 1 RC). But these are assumptions only. Another one on the big list of things to look at...

Michael_Bilsbor
Level 6
Accredited
Yes, the same item can have multiple retention categories. Take a look at the saveset table for proof of that.

TomerG
Level 6
Partner Employee Accredited Certified
I believe the NBU integration has to do with using Storage Migrator so that NBU migrates data (old collected data, aka those CAB files) to 3rd tier - usually tape - storage. I don't believe that NBU integration has anything to do with collection, which still has only two options:

1. None: no collection, which means lots of small files, and which NBU - and other backup software - definitely wouldn't like as far as backup times

2. Enterprise Vault: this is where EV collects the files into .CABs as y'all been talking about, and is much preferable as far as any backup software is concerned (unless you're doing raw partition backups, which have other problems associated with them; unless using something like NetBackup's FlashBackup so you get raw partition backup speeds with single file restore capability)

Leon_Funnell
Level 2
We have a situation where we have a lot of e-mail already in our Vault - about 5 years' worth. This is mostly Journal mail, and a few enabled mailboxes. We now need to enable about 300 users, whose mail already exists as journaled items in the Vault in CAB files. This is almost definitely going to double the size of the data on the vault, which currently stands at 700GB. The options we have are:

1. Export all existing mail out and into a new server, enable all the mailboxes and let all the mail archive before running a collection. This is a huge job, and in the time before the collection runs our backup system will really struggle to back up millions of DVS files.

2. Find some way to get SIS back post migration - is there some utility that can recover SIS? This would be a great feature.

3. Export the mail out of the Vault and into a product that does true SIS. There are products that store the message body in SQL and the attachments on a file store - the one I was looking at actually creates a hash of each attachment, so even if someone forwards an existing file to someone else, only one copy is created. This is least favoured, but could have long-term benefits in terms of storage.
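The hash-per-attachment idea in option 3 can be sketched roughly as follows. This is not any particular product's design; the class and method names are hypothetical, SHA-256 is an assumed choice of hash, and a dict stands in for the file store.

```python
import hashlib


class AttachmentStore:
    """Content-addressed store: identical attachments are kept only once."""

    def __init__(self):
        self._blobs = {}  # hex digest -> attachment bytes

    def put(self, data: bytes) -> str:
        """Store an attachment and return the key a message would reference."""
        digest = hashlib.sha256(data).hexdigest()
        # Only the first copy is stored; later identical copies reuse it.
        self._blobs.setdefault(digest, data)
        return digest

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]

    def stored_bytes(self) -> int:
        return sum(len(blob) for blob in self._blobs.values())


store = AttachmentStore()
report = b"x" * 1_000_000            # stand-in for a large attachment
key_original = store.put(report)     # original message
key_forwarded = store.put(report)    # same file forwarded to someone else
print(key_original == key_forwarded, store.stored_bytes())
```

With 30-40MB attachments being forwarded around, the saving from this kind of scheme scales with how often the same file recurs, which is something you would want to measure before committing to a migration.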


One more thing - due to previous bad practices, we have mail items with HUGE attachments - sometimes 30-40MB each. This is why the existing Vault is so huge, and why it will grow so much. We know this is bad - but we have to live with it...

Any comments on the above?