JesusWept3
Level 6
Partner Accredited Certified

One feature of Enterprise Vault is Collections: Enterprise Vault gathers multiple archived items into a single Collection, which is a Microsoft Cabinet (CAB) file. The main reason for doing this is to reduce backup times.

For instance, 1 x 10MB CAB file will back up faster than 1,000 x 10KB files holding the same 10MB of data. Note, however, that the CAB files are not compressed, so extracting a 10MB CAB file gives you 10MB of DVS files. This is because DVS files are already highly compressed, and compressing data that is already compressed gains nothing and can even produce a bigger file.
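To see why re-compressing already-compressed data does not help, here is a small sketch. Python has no built-in CAB writer, so it uses the standard `zlib` module (DEFLATE) as a stand-in compressor and random bytes as a stand-in for already-compressed DVS content. Both are assumptions for illustration only, not how Enterprise Vault builds its CAB files.

```python
import os
import zlib

def recompress_overhead(payload: bytes) -> int:
    """Bytes added (positive) or saved (negative) by compressing payload with DEFLATE."""
    return len(zlib.compress(payload, 9)) - len(payload)

# Random bytes behave like already-compressed data: no redundancy is left to remove.
already_compressed = os.urandom(10 * 1024)

# Repetitive plain text is highly redundant, so it shrinks a lot.
plain_text = b"Enterprise Vault " * 600

print(recompress_overhead(already_compressed))  # positive: the output grew
print(recompress_overhead(plain_text))          # negative: the output shrank
```

The random input always comes out slightly larger because DEFLATE falls back to storing the data and still adds its own headers and checksum.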

Enterprise Vault Collections are configured on the Collections tab, where you can configure when the collections run, how big the CAB files can be, and how old items have to be before they can be collected into a CAB file.

Note that once you enable Collections, you cannot disable them. The best you can do is either set the minimum age of items to collect so high that nothing is ever collected, or limit the time the collections process can run (e.g. set the start AND end time to 11:00AM).
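The second workaround relies on the run window having zero length. A minimal sketch of how a daily start/end window translates into run time, assuming (as the article describes) that start == end means 0 seconds, and additionally assuming that an end time earlier than the start time means the window crosses midnight. The function name is hypothetical; this is not Enterprise Vault's scheduler code.

```python
from datetime import date, datetime, time, timedelta

def window_seconds(start: time, end: time) -> int:
    """Length of a daily run window in seconds (hypothetical helper).

    Assumptions: start == end is a zero-length window, and end < start
    means the window runs past midnight into the next day.
    """
    day = date(2000, 1, 1)  # arbitrary date; only the times matter
    s = datetime.combine(day, start)
    e = datetime.combine(day, end)
    if e < s:
        e += timedelta(days=1)  # window crosses midnight
    return int((e - s).total_seconds())

print(window_seconds(time(11, 0), time(11, 0)))  # 0
print(window_seconds(time(20, 0), time(6, 0)))   # 36000
```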

A word of caution on the second method, though: when an item is retrieved from a CAB file, it is put back in its original location as an ARCHDVS file (or ARCHDVSSP, ARCHDVSCC etc. on EV8), and those files are not automatically deleted after the user has finished reading the email.

Instead, it is the Collections process itself that goes back and deletes the ARCHDVSxx files after a certain period of time. If the collections window is too short, or has 0 seconds to run, then the ARCHDVS files cannot be cleaned up and you will end up duplicating space unnecessarily.
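The cleanup dependency can be sketched as a filter over retrieved files. This is a hypothetical model, not EV's code: it assumes the ARCHDVS variants appear as the file extension, and it takes the 24-hours-since-last-access rule from a reader comment on this article. The key point is that with no run time, nothing is ever removed.

```python
from datetime import datetime, timedelta

# Assumption: retention taken from a reader comment; EV's real rule may differ.
CLEANUP_AGE = timedelta(hours=24)

def archdvs_due_for_cleanup(files, now, window_seconds):
    """Return names of ARCHDVS* files a collections run would remove (hypothetical).

    files: list of (file_name, last_accessed) tuples.
    window_seconds: how long the collections run is allowed to run.
    """
    if window_seconds <= 0:
        # The process never gets time to run, so retrieved copies pile up.
        return []
    due = []
    for name, last_accessed in files:
        extension = name.rpartition(".")[2].upper()  # e.g. ARCHDVS, ARCHDVSSP
        if extension.startswith("ARCHDVS") and now - last_accessed >= CLEANUP_AGE:
            due.append(name)
    return due

now = datetime(2010, 1, 30, 12, 0)
files = [
    ("A07465CEEC2320A040210B08E3549781.ARCHDVS", now - timedelta(hours=30)),
    ("107DC3824ADB33CDABCE5C15B7B46BD1.ARCHDVSSP", now - timedelta(hours=2)),
]
print(archdvs_due_for_cleanup(files, now, 3600))  # only the 30-hour-old file
print(archdvs_due_for_cleanup(files, now, 0))     # [] : zero-length window
```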


Where are CAB Files stored?
Collections themselves are stored in different places dependent on your version of Enterprise Vault.

Enterprise Vault 2007 and below:
The following folder structure is used to store DVS files, and the Collections are placed in the “Day” folder.

Files are stored in a yyyy\mm\dd\hh format, for example:
E:\Enterprise Vault Stores\Journal Vault\ptn1\2010\01\30\17\<saveset>.dvs

The above represents an item archived at 5pm on 30th January 2010.
The CAB files are stored in the \dd\ (day) folder, so the path may look like:
E:\Enterprise Vault Stores\Journal Vault\ptn1\2010\01\30\Collection12345.cab
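The EV2007 layout above can be sketched as two path builders. The function names are hypothetical and the partition root is simply the one from the example; this only reproduces the folder pattern described here, not any Enterprise Vault API.

```python
from datetime import datetime
from pathlib import PureWindowsPath

def ev2007_dvs_path(partition_root: str, archived: datetime, saveset: str) -> PureWindowsPath:
    """Hypothetical helper: partition root, then yyyy, mm, dd, hh, then the .dvs file."""
    return PureWindowsPath(
        partition_root,
        f"{archived:%Y}", f"{archived:%m}", f"{archived:%d}", f"{archived:%H}",
        f"{saveset}.dvs",
    )

def ev2007_collection_folder(partition_root: str, archived: datetime) -> PureWindowsPath:
    """Hypothetical helper: CAB files sit in the day (dd) folder, above the hour folders."""
    return PureWindowsPath(partition_root, f"{archived:%Y}", f"{archived:%m}", f"{archived:%d}")

root = r"E:\Enterprise Vault Stores\Journal Vault\ptn1"
when = datetime(2010, 1, 30, 17)
print(ev2007_dvs_path(root, when, "<saveset>"))
print(ev2007_collection_folder(root, when) / "Collection12345.cab")
```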

In Enterprise Vault 8, however, the locations use a slightly different format:
items are stored under \yyyy\mm-dd\LETTER

Example:
E:\Enterprise Vault Stores\Journal Vault\2010\01-11\A\074\<saveset>.dvs

The above indicates that the item was archived on 11th January 2010.
However, rather than using an additional hour folder as EV2007 did, EV8 uses parts of the DVS file name.

In this example the file is called A07465CEEC2320A040210B08E3549781.DVS. The name is based on the Transaction ID assigned to the item: EV takes the first character of the transaction ID (A) as one folder, then creates a subfolder named with the next three characters of the transaction ID (074).

As another example, an item called 107DC3824ADB33CDABCE5C15B7B46BD1.DVS that was archived on 11th January 2010 would be located in:
E:\Enterprise Vault Stores\Journal Vault\2010\01-11\1\07D

In Enterprise Vault 8, collection files are stored in the folder named after the first character of the transaction ID.
For instance, the collection file may be stored here:
E:\Enterprise Vault Stores\Journal Vault\2010\01-11\1\Collection12345.cab
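The EV8 rules described above (date folders, then the first character of the transaction ID, then its next three characters, with CAB files one level up) can be sketched as follows. The helper names are hypothetical, and the store root is taken as-is from the examples in this article.

```python
from datetime import date
from pathlib import PureWindowsPath

def ev8_dvs_folder(store_root: str, archived: date, transaction_id: str) -> PureWindowsPath:
    """Hypothetical helper: yyyy, then mm-dd, then first character, then next three."""
    if len(transaction_id) < 4:
        raise ValueError("transaction ID is too short")
    return PureWindowsPath(
        store_root,
        f"{archived:%Y}",
        f"{archived:%m-%d}",
        transaction_id[0],    # e.g. "A" or "1"
        transaction_id[1:4],  # e.g. "074" or "07D"
    )

def ev8_collection_folder(store_root: str, archived: date, transaction_id: str) -> PureWindowsPath:
    """Hypothetical helper: collection (CAB) files sit in the first-character folder."""
    return ev8_dvs_folder(store_root, archived, transaction_id).parent

root = r"E:\Enterprise Vault Stores\Journal Vault"
print(ev8_dvs_folder(root, date(2010, 1, 11), "A07465CEEC2320A040210B08E3549781"))
print(ev8_collection_folder(root, date(2010, 1, 11), "107DC3824ADB33CDABCE5C15B7B46BD1"))
```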

What happens when I delete an item or run storage expiry?
When items are added to a CAB file, they remain there until a process called Sparse Collections runs. This process extracts the still-valid savesets and then deletes the CAB file; those savesets are re-collected at a later date.

When an item is deleted, Enterprise Vault cannot remove that single item from within a CAB file (as with ZIP or RAR archives, removing one entry would mean rewriting the whole archive). You can therefore end up with items that have been deleted from the databases and indexes but still remain in the CAB files.

Enterprise Vault therefore looks up all the items in a CAB file and determines which ones are still valid. If only a small percentage of the items still exist, EV extracts those items and deletes the CAB file.

So how does Enterprise Vault know which cab files to check?
When a collection file is created, two SQL columns are populated in the Collections table:
one called RefCount and one called TotalCount.

When a Collection is first created, EV counts how many items are stored in it and sets RefCount and TotalCount to that number, so if 100 items are stored, both RefCount and TotalCount are set to 100.

Then, whenever an item in that collection is deleted or expired, RefCount is reduced by one, while TotalCount stays the same.

So if 50 items belonging to that CAB file are deleted, RefCount drops to 50 and TotalCount remains at 100. When RefCount reaches 0, none of the DVS files within that CAB file exist in the database or the indexes any more, so the CAB file and all its contents can be deleted.
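The RefCount/TotalCount bookkeeping described above can be modelled in a few lines. This is a hypothetical in-memory mirror of one Collections row, written only to illustrate the counting rules; it is not the actual table schema or EV code.

```python
from dataclasses import dataclass

@dataclass
class CollectionRecord:
    """Hypothetical stand-in for one row of the Collections table."""
    total_count: int  # mirrors TotalCount
    ref_count: int    # mirrors RefCount

    @classmethod
    def create(cls, items_stored: int) -> "CollectionRecord":
        # On creation both counters start at the number of items in the CAB.
        return cls(total_count=items_stored, ref_count=items_stored)

    def item_removed(self) -> None:
        # Deletion or expiry lowers RefCount only; TotalCount never changes.
        if self.ref_count == 0:
            raise ValueError("RefCount is already 0")
        self.ref_count -= 1

    def cab_can_be_deleted(self) -> bool:
        # RefCount 0: no item in the CAB is still in the database or indexes.
        return self.ref_count == 0

rec = CollectionRecord.create(100)
for _ in range(50):
    rec.item_removed()
print(rec.ref_count, rec.total_count, rec.cab_can_be_deleted())  # 50 100 False
```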

But what happens if you have a RefCount of 1 and a TotalCount of 100? That one item still exists in EV and is stopping the other 99 items from being removed from disk to free up storage. This is where the Sparse Collections process comes in.

The remaining valid items are extracted to their original location, RefCount is set to 0, and then EV deletes the CAB file. By default, Enterprise Vault starts sparse collection when RefCount falls to 15% of TotalCount or lower.

So if you have 100 items stored in a CAB, as soon as RefCount reaches 15 or lower, EV extracts the remaining items and deletes the CAB file. If you ever run storage expiry, make sure you run the collections process afterwards so that you can reclaim disk space immediately.
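The decision described in the last three paragraphs can be sketched as a small function. The 15% default and the "15 or lower" boundary come from the article; the function name and the three labels are hypothetical, and EV's internal logic may differ in detail.

```python
def collection_action(ref_count: int, total_count: int, threshold_pct: int = 15) -> str:
    """Decide what happens to one CAB file (hypothetical sketch).

    "delete": nothing valid remains, so the CAB file can simply be removed.
    "sparse": few enough valid items remain, so extract them, then delete the CAB.
    "keep":   too many valid items remain, so leave the CAB alone.
    """
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    if ref_count == 0:
        return "delete"
    # Integer form of ref_count / total_count <= threshold_pct / 100,
    # which avoids floating-point rounding at the boundary.
    if ref_count * 100 <= threshold_pct * total_count:
        return "sparse"
    return "keep"

for rc in (100, 16, 15, 1, 0):
    print(rc, collection_action(rc, 100))
```

With 100 items, 16 valid items keeps the CAB, while 15 or fewer triggers sparse collection, matching the example above.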

Comments
Bruce_Crankshaw
Level 6
Partner

Interesting read, explains the process nicely, thx :)

chhabrak
Not applicable
Very informative. Thanks
Jayasimha
Level 3

You mentioned that the directory structure where DVS files are placed reflects when the item was archived. But in my environment I can see a directory structure starting with the year 1899. Why are these created when EV did not exist at that time?
TonySterling
Moderator
Partner    VIP    Accredited Certified

Prior to EV 8, the directory structure date was based on the item itself. Rogue years like the 1899 you see come from malformed or spam mail items that carry an erroneous date/time.
Jayasimha
Level 3

Hi Tony,

In EV8, is the directory structure based on when the message is archived, or does it depend on the "original time message is created" attribute of the mail?
TonySterling
Moderator
Partner    VIP    Accredited Certified

It is based on when the message is archived.
Stumpy
Not applicable

Our CAB files, since upgrading to version 8, appear to be much smaller than before, probably 1/10th of the size, even though the CAB file size has been set to 20MB. This is having a large impact on our backup performance.

 

Is there any way of forcing a re-cab process where the smaller cabs can be extracted and re-built into larger cab files?

John_Santana
Level 6

Cool, many thanks for the great article Jesus!

AbdulKadir
Not applicable
Employee

Amazing article Jesus. When do the temporary files (ARCHDVSSP) get deleted? Is there any process to initiate it? I have loads of ARCHDVSSP temporary files eating up hard drive space.

Thank you in advance.

Dead-Data
Level 5
Partner Accredited Certified

@AbdulKadir,

The same collection process removes aged ARCH files 24 hours after they were last accessed. 

GertjanA
Moderator
Partner    VIP    Accredited Certified

Hello all, and JW3 especially,

You write in this excellent article:

Quote: But what happens if you have a refcount of 1 and a totalcount of 100? This 1 item that still exists in EV is stopping the other 99 items from being removed from disk and freeing up storage. So what happens is the Sparse collections process.

The last items are extracted to their original location, the refcount is set to 0 and then EV deletes the CAB file. By default, Enterprise Vault will initiate the sparse collections when the refcount is 15% of the the totalcount.

End quote.

Regarding the sentence "the last items are extracted to their original location...":

Do those items have a DVS or an ARCHDVS extension? I assume (as they need to be 're-cabbed') it is DVS, but I just need to be sure.

John_Santana
Level 6

OK, so if the EV vault store is converted into collections and the .CAB files are then written to tape using Symantec NetBackup, can users still retrieve the archived items after the tape is recalled?

The old data was archived while it was still on EV 9 and the new EV server running in production is EV 10.

Version history
Last update: 01-30-2010 07:50 PM