
Index sizing and actuals

Level 4

Hello all,

I am in the middle of working out the index architecture for a new EV 11 platform. This platform will replace an existing EV platform (a migration).

Looking through all the docs, incl. the estimator tools, the rule of thumb is "full index = 13% of the unarchived data size". I.e. 100 TByte of original data will result in approx 13 TByte of Index data.

I have had a look at the current environment (holding about 220 TByte and 1.3 billion items) and see approx. 13.5 TByte of index data and roughly 17 million files stored in the index. That is well below what the rule of thumb predicts (13% of 220 TByte = approx. 28.6 TByte).

I'm a bit in the dark about how indexes are stored and why the sizes I see on disk differ from what the calculator predicts...

The number of archived items is pretty much the same as the number of indexed files (minus the failed ones).
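For reference, the comparison above can be worked through in a few lines. This is only a sketch of the arithmetic using the figures quoted in this post. The function names are made up for illustration, and the per-item figure assumes decimal units (1 TByte = 10^12 bytes), which the post does not state.

```python
# Figures quoted in the post above.
DATA_TB = 220.0        # original data held by the current environment
INDEX_TB = 13.5        # index size observed on disk
ITEMS = 1.3e9          # archived items
RULE_OF_THUMB = 0.13   # "full index = 13% of the unarchived data size"


def expected_index_tb(data_tb, ratio=RULE_OF_THUMB):
    """Index size the rule of thumb predicts for a given data size."""
    return data_tb * ratio


def observed_ratio(index_tb, data_tb):
    """Index size as a fraction of the data size."""
    return index_tb / data_tb


def index_bytes_per_item(index_tb, items):
    """Average index bytes per item, ASSUMING 1 TByte = 10**12 bytes."""
    return index_tb * 10**12 / items


if __name__ == "__main__":
    print(f"rule of thumb predicts: {expected_index_tb(DATA_TB):.1f} TByte")
    print(f"observed ratio:         {observed_ratio(INDEX_TB, DATA_TB):.1%}")
    print(f"index bytes per item:   {index_bytes_per_item(INDEX_TB, ITEMS):,.0f}")
```

So the observed index is a bit over 6% of the data size, less than half of what the 13% rule suggests.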



Level 4
Partner Accredited Certified

Hi Bert,

The rule of thumb is usually pretty good when the total content sizes are smaller and you're looking at one - maybe two - vault stores.  When you're getting into a larger set of data, however, it can get a bit murky.  This is because of the differences in how vault stores and indexes are structured.

The folders in your index partitions are based on archives, whereas vault store partitions are based on the vault stores they're associated with.  So while the index volumes are only building based on the archive they're associated with, vault store partitions are building based on whether or not actual items are already in the store (assuming there's SIS going on within the store), and/or whether or not other vault stores in the group have that item (assuming SIS is going on within the group).  Different storage devices also play into this, as they may or may not be handling their own compression or Single-instancing.
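To make that difference concrete, here is a deliberately simplified toy model. It is not how Enterprise Vault implements storage or indexing. It only assumes, as described above, that identical content is stored once when single-instancing is active, while every archive indexes its own copy. The function name, the tuple layout, and the sample data are all hypothetical.

```python
from collections import defaultdict


def stored_and_indexed(items):
    """Toy model: items is an iterable of (archive, content_id, size_bytes).

    Returns (bytes stored with single-instancing, items indexed per archive).
    Identical content_ids are stored once, but counted in every archive.
    """
    stored = {}                      # content_id -> size, kept once
    indexed = defaultdict(int)       # archive -> number of indexed items
    for archive, content_id, size in items:
        stored.setdefault(content_id, size)
        indexed[archive] += 1
    return sum(stored.values()), dict(indexed)


# Hypothetical sample: one 5 MB attachment sent to three archives,
# plus one small message that exists in a single archive.
sample = [
    ("alice", "att-1", 5_000_000),
    ("bob", "att-1", 5_000_000),
    ("carol", "att-1", 5_000_000),
    ("alice", "msg-2", 20_000),
]

original_bytes = sum(size for _, _, size in sample)
stored_bytes, indexed_counts = stored_and_indexed(sample)

if __name__ == "__main__":
    print("original bytes:", original_bytes)
    print("stored bytes:  ", stored_bytes)
    print("indexed items: ", indexed_counts)
```

In this toy case the store holds about a third of the original bytes while four items are still indexed, so a ratio computed against one size can look quite different from a ratio computed against the other.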

In your case, it's possible that there's a good deal of duplication going on between vault stores.  It's also possible that the indexes are seeing a lot fewer unique characters in the content... or it's a combination of the two.

Either way, the 13% tends to be a conservative rule of thumb from my experience, as rules of thumb pertaining to sizing should be.  Even though your case is a bit more unusual than most, I'm not finding myself terribly surprised by the number.

Hope that helps!


Level 6
Partner Accredited

Yes, that's what I have noticed at several of my customers: the index is basically 10%.