Knowledge Base Article

EV, EMC Centera and Collections

I had the opportunity to test various configuration options in Symantec Enterprise Vault (EV) as well as on an EMC Centera. A customer had stored the EV archive data on NTFS devices and wanted to move the data to an EMC Centera. The sole purpose of this exercise was to verify the expected size of the data on the EMC Centera compared with NTFS.

Read the white paper from Symantec and EMC for the basics!

Note: The tests were done on different days of the week in the production environment of a medium-sized corporation. During the day, the EV server's RAM and CPU are in use for Vault Cache, Virtual Vault and manual archiving, which users do regularly and in larger quantities. The EMC Centera is also used by other applications. Keep this in mind when comparing archiving speeds: no true performance evaluation was done. As such, the information and data are provided as is, without any guarantee. Please do not make any decisions based on this information without contacting your trusted EV support partner, Symantec and/or EMC.

We enhanced the existing environment as follows:

  • Two separate Vault Store Groups
    • 1 Vault Store Group with 1 Vault Store with 1 VS partition configured with NTFS storage ⇒ this was the baseline for the data size.
    • 1 Vault Store Group with 1 Vault Store with 1 VS partition connected to EMC Centera
      • The EV archive of the test user on EMC Centera was deleted before each test
      • The EMC Centera was verified to contain no leftover data (including after running a Centera garbage collection)
  • Three PST files were each imported twice in succession

The EV partition and EMC Centera were configured like this:

  1. EV on NTFS (device does not provide data deduplication, device does not provide data compression)
  2. EV with Centera Collection enabled and EMC Centera in Performance mode
  3. EV with Centera Collection disabled and EMC Centera in Performance mode
  4. EV with Centera Collection disabled and EMC Centera in Capacity mode

Results per import step (for example, "PST 1 (2nd)" is the 2nd import of the 1st PST file):

Measurement                              | PST 1 (1st) | PST 1 (2nd) | PST 2 (1st) | PST 2 (2nd) | PST 3 (1st) | PST 3 (2nd)
PST content accumulated in GB            | 0.65        | 1.3         | 2.73        | 4.16        | 4.99        | 5.82

1st data import: Data import to NTFS
NTFS number of objects in NTFS partition | 8248        | 13059       | 35114       | 48595       | 62001       | 71206
NTFS size on disk in GB                  | 0.4         | 0.47        | 1.14        | 1.32        | 1.89        | 2.04
Duration of import in h:mm:ss            | 0:10:01     | 0:10:23     | 0:22:16     | 0:17:55     | 0:25:07     | 0:12:36
EV Total items (per VS reporting)        | 4890        | 9701        | 23184       | 36665       | 45789       | 55075
EV Total Size in GB (per VS reporting)   | 0.47        | 0.86        | 1.645       | 2.451       | 3.12        | 3.756

2nd data import: EV Centera Collection enabled, Centera in Performance mode
Centera C-Clip Count                     | 55          | 104         | 226         | 375         | 443         | 546
Centera User File Count                  | 1658        | 3249        | 6171        | 9932        | 10822       | 12517
Centera used capacity in GB              | 0.43        | 0.85        | 2.00        | 2.00        | 3.00        | 4.00
Duration of import in h:mm:ss            | 0:11:26     | 0:12:08     | 0:24:16     | 0:25:14     | 0:25:54     | 0:21:11

3rd data import: EV Centera Collection disabled, Centera in Performance mode
Centera C-Clip Count                     | 4811        | 9622        | 23103       | 36161       | 45789       | 54994
Centera User File Count                  | 3916        | 7830        | 16461       | 24848       | 30556       | 36018
Centera used capacity in GB              | 0.39        | 0.78        | 1.00        | 2.00        | 3.00        | 3.00
Duration of import in h:mm:ss            | 0:25:17     | 0:24:11     | 1:01:19     | 1:00:05     | 0:43:20     | 0:43:47

4th data import: EV Centera Collection disabled, Centera in Capacity mode
Centera C-Clip Count                     | 4811        | 9622        | 23104       | 36306       | 45790       | 54995
Centera User File Count                  | 3917        | 7833        | 16468       | 24936       | 30568       | 36034
Centera used capacity in GB              | 0.39        | 0.78        | 1.00        | 2.00        | 3.00        | 3.00
Duration of import in h:mm:ss            | 0:24:07     | 0:24:14     | 1:01:56     | 0:59:23     | 0:45:30     | 0:43:38
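The object counts show what collections do: EV bundles many archived items into a single Centera C-Clip. The Python sketch below is a rough check. It assumes that the Centera runs archived the same 55,075 items that EV reported for the NTFS baseline, and divides that count by the final C-Clip count of each Centera run. The names are illustrative only and come from neither EV nor the Centera SDK.

```python
# Sketch: how strongly EV collections aggregate items into Centera C-Clips.
# Values are the final column (all three PSTs imported twice) of the table.
# Assumption: every Centera run archived the same items as the NTFS baseline.

EV_TOTAL_ITEMS = 55075  # EV Total items (per VS reporting), NTFS baseline

# (C-Clip count, User File count) reported by the Centera per test run.
CENTERA_COUNTS = {
    "collections enabled, Performance mode": (546, 12517),
    "collections disabled, Performance mode": (54994, 36018),
    "collections disabled, Capacity mode": (54995, 36034),
}


def items_per_clip(items, clips):
    """Average number of EV items stored per Centera C-Clip."""
    return items / clips


if __name__ == "__main__":
    for run, (clips, user_files) in CENTERA_COUNTS.items():
        ratio = items_per_clip(EV_TOTAL_ITEMS, clips)
        print(f"{run}: {ratio:.1f} items per C-Clip, {user_files} user files")
```

With collections enabled, roughly 100 items end up in each C-Clip. Without collections, almost every item becomes its own C-Clip.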

 

The data set is not very large. We also found that the EMC Centera reports used capacity in MB up to 1 GB and only in whole, rounded GB values above that. I have therefore included trend lines in the charts below to show average values.
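The article does not say how the trend lines were produced. A common choice is an ordinary least-squares linear fit, like a spreadsheet's linear trendline. Assuming that method, the Python sketch below fits a line to each configuration's used storage against the accumulated PST content, using values from the table. The slope then reads as "GB stored per GB of PST content", which smooths out the whole-GB rounding of the Centera values. The names are illustrative.

```python
# Sketch: ordinary least-squares trend line through the reported capacities.
# Assumption: the article's trend lines are simple linear fits; the article
# itself does not state the method.

# Accumulated PST content in GB (x axis), taken from the table.
PST_GB = [0.65, 1.3, 2.73, 4.16, 4.99, 5.82]

# Storage used in GB per configuration (y axis), taken from the table.
USED_GB = {
    "NTFS (size on disk)": [0.4, 0.47, 1.14, 1.32, 1.89, 2.04],
    "Centera, collections enabled, Performance": [0.43, 0.85, 2.00, 2.00, 3.00, 4.00],
    "Centera, collections disabled, Performance": [0.39, 0.78, 1.00, 2.00, 3.00, 3.00],
    "Centera, collections disabled, Capacity": [0.39, 0.78, 1.00, 2.00, 3.00, 3.00],
}


def linear_fit(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    for name, ys in USED_GB.items():
        slope, intercept = linear_fit(PST_GB, ys)
        # The slope is the storage consumed per GB of imported PST content.
        print(f"{name}: {slope:.2f} GB stored per GB of PST (intercept {intercept:+.2f} GB)")
```

With these inputs, the slopes come out at roughly 0.33 for NTFS, 0.54 for Centera without collections and 0.62 for Centera with collections. This ordering is consistent with the ranking stated in the findings.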

These are the results that stand out to me:

  • Compared with the original PST content, the single-instanced data on NTFS (size on disk) is very compact. For small, unstructured data, native EV compression combined with OSIS (Optimized Single Instance Storage) is hard to beat.

Centera_1.png
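To put the single-instancing result into numbers, the Python sketch below expresses each configuration's final stored size as a percentage of the 5.82 GB of PST content. It uses the last column of the table, and the names are illustrative only. Keep in mind that Centera values above 1 GB are rounded to whole GB, so the Centera percentages are coarse.

```python
# Sketch: stored size as a percentage of the accumulated PST content,
# using the final column of the table (5.82 GB of PST content imported).
# Centera values above 1 GB are rounded to whole GB by the Centera.

PST_CONTENT_GB = 5.82

FINAL_SIZE_GB = {
    "NTFS, size on disk": 2.04,
    "EV Total Size (per VS reporting)": 3.756,
    "Centera, collections enabled, Performance mode": 4.00,
    "Centera, collections disabled, Performance mode": 3.00,
    "Centera, collections disabled, Capacity mode": 3.00,
}


def percent_of_source(stored_gb, source_gb):
    """Stored size expressed as a percentage of the original PST content."""
    return 100.0 * stored_gb / source_gb


if __name__ == "__main__":
    # List configurations from the smallest to the largest footprint.
    for name, size in sorted(FINAL_SIZE_GB.items(), key=lambda kv: kv[1]):
        print(f"{name}: {percent_of_source(size, PST_CONTENT_GB):.1f} % of the PST content")
```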

  • The archiving times for NTFS and for Centera with collections are similar, and also very good.
  • The archiving times for Centera without collections are significantly longer. This is due to the Centera checking each item individually for single instancing; without collections, nearly every item is written as its own C-Clip.
  • We switched the Centera from Performance mode to Capacity mode because, apparently, in Performance mode only objects larger than 250 KB are stored as single-instance (SIS) objects, whereas Capacity mode also single-instances smaller objects. We wanted the Centera to deduplicate objects smaller than 250 KB more efficiently as well.
  • My biggest surprise was that we saw no difference between Performance mode and Capacity mode, even though an EMC Centera garbage collection was run. The EMC consultant involved was also somewhat at a loss here.
  • 2nd data import: The Centera used capacity in GB with EV Centera Collection enabled is pretty close to the EV Total Size in GB (per VS reporting). Collecting the items saves little space. My rule of thumb: with collections enabled, the EV usage reports roughly show the capacity used on the Centera.
  • As current Centera models have a larger address range than previous versions, I challenge the EMC white paper's best practice of enabling device-level sharing on a Centera partition. This helps to reduce the Centera User File Count, but at the same time the Centera used capacity is considerably higher. In the latest Centera models, the user file count limit can be raised to 100 million, which makes it easy to archive EV data without collections, and the Centera used capacity (TB) decreases.
  • Performance mode should have archived faster than Capacity mode, because in Capacity mode the smaller files also need to be checked and placed into the Centera's single-instance store. Apparently this makes no difference, at least not in our environment.
  • Following the trend lines, archiving to NTFS yields the smallest size on disk, followed by Centera without collections, whereas EV with collections does not provide very good deduplication or size reduction after all.

Centera_2.png
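The findings above describe the import times without collections as significantly longer. Summing the six import durations per run makes that concrete. The Python sketch below converts the h:mm:ss values from the table into seconds and compares each run with the NTFS baseline. The helper names are illustrative only.

```python
# Sketch: total import time per configuration, from the h:mm:ss values in the table.

DURATIONS = {
    "NTFS": ["0:10:01", "0:10:23", "0:22:16", "0:17:55", "0:25:07", "0:12:36"],
    "Centera, collections enabled, Performance mode":
        ["0:11:26", "0:12:08", "0:24:16", "0:25:14", "0:25:54", "0:21:11"],
    "Centera, collections disabled, Performance mode":
        ["0:25:17", "0:24:11", "1:01:19", "1:00:05", "0:43:20", "0:43:47"],
    "Centera, collections disabled, Capacity mode":
        ["0:24:07", "0:24:14", "1:01:56", "0:59:23", "0:45:30", "0:43:38"],
}


def to_seconds(hms):
    """Convert an 'h:mm:ss' string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def total_seconds(durations):
    """Sum a list of 'h:mm:ss' durations, in seconds."""
    return sum(to_seconds(d) for d in durations)


if __name__ == "__main__":
    baseline = total_seconds(DURATIONS["NTFS"])
    for name, durations in DURATIONS.items():
        total = total_seconds(durations)
        print(f"{name}: {total / 60:.0f} min in total, {total / baseline:.2f}x the NTFS time")
```

For these values, the runs without collections take a little over twice as long in total as the run with collections. Performance mode and Capacity mode end up within a minute of each other.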

Please leave a comment below if you find this information useful. I look forward to discussing these findings further.

Published 11 years ago
Version 1.0