EV, EMC Centera and Collections
I had the opportunity to test various configuration options in EV as well as on an EMC Centera. A customer had stored the EV archive data on NTFS devices and wanted to move the data to EMC Centera. The sole purpose of this exercise was to verify the expected size of the data on the EMC Centera compared to NTFS.
The white papers from Symantec and EMC should be read for the basics:
- Enterprise Vault 10.0 - Performance Guide: Chapter 13 (http://www.symantec.com/docs/DOC4553)
- Symantec Enterprise Vault and EMC Storage Applied Best Practices (http://www.emc.com/collateral/hardware/white-papers/h6790-symantec-enterprise-vault-centera-wp.pdf)
Note: The tests were done on different days during the week in the production environment of a medium-sized corporation. During the day, the EV server's RAM and CPU are used for Vault Cache, Virtual Vault and manual archiving, which users do regularly and in larger quantities. The EMC Centera is also used by other applications. Keep this in mind when comparing archiving speed. No true performance evaluation was done. As such, the information and data are provided as is, without any guarantee. Please do not make any decisions based on this information without contacting your trusted EV support partner, Symantec and/or EMC.
We had the existing environment extended by the following:
- Two separate Vault Store Groups
- 1 Vault Store Group with 1 Vault Store with 1 VS partition configured with NTFS storage ⇒ This was the baseline for the data size.
- 1 Vault Store Group with 1 Vault Store with 1 VS partition connected to EMC Centera
- The EV archive of the test user on EMC Centera was deleted before each test
- We verified that no data was left on the EMC Centera (including by running a Centera garbage collection)
- Three PST files were each imported twice in succession
The EV partition and EMC Centera were configured like this:
- EV on NTFS (device does not provide data deduplication, device does not provide data compression)
- EV with Centera Collection enabled and EMC Centera in Performance mode
- EV with Centera Collection disabled and EMC Centera in Performance mode
- EV with Centera Collection disabled and EMC Centera in Capacity mode
Metric | 1. PST (1st import) | 1. PST (2nd import) | 2. PST (1st import) | 2. PST (2nd import) | 3. PST (1st import) | 3. PST (2nd import) |
PST content accumulated in GB | 0.65 | 1.3 | 2.73 | 4.16 | 4.99 | 5.82 |
1st data import | Data import to NTFS |
NTFS number of objects in NTFS partition | 8248 | 13059 | 35114 | 48595 | 62001 | 71206 |
NTFS (Size on Disk) in GB | 0.4 | 0.47 | 1.14 | 1.32 | 1.89 | 2.04 |
Duration of import in h:mm:ss | 0:10:01 | 0:10:23 | 0:22:16 | 0:17:55 | 0:25:07 | 0:12:36 |
EV Total items (per VS reporting) | 4890 | 9701 | 23184 | 36665 | 45789 | 55075 |
EV Total Size in GB (per VS reporting) | 0.47 | 0.86 | 1.645 | 2.451 | 3.12 | 3.756 |
2nd data import | EV Centera Collection enabled, Centera in Performance Mode |
Centera C-Clip Count | 55 | 104 | 226 | 375 | 443 | 546 |
Centera User File Count | 1658 | 3249 | 6171 | 9932 | 10822 | 12517 |
Centera used capacity in GB | 0.43 | 0.85 | 2.00 | 2.00 | 3.00 | 4.00 |
Duration of import in h:mm:ss | 0:11:26 | 0:12:08 | 0:24:16 | 0:25:14 | 0:25:54 | 0:21:11 |
3rd data import | EV Centera Collection disabled, Centera in Performance Mode |
Centera C-Clip Count | 4811 | 9622 | 23103 | 36161 | 45789 | 54994 |
Centera User File Count | 3916 | 7830 | 16461 | 24848 | 30556 | 36018 |
Centera used capacity in GB | 0.39 | 0.78 | 1.00 | 2.00 | 3.00 | 3.00 |
Duration of import in h:mm:ss | 0:25:17 | 0:24:11 | 1:01:19 | 1:00:05 | 0:43:20 | 0:43:47 |
4th data import | EV Centera Collection disabled, Centera in Capacity Mode |
Centera C-Clip Count | 4811 | 9622 | 23104 | 36306 | 45790 | 54995 |
Centera User File Count | 3917 | 7833 | 16468 | 24936 | 30568 | 36034 |
Centera used capacity in GB | 0.39 | 0.78 | 1.00 | 2.00 | 3.00 | 3.00 |
Duration of import in h:mm:ss | 0:24:07 | 0:24:14 | 1:01:56 | 0:59:23 | 0:45:30 | 0:43:38 |
The data set isn't very large. We also found that the EMC Centera lists the size in MB up to one GB and only reports values rounded to full GB afterwards. Hence I included trend lines to show an average value.
These results stand out to me:
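Because the Centera only reports whole GB above 1 GB, a fitted line smooths out the rounding. The following is a minimal sketch of such a trend line, assuming an ordinary least-squares fit of stored size against accumulated PST content; the original trend lines may have been produced differently. The numbers are copied from the table, while the function and variable names are illustrative. Performance and Capacity mode report identical used capacity without collections, so they share one series.

```python
# Values copied from the results table; names are illustrative assumptions.
pst_gb = [0.65, 1.3, 2.73, 4.16, 4.99, 5.82]
stored_gb = {
    "NTFS (size on disk)": [0.4, 0.47, 1.14, 1.32, 1.89, 2.04],
    "Centera, collections enabled": [0.43, 0.85, 2.00, 2.00, 3.00, 4.00],
    "Centera, collections disabled": [0.39, 0.78, 1.00, 2.00, 3.00, 3.00],
}


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Slope = GB stored per additional GB of PST content (smaller is better).
slopes = {name: fit_line(pst_gb, ys)[0] for name, ys in stored_gb.items()}

for name, slope in slopes.items():
    print(f"{name}: about {slope:.2f} GB stored per GB of PST content")
```

With only six points per series, the slopes are a rough indication rather than a measurement.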
- The single instancing of the original data, measured as NTFS size on disk, is very good. The native EV compression and OSIS (Optimized Single Instance Storage) for small unstructured data are basically unmatched.
- The archiving times for NTFS and for Centera with collections are similar, and very good too.
- The archiving times for Centera without collections are significantly longer. This is due to the Centera checking each item for single instancing.
- We changed the Centera from Performance mode to Capacity mode because in Performance mode (apparently) only data larger than 250 KB is saved as single-instance objects on the Centera. We wanted objects smaller than 250 KB to be deduplicated by the Centera as well.
- My biggest surprise is that we could see no difference between Performance mode and Capacity mode. An EMC Centera garbage collection was run, and the EMC consultant involved was also a bit at a loss here.
- 2nd data import: The Centera used capacity with EV Centera Collection enabled is pretty close to the value reported in EV Total Size in GB (per VS reporting). Little space is saved when collecting the items. My rule of thumb is that when collecting data, the EV usage report shows you roughly the space used on the Centera.
- As the Centera has a larger address range than previous versions, I challenge the best practice in the EMC white paper to enable device-level sharing on a Centera partition. It is favorable for decreasing the Centera User File Count, but at the same time the Centera used capacity is quite a bit higher.
- In the latest Centera models, the user file count limit can be increased to 100 million, so EV data can easily be archived without collections, while the Centera used capacity (in TB) decreases.
- Performance mode should have been quicker at archiving than Capacity mode, because in Capacity mode more of the smaller files need to be checked and put into the single-instance store of the Centera. Apparently, this does not make any difference, at least not in our environment.
- Following the trend lines, NTFS archiving yields the smallest size on disk, followed by Centera without collections, whereas EV with collections does not provide very good deduplication or size reduction after all.
Please leave a comment below if you find this information useful. I am looking forward to discussing these findings further.
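The archiving-time observations above can be checked by summing the six import durations per configuration. This is a small sketch using the h:mm:ss values from the table; the helper names (`to_seconds`, `to_hms`) and the dictionary keys are illustrative. As noted earlier, the runs took place on a shared production system, so the totals are indicative only.

```python
def to_seconds(hms):
    """Convert an 'h:mm:ss' string into seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total_seconds):
    """Convert seconds back into an 'h:mm:ss' string."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


# Import durations copied from the results table, in import order.
durations = {
    "NTFS": ["0:10:01", "0:10:23", "0:22:16", "0:17:55", "0:25:07", "0:12:36"],
    "Centera, collections, Performance": ["0:11:26", "0:12:08", "0:24:16", "0:25:14", "0:25:54", "0:21:11"],
    "Centera, no collections, Performance": ["0:25:17", "0:24:11", "1:01:19", "1:00:05", "0:43:20", "0:43:47"],
    "Centera, no collections, Capacity": ["0:24:07", "0:24:14", "1:01:56", "0:59:23", "0:45:30", "0:43:38"],
}

totals = {name: sum(to_seconds(d) for d in runs) for name, runs in durations.items()}
baseline = totals["NTFS"]

for name, total in totals.items():
    print(f"{name}: {to_hms(total)} total, {total / baseline:.2f}x the NTFS time")
```

In this data set the two runs without collections take roughly 2.6 times as long as the NTFS import and differ from each other by less than a minute in total.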