Lessons Learnt with Large Scale Archive Migrations

Rob_Wilcox1 · 09-25-2014

What is large scale?

While most people would consider a 20TB migration to be large scale, we regularly find ourselves involved in projects where we have to migrate 100+ TB, often across multiple sites (physical sites and logical Enterprise Vault sites).

In this post I would like to share some of our experiences from running a 180TB migration from an old EV 9 environment (with 100TB stored in a legacy EMC Centera and 60TB on NetApp, mostly in CAB files) to a new EV 11 platform.

With the customer's archive growing by 3TB per month, we estimated that another 20TB would be archived during the migration.

Planning a seamless migration with Sync & Switch:

Perhaps the most powerful approach to Enterprise Vault migration is to decouple the data migration from the user migration. Our first stage is therefore to create an “unassigned” archive on the target side and start synchronizing the data from an existing archive into it, while the user still works with (and is archived by) the old infrastructure as if nothing had happened. Once both archives hold 100% identical data, we maintain this steady state by synchronizing the “delta” between old and new every 12 hours. The elegance of this process is that there is no pressure or urgent cutover for the customer: they can switch any given user to the new archive when ready.
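
The delta synchronization behind Sync & Switch can be pictured as a set difference between the item IDs in the source and target archives. The sketch below is a minimal illustration under stated assumptions, not how any migration product implements it: archives are modelled as hypothetical in-memory dicts keyed by item ID, and `delta_sync` stands in for whatever extract-and-ingest pipeline is actually used.

```python
from typing import Dict


def delta_sync(source: Dict[str, bytes], target: Dict[str, bytes]) -> int:
    """Copy every item present in the source archive but missing from the target.

    Returns the number of items copied in this pass. Archives are modelled as
    plain dicts keyed by a hypothetical item ID; a real migration would extract
    from the source EV store and ingest into the target.
    """
    missing = source.keys() - target.keys()
    for item_id in missing:
        target[item_id] = source[item_id]
    return len(missing)


def in_steady_state(source: Dict[str, bytes], target: Dict[str, bytes]) -> bool:
    """True once the target holds every item the source holds."""
    return source.keys() <= target.keys()


# The first pass copies the bulk of the archive.
source = {"item-1": b"a", "item-2": b"b", "item-3": b"c"}
target: Dict[str, bytes] = {}
print(delta_sync(source, target))  # 3
# The user keeps working on the old system, so new items keep arriving...
source["item-4"] = b"d"
# ...and each scheduled pass (every 12 hours in this project) copies only the delta.
print(delta_sync(source, target))  # 1
print(in_steady_state(source, target))  # True
```

Because each pass only touches the difference, a steady-state pass stays small no matter how large the archive is, which is what makes the 12-hour schedule cheap to keep running.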

When a user is switched, we run a final synchronization. It normally covers only a few items and is part of a seamless workflow that disables the old archive within the source system, assigns and enables the archive on the target side, and finally fixes the shortcuts in the mailbox so that they point to the new Enterprise Vault Archive and Saveset IDs.
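
The switch can be thought of as an ordered workflow. The sketch below shows one plausible ordering, assumed rather than taken from any product: the source archive is disabled first so nothing new lands in it after the final sync, and the shortcut fix runs last, once the target archive is enabled. The `Archive` and `Shortcut` classes, the step names and the `saveset_map` from old to new Saveset IDs are all hypothetical, not Enterprise Vault or Archive Shuttle APIs.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Archive:
    """Hypothetical stand-in for an archive on either side of the migration."""
    archive_id: str
    items: Dict[str, bytes] = field(default_factory=dict)
    enabled: bool = False


@dataclass
class Shortcut:
    """Hypothetical mailbox shortcut: the archive and saveset it points to."""
    archive_id: str
    saveset_id: str


def switch_user(source: Archive, target: Archive, shortcuts: List[Shortcut],
                saveset_map: Dict[str, str]) -> List[str]:
    """Run the switch steps in one plausible order and return a log of each step."""
    log: List[str] = []
    # 1. Disable the old archive so no new items land in it after the final sync.
    source.enabled = False
    log.append("disabled source archive")
    # 2. Final synchronization: normally only a handful of items.
    missing = source.items.keys() - target.items.keys()
    for item_id in missing:
        target.items[item_id] = source.items[item_id]
    log.append(f"final sync: {len(missing)} item(s)")
    # 3. Assign and enable the target archive.
    target.enabled = True
    log.append("enabled target archive")
    # 4. Repoint each shortcut to the new archive and its new Saveset ID.
    for sc in shortcuts:
        sc.archive_id = target.archive_id
        sc.saveset_id = saveset_map[sc.saveset_id]
    log.append(f"fixed {len(shortcuts)} shortcut(s)")
    return log


old = Archive("old-archive", {"i1": b"x", "i2": b"y"}, enabled=True)
new = Archive("new-archive", {"i1": b"x"})
shortcuts = [Shortcut("old-archive", "ss-old-1")]
print(switch_user(old, new, shortcuts, {"ss-old-1": "ss-new-1"}))
print(shortcuts[0])
```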

This approach allows for a high degree of predictability, especially in large projects, because switching is deferred until synchronization has done the heavy lifting.

It’s also worth noting that in this project we ran a 2-week pilot migrating 30 users before we started synchronizing the data for all the other archives.

For 7 months we only logged in occasionally to monitor the progress – no daily checking needed!

Fundamental decisions for our customer:

Should we try to migrate to another Vault Store in the same system or would it be better advised to create a new environment?
As the customer already had some issues with the existing environment and had more archives per Vault Store than they wanted going forward, we decided to use a clean EV 10.0.4 system, built from scratch on current OS and SQL Server versions and on new VMware ESX hosts.

This decoupled the export and import so that the impact on the legacy environment was fairly low.

During the 5 months of data synchronization we never had more than 25-30% load on the source servers, which struck the right balance between migration speed and user experience for shortcut retrieval and search.

Before the first users were switched, the system was upgraded to EV11 without any issues.

Migrating a journal:

The good news is that migrating a journal is easier than migrating user mailboxes, as there are no shortcuts to fix or mailboxes to enable. The bad news is that unless you move to EV 10.0.3 or later, the data has to be ingested into Exchange, because the ability to understand envelope data only exists in the most recent EV versions.

This was one of the reasons for using the migration to jump-start the EV 11 deployment, which allows direct Journal ingestion with envelopes.

Migrating Public Folder data:

The customer had 900GB of Public Folder data, which we migrated back into Exchange using the QUADROtech Public Folder Exporter.
It’s important to make sure you understand all other sources, such as File, SharePoint or Public Folders, before you select products or create a detailed migration plan.

Cabbing and Centera Collections

While Centera and NTFS collections are a great way to reduce backup times and complexity, they are a performance burden when extracting large amounts of data. During our first discussion we asked the customer to set the collection date threshold to 999 months, which effectively stopped new files from being collected, and this gave us nearly a year's worth of un-cabbed files by the end of the project.

We ended up with 40TB as single files, 40TB in Collections and 100TB on Centera.

We estimate that if we had everything in CABs we would have needed another 5-6 weeks for the data migration.
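
Why a 999-month threshold keeps new files out of collections can be seen with a simple age check: 999 months is over 83 years, so no archived file is ever old enough. The sketch assumes, hypothetically, that collection picks up only files older than the configured number of months; `files_to_collect` illustrates that rule and is not Enterprise Vault's actual collection logic.

```python
from datetime import date
from typing import List


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months from `earlier` to `later` (day of month ignored)."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def files_to_collect(archived_on: List[date], today: date,
                     threshold_months: int) -> List[date]:
    """Return the archive dates of files old enough to be collected into a CAB.

    Hypothetical rule: a file qualifies once it is older than the threshold.
    """
    return [d for d in archived_on if months_between(d, today) > threshold_months]


today = date(2014, 9, 25)
archived = [date(2007, 1, 10), date(2012, 6, 1), date(2014, 8, 1)]
print(len(files_to_collect(archived, today, 12)))   # 2
print(len(files_to_collect(archived, today, 999)))  # 0
```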

Filters

We discussed with the customer the idea of migrating only mailbox data younger than 5 years, but we soon realized during the pilot that this did not make sense here. The Legal team confirmed that the journal has to be kept for 7 years, and because EV Single Instance Storage can keep a single copy of an item that exists in both the journal and a mailbox archive, filtering the mailboxes would not have had a big storage impact. Dropping old email would also have required several international reviews and, above all, we expected troubleshooting a filtered migration to extend the project.

For example, if there are 47,657 items in the source and the target only has 43,891, it leaves all sorts of questions open. “User can’t find an email? Probably filtered. But let me check first…”

So we decided against filtering and migrated everything.
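
The troubleshooting burden can be made concrete: with a date filter in place, every item missing from the target has to be classified as either deliberately filtered or genuinely lost before the counts can be trusted. The sketch assumes hypothetical per-item sent dates from the source and a 5-year cutoff; `reconcile` illustrates the bookkeeping and is not a feature of any particular product.

```python
from datetime import date
from typing import Dict, Set, Tuple


def reconcile(source: Dict[str, date], target_ids: Set[str],
              cutoff: date) -> Tuple[Set[str], Set[str]]:
    """Split items missing from the target into (filtered, unexplained).

    `source` maps a hypothetical item ID to its sent date; items older than
    `cutoff` were excluded on purpose by the date filter.
    """
    missing = source.keys() - target_ids
    filtered = {i for i in missing if source[i] < cutoff}
    unexplained = missing - filtered
    return filtered, unexplained


source = {"a": date(2006, 3, 1), "b": date(2013, 5, 2), "c": date(2014, 1, 7)}
target_ids = {"c"}
filtered, unexplained = reconcile(source, target_ids, cutoff=date(2009, 9, 25))
print(sorted(filtered))     # ['a']
print(sorted(unexplained))  # ['b']
```

Without a filter, `filtered` is always empty, so any count mismatch points straight at a real migration error.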

Performance numbers:

How fast did we migrate? For extraction, we achieved a peak of 180GB/h across 8 servers and a sustained rate of 100GB/h (2.4TB per day). The bottleneck here was clearly the Centera.

The import process was a bit slower due to the limited scalability of the virtual hardware.

We saw peaks of 120GB/h across 6 target servers and a 70GB/h sustained rate.
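
A back-of-the-envelope calculation shows how the sustained rates translate into calendar time. The sketch assumes the full 180TB is moved at the sustained rates quoted above, around the clock, with 1TB = 1000GB; it ignores ramp-up, pauses and retries, so it gives a lower bound rather than a forecast. `days_to_move` is a hypothetical helper.

```python
def days_to_move(total_tb: float, rate_gb_per_hour: float,
                 hours_per_day: float = 24.0) -> float:
    """Days needed to move `total_tb` at a sustained rate (1TB = 1000GB)."""
    return total_tb * 1000 / (rate_gb_per_hour * hours_per_day)


TOTAL_TB = 180
extract_days = days_to_move(TOTAL_TB, 100)  # sustained extraction rate
import_days = days_to_move(TOTAL_TB, 70)    # sustained import rate

# Extraction and import run as a pipeline, so the slower stage sets the pace.
print(f"extraction: {extract_days:.1f} days")  # extraction: 75.0 days
print(f"import: {import_days:.1f} days")       # import: 107.1 days
print(f"lower bound: {max(extract_days, import_days):.1f} days")  # lower bound: 107.1 days
# Halving the running hours doubles the calendar time.
print(f"import at 12h/day: {days_to_move(TOTAL_TB, 70, 12):.1f} days")  # import at 12h/day: 214.3 days
```

The gap between this lower bound of roughly 3.5 months and the five months the synchronization actually took is a reminder not to plan on sustained rates alone.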

Shortcut repair performance:

One of the hidden aspects of a user migration is the performance needed to repair the shortcuts in the user's mailbox so that they point to the new environment instead of the old one. While this scales quite well, it can be a significant hit on the Exchange CAS servers and the Exchange storage and databases: no matter how efficient the integration, Exchange disk and processing will always hit a resource bottleneck when updating hundreds of thousands of small shortcut items!

In this case the customer pushed through a change they had been considering for a long time and reduced the shortcut lifetime from 2 years to 6 months. This meant we could finish a large mailbox in about 15 minutes instead of 45 minutes or more before the change.
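
The effect of a shorter shortcut lifetime can be estimated by counting the shortcuts that fall inside each lifetime, since only those remain in the mailbox to be fixed. The sketch assumes, hypothetically, that shortcuts older than the lifetime have already been removed and that the mailbox holds one shortcut per day for the last two years; `shortcuts_to_fix` is an illustrative helper, not an EV setting or API.

```python
from datetime import date, timedelta
from typing import List


def shortcuts_to_fix(created: List[date], today: date, lifetime_days: int) -> int:
    """Count shortcuts young enough to survive shortcut deletion.

    Assumes shortcuts older than the lifetime have already been removed from
    the mailbox, so only the remaining ones need repointing.
    """
    cutoff = today - timedelta(days=lifetime_days)
    return sum(1 for d in created if d >= cutoff)


today = date(2014, 9, 25)
# Hypothetical mailbox: one shortcut per day for the last two years.
created = [today - timedelta(days=n) for n in range(730)]
print(shortcuts_to_fix(created, today, 730))  # 730
print(shortcuts_to_fix(created, today, 182))  # 183
```

In this idealized mailbox, the 6-month lifetime leaves about a quarter of the shortcuts to repoint. Real mailboxes are not this uniform, so the actual gain depends on the age profile of each mailbox.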

Project duration

From pilot to completing the data migration took around 6 months, five of which were required to finish the data synchronization.
The switching process for 48,000 users took about 3 months, and we started it about 3 months into the migration. If I had to run this project again, I would probably wait even longer and finish the data synchronization completely first, because of the Virtual Vault issues described below. The old system is still in place but is now being decommissioned. The migration server will remain for a few more months, as some users still need shortcuts in PSTs or other repositories to be fixed.

Migration errors

We reviewed migration errors twice a week and applied several registry keys and patches to reduce them to a bare minimum. We had several hundred failed messages, which were either “Item not found” errors caused by a handful of corrupted CAB files, or “MAPI item corrupt” errors where the MSG was structurally broken and could not even be opened with OutlookSpy or MFCMAPI.

User impact

While we are able to automate every step in the migration workflow such as disabling and enabling mailboxes for archiving, there is one caveat when it comes to fully automating an EV migration:

There is no mechanism within EV to trigger a Virtual Vault / Vault Cache rebuild on the client. Therefore all VV-enabled users were automatically notified via email to click “Ctrl-Shift” –> EV Search and rebuild Virtual Vault.

Apart from that, the workflow engine worked flawlessly.

The only helpdesk calls came from people who had moved shortcuts into PSTs and needed a re-run of “Fix-Shortcuts” once the items were copied back into the mailbox.

Virtual Vault / Vault Cache had the biggest impact on our migration speed.

When we started to switch users, our import performance dropped by 40%. We soon discovered that literally half the company was synchronizing their Virtual Vaults against the target, requesting almost every item we had migrated while we were trying to finish the migration for the rest of the user base.

We immediately stopped switching users with Virtual Vault and left them until the end of the project, once the data synchronization had completed.
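
The rule adopted here can be expressed as an eligibility check when picking the next batch of users to switch. This is a hypothetical sketch: the `User` fields and the `next_switch_batch` function are illustrative and not part of Enterprise Vault or any migration product.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class User:
    """Hypothetical view of a user awaiting the switch."""
    name: str
    virtual_vault: bool  # True if Virtual Vault / Vault Cache is enabled
    in_sync: bool        # True once the user's archive is 100% synchronized


def next_switch_batch(users: List[User], sync_complete: bool,
                      batch_size: int) -> List[User]:
    """Pick the next users to switch.

    Users without Virtual Vault can go as soon as their archive is in sync;
    VV users are held back until the overall data synchronization has
    completed, so their full VV downloads do not compete with ongoing
    ingestion on the target.
    """
    eligible = [u for u in users
                if u.in_sync and (sync_complete or not u.virtual_vault)]
    return eligible[:batch_size]


users = [User("ann", virtual_vault=True, in_sync=True),
         User("bob", virtual_vault=False, in_sync=True),
         User("cat", virtual_vault=False, in_sync=False)]
print([u.name for u in next_switch_batch(users, sync_complete=False, batch_size=10)])  # ['bob']
print([u.name for u in next_switch_batch(users, sync_complete=True, batch_size=10)])   # ['ann', 'bob']
```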

Summary:

•   Sync & Switch is a great way to structure your project and take the complexity out of the delivery. Wait until the majority of data has been moved before you start switching the users.
•   Try to migrate to new hardware whenever possible: The source system should only do extraction in addition to archiving and occasional VV/VC delta sync requests. The target should only do data ingestion and will be fully loaded with VV/VC full download requests for switched users. Doing everything on the same server is simply not a good idea.
•   If you need to migrate a Journal, upgrade to EV 10.0.3 or later.
•   Make sure that you have other sources like Public Folders, SharePoint, etc. covered.
•   Consider the impact of filtering. The notion of “losing data” should not be associated with your project, as it might cause issues with the user base, and EV Single Instance Storage makes the storage savings negligible when using Journaling.
•   If possible, reduce the number of shortcuts by shortening their lifetime; otherwise, provide dedicated CAS servers for shortcut fixing.
•   Keep Virtual Vault / Vault Cache users until the end of the project. Let them sync their data once the migration is (nearly) complete.
•   Understand that while the best migration products can automate the whole user migration, rebuilding VV/VC is a manual task.
•   Handle migration errors as they arise. Many can be fixed through settings, registry keys and patches.
•   Don’t confuse peak and average performance. It is often easier to extend the hours the system runs than to increase the items per hour.
•   Speed is important, but it is not everything. In our case we could have added 2-3 more servers on the target side to cut the data synchronization from 5 months to 3-4 months, but the customer decided against deploying and maintaining additional hardware that would be of no real use after the migration.
•   Be realistic about goals, timeframes and impact. It took you 7 years to amass this data, so give it 7 months to move it all. There is no point in spending 3 months discussing how the migration could be shortened by 8 weeks.

tonywu_ingram · 10-03-2014

Hi, how do you do Sync & Switch? Please advise whether there are any tools. Are you setting up a new EV site? Thanks, TW

Rob_Wilcox1 · 10-04-2014

Hi Tony,

Thanks for your comment.

Sync & Switch is a methodology which Archive Shuttle uses specifically to ease the pain of archive migration on end-users. It's not built into Enterprise Vault.

Hope that helps

peter_kozak1 · 10-10-2014

Worth reading if you are planning migrations into Enterprise Vault:
http://www.quadrotech-it.com/archive-migration-why-extraction-speed-is-only-half-the-story/
