How often should I run data collection

GulzarShaikhAUS
Level 6
Partner Accredited Certified

Hi,

We have EV 10.0.3 on a Windows cluster, and we are archiving Exchange mailboxes and journal mailboxes.

Our strategy for backing up EV is to take daily full backups of the open partitions. The backup speed is ridiculously slow: less than 2 MB/s. Data collection is enabled, but it is scheduled to run only for data older than 10 days.

I know that data collection helps backups finish faster, but I would like to know the optimal way to use it. Would email retrieval become slow if I start collecting data that is one day old, so that almost all data ends up in CAB files?

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions

AndrewB
Moderator
Partner    VIP    Accredited

you'll generally see on this forum that we recommend avoiding collections if at all possible. your backup speeds do sound slow, but it would help to know what you've already checked and what troubleshooting you've done on the backup speeds. also:

what are you using for backups?

how big are your open partitions?

what storage are you using for vault store partitions?

if you monitor the backup job, how fast are you reading cab files vs uncollected files?

etc

etc

 


7 REPLIES 7

GulzarShaikhAUS
Level 6
Partner Accredited Certified

We are using the NetBackup agent for Enterprise Vault.

The open partition is 200 GB.

The storage is a NetApp FAS2040.

I could never figure out which is read faster, CAB files or uncollected files.

AndrewB
Moderator
Partner    VIP    Accredited

ok, 200 GB is very reasonable. how many hours does your open partition job take to complete? your NetBackup admin should be able to monitor the job as it runs to see the throughput difference between cab files and uncollected files, and i imagine there are also logs that could be reviewed.
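If the backup job itself is hard to monitor, a rough comparison can be made outside the backup software by timing plain file reads on the partition. The sketch below is only an illustration under several assumptions: it treats any file ending in `.cab` as a collection and everything else as uncollected, the function names are made up for this example, and it measures raw read speed from the file system rather than what the NetBackup agent achieves. File-system caching can also inflate the numbers if the same files were read recently.

```python
import os
import time
from collections import defaultdict

CHUNK = 1024 * 1024  # read files in 1 MiB chunks


def measure_read_throughput(root):
    """Read every file under root and return {group: (total_bytes, seconds)}.

    Files are grouped as 'cab' (name ends in .cab) or 'other'.
    The grouping rule is an assumption made for this sketch.
    """
    totals = defaultdict(lambda: [0, 0.0])
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            group = "cab" if name.lower().endswith(".cab") else "other"
            path = os.path.join(dirpath, name)
            size = 0
            start = time.perf_counter()
            with open(path, "rb") as fh:
                while True:
                    chunk = fh.read(CHUNK)
                    if not chunk:
                        break
                    size += len(chunk)
            totals[group][0] += size
            totals[group][1] += time.perf_counter() - start
    return {group: (b, s) for group, (b, s) in totals.items()}


def mb_per_s(total_bytes, seconds):
    """Convert a byte count and a duration into MB/s (0.0 if no time elapsed)."""
    if seconds <= 0:
        return 0.0
    return total_bytes / (1024 * 1024) / seconds
```

To use it, call `measure_read_throughput` on a copy or a small subfolder of the partition (the path is whatever applies in your environment) and pass each group's totals to `mb_per_s`. A large gap between the two groups would suggest that many small uncollected files are what slows the job down.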

GulzarShaikhAUS
Level 6
Partner Accredited Certified

It takes anywhere from 15 to 18 hours, which is a very long time.

We ended up reducing the partition size to 100GB to minimize the time to back it up.
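As a quick sanity check on these numbers, the average throughput implied by a partition size and a job duration can be worked out directly. The sketch below assumes the job reads the full 200 GB and that 1 GB = 1024 MB; the thread does not say how full the partition actually was, so treat the results as rough estimates. The function names are made up for this example.

```python
MB_PER_GB = 1024  # assumption: binary gigabytes


def throughput_mb_per_s(size_gb, hours):
    """Average throughput in MB/s for backing up size_gb in the given hours."""
    return size_gb * MB_PER_GB / (hours * 3600)


def hours_to_back_up(size_gb, mb_per_s):
    """Hours needed to back up size_gb at a steady mb_per_s."""
    return size_gb * MB_PER_GB / mb_per_s / 3600


if __name__ == "__main__":
    for hours in (15, 18):
        rate = throughput_mb_per_s(200, hours)
        print(f"200 GB in {hours} h: {rate:.2f} MB/s")
    print(f"100 GB at 2 MB/s: {hours_to_back_up(100, 2):.1f} h")
    # prints:
    # 200 GB in 15 h: 3.79 MB/s
    # 200 GB in 18 h: 3.16 MB/s
    # 100 GB at 2 MB/s: 14.2 h
```

Under these assumptions a full 200 GB partition backed up in 15 to 18 hours averages roughly 3 to 4 MB/s, and a full 100 GB partition at a steady 2 MB/s would still take about 14 hours. Halving the partition size halves the backup window but does not change the underlying throughput.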

GulzarShaikhAUS
Level 6
Partner Accredited Certified

Can you please explain why data collection is not recommended?

JesusWept3
Level 6
Partner Accredited Certified

Personally I hate collections, as Andy said. Some of the reasons I don't recommend them are:

1. If a collection goes bad, rather than losing one item you can lose several dozen or even a hundred archived emails with one lost or bad CAB file.

2. Processes such as EVSVR scans, index rebuilds, Move Archive, and Metadata Store builds are 25% slower, because EV has to extract each item from the CAB to disk first.

3. It's a one-way process. There's no way to get out of collections except for cheap hacks.

4. There have been multiple programmatic issues with collections, such as incorrect counts of the items stored in a CAB, which is really important.

5. Deleting items or running storage expiry does not lead to immediate space savings. EV has to run through a collection process that checks each CAB, extracts the legitimate items, deletes the CAB, and so on. You could delete many archives expecting to save a lot of space and see nothing happen on disk.

6. You could also end up in a scenario where you need to restore your Vault Store database, and the restore is from before a sparse collection run. The database may say a particular item is in collection12345.cab when in fact it was placed in collection67890.cab, and thus the item can't be retrieved.

That, along with other things I've seen, such as items being recorded in the database against the wrong collection and EV losing track of the refcounts, makes it a nasty process that isn't fun, and I don't generally recommend one-way processes.

As for your backup issue, if you're on a NetApp, why not use Snapshots?

https://www.linkedin.com/in/alex-allen-turl-07370146

GulzarShaikhAUS
Level 6
Partner Accredited Certified

Good points JesusWept3 .. 

I was also thinking of using snapshots, but my customer is not comfortable with them, and the data will soon be migrated to EMC storage. They are also comfortable with the EV agent type of backup, because we don't need to run scripts, and many times the scripts have caused chaos by not taking the partitions/indexes out of backup mode.

Having said that, I think keeping smaller partitions will be an easier fix than breaking my head with support over the slow backup issue.

Also, after learning about collections, I will now turn off collections on the new partitions and see how long it takes to back up the smaller 100 GB partitions without collections. If it is acceptable, I am going to stick with it.

Any input on this?

Thanks again Andrew and JesusWept3.