
Is dedupe to disk any faster in BE 2012?

robnicholson
Level 6

Despite all the doom-n-gloom in the "First impressions" thread, we don't really fall into the "lots of servers" category, so I'm going to have a gander at BE 2012. One thing that would sway me is if dedupe to disk is improved - mainly performance-wise, but also the resilience of the dedupe database.

I don't like the weekly full backup running during business hours, so it starts Friday night at 8pm. The full backup of ~2TB finishes in 32 hours and the verify takes another 20 hours, so we only just fit within the weekend.

That's 1,600MB/min for the backup. The disk system itself can do 6,000MB/min and the network is 1Gbit/s throughout. The last backup got a de-dupe ratio of 85.7:1, which I think means that just ~24GB of that 2TB ended up being written to the de-dupe database (in terms of 64k data chunks). 24GB feels about right, as a differential backup will have been run the night before.

Gut instinct makes me feel that 32 hours should be much lower. This post isn't so much about "check your hardware" (and yes, this is client-side dedupe) but more about whether any performance enhancements have been made in 2012.

Given that in those 32 hours only 24GB is being written in terms of data chunks (which at theoretical maximum speed would take only four minutes), BE is spending a lot of time doing something else rather than writing data chunks.
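
To show my working, here's the back-of-the-envelope maths as a quick Python snippet. The numbers are just the approximate figures quoted above, nothing pulled from the BE job log:

    # Back-of-the-envelope arithmetic using the approximate figures quoted above.
    protected_gb    = 2048    # the ~2TB weekly full
    dedupe_ratio    = 85.7    # ratio reported by the last full backup
    disk_mb_per_min = 6000    # raw throughput of the disk system

    unique_gb = protected_gb / dedupe_ratio
    print(f"Unique data written to the dedupe store: ~{unique_gb:.0f} GB")        # ~24 GB

    write_minutes = unique_gb * 1024 / disk_mb_per_min
    print(f"Time to write that at raw disk speed: ~{write_minutes:.0f} minutes")  # ~4 minutes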

I'm aware of the general algorithms used in all of this, and the bottlenecks are probably (see the sketch just after this list for the kind of per-chunk loop I mean):

  1. How fast 2TB can be read by the BE client on the server considering lots of smaller files (OS overhead)
  2. How fast the SAN can serve up that data across iSCSI to the server
  3. How fast the client can calculate the hash
  4. How fast BE/PostgreSQL can look up that hash
  5. How fast BE/PostgreSQL can update the file's entry in the catalog
  6. The fact we're backing up DFSR data exclusively here, and BE is slow with DFSR
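
For clarity, this is the sort of per-chunk loop I have in mind - a toy Python sketch, not BE's actual code, and the SHA-1 fingerprint is just a stand-in as I don't know which hash BE really uses:

    import hashlib

    CHUNK_SIZE = 64 * 1024  # 64k chunks, as above

    def backup_file(path, fingerprint_db, chunk_store):
        """Hash each chunk, look it up, and only write chunks not seen before.

        fingerprint_db (a set) and chunk_store (a list) are stand-ins for the
        dedupe database and the dedupe storage folder - illustrative only.
        """
        with open(path, "rb") as f:
            while True:
                chunk = f.read(CHUNK_SIZE)             # bottlenecks 1 & 2: reading the data
                if not chunk:
                    break
                fp = hashlib.sha1(chunk).hexdigest()   # bottleneck 3: CPU-bound hashing
                if fp not in fingerprint_db:           # bottleneck 4: hash lookup
                    chunk_store.append(chunk)          # the only real data write
                    fingerprint_db.add(fp)
                # bottleneck 5: the catalog still records every chunk reference
                # for this file, whether the chunk was new or not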

Has any effort been put into addressing these bottlenecks in BE 2012? For example, multi-threaded calculation of hashes on the server - which is where I suspect a big bottleneck occurs, given the relatively CPU-intensive hash algorithm - so taking advantage of the 8 cores on our server seems to make sense.
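
Purely to illustrate what I mean by multi-threading the hashing (this is not a claim about how the BE agent is written, just that the hash step parallelises nicely across cores):

    from concurrent.futures import ProcessPoolExecutor
    import hashlib

    def fingerprint(chunk):
        return hashlib.sha1(chunk).hexdigest()

    def fingerprint_chunks(chunks, workers=8):
        """Hash 64k chunks across all eight cores instead of one after another."""
        with ProcessPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(fingerprint, chunks, chunksize=64))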

Cheers, Rob.
