Client-side dedupe behaviour during first backup

bpup
Level 4
Partner

I have looked around but still can't find a definitive answer to my question. What I am trying to figure out is the behaviour of a client-side dedupe operation during the initial backup without any seeding. If the backup selection consists of 1,024 identical 1 MB files (1 GB of data in total), does the initial backup still send all 1,024 of them to the central MSDP storage? Or does the plug-in recognize that there are identical fingerprints within the dataset being backed up (even without a fingerprint cache to compare against)? Thanks for any help; this is more for knowledge than for practical purposes, since I am using the seeding method anyway.

2 ACCEPTED SOLUTIONS

Andrew_Madsen
Level 6
Partner

Confusing indeed. The critical point is that the client only loads the hash table at the beginning of the backup job. For the first backup there is nothing to load, so everything gets sent regardless of similarities. Later the client does a cleanup and rearranges its database to reflect the duplications. If, however, it can load (seed) its hash table from another image, it will, and you will therefore get some deduplication.
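As a rough illustration of the behaviour described in this reply, here is a toy model in Python. It is only a sketch, not NetBackup's implementation: the function names are hypothetical, SHA-256 is an assumed stand-in for whatever fingerprint the plug-in really uses, and the 1 MB files from the question are scaled down to 1 KB so it runs quickly. The one assumption that matters is that the lookup table is a snapshot taken at job start and is not updated while the job runs.

```python
import hashlib


def fingerprint(segment: bytes) -> str:
    # Assumed stand-in fingerprint; the thread does not say which hash is used.
    return hashlib.sha256(segment).hexdigest()


def segments_sent(segments, cache_at_job_start):
    """Count segments sent when the lookup table is a fixed snapshot
    loaded once at the start of the job (hypothetical model)."""
    snapshot = frozenset(cache_at_job_start)
    return sum(1 for seg in segments if fingerprint(seg) not in snapshot)


# 1,024 identical "files", scaled down from 1 MB to 1 KB each.
one_file = b"x" * 1024
dataset = [one_file] * 1024

# First backup with no seeding: the snapshot is empty.
unseeded = segments_sent(dataset, cache_at_job_start=set())

# Seeded backup: the snapshot already holds the fingerprint.
seeded = segments_sent(dataset, cache_at_job_start={fingerprint(one_file)})

print(unseeded, seeded)
```

Under that assumption the unseeded run sends every segment and the seeded run sends none, which is the contrast this reply draws between a first backup and a seeded one.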

Marianne
Level 6
Partner    VIP    Accredited Certified

I have just done a test - backed up my own laptop to our demo Appliance. No previous backup for my laptop.

I added my laptop in Client Attributes and selected Client-side dedupe.

Results:

07/24/2015 14:22:52 - Info appmaster (pid=5037) StorageServer=PureDisk:appmaster; Report=PDDO Stats for (appmaster): scanned: 93979429 KB, CR sent: 59125131 KB, CR sent over FC: 0 KB, dedup: 37.1%, cache disabled

So, some common data was found on the Appliance (37%) and only unique data was sent.
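For anyone who wants to check the figure, the reported percentage appears consistent with 1 minus (CR sent / scanned). The Python sketch below parses the job detail line quoted above and recomputes it. The helper name `dedup_percent` is hypothetical, and the formula is an assumption inferred from the numbers in this one line, not taken from the documentation.

```python
import re

log_line = (
    "07/24/2015 14:22:52 - Info appmaster (pid=5037) "
    "StorageServer=PureDisk:appmaster; Report=PDDO Stats for (appmaster): "
    "scanned: 93979429 KB, CR sent: 59125131 KB, CR sent over FC: 0 KB, "
    "dedup: 37.1%, cache disabled"
)


def dedup_percent(line: str) -> float:
    """Recompute the dedup rate as 1 - sent/scanned (assumed formula)."""
    scanned = int(re.search(r"scanned: (\d+) KB", line).group(1))
    # "CR sent: " does not match the later "CR sent over FC: " field.
    sent = int(re.search(r"CR sent: (\d+) KB", line).group(1))
    return round(100.0 * (1 - sent / scanned), 1)


print(dedup_percent(log_line))  # 37.1
```

In other words, roughly 59 GB of the 94 GB scanned went over the wire.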

4 REPLIES 4

Marianne
Level 6
Partner    VIP    Accredited Certified

I have moved this post to the NetBackup forum.

My understanding is that the 1st backup will send all data to the media server.
The information in the Details tab of the 1st job should confirm this.

But I may be wrong....

The NetBackup Deduplication Guide says the following on p.28:

About NetBackup MSDP Client Deduplication
With media server deduplication, the client sends the full backup data stream to
the media server. The deduplication engine on the media server processes the
stream, saving only the unique segments.
With NetBackup Client Deduplication (also known as client-side deduplication), the
client hosts the deduplication plug-in that deduplicates the backup data. The NetBackup
client software creates the image of backed up files as for a normal backup. Next,
the deduplication plug-in breaks the backup image into segments and compares
them to all of the segments that are stored in that deduplication node. The plug-in
then sends only the unique segments to the NetBackup Deduplication Engine on
the storage server. The engine writes the data to a Media Server Deduplication
Pool.

bpup
Level 4
Partner

Thanks for the feedback, much appreciated! Helpful to know in case I decide to attempt an initial backup without seeding over the WAN. I am awaiting a seed drive from the UK, so I will be able to test the whole process at that point on 7.6.1.2.