
Difference between De-Duplication and incremental backup

Varma_Chiluvuri
Level 5
Certified

Can someone explain the difference between de-duplication and incremental backup?

An incremental backup takes only the data that has changed since the previous backup. De-duplication does the same thing, right?

 

7 REPLIES 7

S_Williamson
Level 6

An incremental backup only backs up what has changed since the previous FULL backup and any incremental backups that followed it.

Deduplication only stores blocks (identified by their fingerprints) that it does not already have.

So if you run an incremental backup of, say, 5 GB of data, PureDisk may already hold some of the blocks (or fingerprints) within that 5 GB, from this server or possibly from another server. Although the backup is effectively 5 GB, PureDisk might only take in 1 GB of new data.
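To make the distinction concrete, here is a minimal Python sketch. It is not how PureDisk is implemented: the 4-byte block size, the content comparison used to detect changed files, and all function names are assumptions chosen for illustration. The incremental step decides which files go into the backup; the dedupe step then decides which blocks of those files actually need to be stored.

```python
import hashlib

BLOCK_SIZE = 4  # unrealistically small, assumed only to keep the example readable


def fingerprint(block):
    """Fingerprint of one block (SHA-256 is an assumption, not PureDisk's method)."""
    return hashlib.sha256(block).hexdigest()


def incremental_selection(files, previous):
    """Incremental backup: keep only files that are new or changed.

    Real products use the archive bit or timestamps; comparing content
    directly is a simplification.
    """
    return {name: data for name, data in files.items() if previous.get(name) != data}


def dedupe_store(selection, pool):
    """Deduplication: store only blocks the pool does not hold yet.

    Returns the number of bytes newly written to the pool.
    """
    stored = 0
    for data in selection.values():
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            fp = fingerprint(block)
            if fp not in pool:
                pool[fp] = block
                stored += len(block)
    return stored


pool = {}
day1 = {"a.txt": b"AAAABBBB", "b.txt": b"CCCCDDDD"}
day2 = {"a.txt": b"AAAABBBB", "b.txt": b"CCCCEEEE", "c.txt": b"AAAADDDD"}

full_bytes = dedupe_store(day1, pool)                    # full backup on day 1
changed = incremental_selection(day2, day1)              # b.txt changed, c.txt is new
incr_size = sum(len(data) for data in changed.values())  # size of the incremental
incr_bytes = dedupe_store(changed, pool)                 # bytes the pool really adds

print(sorted(changed), incr_size, incr_bytes)
```

In this toy run the incremental is 16 bytes, but only the `EEEE` block (4 bytes) is new to the pool, because the other blocks were already stored on day 1.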

Dedupe rates on PureDisk are generally 90%+ on Windows file systems (not including the first run on a server).

Hope this helps.

Varma_Chiluvuri
Level 5
Certified

@William, I think the blocks with the same fingerprints will be ignored in the next backups (de-duplication). But during a RESTORE, do we need both the FULL and the DE-DUPE backups?

Patrick_Whelan_
Level 6

I think there is some misunderstanding regarding de-duping and backups. De-duping is a method of storing data that minimizes the amount of storage needed: if a piece of data being stored has the same checksum (or whatever identifier that particular de-dupe device uses) as data already held, it is not stored again but only pointed to.

From an application's point of view (NetBackup or any other application putting data on a de-dupe device) the mechanism is transparent. When you do a restore, you select the files/directories/folders you need and NetBackup pulls the data back from wherever is convenient.

i.e. restores neither know nor care about de-duping.

Just my £.02 worth.

Varma_Chiluvuri
Level 5
Certified

@Patrick, de-duping is clear.

For example, consider:

Day1 Tape - A --- Full Backup

Day2 Tape - B --- De-Dupe

Day3 Tape - C --- De-Dupe

Now I need to restore the Day 3 files. Do I need Tape - A for this?

 

 

S_Williamson
Level 6

DeDupe is not written to tape. If you export a deduped data selection to tape, NetBackup reassembles the entire image and writes it out in full. Deduped data is only kept in your disk pool.

f25
Level 4

@Varma Chiluvuri
I do not understand your example at all. Let me rewrite it so that it makes sense:

A Full Backup --> Dedupe 
                            --> Later duplication/export to tape
B Incremental --> Dedupe
C Incremental --> Dedupe

If you do not run full backups to deduplicated storage and only run incrementals to it, you have set it up wrongly.

Then, if you run backups from NetBackup (full, incremental or whatever), it works exactly the same as with tape. The difference is that on tape you use 1 TB for 1 TB of data, while on deduplicated storage you use only the space of the "unique records/blocks" within that 1 TB. Example: if you copy and paste a 1 MB document 100 times on the source, its backup should use only 1 MB + "some overhead space" the first time, and only "some overhead space" for any subsequent backup (either full or incremental).
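The 100-copies example can be worked through with a short Python sketch. It is a rough model under stated assumptions: the 128 KiB segment size, the use of SHA-256, and the idea that the "overhead" is one 32-byte fingerprint reference per segment are all illustrative choices, not NetBackup's actual on-disk format.

```python
import hashlib

SEGMENT = 128 * 1024  # assumed fixed segment size
FP_BYTES = 32         # assumed overhead: one SHA-256 reference per segment


def backup(files, pool):
    """Back up a list of file contents to a dedup pool (a set of fingerprints).

    Returns (new_data_bytes, overhead_bytes) for this backup.
    """
    new_data = 0
    references = 0
    for data in files:
        for i in range(0, len(data), SEGMENT):
            segment = data[i:i + SEGMENT]
            fp = hashlib.sha256(segment).digest()
            references += 1
            if fp not in pool:
                pool.add(fp)
                new_data += len(segment)
    return new_data, references * FP_BYTES


# A 1 MiB document built from a counter so that every segment is distinct.
document = b"".join(i.to_bytes(4, "big") for i in range(256 * 1024))
source = [document] * 100  # the same document copied 100 times

pool = set()
logical = sum(len(f) for f in source)  # what a tape would have to hold
first = backup(source, pool)           # first backup
second = backup(source, pool)          # any later full or incremental

print(logical, first, second)
```

Under these assumptions a tape would hold 100 MiB for each full backup, while the pool takes 1 MiB of data plus 25,600 bytes of references the first time and only the references afterwards.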

I don't know if this clears-up a bit but I hope so :)

 

grlazy
Level 3

@Varma Chiluvuri

I'll make my attempt now.

As stated above, dedup is the way the files are backed up to a disk pool.

For the sake of my example, let's say we use client-side dedup.

The client's files are broken into fragments, and the agent uses a hash algorithm to create the fingerprints, which are transferred to the dedup disk pool. THESE fingerprints are unique across the whole dedup pool and every client sending data to it.

The first time a backup is taken, all of the client's data follows the above procedure and is transferred to the dedup pool.

At the second backup (full, incremental, whatever), the files marked for backup (archive bit/time stamp) are broken into fragments again, and ONLY the fragments whose fingerprints WERE NOT BACKED UP ON PREVIOUS DAYS, that is, do not already exist (unexpired) in the dedup pool, are backed up. It does not matter whether it is a full or an incremental backup; only the nature of the data changes how much is transferred to the pool (the more data has to be transferred, the lower the deduplication rate).
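The exchange between the client and the pool can be sketched in Python as follows. This is only an illustration of the idea: the `DedupPool` class, its `missing` and `upload` methods, the 8-byte fragment size and the use of SHA-256 are hypothetical and do not correspond to any real NetBackup API.

```python
import hashlib


class DedupPool:
    """Hypothetical stand-in for the dedup disk pool shared by all clients."""

    def __init__(self):
        self.segments = {}  # fingerprint -> fragment data

    def missing(self, fingerprints):
        """Return the fingerprints the pool does not hold yet."""
        return {fp for fp in fingerprints if fp not in self.segments}

    def upload(self, fp, fragment):
        self.segments[fp] = fragment


def client_backup(data, pool, frag_size=8):
    """Client-side dedup: fingerprint locally, send only unknown fragments.

    Returns (recipe, bytes_sent); the recipe is the ordered fingerprint list
    needed to rebuild the data later.
    """
    fragments = [data[i:i + frag_size] for i in range(0, len(data), frag_size)]
    recipe = [hashlib.sha256(f).hexdigest() for f in fragments]
    needed = pool.missing(recipe)
    sent = 0
    for fp, fragment in zip(recipe, fragments):
        if fp in needed:
            pool.upload(fp, fragment)
            needed.discard(fp)  # a fragment repeated in this backup is sent once
            sent += len(fragment)
    return recipe, sent


pool = DedupPool()
_, sent_first = client_backup(b"HEADER__PAYLOAD1", pool)  # client A, first backup
_, sent_other = client_backup(b"HEADER__PAYLOAD2", pool)  # client B shares a fragment
_, sent_again = client_backup(b"HEADER__PAYLOAD1", pool)  # client A, second full

print(sent_first, sent_other, sent_again)
```

In this run client A sends 16 bytes the first time, client B sends only its 8 unique bytes, and client A's second full backup sends nothing, regardless of the backup type.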

In the dedup pool you can have retention periods, but as soon as a retention period expires, so do the fingerprints that are no longer needed. Then garbage collection runs and those fingerprints are deleted.
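A minimal sketch of that expiry logic, assuming each backup image simply records an expiry day and the list of fingerprints it references (the data layout and the `collect_garbage` name are made up for illustration). A fingerprint is deleted only when no unexpired image still points to it.

```python
def collect_garbage(images, pool, today):
    """Drop expired images, then delete fragments no live image references.

    images: dict of image name -> (expiry_day, list of fingerprints)
    pool:   dict of fingerprint -> fragment data (modified in place)
    Returns the dict of images that are still within retention.
    """
    live = {name: img for name, img in images.items() if img[0] > today}
    referenced = {fp for _, fps in live.values() for fp in fps}
    for fp in list(pool):
        if fp not in referenced:
            del pool[fp]
    return live


pool = {"f1": b"aaaa", "f2": b"bbbb", "f3": b"cccc"}
images = {
    "mon_full": (7, ["f1", "f2"]),  # expires on day 7
    "tue_incr": (8, ["f2", "f3"]),  # expires on day 8
}

images = collect_garbage(images, pool, today=7)
print(sorted(images), sorted(pool))
```

Here `mon_full` has expired, so `f1` is removed, but `f2` survives because `tue_incr` still needs it.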

Following the above process, you may make a copy of the data to tape with a different retention (via the so-called SLP, storage lifecycle policy).

The process that extracts the files from the pool is called rehydration: it recreates each file and writes it to tape in the form it had on the client's disk (and as it would have been taken by a traditional backup). This very same process is used for restores.
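Rehydration can be pictured with a few lines of Python. The `dedupe` and `rehydrate` helpers, the 4-byte fragment size and the in-memory pool are assumptions for the sake of the example: the pool keeps each unique fragment once, each backup keeps an ordered list of fingerprints, and rehydration walks that list to rebuild the full data.

```python
import hashlib


def dedupe(data, pool, frag_size=4):
    """Store unique fragments in the pool and return the ordered recipe."""
    recipe = []
    for i in range(0, len(data), frag_size):
        fragment = data[i:i + frag_size]
        fp = hashlib.sha256(fragment).hexdigest()
        pool.setdefault(fp, fragment)  # keep only the first copy of a fragment
        recipe.append(fp)
    return recipe


def rehydrate(recipe, pool):
    """Rebuild the original bytes by following the recipe through the pool."""
    return b"".join(pool[fp] for fp in recipe)


pool = {}
original = b"ABCDABCDABCDXYZ"
recipe = dedupe(original, pool)

stored = sum(len(fragment) for fragment in pool.values())  # bytes held in the pool
tape_copy = rehydrate(recipe, pool)                        # full-size copy for tape or restore

print(stored, len(tape_copy), tape_copy == original)
```

The pool holds 7 bytes for 15 bytes of data, while the rehydrated copy is the complete 15 bytes, which is why a tape copy or a restore never depends on an earlier tape.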

 

Depending on your implementation, you may have numerous copies with different retentions.

You may have one week of retention in the dedup pool and, at the same time, a second copy on tape with a month's retention. As soon as the dedup copy expires, the tape copy becomes the primary.