pkh
Moderator

In Part 1,

https://www-secure.symantec.com/connect/articles/deduplication-simplified-part-1-backup

I discussed what happens when you back up to a dedup folder or OST appliance. After our series of backups, the dedup store looks like this:

abcde = FileA-FB1, FileC-IB1, FileC-FB2

ABCDE = FileB-FB1, FileB-FB2

12345 = FileA-FB1, FileB-FB1, FileC-IB1, FileB-FB2, FileC-FB2

67890 = FileA-FB1, FileB-FB1, FileC-IB1, FileB-FB2, FileC-FB2

A = FileC-IB1, FileC-FB2

Aabcd = FileA-IB1, FileA-FB2

e1234 = FileA-IB1, FileA-FB2

56789 = FileA-IB1, FileA-FB2

0 = FileA-IB1, FileA-FB2

qwert = FileD-FB2

yuiop = FileD-FB2

 

Restore

Suppose we want to restore File A from Full Backup 2 (FB2).  The dedup engine has to look into its database, follow the pointers and assemble the file from its data chunks, which in our example are Aabcd, e1234, 56789 and 0.  This process is referred to as rehydration.  It looks simple in our small example, but a real file may consist of thousands of chunks, so following the pointers and assembling the chunks is very I/O intensive and can take time.
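To make the pointer-following concrete, here is a minimal Python sketch of rehydrating File A from FB2. It is only an illustration, not how the dedup engine is actually implemented: the names CHUNK_STORE, RECIPES and rehydrate are hypothetical, each chunk label is assumed to double as the chunk's data, and the chunks are assumed to be stored in the order they are listed in the example.

```python
# Hypothetical in-memory model of part of the example dedup store.
# Assumption: each chunk label doubles as the chunk's data.
CHUNK_STORE = {
    "Aabcd": b"Aabcd",
    "e1234": b"e1234",
    "56789": b"56789",
    "0": b"0",
}

# Hypothetical "recipe" table: the ordered chunk pointers for each file version.
# Assumption: the order matches the order of the listing in the example.
RECIPES = {
    ("FileA", "FB2"): ["Aabcd", "e1234", "56789", "0"],
}


def rehydrate(file_name, backup_set):
    """Rebuild a file by following its pointers and concatenating the chunks."""
    chunk_ids = RECIPES[(file_name, backup_set)]
    # One lookup per chunk; on a real store each lookup may be a separate read.
    return b"".join(CHUNK_STORE[chunk_id] for chunk_id in chunk_ids)


restored = rehydrate("FileA", "FB2")
print(restored)  # b'Aabcde1234567890'
```

With four chunks this is trivial, but the loop runs once per chunk, which is why a file made up of thousands of chunks generates so much I/O.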

Duplication

When you duplicate a backup set in the dedup folder to disk or tape media, the backup set is rehydrated before it is written out to the media.  This will take time.  The only exception to this is optimised duplication.  Optimised duplication involves duplicating backup sets from one dedup folder to another dedup folder.  During optimised duplication, only the data chunks which do not already exist in the target dedup folder are sent to it.  This saves bandwidth and the time used to rehydrate the backup set.
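The idea behind optimised duplication can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the product's actual protocol: the function optimised_duplicate is hypothetical, both dedup folders are modelled as plain dictionaries, and the chunk labels stand in for whatever identifiers the engine really uses to recognise a chunk.

```python
def optimised_duplicate(backup_chunks, source_store, target_store):
    """Copy only the chunks the target lacks; return the ids actually sent."""
    sent = []
    for chunk_id in backup_chunks:
        if chunk_id not in target_store:
            # Only missing chunks travel across the link.
            target_store[chunk_id] = source_store[chunk_id]
            sent.append(chunk_id)
    return sent


# Source dedup folder holding the chunks of File A from FB2.
source = {"Aabcd": b"Aabcd", "e1234": b"e1234", "56789": b"56789", "0": b"0"}
# Hypothetical target dedup folder that already holds two of those chunks.
target = {"e1234": b"e1234", "56789": b"56789"}

sent = optimised_duplicate(["Aabcd", "e1234", "56789", "0"], source, target)
print(sent)  # ['Aabcd', '0']
```

Note that the file is never reassembled here. Chunks are copied as chunks, which is where the saving in rehydration time comes from.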

Deletion of Backup Sets

What happens when backup sets are deleted from the dedup folder?

Suppose Full Backup 2 is deleted.  The dedup folder will delete the pointers to the data chunks which are from the FB2 backup sets.  Our example will then look like this.

abcde = FileA-FB1, FileC-IB1

ABCDE = FileB-FB1

12345 = FileA-FB1, FileB-FB1, FileC-IB1

67890 = FileA-FB1, FileB-FB1, FileC-IB1

A = FileC-IB1

Aabcd = FileA-IB1

e1234 = FileA-IB1

56789 = FileA-IB1

0 = FileA-IB1

qwert = 

yuiop =

You will notice that the last two data chunks, qwert and yuiop, no longer have any pointers to them.  The space occupied by these two chunks will be reclaimed during the next housekeeping run of the dedup engine.  Housekeeping is done twice a day and the frequency and timing cannot be changed.

Let's rewind the clock and go back to the time before Full Backup 2 was deleted and assume that Full Backup 1 is deleted instead. (Ignore the fact that this is not possible because Incremental Backup 1 depends on Full Backup 1 and DLM will not allow this deletion).  Our example will look like this.

abcde = FileC-IB1, FileC-FB2

ABCDE = FileB-FB2

12345 = FileC-IB1, FileB-FB2, FileC-FB2

67890 = FileC-IB1, FileB-FB2, FileC-FB2

A = FileC-IB1, FileC-FB2

Aabcd = FileA-IB1, FileA-FB2

e1234 = FileA-IB1, FileA-FB2

56789 = FileA-IB1, FileA-FB2

0 = FileA-IB1, FileA-FB2

qwert = FileD-FB2

yuiop = FileD-FB2

You will notice that all the data chunks still have pointers to them and thus the space they occupy will not be reclaimed during the next housekeeping run.  The dedup housekeeping is done twice a day, around noon and midnight, and these times cannot be changed.

There will be times when you run out of space on your dedup folder.  You delete some backup sets expecting to free up space, but little or no space is released.  The situation above explains why.  The data chunks which make up the backup sets that you have deleted still have pointers to them from other backup sets, so their space cannot be reclaimed.

To recap, when you delete a backup set, the dedup engine will remove the pointers from this backup set to the constituent data chunks.  When there are no more pointers to a data chunk, the space occupied by this data chunk will be reclaimed during the next housekeeping run.
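Both deletion scenarios can be replayed with a short Python sketch. It is only a model of the bookkeeping described above, not the engine's real code: STORE mirrors the pointer table from the start of this article, and the helper names delete_backup_set and reclaimable are hypothetical.

```python
# Pointer table from the example: chunk -> pointers in "File-BackupSet" form.
STORE = {
    "abcde": ["FileA-FB1", "FileC-IB1", "FileC-FB2"],
    "ABCDE": ["FileB-FB1", "FileB-FB2"],
    "12345": ["FileA-FB1", "FileB-FB1", "FileC-IB1", "FileB-FB2", "FileC-FB2"],
    "67890": ["FileA-FB1", "FileB-FB1", "FileC-IB1", "FileB-FB2", "FileC-FB2"],
    "A": ["FileC-IB1", "FileC-FB2"],
    "Aabcd": ["FileA-IB1", "FileA-FB2"],
    "e1234": ["FileA-IB1", "FileA-FB2"],
    "56789": ["FileA-IB1", "FileA-FB2"],
    "0": ["FileA-IB1", "FileA-FB2"],
    "qwert": ["FileD-FB2"],
    "yuiop": ["FileD-FB2"],
}


def delete_backup_set(store, backup_set):
    """Return a new pointer table without the pointers owned by backup_set."""
    suffix = "-" + backup_set
    return {
        chunk: [ptr for ptr in pointers if not ptr.endswith(suffix)]
        for chunk, pointers in store.items()
    }


def reclaimable(store):
    """Chunks with no pointers left, which housekeeping could reclaim."""
    return [chunk for chunk, pointers in store.items() if not pointers]


print(reclaimable(delete_backup_set(STORE, "FB2")))  # ['qwert', 'yuiop']
print(reclaimable(delete_backup_set(STORE, "FB1")))  # []
```

Deleting FB2 leaves two chunks without pointers, while deleting FB1 leaves none, so in the second case there is nothing for housekeeping to reclaim.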

The reclaimed space is space within one of the files in the dedup folder.  This "container" file does not actually release the space and make itself smaller.  The reclaimed space is just marked so that it can be reused by the dedup engine.  You might free up space within the dedup folder, but the actual disk space occupied by the dedup folder may remain the same.
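A toy Python model may help to show why the disk usage does not drop. The Container class below is purely hypothetical and says nothing about the real on-disk layout: it assumes a container is a run of equal-sized slots, and it measures size in slots rather than bytes.

```python
class Container:
    """Hypothetical container file: a run of slots, some of them reusable."""

    def __init__(self, chunks):
        self.slots = list(chunks)  # slot contents; None marks reclaimed space

    def size_on_disk(self):
        # Size in slots. The file never shrinks in this model.
        return len(self.slots)

    def reclaim(self, chunk):
        # Mark the slot as reusable instead of truncating the file.
        self.slots[self.slots.index(chunk)] = None

    def store(self, chunk):
        if None in self.slots:
            # Reuse reclaimed space first.
            self.slots[self.slots.index(None)] = chunk
        else:
            # Grow only when there is no reclaimed space left.
            self.slots.append(chunk)


container = Container(["abcde", "qwert", "yuiop"])
container.reclaim("qwert")
container.reclaim("yuiop")
print(container.size_on_disk())  # 3

container.store("zxcvb")  # hypothetical new chunk from a later backup
print(container.size_on_disk())  # 3
```

The size stays at 3 slots throughout. Reclaiming only creates room for future chunks inside the container, which matches what you see on the volume after a housekeeping run.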

A high dedup ratio is desirable, since it means that a lot of backup sets are using chunks which already exist in the dedup folder.  The flip side is that when you delete backup sets, the amount of space freed is not much, because most of their chunks are still referenced by other backup sets.

If you need space in your dedup folder immediately and cannot wait for the next housekeeping run, then use the procedure in this document to reclaim space

http://www.symantec.com/docs/TECH130103

If you do not gain much space, then you will need to move the dedup folder to a larger volume by following the procedure in this document

http://www.symantec.com/docs/TECH160832

The Dedup Folder

The files in the dedup folder are all inter-connected and are part of a big database.  As such, you should not modify, move or delete any of the files yourself.  Tampering with a single file may cause irreparable damage to the entire dedup folder.  If there is anything wrong with your dedup folder, the best thing is to log a formal support case with Symantec so that an engineer can take a look at the problem.

Conclusion

When using dedup, just bear in mind, there is no such thing as a free lunch.  You save on space, but you pay in terms of processing and sometimes speed.

 

Version history
Last update:
‎10-02-2014 08:44 PM
Updated by:
Moderator