Backup & Recovery

Deduplication Role:

The role of data deduplication is to increase the amount of information that can be stored on storage appliances (disk arrays, filers) and to increase the effective amount of data that can be transmitted over networks.

 
A key approach to data deduplication is data reduction built on a methodology that systematically substitutes reference pointers for redundant variable-length blocks (or data segments) in a specific dataset.

Data deduplication operates by segmenting a dataset (in a backup environment, this is normally a stream of backup data) into blocks and writing those blocks to disk storage. To identify blocks in a transmitted stream, the deduplication engine creates a digital signature, like a fingerprint, for each data segment and maintains an index of the signatures for a given repository.

The index, which can be recreated from the stored data segments, provides the reference list to determine whether blocks already exist in a repository. The index is used to determine which data segments need to be stored and also which need to be copied during a replication operation.
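A minimal sketch of the index described above, assuming SHA-256 as the fingerprint and an in-memory set as the index (both are illustrative choices, not taken from any particular product). It shows that the index can be rebuilt from the stored segments alone and then used to decide which incoming segments actually need to be stored.

```python
import hashlib


def fingerprint(segment: bytes) -> str:
    """Digital signature ("fingerprint") of a segment; SHA-256 is an assumed choice."""
    return hashlib.sha256(segment).hexdigest()


def rebuild_index(stored_segments: list[bytes]) -> set[str]:
    """Recreate the index purely from the segments already in the repository."""
    return {fingerprint(seg) for seg in stored_segments}


def segments_to_store(incoming: list[bytes], index: set[str]) -> list[bytes]:
    """Return only the incoming segments whose fingerprints are not yet indexed."""
    new_segments = []
    for seg in incoming:
        fp = fingerprint(seg)
        if fp not in index:
            index.add(fp)  # a repeat later in the same stream is also skipped
            new_segments.append(seg)
    return new_segments


repository = [b"alpha", b"beta"]
index = rebuild_index(repository)
print(segments_to_store([b"alpha", b"gamma", b"gamma", b"beta"], index))
# → [b'gamma']
```

The same membership test is what a replication job would use to decide which segments must be copied to a target.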

Data Deduplication and Data Movement

      Replacement of duplicate data with references to a shared copy

      Distinction: whole-record level or sub-record level

      -  Whole-record level refers to the file or object level

      -  Sub-record level refers to blocks or segments within a file

      Deduplication at the source reduces WAN traffic

      Allows full backup metadata collection

      Allows sub-file-level incremental data movement

When data deduplication software sees a block it has processed before, instead of storing the block again, it inserts a pointer to the original block in the dataset's metadata. If the same block shows up multiple times, multiple pointers to it are generated.
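The pointer mechanism can be sketched as a toy repository in which each unique block is kept once and a dataset's metadata is just an ordered list of pointers. The class name `DedupStore` and the use of SHA-256 digests as pointers are assumptions made for illustration.

```python
import hashlib


class DedupStore:
    """Hypothetical toy repository: each unique block is stored once."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}  # pointer (fingerprint) -> block content

    def write(self, blocks: list[bytes]) -> list[str]:
        """Store a dataset and return its metadata: one pointer per block."""
        pointers = []
        for block in blocks:
            ref = hashlib.sha256(block).hexdigest()
            if ref not in self.blocks:  # first time seen: store the block itself
                self.blocks[ref] = block
            pointers.append(ref)  # seen before: only another pointer is recorded
        return pointers

    def read(self, pointers: list[str]) -> bytes:
        """Rebuild the original dataset by following its pointers."""
        return b"".join(self.blocks[ref] for ref in pointers)


store = DedupStore()
meta = store.write([b"AAAA", b"BBBB", b"AAAA", b"AAAA"])
print(len(meta), len(store.blocks))  # → 4 2
print(store.read(meta))              # → b'AAAABBBBAAAAAAAA'
```

Block `AAAA` appears three times in the dataset but is stored once and referenced by three pointers.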

            Deduplication can be done at the file, block, or segment level:

       File Level

  Usually, the file is identified by its content, and multiple identical copies are deduplicated. If a single bit in the file changes,
  the file is seen as a new copy and stored again at its full size, even though only that one bit differs.
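A small sketch of file-level deduplication, assuming a SHA-256 hash of the whole file content as the file's identity. It illustrates the drawback just described: flipping one bit in a 1 MB file causes a second full 1 MB copy to be kept.

```python
import hashlib


def file_level_dedup(files: dict[str, bytes]) -> dict[str, bytes]:
    """Keep one copy per distinct file content, keyed by a whole-file hash."""
    unique: dict[str, bytes] = {}
    for content in files.values():
        unique.setdefault(hashlib.sha256(content).hexdigest(), content)
    return unique


original = bytes(1_000_000)        # 1 MB file of zero bytes
changed = bytearray(original)
changed[0] ^= 0x01                 # flip a single bit

files = {"a.dat": original, "copy_of_a.dat": original, "b.dat": bytes(changed)}
stored = file_level_dedup(files)
print(len(stored), sum(len(c) for c in stored.values()))  # → 2 2000000
```

The exact copy costs nothing extra, but the one-bit change costs a whole new file.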

      Block Level

 Disk-based technologies can identify individual blocks so that the content of each block is stored only once, with multiple copies referenced from a file system database. Deduplication at this level is usually done after the data has been stored on disk, as a post-process.
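A post-process, block-level pass can be sketched as a scan over data already on disk that collapses identical fixed-size blocks. The 4 KiB block size and the per-file list of block references (standing in for the file system database) are assumptions. Compared with the file-level case, a one-bit change now costs only one extra block.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; real products differ


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut data into fixed-size blocks (the last block may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def post_process_dedup(files_on_disk: list[bytes]) -> tuple[dict[str, bytes], list[list[str]]]:
    """Scan data already written to disk and keep each distinct block once.

    Returns the block store plus, for every file, its ordered block references,
    standing in for the file system database.
    """
    store: dict[str, bytes] = {}
    block_maps: list[list[str]] = []
    for data in files_on_disk:
        refs = []
        for block in split_blocks(data):
            ref = hashlib.sha256(block).hexdigest()
            store.setdefault(ref, block)  # first copy kept, later copies only referenced
            refs.append(ref)
        block_maps.append(refs)
    return store, block_maps


original = b"".join(bytes([i]) * BLOCK_SIZE for i in range(8))  # 8 distinct blocks
changed = bytearray(original)
changed[10] ^= 0x01                                            # flip one bit in block 0
store, maps = post_process_dedup([original, bytes(changed)])
print(len(store))  # → 9  (8 original blocks + 1 modified block, not 16)
```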

       Segment Level

   Each file is read and divided into smaller parts called segments, and a hash is calculated for each segment based on its content. A database holds the relationship between each file's metadata and its segments, which are themselves stored as files. Deduplication is done at the segment level.
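The segment-level layout can be sketched with each unique segment saved as a file named after its hash, plus a SQLite table relating each backed-up file to its ordered segments. The table name `file_segments`, the tiny 8-byte segment size, and the helper names are hypothetical choices for illustration only.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path

SEGMENT_SIZE = 8  # tiny, illustrative segment size (an assumption)


def backup_file(db: sqlite3.Connection, seg_dir: Path, name: str, data: bytes) -> None:
    """Split a file into segments, store each unique segment as its own file,
    and record the file -> segment relation in the database."""
    for pos, start in enumerate(range(0, len(data), SEGMENT_SIZE)):
        segment = data[start:start + SEGMENT_SIZE]
        digest = hashlib.sha256(segment).hexdigest()
        seg_path = seg_dir / digest
        if not seg_path.exists():  # deduplication: an existing segment is not rewritten
            seg_path.write_bytes(segment)
        db.execute("INSERT INTO file_segments VALUES (?, ?, ?)", (name, pos, digest))


def restore_file(db: sqlite3.Connection, seg_dir: Path, name: str) -> bytes:
    """Reassemble a file from its segment files, in recorded order."""
    rows = db.execute(
        "SELECT digest FROM file_segments WHERE file = ? ORDER BY pos", (name,)
    )
    return b"".join((seg_dir / digest).read_bytes() for (digest,) in rows)


with tempfile.TemporaryDirectory() as tmp:
    seg_dir = Path(tmp)
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE file_segments (file TEXT, pos INTEGER, digest TEXT)")
    backup_file(db, seg_dir, "report_v1.txt", b"header--body-v1-footer--")
    backup_file(db, seg_dir, "report_v2.txt", b"header--body-v2-footer--")
    print(len(list(seg_dir.iterdir())))                # → 4
    print(restore_file(db, seg_dir, "report_v2.txt"))  # → b'header--body-v2-footer--'
    db.close()
```

The two versions share their header and footer segments, so only four segment files are written for six segment references.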

 

In a backup environment, deduplication can happen on:

  •    Client

      Less impact on the WAN

      Deduplication is calculated by the clients before backup data is moved

      Reduced backup window

      Removes the need for tape at remote locations

  •    Media server

      Easy to implement

      Local LAN solution targeting the data centre

      Scalable by adding media servers

      Offloads deduplication work from the clients

      Leverages the existing media servers

      Storage compatibility

  •    Target (appliance)

      Appliance model, usually storage-dependent

      Less intelligent

      Deduplication limited to the appliance level

      No impact on the backup solution

      Less scalable; questionable suitability for larger data centres
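Why client-side deduplication has less impact on the WAN can be sketched as a simple exchange: the client hashes its segments, asks the server which fingerprints it lacks, and sends only those segments. `BackupServer`, `client_side_backup`, and the 1 KiB segment size are hypothetical names and values; the returned byte count ignores the (small) fingerprint traffic.

```python
import hashlib

SEGMENT_SIZE = 1024  # assumed segment size


def segment(data: bytes) -> list[bytes]:
    return [data[i:i + SEGMENT_SIZE] for i in range(0, len(data), SEGMENT_SIZE)]


class BackupServer:
    """Hypothetical media/storage server holding the deduplicated repository."""

    def __init__(self) -> None:
        self.segments: dict[str, bytes] = {}

    def missing(self, fingerprints: list[str]) -> set[str]:
        return {fp for fp in fingerprints if fp not in self.segments}

    def receive(self, payload: dict[str, bytes]) -> None:
        self.segments.update(payload)


def client_side_backup(data: bytes, server: BackupServer) -> int:
    """Hash on the client, send only unknown segments.

    Returns the number of segment bytes that crossed the network.
    """
    segs = {hashlib.sha256(s).hexdigest(): s for s in segment(data)}
    needed = server.missing(list(segs))
    payload = {fp: segs[fp] for fp in needed}
    server.receive(payload)
    return sum(len(s) for s in payload.values())


server = BackupServer()
monday = b"".join(bytes([i]) * SEGMENT_SIZE for i in range(100))  # 100 distinct segments
tuesday = monday[:-SEGMENT_SIZE] + b"\xff" * SEGMENT_SIZE         # last segment changed
print(client_side_backup(monday, server))   # → 102400
print(client_side_backup(tuesday, server))  # → 1024
```

The second full backup moves only the one changed segment, which is also why the hashing work shifts onto the clients.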


Data Deduplication Applied to Replication

Data deduplication makes the process of replicating backup data practical by reducing the bandwidth and cost needed to create and maintain duplicate datasets over networks. At a basic level, deduplication-enabled replication works much like a deduplication-enabled data store.

Once two images of a backup data store are created, all that is required to keep the replica (target) identical to the source is the periodic copying of the new data segments added during each backup event, along with the associated metadata image, or namespace.
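The incremental replication step can be sketched as follows: after each backup event only the segments the target does not already hold are copied, while the namespace (backup image name mapped to its ordered segment fingerprints) is copied whole. The dictionaries, SHA-256 fingerprints, and the `backup`/`replicate` helpers are assumptions for illustration, not a specific product's replication protocol.

```python
import hashlib


def fp(segment: bytes) -> str:
    """Segment fingerprint (SHA-256 is an assumed choice)."""
    return hashlib.sha256(segment).hexdigest()


def backup(segments: dict[str, bytes], namespace: dict[str, list[str]],
           image: str, data: list[bytes]) -> None:
    """Record a backup image on the source: store new segments, log the namespace."""
    namespace[image] = []
    for seg in data:
        segments.setdefault(fp(seg), seg)
        namespace[image].append(fp(seg))


def replicate(src_segments: dict[str, bytes], src_namespace: dict[str, list[str]],
              tgt_segments: dict[str, bytes], tgt_namespace: dict[str, list[str]]) -> int:
    """Make the target identical to the source; return the number of segments copied.

    Only segments missing on the target are transferred; the namespace
    (image name -> ordered fingerprints) is small, so it is copied whole.
    """
    new = {k: v for k, v in src_segments.items() if k not in tgt_segments}
    tgt_segments.update(new)
    tgt_namespace.clear()
    tgt_namespace.update({name: list(refs) for name, refs in src_namespace.items()})
    return len(new)


src_segs: dict[str, bytes] = {}
src_ns: dict[str, list[str]] = {}
tgt_segs: dict[str, bytes] = {}
tgt_ns: dict[str, list[str]] = {}

backup(src_segs, src_ns, "monday", [b"seg-a", b"seg-b", b"seg-c"])
print(replicate(src_segs, src_ns, tgt_segs, tgt_ns))  # → 3  (initial synchronization)
backup(src_segs, src_ns, "tuesday", [b"seg-a", b"seg-b", b"seg-d"])
print(replicate(src_segs, src_ns, tgt_segs, tgt_ns))  # → 1  (only seg-d is new)
```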

The replication process begins by copying all the data segments in one share or portion of a source appliance to an equivalent share or portion in a second, target appliance. Although this initial transfer can occur over a network, data volumes often make it more practical to temporarily co-locate the source and target devices to synchronize the datasets, or to transfer the initial datasets using tape.

Comments
Hello, harish 13,

I am currently evaluating Symantec Backup Exec System Recovery 2010 and would like to know whether BESR includes deduplication at any of the layers you described in your article (which is quite useful :) ).
Great article...quick question though:

When de-duplication tasks are moved to the client server, what sort of performance overhead does this add?

I cannot wait for BEWS 2010 to get released so that I can enable this on our file servers on the 34 sites I manage. It's going to save a lot of time during backups, a lot of cash for buying new tapes...now if I could only get that enabled on the file servers themselves so that I can free up space =)
Please advise how I can implement deduplication in our NetBackup environment.
My current setup is:
1. One NetBackup 6.5.4 server acting as both master and media server (W2K3).
2. 35-40 clients, all virtual machines (Windows and Linux).
3. Total data size is 2 TB, and the policy retention period is 120 days.
4. Using an MSL 4048 tape library and a VLS 9000.

Please advise.

Many thanks.

Please advise:

HP VLS 9000 presents itself to NetBackup as a tape library with tapes.
So how will PureDisk be able to recognize the VLS 9000 as a disk?

Thanks,

Ask your vendor whether they can provide the deduplication option. It should be possible.

Deduplication in Backup Exec 2010 is a very good feature.

Reference:

https://www-secure.symantec.com/connect/articles/deduplication-option-and-unified-archiving-option-s...

Hope you find it informative as well.

 

Angel

You have a couple of options for deduplication with NetBackup.

If you wish to stick with NetBackup 6.5 (I highly recommend upgrading to 6.5.6), consider a separate PureDisk environment or a NetBackup 5000 series Appliance, and use the PureDisk Deduplication Option (PDDO).

Or you can upgrade your NetBackup 6.5 environment to version 7 and use the Media Server Deduplication Option (MSDO).