
How to avoid disk fragmentation with deduplication?

tw71
Level 3
Partner

Is there a way to defrag / optimize the backup exec deduplication folder?

6 REPLIES

Kiran_Bandi
Level 6
Partner Accredited

You can run defragmentation on the deduplication folder.

Refer http://www.symantec.com/docs/TECH11562

Regards......

CraigV
Moderator
Moderator
Partner    VIP    Accredited

...unfortunately there is no way to avoid fragmentation on an HDD...you can take maintenance steps like defragging the drive once you notice a drop in performance...

fredouille
Level 3
Partner Accredited

Hi,

To be safe, I'd stop the BE services and the dedupe engine, then run the defrag, and then restart all BE & dedupe services.
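A rough sketch of that sequence in Python, calling the standard Windows tools (defrag switches as on Server 2008 and later). The service display names and the drive letter are only examples -- they differ between Backup Exec versions, so check services.msc on your media server and adjust the list before trying anything like this:

# Sketch: stop BE/dedupe services, defrag the dedupe volume, restart everything.
# Service names and drive letter are placeholders; verify them on your own server.
import subprocess

DEDUPE_VOLUME = "E:"   # hypothetical volume holding the deduplication folder

SERVICES = [           # example display names only -- they vary per BE version
    "Backup Exec Job Engine",
    "Backup Exec Server",
    "Backup Exec Device & Media Service",
    "Backup Exec Deduplication Engine",
    "Backup Exec Deduplication Manager",
]

def run(cmd):
    print(">", " ".join(cmd))
    subprocess.run(cmd, check=False)   # keep going even if a service is already stopped

for svc in SERVICES:                   # stop order matters because of service dependencies
    run(["net", "stop", svc])

run(["defrag", DEDUPE_VOLUME, "/A", "/V"])   # analyze first
run(["defrag", DEDUPE_VOLUME, "/U", "/V"])   # then defragment with progress output

for svc in reversed(SERVICES):
    run(["net", "start", svc])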

Best regards.

Fred.

Simon_B_
Level 6
Partner Accredited

Actually, it does not make much sense to defrag the deduplication folder. The reason: the dedup folder grows as more space is needed, but it does not shrink again when old backups are deleted, because that space is only freed up inside the deduplication container files (*.bin, IIRC). It is much the same as with SQL Server log files: they do not shrink even when there is free space inside them.

As a consequence, you do not have the constant churn you get with a B2D folder (write files, delete some in between, write bigger ones fragmented into the free areas). Your dedupe volume should therefore not show high fragmentation (as long as you do not store any other data on that volume, which I strongly advise against).

The dedup folder itself has a maintenance job every 12 hours that deletes unused blocks (you can see that as optimization), but beyond that no optimization is possible and no defrag should be necessary.
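To illustrate the point, here is a toy model in Python. It is purely illustrative -- not the actual PureDisk container format -- but it shows why the files on the NTFS level stay fairly static: deleting a backup only frees slots inside the container, the container never shrinks, and new chunks reuse those internal slots.

# Toy model of a dedup container file -- illustration only, not the real *.bin format.
class Container:
    def __init__(self):
        self.slots = []          # chunk payloads; None marks space freed inside the file
        self.free = []           # indexes of freed slots available for reuse

    def write(self, chunk):
        if self.free:            # reuse space inside the container first
            idx = self.free.pop()
            self.slots[idx] = chunk
        else:                    # grow only when no internal free space is left
            idx = len(self.slots)
            self.slots.append(chunk)
        return idx

    def delete(self, idx):
        self.slots[idx] = None   # the container's size on disk does not change
        self.free.append(idx)

    def size(self):
        return len(self.slots)   # never decreases

c = Container()
ids = [c.write(b) for b in (b"a", b"b", b"c")]
c.delete(ids[1])
print(c.size())                  # 3 -- deleting did not shrink anything
c.write(b"d")
print(c.size())                  # still 3 -- the freed slot was reused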

tw71
Level 3
Partner

Our dedupe volume shows over 1 million total fragments (Defraggler tool). This seems to be a big problem for backup performance, because our backup times have increased dramatically. For example:

backing up a Linux file server with 1.3 TB of data takes about 9 hours to the dedupe store

4 weeks later (weekly full)

backing up the same server = 18 hours to the same dedupe store

Any ideas why the backup time doubles?

Simon_B_
Level 6
Partner Accredited

Four weeks later, your deduplication database has grown compared to when you started using dedup. So when a chunk is hashed, it might take longer to look up in the database whether it is already present in the storage.
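Roughly what happens per chunk, as a minimal Python sketch (the chunk size and data structures are made up for illustration; the real fingerprint index is a disk-backed database, so once it no longer fits in RAM a lookup can cost a disk seek):

# Sketch of deduplicated ingest: hash every chunk and look the fingerprint up in an index.
import hashlib

CHUNK_SIZE = 128 * 1024   # hypothetical fixed chunk size; real products vary
fingerprints = {}         # fingerprint -> chunk location (stands in for the dedup DB)
store = []                # stands in for the container files

def ingest(data: bytes):
    stored = referenced = 0
    for off in range(0, len(data), CHUNK_SIZE):
        chunk = data[off:off + CHUNK_SIZE]
        fp = hashlib.sha256(chunk).hexdigest()
        if fp in fingerprints:        # known chunk: only a reference is kept
            referenced += 1
        else:                         # unique chunk: store it and index it
            fingerprints[fp] = len(store)
            store.append(chunk)
            stored += 1
    return stored, referenced

print(ingest(b"x" * (3 * CHUNK_SIZE)))   # (1, 2): duplicate chunks are only referenced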

Do you have the deduplication DB on the same physical disks as the storage? That might decrease performance when the DB grows larger.

During the backup you should check the CPU usage on the BE server (be advised: dedup only uses one core!) and the disk queue length of the volumes holding the storage and the database. You should be able to spot the bottleneck that way.
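If it helps, Windows' built-in typeperf tool can log both during the backup window; a small Python wrapper might look like this (the counter paths are standard, but the PhysicalDisk instance names depend on your disk layout -- list them with "typeperf -q PhysicalDisk" first):

# Sketch: sample per-core CPU and disk queue length every 5 seconds for one hour
# and write the samples to a CSV for later review.
import subprocess

COUNTERS = [
    r"\Processor(*)\% Processor Time",              # look for one core pegged at 100%
    r"\PhysicalDisk(*)\Current Disk Queue Length",  # storage and DB volumes
]

subprocess.run(
    ["typeperf"] + COUNTERS
    + ["-si", "5", "-sc", "720", "-o", "dedupe_perf.csv", "-y"],
    check=True,
)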

1 million total fragments sounds large, but you should also check how many files are present and what the highest fragment count for a single file is (you could check with JkDefrag -a 1, for example).
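If JkDefrag is not at hand, Sysinternals Contig can produce similar per-file numbers. A sketch in Python -- the folder path is hypothetical, contig.exe must be on the PATH and run elevated, and the parsing assumes Contig's usual "is in N fragments" wording, so adjust the regex if your version prints it differently:

# Sketch: rank the most fragmented files in the dedupe folder via Sysinternals Contig.
import re
import subprocess

DEDUPE_PATH = r"E:\BackupExecDeduplicationStorageFolder"   # hypothetical location

out = subprocess.run(
    ["contig.exe", "-a", "-s", DEDUPE_PATH + r"\*"],       # -a analyze, -s recurse
    capture_output=True, text=True,
).stdout

frags = []
for line in out.splitlines():
    m = re.search(r"is in (\d+) fragments", line)          # adjust to your Contig output
    if m:
        frags.append((int(m.group(1)), line.strip()))

frags.sort(reverse=True)
print("files reported:", len(frags))
print("total fragments:", sum(n for n, _ in frags))
for n, line in frags[:10]:                                  # ten worst offenders
    print(f"{n:>8}  {line}")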

Also keep in mind: fragmentation is the problem where the disk drive cannot read/write a file sequentially because larger files are written into smaller gaps of free space in the filesystem. No matter how much you defrag your dedup volume, the drive will still have to perform many seeks to read/write data and cannot do the straight sequential reading/writing you get with B2D. Even when a single sequential file is backed up to the dedup folder, the data ends up scattered across the DB and storage volumes: parts of the file may already be known (and then only need a reference in the DB at a certain location), while new data is written into the dedup container files at whatever locations have free space inside those containers (which again is not a sequential operation).

What I am trying to explain: even if you defrag your dedup folder on a regular basis, there will never be sequential write operations to it, only changes at various scattered places the HDD heads have to seek to (each seek costs a few ms and thus increases the backup duration). If you start over with a new dedup folder, almost all data can be written sequentially (as it is new, unique content), which of course is faster than the mixture of small changes at scattered locations described above.

Hope this wall of text is understandable.

However, as pointed out before, you can give defragging it a try, but be sure to stop all Backup Exec services first (and also the BE PureDisk services and the PostgreSQL database, which is not automatically stopped via the BE Services Manager) and observe the results.

If you do so, please report the results, as the situation described above is mainly theory ;)