Forum Discussion

jmarques68
Level 3
11 years ago

Host requirements for client-side deduplication

I don’t know if someone can help me get at least an approximate idea of the CPU/RAM requirements for clients that use client-side dedup in a NetBackup environment. Do you have any metrics...
  • Jaime_Vazquez
    11 years ago

    Having just taken a small class that somewhat covered this, here is what I could infer:

    For the client: at least a dual-core, 64-bit processor running at 2.5 GHz or faster. Naturally, more is better.

    RAM depends on the size of the backup selection. There is supposed to be documentation covering the requirements for this. I would think a minimum baseline is 4 GB, but again, more is better.

    During the client-side dedupe process, a lot of shared memory is used on the client to hold the SOs associated with the backup being performed. The larger the backup, the more memory is needed to cache the SO entries of the client's own existing backups. It is also fairly compute-intensive: the client scans the files, creates an SO fingerprint for each segment, and checks the fingerprints it generates from the PO selections it is running against the SO entries it holds in cached memory. The amount of RAM needed increases as the number of SO entries grows.
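
    To make the CPU and memory costs a bit more concrete, here is a rough Python sketch of the general idea. This is not NetBackup code; the segment size, the hash algorithm, and all of the names are made up for illustration only.

    ```python
    import hashlib

    SEGMENT_SIZE = 128 * 1024  # illustrative value only; the real segment size is product-defined

    def fingerprint_file(path, so_cache):
        """Split a file into fixed-size segments and fingerprint each one.

        `so_cache` is an in-memory dict standing in for the client's cached SO
        entries; it grows with every unique segment seen, which is why RAM
        scales with the size of the backup selection.
        """
        fingerprints = []
        with open(path, "rb") as f:
            while True:
                segment = f.read(SEGMENT_SIZE)
                if not segment:
                    break
                # CPU cost: one hash computed per segment across the whole selection.
                fp = hashlib.sha256(segment).hexdigest()
                fingerprints.append(fp)
                so_cache.setdefault(fp, len(segment))  # one cache entry per unique segment
        return fingerprints
    ```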

    The mechanism of client-side deduplication is a means by which each client can 'filter out' those SO entries that it already knows exist on the storage server. Instead of sending all of its generated SO fingerprints to the server, it sends only what it sees as new SO fingerprint entries. That subset is then checked against the server's own cached memory to see whether the server already holds the same SO entries as a result of backups from other clients.
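
    The filtering step itself looks conceptually something like this. Again, just an illustrative sketch: `send_to_server` is a hypothetical stand-in for the real client/server exchange, and both caches are plain in-memory collections.

    ```python
    def client_side_filter(segments, client_cache, send_to_server):
        """Filter segments on the client before anything crosses the wire.

        `segments` is a list of (fingerprint, data) pairs, `client_cache` is a
        set standing in for the client's cached SO entries, and `send_to_server`
        asks the storage server which of the remaining fingerprints it is missing.
        """
        # Step 1: drop anything the client already knows the server holds.
        candidates = {fp: data for fp, data in segments if fp not in client_cache}

        # Step 2: the server checks the remaining fingerprints against its own
        # cache (covering backups from *all* clients) and answers with the
        # subset it does not yet hold.
        missing = send_to_server(list(candidates.keys()))

        # Step 3: only the genuinely new segment data is sent.
        payload = [(fp, candidates[fp]) for fp in missing]
        client_cache.update(candidates.keys())
        return payload
    ```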

    Client-side dedupe has no effect on the RAM requirements of the server, since the server must always hold all of the active SO entries for all of its PO/DO entries. What it does do is reduce the amount of data transferred from client to server and reduce the workload needed to process that data on the server. I would think it a hard metric to determine how much work is offloaded from the server back to the client. As I would say: "Your mileage will vary, subject to change without notice, void where prohibited, must be over 18 to play."  8-)

     

    SO – Segment Object

    PO – Path Object

    DO – Data Object