
Sylvain_S
6 years ago

MSDP: Variable Length Deduplication

Hello all,

It seems that there is no NBU stream handler for HNAS backups (Hitachi filers), and the deduplication rate is very poor. That is why I'm thinking about implementing Variable Length Deduplication (VLD, introduced in NBU 8.1.1) for that kind of data only.

Are there any theoretical or practical recommendations regarding the segment size range (minimum segment size and maximum segment size) to get a good deduplication rate for HNAS backups? Any real-world experience is welcome :-)

Thanks in advance,

Sylvain

  • Hello Sylvain,

    VLD uses more CPU cycles to compute segment boundaries, so it takes more time than fixed-length deduplication. Enable it only on a powerful, well-resourced server or, in the case of NetBackup appliances, on model 5240 or newer.

    The practical values will depend on the size of the files on the backed-up share, so you will have to find the best values by trial and error. Without VLD you are using a segment size of 128 KB and not getting a good dedupe ratio. Try 64 KB first and then 256 KB to find out whether you need to go down or up (relative to 128 KB).

    Theoretical values for the segment range:

    VLD_MIN_SEGKSIZE - The minimum size of the data segment for variable-length deduplication, in KB. The segment size must be a multiple of 4 and fall between 4 and 16384 KB. The default value is 64 KB. The value must be less than VLD_MAX_SEGKSIZE. Different NetBackup clients can have different segment sizes.

    VLD_MAX_SEGKSIZE - The maximum size of the data segment for variable-length deduplication, in KB. It sets the upper boundary for the data segments. The segment size must be a multiple of 4 and fall between 4 and 16384 KB. The default value is 128 KB.

    By default, VLD is disabled. To give customers flexibility, it can be enabled per policy name or per client name in pd.conf.

    VLD_CLIENT_NAME specifies the name of the client for which to enable variable-length deduplication. By default, this parameter is not present in the pd.conf configuration file.
    VLD_CLIENT_NAME = clientname
    You can add a maximum of 50 VLD_CLIENT_NAME items in the pd.conf file.

    VLD_POLICY_NAME specifies the name of the backup policy for which to enable variable-length deduplication. By default, this parameter is not present in the pd.conf configuration file.
    VLD_POLICY_NAME = policyname
    You can add a maximum of 50 VLD_POLICY_NAME items in the pd.conf file.
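The rules quoted in this reply can be checked before editing pd.conf. Below is a minimal Python sketch, assuming the segment-size parameters use the same `NAME = value` form shown for VLD_CLIENT_NAME and VLD_POLICY_NAME. The helper names and the policy name in the usage example are hypothetical and not part of NetBackup.

```python
def validate_vld_settings(min_kb, max_kb, client_names=(), policy_names=()):
    """Return a list of problems; an empty list means the quoted rules are met."""
    problems = []
    for label, value in (("VLD_MIN_SEGKSIZE", min_kb),
                         ("VLD_MAX_SEGKSIZE", max_kb)):
        if value % 4 != 0:
            problems.append(f"{label} must be a multiple of 4")
        if not 4 <= value <= 16384:
            problems.append(f"{label} must be between 4 and 16384 KB")
    if min_kb >= max_kb:
        problems.append("VLD_MIN_SEGKSIZE must be less than VLD_MAX_SEGKSIZE")
    if len(client_names) > 50:
        problems.append("at most 50 VLD_CLIENT_NAME items are allowed")
    if len(policy_names) > 50:
        problems.append("at most 50 VLD_POLICY_NAME items are allowed")
    return problems


def render_pd_conf_lines(min_kb, max_kb, client_names=(), policy_names=()):
    """Build candidate pd.conf lines (assumed syntax), or raise on invalid input."""
    problems = validate_vld_settings(min_kb, max_kb, client_names, policy_names)
    if problems:
        raise ValueError("; ".join(problems))
    lines = [f"VLD_MIN_SEGKSIZE = {min_kb}", f"VLD_MAX_SEGKSIZE = {max_kb}"]
    lines += [f"VLD_CLIENT_NAME = {name}" for name in client_names]
    lines += [f"VLD_POLICY_NAME = {name}" for name in policy_names]
    return lines


if __name__ == "__main__":
    # "hnas_policy" is a made-up policy name for the example.
    for line in render_pd_conf_lines(64, 256, policy_names=["hnas_policy"]):
        print(line)
```

Limiting VLD to a single policy this way matches the goal in the question of enabling it for the HNAS data only.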

     
