Forum Discussion

Kenneth_Hansen's avatar
14 years ago

Duplication from DeDup server: very poor performance

Hi,

NBU version 7.0 on Windows 2008 R2.
Duplicating using Vault from the DeDup server to FC LTO4 drives.

We are seeing speeds of less than 30 MB/s from disk to tape.
From client to DeDup server we see much greater speeds, up to 70 MB/s (don't know if this is a limit from the client, which is a file server).
NUMBER_DATA_BUFFERS and SIZE_DATA_BUFFERS are set for the LTO4 drives.

Is there any way to enhance duplication performance from the DeDup server to tape?
NUMBER_DATA_BUFFERS_DISK? 

Thank you :)

  • Obviously disk I/O and network bandwidth are major factors in the overall performance, but I seldom see re-hydration to tape perform better than 30-40MB/s regardless of CPU/memory/disk/net. White papers tell a different story, but reality with real load on the systems is another matter.

    There isn't much tuning available in PDDE/PDDO as such, apart from optimizing the multipathing I/O policies, using correct block sizes for the file system, and so on. We could also look at thread pools, block sizes, and the number of buffers in the OST plugin interface.

    When you re-hydrate to tape, have a look in the bptm log. If you find the bptm process writing to tape has a huge number of delayed waits (for full buffers), then you know that the sending bpdm/bptm is not able to retrieve data fast enough from PDDE/PDDO. Look at the I/O performance of the file system on the PDDE/PDDO server together with CPU performance. If the spoold process is not doing a lot, then the disk system can't deliver.
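
    If it helps, here is a rough sketch of how you could total up those buffer-wait counters instead of eyeballing the log. The log path is an assumption (default Windows install location plus a placeholder for the date-stamped file name), and the exact wording of the bptm summary lines can vary by version, so treat the regex as something to adjust against your own bptm log:

    ```python
    import re
    import sys

    # Assumed default log location on a Windows media server; "mmddyy.log" is a
    # placeholder for the date-stamped bptm log you want to inspect.
    LOG = r"C:\Program Files\Veritas\NetBackup\logs\bptm\mmddyy.log"

    # bptm usually summarises buffer starvation with lines along the lines of
    # "waited for full buffer 1234 times, delayed 5678 times" -- the exact
    # wording can differ between versions, so adjust this pattern as needed.
    PATTERN = re.compile(r"waited for (full|empty) buffer (\d+) times, delayed (\d+) times")

    waits = {"full": 0, "empty": 0}
    delays = {"full": 0, "empty": 0}

    with open(sys.argv[1] if len(sys.argv) > 1 else LOG) as fh:
        for line in fh:
            m = PATTERN.search(line)
            if m:
                kind = m.group(1)
                waits[kind] += int(m.group(2))
                delays[kind] += int(m.group(3))

    # Lots of "full buffer" delays on the tape-writing bptm means the PDDE/PDDO
    # side is not feeding data fast enough; "empty buffer" delays point at tape.
    print("full-buffer  waits:", waits["full"], "delays:", delays["full"])
    print("empty-buffer waits:", waits["empty"], "delays:", delays["empty"])
    ```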

    Also, while we are talking about delays, it is important to understand the underlying tape technology. Use tape drives that can better handle slow stream speeds. By not having to wait for the tape drive to re-position, you can gain a substantial increase in write performance.

    Buffer size and number tuning: contrary to popular belief, don't use huge sizes and many buffers for disk. Rather use, say, 24-32 buffers, with sizes of 64k to 256k. I find that with NUMBER_DATA_BUFFERS = 32 and SIZE_DATA_BUFFERS = 262144, I average 50-60MB/s re-hydration, with ~50% of the duplicated images performing in the ~70-120MB/s range (using LTO4 or TS1130 drives).
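
    For reference, those two settings are just single-number touch files on the media server. A minimal sketch of putting the values above in place on Windows is below; the install path is an assumption (the default location), so adjust it for your environment, and the same pattern applies to NUMBER_DATA_BUFFERS_DISK / SIZE_DATA_BUFFERS_DISK if you want to experiment with the disk side. The values are only picked up by jobs started after the files are written.

    ```python
    import os

    # Assumed default NetBackup install path on Windows; adjust for your media server.
    CONFIG_DIR = r"C:\Program Files\Veritas\NetBackup\db\config"

    # Values from this thread: 32 buffers of 256 KB for the LTO4 drives.
    settings = {
        "NUMBER_DATA_BUFFERS": 32,
        "SIZE_DATA_BUFFERS": 262144,
    }

    for name, value in settings.items():
        path = os.path.join(CONFIG_DIR, name)
        # Each touch file contains just the number and nothing else.
        with open(path, "w") as fh:
            fh.write(f"{value}\n")
        print(f"wrote {value} to {path}")
    ```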

    If we tune the buffers to lower values, we can hit bad performance for backup-to-tape (B2T) jobs. So it is important to find a balance that suits your needs... In some environments (Windows) you may actually see the best performance without any tuning at all...

    But, in the end, in my opinion, re-hydration performance is slow due to a design not geared for anything but backup and optimized duplication to another PDDE/PDDO destination. I guess the solution would be to stop duplicating to tape, as the product can't do it properly...

    /A

  • Performance depends on several areas:

    - What is the underlying storage of your dedup disk (iSCSI, JBOD, SAN, etc.)?

    - How did you format the dedup disk?

    - How many streams do you allow towards the dedup area?

    - Were there backups running during your vault jobs?

    - Have you monitored your dedup disk, e.g. Avg. Disk sec/Read, Avg. Disk Read Queue Length, etc.? (A quick sampling sketch follows at the end of this reply.)

    - Have you run a defrag report on the dedup partition?

    You should determine where your bottleneck is located before you can try to improve it. And remember: backups towards the dedup area will almost always perform better than restores/duplications, because less data needs to be physically transported (thanks to dedup).
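
    For the monitoring point above, a quick way to sample those disk counters without building a Perfmon data collector is to shell out to Windows' built-in typeperf, roughly as sketched below. The drive letter is a placeholder for the dedup volume, and the counter names assume the standard English LogicalDisk counter set; sustained Avg. Disk sec/Read much above ~20 ms during a duplication usually points at a read-latency bottleneck.

    ```python
    import subprocess

    # Placeholder: the dedup volume as it appears as a LogicalDisk instance in Perfmon.
    DISK = "E:"

    counters = [
        rf"\LogicalDisk({DISK})\Avg. Disk sec/Read",
        rf"\LogicalDisk({DISK})\Avg. Disk Read Queue Length",
    ]

    # typeperf ships with Windows: -si is the sample interval in seconds,
    # -sc the number of samples (here: one minute of 5-second samples).
    subprocess.run(["typeperf", *counters, "-si", "5", "-sc", "12"], check=True)
    ```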

  • As a hint to determine whether the issue is the tape subsystem: try doing a duplication to disk. If performance still isn't satisfying, you need to look at the source disk system.

  • MarcoV@NL:

    There are 5 DeDuplication servers set up in our environment, and I find that there is a mix of storage systems used for our deduplication disks: a mix of SAS and SATA disk subsystems, all over FC. I've checked fragmentation, and the worst I can find is 2% fragmented.

    I have not done any I/O checks, but I can do so just to see.

     

    Nicolai:

    I might be able to test that. But still, others must have had some of the same issues that I'm seeing, and have found some way to tweak the performance.

  • Any dedupe operation relies heavily on small disk I/O. Try a disk performance tool like IOmeter (or the quick random-read sketch at the end of this reply).

    General I/O issues could be array misalignment and multiple LUNs on the same physical disks.
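
    If IOmeter isn't at hand, a crude random-read probe such as the sketch below at least shows whether small reads on the dedup volume are in a sane ballpark. The test file path and block size are placeholders, and because the OS cache is not bypassed the numbers will be optimistic -- it's a first-pass check, not a replacement for a proper benchmark.

    ```python
    import os
    import random
    import time

    # Placeholder: any large existing file on the dedup volume, so reads hit that disk.
    TEST_FILE = r"E:\dedup\testdata.bin"
    BLOCK = 64 * 1024      # 64 KB reads, roughly the size of small dedup segments
    SAMPLES = 2000

    size = os.path.getsize(TEST_FILE)
    latencies = []

    with open(TEST_FILE, "rb", buffering=0) as fh:
        for _ in range(SAMPLES):
            fh.seek(random.randrange(0, max(1, size - BLOCK)))
            start = time.perf_counter()
            fh.read(BLOCK)
            latencies.append(time.perf_counter() - start)

    latencies.sort()
    avg_ms = sum(latencies) / SAMPLES * 1000
    p95_ms = latencies[int(SAMPLES * 0.95)] * 1000
    throughput = SAMPLES * BLOCK / sum(latencies) / (1024 * 1024)
    print(f"avg {avg_ms:.2f} ms, p95 {p95_ms:.2f} ms, ~{throughput:.1f} MB/s random read")
    ```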
