Forum Discussion

rshehab
Level 3
8 years ago

NetBackup DFSR backup very slow

Hello, we recently deployed NetBackup 7.7.3 on Windows 2012. We have Data Domain storage. Everything is configured following the Veritas and EMC guides.

Everything is superb except when it comes to DFSR folders. After some reading we understood that "Use Change Journal" is not supported with DFSR.

We understood we should add the backup selections like:

Shadow Copy Components:\User Data\Distributed File System Replication\DfsrReplicatedFolders\LosAngeles
Shadow Copy Components:\User Data\Distributed File System Replication\DfsrReplicatedFolders\NewYork
Shadow Copy Components:\User Data\Distributed File System Replication\DfsrReplicatedFolders\Denver

We have 2 TB of data, and the durations of the full and incremental backups are not acceptable. I read a lot of posts from people complaining about the same issue when backing up the Shadow Copy Components, but none was helpful.

I think I missed something. Your help will be much appreciated.

Let me know if I should provide more info.

38 Replies

  • What we found helpful about DFS-R backup:

    • No more than two concurrent backup streams in the DFS-R areas
    • Ensure backups of local disks via VSS don't run at the same time as the DFS-R backup.

    Best Regards

    Nicolai

  • As DFSR backup now works from a VSS snapshot, you will be limited by how fast you can read the snapshot(s), which in turn depends on how much activity there is on the original volume(s). One thing to be aware of is that VSS is pretty heavy on virtual memory, so a bigger pagefile can help.

    Playing with the number and size of the data buffers might give some improvement.
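    For reference, buffer tuning on a NetBackup media server is done through touch files named NUMBER_DATA_BUFFERS and SIZE_DATA_BUFFERS in NetBackup's db\config directory, each holding a single number. A minimal sketch in Python; the relative path and the values here are illustrative assumptions only, not recommendations (test any change before relying on it):

```python
from pathlib import Path

# Illustrative stand-in path; on a real Windows media server this would be
# <install_path>\NetBackup\db\config (or /usr/openv/netbackup/db/config on UNIX).
config_dir = Path("db/config")
config_dir.mkdir(parents=True, exist_ok=True)

# Each touch file contains just one number; these are example starting
# points only - measure before and after changing them.
(config_dir / "SIZE_DATA_BUFFERS").write_text("262144\n")    # buffer size in bytes (256 KB)
(config_dir / "NUMBER_DATA_BUFFERS").write_text("256\n")     # number of buffers per stream

print((config_dir / "SIZE_DATA_BUFFERS").read_text().strip())
```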

    There might also be a setting in Windows that tells it not to use VSS for backup/restore of DFSR.

    And you can consider stopping the DFSR service before the backup and starting it again afterwards, like in the earlier days.

  • Please tell us what 'not acceptable' amounts to.
    What kind of transfer rates are you seeing?

    Are you running backups as a single stream or multiple simultaneous streams?

    What type of network connectivity do you have between Fileserver -> media server -> DD?

    When was fragmentation last checked on the filesystems?

    Do you have Accelerator selected in the policy? 
    Although Change Journal is not supported, Accelerator is supported.

    • rshehab
      Level 3

      Hello, thank you for replying.

      Check the attached image. I selected a few folders, around 100 GB only; see the time taken to complete. I am sure it is not a communication issue, as I have other non-DFSR jobs running smoothly.

      In addition, we still have the old backup software from Arcserve running. Its backup jobs of the DFSR folders were taking less time, but without "Shadow Copy".

      Total Directories............ 549,174
      Total File(s)................ 13,425,612
      Total Skip(s)................ 0
      Total Size (Disk)............ 2.16 TB
      Total Size (Media)........... 2.21 TB
      Elapsed Time................. 21h 1m 34s
      Average Throughput........... 1.79 GB/min

       

      Please help, as we need to shut down the old backup software and continue with NetBackup.

      Thanks.

      • sdo
        Moderator

        OK - we're definitely not comparing apples with apples here.

        You are comparing:

        a) ArcServe plain file backup, not SCC: based, unknown whether VSS based, reading unconfirmed client, using unconfirmed media server, sending to unconfirmed storage, selecting unconfirmed client source paths

        ...comparing with:

        b) NetBackup SCC: based DFSR backup, which will be using VSS, reading unconfirmed storage, using a different media server, sending to DD storage, selecting different source paths?

        ...so I think we are going to drop any notion of debugging a slow backup by comparing unknown situations between two different applications and configurations.

        .

        Now then, let's get back to the png screen shot that you posted.  Yes, this looks really slow:

        elapsed time 17:09:24 hh:mm:ss
        elapsed 61,764.0 seconds
        total 106,744,880,128.0 bytes
        sent 54,760,727,552.0 bytes
        names 835,916.0 names
        sent 53,477,273.0 KB
        sent 52,223.9 MB
        sent 51.0 GB
        total 104,243,047.0 KB
        total 101,799.9 MB
        total 99.4 GB
        total - disk read 1.6 MB/s
        sent - LAN speed 0.8 MB/s
        names throughput 13.5 names/second
        waited for full buffer 44,537.0 times
        delayed count 3,861,463.0 times
        parent delay 1.5 ms
        parent delay 0.0015 seconds
        delayed seconds 5792.1945 seconds
        delayed time 01:36:32 hh:mm:ss
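
        The figures in that list are straight arithmetic on the byte, name and delay counters from the job details; a quick sketch reproducing them (numbers copied from the table above):

```python
# Reproduce the throughput figures from the job details above.
MB = 1024 * 1024  # binary MB, as used in the figures

elapsed_s = 17 * 3600 + 9 * 60 + 24   # 17:09:24 -> 61,764 seconds
total_bytes = 106_744_880_128         # total read from disk
sent_bytes = 54_760_727_552           # sent over the LAN
names = 835_916                       # files/folders walked
delays = 3_861_463                    # bptm delayed count
delay_each_s = 0.0015                 # parent delay, 1.5 ms

disk_read_mb_s = total_bytes / MB / elapsed_s   # ~1.6 MB/s
lan_send_mb_s = sent_bytes / MB / elapsed_s     # ~0.8 MB/s
names_per_s = names / elapsed_s                 # ~13.5 names/s
delayed_s = delays * delay_each_s               # ~5,792 s, i.e. ~1h36m

print(f"read {disk_read_mb_s:.1f} MB/s, sent {lan_send_mb_s:.1f} MB/s, "
      f"{names_per_s:.1f} names/s, delayed {delayed_s / 3600:.2f} h")
```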

        .

        What really strikes me about this is not only the very slow disk read of 1.6 MB/s (and FYI, Accelerator reduces this to a 0.8 MB/s LAN send speed)... what is most striking is the "bptm delayed count", which sums to a total delay of about 1.5 hours in a 17-hour backup job. i.e. the NetBackup media server only spent 1.5 of the 17 hours waiting for data, which means that the NetBackup media server itself is probably struggling.

        The backup job ID of 1401 is quite low, which is indicative of either a newly built NetBackup environment or a test environment.

        So, is this a new build production environment, or a virtualised test environment?

        Was this backup job being sent over a WAN link?

      • Marianne
        Level 6

         mad_jock

        Following that TN is good, but seeing you found this discussion, can I ask if you have read through the entire discussion from 'page 1'?

        There have been lots of suggestions but rshehab eventually stopped responding.

  • Some simple calcs show that (based on size only), to take a 2 TB full backup in, say, 12 hours (from 20:00 Saturday night to 08:00 Sunday), we need to achieve an average minimum sustained throughput of about 49 MB/s:

    size 2 TB
    size 2,048 GB
    size 2,097,152 MB
    time 12 hours
    time 43,200 seconds
    speed 49 MB/s

     

    I'd be interested to know how many total leaf objects across the namespaces make up that 2 TB, i.e. how many folders plus how many files?
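
    As a sanity check, the same required-rate calc can be parameterised for any data size and backup window (a sketch; binary units as in the table above):

```python
def required_mb_per_s(size_tb: float, window_hours: float) -> float:
    """Minimum sustained throughput (MB/s) needed to move size_tb in window_hours."""
    size_mb = size_tb * 1024 * 1024         # TB -> MB (binary units)
    return size_mb / (window_hours * 3600)  # MB per second

# 2 TB in a 12-hour window, as in the calc above:
print(f"{required_mb_per_s(2, 12):.0f} MB/s")  # ~49 MB/s
```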

  • Right, so I didn't read the rest of the thread. How have you configured your backup policy? Do you use a dedicated NIC for backups? Do you use multiple streams? Where do you do duplication (server or client)? How long do you keep the backup image on disk?

    Some questions may not be relevant, but those are the questions I used as guidelines to help me achieve an acceptable backup time (2.48 hours for 1.8 TB).

    HTH.