DFSR backup error when using deduplication client side

Hamza_H

Hello,

We are facing a serious problem backing up DFSR directories plus E:\ (almost 15 TB) when client-side deduplication is enabled.

The error is: Error bptm (pid=6756) image copy failed: error 2060056: Stream Handler error

The job exit status is 14, 13, or 40, but the detailed status always shows the same error.

When we disable client-side deduplication, the backup succeeds but takes a very long time (17 hours).

The throughput is fine.

The client server has 20 cores and 128 GB of RAM (so I assume that is plenty for deduplication).

We tried tuning the memory cache setting in pd.conf by changing it from:

FP_CACHE_MAX_MBSIZE: 20 (MB)

to

FP_CACHE_MAX_MBSIZE : 4000 (MB)

But this didn't change anything; we get the same error.

Can you please advise?

PS: Accelerator is also enabled, as is "Allow multiple data streams" (the backup selection has only 2 streams).

NetBackup version on master, media, and client: 8.2.

Client OS: Windows Server 2019.

For the tags, I couldn't find 8.2, so I used 8.1.2.

Thanks,

BR.

RiaanBadenhorst

Personally, I've never seen a DFSR backup perform well. As soon as it goes down the "Shadow copy components" route it's just slow, no matter what you do.

I don't know if others have had better experiences.

Hamza_H

Hello @RiaanBadenhorst , thanks for your reply,

That's what I thought too :( I hope others can help..

sdo

My DFSR backups quite often tend to be quite slow too.

You have 15TB on DFSR on Windows 2019, but there are still some limits, like 70M files :

https://docs.microsoft.com/en-us/windows-server/storage/dfs-replication/dfsr-faq

...and other postings around the web note various performance requirements:

https://www.theregister.co.uk/2010/09/28/sysadmin_ntsf_file_limits/

https://techcommunity.microsoft.com/t5/Storage-at-Microsoft/Understanding-DFS-Replication-quot-limit...

.

Are your two backup streams concurrent?  Maybe make the two streams sequential, because I'm not sure how VSS reacts when two concurrent backup jobs both try to establish shadows at almost the same time.

Hamza_H

Hello @sdo ,

Thank you for your reply,

I took a look at the links you shared; they look interesting, and I will forward them to our customer.

However, regarding your question:

Are your two backup streams concurrent?  Maybe make the two streams sequential, because I'm not sure how VSS reacts when two concurrent backup jobs both try to establish shadows at almost the same time.

The backup selection is configured as follows:

NEW_STREAM
E:\
NEW_STREAM
Shadow Copy Components:\User Data\Distributed File System Replication\DfsrReplicatedFolders\Applications
Shadow Copy Components:\User Data\Distributed File System Replication\DfsrReplicatedFolders\Data
Shadow Copy Components:\User Data\Distributed File System Replication\DfsrReplicatedFolders\Data {84…)

Etc..

So even if the streams run concurrently, I don't think that would be a problem, right? Or am I wrong? Besides, the backup succeeds when client-side deduplication is disabled.
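To make the stream layout explicit, here is a minimal Python sketch that groups a selection list into streams by splitting on the NEW_STREAM lines. The function name `split_streams` is hypothetical and this is not a NetBackup API; it only illustrates how the list above yields two streams. The truncated DFSR entry and the "Etc." entries are left out because their full paths are not shown.

```python
def split_streams(selection_lines):
    """Group backup selection entries into streams at each NEW_STREAM line."""
    streams = []
    for raw in selection_lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines between entries
        if line.rstrip(":").upper() == "NEW_STREAM":
            streams.append([])  # start a new stream
            continue
        if not streams:
            streams.append([])  # entries before any directive form one stream
        streams[-1].append(line)
    return [s for s in streams if s]


DFSR_ROOT = (
    r"Shadow Copy Components:\User Data\Distributed File System Replication"
    r"\DfsrReplicatedFolders"
)

selection = [
    "NEW_STREAM",
    "E:\\",
    "NEW_STREAM",
    DFSR_ROOT + r"\Applications",
    DFSR_ROOT + r"\Data",
]

streams = split_streams(selection)
for number, entries in enumerate(streams, start=1):
    print(f"stream {number}: {len(entries)} entries")
```

With "Allow multiple data streams" enabled, each group becomes its own child job, which is why the E:\ job and the DFSR job can start at the same time.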

sdo

Ok, so the first stream is an NTFS volume, and the second stream is multiple (at least three) DFSR areas.

Are any of the DFSR areas in that policy also on the same E: volume?

And, you didn't actually show, or state clearly, whether the two streams really do run concurrently.  Do they?

If yes, then you still have VSS attempting to shadow twice, which might not be a good thing.

If yes, then is there any specific reason why you need to run both streams concurrently ?

.

This link states that multiple DFSR backups cannot run concurrently:

https://www.veritas.com/content/support/en_US/article.100010224

...but I have not been able to find any documentation (from Microsoft or Veritas) that explicitly states whether it is supported to also back up the hosting NTFS volume at the same time as DFSR areas hosted on that same NTFS volume. Personally, I doubt it, because both involve calls to VSS.

sdo

OK, it seems that having multiple VSS shadows at the same time is quite normal:

https://serverfault.com/questions/289511/how-many-services-can-make-use-of-windows-vss-concurrently

Is the E: drive used for anything else? I mean, for example, the NetBackup Client itself is not installed on E:, is it?

If the E: volume is not used for anything else, then why is it being backed up?

.

Maybe the underlying disk is just too busy when two streams are running concurrently, so client-side dedupe struggles to keep the data flowing and keep the conversation alive. Have you checked the disk IO queue length and the disk IO response/latency times? Maybe you could increase the client read timeout. In my experience, it is quite normal for NetBackup admins to increase it from the default of 300 seconds to 600, and perhaps to 900 seconds, but I would be a bit wary of going beyond that, because it is a global-ish (server-wide) setting:

https://www.veritas.com/content/support/en_US/doc/18716246-126559472-0/v40569182-126559472
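For the disk check, one rough way to sample queue length and read latency on the client is the built-in Windows `typeperf` tool. The Python sketch below is only an illustration: the helper names are hypothetical, the `_Total` instance is an assumption (you may prefer the specific disk behind E:), and the sample output is made up so the parsing can be tried without a Windows host.

```python
import csv
import io
import subprocess

# Standard Windows performance counters; the _Total instance is an assumption.
COUNTERS = [
    r"\PhysicalDisk(_Total)\Avg. Disk Queue Length",
    r"\PhysicalDisk(_Total)\Avg. Disk sec/Read",
]


def sample_counters(samples=5, interval_s=2):
    """Run typeperf on the Windows client and return its CSV text."""
    cmd = ["typeperf", *COUNTERS, "-si", str(interval_s), "-sc", str(samples)]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def average_by_counter(csv_text):
    """Average each counter column of typeperf CSV output."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    start = next(
        (i for i, r in enumerate(rows) if r and r[0].startswith("(PDH-CSV")),
        None,
    )
    if start is None:
        return {}
    header, data = rows[start], rows[start + 1:]
    averages = {}
    for col in range(1, len(header)):  # column 0 is the timestamp
        values = []
        for row in data:
            try:
                values.append(float(row[col]))
            except (ValueError, IndexError):
                continue  # skip blank lines and typeperf status messages
        if values:
            averages[header[col]] = sum(values) / len(values)
    return averages


# Made-up sample in typeperf's CSV layout, for illustration only.
SAMPLE = r'''
"(PDH-CSV 4.0)","\\HOST\PhysicalDisk(_Total)\Avg. Disk Queue Length","\\HOST\PhysicalDisk(_Total)\Avg. Disk sec/Read"
"10/30/2019 17:28:25.000","2.0","0.010"
"10/30/2019 17:28:27.000","4.0","0.030"

Exiting, please wait...
The command completed successfully.
'''

averages = average_by_counter(SAMPLE)
for counter, value in averages.items():
    print(f"{counter}: {value:.3f}")
```

On the real client you would call `average_by_counter(sample_counters())` while both streams are running, then again with a single stream, and compare the two sets of averages.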

Hamza_H

Thanks @sdo for all your replies,

So what if we change the backup selection by removing the E:\ partition (I have to ask the customer whether NBU is installed on this partition) and keep only the DFSR directories? In that case there would be no multiple VSS requests, only the one for the DFSR content. What do you think?

Hamza_H

@sdo , please find below my answers :

 

Ok, so the first stream is an NTFS volume, and the second stream is multiple (at least three) DFSR areas.

> The E:\ partition is in one stream and the DFSR content is in another, so we have a parent job + 2 streams = 3 jobs.

Are any of the DFSR areas in that policy also on the same E: volume?

And, you didn't actually show, or state clearly, whether the two streams really do run concurrently. Do they?

> Yes, they are running concurrently.

If yes, then is there any specific reason why you need to run both streams concurrently ?

> I don't think so. I have just asked the customer to create a new policy for the E:\ volume and to make sure its execution window does not overlap the DFSR policy's schedule.

Is the E: drive used for anything else? I mean, for example, the NetBackup Client itself is not installed on E: is it ?

If the E: volume is not used for anything else, then why is it being backed-up ?

> I have just asked the customer about this; I will get back to you as soon as possible.

sdo

I think you may have missed this question too - maybe...  :\

Are any of the DFSR areas in that policy also on the same E: volume?

Hamza_H

Hello @sdo , thanks again for your reply,

Sorry, I forgot to answer that when I was typing. I have asked the EC, I'm waiting for his reply to confirm.

So, what do you think about removing the E:\ partition from the backup selection and keeping only the DFSR content?

sdo

I would avoid backing up E: if there were genuinely no need for it, but then a new potential problem arises: if someone takes one of the DFSR areas out of DFSR, it will no longer be protected by the DFSR backup. So here's what I would do. The DFSR backups are more important than the "just in case" E: volume backup, which is usually technically empty, so I would change the order, i.e. make the E: stream the second stream, and then set the policy's max jobs to 1. I am assuming that there is only one client in that backup policy. But then, what happens to user performance when accessing share/folder/file resources within the DFSR areas if a VSS-based NetBackup backup of E: is still active when the user day starts?

Hamza_H

Hi @sdo ,

The EC confirmed that there is no DFSR content on E:\ and that NBU is installed on the C:\ partition.

He ran a test backup with only the DFSR content and got the same error (I attached the detailed status).

The errors:

Oct 30, 2019 5:28:25 PM - Error bpbrm (pid=3060) from client fs19-ch-1: ERR - Unable to backup System State or Shadow Copy. Please check the state of VSS and associated Writers.
Oct 30, 2019 7:34:28 PM - Error bpbrm (pid=3060) from client fs19-ch-1: ERR - Unable to backup System State or Shadow Copy. Please check the state of VSS and associated Writers.dir - 3 128 0 5 190 -1 92 16832 root root 0 1572460468 1572460468 1572460468 1 /Shadow Copy Components/User Data/Distributed File System Replication/DfsrReplicatedFolders/

I asked him to check whether the WMI writer is enabled and running, and also asked for the event logs (System and Application) plus the bpfis and bpbkar logs.
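For the writer check, a quick way to see every VSS writer's state is `vssadmin list writers` from an elevated prompt on the client. The Python sketch below is just an illustration of filtering that output for writers that are not "Stable" or that report a last error. The helper names are hypothetical, and the sample text is made up (it is not output from this client, and the Writer Id lines are omitted).

```python
import re
import subprocess


def list_writers_output():
    """Run `vssadmin list writers` (needs an elevated prompt on Windows)."""
    cmd = ["vssadmin", "list", "writers"]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def parse_writers(text):
    """Return one dict per writer with its name, state, and last error."""
    writers = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = re.match(r"Writer name: '(.*)'", line)
        if match:
            current = {"name": match.group(1), "state": "", "last_error": ""}
            writers.append(current)
        elif current is not None and line.startswith("State:"):
            current["state"] = line.split(":", 1)[1].strip()
        elif current is not None and line.startswith("Last error:"):
            current["last_error"] = line.split(":", 1)[1].strip()
    return writers


def unhealthy(writers):
    """Writers whose state is not Stable or whose last error is set."""
    return [
        w for w in writers
        if "Stable" not in w["state"] or w["last_error"] != "No error"
    ]


# Made-up sample in vssadmin's layout, for illustration only.
SAMPLE = """
Writer name: 'DFS Replication service writer'
   State: [1] Stable
   Last error: No error

Writer name: 'WMI Writer'
   State: [7] Failed
   Last error: Timed out
"""

problem_writers = unhealthy(parse_writers(SAMPLE))
for writer in problem_writers:
    print(writer["name"], "|", writer["state"], "|", writer["last_error"])
```

On the client itself you would pass `list_writers_output()` to `parse_writers` instead of the sample, ideally right after a failed backup so the writer states still reflect the failure.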

What do you think?

Thanks

sdo

So it shouldn't matter that the E: stream and the DFSR stream run concurrently, because they are unrelated resources, beyond the disk IO or compute contention that running two streams on any backup client might cause anyway...

...as for the nature of the problem, it seems that Microsoft's own VSS is the root cause of that particular backup failure, i.e. VSS and DFSR are in disagreement, and it's not really a NetBackup Client issue.

I couldn't really read your previous attachment. It looked like it might have been a text file inside the zip file, but then again it looked a bit binary too, and I'm ever so sorry, but I'm not really in a position to pore over someone else's detailed logs. So it might be best to open a proper support case with Veritas. Sorry.