Windows Server 2019 file server backing up slowly

Verneti_Berny
Level 6

Hi all, I have a problem with my file server backup. Since we upgraded the file server's OS to Windows Server 2019, the backup sometimes becomes very slow.

The first time it happened I fixed it by restarting the file server, but after a month the problem came back.

Only the incremental backups are affected; the full backups are fine.

The policy contains many clients and all the others work fine; only the file server has the problem.

A backup of this client normally takes 3 hours, but now it takes 10 hours and the speed has dropped from 52 MB/s to 20 MB/s.

I don't know what happened; nothing was changed on the file server or on the master and media servers.

Can anybody help me?

 

My configuration is:

Master Server:

Windows Server 2012 R2 (Virtual Machine)

Two MSDP media servers (physical machines); we have two sites.

My File server client is Windows Server 2019 (Virtual Machine)

Our virtual environment is VMware.

Attached are two job logs, one from a normal run and the other showing the problem.

 

1 ACCEPTED SOLUTION

Yes, a red herring - the bpbkar process in 8.x is called bpbkar32.exe, and there is only a 64-bit binary available for Windows now.

What I'd be looking at is enabling Windows change journalling to avoid the costly directory walk in the first place. For the client, in "Host properties -> Clients -> Windows Client -> Client Settings", enable "Use Change Journal".
I'd also enable Accelerator in the policy (it doesn't appear to be set).

This should remove the directory walk time, which is where your backup is showing the slowness.

Another option would be to use a FlashBackup-Windows policy for the E: drive (and exclude E: from the original policy), as that drive seems to have the very large number of files that policy type is designed for.


22 REPLIES

sdo
Moderator
Partner    VIP    Certified

To me this looks like a problem in the client OS.  In this summary we see very similar delay and wait counts for the NetBackup buffers, which makes me think that the problem is not with the NetBackup servers:

was good
24/02/2020 02:30:16 - Info bptm (pid=952) using 262144 data buffer size
24/02/2020 02:30:16 - Info bptm (pid=952) setting receive network buffer to 1049600 bytes
24/02/2020 02:30:16 - Info bptm (pid=952) using 30 data buffers
24/02/2020 02:30:16 - Info scsnb3 (pid=952) Using OpenStorage client direct to backup from client uscfs1 to scsnb3
24/02/2020 05:48:55 - Info bpbkar32 (pid=10524) bpbkar waited 13852 times for empty buffer, delayed 23719 times.
24/02/2020 05:49:00 - Info scsnb3 (pid=952) StorageServer=PureDisk:scsnb3; Report=PDDO Stats (multi-threaded stream used) for (scsnb3): scanned: 745085458 KB, CR sent: 1954275 KB, CR sent over FC: 0 KB, dedup: 99.7%, cache disabled
24/02/2020 05:49:05 - end writing; write time: 3:18:47


now slow
02/03/2020 02:30:17 - Info bptm (pid=244) using 262144 data buffer size
02/03/2020 02:30:17 - Info bptm (pid=244) setting receive network buffer to 1049600 bytes
02/03/2020 02:30:17 - Info bptm (pid=244) using 30 data buffers
02/03/2020 02:30:18 - Info scsnb3 (pid=244) Using OpenStorage client direct to backup from client uscfs1 to scsnb3
02/03/2020 12:19:23 - Info bpbkar32 (pid=8148) bpbkar waited 12443 times for empty buffer, delayed 29060 times.
02/03/2020 12:19:37 - Info scsnb3 (pid=244) StorageServer=PureDisk:scsnb3; Report=PDDO Stats (multi-threaded stream used) for (scsnb3): scanned: 748291295 KB, CR sent: 1533437 KB, CR sent over FC: 0 KB, dedup: 99.8%, cache disabled
02/03/2020 12:19:43 - end writing; write time: 9:49:24

.

My guess is that it is the file-system walk on the client OS that is slow.  Something is bogging the client down.  But determining the root cause of this is going to be quite difficult.

One idea that comes to mind would be to collect level 3 (I think it is level 3) bpbkar logging, which reveals the name of each file as the file system is walked.  But manually spotting a repeated slight delay across millions of files will be near impossible.  So one answer might be to write a script / tool to process the bpbkar log and analyse the time gap between each file detected (i.e. count, sum, max, min, average, standard deviation, etc.) and see if this clearly points to a delay in walking the file system.  If so, then open a case with Microsoft.
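As a rough PowerShell sketch of that kind of gap analysis: the snippet below measures the gap between successive timestamped lines of a bpbkar log. The log file name and the assumption that each line begins with a timestamp like "2:30:17.123 AM" are mine, not taken from a real log, so check an actual bpbkar log on the client and adjust the path and the regex before trusting the numbers.

# Sketch only - gap analysis between successive bpbkar log lines.
# Assumed log location/name and timestamp format: verify against a real log first.
$logFile = 'C:\Program Files\Veritas\NetBackup\logs\bpbkar\some-log-file.log'   # hypothetical file name

$prev = $null
$gaps = foreach ($line in Get-Content $logFile) {
    if ($line -match '^(\d{1,2}:\d{2}:\d{2}\.\d{3} [AP]M)') {
        $t = [datetime]::ParseExact($Matches[1], 'h:mm:ss.fff tt', $null)
        # skip the first line and any midnight wrap-around
        if ($null -ne $prev -and $t -ge $prev) { ($t - $prev).TotalSeconds }
        $prev = $t
    }
}

$gaps | Measure-Object -Minimum -Maximum -Average -Sum     # count, min, max, average, total
$gaps | Sort-Object -Descending | Select-Object -First 20  # the 20 longest stalls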

Another idea might be to run a bpbkar to null every day and thus definitively prove that the delay has nothing at all to do with the LAN (to the NetBackup servers) or with the NetBackup storage units.  And again, when you see clear slowness from one day to the next, open a case with Microsoft.
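For reference, a "bpbkar to null" test is run on the client itself and discards everything it reads, so only the file-system walk and read speed are measured. Something along these lines should work, but the install path and the E:\ target are assumptions, and the exact options vary by version, so check the NetBackup performance tuning guide for the precise invocation:

"C:\Program Files\Veritas\NetBackup\bin\bpbkar32.exe" -nocont E:\ > NUL 2> NUL

Time how long it takes each day; if the elapsed time jumps on the days the incremental is slow, the problem is on the client side.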

Another idea might be to try to see whether the problem is, at least in part, related purely to folder walking.  Write a script (which could be a PowerShell one-liner) that only walks the folder structure, i.e. does not open any files, and see if that demonstrates sudden slowness from one day to the next.
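As a minimal sketch of such a test (assuming the data lives on E:, which may not match your layout):

# Time a pure folder walk - no files are opened, only folders are enumerated
Measure-Command {
    $global:dirs = Get-ChildItem -Path 'E:\' -Directory -Recurse -Force -ErrorAction SilentlyContinue
}
$dirs.Count   # how many folders were enumerated in that time

Run it once a day (for example from Task Scheduler) and compare the elapsed time; a sudden jump with no change in the folder count points at the file-system walk rather than at NetBackup.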

Just curious as to the nature and config of the storage that underlies the file-share.  Could you describe?  Is it a DAS, or NAS, or SAN (iSCSI or FC?) ?

sdo
Moderator
Partner    VIP    Certified

Another thing that you can do is open an RDP session to the file server and leave it logged on (re-connecting once a day to keep the RDP session alive), and inside this session use the Windows "perfmon" tool to monitor IO to the logical disk(s) hosting the file-server services/shares.  Set the sample interval to 120 seconds over 86400 seconds (i.e. every two minutes for 24 hours), and monitor counters like the following (a scripted alternative is sketched after the list):

- operations : reads/sec, writes/sec

- latency : sec/read, sec/write

- payload : avg read size, avg write size

- throughput : read MB/s, write MB/s
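If you would rather script the collection than leave perfmon open, a sketch along these lines (the E: instance name and the output path are assumptions) samples the same counters every two minutes for 24 hours:

# 720 samples x 120 seconds = 24 hours of LogicalDisk counters for the E: volume
$counters = @(
    '\LogicalDisk(E:)\Disk Reads/sec',          # operations
    '\LogicalDisk(E:)\Disk Writes/sec',
    '\LogicalDisk(E:)\Avg. Disk sec/Read',      # latency
    '\LogicalDisk(E:)\Avg. Disk sec/Write',
    '\LogicalDisk(E:)\Avg. Disk Bytes/Read',    # payload
    '\LogicalDisk(E:)\Avg. Disk Bytes/Write',
    '\LogicalDisk(E:)\Disk Read Bytes/sec',     # throughput
    '\LogicalDisk(E:)\Disk Write Bytes/sec'
)
Get-Counter -Counter $counters -SampleInterval 120 -MaxSamples 720 |
    Export-Counter -Path 'C:\temp\fileserver-disk.blg' -FileFormat BLG

The resulting .blg file can be opened in perfmon afterwards to look for latency spikes that line up with the slow backup window.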

Thanks sdo. Restarting did not fix it this time. I will follow your suggestion and let you know here.

Thanks again.

You asked:

Just curious as to the nature and config of the storage that underlies the file-share.  Could you describe?  Is it a DAS, or NAS, or SAN (iSCSI or FC?) ?

We are using Cisco HyperFlex (hyperconverged) cluster servers with VMware.

Thanks

Also what is the file system type used for the storage that is exhibiting the slow backup speed?

Systems_Team
Moderator
   VIP   

This may be a complete red herring, but I thought I'd throw it out there as this is a bit of a weird case.  In both of the job details I see this:

  • starting bpbkar32 on client

Is this a 32 bit client on a 64-bit OS?  All of my Windows clients (all 64 bit) show "starting bpbkar", not bpbkar32.

Steve

Yes, a red herring - the bpbkar process in 8.x is called bpbkar32.exe, and there is only a 64-bit binary available for Windows now.

What I'd be looking at is enabling Windows change journalling to avoid the costly directory walk in the first place. For the client, in "Host properties -> Clients -> Windows Client -> Client Settings", enable "Use Change Journal" (a quick check that the journal exists on the volume is shown below).
I'd also enable Accelerator in the policy (it doesn't appear to be set).

This should remove the directory walk time, which is where your backup is showing the slowness.

Another option would be to use a FlashBackup-Windows policy for the E: drive (and exclude E: from the original policy), as that drive seems to have the very large number of files that policy type is designed for.
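If you do go down the change-journal route, it may be worth first confirming that the USN change journal actually exists on the volume (E: assumed here); if the command reports an error, the journal is not active on that volume:

fsutil usn queryjournal E: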

Our storage file system is NTFS

I don't know why, but the job details for all of my clients show "starting bpbkar32", and all of my clients are 64-bit.

I don't know if this is wrong.

sdo
Moderator
Partner    VIP    Certified

Ignore the number 32.  It is a 64-bit executable.  The executable program name, also the process name, is a red herring with the 32 on it.  It is a hangover, a legacy from the days when Unix had 64-bit executables and Windows had 32-bit executables.  Veritas or Symantec needed to separate the code streams, i.e. there were probably two separate pieces of source code for the disk readers - but that is all irrelevant now.

So, there is only one disk reader process for modern Windows clients, a 64-bit executable named "bpbkar32".  Ignore the 32 in the name - it's just text, and utterly meaningless, and it might as well be named "bpbkar99".

Hamza_H
Moderator
   VIP   

Hello @Verneti_Berny, can you please do as @davidmoline suggested? It sounds correct.

Good luck,

 

 

I will do it, but I need to wait for the current job to finish first; after that I will make the change and re-run the job.

Thanks

sdo
Moderator
Partner    VIP    Certified

Just knocked this up for fun, so here's a simple folder walking script :

Option Explicit

Dim go_fso
Dim gs_script_spec, gs_script_path, gs_script_name, gs_script_title, gs_script_fac, gs_script_engine, gd_script_start, gd_script_end

Dim gl_folders, gl_files, gl_depth, gs_deepest, gl_longest, gs_longest
Dim gd_timings, gd_first, gd_last

  gd_script_start = Now
  Call s_init()
  Call s_main()
  gd_script_end = Now

  Call s_log( "" )
  Call s_log( gs_script_fac & "script started:    " & fs_datetime( gd_script_start ) )
  Call s_log( gs_script_fac & "script finsished:  " & fs_datetime( gd_script_end   ) )
  Call s_log( gs_script_fac & "script duration:   " & Trim( fs_duration( gd_script_start, gd_script_end ) ) & "`..." )
  Call s_log( gs_script_fac & "script exiting..." )

WScript.Quit( 0 )




Sub s_init()
  Const cs_fac = "%s_main, "

  Set go_fso = CreateObject( "Scripting.FileSystemObject" )

  gs_script_spec  = WScript.ScriptFullName
  gs_script_path  = go_fso.GetParentFolderName( gs_script_spec )
  gs_script_name  = go_fso.GetBaseName(         gs_script_spec )
  gs_script_fac	= "%" & gs_script_name & ", "

  gs_script_engine = LCase( go_fso.GetBaseName( WScript.FullName ) )
  Select Case gs_script_engine
  Case "cscript"
  Case "wscript"
    Call s_abort( cs_fac & "can only be run from command line..." )
  Case Else
    Call s_abort( cs_fac & "unknown script engine `" & gs_script_engine & "`..." )
  End Select

  Set gd_timings = CreateObject( "Scripting.Dictionary" )
  Set gd_first   = CreateObject( "Scripting.Dictionary" )
  Set gd_last    = CreateObject( "Scripting.Dictionary" )

  gl_folders = 0
  gl_files   = 0
  gl_depth   = 0
  gl_longest = 0
  gs_deepest = ""
  gs_longest = ""

  If WScript.Arguments.Count <> 1 Then Call s_abort( cs_fac & "specify a folder path to walk..." )
End Sub




Sub s_main()
  Const cs_fac = "%s_main, "
  Dim ls_root, lo_root, ls_hhmm, ll_count, ls_first, ls_last

  ls_root = WScript.Arguments.Item( 0 )
  Call s_log( "" )
  Call s_log( cs_fac & "walking:  " & ls_root )
  Set lo_root = go_fso.GetFolder( ls_root )
  Call s_walk( lo_root, 1 )
  Call s_log ( cs_fac & "...done..." )

  Call s_log( "" )
  Call s_log( cs_fac & "summary..." )
  Call s_log( cs_Fac & "...total folders:  " & gl_folders )
  Call s_log( cs_Fac & "...total files:    " & gl_files   )
  Call s_log( cs_Fac & "...max depth:      " & gl_depth   )
  Call s_log( cs_Fac & "...deepest:        " & gs_deepest )
  Call s_log( cs_Fac & "...longest:        " & gl_longest )
  Call s_log( cs_Fac & "...longest:        " & gs_longest )
  Call s_log( cs_fac & "...done..." )

  Call s_log( "" )
  Call s_log( cs_fac & "counts and locations in folders by time HH:MM..." )
  For Each ls_hhmm In gd_timings.Keys
    ll_count = gd_timings.Item( ls_hhmm )
    ls_first = gd_first.Item(   ls_hhmm )
    ls_last  = gd_last.Item(    ls_hhmm )
    Call s_log( cs_fac & "...time: " & ls_hhmm & "   count: " & ll_count )
    Call s_log( cs_fac & "......first: " & ls_first )
    Call s_log( cs_fac & "......last:  " & ls_last  )
  Next
  Call s_log( cs_fac & "...done..." )
End Sub



 
Sub s_walk( po_folder, pl_depth )
  Const cs_fac = "%s_walk, "
  Dim lo_folders, lo_folder, lo_files, lo_file, lb_accessible, ls_hhmm, ll_dummy

  If Left( po_folder.Name, 12 ) = "$RECYCLE.BIN" Then Exit Sub

  If pl_depth > gl_depth Then
    gl_depth   = pl_depth
    gs_deepest = po_folder.Path
  End If

  If Len( po_folder.Path ) > gl_longest Then
    gl_longest = Len( po_folder.Path )
    gs_longest = po_folder.Path
  End If

  Set lo_folders = po_folder.SubFolders

  lb_accessible = True
  On Error Resume Next
  ll_dummy = lo_folders.Count
  If Err.Number = 70 Then lb_accessible = False
  On Error Goto 0

  If Not lb_accessible Then Exit Sub

  gl_folders = gl_folders + 1
  gl_files   = gl_files   + po_folder.Files.Count

  ls_hhmm = fs_hhmm()

  Call s_inc( gd_timings, ls_hhmm )

  If Not gd_first.Exists( ls_hhmm ) Then gd_first.Add ls_hhmm, po_folder.Path
  If     gd_last.Exists(  ls_hhmm ) Then gd_last.Remove ls_hhmm
  gd_last.Add ls_hhmm, po_folder.Path

  If ( ( gl_folders = 1 ) Or ( ( gl_folders Mod 1000 ) = 0 ) ) Then
    Call s_log( cs_fac & "...count: " & gl_folders & "   path: " & po_folder.Path )
  End If

  For Each lo_folder In lo_folders
    Call s_walk( lo_folder, pl_depth + 1 )
  Next

' Set lo_files = po_folder.Files
' For Each lo_file In lo_files
'   ...collect a table of distribution of file sizes?
'   deepest file name?
'   longest file name?
'   oldest file?
'   top 10 largest files?
'   table of counts, and table of sizes - by file type extension name...
' Next
End Sub




Sub s_inc( pd_dic, ps_key )
  Dim ll_value
  If pd_dic.Exists( ps_key ) Then
    ll_value = pd_dic.Item( ps_key ) + 1
    pd_dic.Remove ps_key
  Else
    ll_value = 1
  End If
  pd_dic.Add ps_key, ll_value
End Sub




Sub s_bubble( pd_dic )
  Dim ls_keys, ls_items, ls_key, ls_swap_key, ls_swap_item, ll_entry, ll_i, ll_j, lb_swapped, ll_compares, ll_swaps, ll_count
'...is there actually any work to do...
  ll_count = pd_dic.Count
  If ll_count < 2 Then Exit Sub
'...extract keys and items...
  ReDim ls_keys(  ll_count )
  ReDim ls_items( ll_count )
  ll_entry = 0
  For Each ls_key In pd_dic.Keys
    ll_entry = ll_entry + 1
    ls_keys(  ll_entry ) = ls_key
    ls_items( ll_entry ) = pd_dic.Item( ls_key )
  Next
'...bubble sort by item value, keeping each key paired with its item...
  ll_compares = 0
  ll_swaps    = 0
  For ll_i = 1 To ll_count - 1
    lb_swapped = False
    For ll_j = ll_count To ll_i + 1 Step -1
      ll_compares = ll_compares + 1
      If ls_items( ll_j - 1 ) > ls_items( ll_j ) Then
        ls_swap_item         = ls_items( ll_j - 1 )
        ls_items( ll_j - 1 ) = ls_items( ll_j )
        ls_items( ll_j )     = ls_swap_item
        ls_swap_key          = ls_keys(  ll_j - 1 )
        ls_keys(  ll_j - 1 ) = ls_keys(  ll_j )
        ls_keys(  ll_j )     = ls_swap_key
        lb_swapped = True
        ll_swaps   = ll_swaps + 1
      End If
    Next
    If Not lb_swapped Then Exit For
  Next
'...reload keys and items in sorted order...
  pd_dic.RemoveAll
  For ll_entry = 1 To ll_count
    pd_dic.Add ls_keys( ll_entry ), ls_items( ll_entry )
  Next
'...cleanup...
  Erase ls_keys
  Erase ls_items
End Sub



Sub s_abort( ps_message )
  Const cs_fac = "%s_abort, "
  Call s_log( ps_message )
  Call s_log( cs_fac & "script aborting..." )
  WScript.Quit( 1 )
End Sub

Sub s_quit( ps_message )
  Const cs_fac = "%s_quit, "
  Call s_log( ps_message )
  Call s_log( cs_fac & "script quitting..." )
  WScript.Quit( 0 )
End Sub

Sub s_log( ps_message )
  Dim ls_message
  ls_message = fs_hhmmss() & "  " & ps_message
  WScript.Echo ls_message
End Sub

Function fs_hhmmss()
  Dim ld_dt
  ld_dt = Now
  fs_hhmmss = fs_zeroes( DatePart( "h", ld_dt ), 2 ) & ":" & fs_zeroes( DatePart( "n", ld_dt ), 2 ) & ":" & fs_zeroes( DatePart( "s", ld_dt ), 2 )
End Function

Function fs_hhmm()
  Dim ld_dt
  ld_dt = Now
  fs_hhmm = fs_zeroes( DatePart( "h", ld_dt ), 2 ) & ":" & fs_zeroes( DatePart( "n", ld_dt ), 2 )
End Function

Function fs_zeroes( pl_number, pl_length )
  Dim ls_result
  ls_result = String( pl_length, "0" ) & CStr( pl_number )
  ls_result = Right( ls_result, pl_length )
  fs_zeroes = ls_result
End Function

Function fs_datetime( pd_datetime )
  Dim ld_datetime, ls_result
  If VarType( pd_datetime ) = vbDate Then
    ld_datetime = pd_datetime
  Else
    ld_datetime = Now
  End If
  ls_result = WeekDayName( WeekDay( ld_datetime ), False, 1 )
  ls_result = ls_result & " " & FormatDateTime( ld_datetime, vbLongdate )
  ls_result = ls_result & " " & FormatDateTime( ld_datetime, vbLongtime )
  fs_datetime = ls_result
End Function

Function fs_duration( pd_start, pd_end )
  Dim ll_diff, ll_dddd, ll_hh, ll_mm, ll_ss, ls_duration
  ll_diff = DateDiff( "s", pd_start, pd_end )
  ll_ss = ll_diff - ( ( ll_diff \ 60 ) * 60 )
  ll_diff = ll_diff \ 60
  ll_mm = ll_diff - ( ( ll_diff \ 60 ) * 60 )
  ll_diff = ll_diff \ 60
  ll_hh = ll_diff - ( ( ll_diff \ 24 ) * 24 )
  ll_diff = ll_diff \ 24
  ll_dddd = ll_diff
  ls_duration = ""
  If ll_dddd > 0 Then
    ls_duration = ls_duration & Right( Space( 4 ) & CStr( ll_dddd ), 4 ) & " "
    ls_duration = ls_duration & fs_zeroes( ll_hh, 2 ) & ":"
  Else
    ls_duration = ls_duration & CStr( ll_hh ) & ":"
  End If
  ls_duration = ls_duration & fs_zeroes( ll_mm, 2 ) & ":"
  ls_duration = ls_duration & fs_zeroes( ll_ss, 2 )
  fs_duration = ls_duration
End Function

.

You could save this as :  folder-walk.vbs

...and then run it and make it walk, for example C:\Users, using :

cscript folder-walk.vbs C:\Users

...and if you don't want to see the cscript header, then run this once :

cscript //nologo //s

...one sec, and I'll post an example of me running it too.

sdo
Moderator
Partner    VIP    Certified

Here's an example of me running it on my dev/test Windows Server 2019 :

C:\Users\myadmin\scripts\folder-walk-file-sizes>cscript folder-walk.vbs C:\Users
15:09:29
15:09:29  %s_main, walking:  C:\Users
15:09:29  %s_walk, ...count: 1   path: C:\Users
15:09:31  %s_walk, ...count: 1000   path: C:\Users\myadmin\AppData\Local\Google\Chrome\User Data\Default\Extensions\pkedcjkdefgpdelpbcmbmeomcjbeemfm\7919.1028.0.0_0\_locales\zh
15:09:34  %s_walk, ...count: 2000   path: C:\Users\myadmin\AppData\Local\Programs\Microsoft VS Code\resources\app\extensions\node_modules\typescript\lib\ru
15:09:37  %s_walk, ...count: 3000   path: C:\Users\myadmin\AppData\Roaming\Microsoft\Crypto\Keys
15:09:38  %s_main, ...done...
15:09:38
15:09:38  %s_main, summary...
15:09:38  %s_main, ...total folders:  3652
15:09:38  %s_main, ...total files:    8384
15:09:38  %s_main, ...max depth:      18
15:09:38  %s_main, ...deepest:        C:\Users\myadmin\AppData\Local\Programs\Microsoft VS Code\_\resources\app\extensions\ms-vscode.node-debug2\node_modules\vscode-chrome-debug-core\i18n\chs\out\src\chrome
15:09:38  %s_main, ...longest:        176
15:09:38  %s_main, ...longest:        C:\Users\myadmin\AppData\Local\Programs\Microsoft VS Code\_\resources\app\extensions\ms-vscode.node-debug2\node_modules\vscode-chrome-debug-core\out\src\chrome\crdpMultiplexing
15:09:38  %s_main, ...done...
15:09:38
15:09:38  %s_main, counts and locations in folders by time HH:MM...
15:09:38  %s_main, ...time: 15:09   count: 3652
15:09:38  %s_main, ......first: C:\Users
15:09:38  %s_main, ......last:  C:\Users\Public\Videos
15:09:38  %s_main, ...done...
15:09:38
15:09:38  %folder-walk, script started:    Thursday 05 March 2020 15:09:29
15:09:38  %folder-walk, script finished:   Thursday 05 March 2020 15:09:38
15:09:38  %folder-walk, script duration:   0:00:09`...
15:09:38  %folder-walk, script exiting...

.

The purpose of the script is to list, by minutes, how many folders it was able to walk, and for each minute, which was the first folder it walked into, and what was the last folder it walked into...

...so I'm hoping that this might reveal if you have any slow spots in your file-system.

 

Hamza_H
Moderator
   VIP   

Great job @sdo, as always!

I don't really understand scripting at all, but what you did sounds really interesting; I hope one day I'll get to do the same.

One question regarding this: if I understood it right, there is no need to run a backup during this? It's all on the OS side, right?

Again, great job :)

sdo
Moderator
Partner    VIP    Certified

As it stands the script is purely a folder walker, not a file walker.  Whether running the script will reveal anything useful is debatable, but you never know.  If I get time, I might add an option to get the script to optionally walk all the files too and tabulate them by size and by file type, and maybe by file age too.

It shouldn't really matter whether the script is running during a backup, but again you never know, as some file-systems and some storage-systems just simply cannot handle the workload of multiple walkers, so it would probably be best to test on a dev system first, and then firstly test out of hours on production.

@davidmoline's suggestion to enable Change Journal tracking and to enable NetBackup Accelerator would seem to me to be sound advice, especially on low-volatility, low-churn file systems.

I am trying to use the change journal, but I need to wait for a full backup at the weekend for it to take effect.

I changed the policy to "use accelerator", but it has not taken effect yet.

At the moment I need to wait for the job to finish. I also noticed that the server is asking to apply some updates; I will do that this weekend and let you know.

Thanks all

 

Thank you David Moline. I configured "change journal" and Accelerator in a new policy, and everything runs fine now.

Thanks everybody for the help.

Best Regards.

sdo
Moderator
Moderator
Partner    VIP    Certified

Hi VB - glad you got it sorted.  Any chance of a little summary of the differences / improvements / benefits of the results of the changes made?  Thank you.