03-03-2020 06:45 AM
Hi all, I have a problem with my file server backup. Since we upgraded the file server OS to Windows Server 2019, the backup sometimes runs very slowly.
The first time it happened I fixed it by restarting the file server, but after a month the problem returned.
Only the incremental backups are affected; the full backups are fine.
The policy contains many clients and all the others work fine; only the file server has the problem.
A normal backup of this client takes 3 hours, but now it takes 10 hours, and the speed has dropped from 52 MB/s to 20 MB/s.
I don't know what happened; nothing was changed on the file server or on the master and media servers.
Can anybody help me?
My configuration is:
Master Server:
Windows Server 2012 R2 (Virtual Machine)
Two MSDP media servers (physical machines); we have two sites.
My File server client is Windows Server 2019 (Virtual Machine)
Our virtual environment is VMware.
I have attached two job logs: one from a run under normal conditions and one showing the problem.
03-04-2020 01:36 AM - edited 03-04-2020 01:39 AM
To me this looks like a problem in the client OS. In these job summaries we see very similar delay and wait counts for the NetBackup buffers, which makes me think the problem is not with the NetBackup server:
was good:

24/02/2020 02:30:16 - Info bptm (pid=952) using 262144 data buffer size
24/02/2020 02:30:16 - Info bptm (pid=952) setting receive network buffer to 1049600 bytes
24/02/2020 02:30:16 - Info bptm (pid=952) using 30 data buffers
24/02/2020 02:30:16 - Info scsnb3 (pid=952) Using OpenStorage client direct to backup from client uscfs1 to scsnb3
24/02/2020 05:48:55 - Info bpbkar32 (pid=10524) bpbkar waited 13852 times for empty buffer, delayed 23719 times.
24/02/2020 05:49:00 - Info scsnb3 (pid=952) StorageServer=PureDisk:scsnb3; Report=PDDO Stats (multi-threaded stream used) for (scsnb3): scanned: 745085458 KB, CR sent: 1954275 KB, CR sent over FC: 0 KB, dedup: 99.7%, cache disabled
24/02/2020 05:49:05 - end writing; write time: 3:18:47

now slow:

02/03/2020 02:30:17 - Info bptm (pid=244) using 262144 data buffer size
02/03/2020 02:30:17 - Info bptm (pid=244) setting receive network buffer to 1049600 bytes
02/03/2020 02:30:17 - Info bptm (pid=244) using 30 data buffers
02/03/2020 02:30:18 - Info scsnb3 (pid=244) Using OpenStorage client direct to backup from client uscfs1 to scsnb3
02/03/2020 12:19:23 - Info bpbkar32 (pid=8148) bpbkar waited 12443 times for empty buffer, delayed 29060 times.
02/03/2020 12:19:37 - Info scsnb3 (pid=244) StorageServer=PureDisk:scsnb3; Report=PDDO Stats (multi-threaded stream used) for (scsnb3): scanned: 748291295 KB, CR sent: 1533437 KB, CR sent over FC: 0 KB, dedup: 99.8%, cache disabled
02/03/2020 12:19:43 - end writing; write time: 9:49:24
My guess is that it is the file-system walk on the client OS that is slow. Something is dragging the client down, but determining the root cause of this is going to be quite difficult.
One idea that comes to mind would be to collect bpbkar logging at verbose level 3 (I think it is level 3), which reveals the name of each file as the file system is walked. But manually spotting a repeated slight delay across millions of files will be near impossible. So one answer might be to write a script/tool that processes the bpbkar log and analyses the time gap between each file detected (count, sum, max, min, average, standard deviation, etc.) to see whether it clearly points to a delay in walking the file system. If so, open a case with Microsoft.
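As a rough sketch of that gap analysis, something like the following Python outline could be run over a bpbkar log. The timestamp layout assumed here (each line starting with HH:MM:SS.mmm) is an assumption of mine, not a documented format - verify it against a real bpbkar log first:

```python
import re
import statistics

# ASSUMPTION: each log line starts with an HH:MM:SS(.mmm) timestamp.
# Check this against your actual bpbkar log before trusting the numbers.
TS_RE = re.compile(r"^(\d{2}:\d{2}:\d{2})(?:\.(\d{3}))?")

def parse_times(lines):
    """Extract per-line timestamps as seconds since midnight."""
    times = []
    for line in lines:
        m = TS_RE.match(line)
        if not m:
            continue
        h, mnt, s = (int(x) for x in m.group(1).split(":"))
        ms = int(m.group(2) or 0)
        times.append(h * 3600 + mnt * 60 + s + ms / 1000.0)
    return times

def gap_stats(times):
    """Summarise the gap between consecutive log lines: a growing or
    spiky gap distribution suggests the file-system walk is stalling."""
    gaps = [b - a for a, b in zip(times, times[1:]) if b >= a]
    if not gaps:
        return None
    return {
        "count": len(gaps),
        "sum": sum(gaps),
        "max": max(gaps),
        "min": min(gaps),
        "mean": statistics.mean(gaps),
        "stdev": statistics.pstdev(gaps),
    }

if __name__ == "__main__":
    sample = [
        "02:30:16.000 [952.1] walking file a",
        "02:30:16.250 [952.1] walking file b",
        "02:30:19.000 [952.1] walking file c",
    ]
    print(gap_stats(parse_times(sample)))
```

Comparing these stats day by day (good run vs. slow run) should show whether the per-file delay itself is growing.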
Another idea might be to run a bpbkar-to-null backup every day, which would definitely prove that the delay has nothing at all to do with the LAN (to the NetBackup server) or with the NetBackup storage units. Again, when you see clear slowness from one day to the next, open a case with Microsoft.
Another idea might be to try to see whether the problem is, at least in part, purely folder walking. Write a script (which could be a PowerShell one-liner) that only walks the folder structure, i.e. does not open any files, and see if that demonstrates sudden slowness from one day to the next.
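A minimal sketch of that folder-only walk, in Python rather than PowerShell (the function name and per-minute bucketing are my own invention):

```python
import os
import time
from collections import Counter

def walk_folders(root, clock=time.localtime):
    """Walk only the directory tree (no file contents are opened) and
    count how many folders were entered during each HH:MM bucket of
    wall-clock time. A sudden drop in the per-minute count points at a
    slow region of the tree or a generally slugging file system."""
    per_minute = Counter()
    total = 0
    for dirpath, _dirnames, _filenames in os.walk(root):
        total += 1
        t = clock()
        per_minute["%02d:%02d" % (t.tm_hour, t.tm_min)] += 1
    return total, per_minute

if __name__ == "__main__":
    total, per_minute = walk_folders(".")
    print("folders walked:", total)
    for hhmm in sorted(per_minute):
        print(hhmm, per_minute[hhmm])
```

Run it against the same root (e.g. E:\) on a good day and a bad day and compare the per-minute throughput.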
Just curious as to the nature and config of the storage that underlies the file-share. Could you describe? Is it a DAS, or NAS, or SAN (iSCSI or FC?) ?
03-04-2020 02:24 AM - edited 03-04-2020 02:25 AM
Another thing you can do is to log on to an RDP session on the file server and leave it logged on (re-logging once a day to keep the session alive), and inside that session use the Windows "perfmon" tool to monitor I/O on the logical disk(s) hosting the file-server services/shares. Set the sample interval to 120 seconds over 86400 seconds (i.e. every two minutes for 24 hours), and monitor counters like:
- operations : reads/sec, writes/sec
- latency : sec/read, sec/write
- payload : avg read size, avg write size
- throughput : read MB/s, write MB/s
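If those counters are exported to CSV (e.g. with relog or typeperf), a small Python summariser along these lines could condense a day of samples. The layout assumed here - first column is the sample timestamp, one counter per remaining column - is the usual perfmon CSV export shape, but check yours:

```python
import csv
import io

def summarise_counters(csv_text):
    """Summarise a perfmon-style CSV export.

    Returns {counter_name: (min, max, mean)} over all samples, so a
    day of two-minute samples collapses to a few comparable numbers."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    out = {}
    for col, name in enumerate(header[1:], start=1):
        vals = []
        for row in data:
            try:
                vals.append(float(row[col]))
            except (ValueError, IndexError):
                continue  # perfmon writes blanks for missing samples
        if vals:
            out[name] = (min(vals), max(vals), sum(vals) / len(vals))
    return out
```

Comparing the min/max/mean of sec/read and sec/write between a good day and a bad day should quickly show whether disk latency is the culprit.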
03-04-2020 02:51 AM
Thanks sdo. This time a restart did not fix it. I will try your suggestions and let you know here.
Thanks again.
03-04-2020 05:20 AM
You asked:
Just curious as to the nature and config of the storage that underlies the file-share. Could you describe? Is it a DAS, or NAS, or SAN (iSCSI or FC?) ?
We are using Cisco HyperFlex (hyperconverged) cluster servers with VMware.
Thanks
03-04-2020 02:12 PM
Also, what file system type is used on the storage that is exhibiting the slow backup speed?
03-04-2020 09:09 PM - edited 03-04-2020 09:22 PM
This may be a complete red herring, but I thought I'd throw it out there as this is a bit of a weird case. In both of the job details I see this:
Is this a 32 bit client on a 64-bit OS? All of my Windows clients (all 64 bit) show "starting bpbkar", not bpbkar32.
Steve
03-04-2020 10:53 PM
Yes, a red herring: the bpbkar process in 8.x is still called bpbkar32.exe, and only a 64-bit binary is available for Windows now.
What I'd be looking at is enabling Windows change journalling to avoid the costly directory walk in the first place. For the client, in "Host Properties -> Clients -> Windows Client -> Client Settings", enable "Use Change Journal".
I'd also enable Accelerator in the policy (it doesn't appear to be set).
This should remove the directory-walk time, which is where your backup is showing the slowness.
Another option would be to use a FlashBackup-Windows policy for the E: drive (and exclude E: from the original policy), as that drive seems to have the very large number of files that policy type is designed for.
03-05-2020 02:40 AM
Our storage file system is NTFS
03-05-2020 02:49 AM
I don't know why, but the job details for all of my clients show "starting bpbkar32", and all of my clients are 64-bit.
I don't know if this is wrong.
03-05-2020 03:03 AM - edited 03-05-2020 03:05 AM
Ignore the "32" in the name. It is a 64-bit executable. The executable (and process) name is a red herring: a hangover from the days when Unix had 64-bit executables and Windows had 32-bit ones. Veritas/Symantec needed to separate the code streams, i.e. there were probably two separate pieces of disk-reader source code - but that is all irrelevant now.
So there is only one disk-reader process for modern Windows clients: a 64-bit executable named "bpbkar32". Ignore the 32 in the name - it's just text, utterly meaningless, and it might as well be named "bpbkar99".
03-05-2020 05:03 AM
03-05-2020 05:14 AM
I will do it, but I need to wait for the current job to finish first; after that I will make the change and re-run the job.
Thanks
03-05-2020 07:13 AM
Just knocked this up for fun, so here's a simple folder walking script :
' folder-walk.vbs - walks a folder tree (folders only, no file opens)
' and reports per-minute progress, depth, and path-length statistics.
Option Explicit

Dim go_fso
Dim gs_script_spec, gs_script_path, gs_script_name, gs_script_title, gs_script_fac, gs_script_engine, gd_script_start, gd_script_end
Dim gl_folders, gl_files, gl_depth, gs_deepest, gl_longest, gs_longest
Dim gd_timings, gd_first, gd_last

gd_script_start = Now
Call s_init()
Call s_main()
gd_script_end = Now
Call s_log( "" )
Call s_log( gs_script_fac & "script started:  " & fs_datetime( gd_script_start ) )
Call s_log( gs_script_fac & "script finished: " & fs_datetime( gd_script_end ) )
Call s_log( gs_script_fac & "script duration: " & Trim( fs_duration( gd_script_start, gd_script_end ) ) & "..." )
Call s_log( gs_script_fac & "script exiting..." )
WScript.Quit( 0 )

Sub s_init()
  Const cs_fac = "%s_init, "
  Set go_fso = CreateObject( "Scripting.FileSystemObject" )
  gs_script_spec = WScript.ScriptFullName
  gs_script_path = go_fso.GetParentFolderName( gs_script_spec )
  gs_script_name = go_fso.GetBaseName( gs_script_spec )
  gs_script_fac = "%" & gs_script_name & ", "
  gs_script_engine = LCase( go_fso.GetBaseName( WScript.FullName ) )
  Select Case gs_script_engine
  Case "cscript"
  Case "wscript"
    Call s_abort( cs_fac & "can only be run from command line..." )
  Case Else
    Call s_abort( cs_fac & "unknown script engine `" & gs_script_engine & "`..." )
  End Select
  Set gd_timings = CreateObject( "Scripting.Dictionary" )
  Set gd_first = CreateObject( "Scripting.Dictionary" )
  Set gd_last = CreateObject( "Scripting.Dictionary" )
  gl_folders = 0
  gl_files = 0
  gl_depth = 0
  gl_longest = 0
  gs_deepest = ""
  gs_longest = ""
  If WScript.Arguments.Count <> 1 Then Call s_abort( cs_fac & "specify a folder path to walk..." )
End Sub

Sub s_main()
  Const cs_fac = "%s_main, "
  Dim ls_root, lo_root, ls_hhmm, ll_count, ls_first, ls_last
  ls_root = WScript.Arguments.Item( 0 )
  Call s_log( "" )
  Call s_log( cs_fac & "walking: " & ls_root )
  Set lo_root = go_fso.GetFolder( ls_root )
  Call s_walk( lo_root, 1 )
  Call s_log( cs_fac & "...done..." )
  Call s_log( "" )
  Call s_log( cs_fac & "summary..." )
  Call s_log( cs_fac & "...total folders: " & gl_folders )
  Call s_log( cs_fac & "...total files:   " & gl_files )
  Call s_log( cs_fac & "...max depth:     " & gl_depth )
  Call s_log( cs_fac & "...deepest:       " & gs_deepest )
  Call s_log( cs_fac & "...longest:       " & gl_longest )
  Call s_log( cs_fac & "...longest:       " & gs_longest )
  Call s_log( cs_fac & "...done..." )
  Call s_log( "" )
  Call s_log( cs_fac & "counts and locations in folders by time HH:MM..." )
  For Each ls_hhmm In gd_timings.Keys
    ll_count = gd_timings.Item( ls_hhmm )
    ls_first = gd_first.Item( ls_hhmm )
    ls_last = gd_last.Item( ls_hhmm )
    Call s_log( cs_fac & "...time: " & ls_hhmm & "  count: " & ll_count )
    Call s_log( cs_fac & "......first: " & ls_first )
    Call s_log( cs_fac & "......last:  " & ls_last )
  Next
  Call s_log( cs_fac & "...done..." )
End Sub

Sub s_walk( po_folder, pl_depth )
  Const cs_fac = "%s_walk, "
  Dim lo_folders, lo_folder, lo_files, lo_file, lb_accessible, ls_hhmm, ll_dummy
  If Left( po_folder.Name, 12 ) = "$RECYCLE.BIN" Then Exit Sub
  If pl_depth > gl_depth Then
    gl_depth = pl_depth
    gs_deepest = po_folder.Path
  End If
  If Len( po_folder.Path ) > gl_longest Then
    gl_longest = Len( po_folder.Path )
    gs_longest = po_folder.Path
  End If
  Set lo_folders = po_folder.SubFolders
  lb_accessible = True
  On Error Resume Next
  ll_dummy = lo_folders.Count
  If Err.Number = 70 Then lb_accessible = False    '...error 70 = permission denied...
  On Error GoTo 0
  If Not lb_accessible Then Exit Sub
  gl_folders = gl_folders + 1
  gl_files = gl_files + po_folder.Files.Count
  ls_hhmm = fs_hhmm()
  Call s_inc( gd_timings, ls_hhmm )
  If Not gd_first.Exists( ls_hhmm ) Then gd_first.Add ls_hhmm, po_folder.Path
  If gd_last.Exists( ls_hhmm ) Then gd_last.Remove ls_hhmm
  gd_last.Add ls_hhmm, po_folder.Path
  If ( ( gl_folders = 1 ) Or ( ( gl_folders Mod 1000 ) = 0 ) ) Then
    Call s_log( cs_fac & "...count: " & gl_folders & "  path: " & po_folder.Path )
  End If
  For Each lo_folder In lo_folders
    Call s_walk( lo_folder, pl_depth + 1 )
  Next
  ' Set lo_files = po_folder.Files
  ' For Each lo_file In lo_files
  '   ...collect a table of distribution of file sizes?
  '   ...deepest file name?  longest file name?  oldest file?  top 10 largest files?
  '   ...table of counts, and table of sizes - by file type extension name...
  ' Next
End Sub

Sub s_inc( pd_dic, ps_key )
  Dim ll_value
  If pd_dic.Exists( ps_key ) Then
    ll_value = pd_dic.Item( ps_key ) + 1
    pd_dic.Remove ps_key
  Else
    ll_value = 1
  End If
  pd_dic.Add ps_key, ll_value
End Sub

Sub s_bubble( pd_dic )
  '...currently unused helper: sorts a dictionary's entries by item value...
  Dim ls_keys, ls_items, ls_key, lx_temp, ll_entry, lb_swapped, ll_compares, ll_swaps, ll_count, ll_i, ll_j
  '...is there actually any work to do...
  ll_count = pd_dic.Count
  If ll_count < 2 Then Exit Sub
  '...extract keys and items...
  ReDim ls_keys( ll_count )
  ReDim ls_items( ll_count )
  ll_entry = 0
  For Each ls_key In pd_dic.Keys
    ll_entry = ll_entry + 1
    ls_keys( ll_entry ) = ls_key
    ls_items( ll_entry ) = pd_dic.Item( ls_key )
  Next
  '...bubble sort, keeping keys and items paired...
  ll_compares = 0
  ll_swaps = 0
  For ll_i = 1 To ll_count
    lb_swapped = False
    For ll_j = 1 To ll_count - ll_i
      ll_compares = ll_compares + 1
      If ls_items( ll_j ) > ls_items( ll_j + 1 ) Then
        lx_temp = ls_items( ll_j )
        ls_items( ll_j ) = ls_items( ll_j + 1 )
        ls_items( ll_j + 1 ) = lx_temp
        lx_temp = ls_keys( ll_j )
        ls_keys( ll_j ) = ls_keys( ll_j + 1 )
        ls_keys( ll_j + 1 ) = lx_temp
        lb_swapped = True
        ll_swaps = ll_swaps + 1
      End If
    Next
    If Not lb_swapped Then Exit For
  Next
  '...reload keys and items in sorted order...
  pd_dic.RemoveAll
  For ll_entry = 1 To ll_count
    pd_dic.Add ls_keys( ll_entry ), ls_items( ll_entry )
  Next
  '...cleanup...
  Erase ls_keys
  Erase ls_items
End Sub

Sub s_abort( ps_message )
  Const cs_fac = "%s_abort, "
  Call s_log( ps_message )
  Call s_log( cs_fac & "script aborting..." )
  WScript.Quit( 1 )
End Sub

Sub s_quit( ps_message )
  Const cs_fac = "%s_quit, "
  Call s_log( ps_message )
  Call s_log( cs_fac & "script quitting..." )
  WScript.Quit( 0 )
End Sub

Sub s_log( ps_message )
  Dim ls_message
  ls_message = fs_hhmmss() & " " & ps_message
  WScript.Echo ls_message
End Sub

Function fs_hhmmss()
  Dim ld_dt
  ld_dt = Now
  fs_hhmmss = fs_zeroes( DatePart( "h", ld_dt ), 2 ) & ":" & fs_zeroes( DatePart( "n", ld_dt ), 2 ) & ":" & fs_zeroes( DatePart( "s", ld_dt ), 2 )
End Function

Function fs_hhmm()
  Dim ld_dt
  ld_dt = Now
  fs_hhmm = fs_zeroes( DatePart( "h", ld_dt ), 2 ) & ":" & fs_zeroes( DatePart( "n", ld_dt ), 2 )
End Function

Function fs_zeroes( pl_number, pl_length )
  Dim ls_result
  ls_result = String( pl_length, "0" ) & CStr( pl_number )
  ls_result = Right( ls_result, pl_length )
  fs_zeroes = ls_result
End Function

Function fs_datetime( pd_datetime )
  Dim ld_datetime, ls_result
  If VarType( pd_datetime ) = vbDate Then
    ld_datetime = pd_datetime
  Else
    ld_datetime = Now
  End If
  ls_result = WeekDayName( WeekDay( ld_datetime ), False, 1 )
  ls_result = ls_result & " " & FormatDateTime( ld_datetime, vbLongDate )
  ls_result = ls_result & " " & FormatDateTime( ld_datetime, vbLongTime )
  fs_datetime = ls_result
End Function

Function fs_duration( pd_start, pd_end )
  Dim ll_diff, ll_dddd, ll_hh, ll_mm, ll_ss, ls_duration
  ll_diff = DateDiff( "s", pd_start, pd_end )
  ll_ss = ll_diff - ( ( ll_diff \ 60 ) * 60 )
  ll_diff = ll_diff \ 60
  ll_mm = ll_diff - ( ( ll_diff \ 60 ) * 60 )
  ll_diff = ll_diff \ 60
  ll_hh = ll_diff - ( ( ll_diff \ 24 ) * 24 )
  ll_diff = ll_diff \ 24
  ll_dddd = ll_diff
  ls_duration = ""
  If ll_dddd > 0 Then
    ls_duration = ls_duration & CStr( ll_dddd ) & " "
    ls_duration = ls_duration & fs_zeroes( ll_hh, 2 ) & ":"
  Else
    ls_duration = ls_duration & CStr( ll_hh ) & ":"
  End If
  ls_duration = ls_duration & fs_zeroes( ll_mm, 2 ) & ":"
  ls_duration = ls_duration & fs_zeroes( ll_ss, 2 )
  fs_duration = ls_duration
End Function
You could save this as : folder-walk.vbs
...and then run it and make it walk, for example C:\Users, using :
cscript folder-walk.vbs C:\Users
...and if you don't want to see the cscript header, then run this once :
cscript //nologo //s
...one sec, and I'll post an example of me running it too.
03-05-2020 07:16 AM
Here's an example of me running it on my dev/test Windows Server 2019 :
C:\Users\myadmin\scripts\folder-walk-file-sizes>cscript folder-walk.vbs C:\Users
15:09:29
15:09:29 %s_main, walking: C:\Users
15:09:29 %s_walk, ...count: 1  path: C:\Users
15:09:31 %s_walk, ...count: 1000  path: C:\Users\myadmin\AppData\Local\Google\Chrome\User Data\Default\Extensions\pkedcjkdefgpdelpbcmbmeomcjbeemfm\7919.1028.0.0_0\_locales\zh
15:09:34 %s_walk, ...count: 2000  path: C:\Users\myadmin\AppData\Local\Programs\Microsoft VS Code\resources\app\extensions\node_modules\typescript\lib\ru
15:09:37 %s_walk, ...count: 3000  path: C:\Users\myadmin\AppData\Roaming\Microsoft\Crypto\Keys
15:09:38 %s_main, ...done...
15:09:38
15:09:38 %s_main, summary...
15:09:38 %s_main, ...total folders: 3652
15:09:38 %s_main, ...total files:   8384
15:09:38 %s_main, ...max depth:     18
15:09:38 %s_main, ...deepest:       C:\Users\myadmin\AppData\Local\Programs\Microsoft VS Code\_\resources\app\extensions\ms-vscode.node-debug2\node_modules\vscode-chrome-debug-core\i18n\chs\out\src\chrome
15:09:38 %s_main, ...longest:       176
15:09:38 %s_main, ...longest:       C:\Users\myadmin\AppData\Local\Programs\Microsoft VS Code\_\resources\app\extensions\ms-vscode.node-debug2\node_modules\vscode-chrome-debug-core\out\src\chrome\crdpMultiplexing
15:09:38 %s_main, ...done...
15:09:38
15:09:38 %s_main, counts and locations in folders by time HH:MM...
15:09:38 %s_main, ...time: 15:09  count: 3652
15:09:38 %s_main, ......first: C:\Users
15:09:38 %s_main, ......last:  C:\Users\Public\Videos
15:09:38 %s_main, ...done...
15:09:38
15:09:38 %folder-walk, script started:  Thursday 05 March 2020 15:09:29
15:09:38 %folder-walk, script finished: Thursday 05 March 2020 15:09:38
15:09:38 %folder-walk, script duration: 0:00:09...
15:09:38 %folder-walk, script exiting...
The purpose of the script is to list, by minute, how many folders it was able to walk, and, for each minute, which folder it walked into first and which it walked into last...
...so I'm hoping that this might reveal if you have any slow spots in your file-system.
03-05-2020 09:56 AM
Great job @sdo! As always.
I don't really understand scripting at all, but what you did sounds really interesting; I hope one day I'll be able to do the same.
One question about this: if I got it right, there is no need to run a backup during this? It's all on the OS side, right?
Again, great job :)
03-06-2020 02:19 AM - edited 03-06-2020 02:20 AM
As it stands, the script is purely a folder walker, not a file walker. Whether running the script will reveal anything useful is debatable, but you never know. If I get time, I might add an option for the script to walk all files too and tabulate them by size and by file type, and maybe by file age as well.
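That optional file-walk tally could be sketched like this in Python (my own sketch, not part of the VBScript above; the function name is invented):

```python
import os
from collections import defaultdict

def tally_by_extension(root):
    """Walk all files under root and tally the count and total bytes
    per file-name extension - the kind of per-file-type table the
    folder-walk script could be extended to produce."""
    counts = defaultdict(int)
    sizes = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower() or "(none)"
            counts[ext] += 1
            try:
                sizes[ext] += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or access denied mid-walk
    return dict(counts), dict(sizes)

if __name__ == "__main__":
    counts, sizes = tally_by_extension(".")
    for ext in sorted(counts, key=counts.get, reverse=True):
        print(ext, counts[ext], sizes.get(ext, 0))
```

Note that, unlike the folder-only walker, this does stat every file, so it puts a heavier load on the file system.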
It shouldn't really matter whether the script runs during a backup, but again you never know: some file systems and storage systems simply cannot handle the workload of multiple walkers, so it would probably be best to test on a dev system first, and then initially test out of hours on production.
@davidmoline's suggestion to enable Change Journal tracking and to enable NetBackup Accelerator seems sound advice to me, especially on low-volatility, low-churn file systems.
03-06-2020 03:09 AM
I tried to use the change journal, but I need to wait for a full backup over the weekend for it to take effect.
I changed the policy to "use accelerator", but that has not taken effect yet either.
At the moment I need to wait for the job to finish. I also noticed the server is asking to apply some updates; I will do that this weekend and let you know.
Thanks all
03-09-2020 06:55 AM
Thank you David Moline. I configured the client to use the "change journal" and enabled Accelerator in a new policy, and everything runs fine now.
Thanks, everybody, for the help.
Best regards.
03-09-2020 07:28 AM
Hi VB - glad you got it sorted. Any chance of a little summary of the differences / improvements / benefits of the results of the changes made? Thank you.