Forum Discussion

Holger_'s avatar
Holger_
Level 3
4 years ago

Fileserver backup: poor backup performance | Initial VirtApi DLL load check failed.

Hi,

I recently migrated our clustered MS file server to Windows Server 2019. 2 nodes (basically the same installation), VMware, iSCSI SAN, multiple policies, multible streams, time-splited schedules, no accelerator or change journal in use (currently I have no knowledge with this), NB is v8 on server and client.

Especially the daily incremental backup of the user share folders takes way too long on node 1, it exceeds the time window. Node 2 perform better - not as good as on Server 2k8, but well...

Trouble shooting turned out a different behaviour: on node 1 the server is waiting for a timeout while failing to load 2 dll's (VirtApi.dll and bedstrace.dll). Node 2 doesn't have this issue. Both nodes should be identically installed and configured.

part of the bpbkar log:

15:56:17.549 [2524.6724] <2> ov_log::V_GlobalLog: INF - FS_InitFileSys
15:56:17.549 [2524.6724] <2> ov_log::V_GlobalLog: INF - VirtApi DLL WAS LOADED FROM VirtApi.dll.!
15:57:48.925 [2524.6724] <2> ov_log::V_GlobalLog: INF - Initial VirtApi DLL load check failed. Will try again later.
15:57:48.941 [2524.6724] <2> ov_log::V_GlobalLog: INF - VirtApi DLL WAS LOADED FROM VirtApi.dll.!
15:59:20.089 [2524.6724] <2> ov_log::V_GlobalLog: INF - Initial VirtApi DLL load check failed. Will try again later.
15:59:20.089 [2524.6724] <2> ov_log::V_GlobalLog: INF - loaded bedsxese.dll
15:59:20.089 [2524.6724] <2> ov_log::V_GlobalLog: INF - [AgentID:13] ... failed to load bedstrace.dll.
15:59:20.089 [2524.6724] <2> ov_log::V_GlobalLog: INF - FS_DleGen: Failure to create debug trace object at line number 76
15:59:20.089 [2524.6724] <2> ov_log::V_GlobalLog: INF - [AgentID:23] ... failed to load bedstrace.dll.
15:59:20.089 [2524.6724] <2> ov_log::V_GlobalLog: INF - FS_DleGen: Failure to create debug trace object at line number 76
15:59:20.105 [2524.6724] <2> ov_log::V_GlobalLog: INF - [AgentID:30] ... failed to load bedstrace.dll.
15:59:20.105 [2524.6724] <2> ov_log::V_GlobalLog: INF - FS_DleGen: Failure to create debug trace object at line number 76
15:59:20.105 [2524.6724] <2> ov_log::V_GlobalLog: INF - loaded bedsshadow.dll
15:59:20.105 [2524.6724] <2> ov_log::V_GlobalLog: INF - loaded bedsnt5.dll
15:59:20.105 [2524.6724] <2> ov_log::V_GlobalLog: INF - [AgentID:5] ... failed to load bedstrace.dll.
15:59:20.105 [2524.6724] <2> ov_log::V_GlobalLog: INF - FS_DleGen: Failure to create debug trace object at line number 76
15:59:20.105 [2524.6724] <2> ov_log::V_GlobalLog: INF - loaded bedsss.dll
15:59:20.105 [2524.6724] <2> ov_log::V_GlobalLog: INF - [AgentID:1e] ... failed to load bedstrace.dll.
15:59:20.105 [2524.6724] <2> ov_log::V_GlobalLog: INF - FS_DleGen: Failure to create debug trace object at line number 76
15:59:20.121 [2524.6724] <2> ov_log::V_GlobalLog: INF - VirtApi DLL WAS LOADED FROM VirtApi.dll.!
16:00:51.334 [2524.6724] <2> ov_log::V_GlobalLog: INF - Initial VirtApi DLL load check failed. Will try again later.
16:00:51.334 [2524.6724] <2> ov_log::V_GlobalLog: INF - loaded bedsadgran.dll
16:00:51.334 [2524.6724] <2> ov_log::V_GlobalLog: INF - loaded bedssql2.dll
16:00:51.334 [2524.6724] <2> ov_log::V_GlobalLog: INF - [AgentID:10] ... failed to load bedstrace.dll.
16:00:51.334 [2524.6724] <2> ov_log::V_GlobalLog: INF - FS_DleGen: Failure to create debug trace object at line number 76
16:00:51.334 [2524.6724] <2> ov_log::V_GlobalLog: INF - Initializing FSs

 

This is constantly repeated after every stream written (parallel ofcourse due to multistream) and adds up plenty of time at all. The issue is reproducable and follow the VM in case of VM migration to another ESXi host - as well as the _non-issue_ node 2 runs without this issue on the same ESX.

Unfortunately I found not much about the function of the VirtApi.dll.

Access control is prohibited at master/ media server and client. Nevertheless I tried intentionelly to decrease the default value of HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\VirtFile "Sleep Before Unload" (31000) to 10 to force bpbkar to go ahead in this circumstance and not waiting, which ended in not running the backup even if the processes where running.

Services are running with local system permissions. 

I also tried to manually reinstall the VirtFile.inf and to implement some points from the Performance Tuning Guide, but ended up every time at this dll load issue. 

Any help or hint would be greatly appreciated.

 

Thanks,

Holger

 

  • I figured out that there was a difference between the 2 nodes: on the non-issue node Secure Boot was disabled in VMware Boot options (I don't know why and can't trace back). After disabling secure boot on the issued node and reinstalled NB Client the minifilter is loaded correctly by the system. I'm really interessted why just the secure boot option makes a difference while installing the client.

    So the first point should be solved now (has to be confirmed this evening). confirmed: it's solved

    Jumping to the next point: as mentioned the (incremental) backup takes about 50% more time on Windows Server 2019 than before on Server 2008 R2. Previously about 2hrs for DINCR backup of the approx 1.5M files, now with srv 2019 almost 3hrs. It's currently not a problem, but I want to understand why. Same SAN, same infrastructure, same ESXi, still NTFS, SMB v3 now. Are there NB tuning guides for server 2019? Does it make sense to enable journaling? What consequenses are to be expected? I'm really careful cause I'd read that once enabled it's not reversable.

  • Please share the number of files being backed up for the incremental backup. If its in millions then your file server has lot of activity and files are constantly changed.

    How much time does the Full backup take? Is the backup for incremental is more than the Full backup time??

    • Holger_'s avatar
      Holger_
      Level 3

      Overall there are about 7.5M files, thereof ~1.5M files just the user shares.

      In fact it was a little more before the migration (Windows Server 2008R2) and on that the inc. backup was taken daily in less than 2 hrs for the user shares.

      Untill now I had no chance to see how long a Full Backup will take on the problematic node, because as soon as I saw the backup could not be taken complete and figured out the _sleep time_ between every bunch of streams, I switched over to node 2. On that the backup of the user shares takes about 4.5hrs for full vs. 3-4hrs inc, which is not fine because I expected a better performance than on 2k8 (the rest of the infrastructure is completely the same btw.) but ok for now... hopefully being able to improve this. ( Has anyone empirical values in performance changes when comparing NB8.0 to 8.2 or 8.3? )

      But my first task is to solve the current issue: The node 2 doesn't have the dll load issue -> why on node 1 and how can I fix it?

       

      • Lowell_Palecek's avatar
        Lowell_Palecek
        Level 6

        Bedstrace.dll is not used in NetBackup (or even installed). The failures to load it should not take significant time compared with backing up millions of files. Your log snippet shows the failure occurring in the same heartbeat as the loading of the preceding DLL.

        VirtApi.dll loads VirtFile.sys. This is the "VFF" filter that NetBackup uses to simulate read/write I/O against read-only files. (I'm not clear on the details.)

        NetBackup shares a component with Backup Exec called BEDS - Backup Exec Data Store - for interacting with Exchange, SharePoint, Active Directory, and DFSR data, among others. BEDS also performs non-hardware VSS snapshots. NetBackup Windows processes load and initialize BEDS at startup. Bedstrace.dll and virtApi.dll are among the DLLs that BEDS initialization loads. Your log snippet shows several other DLLs whose names hint at their purpose. (Bedsxese.dll in tha main DLL for Exchange database access.)

        Do NOT mess with the virtFile sleep before unload parameter. Put it back. We put that in there to overcome a difference between how BE and NetBackup work. The BE component that loads these DLLs is a long-running service, so they are loaded only once. NetBackup processes - bpfis, bpkar32, tar32, nbWin, and more, are shot-lived processes that perform a task and exit. Thus, NetBackup loads and unloads the BEDS DLLs frequently while BE loads them once in a great while.

        VirtFile.sys is a Windows minifilter. (See fltmc.) It is loaded when the OS boots, according to its configured "altitude" assigned by Microsoft to Veritas. The altitude is the load order. It's not good for a minifilter to load and unload itself multiple times.

        I recommend that you put the configuration setting back the way it was and (sorry) reboot your server to restore the relationship between the OS and the virtFile driver.