cancel
Showing results for 
Search instead for 
Did you mean: 

Fileserver backup: poor backup performance | Initial VirtApi DLL load check failed.

Holger_
Level 3

Hi,

I recently migrated our clustered MS file server to Windows Server 2019. 2 nodes (basically the same installation), VMware, iSCSI SAN, multiple policies, multible streams, time-splited schedules, no accelerator or change journal in use (currently I have no knowledge with this), NB is v8 on server and client.

Especially the daily incremental backup of the user share folders takes way too long on node 1, it exceeds the time window. Node 2 perform better - not as good as on Server 2k8, but well...

Trouble shooting turned out a different behaviour: on node 1 the server is waiting for a timeout while failing to load 2 dll's (VirtApi.dll and bedstrace.dll). Node 2 doesn't have this issue. Both nodes should be identically installed and configured.

part of the bpbkar log:

15:56:17.549 [2524.6724] <2> ov_log::V_GlobalLog: INF - FS_InitFileSys
15:56:17.549 [2524.6724] <2> ov_log::V_GlobalLog: INF - VirtApi DLL WAS LOADED FROM VirtApi.dll.!
15:57:48.925 [2524.6724] <2> ov_log::V_GlobalLog: INF - Initial VirtApi DLL load check failed. Will try again later.
15:57:48.941 [2524.6724] <2> ov_log::V_GlobalLog: INF - VirtApi DLL WAS LOADED FROM VirtApi.dll.!
15:59:20.089 [2524.6724] <2> ov_log::V_GlobalLog: INF - Initial VirtApi DLL load check failed. Will try again later.
15:59:20.089 [2524.6724] <2> ov_log::V_GlobalLog: INF - loaded bedsxese.dll
15:59:20.089 [2524.6724] <2> ov_log::V_GlobalLog: INF - [AgentID:13] ... failed to load bedstrace.dll.
15:59:20.089 [2524.6724] <2> ov_log::V_GlobalLog: INF - FS_DleGen: Failure to create debug trace object at line number 76
15:59:20.089 [2524.6724] <2> ov_log::V_GlobalLog: INF - [AgentID:23] ... failed to load bedstrace.dll.
15:59:20.089 [2524.6724] <2> ov_log::V_GlobalLog: INF - FS_DleGen: Failure to create debug trace object at line number 76
15:59:20.105 [2524.6724] <2> ov_log::V_GlobalLog: INF - [AgentID:30] ... failed to load bedstrace.dll.
15:59:20.105 [2524.6724] <2> ov_log::V_GlobalLog: INF - FS_DleGen: Failure to create debug trace object at line number 76
15:59:20.105 [2524.6724] <2> ov_log::V_GlobalLog: INF - loaded bedsshadow.dll
15:59:20.105 [2524.6724] <2> ov_log::V_GlobalLog: INF - loaded bedsnt5.dll
15:59:20.105 [2524.6724] <2> ov_log::V_GlobalLog: INF - [AgentID:5] ... failed to load bedstrace.dll.
15:59:20.105 [2524.6724] <2> ov_log::V_GlobalLog: INF - FS_DleGen: Failure to create debug trace object at line number 76
15:59:20.105 [2524.6724] <2> ov_log::V_GlobalLog: INF - loaded bedsss.dll
15:59:20.105 [2524.6724] <2> ov_log::V_GlobalLog: INF - [AgentID:1e] ... failed to load bedstrace.dll.
15:59:20.105 [2524.6724] <2> ov_log::V_GlobalLog: INF - FS_DleGen: Failure to create debug trace object at line number 76
15:59:20.121 [2524.6724] <2> ov_log::V_GlobalLog: INF - VirtApi DLL WAS LOADED FROM VirtApi.dll.!
16:00:51.334 [2524.6724] <2> ov_log::V_GlobalLog: INF - Initial VirtApi DLL load check failed. Will try again later.
16:00:51.334 [2524.6724] <2> ov_log::V_GlobalLog: INF - loaded bedsadgran.dll
16:00:51.334 [2524.6724] <2> ov_log::V_GlobalLog: INF - loaded bedssql2.dll
16:00:51.334 [2524.6724] <2> ov_log::V_GlobalLog: INF - [AgentID:10] ... failed to load bedstrace.dll.
16:00:51.334 [2524.6724] <2> ov_log::V_GlobalLog: INF - FS_DleGen: Failure to create debug trace object at line number 76
16:00:51.334 [2524.6724] <2> ov_log::V_GlobalLog: INF - Initializing FSs

 

This is constantly repeated after every stream written (parallel ofcourse due to multistream) and adds up plenty of time at all. The issue is reproducable and follow the VM in case of VM migration to another ESXi host - as well as the _non-issue_ node 2 runs without this issue on the same ESX.

Unfortunately I found not much about the function of the VirtApi.dll.

Access control is prohibited at master/ media server and client. Nevertheless I tried intentionelly to decrease the default value of HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\VirtFile "Sleep Before Unload" (31000) to 10 to force bpbkar to go ahead in this circumstance and not waiting, which ended in not running the backup even if the processes where running.

Services are running with local system permissions. 

I also tried to manually reinstall the VirtFile.inf and to implement some points from the Performance Tuning Guide, but ended up every time at this dll load issue. 

Any help or hint would be greatly appreciated.

 

Thanks,

Holger

 

1 ACCEPTED SOLUTION

Accepted Solutions

I figured out that there was a difference between the 2 nodes: on the non-issue node Secure Boot was disabled in VMware Boot options (I don't know why and can't trace back). After disabling secure boot on the issued node and reinstalled NB Client the minifilter is loaded correctly by the system. I'm really interessted why just the secure boot option makes a difference while installing the client.

So the first point should be solved now (has to be confirmed this evening). confirmed: it's solved

Jumping to the next point: as mentioned the (incremental) backup takes about 50% more time on Windows Server 2019 than before on Server 2008 R2. Previously about 2hrs for DINCR backup of the approx 1.5M files, now with srv 2019 almost 3hrs. It's currently not a problem, but I want to understand why. Same SAN, same infrastructure, same ESXi, still NTFS, SMB v3 now. Are there NB tuning guides for server 2019? Does it make sense to enable journaling? What consequenses are to be expected? I'm really careful cause I'd read that once enabled it's not reversable.

View solution in original post

12 REPLIES 12

Tape_Archived
Moderator
Moderator
   VIP   

Please share the number of files being backed up for the incremental backup. If its in millions then your file server has lot of activity and files are constantly changed.

How much time does the Full backup take? Is the backup for incremental is more than the Full backup time??

Overall there are about 7.5M files, thereof ~1.5M files just the user shares.

In fact it was a little more before the migration (Windows Server 2008R2) and on that the inc. backup was taken daily in less than 2 hrs for the user shares.

Untill now I had no chance to see how long a Full Backup will take on the problematic node, because as soon as I saw the backup could not be taken complete and figured out the _sleep time_ between every bunch of streams, I switched over to node 2. On that the backup of the user shares takes about 4.5hrs for full vs. 3-4hrs inc, which is not fine because I expected a better performance than on 2k8 (the rest of the infrastructure is completely the same btw.) but ok for now... hopefully being able to improve this. ( Has anyone empirical values in performance changes when comparing NB8.0 to 8.2 or 8.3? )

But my first task is to solve the current issue: The node 2 doesn't have the dll load issue -> why on node 1 and how can I fix it?

 

Bedstrace.dll is not used in NetBackup (or even installed). The failures to load it should not take significant time compared with backing up millions of files. Your log snippet shows the failure occurring in the same heartbeat as the loading of the preceding DLL.

VirtApi.dll loads VirtFile.sys. This is the "VFF" filter that NetBackup uses to simulate read/write I/O against read-only files. (I'm not clear on the details.)

NetBackup shares a component with Backup Exec called BEDS - Backup Exec Data Store - for interacting with Exchange, SharePoint, Active Directory, and DFSR data, among others. BEDS also performs non-hardware VSS snapshots. NetBackup Windows processes load and initialize BEDS at startup. Bedstrace.dll and virtApi.dll are among the DLLs that BEDS initialization loads. Your log snippet shows several other DLLs whose names hint at their purpose. (Bedsxese.dll in tha main DLL for Exchange database access.)

Do NOT mess with the virtFile sleep before unload parameter. Put it back. We put that in there to overcome a difference between how BE and NetBackup work. The BE component that loads these DLLs is a long-running service, so they are loaded only once. NetBackup processes - bpfis, bpkar32, tar32, nbWin, and more, are shot-lived processes that perform a task and exit. Thus, NetBackup loads and unloads the BEDS DLLs frequently while BE loads them once in a great while.

VirtFile.sys is a Windows minifilter. (See fltmc.) It is loaded when the OS boots, according to its configured "altitude" assigned by Microsoft to Veritas. The altitude is the load order. It's not good for a minifilter to load and unload itself multiple times.

I recommend that you put the configuration setting back the way it was and (sorry) reboot your server to restore the relationship between the OS and the virtFile driver.

Thank you very much for sharing this information about the functionality.

I already put the changed setting back, and further I created a clean new cluster node and installed NB Client on it without changing any parameter. But unfortunatly I had the same issue on it. So currently we are still at the same point: the backup takes too long (only on node 1), because of adding timouts before a bunch of new bpbkar processes could start to read data. This issue concerns the user share backup only because our policy is set up like this: 

%policy_name%_user_A-F
Backup selections:
%path%\A*
%path%\B*
%path%\C*
%path%\D*
%path%\E*
%path%\F*

%policy_name%_user_G-L
Backup selections:
%path%\G*
%path%\H*
[...] and so on. This was initially set up by a colleage many years ago as he learned it at NetBack trainings. (And it did run well until the server migration).

I also understand that the issue eventually could be worked around with changing the policies to %path%\User\*. But I wanted to solve the root cause and I still don't understand, why this bevahiour is just on one server, while the other don't have this dll load issue even if the config is the very same....

Also while trouble shooting we saw this:

C:\Windows\system32>fltmc unload VirtFile

Unload failed with error: 0x801f0013
The system could not find the filter specified.

Which tells me, that the filter was/is not installed.... (And I didn't want to run this cmd at the working node for double checking...)

I gave it a new try. Did uninstall the client, cleaned up, rebooted, installed it new via remote install, rebooted. The behaviour is the same. I add the logs.

You can use fltmc to find out what minifilters are loaded. It seems that virtFile.sys is not. Furthermore, it seems from your original post that NetBackup (virtAPI.dll) is not able to load it.

I conjecture that it did not get installed when NetBackup was installed. It goes in Windows\System32\drivers. Perhaps the NetBackup installation was blocked from installing it. Please check for it in the drivers folder. If it's missing, there's a copy in NetBackup\bin\virtFile. I can ask my colleagues for a procedure to install it. I've done it before, but it's been a long time.

ok, as assumed it's not loaded. (even if the file already is present in \c$\Windows\System32\drivers)

It would be great, if you could ask. Thank you very much in advance!

 

I figured out that there was a difference between the 2 nodes: on the non-issue node Secure Boot was disabled in VMware Boot options (I don't know why and can't trace back). After disabling secure boot on the issued node and reinstalled NB Client the minifilter is loaded correctly by the system. I'm really interessted why just the secure boot option makes a difference while installing the client.

So the first point should be solved now (has to be confirmed this evening). confirmed: it's solved

Jumping to the next point: as mentioned the (incremental) backup takes about 50% more time on Windows Server 2019 than before on Server 2008 R2. Previously about 2hrs for DINCR backup of the approx 1.5M files, now with srv 2019 almost 3hrs. It's currently not a problem, but I want to understand why. Same SAN, same infrastructure, same ESXi, still NTFS, SMB v3 now. Are there NB tuning guides for server 2019? Does it make sense to enable journaling? What consequenses are to be expected? I'm really careful cause I'd read that once enabled it's not reversable.

quebek
Moderator
Moderator
   VIP    Certified

Hi

How you CINC is configured? with archive bit or time stamp? I would changed that to time stamp and start using change journal with incrementals. Look for this in client properties...

Hi quebek,

currently archive bit is used. I'll give it a try with switching to time stamp and compare the times.

What confuses me is the note mentioned here https://www.veritas.com/content/support/en_US/doc/18716246-139298062-0/v40145790-139298062

Use change journal

Note:

This option works together with the Use Accelerator policy attribute and the Accelerator forced rescan schedule attribute.

Does it mean that change journal only works with Accelerator or Acc exclusively with change journal? As I saw Accelerator is only available with an additional Data Protection Optimization License and would not surely improve backup times in all circumstances respectively could only be used in certain circumstances (storage unit limitations etc.), it will not fit for our environment.

quebek
Moderator
Moderator
   VIP    Certified

Hi

What is your destiation of these backups? Can you leverage accelerator? if not just use time stamp - leave the use change journal unchecked...

 

directly to tape.

no, I can't (we dont have the needed license yet).