HotAdd failed with lots of 6/41/89 errors

KelvinHBLi · ‎10-06-2021

Hi,

We recently set up a NetBackup 8.2 media server as a virtual machine to back up our datacentre's VMs using the HotAdd transport mode. The media server has an HBA card passed through to it and zoned with a tape library, so backup data is written directly to our LTO-8 tape library.

The media server works very well if we cap the number of concurrent jobs at 4. With that limit it achieves very fast throughput (around 150 MB/sec) with very few failed jobs.

However, as soon as we raise the number of concurrent jobs above 4, lots of jobs start failing with status codes 6, 41 and 89. In fact, the more concurrent jobs we allow, the more jobs fail with 6/41/89.

It seems to work OK (with only a few 6/41 errors) only when we keep the concurrent jobs at 4, but of course the problem is that, with this setting, it would take forever to finish backing up all of our VMs.

The NetBackup engineer and I have applied the latest VDDK (6.7.3), increased the number of SCSI controllers on the VM media server itself to 4, and changed the SCSI controller type to VMware Paravirtual, but none of those changes has solved the issue.

I know there must be something I have overlooked, but I'm not sure what I have missed...

Below is the configuration of this VM media server:

NetBackup version: 8.2
OS: Windows Server 2019
Memory: 32 GB
CPU: 16
HBA: passed through and zoned with an LTO-8 tape library
SCSI cards: 4
SCSI type: VMware Paravirtual
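
For context on the SCSI settings above, here is a rough sketch of how many disks could be hot-added to the media server at once. It is only an estimate under stated assumptions: the default of 15 device slots per virtual SCSI controller is the commonly cited vSphere limit (newer virtual hardware versions with Paravirtual controllers may allow more), and the values for the media server's own disks and the number of disks per backed-up VM are hypothetical, not taken from our environment.

```python
# Rough upper bound on simultaneous HotAdd disk attachments for one proxy VM.
# Assumption: 15 usable device slots per virtual SCSI controller by default;
# pass a different value if your virtual hardware version allows more.

def hotadd_slots(controllers: int, own_disks: int, slots_per_controller: int = 15) -> int:
    """Slots left for hot-added disks after the proxy's own disks are counted."""
    total = controllers * slots_per_controller
    return max(total - own_disks, 0)


def max_concurrent_vms(controllers: int, own_disks: int, disks_per_vm: int,
                       slots_per_controller: int = 15) -> int:
    """How many VMs could be attached at once if each has disks_per_vm disks."""
    if disks_per_vm <= 0:
        raise ValueError("disks_per_vm must be positive")
    return hotadd_slots(controllers, own_disks, slots_per_controller) // disks_per_vm


if __name__ == "__main__":
    # 4 controllers as configured above; own_disks=2 and disks_per_vm=3
    # are hypothetical example values.
    print(hotadd_slots(4, 2))
    print(max_concurrent_vms(4, 2, 3))
```

This only bounds the number of attachable disks; it says nothing about how well the proxy copes with frequent attach and detach operations.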

Below is our VMware environment:

vSphere version: 6.7
The VM media Server and all the VMs to be backed up are within the same data centre
Total VMs that need to be backed up: 500
The version of ESXi hosting this VM media Server: 6.7

So, I'm wondering if someone can shed some light on this?

Many thanks,

Kelvin

KelvinHBLi · ‎10-06-2021

I also observed that, during the backup, the VMware proxy server (which is also the media server) was constantly busy rescanning its SCSI controllers to mount and unmount vmdk disks.

As a result, Disk Management would freeze very often.

So I'm wondering: if a SCSI rescan is in progress, would it cause problems for a backup job that is reading data from an already mounted vmdk disk?

If so, and if this rescanning happens more often as the number of concurrent jobs goes up, would that explain why we get more 41 and 6 errors?
