We recently set up a NetBackup 8.2 media server on a virtual machine to back up our datacentre's VMs using the HotAdd transport mode. The media server has an HBA card passed through to it and zoned to our LTO-8 tape library, so backup data is written directly to tape.
The media server works very well if we cap the number of concurrent jobs at 4. With that limit it achieves very fast throughput (150 MB/s) with very few failed jobs.
However, as soon as we raise the concurrent job limit above 4, lots of jobs start failing with status codes 6, 41 and 89. In fact, the higher we set the limit, the more jobs fail with 6/41/89.
It seems to work OK (with only a few 6/41 errors) only when we keep the limit at 4, but with that setting it takes far too long to finish backing up all of our VMs.
The NetBackup engineer and I have applied the latest VDDK (6.7.3), increased the number of SCSI controllers on the media server VM to 4, and changed the SCSI controller type to VMware Paravirtual, but none of these has solved the issue.
I know there must be something I have overlooked, but I'm not sure what I have missed...
Below is the configuration of this VM media server
Below is our VMware environment
So, I'm wondering if someone can shed some light on this?
I also observed that, during backups, the VMware backup proxy (which is also the media server) was constantly busy rescanning its SCSI controllers to mount and unmount vmdk disks.
As a result, Disk Management on the media server would freeze very often.
That makes me wonder: while a SCSI rescan is in progress, could it cause problems for a backup job that is reading data from an already mounted vmdk disk?
If so, then the more concurrent jobs we run, the more often the SCSI controllers need to be rescanned, and that would explain why we get more 41 and 6 errors as the job count goes up?
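To check this theory against our own job history, I'm thinking of something like the rough Python sketch below. It groups jobs by how many other jobs were running at the same time and reports the share that ended with 6/41/89 in each group. The record fields (`start`, `end`, `status`) and the function names are my own hypothetical choices, not a NetBackup export format, so the job data would have to be converted into this shape first.

```python
from collections import defaultdict

def overlap_count(job, jobs):
    """Count the other jobs whose run window overlaps this job's window."""
    return sum(
        1
        for other in jobs
        if other is not job
        and other["start"] < job["end"]
        and job["start"] < other["end"]
    )

def failure_rate_by_overlap(jobs, failure_codes=(6, 41, 89)):
    """Map 'number of overlapping jobs' to the fraction of jobs that failed.

    Each job is a dict with hypothetical keys:
      start, end : comparable timestamps (e.g. epoch seconds)
      status     : the job's final status code
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for job in jobs:
        n = overlap_count(job, jobs)
        totals[n] += 1
        if job["status"] in failure_codes:
            failures[n] += 1
    return {n: failures[n] / totals[n] for n in sorted(totals)}

if __name__ == "__main__":
    # Made-up placeholder records, only to show the expected input shape.
    sample_jobs = [
        {"start": 0, "end": 10, "status": 0},
        {"start": 5, "end": 15, "status": 41},
        {"start": 20, "end": 30, "status": 0},
    ]
    for overlapping, rate in failure_rate_by_overlap(sample_jobs).items():
        print(f"{overlapping} overlapping job(s): {rate:.0%} failed")
```

If the failure share climbs steadily with the number of overlapping jobs, that would at least be consistent with the rescan theory, although it would not prove that the rescans themselves are the cause.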