We have a dedicated Dell R710 server with two quad-core X5670s and 60GB of memory. One PCIe 2.0 x8 slot is connected to a PERC H830 adapter with twelve 6TB 12Gb/s SAS disks in RAID 50 w/ 5 parity groups (using ~32TB of it for dedup space). A second adapter, an H700 at x8, is connected to 4 SSDs in RAID 10 for the C drive and 4 SSDs in RAID 10 for BE. There is also an LTO5 SAS tape drive on an x4 adapter, plus a few USB drives with legacy B2D jobs on them. I can run somewhere around 10 concurrent jobs without hiccups; I forget exactly how many. Backing up network systems doesn't really max out the server no matter how many concurrent jobs we run. 4 gigabit NICs just can't saturate it, and we settled on 6 concurrent jobs due to the way our network is arranged.
From a performance standpoint, our server has no problem transferring at 300MB/s, which is 2-3x the recommended throughput for dedup storage.
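For anyone who wants to sanity-check those numbers, here is a rough back-of-envelope sketch. The NIC count and the 300MB/s figure are from our setup; the function names, the decimal units, and ignoring protocol overhead are my own simplifying assumptions, so treat the results as upper bounds rather than measurements.

```python
# Back-of-envelope check. Only the 4 gigabit NICs and the 300 MB/s figure
# come from the post; names and unit conventions here are illustrative.

def nic_ceiling_mb_s(nic_count: int, gbit_per_nic: float = 1.0) -> float:
    """Theoretical aggregate NIC throughput in MB/s (decimal units).

    Ignores TCP/IP and Ethernet overhead, so real throughput will be lower.
    """
    return nic_count * gbit_per_nic * 1000 / 8


def implied_recommended_range(observed_mb_s: float,
                              low_mult: float = 2.0,
                              high_mult: float = 3.0) -> tuple[float, float]:
    """If the observed rate is low_mult..high_mult times the recommended
    rate, return the (min, max) recommended rate that this implies."""
    return observed_mb_s / high_mult, observed_mb_s / low_mult


if __name__ == "__main__":
    ceiling = nic_ceiling_mb_s(4)      # four gigabit NICs
    observed = 300.0                   # MB/s the server sustains
    rec_min, rec_max = implied_recommended_range(observed)

    print(f"NIC ceiling (theoretical): {ceiling:.0f} MB/s")
    print(f"Observed share of ceiling: {observed / ceiling:.0%}")
    print(f"Implied recommended rate:  {rec_min:.0f}-{rec_max:.0f} MB/s")
```

In practice the jobs won't spread perfectly across all four NICs, so the usable ceiling depends on how the network is arranged.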
I don't think you'd benefit much from adding more processors to the BE VM. Only the DR disc creation uses more than 2 cores, and that can be done from a management workstation.
We do have a problem when we try to back up the dedup storage to tape for offsite storage while other jobs are running. Too much I/O from running jobs, catalogs, or verifies causes the tape job to fail, so we have to pause the dedup storage while the tape is running. Tape drives are amazingly fast writers. I have a bit more investigation to do on this issue.
As far as running BE in a VM, I'm sure it works, but I'll never do that again. A tape drive attached to the VM will inevitably stop functioning or go offline, and the entire server has to be restarted to get it back online. This kind of defeats the benefits of a VM and disrupts the other VMs on the server. I'm already having some IOPS issues; in a VM it would be worse.
If you are running 2012R2 and backing up your VMs directly on the server, running BE inside a VM means you can't use 2012R2's block-level backup feature for incremental backups to deduplicated storage. It is very inefficient to run BE in a VM in that scenario.