I have a number of customers who run VMware infrastructures and have Backup Exec in place to perform their backups, with varying degrees of success. However, I've discovered an issue that is of some concern: Backup Exec frequently leaves snapshots behind when backup jobs fail, so the VMs continue running on the snapshots and I have to go and consolidate them manually. This is happening in the following scenarios (that I've found so far):
Physical backup server (Win2k3, all patches installed) + BE 2010 R3 > AVVI backup of ESXi 4 with all patches installed, VMs running 2k3 and 2k8 R2
Physical backup server (Win2k3, all patches installed) + BE 2012 > AVVI backup of ESXi 4 with all patches installed, VMs running 2k3 and 2k8 R2
Physical backup server (Win2k3, all patches installed) + BE 2012 > AVVI backup of ESXi 5.1 with patches installed, VMs running Linux and 2k3
I appreciate there are some 'quality issues' with 2012 and VMware, as per a Symantec blog post. However, I'm only trying to perform non-GRT backups of a Linux machine, as I can accept not having granular restores for the time being. My primary concern is the snapshots frequently being left behind: the BE media services aren't crashing, the job is just failing and BE isn't instructing vSphere to consolidate the snapshot.

This issue was first discovered on Wednesday morning, when a customer lost access to their most important database because the datastore had filled up: a housekeeping process scheduled to run after the backup kicked in and caused the delta to grow rapidly. Consequently, my customer lost 6 hours of their day while I consolidated snapshots and got the machine back up and running. On top of this, the backup had failed, so if I'd had any corruption issues I would have had to roll back another day to a working backup. Presently I'm having to check for snapshots being left behind and consolidate them manually, which I did 3 times yesterday and twice this morning.

First off, am I doing something wrong? The configuration isn't particularly complex and I can't see any obvious options I'm missing that would lead to this scenario, but I'm all ears if someone has a pointer or two. Secondly, if it is a known issue going as far back as 2010 and ESXi 4, is there a patch scheduled to deal with it?
I accept that if the media server crashes, it won't be around to make an API call to VMware to perform the consolidation, so in that situation I know to go and tidy up manually. But if the services aren't failing and are still leaving things behind, this is a nightmare: I spend more time making checks and working around the product. That's not ideal, as I sell Backup Exec to my customers on the basis that it should reduce their support bills by being a superior product. Sadly, at present I'm finding I spend more time supporting BE than I did NTBackup or the Server Backup built into Windows, after convincing them to shell out ~£1k on software to improve reliability and reduce support.
I understand this is really frustrating, given that you have tried 2010, 2012, ESX 4 and 5.1. I hope you have applied the beta service pack that enables support for ESX 5.1.
You could try the steps mentioned in the technote below; they may help with removing and cleaning up orphaned snapshots.
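Until the underlying cause is fixed, the manual check for leftover snapshots can at least be scripted so that a stale snapshot is caught before the datastore fills. The sketch below is a hypothetical illustration, not a Symantec or VMware tool: `SnapshotNode`, `VmInfo`, `find_stale_snapshots`, the VM names and the 6-hour threshold are all invented for the example. It walks each VM's snapshot tree and reports any snapshot older than a cut-off. In a live environment the tree would come from vSphere (for example, pyVmomi exposes it as `vm.snapshot.rootSnapshotList`, whose entries carry `name`, `createTime` and `childSnapshotList`); that wiring is deliberately left out so the sketch runs without a vCenter connection.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Iterator, List, Tuple


@dataclass
class SnapshotNode:
    """Hypothetical stand-in for one node of a VM's snapshot tree."""
    name: str
    created: datetime
    children: List["SnapshotNode"] = field(default_factory=list)


@dataclass
class VmInfo:
    """Hypothetical inventory record: a VM and its top-level snapshots."""
    name: str
    root_snapshots: List[SnapshotNode] = field(default_factory=list)


def walk(nodes: List[SnapshotNode]) -> Iterator[SnapshotNode]:
    """Depth-first traversal, so nested (child) snapshots are checked too."""
    for node in nodes:
        yield node
        yield from walk(node.children)


def find_stale_snapshots(
    vms: List[VmInfo], now: datetime, max_age: timedelta
) -> List[Tuple[str, str]]:
    """Return (vm name, snapshot name) for every snapshot older than max_age."""
    stale = []
    for vm in vms:
        for snap in walk(vm.root_snapshots):
            if now - snap.created > max_age:
                stale.append((vm.name, snap.name))
    return stale


# Example run with made-up data: db01 still has an 11-hour-old snapshot.
now = datetime(2013, 1, 10, 9, 0)
inventory = [
    VmInfo("db01", [SnapshotNode("backup-job-snapshot", datetime(2013, 1, 9, 22, 0))]),
    VmInfo("web01"),
]
print(find_stale_snapshots(inventory, now, timedelta(hours=6)))
# [('db01', 'backup-job-snapshot')]
```

Run after the backup window closes, a non-empty result means a snapshot survived the job and needs consolidating. The age cut-off should be chosen so that it is longer than the longest normal backup job.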
If the job is cancelled or fails without a process crash, the command to remove the snapshot is still sent from Backup Exec to the VMware API (application programming interface), so the snapshot should still be removed.
If beremote on the media server is restarted or crashes, the snapshot will be left behind, because beremote is responsible for requesting its removal.
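The behaviour described in the two paragraphs above matches a common try/finally cleanup pattern. The sketch below is a hypothetical illustration of that pattern only, not Backup Exec's actual code: `HypotheticalVSphere`, `run_backup_job`, `copy_data` and `failing_copy` are invented names. Cleanup placed in a `finally` block runs when the job succeeds, raises an error or is cancelled. If the process is killed outright, nothing runs at all, which is why a beremote crash or restart leaves the snapshot behind.

```python
class HypotheticalVSphere:
    """Stand-in for the VMware API; just records which VMs have a snapshot."""

    def __init__(self):
        self.snapshots = set()

    def create_snapshot(self, vm):
        self.snapshots.add(vm)

    def remove_snapshot(self, vm):
        self.snapshots.discard(vm)


def run_backup_job(api, vm, copy_data):
    api.create_snapshot(vm)
    try:
        copy_data(vm)  # may raise if the job fails or is cancelled
    finally:
        # Runs on success, failure and cancellation, but never runs if the
        # process itself is killed (the beremote crash/restart case).
        api.remove_snapshot(vm)


def failing_copy(vm):
    raise RuntimeError("job failed")


api = HypotheticalVSphere()
try:
    run_backup_job(api, "db01", failing_copy)
except RuntimeError:
    pass  # the job failed, but the snapshot was still removed
print(api.snapshots)  # set()
```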
The information in Tech 200709 is an attempt to minimize the effects of any process crashes and/or timeouts when requesting snapshot removal within the VMware environment. However, we cannot guarantee that the snapshot will be removed, because issues within the VMware environment itself can still leave it on the system, and some of the causes are outside of Symantec's control.