my 2 cents...
.
Recap – the general styles of backup are:
A) Plain file system agent ONLY (skipping raw database files - but securing a copy of dump/export/datapump/RMAN-on-disk).
B) Plain file system agent PLUS database agent (skipping raw database files).
C) VM style, either “whole VM with no file cataloguing” or “whole VM with file cataloguing” ONLY.
D) VM style, either “whole VM with no file cataloguing” or “whole VM with file cataloguing” PLUS leveraged database agent.
(although there are others - e.g. flashbackup, and other complex snapshot based technologies)...
.
The rest of my text is pertinent to item C) above.
.
The issues with backup style C) above, for VM style backups of medium to large database servers… are:
- This captures what will almost certainly be an inconsistent snapshot of the live, open raw database files inside the virtual guest server (i.e. the effective backup client). The backup image will contain raw database files which are crash consistent at best, and will very likely be unusable/corrupt when restored.
- In practice, some databases can be fairly large, and so this style of backup (i.e. VM style) can cause a significant amount of unnecessary IO at the SAN and/or LAN layers by reading blocks (from within VMDK/VHD files), transferring those blocks, and writing them (to backup storage)… all for data which will very likely be useless after restore.
- Causes unnecessary backup IO at the storage layer, i.e. between ESX/Hyper-V hypervisor hosts and the storage (SAN, iSCSI, NFS) - wasting time and resource reading, sending and writing the blocks of files which will be useless when restored.
- Causes unnecessary snapshot IO…
- for VMware at the VMDK layer – the IO impact cost “during” the backup is low, because updated/new blocks within the database are simply ‘stored’ (written to a write-pending log, i.e. the snapshot delta) during VMware VADP style backups (i.e. an extra read is NOT taking place).
- for Hyper-V at the VHD layer – the IO impact cost “during” the backup is high, because updated/new blocks within the database must be ‘vectored’ (each prior block must be read before being written to a VSS delta change log) during Hyper-V VSS style backups (i.e. an extra read IS taking place).
- Causes unnecessary ESX/Hyper-V host CPU and IO at the VMDK/VHD layer:
- for VMware the IO cost penalty comes at the end of the backup… during consolidation, when the VMware snapshot is removed, i.e. the full list of pending writes (which were queued up during the backup) must now be applied for real to the VMware VMDK files.
- for Hyper-V the IO cost penalty comes during the backup… to capture the VSS delta change log, with minimal IO at the end of the backup when the VSS snapshot is simply discarded/deleted - i.e. there is no need to re-process the VSS “delta change log” because all it contains is the old blocks as they looked before the writes (which occurred during the backup) took place.
- The busier the database, the bigger the snapshot grows during the backup.
- The bigger the VM, the longer the backup takes, so the longer the snapshot is active, the longer it has to capture IO, and so the bigger the snapshot gets. Nasty. (A rough model of this snapshot IO and growth is sketched after this list.)
- Causes unnecessary IO at the LAN and/or SAN layer (between ESX/Hyper-V hosts and the backup host), i.e. sending lots of blocks which are essentially useless.
- Causes unnecessarily lengthy backup job durations, tying up backup windows and scheduling resources which could be freed up for other backups.
- Causes wasted backup target storage:
- If tape, then wasted tape - and if your tape path runs across FC SAN switches, then this is also effectively wasted traffic across FC SAN switch ports.
- If basic disk, or advanced disk – then wasted disk space - again, possibly also across some kind of storage connectivity layer.
- If de-dupe disk - then both wasted disk space AND wasted time, CPU, RAM, disk for “fingerprint hashing” and “fingerprint storage and recall”.
- …and it is highly likely that the “changed” blocks inside raw database files will nearly always appear unique to the de-dupe engine… so this is a significant problem, i.e. lots of de-dupe activity for something which is essentially useless. (See the fingerprint sketch after this list.)
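.
To put rough numbers on the snapshot points above, here is a tiny back-of-the-envelope model - a sketch only, where the write rate and backup window are illustrative assumptions (not measurements), and the two branches simply mirror the redirect-on-write versus copy-on-write behaviour described above:

```python
# Back-of-the-envelope model of snapshot IO during a VM style backup.
# The write rate and backup window below are illustrative assumptions.

def snapshot_cost(write_rate_mb_s, backup_secs, copy_on_write):
    """Estimate snapshot growth and extra IO for one backup window.

    copy_on_write=False ~ VMware redirect-on-write: new writes land in
    the snapshot delta with no extra read, but the delta must be
    consolidated back into the VMDK at the end of the backup.
    copy_on_write=True  ~ Hyper-V VSS copy-on-write: each prior block
    is read and preserved in the delta change log before being
    overwritten, and the log is simply discarded at the end.
    """
    delta_mb = write_rate_mb_s * backup_secs           # snapshot growth
    extra_read_mb = delta_mb if copy_on_write else 0   # read-before-write
    consolidate_mb = 0 if copy_on_write else delta_mb  # end-of-backup cost
    return delta_mb, extra_read_mb, consolidate_mb

# A moderately busy database VM: 20 MB/s of writes, 2 hour backup window.
for name, cow in (("VMware (redirect-on-write)", False),
                  ("Hyper-V (copy-on-write)", True)):
    delta, rd, con = snapshot_cost(20, 2 * 3600, cow)
    print(f"{name}: snapshot grows ~{delta / 1024:.0f} GB, "
          f"extra reads ~{rd / 1024:.0f} GB during the backup, "
          f"consolidation ~{con / 1024:.0f} GB afterwards")
```

The exact figures don't matter - the point is that both the snapshot size and the extra IO scale linearly with how busy the database is and how long the backup runs.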
.
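On the de-dupe point above - a quick sketch of why touched database blocks tend to defeat fingerprint-based de-dupe. The 8 KB block size and single-bit “header churn” are illustrative assumptions, not any particular database's on-disk format:

```python
# Why volatile database blocks look "unique" to a de-dupe engine:
# fingerprints are computed over whole blocks/chunks, so even a few
# bytes of header churn (e.g. an updated change number) re-fingerprint
# the entire block and it gets stored again in full.
import hashlib
import os

BLOCK_SIZE = 8192                  # assume an 8 KB database block
block = bytearray(os.urandom(BLOCK_SIZE))

fp_before = hashlib.sha256(block).hexdigest()
block[0] ^= 0x01                   # flip a single bit in the "header"
fp_after = hashlib.sha256(block).hexdigest()

print("before:", fp_before[:16])
print("after: ", fp_after[:16])
print("de-dupe match:", fp_before == fp_after)   # False - stored in full again
```
.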
Having said all that - if your VMs contain small, non-volatile databases (which are doing their own dumps/exports/datapumps/RMAN), AND you are taking your VM style backup AFTER those database dumps/exports/datapumps/RMAN have occurred, then there really is no problem doing VM style backups of said small virtual database servers - as long as you remember to delete/discard the restored raw database files before attempting recovery from the dump/export/datapump/RMAN backup, which should also have been captured within the VM style backup (as long as the backup ran after said dumps had completed).
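As a small illustration of that ordering, here is a minimal pre-backup guard - a sketch only, where the dump directory, file pattern and age threshold are hypothetical placeholders to be adjusted for your own environment:

```python
# Minimal pre-backup guard: refuse to start the VM style backup unless
# a sufficiently fresh dump/export exists. The path, glob pattern and
# age threshold are hypothetical placeholders.
import sys
import time
from pathlib import Path

DUMP_DIR = Path("/backup/exports")   # hypothetical dump location
MAX_AGE_SECS = 6 * 3600              # dump must be under 6 hours old

dumps = sorted(DUMP_DIR.glob("*.dmp"), key=lambda p: p.stat().st_mtime)
if not dumps or time.time() - dumps[-1].stat().st_mtime > MAX_AGE_SECS:
    sys.exit("no recent dump found - do not start the VM backup yet")
print(f"latest dump is {dumps[-1].name} - safe to snapshot the VM")
```
.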
In summary, the problem gets worse as the virtualised database servers get larger and/or more volatile - with possibly huge amounts of wasted IO and resource, at multiple different times and stages, for files which will very likely be useless upon restore. In which case, it really does become a very good idea indeed to use either backup type A) or backup type B) above... or a backup of type D) above (though even type D will still accrue lots of “snapshot related” IO). The only way to avoid VM level snapshot files growing very large during backups is to use backup type A) or B) above.