03-19-2012 09:16 AM
I am running NBU 7.1.0.3 to back up my virtual machines. When the policy executes, the majority of systems start and finish successfully; however, a few others have issues when it comes to creating the snapshot. Instead of a single snapshot for a machine, vCenter starts trying to create 5+ snapshots.
The jobs fail in NBU, while the snapshots just sit there.
03-19-2012 10:03 AM
As a start, ensure that the bpfis log folder exists on the Backup Host.
This will give us insight into the snapshot attempt from NBU's point of view.
03-27-2012 07:27 AM
I do have the bpfis log folder created
03-27-2012 07:41 AM
Please post the bpfis log for our review
Thanks
04-02-2012 08:06 AM
Do I have to scrub any data from the log before posting?
04-02-2012 08:48 AM
No - best to see it all - add it as an attachment
04-02-2012 12:31 PM
I have attached the bpfis log
04-03-2012 02:02 AM
Without looking into it, I would suggest one idea:
Regards
04-03-2012 02:38 AM
OK - looked at the logs (all 8MB of them!) and it loses connection with vCenter (times out) whilst waiting for the snapshot.
As it has lost connection, it is not able to delete the old snapshot because it does not know it exists, and it retries 9 times before giving up and failing (hence the 9 snapshots).
The issue is likely to be with the client experiencing the problem - maybe VMTools is not up to date, or maybe it has apps installed such as SQL or Exchange, so its tools.conf needs editing to prevent application quiescing.
If the others work OK, then look into what is wrong with that client - maybe its own VSS components - check its System and Application event logs.
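For the tools.conf edit, here is a rough sketch of what I mean. It assumes the VMware Tools setting is `vss.disableAppQuiescing` under a `[vmbackup]` section, which is how I remember it from VMware's knowledge base, so please verify it against your Tools version. The function name `disable_app_quiescing` is just mine for illustration. Note that Python's configparser drops comments when it rewrites a file, so for a real tools.conf it may be simpler to add the two lines by hand.

```python
import configparser
import io


def disable_app_quiescing(conf_text: str) -> str:
    """Return tools.conf text with application quiescing disabled.

    Assumption: the relevant key is vss.disableAppQuiescing in the
    [vmbackup] section. Check VMware's documentation for your version.
    """
    parser = configparser.ConfigParser()
    # configparser lowercases keys by default; keep the original case.
    parser.optionxform = str
    parser.read_string(conf_text)
    if not parser.has_section("vmbackup"):
        parser.add_section("vmbackup")
    parser.set("vmbackup", "vss.disableAppQuiescing", "true")
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    # Starting from an empty tools.conf, print the resulting fragment.
    print(disable_app_quiescing(""))
```

Bear in mind that this turns off application-level quiescing only, so it is a test to narrow down the problem rather than a fix if you need application-consistent backups.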
04-03-2012 03:28 AM
These NetBackup-created snapshots should be removed in the next backup run if you are using the default "Remove NBU" snapshot option.
If a virtual machine snapshot exists that a NetBackup backup previously created, NetBackup
removes the old snapshot, creates an updated snapshot, and proceeds with the virtual machine
backup.
04-10-2012 07:05 AM
OK, so a little update.
VMtools are up to date and the systems with the issue are not Exchange or SQL servers.
In testing it appears NBU is timing out and requesting another snapshot. I have increased the timeout to 900 seconds and it is still not enough. If I look in vCenter, the snapshot takes 22 minutes to create. Therefore, when the timeout is reached, NBU requests another and another and another... hence the multiple snapshots. However, this does not explain why the snapshot creation takes 22 minutes. If I manually request a quiesced snapshot from vCenter, the task is complete in under a minute. This is how the majority of my VMs behave using NBU for VMware, and those backups run and everything is great. It is only a handful of random systems that, for whatever reason, take an excessive amount of time to create the snapshot, and there is no telling why.
I feel a 30 minute timeout only masks the real issue.
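To put numbers on it, here is a quick sketch of the behavior as I understand it. It assumes every attempt takes the same time to create the snapshot, that each timed-out request still leaves a snapshot behind in vCenter, and that NBU gives up after 9 attempts as noted earlier in the thread. The function name is made up for illustration.

```python
def snapshot_requests(snapshot_seconds: int, timeout_seconds: int, max_attempts: int) -> int:
    """Count snapshot requests issued before one succeeds or attempts run out.

    Simplified model (assumption): each attempt needs snapshot_seconds to
    finish, and an attempt that exceeds timeout_seconds is abandoned and
    followed by a new request.
    """
    requests = 0
    for _ in range(max_attempts):
        requests += 1
        if snapshot_seconds <= timeout_seconds:
            break  # this attempt completed before the timeout
    return requests


if __name__ == "__main__":
    snapshot_time = 22 * 60  # 22 minutes, as observed in vCenter
    print(snapshot_requests(snapshot_time, 900, 9))   # 900 s timeout
    print(snapshot_requests(snapshot_time, 1800, 9))  # 30 minute timeout
```

With the 900 second timeout every attempt is abandoned, while a 30 minute timeout lets the first request finish, which is why the longer timeout would hide the symptom without explaining the 22 minute snapshot.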
04-11-2012 08:31 AM
Hi, some ideas...
How many vmdks does this VM have?
Is this VM stored on an FC, iSCSI or NFS datastore?
Are you able to successfully back up other similar virtual machines on this same ESX host? (Are those stored on the same datastore as the one used by the VM exhibiting this issue?)
Have you tried backing up this VM directly from the ESX host instead of through vCenter? (At least just to see how it works this way!)
Have you installed the NetBackup client in that VM?
Is CBT activated for this VM? (For all vmdks?)
Maybe the answers will help.
Regards
04-11-2012 11:20 AM
The drives are all SCSI and hosted on a SAN. They all seem to have 2 drives/vmdks.
From vCenter I can create a snapshot without issue. It works just fine; it only seems to be an issue when requesting it via NBU.
One thing I have managed to learn is that the issue seems to be related to the quiescing. From vCenter, quiescing the VM works fine, but not from NBU. If I disable the option to quiesce in the NBU policy and kick off the job, it runs without a hitch, making me believe it is a quiescing issue.
The 3 systems I am currently using to test with all run different applications in the development environment, so I would think IO is not so high that it overloads the quiesce. Plus, it works fine from vCenter...
Event 1001 : Warning message on COR089YA800 on cor089xk12.us.parker.corp in ha-datacenter: The guest OS has reported an error during quiescing.
04-11-2012 11:52 PM
I would suggest uninstalling VMware Tools, rebooting, and then reinstalling it.
You didn't answer whether or not you have the NBU client installed on that machine. (If not, maybe you could try installing it to see whether or not this changes the behavior... if it doesn't, just roll back to a safety snapshot made prior to installing the NBU client.)
Regards
04-12-2012 08:07 AM
Sorry, NBU 7.1.0.3 client is installed on these systems.
I will attempt to uninstall and install vmware tools.
04-12-2012 10:51 AM
OK, removed VMtools on a system and reinstalled without VSS, and the system backed up.
I then tested another system, but did the complete install that includes VSS, and it fails, pointing to the issue being VSS. However, this raises the question of why not all of my systems are affected. I currently have 3 systems with known issues, but many more where the backups work fine, and those all have VSS installed as well, because the complete installation method is what is used when installing VMtools on all systems.
So this would mean the issue is VSS and quiescing, but not all systems are affected. I need to test the removal of VSS on this most recent system to be sure it makes a difference.
04-12-2012 01:48 PM
I can confirm the following:
Uninstall, reboot and install without VSS works
Uninstall, reboot and install with VSS brings the issue back
04-13-2012 12:59 AM
I'm glad to see that the directions I gave helped fix the issue... maybe you can mark my post as the solution.
Regards
04-13-2012 05:54 AM
Altimate,
I would love to mark your directions as a solution if they actually solved my issue, but as noted above they did not. If VSS is not installed, then quiescing is disabled and I no longer have an application-consistent backup. If VSS is installed, the snapshots fail again. Therefore, this doesn't solve anything, as I can achieve the same thing by entering
04-13-2012 08:53 AM
Hi,
It comes to mind that maybe VSS can't run because it is out of space (not enough free space on disk) to work properly? During your test, is there any intensive I/O or heavy load taking place on that VM? Could a lack of memory be preventing a relevant VSS service from starting successfully?
Another idea would be to investigate using Storage Foundation so as to use another provider for the VSS snapshot.
Regards