
VMware Policy backup issues (7.5.0.3)

Yair
Level 4

Hi guys, how are you doing?

 

My environment is as follows:

Master & Media server (same server): Windows 2003 64 bit

NetBackup 7.5.0.3

 

VMware environment is:

The entire environment runs ESXi 5

 

1 Virtual Center Server

4 ESX servers

 

I have a few issues, so I'll state them all:

 

1) When the backup of a VMware client starts, the job splits into two: Snapshot and Backup.

The Backup child job finishes, but the Snapshot job stays active for hours(!) after the backup has finished. (This does not happen on all machines.)

 

2) When "Browsing for Virtual Machines" in the VMware policy and choosing "Browse and select Virtual Machine", I need to wait about 5-8 minutes while it shows "Loading Virtual Machine list", and sometimes it gives me the following error:

failed to get vm server info list request has timed out(195)

 

At other times, after 5-8 minutes, it loads and shows the entire tree of the VM environment, from the VC server all the way down to the machine level.

But this is very inconsistent: it takes a long time and sometimes fails as mentioned above.

 

3) Also, sometimes I get failed jobs for VMs with error status 156. The affected VM will not back up until I go into the VMware policy and "browse for virtual machines". Once it finishes browsing, I close it and restart the job, and then it works.

 

 

Has anyone experienced this before?

8 REPLIES

MilesVScott
Level 6
Certified

1) The additional time for the snapshot job is most likely due to the cleanup process. If you look at how VMware creates snapshots and how it rolls the change disk back into the original "golden" vmdk, it will explain a lot of it.

 

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1015180

Mark_Solutions
Level 6
Partner Accredited Certified

The browse depends on when it was last done, as it needs a refresh after so many hours (8, I think?)

The backups and snapshots can be affected by how many you run at the same time - that is why they introduced the Resource Limits (Master Server Host Properties), so that you do not overstretch your VMware environment. Overstretching it can also lead to the 156 errors, which are often actually timeout related.
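To illustrate the idea behind a resource limit, here is a minimal Python sketch of capping how many jobs run at once. It is not NetBackup code and says nothing about how NetBackup implements Resource Limits; `run_with_limit` and the dummy jobs are hypothetical names used only for this example.

```python
import threading
import time


def run_with_limit(jobs, limit):
    """Run every callable in `jobs`, never more than `limit` at once.

    Returns the highest number of jobs observed running concurrently.
    Hypothetical helper for illustration only.
    """
    slots = threading.BoundedSemaphore(limit)  # one slot per allowed concurrent job
    lock = threading.Lock()                    # protects the counters below
    active = 0
    peak = 0

    def worker(job):
        nonlocal active, peak
        with slots:                            # blocks while all slots are taken
            with lock:
                active += 1
                peak = max(peak, active)
            try:
                job()
            finally:
                with lock:
                    active -= 1

    threads = [threading.Thread(target=worker, args=(job,)) for job in jobs]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak


if __name__ == "__main__":
    # Six dummy "snapshot" jobs, at most two at a time.
    dummy_jobs = [lambda: time.sleep(0.01) for _ in range(6)]
    print("peak concurrency:", run_with_limit(dummy_jobs, limit=2))
```

The extra jobs simply wait for a free slot instead of all hitting the vCenter and datastores together, which is the behaviour a limit of this kind is meant to give you.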

More information can be gathered from the various logging to see where things are going wrong - enhanced in 7.5 as vCenter also now logs for you - see the VMWare Admin Guide

Hope this helps

Yair
Level 4

MilesVScott:

OK, but how can I speed up the cleanup process? Is there anything I can do about it?

 

Mark_Solutions:

A few months ago I only had the ESX servers configured, without the VC, and when I browsed for machines it worked perfectly fine. Ever since I added the VC and started browsing through it, it has been slow.

My VMware policy is limited to 2 jobs at a time.

I have already experienced job overload first-hand :) so we quickly set it down to 2 jobs at a time, and it has been running fine.

The snapshot error 156 only goes away when I browse for virtual machines in the policy.

I think that after the browse it knows where the virtual machines currently are and has no problem backing them up. Maybe vMotion moved them to a different ESX server and that caused the snapshot error.

Yair
Level 4

Hi guys, just a quick update:

I managed to solve the slow browsing/timeout when browsing for the VM list by using this method:

https://www-secure.symantec.com/connect/forums/recently-upgraded-75-and-now-vm-browsing-doesnt-work

 

 

I will update this thread if there's anything new!

Thanks.

Anonymous
Not applicable

How can you speed up the cleanup? You are referring to the consolidation of snapshots into the 'golden image'.

This is really down to how much has been written into the delta disks while the snapshot was open.

If you have an intensive application like MS SQL on that system, many transactions could have happened between create snapshot and release snapshot. These have to be rolled in.

So unless you get faster storage or increase other resources, there is little you can do to increase the speed. What matters is the factor described above: how much data needs to be committed/consolidated.
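As a rough back-of-the-envelope illustration of the above, the sketch below estimates the delta size as write rate times the time the snapshot stays open, and the consolidation time as that size divided by the storage commit throughput. This is a simplification (it ignores blocks rewritten more than once and writes that keep arriving during the commit), and the function name and all numbers are assumptions for the example, not measured values.

```python
def estimate_consolidation_seconds(write_rate_mb_s, snapshot_open_s, commit_throughput_mb_s):
    """Very rough estimate of how long committing a snapshot delta takes.

    write_rate_mb_s        -- average write rate of the VM while the snapshot is open (assumed)
    snapshot_open_s        -- seconds between create snapshot and release snapshot
    commit_throughput_mb_s -- how fast the storage can merge the delta (assumed)
    """
    if commit_throughput_mb_s <= 0:
        raise ValueError("commit_throughput_mb_s must be positive")
    delta_mb = write_rate_mb_s * snapshot_open_s  # data accumulated in the delta disk
    return delta_mb / commit_throughput_mb_s


if __name__ == "__main__":
    # Example: a busy VM writing 5 MB/s, snapshot open for 1 hour, storage commits at 50 MB/s.
    seconds = estimate_consolidation_seconds(5, 3600, 50)
    print(f"estimated consolidation time: {seconds / 60:.0f} minutes")
```

With these made-up numbers the delta is 18000 MB and the commit takes about 6 minutes. Halving the write rate or the time the snapshot stays open halves the estimate, which is why a quieter backup window or faster storage are the only real levers.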

-----

To force an update of the virtual machine list you can run a command. Whether this is recommended I am not sure (as Mark says, there is a regular refresh cycle), but here is the discussion:

https://www-secure.symantec.com/connect/forums/how-create-vcbnames-manually-netbackup-701

EDIT: Official Technote: How to refresh the Virtual Machine cache list in NetBackup 7

For the virtual machines with a 156 error, I would look at the Tasks/Events for those VMs in the vSphere Client. Check whether DRS is vMotioning them during the backup due to a resource issue on your ESX box. That's not to say vMotion during a backup isn't supported... but it might bring a timeout issue into play.

 

Mark_Solutions
Level 6
Partner Accredited Certified

If you think vMotion is causing issues, then on the Clients tab of the policy, at the bottom, is the option to change how long the query list is retained - perhaps reducing that (to 0 if you wish) would help, so that it refreshes each time.

That would also help if a server was powered on between the query being run and the policy running.
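The effect of that retention setting can be pictured as a simple time-based cache. The Python sketch below is only an illustration of the concept, not how NetBackup stores its VM list; `VmListCache`, `fetch` and the sample data are hypothetical.

```python
import time


class VmListCache:
    """Keep the result of an expensive VM query for at most `max_age_s` seconds."""

    def __init__(self, fetch, max_age_s, clock=time.monotonic):
        self._fetch = fetch          # callable that queries the VM environment
        self._max_age_s = max_age_s  # 0 means: query again on every call
        self._clock = clock
        self._value = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        stale = self._fetched_at is None or (now - self._fetched_at) >= self._max_age_s
        if stale:
            self._value = self._fetch()
            self._fetched_at = now
        return self._value


if __name__ == "__main__":
    def fetch():
        print("querying vCenter (slow)...")
        return {"vm1": "esx-a", "vm2": "esx-b"}  # made-up VM -> host mapping

    cached = VmListCache(fetch, max_age_s=8 * 3600)
    cached.get()
    cached.get()  # served from the cache, may be out of date after a vMotion

    always_fresh = VmListCache(fetch, max_age_s=0)
    always_fresh.get()
    always_fresh.get()  # queries again every time
```

The trade-off is the usual one: a long retention saves the slow query but can leave the list pointing at the wrong host after a vMotion or missing a newly powered-on VM, while 0 always pays the query cost.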

Hope this helps

MilesVScott
Level 6
Certified

Mark_Solutions:

Thank you so much for that tip about Resource Limits. You just helped me fix a ton of issues without having to change the scheduling as I was considering.

As far as speeding up the consolidation process goes, Stuart Green hit the nail on the head. You could possibly choose a time when fewer changes are being made; however, I suspect you have already picked the best time possible, as this is typically the main driver for scheduling.

Mark_Solutions
Level 6
Partner Accredited Certified

Welcome as always Miles ...