Using NBU for VMware

smakovits
Level 6

I am running 7.1.0.3 to back up my virtual machines. When the policy executes, the majority of systems start and finish successfully. However, a few others have issues when it comes to creating the snapshot. Instead of a single snapshot per machine, vCenter starts trying to create 5 or more snapshots.

The jobs fail in NBU, while the snapshots just sit there.

48 REPLIES

smakovits
Level 6

A standard client-side backup works. I believe Microsoft VSS is used there, so VSS itself appears to work.

 

There is memory available and the 2 drives have several GB free.

Altimate1
Level 6
Partner Accredited

OK, are the 2 VMDKs located on the same datastore, or on 2 different ones?

If they are on different ones, I suggest migrating all VMDKs onto a single datastore before trying again.

BTW: are there any other snapshots defined for this VM? (If yes, try again after deleting them all.)

If using vSphere 4.x, have you enabled CBT? (If yes, power off the VM and try again after disabling it.)

One last thing I would suggest trying is to add 2 new VMDKs to that VM and use SSR/BESR to transfer the 2 disks to the new VMDKs (boot the VM from the SSR/BESR recovery CD to do this).

Regards

Altimate1
Level 6
Partner Accredited

While searching, I found this interesting article:

http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&externalId=1009558

As I understand it, even without VSS a VMware backup is safe because the snapshot is taken at the virtual machine level (all disks at the same time).

smakovits
Level 6

Both VMDKs are on the same datastore.

 

All old snapshots have been removed.

We are on ESXi 5.

Moving the disks to new ones is not much of an option. This is happening on a bunch of VMs, so it does not seem to be a viable solution.

Anonymous
Not applicable

Some more questions to ask...

Are these VMs all the same OS/bit level?

Were they all created from the same VM template?

Are the problem VMs P2V conversions?

 

smakovits
Level 6

They are all 2008 R2.

As for the same VM template, I cannot say; my initial thought is no.

They were not P2V as far as I know.

Altimate1
Level 6
Partner Accredited

I would also suggest removing the NetBackup client from one of those faulty VMs, then running the backup again to see whether or not the snapshot still takes such a long time to complete. Could the time needed to take the snapshot be affected by NBU client interaction? I understand that removing the NBU client prevents GRT at the application level, but this is mainly to gather some additional information.

smakovits
Level 6

I have disabled application quiescing on some other troubled systems as well and the backup completes without issue.

Altimate1
Level 6
Partner Accredited

Hi,

Could you give us the VMDK sizes for one of the faulty VMs?

Regards

smakovits
Level 6

I don't have the exact sizes because I do not have access to the datastore, but the total provisioned space is 169.33GB and the used space is 79.36GB.

 

The one thing I want to try, which does not seem to work, is simply increasing the timeout value so that NBU waits longer before requesting another snapshot.

 

So far, what I see happening on these troubled VMs is that the quiesced snapshot takes an excessively long time for some unknown reason. If I power off the VM, the snapshot fires off without issue. If it is powered on, NBU requests the snapshot and it takes anywhere from 15 to 30 minutes to complete. According to vCenter the snapshots are created, but for whatever reason they take a very long time. For this reason, I want to tell NBU to wait before it requests the next snapshot.

 

I used the following:

http://www.symantec.com/business/support/index?page=content&id=TECH136319

I used the recommended 7200, but snapshot requests are still made every 5 minutes, ignoring the timeout value.
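As a rough illustration of why several snapshots pile up: if a new request is issued every 5 minutes while the first one needs 15 to 30 minutes, the count grows quickly. The sketch below is only back-of-the-envelope arithmetic under that assumption (requests at t = 0, 5, 10, ... until the first snapshot completes); the function name is made up for this example and it does not model NBU's actual retry logic.

```python
import math

def requests_before_completion(snapshot_minutes, retry_interval_minutes=5):
    """Count requests issued at t = 0, interval, 2*interval, ...
    strictly before the first snapshot finishes (assumed model)."""
    if snapshot_minutes <= 0 or retry_interval_minutes <= 0:
        raise ValueError("durations must be positive")
    return math.ceil(snapshot_minutes / retry_interval_minutes)

# Durations reported in this thread: 15 to 30 minutes per snapshot.
for minutes in (15, 30):
    count = requests_before_completion(minutes)
    print(f"{minutes} min snapshot -> {count} requests")
```

Under this simple model a 30 minute snapshot lines up with the "5+ snapshots" seen in vCenter.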

smakovits
Level 6

From the log

 

 

09:45:47.973 [5328.4972] <2> onlfi_vfms_logf: INF - vmwareLogger: SetDebugLogLevel: m_PowerTimeout    = 900 secs
 
09:45:47.973 [5328.4972] <2> onlfi_vfms_logf: INF - vmwareLogger: SetDebugLogLevel: m_JobTimeout      = 900 secs
 
09:45:47.973 [5328.4972] <2> onlfi_vfms_logf: INF - vmwareLogger: SetDebugLogLevel: m_SnapshotTimeout = 7200 secs
 
09:45:47.973 [5328.4972] <2> onlfi_vfms_logf: INF - vmwareLogger: SetDebugLogLevel: m_RegisterTimeout = 180 secs
 
09:45:47.973 [5328.4972] <2> onlfi_vfms_logf: INF - vmwareLogger: SetDebugLogLevel: m_BrowseTimeout   = 180 secs
 
09:45:47.973 [5328.4972] <2> onlfi_vfms_logf: INF - vmwareLogger: SetDebugLogLevel: m_ConnectTimeout  = 0 secs
 
09:45:47.973 [5328.4972] <2> onlfi_vfms_logf: INF - vmwareLogger: SetDebugLogLevel: m_RefreshTimeout  = 4 secs
 
09:45:47.973 [5328.4972] <2> onlfi_vfms_logf: INF - vmwareLogger: SetDebugLogLevel: LogLevel          = 6
 
--------------------------------------------
 
09:45:47.973 [5328.4972] <2> onlfi_vfms_logf: INF - checkBackupRegEntry: snapshottimeout found 7200
09:45:47.973 [5328.4972] <2> onlfi_vfms_logf: INF - SetTimeoutsAndLogging: timeout 7200 (secs) overrides default 900 (secs)
09:45:47.973 [5328.4972] <2> onlfi_vfms_logf: INF - checkBackupRegEntry: looking registry for registertimeout
09:45:47.973 [5328.4972] <2> onlfi_vfms_logf: INF - RegQueryValueEx failed with 2
09:45:47.973 [5328.4972] <2> onlfi_vfms_logf: INF - checkBackupRegEntry: looking registry for browsetimeout
09:45:47.973 [5328.4972] <2> onlfi_vfms_logf: INF - RegQueryValueEx failed with 2
09:45:47.973 [5328.4972] <2> onlfi_vfms_logf: INF - SetTimeoutsAndLogging: JobTimeout: 900 secs
09:45:47.973 [5328.4972] <2> onlfi_vfms_logf: INF - SetTimeoutsAndLogging: PowerOpTimeout: 900 secs
09:45:47.973 [5328.4972] <2> onlfi_vfms_logf: INF - SetTimeoutsAndLogging: SnapshotTimeout: 7200 secs
09:45:47.973 [5328.4972] <2> onlfi_vfms_logf: INF - SetTimeoutsAndLogging: RegisterTimeout: 180 secs
09:45:47.973 [5328.4972] <2> onlfi_vfms_logf: INF - SetTimeoutsAndLogging: BrowseTimeout: 180 secs
09:45:47.973 [5328.4972] <2> onlfi_vfms_logf: INF - checkBackupRegEntry: looking registry for vmcloglevel
09:45:47.973 [5328.4972] <2> onlfi_vfms_logf: INF - checkBackupRegEntry: vmcloglevel found 6
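To check which timeouts are actually in effect, the SetTimeoutsAndLogging lines can be pulled out of a log like the one above. This is a minimal sketch that assumes the line format shown in this excerpt; the regular expression and function name are my own, not part of NetBackup.

```python
import re

# Matches lines such as:
#   ... INF - SetTimeoutsAndLogging: SnapshotTimeout: 7200 secs
# The pattern is an assumption based on the log excerpt in this thread.
EFFECTIVE = re.compile(r"SetTimeoutsAndLogging:\s+(\w+Timeout):\s+(\d+)\s+secs")

def effective_timeouts(log_text):
    """Return {timeout name: seconds} for every matching log line."""
    return {name: int(secs) for name, secs in EFFECTIVE.findall(log_text)}

sample = """\
09:45:47.973 [5328.4972] <2> onlfi_vfms_logf: INF - SetTimeoutsAndLogging: timeout 7200 (secs) overrides default 900 (secs)
09:45:47.973 [5328.4972] <2> onlfi_vfms_logf: INF - SetTimeoutsAndLogging: JobTimeout: 900 secs
09:45:47.973 [5328.4972] <2> onlfi_vfms_logf: INF - SetTimeoutsAndLogging: SnapshotTimeout: 7200 secs
"""

for name, secs in effective_timeouts(sample).items():
    print(f"{name}: {secs} s = {secs / 60:.0f} min")
```

In this excerpt the snapshot timeout reads as 7200 seconds (120 minutes), so the registry value was picked up even though requests were still being made every 5 minutes.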

 

Altimate1
Level 6
Partner Accredited

I think this is definitely not related to disk space. Therefore, I would suggest investigating the running applications. Could it be that some of them are not VSS aware?

Regards

smakovits
Level 6

I don't feel it is application related, simply because it is a random few servers and the applications they run vary immensely. One is simply part of a web cluster where the other servers all work fine.

 

I need to find a way to delay the snapshot retry, since none of the settings I change seem to affect the behavior.

Marc-Andre_Dupu
Level 3
Partner Accredited Certified

smakovits
Level 6

Thanks Marc, but currently this is 100% on VMware. The fact that the issue can be reproduced from vCenter independently of NBU led me to open a case with them. After several weeks of troubleshooting, we discovered that the VM gets the call for the snapshot and then just sits there. After 9 minutes it actually takes the snapshot, which takes the same amount of time as on a normal machine, totalling the typical 10-11 minute snapshot time I am seeing.

 

Therefore, VMware sees this as an issue on their end, but they do not know why, and we are working with the engineering team one step away from the developers. Once I have a resolution I will report my findings.

smakovits
Level 6

This case has been moved to the VMware development team for debugging in order to try to find a solution.

Mark_Solutions
Level 6
Partner Accredited Certified

Just catching up on things so sorry for the late reply ...

Reading the thread again, you say you have increased the timeout to 900 - which is 15 minutes - but that the snapshot takes over 20 minutes, so it may still be timing out.

Not sure where you set the timeout, but it should be on the VMware backup host (media server?) and is set at:

HKLM\Software\Veritas\NetBackup\CurrentVersion\Config\BACKUP\

A DWORD named snapshottimeout with a decimal value of 1800 will give you 30 minutes for the snapshot to complete.
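For anyone who prefers to import the value rather than type it into regedit, here is a small sketch that builds the text of a .reg file for this setting. It assumes the key is spelled NetBackup and lives under HKEY_LOCAL_MACHINE as described above; the helper name is made up for this example. Note that .reg files store DWORDs in hexadecimal, so 1800 decimal becomes 00000708.

```python
# Assumed key path, taken from the description in this post.
KEY = r"HKEY_LOCAL_MACHINE\SOFTWARE\Veritas\NetBackup\CurrentVersion\Config\BACKUP"

def reg_file_for_timeout(seconds, key_path=KEY):
    """Build .reg file text that sets the snapshottimeout DWORD (in seconds)."""
    if not 0 <= seconds <= 0xFFFFFFFF:
        raise ValueError("value does not fit in a DWORD")
    return (
        "Windows Registry Editor Version 5.00\n"
        "\n"
        f"[{key_path}]\n"
        # .reg DWORD values are written as 8 hexadecimal digits
        f'"snapshottimeout"=dword:{seconds:08x}\n'
    )

print(reg_file_for_timeout(1800))  # 1800 s = 30 minutes
```

Save the output as a .reg file on the backup host and review it before importing.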

Hope this helps and that I have not missed something you have already done / said

smakovits
Level 6

Thanks Mark.  I have made this change, but NBU still only sees it as 5 minutes, meaning even though I tell it to wait 30, NBU calls for a new snapshot every 5 minutes.  No idea why.

Mark_Solutions
Level 6
Partner Accredited Certified

Sorry for the very late reply - it may also be worth adding a string value in the same location named displayNameEnableIP.

As you are on 7.1.0.3 there are issues - see this tech note:

http://www.symantec.com/business/support/index?page=content&pmv=print&impressions=&viewlocale=&id=TE...

Worth getting any VMware rollup packages, or maybe patching to 7.1.0.4?

Hope this helps

smakovits
Level 6

Strangely enough, I have had 7.1.0.4 installed on the media server, but the snapshot would still sometimes hang for 10 minutes while waiting on the lock.  Not sure why.