vStorage Backups seem slow to initiate snapshots...

elanmbx
Level 6

I'm curious how long it generally takes people to back up a VMware guest with their vStorage API backups.  My backup window has lately been running over into the morning, so I started looking more closely at the time it takes a policy to gather its client list (via VIP), initiate a snapshot via vCenter, take the snapshot, perform the backup, release the snapshot, and complete.

In my environment it generally takes:

Complete VIP selection: 4m

Initiate snapshot in vCenter: 10m

Snapshot creation in vCenter: 1s

Write snapshot to VTL: 5m

Cleanup snapshot: 1s

Total time from initiation to completion: 20m

What I'd like to try to do is reduce this total time - and the part that I'm *very* curious about is the 2nd step - the job appears to just "hang out" for 10 minutes before the "Create virtual machine snapshot" task shows up in the vCenter console.  How can I reduce this?  Is this a function of how busy vCenter is?  Is there some way to investigate what is causing the hold-up during this 10 minute period?
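
A rough sketch of the kind of script that can time these phases from the bpfis log (it assumes the standard HH:MM:SS.mmm [pid.tid] line prefix and a log that doesn't cross midnight, and skips the keep-alive heartbeats so a stall shows up as one big gap):

#!/usr/bin/env python
# Rough sketch: surface the silent gaps in a bpfis log so the slow
# phases stand out.  Assumes each entry starts with the standard
# HH:MM:SS.mmm [pid.tid] prefix and the log does not cross midnight.
import re
import sys
from datetime import datetime, timedelta

STAMP = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3}) \[")
NOISE = ("send_keep_alive",)   # heartbeat lines logged even during a stall

def report_gaps(path, threshold=timedelta(seconds=90)):
    prev = None
    with open(path) as log:
        for line in log:
            match = STAMP.match(line)
            if not match or any(n in line for n in NOISE):
                continue
            stamp = datetime.strptime(match.group(1), "%H:%M:%S.%f")
            if prev is not None and stamp - prev >= threshold:
                print("%s gap before: %s" % (stamp - prev, line.strip()))
            prev = stamp

if __name__ == "__main__":
    report_gaps(sys.argv[1])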

Thanks for any insights!

-Mike-


elanmbx
Level 6

Details on the environment:

Master: SPARC Solaris v10:NetBackup 7.1.0.3

Media/vStorage Server: Win2K8 SP2:NetBackup 7.1.0.3

Policy: VIP:SAN transport

vCenter: 4.0.0 258672

elanmbx
Level 6

Just did a controlled test in our newer (vCenter 4.1.0 258902) vCenter environment:

Complete VIP selection: 4m

Initiate snapshot in vCenter: 13m

Snapshot creation in vCenter: 1s

Write snapshot to VTL: 5m15s

Cleanup snapshot: 1s

Total time from initiation to completion: 23m

So the test results are similar.  The majority of the time is apparently spent initiating the snapshot with vCenter... which is what I'd need to dig into to see why it's taking so long.  The snapshot/backup/cleanup really doesn't take much time at all.

Pretty sure this is why my backup windows have started to run over their scheduled times...

Any help would be greatly appreciated!

Mark_Solutions
Level 6
Partner Accredited Certified

Do you just have the vCenter listed in the Authentication section?

Just thinking of this one: http://www.symantec.com/docs/TECH176896

Or perhaps vCenter has an ESX server specified that no longer exists, causing this (based on the one line in that tech note)

The only other thing is to ensure any old snapshots have been cleared down from vCenter so that it is clean (I believe the Ignore option in the Snapshot Section helps here too)
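
If you want to script that cleanup check, here is a minimal pyVmomi sketch (assuming the vSphere Python SDK is installed; host and credentials are placeholders) that lists VMs still carrying NetBackup snapshots, which NetBackup names NBU_SNAPSHOT...:

# Minimal sketch, assuming pyVmomi; connection details are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

def find_nbu_snaps(snap_tree, found):
    # Recurse through the snapshot tree collecting NetBackup snapshots.
    for snap in snap_tree:
        if snap.name.startswith("NBU_SNAPSHOT"):
            found.append(snap.name)
        find_nbu_snaps(snap.childSnapshotList, found)

si = SmartConnect(host="vcenter.example.com", user="backupsvc", pwd="...",
                  sslContext=ssl._create_unverified_context())  # lab use only
try:
    view = si.content.viewManager.CreateContainerView(
        si.content.rootFolder, [vim.VirtualMachine], True)
    for vm in view.view:
        if vm.snapshot:                     # VM has a snapshot tree at all
            leftovers = []
            find_nbu_snaps(vm.snapshot.rootSnapshotList, leftovers)
            if leftovers:
                print(vm.name, leftovers)
finally:
    Disconnect(si)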

I guess the clue to what it is actually doing is in the line "The slow down in the snapshot occurs when NetBackup is performing a query on the cache files for the virtual machines."

Hope this helps?

elanmbx
Level 6

I have a couple of vCenter servers (an older and a newer one) listed in my Credentials section.  I checked the bpfis log and there is no indication of the issue discussed in TECH176896.

Your post has given me some thoughts on other things to check... thanks for the help!

elanmbx
Level 6

Uploading the bpfis log from the vStorage backup host/media server.  It contains ONLY the log information from the 2nd test I ran this morning... in hopes someone might find the "smoking gun" that helps me in this situation...

elanmbx
Level 6

So this is being written in the bpfis log during the "Initiate snapshot" time I mention above:

12:24:05.128 [6700.7100] <2> send_keep_alive: INF - sending keep alive
12:24:13.084 [6700.6692] <2> onlfi_vfms_logf: INF - vddkLog: VixDiskLibVimResolveHostName: Resolving IP address for hostname denvc01.jeppesen.com.
12:24:13.084 [6700.6692] <2> onlfi_vfms_logf: INF - vddkLog: VixDiskLibVimResolveHostName: Resolved to 169.143.34.70.
12:24:13.084 [6700.6692] <2> onlfi_vfms_logf: INF - vddkLog: VixDiskLibVim: VixDiskLibVimLogin
12:24:13.661 [6700.6692] <2> onlfi_vfms_logf: INF - vddkWarnLog: VixDiskLibVim: Not licensed to use this function.
12:24:13.677 [6700.6692] <2> onlfi_vfms_logf: INF - vddkLog: VixDiskLibVim: VixDiskLibVimLogout
12:24:13.880 [6700.6692] <2> onlfi_vfms_logf: INF - lockVM: VixDiskLib_PrepareForAccess returned 16064
This repeats several times before the snapshot actually starts in vCenter:

12:26:06.810 [6700.7100] <2> send_keep_alive: INF - sending keep alive
12:26:14.703 [6700.6692] <2> onlfi_vfms_logf: INF - lockVM: VixDiskLib_PrepareForAccess failed, error=16064 [The host is not licensed for this feature]
12:26:14.703 [6700.6692] <2> onlfi_vfms_logf: INF - vddkLog: VixDiskLib: VixDiskLib_Exit called. Unmatched Init calls so far: 1.
12:26:14.703 [6700.6692] <2> onlfi_vfms_logf: INF - vddkLog: diskLibPlugin: 2012-01-31T12:26:14.703-07:00 [06692 trivia 'Default'] Starting to shut down worker thread pool.
12:26:15.717 [6700.6692] <2> onlfi_vfms_logf: INF - vddkLog: diskLibPlugin: 2012-01-31T12:26:15.717-07:00 [06692 trivia 'Default'] Worker thread pool is shut down.
12:26:15.717 [6700.6692] <2> onlfi_vfms_logf: INF - vSphere_freeze: lockVM failed with 16064
12:26:15.717 [6700.6692] <2> onlfi_vfms_logf: INF - vSphere_freeze: creating snapshot for /vmmor/vm-70023: Created by deninfrap8b on Tue Jan 31 12:16:06 2012
12:26:15.717 [6700.6692] <2> onlfi_vfms_logf: INF - vmwareLogger: CreateSnapshot(1): entered: snapshotName=NBU_SNAPSHOT deninfrap8b 1328037366, vmPath=/vmmor/vm-70023
12:26:15.733 [6700.6692] <2> onlfi_vfms_logf: INF - vmwareLogger: ParseDatacenterPath: Path=\(DC)Denv02d - Dev-Test(DC)\vm\Test Dev EText\DENTCONETEXT02, DC path=/Denv02d - Dev-Test/, Container path =/Denv02d - Dev-Test/vm/, Obj type=VirtualMachine
12:26:15.733 [6700.6692] <2> onlfi_vfms_logf: INF - vmwareLogger: CreateSnapshot(2): snapshotName=NBU_SNAPSHOT deninfrap8b 1328037366, vmMOR=vm-70023, snapMemory=false, quiesceFileSystem=true
12:26:16.513 [6700.6692] <2> onlfi_vfms_logf: INF - vmwareLogger: WaitForTaskCompleteWithTimeout: Waiting for 900 seconds for task to complete
12:26:27.043 [6700.6692] <2> onlfi_vfms_logf: INF - vmwareLogger: WaitForTaskComplete: reached Success
12:26:27.043 [6700.6692] <2> onlfi_vfms_logf: INF - vmwareLogger: WaitForTaskComplete: returning 0X0


elanmbx
Level 6

So I dug back through the log - this "send_keep_alive" to "VixDiskLib_PrepareForAccess returned 16064" cycle repeats every minute for about 10 minutes.  I think if I can crack this part, I can figure out why the 10-minute delay is occurring.
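
A companion sketch to the gap finder above (same timestamp assumption) that counts those 16064 retries and totals the stall:

# Count the VixDiskLib_PrepareForAccess retries in a bpfis log and
# measure how long the retry loop stalled the job overall.
import re
import sys
from datetime import datetime

STAMP = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3}) \[")

retries = []
with open(sys.argv[1]) as log:
    for line in log:
        if "VixDiskLib_PrepareForAccess returned 16064" in line:
            match = STAMP.match(line)
            if match:
                retries.append(datetime.strptime(match.group(1), "%H:%M:%S.%f"))

if retries:
    print("%d retries between %s and %s; stalled for %s" % (
        len(retries), retries[0].time(), retries[-1].time(),
        retries[-1] - retries[0]))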

elanmbx
Level 6

Eureka!  7.1.0.3 issue!

http://www.symantec.com/business/support/index?page=content&id=TECH179346

I'll make the call to Support.  The solution, apparently:

FIX:
Contact Support to obtain the binary from ET2652715

Issue #1 will be addressed after adding the following registry key and installing the binary:
The following registry key must be created on the VMware Backup Host(s):

HKEY_LOCAL_MACHINE\SOFTWARE\VERITAS\NetBackup\CurrentVersion\Config\BACKUP\disableIPResolution

- Open the registry editor and drill down to "HKEY_LOCAL_MACHINE\SOFTWARE\VERITAS\NetBackup\CurrentVersion\Config" 
- If needed, right click on "Config" and create a new key called "BACKUP"  (all upper case) 
- Right click on the new key "BACKUP" and create a new string value called "disableIPResolution"  (case sensitive) 
- You should now have the registry path indicated above.
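
If you have several VMware backup hosts to touch, the same steps as a minimal Python sketch (Windows only, run elevated; the value data is left empty since the technote only requires the value to exist - confirm specifics with Support):

# Minimal sketch of the registry steps above (TECH179346).
import winreg

KEY_PATH = r"SOFTWARE\VERITAS\NetBackup\CurrentVersion\Config\BACKUP"

# Create (or open) the BACKUP key, then add the string value.
with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, KEY_PATH, 0,
                        winreg.KEY_SET_VALUE) as key:
    winreg.SetValueEx(key, "disableIPResolution", 0, winreg.REG_SZ, "")
print(r"Created HKLM\%s\disableIPResolution" % KEY_PATH)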

Issue #2 will be addressed by just installing the binary.

Will_Restore
Level 6
Accepted Solution

URL: http://www.symantec.com/docs/TECH179346

bpfis showing the new lock call failing:

17:11:06.552 [6652.6788] <2> onlfi_vfms_logf: INF - vddkLog: VixDiskLibVimResolveHostName: Resolved to 192.168.1.2.
17:11:06.552 [6652.6788] <2> onlfi_vfms_logf: INF - vddkLog: VixDiskLibVim: VixDiskLibVimLogin
17:11:07.301 [6652.6788] <2> onlfi_vfms_logf: INF - vddkWarnLog: VixDiskLibVim: Not licensed to use this function.
17:11:07.301 [6652.6788] <2> onlfi_vfms_logf: INF - vddkLog: VixDiskLibVim: VixDiskLibVimLogout
17:11:07.488 [6652.6788] <2> onlfi_vfms_logf: INF - lockVM: VixDiskLib_PrepareForAccess returned 16064

Contact Support to obtain the binary from ET2652715

elanmbx
Level 6

Well done - we must have found it at precisely the same time.  I think this should be the solution!  Will confirm once I have the fix installed.

GovJames
Level 2

Mike,

How are your backups running after installing the fix and creating the registry key?  I am debating whether to install the fix.

Thanks

James

bcblake
Level 4
Partner

We applied the EEB but that didn't fix it.

I don't think we applied the registry hack, so I'll try that next but wanted to see if anyone else had the problem even after applying the EEB.

Thanks

elanmbx
Level 6

James - the issue *immediately* went away upon applying the fix.  My vStorage backups are back to finishing up in 6 hours instead of failing spectacularly after 14 hours.

bcblake
Level 4
Partner

Turns out our problem was related to permissions. The service account used was configured with a role at the Data Center level, as per one of the permissions technotes:

http://www.symantec.com/docs/TECH145949

However there is another technote with similar information:

http://www.symantec.com/docs/TECH130493

The first one says the account can be given permissions at either the Data Center or the vCenter level, and I think it is worded poorly.  The second technote, however, specifically states that the permissions should be applied at the vCenter level.

We applied the permissions at the vCenter level and the 10-minute timeout disappeared.  So it looks like the "licensing" error was a red herring.
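
If anyone wants to verify where their role is actually applied, a minimal pyVmomi sketch (connection details are placeholders) that dumps the permissions defined directly on the vCenter root folder:

# Shows which principals hold roles set on the vCenter root folder, so
# you can confirm the backup account is granted at the vCenter level
# rather than only on a Data Center object.
import ssl
from pyVim.connect import SmartConnect, Disconnect

si = SmartConnect(host="vcenter.example.com", user="administrator", pwd="...",
                  sslContext=ssl._create_unverified_context())  # lab use only
try:
    authmgr = si.content.authorizationManager
    roles = dict((r.roleId, r.name) for r in authmgr.roleList)
    # inherited=False limits output to permissions set on the root folder itself
    for perm in authmgr.RetrieveEntityPermissions(si.content.rootFolder, False):
        print("%s -> %s (propagate=%s)" % (
            perm.principal, roles.get(perm.roleId, perm.roleId), perm.propagate))
finally:
    Disconnect(si)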