cancel
Showing results for 
Search instead for 
Did you mean: 

VM Backup Failed

Possible
Level 6
Accredited Certified

Master : 7.5.0.5

Backup Host : 7.1.0.4

My Backup is failing with error code 11.

Transfer Type : NBD

This is paining me from long :(

Job Details : 

 

02/09/2014 20:01:13 - Info nbjm (pid=5461) starting backup job (jobid=93895) for client XXXX, policy AAAA, schedule Daily
02/09/2014 20:01:13 - estimated 0 kbytes needed
02/09/2014 20:01:13 - Info nbjm (pid=5461) started backup (backupid=XXXX_1391994073) job for client XXXX, policy AAAA, schedule Daily on storage unit <>
02/09/2014 20:01:13 - started process bpbrm (pid=16802)
02/09/2014 20:01:14 - Info bpbrm (pid=16802) XXXX is the host to backup data from
02/09/2014 20:01:14 - Info bpbrm (pid=16802) reading file list from client
02/09/2014 20:01:14 - Info bpbrm (pid=16802) starting bpbkar on client
02/09/2014 20:01:14 - connecting
02/09/2014 20:01:14 - connected; connect time: 0:00:00
02/09/2014 20:01:23 - Info bpbkar (pid=3552) Backup started
02/09/2014 20:01:23 - Info bpbrm (pid=16802) bptm pid: 16827
02/09/2014 20:01:24 - Info bptm (pid=16827) start
02/09/2014 20:01:24 - Info bptm (pid=16827) using 262144 data buffer size
02/09/2014 20:01:24 - Info bptm (pid=16827) using 30 data buffers
02/09/2014 20:01:25 - Info bptm (pid=16827) start backup
02/09/2014 20:01:25 - Info bpbkar (pid=3552) CONTINUE BACKUP received.
02/09/2014 20:01:27 - Info bptm (pid=16827) backup child process is pid 16847
02/09/2014 20:01:27 - begin writing
02/09/2014 21:21:58 - Error bpbrm (pid=16802) from client XXXX: ERR - Read of VM file read 0 bytes, should have read 16384 bytes
02/09/2014 21:21:59 - Error bptm (pid=16847) system call failed - Connection reset by peer (at child.c.1298)
02/09/2014 21:21:59 - Error bptm (pid=16847) unable to perform read from client socket, connection may have been broken
02/09/2014 21:21:59 - Error bptm (pid=16827) media manager terminated by parent process
02/09/2014 21:22:05 - Info <Master Server Name> (pid=16827) StorageServer=PureDisk:Master Server Name; Report=PDDO Stats for (Master Server Name): scanned: 42130774 KB, CR sent: 447842 KB, CR sent over FC: 0 KB, dedup: 98.9%
02/09/2014 21:22:05 - Error bpbrm (pid=16802) could not send server status message
02/09/2014 21:22:05 - Critical bpbrm (pid=16802) unexpected termination of client XXXX
02/09/2014 21:22:06 - Info bpbkar (pid=0) done. status: 11: system call failed
02/09/2014 21:22:06 - end writing; write time: 1:20:39
system call failed  (11)
 
 
 
bpfis logs at verbosity 5
 
06:02:56.785 [2404.1116] <2> onlfi_vfms_logf: INF - do_thaw: moref: moref=vm-126
06:02:56.785 [2404.1116] <2> onlfi_vfms_logf: INF - checkBackupRegEntry: looking registry for VClockenable
06:02:56.785 [2404.1116] <2> onlfi_vfms_logf: INF - RegQueryValueEx failed with 2
06:02:56.785 [2404.1116] <2> onlfi_vfms_logf: INF - checkBackupRegEntry: looking registry for VClockinterval
06:02:56.785 [2404.1116] <2> onlfi_vfms_logf: INF - RegQueryValueEx failed with 2
06:02:56.785 [2404.1116] <2> onlfi_vfms_logf: INF - checkBackupRegEntry: looking registry for VClocklimit
06:02:56.785 [2404.1116] <2> onlfi_vfms_logf: INF - RegQueryValueEx failed with 2
06:02:56.801 [2404.1116] <2> onlfi_vfms_logf: INF - vmwareLogger: RemoveSnapshot: SYM_VMC_ERROR:  RETRIEVE_SNAPSHOT_MOR_FAILED
 
06:02:56.801 [2404.1116] <2> onlfi_vfms_logf: INF - vmwareLogger: RemoveSnapshotAPI: SYM_VMC_ERROR:  RETRIEVE_SNAPSHOT_MOR_FAILED
 
06:02:58.816 [2404.1116] <2> onlfi_vfms_logf: INF - do_thaw: vmcRemoveSnapshot returned 53
06:02:59.816 [2404.1116] <2> onlfi_vfms_logf: INF - vmwareLogger: RemoveSnapshot: SYM_VMC_ERROR:  RETRIEVE_SNAPSHOT_MOR_FAILED
 
06:02:59.816 [2404.1116] <2> onlfi_vfms_logf: INF - vmwareLogger: RemoveSnapshotAPI: SYM_VMC_ERROR:  RETRIEVE_SNAPSHOT_MOR_FAILED
 
06:03:01.832 [2404.1116] <2> onlfi_vfms_logf: INF - do_thaw: vmcRemoveSnapshot returned 53
06:03:02.832 [2404.1116] <2> onlfi_vfms_logf: INF - vmwareLogger: RemoveSnapshot: SYM_VMC_ERROR:  RETRIEVE_SNAPSHOT_MOR_FAILED
 
06:03:02.832 [2404.1116] <2> onlfi_vfms_logf: INF - vmwareLogger: RemoveSnapshotAPI: SYM_VMC_ERROR:  RETRIEVE_SNAPSHOT_MOR_FAILED
 
06:03:04.707 [2404.1116] <2> onlfi_vfms_logf: INF - do_thaw: vmcRemoveSnapshot returned 53
06:03:05.707 [2404.1116] <2> onlfi_vfms_logf: INF - vmwareLogger: RemoveSnapshot: SYM_VMC_ERROR:  RETRIEVE_SNAPSHOT_MOR_FAILED
 
06:03:05.707 [2404.1116] <2> onlfi_vfms_logf: INF - vmwareLogger: RemoveSnapshotAPI: SYM_VMC_ERROR:  RETRIEVE_SNAPSHOT_MOR_FAILED
 
06:03:07.723 [2404.1116] <2> onlfi_vfms_logf: INF - do_thaw: vmcRemoveSnapshot returned 53
06:03:08.723 [2404.1116] <2> onlfi_vfms_logf: INF - vmwareLogger: RemoveSnapshot: SYM_VMC_ERROR:  RETRIEVE_SNAPSHOT_MOR_FAILED
 
06:03:08.723 [2404.1116] <2> onlfi_vfms_logf: INF - vmwareLogger: RemoveSnapshotAPI: SYM_VMC_ERROR:  RETRIEVE_SNAPSHOT_MOR_FAILED
 
06:03:10.738 [2404.1116] <2> onlfi_vfms_logf: INF - do_thaw: vmcRemoveSnapshot returned 53
06:03:11.738 [2404.1116] <2> onlfi_vfms_logf: INF - do_thaw: unlockVM returned -1
06:03:11.738 [2404.1116] <2> onlfi_vfms_logf: INF - dovSphereDisconnect: disconnect for XXXX completed with flags 0 and status 0
06:03:11.738 [2404.1116] <2> onlfi_vfms_logf: INF - dovSphereDisconnect: disconnect for XXXX completed with flags 0 and status 0
06:03:11.738 [2404.1116] <4> onlfi_thaw: INF - do_thaw return value: 11
06:03:11.738 [2404.1116] <32> onlfi_thaw: FTL - VfMS error 11; see following messages:
06:03:11.738 [2404.1116] <32> onlfi_thaw: FTL - Fatal method error was reported
06:03:11.738 [2404.1116] <32> onlfi_thaw: FTL - vfm_thaw: method: VMware, type: FIM, function: VMware_thaw (invalid method return 53; must return VFM_E_METHOD or VFM_E_METHOD_FTL or VFM_E_OK)
06:03:11.738 [2404.1116] <32> onlfi_thaw: FTL - VfMS method error 7; see following message:
06:03:11.738 [2404.1116] <32> onlfi_thaw: FTL - VMware_ctl: Invalid parameter value: cmd=13
06:03:11.738 [2404.1116] <16> bpfis Exit: FTL - snapshot delete returned status 156
06:03:11.754 [2404.1116] <2> ol_cleanup: INF - removing C:\Program Files\Veritas\NetBackup\temp\policyname_filelist
06:03:11.754 [2404.1116] <4> bpfis Exit: INF - EXIT STATUS 156: snapshot error encountered
06:03:11.754 [2404.1116] <2> stop_keep_alive_thread: INF - Stop keep_alive thread 
06:03:13.754 [2404.1116] <2> bpfis Exit: INF - Close of stdout complete

-------------------------------------------------------------------------------------------------------------------

I can see this error..but dont knoe how to got about it

06:03:01.832 [2404.1116] <2> onlfi_vfms_logf: INF - do_thaw: vmcRemoveSnapshot returned 53

06:03:02.832 [2404.1116] <2> onlfi_vfms_logf: INF - vmwareLogger: RemoveSnapshot: SYM_VMC_ERROR:  RETRIEVE_SNAPSHOT_MOR_FAILED
---------------------------------------------------------------------------------------------------------
 
Every time backup writes 48013140 kb and fails...
 
Help is much appricaited..
8 REPLIES 8

SymTerry
Level 6
Employee Accredited

There are a couple things here to look at.

The first thing to really consider is the NetBackup version of your backup host is really low. Would it be possible to get the backup host to the same version of your master server? Some VMWare backup features will not work if your not at the same version. Also bug's have been fixed.

It looks like the VM snap finishes and NetBackup is receiving, or finishes receiving the backup data. The logs are not verbose enough to tell. This could indicate some sort of drop in communication during the transfer of data to the backup host. It could also mean that the backup finished, but VMWare had an issue with deleting the snap : "snapshot delete returned status 156" 

Do you see any errors in the VM events tab in VCenter? Are other VM backups failing from the same server?

Possible
Level 6
Accredited Certified

Thanks Terry for your reply...

Yes there are 3-4 VMs experencing the issue.I will chk vmadmin for snap deletion from VC.

The thing I noticed is used size of all local drives on VM is almost equal to backed up size.

So my assumption is after backup snap deletion is problem.

But I tried for the option "Ignore" in Options tab in the policy where nbu will ignnoe the snap after backup done...Am i correct ?

 

Mark_Solutions
Level 6
Partner Accredited Certified

I think the snapshot errors are a result of the backup failure - you may need your VM admin to delete them (carful though - they have a nasty habit of destroying the VM they came from when doing that!!)

In the policy i would reccomend the setting to be Remove NBU

This job fails after 1 hour and 20 minutes (4800 seconds)

Check your media servers client read timeout values, any keep alive or firewall values to see if you have a match here - if so increase them.

What i see is the job kick off and presumably data being backed up - then suddenly the plug is pulled between the media server and ESX Server - this is what you need to address.

There are also various timeout values that can be configured for VMware backups too - see the Troubleshooting section of the NBU VMware Admin Guide

Possible
Level 6
Accredited Certified

Make sense to me...i chking with VM guys for snap deletion.

Possible
Level 6
Accredited Certified

Below things have been done..still no where near the resolution...

  1.       VM server has been moved to another ESX server for testing.
  2.       All timeout settings from master, media and backup host have been tried out.
  3.       Backups have been tried with different transport methods (nbd, san, hotadd).
  4.       Snapshot creation and deletion tests have been carried out from Vcenter.
  5.       There is enough space for snapshotting on backup host and Vcenter.
  6.       Backup host has been upgraded to 7.5.0.6 from 7.1

How should i go ahed now...its really paining now...

SymTerry
Level 6
Employee Accredited

It looks like we have covered the major issues and  you have addressed them. Since your still having issues then this is something you might want to open a case for. The techs can start a webex and see this first hand. Please let me know if you already have one open.

mrinal_sarkar62
Level 6

Hi,

Check the following...

nbjm and nbproxy return status code 11 when an exception is processed, such
as when nbproxy obtains policy or configuration information.

Examine the nbjm unified log (originator ID 117) or the nbproxy legacy log for more detail
on the cause of the error.

........

Check there is enough virtual memory....

If virtual memory is the problem, turn off unused applications or increase the amount of virtual
memory.

........

 

Possible
Level 6
Accredited Certified

I will check for Mrinal's suggestion and update here.