10-27-2014 03:53 PM
Master: W2k8 running 7.6.0.2
Data Store / Fibre Host / VM Access: 5230 running 2.6.0.2
VMWare is 5.0 and VM tools are up to date
All other smaller VMs attempted so far work just fine over fibre.
The problem child is a 3.5 TB W2k8 box running SQL 2008 and client 7.5.0.7
On the VMware side it's throwing a quiescing error telling me to check the Event Logs, but there's nothing in the Event Logs. I did find this in the bpfis log but can't seem to find this exact error in any of the KBs.
06:01:17.252 [140304] <2> bpfis main: received FIM as [345] VMware_v2:Virtual_machine_backup=2,disable_quiesce=0,drive_selection=0,enable_vCloud=0,exclude_swap=1,file_system_optimization=1,ignore_irvm=1,multi_org=0,nameuse=3,post_events=1,rHz=10,rLim=10,rTO=0,skipnodisk=0,snapact=2,trantype=san,vsrvr=xxxxxxxx,uuid=xxxxxxxxxxxxxx,nip=(xxxxxxxxxxx)(xxxxxxxxxxxx),nameuse=3
06:01:17.253 [140304] <4> bpfis: Starting keep alive process.
06:01:17.253 [140304] <4> bpfis: Keep alive process id<140305>.
06:01:17.253 [140304] <4> bpfis: Stoping keep alive process. pid<140305>.
06:01:17.254 [140304] <4> bpfis: Child proceess pid <140305> terminated.
06:01:17.254 [140304] <4> bpfis: INF - BACKUP START 140304
06:01:17.256 [140304] <2> bpfis main: receive filelist:<NEW_STREAM>
06:01:17.256 [140304] <2> check_special_names: got path entry as :<NEW_STREAM>
06:01:17.256 [140304] <2> check_special_names: after conversion returning :<NEW_STREAM>
06:01:17.256 [140304] <2> bpfis main: receive filelist:<ALL_LOCAL_DRIVES>
06:01:17.256 [140304] <2> check_special_names: got path entry as :<ALL_LOCAL_DRIVES>
06:01:17.256 [140304] <2> check_special_names: after conversion returning :<ALL_LOCAL_DRIVES>
06:01:17.257 [140304] <2> bpfis main: receive filelist:<CONTINUE>
06:01:17.257 [140304] <2> check_special_names: got path entry as :<CONTINUE>
06:01:17.257 [140304] <2> check_special_names: after conversion returning :<CONTINUE>
06:01:17.423 [140304] <2> vnet_pbxConnect: pbxConnectEx Succeeded
06:01:17.424 [140304] <2> logconnections: BPRD CONNECT FROM xxxxxxxxxxx TO xxxxxxxxx fd = 4
06:01:17.667 [140304] <4> bpfis: INF - FIS_ID=xxxxxxxxxxxxxxxxxxxx
06:01:17.667 [140304] <4> bpfis: INF - Freezing ALL_LOCAL_DRIVES using snapshot method VMware_v2.
06:01:17.667 [140304] <4> bpfis: INF - ACT=xxxxxxxxxxxxxxx
06:01:17.668 [140304] <4> bpfis: INF - Created mount point /tmp/_vrts_frzn_img_ALL_LOCAL_DRIVES_140304_1
06:01:17.673 [140304] <2> vnet_pbxConnect: pbxConnectEx Succeeded
06:01:17.673 [140304] <2> logconnections: BPRD CONNECT FROM xxxxxxxxxxx TO xxxxxxxxxx fd = 4
06:03:31.126 [140304] <4> bpfis: INF - Deleted mount point /tmp/_vrts_frzn_img_ALL_LOCAL_DRIVES_140304_1
06:03:31.126 [140304] <32> bpfis: FTL - VfMS error 11; see following messages:
06:03:31.126 [140304] <32> bpfis: FTL - Fatal method error was reported
06:03:31.126 [140304] <32> bpfis: FTL - vfm_freeze: method: VMware_v2, type: FIM, function: VMware_v2_freeze
06:03:31.126 [140304] <32> bpfis: FTL - VfMS method error 0; see following message:
06:03:31.126 [140304] <32> bpfis: FTL -
06:03:31.126 [140304] <32> bpfis: FTL - VfMS error 11; see following messages:
06:03:31.126 [140304] <32> bpfis: FTL - Fatal method error was reported
06:03:31.126 [140304] <32> bpfis: FTL - vfm_freeze: method: VMware_v2, type: FIM, function: VMware_v2_freeze
06:03:31.126 [140304] <32> bpfis: FTL - VfMS method error 0; see following message:
06:03:31.126 [140304] <32> bpfis: FTL -
06:03:31.126 [140304] <16> bpfis: FTL - snapshot creation failed, status 156
06:03:31.126 [140304] <4> bpfis: INF - Thawing ALL_LOCAL_DRIVES using snapshot method VMware_v2.
06:03:31.126 [140304] <8> bpfis: WRN - ALL_LOCAL_DRIVES is not frozen
06:03:31.126 [140304] <8> bpfis: WRN - snapshot delete returned status 20
06:03:31.167 [140304] <4> bpfis: Starting keep alive process.
06:03:31.271 [140304] <4> bpfis: Keep alive process id<141289>.
06:03:31.271 [140304] <4> bpfis: Stoping keep alive process. pid<141289>.
06:03:31.275 [140304] <4> bpfis: Child proceess pid <141289> terminated.
So...where do I go from here?
10-27-2014 07:28 PM
Any directly attached volumes (RDM) on this Windows VM?
If so, you will need to install the NBU client on this host and either do a normal client backup over the network, or see if a Snapshot Client backup can be used here. Check the Compatibility Guides and documentation.
See "Backup of VMware raw devices (RDM)" in Appendix D of the NetBackup for VMware Administrator's Guide.
If no RDMs, you will need VxMS log.
PS:
What happens if you try to take a snapshot of this host from vSphere Client?
10-28-2014 09:16 AM
All volumes are virtual, and I believe that recently a snapshot was created by VMware Support while they were working on a different problem, but I don't know if it included quiescing...
We do have a client installed since we'll be taking data protection SQL backups a lot more often than VM backups. And if I can't get a VM backup working then they may just have to live with filesystem backups (despite the fact that it's impossible to recover full systems from one of them these days)...
I did log on and check the VSS writers; all reported OK.
One thing that I did leave out of the original description is that the Application Aware step (for SQL) does finish just fine; it's the snapshot step that fails. Which may be another pointer to what Michael brought up: not enough shadow space...
I'm not sure I have access to the SQL logs but I'll check...and I'll try to scare up the VxMS log...
10-28-2014 09:53 AM
10-29-2014 12:23 AM
Can you do a snapshot with "quiesce file system" of the VM from vSphere?
As it is a Windows machine, have you checked the state of the VSS writers with vssadmin list writers?
Also check the SQL logs, as sometimes the actual problem is in the application.
I have very often seen that the problem was too small a shadow storage area, and it does not always show up in the event logs under the VSS and Volsnap entries.
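To check both of those from inside the Windows guest, something like the following should work (run from an elevated command prompt; C: and the 20% figure are just examples, not values from this thread):

```shell
:: Run inside the Windows guest, from an elevated command prompt.
:: Check the state of all VSS writers (look for "State: [1] Stable", no errors):
vssadmin list writers
:: Check how much shadow storage is allocated per volume:
vssadmin list shadowstorage
:: If the shadow storage area is too small, grow it (example: 20% of C:):
vssadmin resize shadowstorage /for=C: /on=C: /maxsize=20%
```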
11-03-2014 09:17 AM
Well, over the weekend I tried enabling the VxMS log...but nothing got logged. So either the directions are different for an Appliance as VM host or I missed a step.
Anyway...the long and short is that the Application Owners don't want to mess with it any more and they're happy with just filesystem backups + SQL backups. Yes, I know that means that we can't recover the server if it goes away but they're the owners, not me.
11-03-2014 11:37 AM
Well, over the weekend I tried enabling the VxMS log...but nothing got logged.
VxMS logging for 7.6 is totally different from previous versions... Where did you create the log folder?
One more thing - I had the same issue at a customer last week. Nothing logged in VxMS.
It turned out that credentials have changed and this user did not have sufficient rights to create snapshots.
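For reference, on a 7.6 Linux backup host or appliance the setup is roughly the following; this is a sketch, and the verbosity level (5 here) and any appliance-specific steps should be confirmed against the NetBackup for VMware guide:

```shell
# Sketch: enabling VxMS logging on a 7.6 Linux backup host / appliance.
# Note the log folder is logs/vxms, not the top-level logs directory.
mkdir -p /usr/openv/netbackup/logs/vxms
chmod 777 /usr/openv/netbackup/logs/vxms   # bpfis/VxMS must be able to write here
echo "VXMS_VERBOSE = 5" >> /usr/openv/netbackup/bp.conf
# Bounce the NetBackup services so the setting takes effect:
/usr/openv/netbackup/bin/bp.kill_all
/usr/openv/netbackup/bin/bp.start_all
```

Creating the folder directly in /usr/openv/netbackup/logs (as in pre-7.6 versions) is the step that most often goes wrong.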
11-03-2014 12:05 PM
I created the dir in /usr/openv/netbackup/logs, edited the bp.conf on the Appliance to set the VxMS logging level to 1, and bounced the services on the Appliance.
As for permissions, I back up a lot of smaller VMs from the same farm and they have no snapshot problems so I think I've got that right....I hope....
11-03-2014 07:52 PM
Hey, did you check the VMware limitation on datastores larger than 2 TB, as suggested in my earlier post?
11-04-2014 09:54 AM
I forwarded the KB articles and other links to our VMWare Admins and they replied that it is not an issue.
11-14-2014 01:22 AM
@D.Flood,
During a VMware backup, NetBackup creates some VMDK cache files under /tmp on the 52xx. They contain XML information about the LUNs, VM guest information, and so on. Once the snapshot is removed on the ESX host and consolidation is completed, those files should be removed automatically, but sometimes they are not deleted and can impact the next VMware backup run because they are locked. If you do:
ls -lrt /tmp/*vmdk*
you will see files like this:
/tmp/A-ESB-BS-ME1.vmdk_0_249768878_30513_2
/tmp/A-ESB-BS-ME1_1.vmdk_1_249768878_30513_3
/tmp/A-ESB-BS-ME1_2.vmdk_2_249768878_30513_4
/tmp/ESB-IS-LO1.vmdk_0_1765576910_14886_1
....
ls -lrt /tmp/vmware-root
you will see subfolders like this:
/tmp/vmware-root/42192d94-eb63-0a4c-7058-5311e43be3f5-vm-2209
...
You need to remove those before every VMware backup starts. I wrote a small script that runs every day before the VMware backups and cleans those files/subfolders under /tmp on each 52xx dedicated to VMware backup. This will for sure have a good impact on each VMware backup, and it was a recommendation from Symantec support ;)
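The actual script wasn't posted, but a minimal sketch of the cleanup logic it describes could look like this (the function name and arguments are hypothetical; it just removes the stale *vmdk* cache files and vmware-root subfolders and logs each action in the same style as the report below):

```shell
# cleanup_vmware_tmp DIR LOG -- remove leftover NetBackup VMware cache
# files and vmware-root snapshot folders, logging each action.
cleanup_vmware_tmp() {
    dir="${1:-/tmp}"      # directory to clean (defaults to /tmp)
    log="${2:-/dev/null}" # report log file
    # Stale VMDK cache files, e.g. /tmp/A-ESB-BS-ME1.vmdk_0_249768878_30513_2
    for f in "$dir"/*vmdk*; do
        [ -e "$f" ] || continue
        echo "the file $f will be deleted" >>"$log"
        rm -f "$f" && echo "the file $f was deleted successfully" >>"$log"
    done
    # Stale per-VM snapshot folders under vmware-root
    for d in "$dir"/vmware-root/*; do
        [ -d "$d" ] || continue
        echo "the folder $d will be deleted" >>"$log"
        rm -rf "$d" && echo "the folder $d was deleted successfully" >>"$log"
    done
}
```

Run it against a test directory first; deleting these files while a backup is actively running would be a bad idea, so schedule it before the backup window as described above.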
My script generates a report like this for each run:
nbuapp01:/home/maintenance # cat /root/script/nbuvmw-rep.log
....
the file /tmp/A-ESB-BS-ME1.vmdk_0_249768878_30513_2 will be deleted
the file /tmp/A-ESB-BS-ME1.vmdk_0_249768878_30513_2 was deleted successfully
the file /tmp/A-ESB-BS-ME1_1.vmdk_1_249768878_30513_3 will be deleted
the file /tmp/A-ESB-BS-ME1_1.vmdk_1_249768878_30513_3 was deleted successfully
the file /tmp/A-ESB-BS-ME1_2.vmdk_2_249768878_30513_4 will be deleted
....
the folder /tmp/vmware-root/42192d94-eb63-0a4c-7058-5311e43be3f5-vm-2209 will be deleted
the folder /tmp/vmware-root/42192d94-eb63-0a4c-7058-5311e43be3f5-vm-2209 was deleted successfully
nbuapp01:/home/maintenance #
nbuapp01:/home/maintenance # crontab -l
...
30 17 * * * /root/script/nbuvmw 2> /tmp/nbuvmw-error.log
Try this and let us know ;)