Solved: Snapshot error status 156 using VCB

schrammd · 04-22-2009

Environment:
NB 6.5.3.1 on dedicated master, media and proxy servers.
Proxy attached to 2 CLARiiONs, one for source vmdks, the other for snapshot cache.
VCB 1.5
Windows Server 2003 R2 SP2 x64 is the VCB proxy, attached to the array over 2Gb FC.
VM Backup: 1
transfer type: 3
vmdk type: 1

It seems that backing up one VM at a time works. Running them in multiple "streams" results in snapshot errors, although I don't see any bottlenecks anywhere. When I set up the VCB policy I enabled 5 streams. When the jobs kick off, only the first one succeeds; the others all fail with "- Critical bpbrm(pid=5932) from client qa01: FTL - snapshot creation failed, status 156".

90% of our VMs are Linux, so all I care about is fulls, I guess. I'm attaching the bpfis log from the proxy; the bpvmutil log has nothing relevant to this issue, as the discovery is all running fine, just incredibly slowly. Has anyone seen this issue? Thanks in advance.
There are multiple failing commands and errors, but I don't see how to resolve this based on other posts.

Symanticus · 07-01-2009

I tried to back up all four of my VMs in a folder and the job failed:

Completed status: Failed
Final error: 0xe000954c - An error occurred while running the VMware 'vcbMounter' command to back up a virtual machine. See the job log for details.
Final error category: Resource Errors

For additional information regarding this error refer to link V-79-57344-38220

Backup- VMVCB::\\vCenter\VCGuestVm\(DC)domain.com(DC)\vm\ORS\ASTON
V-79-57344-38220 - Backup of the virtual machine 'ASTON' failed. VMware VCB framework reported the following error

Backup- VMVCB::\\vCenter\VCGuestVm\(DC)domain.com(DC)\vm\ORS\Win2003x64_RSA01
V-79-57344-38220 - Backup of the virtual machine 'Win2003x64_RSA01' failed. VMware VCB framework reported the following error
Error: Other error encountered: Snapshot creation failed: Could not quiesce file system.

Backup- VMVCB::\\vCenter\VCGuestVm\(DC)domain.com(DC)\vm\ORS\Win2003x64_RSD01
V-79-57344-38220 - Backup of the virtual machine 'Win2003x64_RSD01' failed. VMware VCB framework reported the following error
Error: Other error encountered: Snapshot creation failed: Could not quiesce file system.

Backup- VMVCB::\\vCenter\VCGuestVm\(DC)domain.com(DC)\vm\ORS\Win2003x64_RST01
V-79-57344-38220 - Backup of the virtual machine 'Win2003x64_RST01' failed. VMware VCB framework reported the following error
Error: Other error encountered: Snapshot creation failed: Could not quiesce file system.

And yes, backing up one VM at a time succeeds.

Any idea what happened here?

backmeupson · 07-02-2009

Do you have VMware Tools installed and running, with the sync driver installed?

Symanticus · 07-02-2009

Yes, backmeupson, I have the latest VMware Tools installed on each VM. However, some of the VMs (in this case ASTON, the SQL Server VM from the list above) can be backed up from BE 12.5 with the NBD option.

But what about the other VMs (the RSx01 machines)? I am also facing a problem with Storage VMotion; is this related?

N1500 · 07-05-2009

Hey, I don't have a solution.

MBelchamber · 09-24-2009

We had VCB backups consistently failing on 3 out of 12 virtual machines with a 156 snapshot error. I tried stopping and starting the vmount2 service, as suggested by Symantec support, with no success. Eventually, looking through the bpfis logs on the proxy, I found:

Scanning snapshot _VCB-BACKUP_
Found match: snapshot-3064
Error: Backup snapshot already exists.
An error occurred, cleaning up...

There were no files anywhere on the VCB proxy that looked out of place. After some digging I found that this was a snapshot within VMware, easily removed with Snapshot Manager in vCenter. It looks like the backup process had failed one evening and not cleaned up after itself.
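The tell-tale sequence in that bpfis excerpt can be picked out of a log automatically. A minimal sketch, assuming the log has been copied off the proxy as plain text and that the messages read exactly as quoted above; `stale_snapshot_ids` is a hypothetical helper, not part of NetBackup or VCB:

```python
import re
from typing import List, Optional

# Message texts as quoted in the bpfis log excerpt above.
MATCH_RE = re.compile(r"Found match:\s*(snapshot-\d+)")
EXISTS_MSG = "Backup snapshot already exists"

def stale_snapshot_ids(log_text: str) -> List[str]:
    """Return snapshot IDs that bpfis matched just before reporting
    'Backup snapshot already exists' (hypothetical helper)."""
    stale: List[str] = []
    last_match: Optional[str] = None
    for line in log_text.splitlines():
        m = MATCH_RE.search(line)
        if m:
            last_match = m.group(1)          # remember the most recent match
        elif EXISTS_MSG in line and last_match:
            if last_match not in stale:
                stale.append(last_match)
            last_match = None
    return stale

if __name__ == "__main__":
    sample = (
        "Scanning snapshot _VCB-BACKUP_\n"
        "Found match: snapshot-3064\n"
        "Error: Backup snapshot already exists.\n"
        "An error occurred, cleaning up...\n"
    )
    print(stale_snapshot_ids(sample))  # ['snapshot-3064']
```

Any ID it returns still has to be checked and removed by hand in Snapshot Manager, as described above.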

sdo · 09-24-2009

Is anyone using VCB 1.5 Update 1?

We're also having occasional 156s.
Our VMware guy tells me they use VMotion.

NBU master is a Solaris 10 VCS cluster, plus several Solaris 10 media servers, all at v6.5.3.1.
The VCB proxy is a Win2003 box with NBU 6.5.3.1 plus EEBs to support non-monolithic and backup by name.

Recently changed all six ESX servers from iSCSI to FCP, and migrated all 40+ LUNs (each 350GB) for ESX from SATA to FC-AL disk on NetApp 3070 active-active cluster running OnTAP v7.2.6.1.

Over 400 VMs in ESX, 75% of which are developers' workstations, so they are not backed up. We have about 80 to 90 VM servers that we need to back up once a week, and only 3 that need backing up every day. Most weekends we have repeated failures for between 5% and 20% of the VMs. It's pretty annoying. I was hoping that the conversion to SAN and FC disks would solve our problems.

I suspect the one-snapshot-per-LUN limit is at the heart of this. Can anyone offer more insight into where "snap_lock_timeout" fits in the picture? I'm not a VMware admin; I just have "backup" rights in the VI client GUI, to remove stale VCB snapshots.

Am I right in thinking that they'll have to turn VMotion off so that we can gather VMs into separate backup policies on a permanent basis? Or will I have to do as others have done and write a script to re-populate the backup policies with clients each weekend? If I do that, then practically every weekend the policies will run a long-retention schedule, because NetBackup will see the clients as new to the policy, so I'd also have to add functionality to modify the schedule.
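If policies are grouped by datastore, a weekly re-population script only has to touch the VMs whose datastore actually changed. A minimal sketch, assuming you can produce a VM-to-datastore mapping each week (how you export it is left open); the dictionaries and `policy_moves` are hypothetical:

```python
from typing import Dict, List, Tuple

def policy_moves(last_week: Dict[str, str],
                 this_week: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Compare two VM -> datastore mappings and list (vm, old, new) changes.

    New VMs are reported with old='' and removed VMs with new=''.
    """
    moves = []
    for vm in sorted(set(last_week) | set(this_week)):
        old = last_week.get(vm, "")
        new = this_week.get(vm, "")
        if old != new:
            moves.append((vm, old, new))
    return moves

if __name__ == "__main__":
    before = {"web01": "DS1", "db01": "DS2", "app01": "DS1"}
    after = {"web01": "DS1", "db01": "DS3", "new01": "DS2"}
    for vm, old, new in policy_moves(before, after):
        print(f"{vm}: {old or '(none)'} -> {new or '(none)'}")
```

Clients that stay on the same datastore never leave their policy, which should limit how many clients NetBackup treats as new each weekend.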

I know VBS, but I know nothing about PowerShell. Does VCB or VMware have objects that I can instantiate from VBS?

Matt_Billing · 10-01-2009

Hey,

I don't think vMotion is your problem. vMotion simply moves the machines between ESX hosts; it does not move the machines' files. That's Storage vMotion, and I don't think it's automated yet (please correct me if I'm wrong).

So if you have machines split into policies by datastore, then unless they are moved manually, you can keep the policies as they are from week to week.

We have exactly the same problem, although ours is certainly around the 20% mark. We have 800 VMs spread over 39 datastores, all fibre-connected to an EMC CLARiiON SAN.

Symantec have recommended upgrading to 6.5.4, but I get the feeling that this is a standard response, because a senior techie told me the 156 issues "should have all been fixed in 6.5.3".

I've got a couple of calls being escalated at the moment, so I can keep you updated.

All I can suggest at the moment is to regularly check for snapshots on your VMs and reboot your VCB proxies.
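A regular check for leftover snapshots boils down to filtering a snapshot inventory by name and age. A sketch, assuming you have exported each VM's snapshot names and creation times by some means; the `Snapshot` record, the 24-hour threshold and `leftover_vcb_snapshots` are hypothetical, while the `_VCB-BACKUP_` name is the one seen in the bpfis log quoted earlier in this thread:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

VCB_SNAPSHOT_NAME = "_VCB-BACKUP_"  # name seen in the bpfis log excerpt

@dataclass
class Snapshot:
    vm: str
    name: str
    created: datetime

def leftover_vcb_snapshots(snaps: List[Snapshot], now: datetime,
                           max_age: timedelta = timedelta(hours=24)) -> List[Snapshot]:
    """Return VCB snapshots older than max_age. Assumes no backup job runs
    longer than max_age, so anything older is probably a leftover."""
    return [s for s in snaps
            if s.name == VCB_SNAPSHOT_NAME and now - s.created > max_age]

if __name__ == "__main__":
    now = datetime(2009, 10, 1, 8, 0)
    inventory = [
        Snapshot("web01", "_VCB-BACKUP_", datetime(2009, 9, 29, 22, 0)),
        Snapshot("db01", "_VCB-BACKUP_", datetime(2009, 10, 1, 7, 30)),
        Snapshot("app01", "pre-patch", datetime(2009, 9, 1, 12, 0)),
    ]
    for s in leftover_vcb_snapshots(inventory, now):
        print(s.vm, s.created)  # only web01 is old enough to flag
```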

Anonymous · 10-01-2009

As well as Disk Management and any MPIO software, you can use the vcbSanDbg command to see what is presented to the proxy.

Use this command to display path information for the VMFS datastores on the VCB proxy. The vcbSanDbg utility shows identification information about all of the mass storage attached to the VCB proxy. It indicates whether each disk contains a VMFS partition and also reports its UUID.

Watch out for:
“warning Could not scan for partitions on device. No VMFS names will be associated with this device.”
“info Lun does not contain any VMFS/LVM signatures.”

This command/tool is installed in %PROGRAMFILES%\VMware\VMware Consolidated Backup Framework
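On a proxy with many LUNs the vcbSanDbg output gets long, so it can help to save it to a text file and scan it for the two messages above. A sketch that only looks for those quoted strings; `find_san_warnings` is hypothetical, and the exact output format may differ between VCB releases:

```python
from typing import Dict, List

# Warning texts quoted in the post above (without the surrounding quotes).
WATCH_FOR = (
    "Could not scan for partitions on device",
    "Lun does not contain any VMFS/LVM signatures",
)

def find_san_warnings(output: str) -> Dict[str, List[int]]:
    """Map each watched message to the 1-based line numbers where it appears."""
    hits: Dict[str, List[int]] = {msg: [] for msg in WATCH_FOR}
    for lineno, line in enumerate(output.splitlines(), start=1):
        for msg in WATCH_FOR:
            if msg in line:
                hits[msg].append(lineno)
    return hits

if __name__ == "__main__":
    sample = (
        "warning Could not scan for partitions on device. "
        "No VMFS names will be associated with this device.\n"
        "info Lun does not contain any VMFS/LVM signatures.\n"
    )
    for msg, lines in find_san_warnings(sample).items():
        print(f"{msg}: {lines or 'not found'}")
```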

schrammd · 10-01-2009

We are backing up 70+ VMs every day now with as close to 100% success as you can get. Everyone has a different environment, but it works great in ours. Yes, you have to customize it, and it may take some daily RMA to keep it tuned, but so what? Isn't that what backup guys (and gals) get paid for? The fact that support has not given you a 100% success rate is probably not due to support but to something in your setup.
We have 3 datastores, with anywhere from 10 to 30 VMs per store, and 3 ESX hosts. Our master server IS the VCB proxy. Keep all the VMs ON a datastore IN the same policy. We limit active jobs to 3 per VM datastore to make sure the SAN is doing 100% of what it can, and thus keep the backup window short. In fact we are running this off AX150s (yeah, yeah, I know, you're thinking "I don't believe it"). Response time is not exactly speedy, but backups are not a problem. Obviously doing fulls every night is not great, but tape is cheap and pipes are big, so what? 90% of our VMs are Linux, so paying for 70 NB clients is not a panacea either. If you want more info on what I had to do to get this working, let me know. Or just fork out the big bucks, buy some other product and deal with their bugs.
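How a cap on active jobs per datastore policy affects the backup window can be roughed out with simple list scheduling: each VM starts on whichever job slot frees up first. A sketch only; the per-VM durations are hypothetical inputs, and the model ignores SAN and vCenter contention, which in this thread is what really limits concurrency, so treat the result as a lower bound:

```python
import heapq
from typing import List

def backup_window(durations_min: List[float], max_jobs: int) -> float:
    """Minutes until all VM backups on one datastore finish when at most
    max_jobs run at once, starting jobs in the order given."""
    if max_jobs < 1:
        raise ValueError("max_jobs must be at least 1")
    slots = [0.0] * min(max_jobs, max(len(durations_min), 1))  # finish time per slot
    heapq.heapify(slots)
    for d in durations_min:
        start = heapq.heappop(slots)      # earliest free slot
        heapq.heappush(slots, start + d)  # slot busy until this job ends
    return max(slots)

if __name__ == "__main__":
    vms = [30, 20, 40, 10, 25, 35]  # hypothetical per-VM minutes
    for jobs in (1, 3, 5):
        print(jobs, backup_window(vms, jobs))  # 160, 65, 45 minutes
```

In the example, going from 3 to 5 slots saves only 20 minutes because the longest VMs dominate, which is one argument for a small per-datastore limit.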

Everyone's got bugs. You just don't see 'em.

Symanticus · 10-01-2009

Hi David,

Would you be able to post your deployment diagram here? That way we can implement VCB following your success story :-o)

In this case you're using the Dell/EMC AX series, so no wonder your backups went fine with 3 snapshots at the same time. Mine uses a Dell PowerVault MD3000i iSCSI array, and it is very slow to back up the 900 GB of 40 VMs in one LUN.

backmeupson · 10-15-2009

1.5

peterdcross · 11-02-2009

I have resolved some 156 errors where all the logic of reading the bpfis logs etc. had failed:
* Load balancing by the VMware guys: they moved servers from one “farm” to another and didn't tell me, which required deleting the clients from one policy and adding them to another.
* Changing the IP address of a VM without updating DNS or the hosts files.
* A reboot of the media server cured some! (It's Windows; it works sometimes.)
* Lastly, deleting the virtual machine in NB and re-browsing for it.
Make sure you separate the Windows and Linux servers into their own policies, with no incremental backups on the Linux ones. We still have a couple that cannot be backed up by snapshots, but we are working on it.
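The IP-change case can be caught before the backup runs by checking that each client name still resolves to the address you expect. A minimal sketch using Python's standard `socket.gethostbyname`; the expected-address table is something you would maintain yourself (hypothetical here), and the resolver is a parameter so the check can be tested offline:

```python
import socket
from typing import Callable, Dict, List, Tuple

def dns_mismatches(expected: Dict[str, str],
                   resolve: Callable[[str], str] = socket.gethostbyname
                   ) -> List[Tuple[str, str, str]]:
    """Return (host, expected_ip, actual) for hosts that resolve elsewhere
    or not at all; actual is 'unresolved' when the lookup fails."""
    problems = []
    for host, want in sorted(expected.items()):
        try:
            got = resolve(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            got = "unresolved"
        if got != want:
            problems.append((host, want, got))
    return problems

if __name__ == "__main__":
    expected = {"localhost": "127.0.0.1"}  # hypothetical table of client IPs
    for host, want, got in dns_mismatches(expected):
        print(f"{host}: expected {want}, got {got}")
```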

Symanticus · 11-02-2009

Glad to hear that, sir.

It seems that with clustering we must document the policy in detail to avoid this kind of confusion.

My biggest mistake happened when trying to rebuild three of six Hyper-V virtual host cluster nodes. The eviction, rebuild and addition of each of the nodes back into the cluster went flawlessly, but in my haste to get the systems back into the cluster, I failed to remove them from the domain policy, which automatically installs monthly Microsoft patches.

If I had forgotten only one of the nodes, it would have been fine, since the VMs would have migrated over to the other five nodes. But with three of the six nodes rebooting and trying to send their VMs to other nodes, they had nowhere to go, and things got a little messy. Luckily there was no data loss, but about half of the 100 VMs in the cluster experienced an unexpected shutdown. What was learned? Do not rush when configuring your hosts. The stakes are high, and a small mistake can haunt you.

By Rob McShinsky, Dartmouth Hitchcock Medical Center
URL: http://searchservervirtualization.techtarget.com/generic/0,295582,sid94_gci1372383,00.html?track=NL-1429&ad=732529&asrc=EM_NLN_9725647&uid=6527946

Cheers.

stanleym · 01-06-2010

Is it working for you?

I have written a Windows PowerShell script to do the same, only my approach is slightly different: I also create the required policies and schedules.

However, as we use calendar-based scheduling, I need 33 commands (1x bppolicynew, 2x bppolicysched, 30x bppolicyschedrep) to create the full policy. Each command takes about 4 secs to complete, making the total for one policy ~136 secs.

Due to the number of datastores and the need to have different backup windows, I need to create 151 policies. This means that creating the policies alone takes 5 to 6 hours!
bpplclients takes another 3 secs per client, so adding 468 clients to these policies takes another ~20 minutes or so. I would like to get the total runtime down to about an hour max.

Does anyone know of a smarter/faster way to create (calendar-based scheduled) policies from the command line? Of course it does need to be a supported method ;-)
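The figures in this post can be checked quickly: 33 commands at roughly 4 s each is ~132 s per policy, close to the ~136 s measured, and the totals follow from that. A sketch of the arithmetic only; it does not propose a faster NetBackup method, and the `speedup` parameter is a hypothetical knob for whatever optimisation is tried:

```python
# Back-of-the-envelope runtime for the policy-creation script described above.
# Counts and per-command timings are the figures quoted in the post.

COMMANDS_PER_POLICY = 1 + 2 + 30  # bppolicynew + 2x bppolicysched + 30x bppolicyschedrep
SECS_PER_COMMAND = 4              # roughly 4 secs each
POLICIES = 151
CLIENTS = 468
SECS_PER_CLIENT = 3               # one bpplclients call per client

def estimated_hours(speedup: float = 1.0) -> float:
    """Total runtime in hours; speedup is a hypothetical improvement factor."""
    policy_secs = COMMANDS_PER_POLICY * SECS_PER_COMMAND * POLICIES  # 19,932 s
    client_secs = CLIENTS * SECS_PER_CLIENT                          # 1,404 s
    return (policy_secs + client_secs) / 3600 / speedup

if __name__ == "__main__":
    print(f"serial estimate: {estimated_hours():.1f} h")  # about 5.9 h
```

Getting under the one-hour target therefore needs roughly a 6x reduction in total command time.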

Nick_Morris · 01-20-2010

From my experience of VCB, I have found I have to do the following to get a 100% success rate (I am trying to tweak it for better performance/more active clients, but this is my base to start from).

* 4 media servers (these 4 also do our physical/application backups) with 1 policy for each server (around 100 servers are currently backed up in full on a daily basis with VCB)
* Each media server has a dedicated drive for VCB backups on SAN disk
* Each policy has multiple datastores but only running 1 client at a time. The datastores are not split between policies.
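Keeping each datastore whole while spreading the load over the four media-server policies is a small balancing problem. One simple heuristic, swapped in here for illustration and not necessarily how this setup was split, is longest-processing-time-first: hand each datastore, largest first, to the policy with the least data so far, using size as a stand-in for backup time. Datastore names, sizes and policy names are hypothetical:

```python
import heapq
from typing import Dict, List

def assign_datastores(sizes_gb: Dict[str, float],
                      policies: List[str]) -> Dict[str, List[str]]:
    """Greedy LPT: place each datastore (largest first) in the policy with the
    smallest total so far; datastores are never split between policies."""
    heap = [(0.0, name) for name in policies]  # (total GB, policy)
    heapq.heapify(heap)
    plan: Dict[str, List[str]] = {name: [] for name in policies}
    for ds, size in sorted(sizes_gb.items(), key=lambda kv: (-kv[1], kv[0])):
        total, policy = heapq.heappop(heap)
        plan[policy].append(ds)
        heapq.heappush(heap, (total + size, policy))
    return plan

if __name__ == "__main__":
    stores = {"DS01": 900, "DS02": 650, "DS03": 600,
              "DS04": 400, "DS05": 300, "DS06": 250}
    plan = assign_datastores(stores, ["VCB_MS1", "VCB_MS2", "VCB_MS3", "VCB_MS4"])
    for policy, ds in plan.items():
        print(policy, ds)
```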

Each machine takes around 10-40 minutes to back up from start to finish. Machines vary from 5 GB to 120 GB in size. In my experience, running more VCB jobs concurrently (we used to have 21 policies) just slows vCenter to a dead halt and doesn't improve backup time. If anything, it made backups unreliable, and they never completed within our backup window. Keeping things simple is my only advice.

NetBackup 7 looks interesting, with its feature of no proxy servers needed for VCB (direct from ESX to tape/disk, I assume). Has anyone used this yet?

HEMANPR · 01-20-2010

Guys,
Sometimes when I get error 156 on my virtual machines, I check whether that virtual machine has any snapshot created by the VCB framework.
If so, delete the snapshot created by VCB (_VCB-BACKUP_) and try the backup again. This has happened in my VMware environment.

backmeupson · 03-10-2010

Interesting points made here about the policy setup. I may give this a try. We currently have a total of 16 policies, one for each VM datastore, and always have issues. It's a lot harder to troubleshoot a failed job when they all kick off all over the place. VirtualCenter is also getting hammered all night the way we have it set up now. We have 2 proxy/media servers on NBU 7.

My question is: have you experienced times when a single job would fail or hang in a policy, causing all the other jobs to end with status 196? What version of the VCB framework do you have installed on your proxy servers? I was actually told by VMware support to roll back from version 1.5 Update 1 to 1.5 to help troubleshoot some issues (all of our ESX servers are 3.5).

Nathan_Kippen · 03-25-2010

Does anybody know how to get a report from VMware that shows which clients are in which datastores? That way I can begin to build my policies without having to click on each individual VOE to find out which datastore it belongs to.
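Once a VM list has been exported from vCenter to CSV (however you produce it), grouping clients by datastore takes only a few lines of scripting. A sketch, assuming the export has columns named `Name` and `Datastore` and that a VM on several datastores lists them separated by `;`; those column names, the separator and the "group under the first datastore" rule are all assumptions to adjust to your actual export:

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO

def clients_by_datastore(f: TextIO, name_col: str = "Name",
                         ds_col: str = "Datastore") -> Dict[str, List[str]]:
    """Group VM names by datastore from a CSV export (column names assumed).
    A VM listing several ';'-separated datastores goes under the first one."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for row in csv.DictReader(f):
        name = (row.get(name_col) or "").strip()
        first = (row.get(ds_col) or "").split(";")[0].strip() or "(unknown)"
        if name:
            groups[first].append(name)
    return {ds: sorted(vms) for ds, vms in sorted(groups.items())}

if __name__ == "__main__":
    sample = io.StringIO("Name,Datastore\nweb01,DS1\ndb01,DS2\napp01,DS1\n")
    for ds, vms in clients_by_datastore(sample).items():
        print(ds, ", ".join(vms))
```

Each resulting group maps naturally onto one datastore policy, in line with the "keep all the VMs on a datastore in the same policy" advice earlier in the thread.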

Also, is there a way to limit how often the media servers refresh the data they pull from VirtualCenter? It seems that on each backup, the VCB proxy server takes almost 20 minutes to query VirtualCenter and pull down all the VOE information.

Thanks,

schmaustech · 03-26-2010

I completed my script and posted it on my blog.
