
NetBackup 7 vSphere 4.1 restoring vmdk issue

djarjosa
Level 3

Running NetBackup 7 on Windows Server 2008 R2 backing up VMs on vSphere 4.1

 

Backing up VMs goes smoothly, but I'm testing restoring a VM (to its original location) after deleting it from disk through vSphere. I kicked off the restore; vSphere created the VM and it created a snapshot. Then the View Status window in NetBackup started constantly repeating "Virtual machine restore: file write failed", while vCenter constantly repeats "Allocate blocks" in its Recent Tasks window and shows each of those tasks completing.

Now I have let this go on for a few hours, thinking NetBackup is just waiting for all the blocks to be allocated until it finally succeeds.  NetBackup shows its "Operation" as Mounting, and the Kilobytes Read number is rising while this is going on (although extremely slowly).  I still don't see anything actually happening, though.

Has anyone heard of or seen anything like this?  I highly doubt this is how a restore of a vmdk is supposed to go.  Any help would be appreciated!


12 REPLIES

djarjosa
Level 3

I've attached a few screenshots to help anyone understand my issue a little better.

Yasuhisa_Ishika
Level 6
Partner Accredited Certified
Can you post more detail like the items below, plus anything else you think relates to this issue?

  • restore task detail
  • errors and warnings in the Windows Application Log
  • transfer type (NBD or SAN)

RiaanBadenhorst
Moderator
Partner    VIP    Accredited Certified

Hi,

 

I saw this once at a client, but never saw the end result. I think it could be related to thin disks, where the size of the volume needs to be increased little by little as the write requests are received, combined with the transfer type, which was LAN.

 

I've also seen some other posts where it's suggested to restore via the ESX hosts instead of via vCenter. That doesn't quite make sense to me, though, as I've seen it done via vCenter at another client using the LAN transfer type.

 

Can you please let us know your transfer type, and whether the disks are thin or thick? I've seen you're using vCenter, so can you confirm that it was specified in NetBackup when configuring the backups?

djarjosa
Level 3

Yasuhisa:

 - I included a screenshot of what I think you're looking for as far as task details.

 - I looked at Event Viewer from the time the restore started to the time I ended up cancelling the job and didn't see anything related to NetBackup or VMware.

 - As far as where the VM is going, it is being restored to a NetApp 2040.

Riaan:

 - The disk being restored is thin provisioned both on the SAN and in Virtual Center.

 - The restore is going over the LAN via vCenter, although the connection to the SAN is over fiber.

RiaanBadenhorst
Moderator
Partner    VIP    Accredited Certified

Hi,

 

OK, so this is not an answer but more of an investigation. There is another user facing a similar issue who is using 7.0.1, so I suspect this is not an NBU issue.

 

Seems there are quite a few possible configurations for the disk.

 

  1. Thin provisioned in VMware + Thin provisioned on the disk array (your scenario)
  2. Thin provisioned in VMware + Normal (thick) on the disk array.
  3. Thick provisioned in VMware + Thick on the disk array + Lazy Zero (lazy zero is the default method for creating vmdks)
  4. Thick provisioned in VMware + Thick on the disk array + Eager Zero

 

So, a little more of what I read about the zeroing stuff. Lazy zero is the default when creating a vmdk of thick type. It basically means VMware allocates the space but doesn't zero out the disk when it's created. This apparently can cause some delays, as you can imagine, when you start using the disk (according to what I've read).

 

Eager zero is something you can set on the vmdk so the whole zeroing is done beforehand. This is apparently better when doing clustering, etc.

 

So from the options above (I've left out the other possibilities of thick + thin on the disk array because that's too many parameters for now) one would assume number 4 is the best: the disk is already allocated from VMware's perspective, the zeros have all been written, and the array is also "thick". Depending on what takes longer, growing a thin disk or zeroing, it's a toss-up between 2 and 3 for second and third place. Option 1 I would assume would be the slowest, as there are two parties that have to increase their disk size.
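If it helps to see that ranking written down, here is a toy Python sketch of the reasoning. The weights (per-MB penalty for each thin layer growing on demand, and for zeroing blocks during the restore) and the 10 GB size are invented purely for illustration, not measurements:

    # Toy cost model for the four disk configurations listed above.
    # All numbers are invented for illustration; they are not measurements.
    WRITE = 1.0   # relative cost per MB of actually writing restored data
    GROW  = 1.0   # extra cost per MB for each thin layer that must grow on demand
    ZERO  = 0.8   # extra cost per MB for zeroing blocks during the restore (lazy zero)

    def restore_cost(size_mb, thin_layers, zero_during_restore):
        cost = WRITE * size_mb
        cost += thin_layers * GROW * size_mb      # VMware and/or the array growing the disk
        if zero_during_restore:
            cost += ZERO * size_mb                # lazy zero pays this while restoring
        return cost                               # eager zero paid it before the restore began

    size = 10 * 1024  # a hypothetical 10 GB vmdk, in MB
    print("option 1 (thin + thin):        ", restore_cost(size, 2, False))
    print("option 2 (thin + thick):       ", restore_cost(size, 1, False))
    print("option 3 (thick lazy + thick): ", restore_cost(size, 0, True))
    print("option 4 (thick eager + thick):", restore_cost(size, 0, False))

With these made-up weights, option 4 comes out cheapest and option 1 the most expensive, and whether 2 or 3 takes second place depends entirely on whether the growth penalty or the zeroing penalty is bigger, which is exactly the toss-up I mean above.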

 

So, if at all possible, I would like to see a restore with each of these options. Maybe then we'll get to the bottom of this.

 

That is, if you have the space, time, and will to do this. I don't have a lab or any toys to play with :)

djarjosa
Level 3

Sounds like a good theory Riaan; the only issue I'm seeing with that is on our side.  We can't really change our infrastructure to a thick-provisioned alternative.  The cost of disk space would hinder us.  I'm not saying that what you described won't work, just that thin provisioning needs to have a way to be restored.  I have a ticket open with Symantec, although they haven't responded quite yet.

My biggest concern really is why NBU is showing "File write failed" while vCenter is showing its "Allocate blocks" tasks as successful, and why they constantly repeat that over and over until I actually cancel the job.  I have let it go for about three hours to see a change, with no difference.

RiaanBadenhorst
Moderator
Partner    VIP    Accredited Certified

Hi,

 

Of course, I understand the limitations from the business side. I'm purely looking at it from a technical side. Can you maybe try it on a datastore local to the ESX?

 

From the NetBackup side, I assume it's trying to write the file, which is huge, but the "block allocation" is obviously on a smaller scale, so NBU is telling us it's timing out.

 

Anyway, please post the result you get from support; I'd really like to know what the result/fix is :)

bpup
Level 4
Partner

The problem with the restore is technically a VMware issue. As the restore progresses, vCenter is trying to allocate space on the fly, which is just WAY too slow for the restore process. Thus you get brutal restore rates (I see 2 MB/sec over iSCSI), and your vCenter log will show both "clear lazy zero" and "allocate blocks", usually one message per second. Thick provisioning will not help: it will try to zero everything BEFORE the restore starts, so you get the same slow result, just a different order of events.
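To put that rate in perspective, here is a quick back-of-the-envelope calculation; 2 MB/s is the rate I see, and the vmdk sizes are just examples I picked:

    # Rough restore-time estimate at the ~2 MB/s rate mentioned above.
    RATE_MB_PER_SEC = 2.0

    for size_gb in (20, 40, 100):             # illustrative vmdk sizes
        size_mb = size_gb * 1024
        hours = size_mb / RATE_MB_PER_SEC / 3600
        print(f"{size_gb} GB vmdk at {RATE_MB_PER_SEC} MB/s -> about {hours:.1f} hours")

Even a modest 100 GB vmdk works out to roughly 14 hours at that rate, which is why the on-the-fly allocation is such a problem.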

The documented workaround is to restore directly to ESX, not via vCenter. That introduces a new issue, at least for me. The restore GUI would not let me select "none" for vCenter as the documentation indicates it should. So I deleted my vCenter credentials and entered credentials for ESX (the credentials verified OK). Then I could choose "none" for vCenter (in the BAR GUI) and point just to ESX. That results in another problem: I cannot see my datastores when pointing directly to ESX. They appear fine via vCenter, but when it talks to ESX the "datastores" drop-down remains empty.

To sum it up, NBU 7.0.1 with EEB 2105102.6.AMD64 is utterly busted with regard to restores (backups are fine) against vSphere 4.1. We are currently crippled and worried because I cannot say with confidence that I can restore our environment in a timely manner.

I am running NBU 7.0.1 with EEB 2105102 on Windows 2008 R2 x64 (all-in-one master/media/VMware proxy), using Advanced Disk (local disk) for media and two dedicated 1 Gbps Ethernet interfaces with MPIO for access to the vSphere 4.1 iSCSI network against an EqualLogic PS4000 array. The "mgmt" interfaces for NBU are two 100 Mbps Ethernet interfaces aggregated via LACP.

RiaanBadenhorst
Moderator
Partner    VIP    Accredited Certified

Hi,

 

Any news from support?

VirtualED
Level 5
Employee Accredited Certified

So there are about three different reasons why you can see the "Virtual machine restore: file write failed" message:

1. By default Windows 2008 does not set the disk online, so you must online the disk before attempting a SAN or HotAdd restore to it:

http://www.symantec.com/docs/TECH124804

2. Thin provisioned disks can be problematic at restore time if the following condition occurs; you usually see this issue towards the end of a restore via SAN (a small sketch of the arithmetic follows the quoted example below):

https://www.vmware.com/support/developer/vddk/VDDK-1.1.1-Relnotes.html

Restores to Thin Provisioned Disks Using SAN Transport Mode May Fail

By default, VMFS uses 1 MB blocks, but it is possible to create disks whose size includes some fraction of a VMFS block. When using SAN transport mode to restore to a thin disk, you can only restore up to the portion of the disk that is a multiple of 1 MB. For example, consider a scenario where the source disk or backup size is 512.5 MB. To restore, you can either:

  • Create a disk that is 513 MB and restore the entire backup with SAN transport mode.
  • Restore 512 MB of the disk with SAN transport mode, then restore the final 0.5 MB with NBD or NBDSSL transport mode.
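Here is that 1 MB arithmetic written out as a tiny Python sketch; the 512.5 MB figure is the example from the release notes above, and the split_restore helper is just for illustration:

    # Illustrates the SAN-transport restriction quoted above: only whole 1 MB
    # VMFS blocks can be restored to a thin disk over SAN, so any fractional
    # remainder has to go over NBD/NBDSSL. Sizes are in MB.
    import math

    def split_restore(size_mb, block_mb=1):
        san_part = math.floor(size_mb / block_mb) * block_mb  # restorable via SAN
        nbd_part = size_mb - san_part                          # remainder via NBD/NBDSSL
        return san_part, nbd_part

    san, nbd = split_restore(512.5)
    print(f"SAN transport: {san} MB, NBD/NBDSSL remainder: {nbd} MB")  # 512 MB and 0.5 MB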

3. Another reason I have seen for this issue is simply that the SAN isn't configured correctly.  Assuming you are using SAN to perform the restore, if the VMware Backup Host is not in the same SAN group as the ESX server, it won't work.  The VMware Backup Host should be added to the SAN as if it were another ESX server.

 

My suggestion is to try NBD.

djarjosa
Level 3

My apologies for not adding the solution support assisted me with.  I went on vacation shortly after, and needless to say, work wasn't much on my mind.

Apparently there is a bug with NetBackup 7 when restoring VMware vmdks with "Try all types".  The types provided are SAN, NBD, and SAN then NBD.  The issue is with restoring via the SAN option, which is NetBackup's first choice.  Once I changed it to NBD, the restore was successful within about an hour of starting it.  If anyone has any questions or wants screenshots, let me know.

RiaanBadenhorst
Moderator
Partner    VIP    Accredited Certified

Great, thanks for the update.