VM backup problem via VCenter

HPeter
Level 3

Hi All,

My VM backups sometimes stay in the running state, which leaves all other backups waiting, so nothing gets backed up during the night.
This thread looks similar:
https://vox.veritas.com/t5/Backup-Exec/Backup-Exec-2014-VM-Job-keeps-running-never-ending/td-p/69420...

So I set the MaxRpcDatablockSize value to 5242880 (decimal), but nothing changed.

I suppose this is related to the fact that the vCenter is at a different site and we are connected via a 10 Mbit SSL VPN tunnel.
If only one backup failed it would not be a problem, but because it keeps running, no other backup can start. :(

When I cancel the job in the morning, the cancel has no effect. I have tried restarting all the services, but that does not work either. The only solution is to restart the server.

My ideas:

- Set the "Cancel the job if it is still running.." option to a small value. But I suppose that because I cannot cancel the job manually, this option will not work either.

- Back up the VMs via the Backup Exec agent like physical machines, although many websites advise against this.

- Back up the VMs directly via the host instead of via vCenter. As this cluster contains two hosts, I suspect this idea is simply wrong. (I am not sure it works, and if I move a VM to the other host I would have to reconfigure the backup.)

(BE15 is up to date: FP5 + hotfixes)

Any ideas are appreciated.

Thanks

 

4 REPLIES

HPeter
Level 3

Hi All,

I could reproduce the issue in a controlled way. I broke the internet connection (VPN tunnel) for a while, and the issue appeared again. The job kept running without end. After the connection was re-established, nothing changed. I had to restart the Backup Exec server to cancel the job.
My question is still the same as above. Which of these would be the solution: setting the "Cancel the job" option (I am not sure it will work), backing up via the agent, or backing up directly via the host instead of vCenter?

Thanks

 

Colin_Weaver
Moderator
Employee Accredited Certified

Well, firstly, you probably should back up via the vCenter. If you ever need to restore a complete VM (and not use GRT), you will have to restore via the vCenter anyway, as VMware now blocks direct restores to ESXi hosts that are managed by a vCenter.

However, that is probably not the cause of the issue. The cause is the loss of network connectivity.

With a traditional agent backup, where there are Backup Exec components on both ends, we do (eventually) detect loss of network and fail the job. For a VMware VM backup using the virtual agent, however, we use the local copy of the remote agent on the BE server (which does not stop communicating, hence no failure) and then make calls through the VMware API to the vCenter or ESXi hosts. My guess is that the combination of how the API works and how the Backup Exec software works means that the loss of network cannot be detected, with BE waiting indefinitely for more data to be provided. The recovery of the network, with the job still running, does not allow the process to continue, almost certainly because whilst one side of the conversation (the BE server side) is waiting for data, the ESXi host side did detect that it could no longer send the data and made the conversation invalid. It is easier to detect an inability to send/write than an inability to receive.
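To illustrate that last point, here is a minimal Python sketch of the general TCP behaviour, not of how Backup Exec or the VMware API work internally. It assumes a local test peer that accepts a connection and then goes silent, standing in for a link that has dropped mid-transfer. The names `silent_peer` and `read_with_timeout` are made up for this example. A receiver that waits without a timeout would block in `recv()` forever, while one with a timeout at least notices the silence:

```python
import socket
import threading


def silent_peer(server_sock):
    """Accept one connection, then send nothing (stands in for a dropped link)."""
    conn, _ = server_sock.accept()
    # Keep the connection open without sending any data for a while.
    threading.Event().wait(2.0)
    conn.close()


def read_with_timeout(host, port, timeout_s):
    """Return 'data', 'closed' or 'timeout' depending on what recv() observes."""
    # The timeout given here also applies to the later recv() call.
    with socket.create_connection((host, port), timeout=timeout_s) as sock:
        try:
            chunk = sock.recv(4096)
        except socket.timeout:
            # Nothing arrived in time; without a timeout this call would never return.
            return "timeout"
        return "data" if chunk else "closed"


# Hypothetical local test setup: listen on a free port on the loopback interface.
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]
threading.Thread(target=silent_peer, args=(server,), daemon=True).start()

result = read_with_timeout("127.0.0.1", port, 0.5)
print(result)
```

The receiver here only learns that something is wrong because it set its own deadline. A sender in the same situation gets an error back from the network stack once its writes can no longer be delivered.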

 

It is worth checking to see if a backup via the vCenter has the same problem (I actually suspect it will). Note: a backup via the vCenter should make requests for snapshots and receive metadata information via the vCenter, but then access the snapshot for the transfer of backup data directly from the storage (either over NBD or SAN).

Unfortunately, if you do get a network drop and the job appears to hang in BE, then the only solution is to try to cancel the job and, if that appears to do nothing after a reasonable time, restart the BE services (forcibly if necessary). I have never run into a situation where a complete restart of the BE server is needed, so that is a little odd.

Note: if you have had to cancel a VMware VM backup by killing BE services, then you may need to use the VMware snapshot manager to tidy up orphaned snapshots.

You are absolutely right, the main problem is the loss of network.

I do all VM backups via vCenter. On some VMs the remote agent is installed and on some it is not. I have checked, and both types sometimes have the problem.

I tried to cancel the job but it did not work. After that I tried to restart the services, but the "Job Engine service" stayed in the stopping state, so I had to restart the BE server. I suppose the reason is what you said: the connection between the BE server and the ESXi host was lost, so the cancellation never reaches the ESXi host.

I think that because of the lost connection, the "Cancel the job if it is running" option will not work.
Backups of the physical servers, and SQL backups on VMs via the remote agent, have never got stuck in the running state.
Maybe the only solution would be to back up the VMs like physical machines via the remote agent. That runs on the local network, so a loss of the internet connection could never cause a problem.

 

 

I can confirm that the "Cancel the job if it is still running.." option did not solve the issue. I set this option for all VM backups. All jobs that were not yet running were canceled as expected when the time expired, but the job that was running when the connection was briefly broken never stopped. The cancellation was not even started.

I hope that sooner or later the BE server will be able to handle this kind of issue.