Backup Exec job never stops running

AllisonW
Level 4

I've recently installed BE 2014 on a Win 2012R2 server.  I tested the agent by running a backup on a single (separate) Windows server and it worked fine.  I decided to try backing up the entire physical box, which consists of a couple of VMs and of course the ESXi instance as well.  I did this by setting the backup job at the ESXi server level and letting it find the VMs in the datacentre, which it did.  All looked well, until the first time the job ran.

It ran until it got about 29GB in (it shows it's working on the VMDK of the first server), and then it just sits there.  It shows as active, and the program isn't frozen, but it makes no further progress.  I actually let it run for days and nothing.  I tried cancelling the job, and it shows as Active: Cancel Pending, and then never cancels.  :(

If I restart the process or the server, it obviously stops the job, but as soon as it comes back up it just tries to start it over again from scratch.  A second attempt found it hanging at the same point (~29GB).  I again couldn't cancel.  I can't edit the job or remove the server, as the job shows as running.

The storage it's writing to is a NAS connected by iSCSI, and this is the only thing being written on it - it has TBs of room.  The individual server backup completed successfully to it in my tests, and was larger than 29GB, so it doesn't seem like it should be anything to do with the storage media.

Any ideas on how I can troubleshoot this?  I'm obviously quite new to BE, so any advice is highly appreciated.

16 REPLIES

lmosla
Level 6

Check to see if you have any alerts that need to be responded to.  Refer here:  http://www.symantec.com/docs/HOWTO98989

AllisonW
Level 4

Nope - "no alerts exist".  :(

lmosla
Level 6

Has this server been rebooted?  Reboot the server and then run LiveUpdate to make sure all the updates are installed.  Push out the remote agent to the remote servers, and if antivirus is running, add exclusions in the AV program for the Backup Exec processes.

AllisonW
Level 4

The server has been rebooted, yes, and LiveUpdate shows that it's up to date.  There is no AV running on the server in question (access to it is quite limited).  Also, it's hanging on backing up the vmdk, so would there be something I'd have to do in ESXi, maybe?

VJware
Level 6
Employee Accredited Certified

Are you able to take a manual snapshot of this VM successfully via the VI client? (Uncheck the first option, which snapshots the memory, and check the second option, which quiesces the guest file system.)

Are there any relevant events logged in the event viewer of the VM itself and/or the virtual host?

AllisonW
Level 4

A manual snapshot works through the client with the options you describe as well as the defaults.  In the VM's events tab, I just see instances (one for each time I let this job attempt to run) of "Virtual machine <name> disks consolidated successfully on <host> in cluster <cluster> in <datacentre>".  Also, the snapshot manager for the VM showed a snapshot labelled "Snapshot for full backup created by BackupExec on 18-02-2015 13:30".

No other VMs on the host have either of these indicators that the backup job ever touched them, which is consistent with BE seeming to not progress past this first one.

For the VM host, again I just see those disk consolidation comments for the first host, along with login info notices and such.  No flags.  :(

VJware
Level 6
Employee Accredited Certified

Does a Remote Agent based backup of this particular VM complete successfully?

AllisonW
Level 4

I don't know, since I can't get the original job to die.  It won't cancel, and I can't delete it because it's running.  When I reboot the BE server it just starts up again afterwards.  This makes it difficult to do any other tests involving the BE server...:-(

 

Also, sorry for the delay in answering - I appear to have stopped receiving the reply notifications from this thread, even though I still show as subscribed.  Also odd.

VJware
Level 6
Employee Accredited Certified

Try using BEMCLI to cancel the job. And instead of rebooting the server, try restarting all the BE services. The backup job should go into a Recovered state and you should be able to either delete it or place it on hold.

If you are still unable to cancel / delete / edit the backup job, I would recommend logging a support case and having an engineer take a look at your setup.

AllisonW
Level 4

Woo hoo...I got a notification this time!  Okay, I'll look into the BEMCLI commands (never used it before) and see what I can find.  I'll report back.  :)

 

Sadly, we can't log a support case - we are a charity and so have the NFR edition (I think?  whatever comes through Tech Soup) so we don't get support.

AllisonW
Level 4

Okay, I was able to see the job with Get-BEJob.  So far so good.  Piped that to Stop-BEJob, got an "are you sure" prompt, and then after I hit "y" it just sat there.  Let it sit for over 10 minutes, and nothing.  The job still remained unchanged in the GUI as well.

 

I cancelled that and did Restart-BEService.  It went to "WARNING: Waiting for service 'Backup Exec Job Engine(BackupExecJobEngine)' to finish stopping..." and then again just sat there.  Now that's been sitting for over 10 minutes as well.  :(   I'm guessing this should be working faster if it's working at all?

 

ETA:  Of course right as I post it finally gets past that line.  I've never seen a service sit that long and then successfully complete whatever it was trying to do!  :)

The services restarted, and then the job restarted again.  I've sent the stop command again via BEMCLI - I'll give it longer this time to see if it actually works.
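
For anyone scripting this step, here is a minimal Python sketch of the same pipeline (Get-BEJob piped to Stop-BEJob) wrapped in a timeout, so that a stop request which hangs like the one above gets abandoned instead of blocking the console. It assumes it runs on the BE server, where powershell.exe and the BEMCLI module are available. The function names and the job name in the usage note are hypothetical, and `-Confirm:$false` is the standard PowerShell way to suppress the "are you sure" prompt; that Stop-BEJob honours it is an assumption here.

```python
from __future__ import annotations

import subprocess


def build_stop_command(job_name: str) -> list[str]:
    """Build a PowerShell command line that asks BEMCLI to stop one job by name."""
    # PowerShell escapes a single quote inside a single-quoted string by doubling it.
    quoted = job_name.replace("'", "''")
    script = (
        "Import-Module BEMCLI; "
        f"Get-BEJob -Name '{quoted}' | Stop-BEJob -Confirm:$false"
    )
    return ["powershell.exe", "-NoProfile", "-NonInteractive", "-Command", script]


def stop_job(job_name: str, timeout_s: float = 600.0) -> bool:
    """Return True if the stop command exited cleanly within timeout_s seconds."""
    try:
        # subprocess.run kills the child process if the timeout expires.
        result = subprocess.run(build_stop_command(job_name), timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0
```

Calling `stop_job("ESXi full backup")` (a made-up job name) returns False if the cancel is still pending after ten minutes. At that point restarting the BE services, as suggested earlier in the thread, is the next thing to try.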

AllisonW
Level 4

This is still sitting there trying to stop.  However, if I go into the info of the previous backup attempt (it's just an info notice, not a caution or anything), I do see the line "Resource <first VM info> is not snappable.  Attempting to back up directly from the source volume".

 

But I was able to snapshot through vCenter, and the credentials are correct and show that they're working on every test through BE.  Definitely confused now.

VJware
Level 6
Employee Accredited Certified

Credential tests are not designed for VMs. Would you PM me your contact details? Thanks.

AllisonW
Level 4

Okay, so....as mentioned above, the new attempts at jobs were just sitting there queued.  VJware kindly helped me offline and figured out that there were old files in the BE temp dir on C (which if I understand correctly, were the mount points BE makes to access the snapshots).  Since those were in there leftover from a previous backup attempt, the new one was not starting properly.  Once those were cleared out, the backup ran.
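
A rough way to spot such leftovers before starting a new job is to list anything in the temp directory that has not been modified for a while. The Python sketch below only reports candidates and deletes nothing. The 24-hour threshold and the function name are assumptions, and the path is left for you to fill in, since the exact temp directory is not given in this thread.

```python
import os
import time


def find_stale_entries(temp_dir, older_than_hours=24.0, now=None):
    """List entries directly under temp_dir last modified more than older_than_hours ago."""
    now = time.time() if now is None else now
    cutoff = now - older_than_hours * 3600
    stale = []
    with os.scandir(temp_dir) as entries:
        for entry in entries:
            # st_mtime is the last-modification time in seconds since the epoch.
            if entry.stat().st_mtime < cutoff:
                stale.append(entry.path)
    return sorted(stale)
```

Run it against the BE temp directory while no backup job is active, review the list, and only then clear the entries by hand. Anything a running job still needs should have a recent modification time and so should not show up.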

 

What we're still not super clear on is what got messed up on the original backup such that it hung and the files got orphaned.  Maybe a momentary network issue, or something else?  Who knows.

 

So a new backup is running with just one VM, and all looks well.  I'll be adding the other ones back, re-adding GRT, etc...just to make sure it works with all features enabled as well.  If not, that may point to the issue with the original job.  Also, the BE server itself is on this host and there's apparently a possibility the API may have gone into some kind of loop and caused the original issue as well.  So that will be another test.

 

Also, apparently the "resource not snappable" warnings I was seeing in the job logs were a red herring and nothing to be concerned about.

 

Thanks again to VJware, and I'll be checking back in here after the tests for other people's future reference.  :)

AllisonW
Level 4

Unfortunately, the backup we kicked off on Friday, while it looked good to start, again stalled out (showed as running but nothing's happening) after about 20GB.  No drives are anywhere near full, so it shouldn't be a storage issue.  Back to the drawing board, I guess...

AllisonW
Level 4

Okay - due to an unrelated issue, I moved one of the VMs from the datastore it was on to a different one.  And then suddenly all my tests are working!  So I had been checking that the backup storage had enough room (which it definitely did), but the problem was the VMware storage.  I definitely think BE should have some way of detecting that there's a resource issue and giving a useful message, rather than just hanging, though.  :(

 

Anyway, this appears to be resolved.  Thanks for helping me investigate, everyone!
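
Since BE gave no warning here, a simple pre-backup check on the VMware side may save someone else the same hunt. The sketch below assumes the resource problem was free space on the datastore (VM snapshots write their delta files there while the backup runs). It flags any datastore whose free space falls below a chosen fraction of capacity. The 15% threshold, the names, and the sample figures are all made up for illustration; the real capacity and free-space numbers would come from the vSphere client or API.

```python
from typing import NamedTuple


class Datastore(NamedTuple):
    name: str
    capacity_bytes: int
    free_bytes: int


def low_space_datastores(datastores, min_free_fraction=0.15):
    """Return names of datastores whose free space is below min_free_fraction of capacity."""
    flagged = []
    for ds in datastores:
        if ds.capacity_bytes <= 0:
            # Unknown or zero capacity: treat as suspect rather than divide by zero.
            flagged.append(ds.name)
            continue
        if ds.free_bytes / ds.capacity_bytes < min_free_fraction:
            flagged.append(ds.name)
    return flagged


GIB = 1024 ** 3
# Hypothetical figures, not taken from the environment in this thread.
sample = [
    Datastore("datastore1", 500 * GIB, 20 * GIB),
    Datastore("datastore2", 500 * GIB, 200 * GIB),
]
flagged = low_space_datastores(sample)
```

With these sample figures only `datastore1` is flagged, since it has 4% free against the 15% threshold. A datastore that gets flagged is worth checking before pointing a snapshot-based backup at the VMs on it.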