Backup Exec Job Engine hangs, here we go... again

Fusion
Level 2
Hi,

I "upgraded" to 11d a few weeks back and have applied all of the latest so-called hotfixes. After perhaps the most painful upgrading & licensing experience I have ever dealt with in my 15 years of network administration I finally got things running... or so I thought.

First off, let me just say that I have already completely deleted all of my old jobs and policies and re-created them, which, by the way, was a major pain. Anyways, here was the first interesting behavior:

Last week or so, while some incremental jobs were trying to run (note I am backing up to a firewire drive) suddenly a job was missed because the previous job hung and the subsequent job's window timed out, at which point Backup Exec went into a continuous loop of retrying the job (note I do NOT have any retry settings defined for any of my backup jobs.) The end result was my Job History was filled up with over 120,000 missed job entries in the course of one evening. Anyways, I ended up spending around three hours yesterday deleting the missed jobs (2500 at a time since that is the max it will allow). Incidentally I called Symantec "support" about this, and the tech could not tell me why or how this happened, nor could he give me any reliable troubleshooting steps or configuration recommendations to prevent it in the future.

As if this wasn't enough of a pain, not one backup has been successful since I cleared these missed jobs. I consistently get WMI errors ("Failure to retrieve Windows Management Instrumentation") on multiple different servers, and/or the backup job just hangs mid-backup for no apparent reason. There is no indication that the job has hung, since it never errors out. The job simply sits there in a "running" state with no data being passed until eventually I manually cancel it. Naturally, once I manually cancel it, it hangs at "Cancel Pending", and yes, I ensure every time that there are no alerts awaiting acknowledgement before cancelling the jobs. Anyways, at this point I am forced to bounce my backup server, since I cannot successfully recycle the Backup Exec services; they just hang trying to stop the job engine.

The question is, how on Earth can I get reliable backups running again? I am so disgusted with this entire debacle that I have already made up my mind to never renew or purchase another Symantec product again. Perhaps someone there can redeem this situation by giving me some actual assistance which doesn't involve me waiting on hold for an hour or more, only to get some incompetent nincompoop who has no clue how to solve the issue. At this point I literally have to intervene after EVERY single job, not that it seems to matter since none of them ever complete successfully anymore anyways. This is completely unacceptable. I'm seriously at the point where I may just ask for our money back and take our business elsewhere.

P.S. I just fired off an incremental job for the third time in a row now, and yes it is now hanging again after around 20 seconds.

Message Edited by Fusion on 04-06-2007 12:23 PM

7 REPLIES

nhbilly
Level 4
Try running the job without AOFO enabled. I had a hell of a time getting any of my jobs to run at all. Disabled AOFO and it works.

Do_Kim
Not applicable
Hi,
I have the same issue. Backup Exec 11d is running on Win2k SP4 and was upgraded from 9.1.
The backup jobs hang without any error message and the following jobs are missed, as in your case.
First, Symantec tech support suggested rebuilding the catalog, but it didn't work. Second, the database was rebuilt on Symantec's advice, but that didn't work either.
Last, I uninstalled Backup Exec 11d completely and did a fresh reinstall, but in vain.
 
PS:  AOFO is not enabled.

Fusion
Level 2
I was miraculously able to get Backup Exec working again by finally rebooting several of my servers, although I have no idea why this should make a difference, since the remote agents had been installed long before the issues started.  At any rate, I've decided to just skip this headache once and for all and move to Bacula when our licensing expires. Symantec clearly doesn't care anyways.

tgbrittai
Not applicable
I feel your pain! Here is the fix for your 120,000 entries in the job history log.

http://support.veritas.com/docs/287696

Symantec Backup Exec (tm) 11.0 (11d) for Windows Servers revision 6235 (32bit) - Hotfix 16 - Multiple issues: Daylight Saving Time (DST) can cause jobs to be skipped; running a backup job in the past causes jobs to continuously loop as missed jobs.

T_K
Level 4
What resource do your backups stop at? I've also spent many hours trying to figure out why my backups hung at the Exchange mailboxes, in the same fashion: nothing works again until a bounce (or a restart of the services).

I called Symantec and they couldn't pinpoint the problem, even after creating a new mail-enabled account with all the correct perms. I installed all updates via LiveUpdate and even updated the remote agents to 11d (which required a reboot, so it masks whether or not the update itself fixed the problem).

I hope your reboots have fixed the issue for good. It would come back for me after a couple of weeks or so...
--------


btw, I know a Do!

Jakob_Markussen
Level 4
I feel your pain too. And I will also never ever buy any more Symantec products. My backup has been unstable for months now. And as you say, Symantec doesn't seem to care.
 
For some weeks now my backup "seemed" to be running fine, but now it has suddenly started to behave like it did just after the upgrade to 11d.
 
Jobs hang forever, the GUI crashes when Lotus files are checked, etc. etc.
 
This product is the biggest joke in backup history.....

intraknowareman
Level 2
Same deal here too....

WMI is the culprit for 99% of our job hangs.  Try restarting the Windows Management Instrumentation service on the box that is hanging the backup, not the media server.  The WMI service simply restarts and the job should just continue and finish.  This is quite annoying and I am actively researching WMI (that's why I am here). I am also looking into some sort of WMI pinger to notify me when the service stops responding.  It could help resurrect those jobs before they hang for a millennium.
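
For anyone wanting to try the "WMI pinger" idea, here is a rough sketch in Python, not a tested monitoring tool. It runs a probe command per server with a timeout and reports which ones answered, failed, or hung. The function names, the 30-second timeout, and the choice of `wmic /node:HOST os get caption` as the probe are all my assumptions; swap in whatever WMI query and notification method suit your environment.

```python
import subprocess
import sys


def probe(command, timeout_seconds):
    """Run one probe command and classify the result.

    Returns "ok" (exit code 0), "failed" (non-zero exit or the command
    could not be started), or "timeout" (no answer within the limit).
    """
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # A probe that never returns is the symptom described above.
        return "timeout"
    except OSError:
        # The probe executable itself is missing or cannot be launched.
        return "failed"
    return "ok" if result.returncode == 0 else "failed"


def wmi_probe_command(host):
    # Assumption: wmic.exe is present on the machine running this script
    # and the account has rights to query WMI on the remote host.
    return ["wmic", "/node:" + host, "os", "get", "caption"]


def check_hosts(hosts, timeout_seconds=30, build_command=wmi_probe_command):
    """Probe each host and return a {host: status} dictionary."""
    return {
        host: probe(build_command(host), timeout_seconds) for host in hosts
    }


if __name__ == "__main__":
    # Hypothetical usage: python wmi_ping.py server1 server2
    for name, status in check_hosts(sys.argv[1:]).items():
        if status != "ok":
            print("WMI probe on %s: %s" % (name, status))
```

Scheduled every few minutes, something like this would flag a server whose WMI has stopped answering before the next backup window, so the service can be restarted on that box ahead of time.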

Also, if stopping the service times out, invest in a utility suite called PsTools. There is a kill app in there (PsKill) that does a great job at killing a hung service.

Good Luck....