Backup Exec Job Service keeps failing

Tom_Dudek
Level 3
I originally posted the following in July and got no response. It was then incorrectly moved to the un-moderated Backup Exec for NetWare forum, where it disappeared into the abyss, so I am re-posting it where it belongs!

----
I have read just about every thread on this subject on these forums. Although it is somewhat comforting that I am not the only site that suffers from this issue, it is very disconcerting that there is no post that actually states that the issue has been resolved, nor have I managed to resolve the issue by trying the numerous suggestions posted. Here is the background to my issue...

About a month ago the Backup Exec job service started to drop out, and I began getting the above error when running the nightly scheduled backup. Up until then all was well, and nothing has really changed except that I am backing up more and more data, so the backup takes longer. The backup job runs and actually verifies successfully. However, a few minutes after the backup completes, at around the time the tape should auto-eject, the job service drops out. As suggested in another thread, I have set the service to restart after one minute. The service restarts OK, but the Alerts show "The job was canceled by user recovery" and then, soon after, "The drive hardware is offline!". Ultimately, although the backup and verify appear to complete, the errors appear in the log and the job status reports the job as "Canceled". Oh, and the tape does not auto-eject either.

As I already stated, I have tried just about everything suggested here. I am running version 10.1 with all the patches installed, but I have also reproduced the problem with unpatched 10.1 and with versions 10 and 9.1. All my drivers and firmware for the HP Ultrium 448 tape drive are up to date. Yes, I tried the latest Symantec drivers as well as the native HP ones, with the same failure. And get this... Out of desperation I even reformatted the server's hard disk, reinstalled Windows Server 2003 and did a clean install of the version 10d software. I then recreated all the media sets and backup jobs and, guess what... I STILL GET THE ERROR! Surely this rules out corrupt catalogs and the like. Now for the weird thing... I have noticed that the backup runs and reports "Successful" when I back up only some of the files. That is, I am still backing up data from the 10 or so different volumes, just less of it... and then it runs OK. Oh... yesterday I added another GB of memory to the server just for good measure... it still failed.

I am out of ideas about where to go next. I have read threads going back to September last year reporting this problem... Surely, if this is inherent in the product, that is enough time for Symantec to address it...

I strongly believe the problem is linked to the actual wall-clock length of the backups. I recently tried running all my backups without the verify pass at the end. Turning off verify has shaved around an hour and a half off my backups: they used to take about 7 hours 30 minutes and now complete in just over 6 hours. Interestingly, since I turned off verify, ALL the backups for the last 8 days have completed successfully. As soon as I turn verify back on, the total backup time goes back over 7.5 hours and the Job Engine fails before the backup completes. It is almost as if the Job Engine cannot run past the 7 hour 30 minute mark without falling over!
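One way to sanity-check the "it dies past 7.5 hours" theory is to log each night's run time and outcome and see whether a single duration cutoff separates the successes from the failures. This is only a rough sketch: the `JobRun` record and `duration_cutoff` helper are hypothetical, and the sample numbers are illustrative values loosely based on the figures above, not real job history.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class JobRun:
    hours: float     # total wall-clock run time of the job, including verify if enabled
    verify: bool     # whether the verify pass was turned on (recorded, not used below)
    succeeded: bool  # False if the Job Engine dropped out and the job ended "Canceled"


def duration_cutoff(runs: Sequence[JobRun]) -> Optional[Tuple[float, float, bool]]:
    """Compare the longest successful run with the shortest failed run.

    Returns (longest_ok, shortest_failed, separable), where separable is True
    when every failure ran longer than every success, i.e. a single duration
    threshold explains the outcomes. Returns None if either outcome is missing.
    """
    ok = [r.hours for r in runs if r.succeeded]
    failed = [r.hours for r in runs if not r.succeeded]
    if not ok or not failed:
        return None  # both outcomes are needed to test the hypothesis
    longest_ok, shortest_failed = max(ok), min(failed)
    return longest_ok, shortest_failed, longest_ok < shortest_failed


# Illustrative values only: eight verify-off runs of just over 6 hours,
# and two verify-on runs past the 7.5 hour mark.
runs = [JobRun(6.1, verify=False, succeeded=True) for _ in range(8)]
runs += [JobRun(7.6, verify=True, succeeded=False), JobRun(7.7, verify=True, succeeded=False)]
print(duration_cutoff(runs))  # (6.1, 7.6, True)
```

Note that in runs like these, duration and verify change together, so a clean cutoff cannot tell "long jobs fail" apart from "verify jobs fail". A verify-off job that deliberately runs past 7.5 hours would be needed to separate the two.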

I suppose I could split my one big backup job into a few smaller ones, which might keep the engine running. A while ago I saw a thread that suggested doing this for restore jobs, in cases where large restores caused the job service to fall over.
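If splitting the job is the way to go, the 10 or so volumes could be grouped so that each job's estimated run time stays under a chosen budget. The sketch below uses first-fit decreasing bin packing; the `split_into_jobs` helper, the volume sizes, the 30 GB/hour effective rate (backup plus verify) and the 5-hour budget are all hypothetical assumptions, not measured figures or Backup Exec settings.

```python
from typing import Dict, List, Tuple


def split_into_jobs(
    volumes: Dict[str, float], gb_per_hour: float, max_hours: float
) -> List[Tuple[List[str], float]]:
    """Split volumes into jobs using first-fit decreasing bin packing.

    Each job's estimated run time (total GB / gb_per_hour) stays at or under
    max_hours. Returns a list of (volume_names, estimated_hours) per job.
    """
    budget_gb = gb_per_hour * max_hours
    jobs: List[List] = []  # each entry: [total_gb, [volume names]]
    # Place the largest volumes first; this usually needs fewer jobs.
    for name, size_gb in sorted(volumes.items(), key=lambda kv: kv[1], reverse=True):
        if size_gb > budget_gb:
            raise ValueError(f"volume {name} alone exceeds the per-job time budget")
        for job in jobs:
            if job[0] + size_gb <= budget_gb:  # first job with room wins
                job[0] += size_gb
                job[1].append(name)
                break
        else:  # no existing job had room, so start a new one
            jobs.append([size_gb, [name]])
    return [(names, total / gb_per_hour) for total, names in jobs]


# Hypothetical volume sizes in GB, assumed 30 GB/hour effective rate, 5-hour budget.
plan = split_into_jobs({"C:": 40, "D:": 120, "E:": 90, "F:": 60}, gb_per_hour=30, max_hours=5)
for names, hours in plan:
    print(names, round(hours, 2))
```

With these made-up numbers it prints `['D:'] 4.0`, `['E:', 'F:'] 5.0` and `['C:'] 1.33`. A greedy packing like this is not guaranteed to use the fewest jobs, but it is simple and usually close enough for scheduling purposes.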

At the end of the day, all this is a workaround for the core of the problem... a flaky Backup Exec Job Engine service.

M_R
Level 3
I have similar problems. I was about to remove and reinstall BEWS, but it looks like that seldom resolves the problem. Judging by the commentary posted in the thread at:
http://forums.veritas.com/discussions/thread.jspa?threadID=48444&start=30&tstart=75

...I hope it doesn't take four months for *my* question to be answered. The fact that these problems never seem to "get fixed" doesn't inspire much confidence in me. Support always just says "use our non-WHQL drivers and everything will be OK". Is it just me, or did things go downhill after Symantec bought Veritas?

shweta_rege
Level 6
Hello,

Could you please update us on the issue?

Thank you,

Shweta

Tom_Dudek
Level 3
Errr... I am not sure what kind of update you want. You have not advised me of anything since I made the post two days ago, so there is nothing to update. The situation is as it was when I made the original post.

tejashree_Bhate
Level 6
Hello,

What are the errors in the system and application event logs?

In the System event log, check for device-related errors. The Job Engine is crashing because the device is going offline.

Please post the device errors so we can troubleshoot.

Thanks
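Going through the System event log by hand around the failure time can be tedious. As a rough sketch, assuming the log has already been exported and parsed into dicts with hypothetical keys `time`, `level`, `source` and `message` (the export and parsing steps are not shown), the entries near the job failure could be narrowed down like this. The keyword list is a guess, not a list of real HP or Backup Exec event sources, and the sample entries are made up.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Sequence


def device_errors_near(
    events: Iterable[Dict],
    failure_time: datetime,
    window_minutes: int = 30,
    keywords: Sequence[str] = ("tape", "scsi", "device"),  # assumed keywords
) -> List[Dict]:
    """Return Error-level events within +/- window_minutes of failure_time
    whose source or message mentions one of the keywords, oldest first."""
    window = timedelta(minutes=window_minutes)
    hits = []
    for ev in events:
        if ev["level"] != "Error":
            continue  # warnings and information entries are skipped
        if abs(ev["time"] - failure_time) > window:
            continue  # outside the time window around the job failure
        text = f"{ev['source']} {ev['message']}".lower()
        if any(k in text for k in keywords):
            hits.append(ev)
    return sorted(hits, key=lambda ev: ev["time"])


# Hypothetical failure time and hypothetical, already-parsed log entries.
failure = datetime(2007, 1, 1, 5, 30)
sample = [
    {"time": datetime(2007, 1, 1, 5, 25), "level": "Error", "source": "example_tape_driver", "message": "Device reset detected"},
    {"time": datetime(2007, 1, 1, 5, 28), "level": "Warning", "source": "disk", "message": "Device is slow"},
    {"time": datetime(2007, 1, 1, 2, 0), "level": "Error", "source": "example_scsi", "message": "Timeout"},
    {"time": datetime(2007, 1, 1, 5, 31), "level": "Error", "source": "example_service_monitor", "message": "A service terminated unexpectedly"},
]
for ev in device_errors_near(sample, failure):
    print(ev["time"], ev["source"], ev["message"])
```

With the default 30-minute window only the `example_tape_driver` entry is reported; widening the window also picks up the earlier `example_scsi` error.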