
The drive hardware is offline!

Tom_Dudek
Level 3
I have read just about every thread on this subject on these forums. Although it is somewhat comforting that I am not the only site that suffers from this issue, it is very disconcerting that there is no post that actually states that the issue has been resolved, nor have I managed to resolve the issue by trying the numerous suggestions posted. Here is the background to my issue...

About a month ago the Backup Exec job service started to drop out, and I started getting the above error when running the nightly scheduled backup. Up until then all was well, and nothing has really changed except that I am backing up more and more data and the backup is taking longer. The backup job runs and actually verifies successfully. However, a few minutes after the backup completes, at around the time when the tape should auto eject, the job service drops out. I have set it to restart after a minute, as suggested in another thread. The service restarts OK, but the Alerts show "The job was cancelled by user recovery" and then soon after "The drive hardware is offline!". Ultimately, although the backup and verify appear to complete, the errors appear in the log and the job status reports the job as "Cancelled". Oh, the tape also does not auto eject.

As I already stated, I have tried just about everything suggested here. I am running version 10.1 with all the patches installed, but I have also replicated the problem using version 10.1 without patches, version 10 and version 9.1. All my drivers and firmware for the HP Ultrium 448 tape drive are up to date. Yes, I tried the latest Symantec drivers as well as the native HP ones, with the same failure. And get this.... Out of desperation I even reformatted the hard disk on the server, reinstalled Windows 2003 Server and did a clean install of the version 10d software. I then recreated all the media sets and backup jobs and guess what.... I STILL GET THE ERROR! Surely, by doing this I have ruled out all issues to do with corrupt catalogs and such. Now for the weird thing.... I have noticed that the backup runs and reports "Successful" when I back up only some of the files. That is, I am still backing up data from the 10 or so different volumes, just less of it.... Then it runs OK. Oh.... yesterday I threw another gig of memory into the server just for good measure.... still failed.

I am out of ideas where to go with this next. I have read threads going back to September last year reporting this problem.... Surely if this is inherent in the product that is enough time for Symantec to address the issue....

Please advise.... while I still have hair left!
13 REPLIES

kyle_beckham_2
Level 4
We are getting the same messages too. We are using 10d SP1 with the latest patches. For us it seems to happen mainly with the synthetic policy type jobs. They will run fine for several days and then fail with the user recovery and drive hardware is offline errors. Then we delete the job templates and recreate them, and they run fine for a few days.

We have been using backup exec since v9.0 and never had these issues, until we started using ADBO (synthetic policies). BTW: we are backing up to disk.

shweta_rege
Level 6
Hello,

Could you please Update us on the issue?

Did you install the latest Veritas Drivers?

Do you receive any events in the event logs?

Thank You,

Shweta

Tom_Dudek
Level 3
Could you please Update us on the issue?

Sure.... The problem still exists. At the end of the verify process, the job service drops out and restarts after a minute, and the status reports "Cancelled". The Alert Log shows "Drive is offline!"

Did you install the latest Veritas Drivers?

Yes, I did. I mentioned this in my original post.

Do you receive any events in the event logs?

Yes. Precisely as the verify process ends I get the following...

Source: Application Error
Faulting application bengine.exe, version 10.1.5629.26, faulting module bestdutl.dll, version 10.1.5629.0, fault address 0x0000cf28.

Source: BackupExec
Backup Exec Alert: Device Error
(Server: "GURU") (Job: "Monday") The drive hardware is offline!
Please confirm that the drive hardware is powered on and properly cabled.

Source: BackupExec
Backup Exec Alert: Job Cancellation
(Server: "GURU") (Job: "Monday") The job was canceled by user Recovery.

I have noticed a pattern: the more I back up, the more likely this is to happen. That is, I have had successful completions when the backup runs for under 7 hours and 20 minutes. Conversely, I've never had a successful completion for jobs that run over that time. Is this simply a coincidence?
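
For anyone who wants to test the same hunch against their own job history, here is a minimal sketch in Python. It is not a Backup Exec tool and reads no Backup Exec logs. It assumes you have written down each job's run time in minutes and whether it completed, and the function name `duration_threshold` is made up for illustration. It reports a cutoff only if every successful run is shorter than every failed run.

```python
from typing import Iterable, Optional, Tuple


def duration_threshold(
    runs: Iterable[Tuple[float, bool]]
) -> Optional[Tuple[float, float]]:
    """Check whether run time cleanly separates good jobs from bad ones.

    Each record is (duration_in_minutes, succeeded). Returns
    (longest_success, shortest_failure) when every successful run is
    shorter than every failed run, otherwise None.
    """
    runs = list(runs)
    successes = [minutes for minutes, ok in runs if ok]
    failures = [minutes for minutes, ok in runs if not ok]

    # Without at least one run of each kind there is nothing to compare.
    if not successes or not failures:
        return None

    longest_ok = max(successes)
    shortest_fail = min(failures)

    # A clean split means the cutoff lies somewhere between these two values.
    if longest_ok < shortest_fail:
        return (longest_ok, shortest_fail)
    return None


# Placeholder records for illustration only, not taken from any real log.
example_runs = [(400, True), (435, True), (445, False), (460, False)]
print(duration_threshold(example_runs))
```

If the function returns a pair, the failures all sit above some run time between the two values, which points at elapsed time rather than at a particular volume or server. If it returns None, the run times overlap and the pattern is more likely a coincidence.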

Rucha_Abhyankar
Level 6
Hi Tom,

What data are you backing up?

Are you backing up any remote servers?


What device are you using? A library or an autoloader or a standalone tape drive?

Tom_Dudek
Level 3
I am backing up a combination of Netware 6.5 and Windows 2003 servers - all using remote agents. More specifically the backup job contains the following:

2 Windows 2003 application servers - 1 volume each
2 Netware 6.5 File servers - 2 volumes each
1 Netware 6.5 GroupWise server - TSA Backup - 2 volumes

I am backing up to a HP StorageWorks Ultrium 448 Tape drive - no autoloaders are being used. Everything is being backed up to one tape.

Tom_Dudek
Level 3
Further to above. I strongly suspect that the crux of the problem is the Backup Exec Job Service stopping. I have set it to auto start after such a failure as advised in another forum and it is this restart process that returns the "Drive hardware is offline" error. If I could stop the service from intermittently stopping then I would not get the drive offline error. I know this as when the service doesn't stop (and this happens every few backup jobs), all runs well.

Deepali_Badave
Level 6
Employee
Hello,

Which account are the Backup Exec services running under?

Also, please start SGMON logging before running the backup job.

How to create SGMON.LOG
<<>>

Regards,

kyle_beckham_2
Level 4
We still see this issue when backing up to B2D folders. In addition, it only happens when using the synthetic backup. We have had to revert to non-synthetic backups for all but one server because of this stupid drive offline problem. I really wonder if Veritas even tests this software.

Tom_Dudek
Level 3
The Backup Exec Job Engine is running under the Local System administrator account. I have actually tried changing this and it makes no difference.

I strongly believe that the problem is linked to the actual elapsed time of the backups. I have recently tried running all my backups without the verify process at the end. Turning off verify has shaved around an hour and a half off my backups. They used to take 7 hours 30 minutes or so and now complete in just over 6 hours. Interestingly, since I turned off verify, ALL the backups for the last 8 days have completed successfully. As soon as I turn verify back on, the total backup time extends to over 7.5 hours and the Job Engine fails before the backup completes. It is almost as if the job engine cannot run past the 7 hour 30 minute mark without falling over!

I suppose I could fragment my one big backup job into a few smaller ones. This may make the engine keep running. I did see a thread a while ago that talked about doing this for restoring jobs in situations where large restores caused the job service to fall over.

At the end of the day, all this is a workaround for the core of the problem.... a flaky Backup Exec Job Engine service.

Rucha_Abhyankar
Level 6
Hi Tom,

We will be moving your query to the Backup Exec for Netware pool.


=========

Tom_Dudek
Level 3
I'm a little puzzled why you want to do that. I am using Backup Exec for Windows, not Backup Exec for Netware. It is the Windows Media server that is giving me problems, more specifically the Backup Exec Job Service, not the Netware agents running on the Netware boxes. Please explain your reasoning.

Stumpr2
Level 6
Sorry Tom,
You have been thrown into the abyss.
This forum is not moderated.
Perhaps you can repost....

forcedfx
Level 3
Oh man, this post never should have been moved here.  Is it being hidden?

You should copy and paste your first post back into the Windows version forum.