job fails with 'hardware error' and blocks all subsequent tasks

Knippenberg
Level 3
Dear all,

I'm trying to maintain a steady backup routine on a client's network. There's just one server being used as file, SQL and Exchange server which needs to be backed up at night. The specs are:

   * 2 x 160 GB SATA HDD (RAID 1)
   * P4 630 (2 x 3.2 GHz)
   * 2 GB DDR2 RAM
   * Gigabit NIC
   * OS: Windows Server 2003 SBS Edition (32 bit)
   * Sony StorStation SDX-570V AITi200-ST (AIT-2 Turbo SATA)

According to:
http://support.veritas.com/docs/282249 (Hardware compatibility list, page 20, bottom),
the tape device used is supported by BackupExec 11d.

Notes:
The backup setup was deployed as-is about half a year ago.
I'm using a localized (German) version of the software, so the error messages are translated and might not be exactly the same as in the original version.

What is supposed to happen:
The backup job should back up some selected directories on the C: partition, as well as the Exchange Server, Active Directory and everything related to them.
There's an SQL server running as well, but it is stopped before and restarted after the backup job (I don't use the SQL server agent). This setup seems to run okay for a few days.

What actually happens:
However, about once a week (sometimes more often, sometimes less), the customer complains that the tape cannot be ejected from the drive to insert the next weekday's cartridge. The status mails I receive from BackupExec state that the job failed, the reason being "hardware error". When I log on remotely to check BackupExec, the job is usually still running, but I can neither abort it (the job state changes to "aborting..." and stays there indefinitely) nor eject the media (since the job still seems to be active). The only way to get the daily backup routine going again is to eject the tape by pushing the emergency eject trigger on the drive with a needle and then reboot the server. Otherwise the job hangs forever and blocks all subsequent jobs.

Other symptoms:
During the time of the backup (between 21:00 and 01:30), the Windows server reports heavy processor usage (usually 2-4 status mails per day). This could also be due to stopping and restarting the SQL server, but I'm not sure.
The errors don't seem to be tied to a specific weekday: the last three failures were on a Tuesday, a Wednesday and a Thursday (in different weeks).

Things I already tried:
* Replacing two tapes with new ones, since I thought the media could be defective
* Applying all the latest patches via LiveUpdate (was quite a hassle, too, but that's another story)
* Reconfiguring the backup job in every manner possible
* Reinstalling tape drivers
* Excluding holidays from the schedule, although tapes are set to overwrite and are not automatically ejected

I'm pretty much out of ideas on what to do next. I can add more information from the event logs etc. to this post once I can get to the machine in question again, but in the meantime, maybe one of you has a tip for me.

Thanks in advance,

Stefan
4 REPLIES

Jared_S_
Level 6
Employee
Hi Knippenberg -
 
When you have a chance, I believe the next step is to check the Windows event logs. Are you receiving any events with ID 5, 7, 9, 11 or 15?
 
Also - have you updated the firmware on your tape drive? SCSI controller?
 
- Jared
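
For anyone wanting to automate the event log check suggested above, here is a minimal sketch in Python. It assumes the System log has been exported to CSV first, and the column names (`EventID`, `Source`, `TimeGenerated`) are assumptions about that export, not a fixed Windows format. The function name and the sample rows are made up for illustration.

```python
import csv
import io

# Event IDs named in the reply above (5, 7, 9, 11, 15).
STORAGE_EVENT_IDS = {5, 7, 9, 11, 15}


def filter_storage_events(csv_text, wanted_ids=STORAGE_EVENT_IDS):
    """Return rows from a CSV export whose EventID is in wanted_ids.

    The column names used here are assumptions about the export
    layout; adjust them to whatever your export tool produces.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    matches = []
    for row in reader:
        try:
            event_id = int(row["EventID"])
        except (KeyError, ValueError, TypeError):
            continue  # skip rows without a usable numeric ID
        if event_id in wanted_ids:
            matches.append(row)
    return matches


# Invented sample data, only to show the expected input shape.
SAMPLE = """EventID,Source,TimeGenerated
11,disk,2007-05-18 21:04:10
7036,Service Control Manager,2007-05-18 21:00:02
9,example_adapter,2007-05-18 21:04:12
"""

if __name__ == "__main__":
    for row in filter_storage_events(SAMPLE):
        print(row["TimeGenerated"], row["Source"], row["EventID"])
```

Matching on the ID alone can produce false positives, since other sources may reuse the same numbers, so the `Source` column is printed as well to let you judge each hit.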

Knippenberg
Level 3
Hi Jared,
thanks for your reply. I wasn't able to get back to you sooner due to forum login issues.
The described situation happened again on Friday night, after just one day of successful backups. On Monday, I logged on to the server to check the event logs etc. None of the event IDs you mentioned appear there. All I get is a very generic message saying (translated):

"Device SONY 1 caused an unknown problem."

That's all. I went back to previous dates when the backup failed, and there were other messages:

"The driver encountered a controller error on \Device\Harddisk0"

and:

Adamm Mover Error: Write Position Failure!
Error = ERROR_IO_DEVICE
Drive = "SONY 1"
   {1959A3A6-9752-4FF7-A15F-D37371A7533F}
Media = "#9E522FE0-000"
   {9E522FE0-71CE-4C7C-B19D-5FF4619F8DA5}
Read Mode: SingleBlock(0), ScsiPass(0)
Write Mode: SingleBlock(1), ScsiPass(1)

There are similar errors with "Read Position Failure", too.
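
To see at a glance whether such entries always involve the same drive and media, they can be pulled apart with a few regular expressions. This is only a sketch that assumes every entry follows the layout of the one quoted above. The function name and the dictionary keys are made up for illustration and are not part of BackupExec.

```python
import re

# The entry quoted above, copied verbatim as sample input.
SAMPLE = '''Adamm Mover Error: Write Position Failure!
Error = ERROR_IO_DEVICE
Drive = "SONY 1"
   {1959A3A6-9752-4FF7-A15F-D37371A7533F}
Media = "#9E522FE0-000"
   {9E522FE0-71CE-4C7C-B19D-5FF4619F8DA5}
Read Mode: SingleBlock(0), ScsiPass(0)
Write Mode: SingleBlock(1), ScsiPass(1)
'''

# One pattern per field; each captures the value in group 1.
PATTERNS = {
    "failure": r"^Adamm Mover Error:\s*(.+?)!?\s*$",
    "error": r"^Error\s*=\s*(\S+)",
    "drive": r'^Drive\s*=\s*"([^"]*)"',
    "media": r'^Media\s*=\s*"([^"]*)"',
}


def parse_adamm_error(text):
    """Extract a few fields from one Adamm Mover Error entry.

    Fields that cannot be found are simply left out of the result.
    """
    fields = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text, re.MULTILINE)
        if match:
            fields[name] = match.group(1)
    return fields


if __name__ == "__main__":
    print(parse_adamm_error(SAMPLE))
```

Running this over every failed night and comparing the `drive`, `media` and `failure` values would show whether the errors follow one particular tape or occur regardless of media.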

However, it seems that every time the backup fails, there are other errors with a similar background (something about the SATA controller or I/O in general) that cause the crash. On Friday, curiously, the whole tape drive even went offline in Symantec BackupExec, which it normally doesn't do when a hang occurs. A reboot solved the problem (restarting only the services didn't), and on Monday night the backup went through fine again.

What I tried:
Updating the SATA controller driver and the RAID driver. When I installed the new Intel chipset driver, it gave me no errors but the system continued to use the old version afterwards. I have to check that the next time I'm physically at the machine (I can't reboot it at will).

Another thing:
The system still contains an old Adaptec SCSI controller to which the old external tape drive was connected. Might this cause any trouble, by chance?

Thanks for your time,
Stefan

Knippenberg
Level 3
Dear all,
a little update on my problem:

Right now, I keep getting the errors mentioned above. After a reboot, the backup will run once and will fail on subsequent days, instead of giving me at least a week or so. As suggested, I tried out the following things:

* Removed the old SCSI adapter, which was useless in the system anyway
* Updated the SATA RAID driver to the latest version (the release dates of the new and the installed driver differ by only one day, so I don't think that will change anything)
* Updated the Sony tape driver to a more recent version (this actually might make a difference, the new driver is dated around April this year)
* As usual, kept Windows and BackupExec up to date
* Also, I noticed that the backup, if it fails, does so right at the beginning of the backup procedure. So I changed the order of the sub-jobs: it now backs up the file system first and the Exchange server afterwards, in case Exchange is causing the trouble.

Firmware updates were not available for either the SATA controller or the tape drive.

The nightly backup once again succeeded, but that does not mean a thing. I will have to see what the coming days bring.

If you have similar problems or some other clue about what to do, feel free to post. I'm running out of ideas.
I'll keep you posted over the following days on whether the updates helped.

Thanks,
Stefan

Message Edited by Knippenberg on 05-22-2007 12:18 AM

Knippenberg
Level 3
Hi all,

another small update on my problem.
I updated all relevant drivers for the chipset, the embedded SATA RAID controller, and the tape drive, none of which got rid of the problem. I also tried uninstalling the old SCSI adapter that was still in the system, but that did not matter either.
My last chance now seems to be to install a second SATA controller and try connecting the tape drive to that one, since in some of the posts around here I read about issues with the LSI Logic embedded SATA controller (which happens to be on board this machine) and with RAID configurations in general.

Fingers crossed,
Stefan