Backup Jobs Hang Randomly (BE 12.5)

bjash
Level 4

Backup Exec 12.5.2213 running on Windows 2003 Standard SP2 with latest LiveUpdate and OS updates.  (Also experienced identical error on Windows XP SP3 setup).

Backup jobs hang partway through: the job rate falls and the Byte Count for the job stops increasing.  Canceling the job leaves it stuck in a cancel-pending state, and all BE services must be restarted (or the server rebooted) to complete the cancel.  The backup device is an HP StorageWorks 1/8 G2 SAS LTO-3 drive.
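For anyone who wants to catch this symptom automatically, here is a minimal sketch of the check described above (byte count no longer increasing). It assumes you already collect (timestamp, byte count) samples for the running job by some means of your own; it does not call any Backup Exec API, and the function name and the 10-minute window are illustrative assumptions only.

```python
def is_stalled(samples, window_seconds=600):
    """Report whether a job's byte count has stopped increasing.

    samples: list of (timestamp_seconds, byte_count) tuples in
    chronological order, gathered however you monitor the job
    (hypothetical input, not a Backup Exec data structure).
    window_seconds: how long the count must stay flat to call it a hang.
    """
    if not samples:
        return False

    last_time, last_bytes = samples[-1]
    unchanged_since = last_time

    # Walk backwards to find the earliest sample that already showed
    # the same byte count as the most recent one.
    for timestamp, byte_count in reversed(samples):
        if byte_count != last_bytes:
            break
        unchanged_since = timestamp

    return last_time - unchanged_since >= window_seconds
```

A short window will also flag legitimately slow phases of a job, so the threshold would need tuning for a real environment.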

The size of the backup job is about 480 GB.  The job hangs at random places (36 GB, 32 GB, 180 GB, 80 GB, etc.) and has succeeded in the past (most recently just 3 nights ago).  In the runs that hang, it has never made it past 300 GB.  We have 30 Sony LTO-3 tapes and they work in other systems with no problems, so I don't believe it is a media problem.

The job touches about 10 servers and the hang has occurred on different servers (2000, 2003, etc.), so I don't think it is a resource problem.  All servers have the 12.5 agent installed with AOFO enabled (backup has hung during regular file backups and most recently Exchange backups--but not consistently).

No notable events logged in Application or System Event logs.  All drivers for Adaptec SAS 1045 controller are up to date and tape library is using all Symantec drivers.

Any help would be greatly appreciated as we have seen this identical error on both an XP Pro SP3 OS and Server 2003 SP2 setup (both clean installs).

Thanks.

1 ACCEPTED SOLUTION

bjash
Level 4
We were finally able to resolve this issue: it turned out that the SAS controller needed its firmware flashed.  We are using an Adaptec 1045 SAS controller.  Thanks to everyone who helped.

4 REPLIES

chicojrman
Level 6
I used to see this occasionally with 10.1d and 11d on B2D devices. Sometimes stopping and restarting the services, the agents, or even the server(s) themselves would fix it, but often it wouldn't.

In my case, I found that whenever this occurred, Backup Exec was creating a new BKF file (which happens when it can't overwrite or append to an existing BKF file under the media rules applied to it), and for whatever reason it stalled as it started to create the file.

The way I would fix it (at least in my environment):


1. Go to the device view in the admin console and take note of the BKF file.
2. Cancel the job.
3. Associate the BKF file with the retired media set and then delete it.
4. Navigate to the actual BKF file on the server and delete it (in my case it is always zero bytes in size).
5. Inventory the folder and then retry the backup.
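If it helps, the zero-size BKF file mentioned in the steps above can be located with a small script before deleting it by hand. This is only a sketch under assumptions: it presumes the backup-to-disk folder is an ordinary directory containing files with a .bkf extension, the function name is made up for illustration, and it only lists files. It does not delete anything or touch the Backup Exec media database, so the steps in the admin console are still needed.

```python
from pathlib import Path


def find_empty_bkf(folder):
    """Return the zero-byte .bkf files directly inside `folder`, sorted by path.

    `folder` is the backup-to-disk directory (an assumption about the
    layout; subfolders are not searched).
    """
    return sorted(
        p
        for p in Path(folder).iterdir()
        if p.is_file()
        and p.suffix.lower() == ".bkf"  # match .bkf and .BKF alike
        and p.stat().st_size == 0       # only files with no data written
    )
```

Calling `find_empty_bkf` with the path to your own B2D folder returns a list of `Path` objects you can review before removing anything.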

Again, this is the fix that has worked in my environment. I do not have any tapes in my backup, just disks. I rotate my media to an offsite location by way of hot-swap drives in an array.

For me, this symptom happens only every now and then, perhaps once every two or three months, if that. I have no explanation for why it happens or why the steps above correct the condition. At first I suspected a corrupt or bad hard drive, but I ruled this out a long time ago (I can go into detail, but trust me that in my case it was not a bad drive :) ).

Hope this helped.



bjash
Level 4
Thanks for the suggestion.

Do you think a quick erase of the tape would accomplish the same as the steps above?  We have done quick and long erases on the tapes (to see if it was a media issue--same as in your case).

JP-Sym
Level 2
We use BE v12.5 (build 2213), and once or twice a week one particular job (a backup of a remote SUSE Linux host) shows no progress at all (0 bytes). Canceling the job does not work. Rebooting the server is the only solution.
After the reboot, the failed job is reported with error code E0008821.

After the reboot I can start the job manually without any problem. The job is an append job, so it can't have anything to do with BKF files.
I have not found a solution.
Upgrading to SP1 is not an option, because installing SP1 crashes the VMware ESX server (purple screen of death) on which Backup Exec runs as a guest OS (Windows 2008 Standard).

Does Symantec have a solution for the failing job?

bjash
Level 4
We were finally able to resolve this issue: it turned out that the SAS controller needed its firmware flashed.  We are using an Adaptec 1045 SAS controller.  Thanks to everyone who helped.