
Backup job hangs, forever, never errors out

dr_noble
Level 3

I am having trouble backing up the file system on my Microsoft Exchange 2016 Server. I have two backup jobs for this server: (1) the file system and (2) the DAG databases. Both jobs back up to tape. The DAG backup job runs fine. The file system backup job runs for a while and then hangs. It seems to hang in a different place each time the job runs. The job should complete in 4 hours. I have given the job 48 hours to finish, and it never progresses once hung. Also, once the job hangs, it won't cancel. I have given the job 45 minutes to cancel, and then given up. I then must restart the Backup Exec server to stop the job.

This problem has been ongoing for 4 months. I opened a support case 3 months ago and have made zero progress toward a solution.

Part of the problem is the job sometimes completes successfully, and sometimes hangs.  If debugging is enabled, the job completes successfully every time, and creates 3500+ log files totaling 70 GB.  When the job just hangs and doesn't cancel, normal job logs aren't beneficial.

One proposed solution was to re-create the Backup job. I have recreated the Backup Exec job twice. Afterwards, the backup job will sometimes finish, but most of the time the job hangs.

I am running the latest version of Backup Exec, version 20.6. I have also uninstalled and re-installed the latest Backup Exec Remote Agent on the target server.

I use this Backup Exec media server to back up numerous Windows servers, and this Microsoft Exchange 2016 Server is the only server experiencing this problem.

This is such a headache because once the job hangs, other backup jobs start queueing behind it.  Also, since the job doesn't complete, the tape end marker is unreadable and the tape cannot be appended with additional data.

Pleading for help ...

1 ACCEPTED SOLUTION


After literally months of occasionally experiencing the backup job hanging issue, the job errored out one time with this error:  "The job failed with the following error: The network connection to the Backup Exec Remote Agent has been lost. Check for network errors."

There is a Veritas Knowledge Base article for this error located here:  https://www.veritas.com/support/en_US/article.100005195

It proposes this solution:

Open Registry on the remote server
Start->Run->Regedit
Navigate to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters
Create a new DWORD value named IRPStackSize (case sensitive).
Set the value to between 11 and 50. (15 is the default when the value is not present; the article recommends starting at 30. Note: On some computers, values from 33 through 38 and above can cause problems.)
Reboot the server.

I implemented this solution and have not experienced the error again. (It has been a few weeks.)
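
For anyone who would rather script the change than click through Regedit, here is a minimal sketch of the same steps. It only builds the `reg add` command line and checks the value against the 11 to 50 range quoted above; it does not touch the registry itself. The function name `build_reg_command` and the constants are my own for illustration and are not part of Backup Exec or the KB article. The sketch assumes the standard `LanmanServer\Parameters` key and the Windows `reg.exe` tool.

```python
# Illustrative helper (hypothetical names): builds the reg.exe command that
# creates the IRPStackSize DWORD value described in the steps above.
# It does NOT modify the registry; run the printed command yourself,
# from an elevated prompt on the remote server, and then reboot.

KEY_PATH = r"HKLM\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters"
VALUE_NAME = "IRPStackSize"  # the KB article notes the name is case sensitive


def build_reg_command(stack_size: int = 30) -> str:
    """Return a `reg add` command line for the given IRPStackSize value."""
    # The KB article gives 11..50 as the allowed range and suggests starting at 30.
    if not 11 <= stack_size <= 50:
        raise ValueError("IRPStackSize must be between 11 and 50")
    return (
        f'reg add "{KEY_PATH}" /v {VALUE_NAME} '
        f"/t REG_DWORD /d {stack_size} /f"
    )


if __name__ == "__main__":
    print(build_reg_command(30))
```

Running the script prints the command for the suggested starting value of 30. Export the key or otherwise back up the registry before applying it, and remember that the change only takes effect after the reboot.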


6 REPLIES 6

Riyaj_S
Moderator
Employee Accredited Certified

Please check if all the antivirus exclusions for BE are added on Exchange Server:

https://www.veritas.com/support/en_US/article.100046324

Thanks and Regards

Neither the Backup Exec server nor the Exchange server has an antivirus product installed.

Thank you.

Dr Noble,

Since your backup jobs to tape complete successfully when slowed down by debugging, perhaps the tape cannot keep up at full speed. To succeed with debugging disabled, you may need to use faster B2D (backup-to-disk) media instead of tape.

Keith

I concur; it seems like something is timing out. I have changed the job so that it backs up to disk and then duplicates to tape upon completion. We'll see if that tactic is repeatedly successful for this backup job. Hopefully the problem does not spread, because that solution is not really viable for other backup jobs.

Changing from "backing up to tape" to "backing up to disk" did not make a difference. The backup job still occasionally hangs while backing up to disk.  The media used apparently makes no difference.
