Enterprise Vault Backup Job Freezes!

WonderTuna
Level 3

After upgrading to Enterprise Vault v9.0.3 (from v9.0.2), the Enterprise Vault job running on Backup Exec 2010 R3 (v13.0 Rev 5204 Hotfixes 108429 and 176937) now hangs / stalls either whilst backing up the EVPublicFoldersVS Store, or sometimes when initially placing the first EV entities into backup mode. Prior to the EV upgrade, the job was running successfully with no errors.

When it hangs, the job will not cancel and the Backup Exec services have to be restarted in order to clear the job. Restarting the Backup Exec Enterprise Vault Agent also does not allow the job to be cancelled. Unusually there are no errors logged in the failed job report (other than that the RPC was terminated due to the services restart). There are similarly no errors logged in the Event Viewer on either the Backup Exec or Enterprise Vault servers. According to the logs, the job appears to be bringing the various vaults in and out of backup mode as expected. However, the job appears to stall as if it is waiting for something to respond / signal for it to continue.

The Enterprise Vault Agent is up to date with the Backup Exec Server build. The environment is a single Backup Exec Media Server connected to an HP MSL4048 LTO3 Library. Enterprise Vault is running as a Virtual Machine in vSphere 5.0.

I have tried re-installing the Backup Exec Agent on the EV server, and re-creating the selection list and associated job. Other jobs running on the server include a SQL, File and Exchange backup, all of which are running fine and completing normally. A search of the Symantec Knowledge Base does not seem to yield any similar issues with the same characteristics.

16 REPLIES

Russ_Perry
Level 6
Employee

Upgrades to EV or BE commonly cause issues with EV selection lists... but you already addressed that possibility by recreating your selection list and job. 

The fact that you can't cancel the job by stopping the Remote Agent on the EV server indicates that the job isn't actively connected and waiting for something there (if the EV DBs are on a separate SQL server, did you try to cycle that Remote Agent as well?).  As you mention the appearance of the job waiting for a response, I'd suspect some kind of media alert happening (check active alerts when the job hangs).  You could also break the job into smaller chunks to see if the hang is resource specific.  Do a job with the Indexes, one for the Vault Stores and DBs, and one with everything else. 

Russ

ReagonP
Not applicable
Partner Accredited

We are experiencing the same symptoms identified above.  The environment is also EV 9.0.3 for Exchange running on vSphere 5 virtual machines.  We have the database on a separate server (i.e. one server for EV and one for the SQL database).  Our Backup Exec environment is BE 2010 R3 (SP2 and hotfixes are current) backing up to an HP 2024 LTO5 tape library (fiber attached).

We have rebuilt the backup exec environment, recreated all selection lists and jobs, and have isolated the EV jobs to be the only ones using the 2024 library.  We still get hung jobs.  It appears that the job hangs after a resource finishes and the next one is about to start.  What I mean by that is that in our environment we have two EV storage groups.  So if the job that is backing up the open partitions finishes the open partitions of the first storage group and then attempts to move to the second storage group, the job hangs.

We are able to stop the stuck jobs by stopping the BE remote agent service on the EV servers.

In our case we have broken the jobs up into three separate jobs: one for the DBs, one for indexes, and one for the open partitions.  We still have the same problem.  There are absolutely no media alerts and no issues on the media server at all.

Anyone have any further suggestions?

Thanks.

Russ_Perry
Level 6
Employee

We haven't had reports of issues like this in support in general or specific to EV9 SP3... I also checked with EV Backline support and they haven't heard any reports either.  We would need to do debugging in order to figure out what is going on.  I'd recommend doing a support case. 

Russ

WonderTuna
Level 3

I have tried to raise a support call however the support portal does not want to recognise the Support ID number that I have from the last renewal PDF (end date 30-JUN-13) . . .

I will break the jobs into much smaller chunks as you advise to see if there is a pattern. In the meantime I am backing up the SQL databases within SQL Server, and the vaults as static files after manually setting and releasing backup mode in EV.

 

WonderTuna
Level 3

Update:-

I have split out the various components that comprise a full EV backup using the BEXEC Agent. The job seems to hang when backing up any of the Vault Stores themselves and their associated Databases. The Directory and Monitoring Databases and the Index Locations all back up fine (as well as the EV Server System State and EV Application Folder).

The Vault Stores / Partitions hang the job with an endless 'Pre-Processing' that can only be cancelled by stopping the EV Agent on the EV Server.

Have now raised a support call with Symantec and am hoping to debug this issue this afternoon.

Richard_Hart
Level 3

I also have the same issue, but when trying to resolve the issue with Symantec it suddenly started working! Yay I said, but then soon after I found that the archiving process had stopped working. After restarting the tasks it started archiving again but the backups then stopped working!

I am going to get my call reopened and I will get back to you if I get any progress...

be-nugget
Level 5

I currently have this same problem, exactly as ReagonP describes it.  Has this been addressed by Symantec, or does anyone have any update from support cases raised?  We find this problem seems to be intermittent; if the job is cancelled and re-run we often find it works.

We are using EV 9.0.3 and Backup Exec 2010 R2. In our case quite a number of partitions have been backed up, and then when the job comes to back up the next partition it does nothing.  Looking at the logs, it has successfully taken the last partition out of backup mode, but you never see it saying it is putting the next one into backup mode.  The only thing I notice is that the next partition it is trying to back up is an open one; obviously this shouldn't make a blind bit of difference, I know.

Prashant_Game_P
Not applicable
Employee Accredited

One possible reason for the issue: during backup of a Vault Store or Partition, Backup Exec requests Enterprise Vault to put that entity into Backup Mode. However, Enterprise Vault could be busy and might not honor the request, in which case Backup Exec would wait on Enterprise Vault to respond. In that case, restarting the Enterprise Vault service may resolve the issue. If not, please follow the steps below, which will help us narrow down the issue:

1) On Enterprise Vault server, launch the Enterprise Vault Administration Console.
2) In the left pane of the Administration Console, expand the Vault Store Group container down to the Vault Store for which you are running the backup job.
3) Check the Backup Mode status of the Vault Store; see the 'Backup Mode' column in the right pane.
4) Right-click the Vault Store that you want to place in backup mode, and click 'Set Backup Mode'.
5) Upon successful execution of step 4, the Backup Mode column in the right pane shows that Backup Mode is set.
6) If Enterprise Vault is able to set Backup Mode, clear Backup Mode again and try running the backup job for the Vault Store or Partition.

Please share the result of above steps.

Thanks,
Prashant.

WonderTuna
Level 3

Symantec technicians have looked at the backup architecture here, decided that there is nothing obviously wrong with our configuration, and taken away Agent, Backup Exec and vxgather logs from all servers involved (SQL, EV and Media Server) to analyse.

I have built an entirely new Backup Exec 2010 R3 SP2 + latest Hotfixes Media Server from scratch on Windows 2008 R2 SP1. Even with a fresh build, the problem still persists, although the job now hangs at the initial 'pre-processing' stage instead of getting to the Vault Stores and hanging.

WonderTuna
Level 3

Have done some testing today. It looks as if I am unable to put the EV Vault Stores into backup mode and take them out of backup mode. This fails whether I perform the operation using the EV Console or the provided PowerShell script. Using the EV Console, the operation hangs indefinitely at the progress screen (see screenshot). Using the PS script, it hangs at the processing phase. Running Dtrace on the EV server, you can see that the Backup Exec Agent never seems to get a response at the stage of trying to put the Vault Stores into Backup Mode. It seems as though the client (in this case the EV Console, the PowerShell Command Processor or the Backup Exec EV Agent) is not receiving the response from the server that the Vault Stores are being placed into or taken out of backup mode, so it hangs indefinitely; in the case of the Backup Exec client, at the Vault Store 'pre-processing' stage.

Richard_Hart
Level 3

Hi, I was unable to reopen my call with Symantec as they found we did not have support for the EV part of Backup Exec!

Anyway I have found a way round it that has been working OK but is a major pain.

Open the EV console and put the Vault Stores into backup mode. For me this hangs, but if you stop the process and reopen the console you will find that they are in backup mode.

Then stop and restart the Storage Service.

Then clear the backup mode on the Vault stores and it should clear OK! Now run the backup.

I have found that the backup hangs during the backup, but I have then done the procedure above and it has carried on OK and completed.

I am having to do this every day, and sometimes twice if the backup hangs, so it is a workaround and nowhere near a solution. Symantec please get this sorted!!!!!

Richard_Hart
Level 3

Still monitoring but we now seem to be backing up OK!

I was having random issues with archiving where I was getting event IDs 3270, 2216 & 2217. I raised a call with Symantec and they had a look at it; they found that the main issue was a known SQL issue from a 9.0.3 upgrade. The event IDs you need to look for are 13360 & 6796, and then you need to apply the fix in the http://www.symantec.com/business/support/index?page=content&pmv=print&impressions=&viewlocale=&id=TE...

 

Since having this applied the backup has run OK for the last 4 nights, and archiving is OK.

WonderTuna
Level 3

I have tried Richard Hart's suggestion posted here above and discovered that it does indeed seem to help the backup to start and complete.

This very much puts the setting of the Vault Store backup mode, and perhaps specifically the Storage Service, at the centre of the issue rather than a problem with Backup Exec.

I have noticed that the backup job progress freezes whenever it either starts (sticks at 'pre-processing' indefinitely), or moves between Vault Store partitions (byte count stops incrementing and job halts indefinitely).

From the Event Log on the EV Server, it appears that on occasion the Backup Mode is either not set, or not cleared (the Backup Mode informational event is not logged as expected). However, if one clears or sets the Backup Mode manually from the EV Console (which also freezes - see above post), and then restarts the Storage Service, the backup job 'springs' back to life and continues with the next partition.

I would say that the evidence strongly points to a bug within the Backup Mode process (not the first that has been identified in past versions of EV), introduced with v9.0.3.
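
For anyone wanting to check the same thing against their own logs, here is a minimal Python sketch of the idea: pair up the "backup mode set" and "backup mode cleared" events per entity and list whatever is left unmatched. The `timestamp|entity|action` line format, the function name and the sample data are all hypothetical; this is not an Enterprise Vault export format, so you would need to shape your own Event Log export to fit.

```python
from typing import Dict, Iterable, List, Tuple


def unmatched_backup_mode_events(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Pair SET/CLEARED backup mode events per entity.

    Each line is assumed (hypothetically) to look like:
        2012-03-01T22:00:05|VaultStore1|SET
    Returns (set_but_never_cleared, cleared_without_set).
    """
    open_sets: Dict[str, str] = {}        # entity -> timestamp of the unmatched SET
    cleared_without_set: List[str] = []

    for raw in lines:
        parts = [part.strip() for part in raw.strip().split("|")]
        if len(parts) != 3:
            continue                      # skip blank or malformed lines
        timestamp, entity, action = parts
        action = action.upper()
        if action == "SET":
            open_sets.setdefault(entity, timestamp)
        elif action == "CLEARED":
            if entity in open_sets:
                del open_sets[entity]     # matched pair, nothing to report
            else:
                cleared_without_set.append(entity)

    return list(open_sets), cleared_without_set


# Made-up sample: VaultStore2 is put into backup mode and never released.
sample_log = [
    "2012-03-01T22:00:05|VaultStore1|SET",
    "2012-03-01T22:40:10|VaultStore1|CLEARED",
    "2012-03-01T22:40:12|VaultStore2|SET",
]
never_cleared, orphan_clears = unmatched_backup_mode_events(sample_log)
print("Set but never cleared:", never_cleared)
print("Cleared without a logged set:", orphan_clears)
```

An entity that shows up in the first list at the time the job froze is a reasonable place to start looking.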

be-nugget
Level 5

In my scenario we have found that this is indeed Enterprise Vault not putting things in backup mode properly which therefore makes the backup job look like it has hung and not a problem with Backup Exec.  I have seen it fail when half way through FSA partitions, and indexes so there is no particular area when it is consistently failing.

I will certainly be taking a look at Richard's post and the potential fix, and will post back with results.

WonderTuna
Level 3

This issue has now been resolved after investigation by Symantec.

The issue is actually a bug in Enterprise Vault, specifically in the usps_WatchFileCount User Stored Procedure for each of the Vault Store Databases on SQL Server. The existing Enterprise Vault v9.0.3 Stored Procedure fails to return the value associated with the number of items awaiting archiving for each database. Failure to return this value results in the Storage Service not reporting that the Backup Mode Status has changed to the client (in this case, Backup Exec Remote Agent, but also the Enterprise Vault MMC Console and the Enterprise Vault PowerShell cmdlet if used).

Symantec have told me that the issue will be resolved with a hotfix shortly. However, if the issue is presently causing a production impact within your business, I would recommend raising it as an Enterprise Vault support call and mentioning the details here which should direct them to consider this bug as a possible cause of the problem.

Russ_Perry
Level 6
Employee

EV Agent note - It appears that with EV 8 and 9 the Backup Exec EV agent has no timeout during the process of setting/clearing backup mode.  With EV 10 the agent changed over to using PowerShell to set/clear backup mode, and there is a registry-configurable timeout for the process, which is 5 minutes by default.

Russ
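
To illustrate why the timeout matters, here is a minimal Python sketch of the general pattern: run the request as a child process and give up after a fixed time instead of waiting forever. This is not how the Backup Exec agent is implemented; `run_with_timeout` and `DEFAULT_TIMEOUT_S` are names made up for the example, the 5 minute figure is simply taken from the note above, and the "hung request" is a stand-in that just sleeps.

```python
import subprocess
import sys
from typing import List

DEFAULT_TIMEOUT_S = 5 * 60  # hypothetical constant mirroring the 5 minute default


def run_with_timeout(cmd: List[str], timeout_s: float = DEFAULT_TIMEOUT_S) -> str:
    """Run cmd and return 'ok', 'failed' or 'timed out'."""
    try:
        completed = subprocess.run(cmd, timeout=timeout_s, capture_output=True)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this
        return "timed out"
    return "ok" if completed.returncode == 0 else "failed"


# Stand-in for a set/clear backup mode request that never gets a reply.
hung_request = [sys.executable, "-c", "import time; time.sleep(30)"]
print(run_with_timeout(hung_request, timeout_s=1))
```

With a timeout in place the caller gets a definite failure it can log and act on, rather than a job stuck at 'pre-processing' until the services are restarted.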