05-25-2016 08:06 AM
Our environment
I just noticed that the Backup Exec Job Engine (bengine.exe) periodically runs at 100% CPU. Here is a summary of my research so far:
I have a number of questions:
Thanks in advance,
-Roger
08-02-2016 05:13 AM
The case has been resolved after slightly over 2 months of working with Veritas Support. This involved many e-mails, half a dozen telephone calls (I explicitly preferred e-mail) as well as making several debug traces and process dumps.
The case started to make really meaningful progress when it was forwarded to Backline Support (after 4 weeks). Several weeks of in-depth investigation and tight cooperation later, we managed to crack the case: the culprit was a bug in an SQL stored procedure. Once Engineering produced a bug fix, my Job Engine stopped experiencing high CPU episodes. Trumpets!
A few closing remarks:
Regards,
-Roger
05-25-2016 06:50 PM
Stop all BE services and run catrebuildindex.exe -R from the BE install path (at a command prompt). When it completes, start all BE services and see whether the issue repeats. Also, let me know whether you recently renamed the BE server and started seeing this afterwards. In the catalog folder, do you see any files other than the .xml and .fh files? I have not yet seen a tech support case for this issue since FP4.
05-25-2016 11:44 PM
Exclude the Catalogs folder from any AV scan and see if this sorts out the issue.
Thanks!
05-26-2016 01:22 AM
First of all thank you for the responses!
> Stop all BE services and run catrebuildindex.exe -R from BE Install Path (command prompt). When it completes, start all BE services. See if this Issue repeats.
Done: the issue persists. AFAIK catrebuildindex -R rebuilds the catalog in the database from the catalog files. If the issue is caused by something in the catalog files themselves, I'd expect the issue to persist.
> Also let me know if you renamed the BE server recently by any chance post which you have started seeing this ?
The server has never been renamed.
> In the catalog folder do you see any other files apart from the xml and fh files.
The 'Catalogs\<ServerName>' folder contains
- 169 .fh files
- 169 .xml files
- 18 .sdr files
- 2 folders ('CatalogProcessTemporaryFolder' and 'catstore', both empty)
IIRC the Procmon trace showed that only the .xml files were being read during a high CPU episode.
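A tally like the one above can be produced with a few lines of Python. This is only a minimal sketch: the function name `tally_catalog_folder` is made up for illustration, and the folder path is whatever your 'Catalogs\<ServerName>' folder happens to be.

```python
from collections import Counter
from pathlib import Path


def tally_catalog_folder(folder):
    """Count files by lower-cased extension and list the subfolders.

    Hypothetical helper, not part of Backup Exec.
    """
    counts = Counter()
    subfolders = []
    for entry in Path(folder).iterdir():
        if entry.is_dir():
            subfolders.append(entry.name)
        else:
            # Files without an extension are counted under the empty string.
            counts[entry.suffix.lower()] += 1
    return counts, sorted(subfolders)
```

Calling `tally_catalog_folder(path)` returns a `Counter` keyed by extension (for example `.fh`, `.xml`, `.sdr`) plus the sorted subfolder names, which makes it easy to spot unexpected file types.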
> I have not yet noticed a case in tech support post FP4 for above Issue.
OK.
> Exclude the Catalogs folder from any AV scan and see if this sorts out the issue.
It was already excluded.
Regards,
-Roger
05-27-2016 01:32 AM
Since catrebuildindex did not resolve the issue, I decided to recatalog. This failed:
I then decided to rebuild the catalogs from scratch, as outlined in https://www.veritas.com/support/en_US/article.TECH45473
BE quickly created the initial catalog and then started to read from media. After 16 hours this failed:
Interestingly, the media set that failed was from one of the last backups (written by BE 15 FP4).
The Windows Application Event Log has a lot of these entries:
The final observation is that both (failed) catalog jobs report a whopping byte count of 269 TB. The total size on the backup media is just under 2 TB...
And, by the way, the Backup Exec Job Engine still jumps to 100% CPU every 5 minutes. I am inclined to think that FP4 is seriously broken.
Regards,
-Roger
05-29-2016 05:47 PM
Open a case with tech support to get this looked at.
05-29-2016 11:51 PM
I just did :) The case number is 21402123.
Over the weekend I manually verified all backup sets because I did not like the "format inconsistency" error. I found one faulty backup set. After I expired the faulty backup set, all backup sets would verify OK. Interestingly the faulty backup set was not the one where I initially observed the "format inconsistency" error. Apparently the place where the error appeared in the job log was misleading.
Finally I inventoried and cataloged my backup sets again and now this went smoothly. Unfortunately the original issue persisted.
Regards,
-Roger
05-31-2016 12:35 AM
I just found out something really interesting:
I suspect that BE is doing some kind of maintenance every 5 minutes. Reading each catalog once makes sense; reading it multiple times does not. I suspect a bug here. Also, doing maintenance every 5 minutes seems a bit over the top.
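For anyone who wants to check how often each catalog file is touched during one episode, here is a rough sketch that counts operations per file in a Procmon trace saved as CSV. It assumes the export has the column headers "Process Name", "Operation" and "Path"; adjust them if your export differs. The function name `opens_per_file` is made up, and counting "CreateFile" is only a proxy for "the file was opened once more", not an exact measure of reads.

```python
import csv
from collections import Counter


def opens_per_file(csv_lines, process="bengine.exe",
                   operation="CreateFile", suffix=".xml"):
    """Count matching operations per path in a Procmon CSV export.

    Hypothetical helper; the column names are assumptions about the export.
    csv_lines is any iterable of text lines, e.g. an open file object.
    """
    counts = Counter()
    for row in csv.DictReader(csv_lines):
        if row.get("Process Name", "").lower() != process.lower():
            continue
        if row.get("Operation") != operation:
            continue
        path = row.get("Path", "")
        if path.lower().endswith(suffix):
            counts[path] += 1
    return counts
```

With a saved trace, something like `opens_per_file(open("trace.csv", newline="", encoding="utf-8-sig"))` followed by `.most_common(10)` lists the files opened most often. Any count well above 1 within a single 5-minute cycle would support the suspicion that catalogs are being read repeatedly.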
Regards,
-Roger