Backup Exec Job Engine high CPU usage
Our environment
- Windows Server 2012 fully patched
- Backup Exec 15 FP4
I just noticed that the Backup Exec Job Engine (bengine.exe) periodically runs at 100% CPU. Here is a summary of my research so far:
- High CPU episodes happen exactly every 5 minutes
- Each episode lasts approximately 1 minute
- During this minute there is only disk read activity but no disk write activity
- Procmon (from Sysinternals) shows that the disk read activity consists entirely of reading Backup Exec catalog files
- Most catalog files are read multiple times, some of them 100 times or more
- The phenomenon seems to have started after I installed FP4 (not 100% certain here)
- Backups run successfully every day
- Backup Exec is not logging any errors or warnings that indicate a problem of any kind, either during backups or while idle
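In case anyone wants to check the read counts on their own system, here is a minimal sketch of how a Procmon trace saved as CSV could be tallied per file. It assumes the export uses Procmon's default column names ("Process Name", "Operation", "Path"); the function name `count_operations` and the sample paths are hypothetical and only there for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical sample in the shape of a Procmon CSV export (default columns assumed).
# The paths are made up for illustration; real catalog paths will differ.
SAMPLE = r'''"Time of Day","Process Name","PID","Operation","Path","Result","Detail"
"10:00:00.0000001","bengine.exe","1234","CreateFile","D:\Catalogs\set1.fh","SUCCESS",""
"10:00:00.0000002","bengine.exe","1234","ReadFile","D:\Catalogs\set1.fh","SUCCESS",""
"10:00:00.0000003","bengine.exe","1234","ReadFile","D:\Catalogs\set1.fh","SUCCESS",""
"10:00:00.0000004","bengine.exe","1234","ReadFile","D:\Catalogs\set2.fh","SUCCESS",""
"10:00:00.0000005","other.exe","4321","ReadFile","D:\Catalogs\set1.fh","SUCCESS",""
"10:00:00.0000006","bengine.exe","1234","ReadFile","D:\Catalogs\set1.fh","SUCCESS",""
'''


def count_operations(csv_text, process_name="bengine.exe", operation="ReadFile"):
    """Count events of one operation type per path for one process."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Skip events from other processes (case-insensitive name match).
        if row["Process Name"].lower() != process_name.lower():
            continue
        # Skip events that are not the operation we are counting.
        if row["Operation"] != operation:
            continue
        counts[row["Path"]] += 1
    return counts


if __name__ == "__main__":
    # Print paths from most to least frequently read.
    for path, n in count_operations(SAMPLE).most_common():
        print(n, path)
```

Note that this counts individual ReadFile events, so a large file read in several chunks shows up more than once per pass. Passing `operation="CreateFile"` counts open calls instead, which may be closer to "how often is the file touched". When reading a real export from disk, opening it with `encoding="utf-8-sig"` should take care of a byte order mark if one is present.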
I have a number of questions:
- Is anybody else experiencing the same phenomenon?
- If so, did it start with FP4?
- What is the Job Engine doing every 5 minutes? (If it is determining which backup sets have expired, the interval seems a bit aggressive, to put it mildly.)
- Reading a catalog file more than once does not seem right. Am I correct here?
- Does it make sense to rebuild the catalogs from the backup media?
- If so, what is the correct procedure to do so for Backup Exec 15?
Thanks in advance,
-Roger
The case has been resolved after slightly over 2 months of working with Veritas Support. This involved many e-mails, half a dozen telephone calls (I explicitly preferred e-mail) as well as making several debug traces and process dumps.
The case started to make really meaningful progress when it was forwarded to Backline Support (after 4 weeks). Several weeks of in-depth investigation and close cooperation later, we managed to crack the case: the culprit was a bug in an SQL stored procedure. Once Engineering produced a bug fix, my Job Engine stopped experiencing high CPU episodes. Trumpets!
A few closing remarks:
- This bug applies to all BEX installations that use B2D storage. However, it will only cause noticeable performance issues if you have many media files as a result of choosing a small maximum media file size (in my case I had 2000+ 1 GB media files in my B2D storage; I had good reasons to choose that 1 GB file size, but it did backfire).
- I expect Veritas to include the fix in a future release (it is not nearly serious enough for a hotfix).
- Working with Backline Support was rewarding. I was fortunate to work with a very competent engineer (thank you, M.!) who in turn worked closely with Engineering.
- I wish I could say the same about Frontline Support. The primary KPI there seems to be 'mean time to case closure', which resulted in several quite unhelpful suggestions. Things improved after I managed to get them out of emergency mode.
- Veritas Support seems to have improved quite a bit since the split from Symantec (my past experience was anything but positive: all cases were closed without resolution). I think this is good news for customers.
Regards,
-Roger