
Exchange Mailbox -"Backup Exec Job Engine service terminated unexpectedly"

Colm_Coyle
Level 3
Hi - I have a customer site running BE 10d with the Exchange agent. About one minute after a mailbox-level backup completes, the Backup Exec Job Engine service terminates unexpectedly, and as a result the job occasionally shows as failed. I've reviewed lots of forum postings about this type of error and have exhausted all the known issues and advised fixes, such as...
* Repairing the BE database (many times and reindexing etc.)
* Ensuring the MAPI drivers are consistent on both machines (they are)
* Reinstalling the product from scratch on both servers (completed twice, following the forum instructions for complete removal of software, registry entries, files etc.)
* Upgrading service pack versions (BE 10.d SP1 on main server, update pushed to Exchange server and dll versions cross checked)
* Backing up to disk devices, taking the tape system and drivers out of the equation (no difference - same failure at the same point)

At this point, I am at a loss as to what to try next.

Symptoms are...

Approximately 1 minute after completing mailbox backup and verify, the BEngine process dies.

At this point backup and verify have completed, the tape is closed and restore can be made of any mailbox.

NT Event log on BE Server states Event ID 7031 - The Backup Exec Job Engine service terminated unexpectedly.

No error is reported on the Exchange server.


ADAMM.LOG has a final entry of...

06/11/06 17:06:15 PvlMoverSession::Release()
Job = {34D0B575-9EBE-4272-A8DD-F4A2DC25FCAC} "Exchange Brick Level Data"
Drive = {19E7EAFA-6E55-44C4-B426-C5C9D405A0CB} "COMPAQ 1"
Media = {CD334933-9534-44CF-A3CA-B8E00B78BD76} "DLT000013"
Error: The bengine service has stopped without properly closing the session!

BENGINE00.LOG reports...

06/11/06 16:57:46 Device Error = 1101
06/11/06 16:57:46 Len Req = 65536 Len Got = 0
06/11/06 16:57:49 TF xfer time = 364 seconds.
06/11/06 16:57:49 TF_CloseSet
06/11/06 16:58:25 RewindDrive mover ret = 0 (0x0)
06/11/06 16:58:25 ret_val = 0
06/11/06 16:58:25 DeviceManager: incoming event fired
06/11/06 16:58:25 Updating session {eccfe957-3660-4fd6-91be-2b20e37360ed} with drive COMPAQ 1 {19e7eafa-6e55-44c4-b426-c5c9d405a0cb}
06/11/06 16:58:25 DeviceManager: processing pending requests
06/11/06 16:58:25 DeviceManager: going to sleep for 61000 msecs
06/11/06 16:58:25 BackupJob::MergeBEVSRJobLogsIfNecessary: No VSR log file found, no merging necessary
06/11/06 16:58:26 Job thread terminating
06/11/06 16:59:26 DeviceManager: timeout event fired
06/11/06 16:59:26 DeviceManager: processing pending requests
06/11/06 16:59:26 Tossing drive:
06/11/06 16:59:26 COMPAQ 1 {19e7eafa-6e55-44c4-b426-c5c9d405a0cb}
06/11/06 16:59:26 TF_FreeDriveContext( 1C8A008 )
06/11/06 16:59:26 TF_FreeTapeBuffers: from 10 to 0 buffers
06/11/06 16:59:26 FreeFormatEnv( cur_fmt=0 )

LOG_BESERVER.TXT reports...

06/11/06 16:58:25 18 Doing Notify
06/11/06 16:58:26 -1 ActiveState::doEndEvent( :( CJobManager::DoJobCompletionTasks() returned 0x0
06/11/06 16:58:26 -1 Registered virtual array rowset(6): Client:PHOTONICSVR01, Type:MD_OBJTYPE_JOBHISTORYVIEW
06/11/06 16:58:26 -1 Client 'PHOTONICSVR01' Disconnected:0x1be9ae0
06/11/06 16:58:26 18 Notify Return Code:0, batch:1, index:0
06/11/06 16:58:26 18 Notify Return Code:0, batch:1, index:1
06/11/06 16:58:27 20 Auto Clear Alerts Query
06/11/06 16:58:30 17 CJobManagerBO::Query QUERY_JOBSETUP_MONITOR
06/11/06 16:59:02 -1 SQLLog(102):AgeSession m_threadMap: SessionThreadID:11f0, CurrentThreadID:15e8
06/11/06 16:59:02 -1 SQLLog(103):AgeSession m_threadMap: SessionThreadID:1714, CurrentThreadID:15e8
06/11/06 16:59:27 20 Auto Clear Alerts Query
06/11/06 16:59:40 -1 Client Rundown Disconnected 'PHOTONICSVR01':0x1be0a60
__________________________________

The service actually dies at 16:59:40, so I suspect the failure ties in with the final BEServer entry above: 06/11/06 16:59:40 -1 Client Rundown
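One way to check this correlation is to merge the three logs into a single timeline. The following is a minimal Python sketch, assuming every entry starts with a `DD/MM/YY HH:MM:SS` timestamp as in the excerpts above (the day/month order is an assumption; for these same-day entries either order gives the same sequence). The helper names `parse_log`, `merge_timeline` and `entries_before` are hypothetical. Lines without a timestamp, such as the Job/Drive/Media lines under the ADAMM entry, are folded into the preceding entry.

```python
import re
from datetime import datetime

# Assumption: each entry starts with "DD/MM/YY HH:MM:SS" as in the excerpts above.
# Swap the format to "%m/%d/%y %H:%M:%S" if the logs use US date order.
TS_FORMAT = "%d/%m/%y %H:%M:%S"
TS_RE = re.compile(r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\s+(.*)$")


def parse_log(name, text):
    """Return (timestamp, log name, message) tuples for one log.

    Lines without a leading timestamp are folded into the previous entry.
    """
    entries = []
    for line in text.splitlines():
        match = TS_RE.match(line)
        if match:
            ts = datetime.strptime(match.group(1), TS_FORMAT)
            entries.append([ts, name, match.group(2)])
        elif entries and line.strip():
            entries[-1][2] += " | " + line.strip()
    return [tuple(entry) for entry in entries]


def merge_timeline(logs):
    """Merge {log name: log text} into one list ordered by timestamp.

    sorted() is stable, so entries stamped in the same second keep the
    order in which they were read.
    """
    merged = []
    for name, text in logs.items():
        merged.extend(parse_log(name, text))
    return sorted(merged, key=lambda entry: entry[0])


def entries_before(timeline, cutoff, window_s=90):
    """Entries stamped within window_s seconds up to and including cutoff."""
    return [e for e in timeline
            if 0 <= (cutoff - e[0]).total_seconds() <= window_s]


if __name__ == "__main__":
    bengine = (
        "06/11/06 16:58:25 DeviceManager: going to sleep for 61000 msecs\n"
        "06/11/06 16:59:26 DeviceManager: timeout event fired\n"
    )
    beserver = (
        "06/11/06 16:59:27 20 Auto Clear Alerts Query\n"
        "06/11/06 16:59:40 -1 Client Rundown Disconnected 'PHOTONICSVR01':0x1be0a60\n"
    )
    timeline = merge_timeline({"BENGINE": bengine, "BESERVER": beserver})
    death = timeline[-1][0]  # the Client Rundown entry at 16:59:40
    for ts, name, msg in entries_before(timeline, death, window_s=90):
        print(ts.strftime("%H:%M:%S"), name, msg)
```

Run against the BENGINE and BESERVER excerpts, the merged timeline shows the DeviceManager timeout firing at 16:59:26, 61 seconds after the 16:58:25 "going to sleep for 61000 msecs" entry, and 14 seconds before the Client Rundown entry at 16:59:40.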


I'd really appreciate it if anyone can shed any light on this...

Thanks, Colm

padmaja_rajopad
Level 6
Hi,

Are you backing up system partitions as well as Shadow Copy Components with exchange data?

If yes, try backing up just the Exchange data in one job.

If AOFO is enabled, disable it.

Which version of Exchange are you using?

The size of Exchange backup?

Does the information store get backed up properly?

Is Exchange installed on the remote server?

If yes, was the remote agent on the Exchange server also started in debug mode?

If yes, please paste the contents of the debug log from the Exchange server as well.


If yes, how many NICs on the exchange server and how many on the media server?

Is NIC Teaming being used?






NOTE : If we do not receive your reply within two business days, this post would be marked assumed answered and would be moved to answered questions pool.

Colm_Coyle
Level 3
Hi

Thanks for the reply. Answers below...

Are you backing up system partitions as well as Shadow Copy Components with exchange data? - No - for the backup sequence which is failing, we are carrying out an Exchange mailbox backup (checkbox ticked on 'Microsoft Exchange Mailboxes'). The normal Exchange Information Store, Public Folders and System State are in a separate backup job which consistently runs without any problems.

If yes, try backing up just the exchange data in one job? As above - already working at this level...

If AOFO, is enabled, disable it... Not in use

Which version of Exchange are you using? Exchange binaries are Version 6.0 (Build 6249.4 SP4)

The size of Exchange backup? Full Exchange backup is 5.1 GB, mailbox backup about 4.9 GB

Does the information store get backed up properly? Yes

Is Exchange installed on the remote server.? Yes

If yes, was the remote agent on the Exchange server also started in debug mode? No - can you advise the process to do this, as the BEUtility application is not available at agent level?

If yes, do paste the contents of the debug log on the exchange server as well?
If you get advice back today on how to enable logging at agent level, I'll turn on for tonight and post results tomorrow

If yes, how many NICs on the exchange server and how many on the media server?
Is NIC Teaming being used?
On both servers - 2 NICs, the first an HP NC7760 onboard, the second an HP NC7771 PCI; both are teamed.


Hope this helps provide a clearer view... thanks, Colm

Rucha_Abhyankar
Level 6
Hi Colm,

Under what account are the services running?


Have you tried creating a new account and giving it the domain admin and the local admin rights and then starting the services?


==================



Colm_Coyle
Level 3
Update: The service is now failing just after backup, before the verify cycle starts.

My customer is getting very concerned as to the quality of Backup Exec, and is strongly hinting at wanting it replaced!

Any chance of some more feedback and advice from the Veritas staff on the forum?

Thanks, Colm

Colm_Coyle
Level 3
Services are running under administrator account in all cases - so there should be no permission problems.

Can we possibly get to the relevance of the earlier question about network teaming? Is this known to cause any issues - if so, what are they, why do they occur and how should they be diagnosed?

Thanks, Colm

Ashutosh_Tamhan
Level 6
Is the remote agent service also running under the administrator account? The remote agent service itself should be running under the LSA (Local System Account).
Regards, Ashutosh

Colm_Coyle
Level 3
Ashutosh - the Remote Agent Service is running under LSA. Is this correct or should we change to run under Administrator? Thanks, Colm

priya_khire
Level 6
Hello Colm,

The remote agent service should be running under the Local System Account, so you need not change it. You mentioned earlier that the Exchange binaries are Version 6.0, but what is the exact version of Exchange? Verify that it is listed as compatible in the SCL (Software Compatibility List) for 10d at the link below:
http://support.veritas.com/docs/278254

You also mentioned that SP1 for 10d is installed. Please check again that it is installed and that the remote agent was reinstalled on the remote machines.

Also follow the steps below:

- Enable Restrict Anonymous support on the media server as well as the remote servers, as per the technote below:

http://support.veritas.com/docs/274272

- Recreate the mailbox backup job.
- Stop any antivirus services during the job and test the results.

Reply with details of the steps followed if the issue persists.


Regards.

Colm_Coyle
Level 3
Priya

1. Exchange version is Exchange 2000, SP3

2. SP1 for 10d was installed just after a clean install and the remote agent pushed from there to the exchange server. Binary file versions were checked on the remote agent deployment - verified as SP1.

3. As requested, I have checked the 'Enable Restrict Anonymous Support' flag - this was turned off and is now on.

4. I will rerun the backup cycle to see if the setting in 3 makes any difference, and in the meantime check with the customer to see if they are happy for us to disable AV on both servers for another test run.

Thanks, Colm

Colm_Coyle
Level 3
Update:

I have rerun the job with the 'Enable Restrict Anonymous Support' flag turned on.

No change in behaviour - the backup and verify cycle completed as normal, then the Job engine service failed around 45 seconds later.

Can you advise further on what you want disabled from an Anti-Virus perspective?

The Backup Exec server runs Sophos AV; the Exchange server runs Sophos AV with the Exchange Anti Virus / Anti Spam add-in.

We cannot disable AV support for the mail server without also shutting down external email connectivity, as the risks would be too high. This would mean the business losing external email connectivity for 3 hours or so, which is high impact for them...

So - I need to know exactly what AV functionality you would like disabled?

Also, any chance of a response to my earlier query regarding network teaming?

Is this known to cause a problem - if so, what are the symptoms, why do they occur and what is the procedure to diagnose if this is the root cause?

Thanks, Colm

Colm_Coyle
Level 3
Update:

I've installed a copy of Backup Exec 10d SP1 locally on the Exchange server using a USB tape drive, and ran an Exchange mailbox backup there. The problem reappeared - about a minute after the verify cycle had completed, the Job Engine service fell over.

So - that takes Network Card teaming out of the equation.

Anyone got any ideas at this point?

Thanks, Colm

Colm_Coyle
Level 3
OK - it's been almost a full week since I've had any input from any Symantec staff on this issue.

From what I can see in the other forum threads, the Job Engine service seems to spend most of its life falling over and messing up backup routines. It hardly qualifies as a stable piece of code, in my opinion.

I've talked to Veritas support with a view to raising a support call, but I cannot accept having to pay you for a call even if the problem is subsequently found to be within your code or due to a lack of information on the product's limitations (compare with Microsoft - a different story!!)

Is there any chance of some real help from Symantec on solving this problem, or do I have to start advising all my customers that Backup Exec and other Symantec products are not of a satisfactory quality to support and protect their business IT infrastructure?

The 'ball is in your court'

Regards, Colm

I've had to invest almost 30 hours of support time to date on this single issue.

priya_khire
Level 6
Hello Colm,

The anti virus step was suggested as at times BE and the AV try to access the same files at the same time which can cause a deadlock. Do you also get any errors in the event logs apart from event id 7031? Ensure that you have also installed the latest OS patches.


Regards.

Colm_Coyle
Level 3
The scenario is now quite simple.

I have BE 10.d, sp1 installed directly on the Exchange Server.

When I use BE to carry out a "mailbox" backup, it completes the backup, completes the verify and about 1 minute later the Job Engine service fails, marking the whole backup as failed.

If Anti Virus is off - same problem

Backup to tape drive or disk device - same problem

Windows OS is fully patched and up to date

No other errors in the event log

I've reset Windows Service Manager timeout from 30 seconds to 2 minutes - same problem

Local MAPI drivers are up to date

What do I need to do to determine why the Job Engine service consistently fails in this way?

Regards, Colm

Howard_Brown
Level 4
Anything in the Windows event logs just preceding this?

You might want to try selectively backing up mailboxes, eliminating any system mailboxes (System Attendant, anything starting with SMTP, etc...), and selectively backing up actual mail folders only - not Spam or Deleted Items...

It may not help, though. Something is causing the engine to hang, very odd. Maybe try Dr. Watson?

Shilpa_pawar_2
Level 6
Hi,

Check whether drwtsn32.log is created on the system. If it is, scroll to the bottom of the log and search upward through it for the word "FAULT" (make sure "FAULT" is in upper case and the Match case checkbox is selected).

Directly above the FAULT will be the function that Backup Exec is in conflict with
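The manual search above can also be scripted. This is a minimal Python sketch, assuming the Dr. Watson log layout that appears later in this thread: a `function:` line, a disassembly with a `FAULT ->` marker, then a `Stack Back Trace` table whose frame lines hold six 8-digit hex columns followed by an optional function name. The function name `last_fault` is hypothetical. It searches from the bottom, as the steps describe, returns the nearest `function:` name above the last FAULT, and also collects the named frames from that thread's stack back trace.

```python
import re

# A stack frame line: six 8-digit hex columns, optionally followed by a function name.
FRAME_RE = re.compile(r"^(?:[0-9A-Fa-f]{8}\s+){5}[0-9A-Fa-f]{8}(?:\s+(.*))?$")


def last_fault(log_text):
    """Locate the last case-sensitive "FAULT" in a Dr. Watson log.

    Returns (function name above the fault, faulting line, named stack frames),
    or None if the log contains no FAULT marker.
    """
    lines = log_text.splitlines()
    # Search upward from the bottom of the log, mirroring the manual procedure.
    fault_idx = next(
        (i for i in range(len(lines) - 1, -1, -1) if "FAULT" in lines[i]), None
    )
    if fault_idx is None:
        return None

    # The nearest "function:" line above the FAULT names the faulting function.
    func = None
    for line in reversed(lines[:fault_idx]):
        if line.strip().startswith("function:"):
            func = line.split(":", 1)[1].strip()
            break

    # Collect named frames from the faulting thread's stack back trace,
    # stopping at the next thread's state dump.
    frames = []
    in_trace = False
    for line in lines[fault_idx + 1:]:
        if "Stack Back Trace" in line:
            in_trace = True
        elif line.startswith("State Dump"):
            break
        elif in_trace:
            match = FRAME_RE.match(line.strip())
            if match and match.group(1):
                frames.append(match.group(1).strip())
    return func, lines[fault_idx].strip(), frames
```

To use it, pass the log contents, for example `last_fault(open("drwtsn32.log", encoding="latin-1").read())`; the file encoding is an assumption. Frames with no function name are skipped, so only the named calls leading to the fault are listed.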

Try to back up just one or two mailboxes and check if the job engine terminates.

Also enable the following key on the Exchange server: HKey_Local_Machine > Software > VERITAS > Backup Exec > Engine > Exchange > Bypass MBox properties, and change its value to 1.

Configure the antivirus software to exclude the following Exchange locations:
http://support.veritas.com/docs/267437

Check the System log for any service termination errors and paste them here.


Colm_Coyle
Level 3
Shilpa

Thanks for the feedback.

1. I've made the registry change and will test

2. I've tested before with AV and Anti Spam completely turned off, but it made no difference.

3. I've tried backing up different numbers of mailboxes, sizes of mailboxes etc. in the past. Generally speaking - the less data backed up, the lower the likelihood of the job engine service failing - but I've had it fail on these smaller backups as well.

4. I've looked at drwtsn32.log and have extracted the following section. Could you have a look and see if it gives you any indication of what is happening?

Thanks, Colm

_________________________________________________

*----> Stack Back Trace <----*

FramePtr ReturnAd Param#1 Param#2 Param#3 Param#4 Function Name
0158FEE8 7C57B3DB 00000104 0000EA60 00000000 004399CE ntdll!ZwWaitForSingleObject
02F0B250 74C08502 24548B0F 244C8D04 FF525108 08C483D0 kernel32!WaitForSingleObject
F13820A1 00000000 00000000 00000000 00000000 00000000

State Dump for Thread Id 0xd68

eax=0171fe08 ebx=00000002 ecx=0171fe08 edx=00000000 esi=77f88ef8 edi=00000002
eip=77f88f03 esp=0171fdcc ebp=0171fe18 iopl=0 nv up ei pl zr na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246


function: NtWaitForMultipleObjects
77f88ef8 b8e9000000 mov eax,0xe9
77f88efd 8d542404 lea edx, ss:01f39cb3=????????
77f88f01 cd2e int 2e
77f88f03 c21400 ret 0x14
77f88f06 8bff mov edi,edi

*----> Stack Back Trace <----*

FramePtr ReturnAd Param#1 Param#2 Param#3 Param#4 Function Name
0171FE18 7C59A10E 0171FDF0 00000001 00000000 00000000 ntdll!NtWaitForMultipleObjects
00000158 00000000 00000000 00000000 00000000 00000000 kernel32!WaitForMultipleObjects

State Dump for Thread Id 0x100c

eax=00000000 ebx=00000002 ecx=00000000 edx=00000000 esi=77f88ef8 edi=00000002
eip=77f88f03 esp=0181fef4 ebp=0181ff40 iopl=0 nv up ei pl zr na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246


function: NtWaitForMultipleObjects
77f88ef8 b8e9000000 mov eax,0xe9
77f88efd 8d542404 lea edx, ss:02039ddb=????????
77f88f01 cd2e int 2e
77f88f03 c21400 ret 0x14
77f88f06 8bff mov edi,edi

*----> Stack Back Trace <----*

FramePtr ReturnAd Param#1 Param#2 Param#3 Param#4 Function Name
0181FF40 7C59A10E 0181FF18 00000001 00000000 00000000 ntdll!NtWaitForMultipleObjects
0181FFB4 7C57B388 00E58160 00000007 00000003 00E58160 kernel32!WaitForMultipleObjects
0181FFEC 00000000 00000000 00000000 00000000 00000000 kernel32!lstrcmpiW

State Dump for Thread Id 0x1304

eax=00300035 ebx=00e50000 ecx=00320041 edx=01ba1f40 esi=01ba2348 edi=01ba1f40
eip=77fcd79a esp=0191fd9c ebp=0191fda8 iopl=0 nv up ei ng nz na pe cy
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000283


function: RtlZeroHeap
77fcd77c e88952feff call RtlpNtEnumerateSubKey+0x5aa8 (77fb2a0a)
77fcd781 0fb706 movzx eax,word ptr ds:01ba2348=0081
77fcd784 294328 sub ,eax ds:01669ee6=????????
77fcd787 80651400 and byte ptr ,0x0 ss:02139c8e=??
77fcd78b 57 push edi
77fcd78c 53 push ebx
77fcd78d e85dfcfcff call RtlIsValidIndexHandle+0x182f (77f9d3ef)
77fcd792 8b4f0c mov ecx, ds:023bbe26=0115002e
77fcd795 8b4708 mov eax, ds:023bbe26=0115002e
77fcd798 3bc1 cmp eax,ecx
FAULT ->77fcd79a 8901 mov ,eax ds:00320041=c033fffd
77fcd79c 894804 mov ,ecx ds:00b19f1b=????????
77fcd79f 7522 jnz 77fd3dc3
77fcd7a1 668b07 mov ax, ds:01ba1f40=0035
77fcd7a4 663d8000 cmp ax,0x80
77fcd7a8 7319 jnb RtlZeroHeap+0x154d (77fce6c3)
77fcd7aa 0fb7c8 movzx ecx,ax
77fcd7ad 6a01 push 0x1
77fcd7af 8bc1 mov eax,ecx
77fcd7b1 83e107 and ecx,0x7
77fcd7b4 5a pop edx
77fcd7b5 c1e803 shr eax,0x3

*----> Stack Back Trace <----*

FramePtr ReturnAd Param#1 Param#2 Param#3 Param#4 Function Name
0191FDA8 77FCB80C 00E50000 01BA2348 0191FE20 00000000 ntdll!RtlZeroHeap
0191FE54 0023218A 00E50000 00000000 01BA2350 01B992FC ntdll!RtlFreeHeap
0191FE9C 004E28FA 01BA2350 01B99300 00E57D38 00000000 !free
01B99300 00010000 00000000 00000000 00010000 00000002 !
01BA0468 00000000 0000FF92 00000000 00000000 0001EB80

State Dump for Thread Id 0x136c

eax=01defe9c ebx=00000001 ecx=01deff24 edx=00000000 esi=77f88ef8 edi=00000001
eip=77f88f03 esp=01defe40 ebp=01defe8c iopl=0 nv up ei pl zr na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246


function: NtWaitForMultipleObjects
77f88ef8 b8e9000000 mov eax,0xe9
77f88efd 8d542404 lea edx, ss:02609d27=????????
77f88f01 cd2e int 2e
77f88f03 c21400 ret 0x14
77f88f06 8bff mov edi,edi

*----> Stack Back Trace <----*

FramePtr ReturnAd Param#1 Param#2 Param#3 Param#4 Function Name
01DEFE8C 7C59A10E 01DEFE64 00000001 00000000 01DEFE84 ntdll!NtWaitForMultipleObjects
01DEFF60 002A5B70 00157118 01B57D50 01B57C10 01DEFFA4 kernel32!WaitForMultipleObjects
01DEFFB4 7C57B388 01B57CB8 00157118 000001B6 01B57CB8 !WorkerThreadHF::SetThreadName
01DEFFEC 00000000 00000000 00000000 00000000 00000000 kernel32!lstrcmpiW

State Dump for Thread Id 0xa50

eax=00000102 ebx=00000001 ecx=01000101 edx=00000000 esi=77f88ef8 edi=00000001
eip=77f88f03 esp=01eefe40 ebp=01eefe8c iopl=0 nv up ei pl zr na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246


function: NtWaitForMultipleObjects
77f88ef8 b8e9000000 mov eax,0xe9
77f88efd 8d542404 lea edx, ss:02709d27=????????
77f88f01 cd2e int 2e
77f88f03 c21400 ret 0x14
77f88f06 8bff mov edi,edi

_________________________________________________

Deepali_Badave
Level 6
Employee
Hello Colm,

Have you enabled the following key on exchange server: HKey_Local_Machine > Software > VERITAS > Backup Exec > Engine > Exchange > Bypass MBox properties | change it to 1?

