Job Engine Exception

Greg_Huntzinger
Level 4
We have BE V 10 installed on several servers. All of the servers are running Win2000 Server with current service packs and current updates. On one of those machines, the Backup Exec Job Engine service sometimes crashes during backups with a c0000005 exception at address 10432DA7 (copy). This has happened several times.

We have done a reinstall and installed all BE updates, so we already have SR1.

All of our agents are the current BE 10 versions.

I've seen several entries in this forum that seem similar to our problem, but none of them seem to have been resolved.

The only kbase article I've found was 276242, which says to contact Veritas technical services.

How can we get this problem fixed?
43 REPLIES

Greg_Huntzinger
Level 4
I don't know how to put the job engine in debug mode. Please attach instructions or a kbase article reference.

We only get crashes during backup operations, not during verify.

This is an autoloader, so the job is spanning across tapes.

We almost always get alerts during any single job. Normally, at least one file is locked during a backup of the 5 or so systems in a typical job.

Sharvari_Deshmu
Level 6
Hello,

Please see the technote below for how to put the services in debug mode:
http://support.veritas.com/docs/254212


Thanks,

NOTE : If we do not receive your reply within two business days, this post would be marked "assumed answered" and would be moved to "answered questions" pool.

Greg_Huntzinger
Level 4
We followed the instructions in KB 254212 which says to stop the job engine, then add -debug to the start parameters. The other night, the job engine crashed, but we have no log file. I noticed that when I checked the properties for the job engine service there was no -debug property. Are the instructions in the KBase article correct?

The impression I got from reading the KB article is that you could use EITHER the startup param OR the registry value change. Is that correct?

Greg_Huntzinger
Level 4
Out of three machines that had job engine crashes, we only got one log. The logfile ends at the same time as the crash time, but has no crash-specific information. I'm attaching the last chunk of the log file below:

---------------------------------------------------
ataStartBackup: ndmpSendRequest returned: 0x0, 0
07/07/05 06:46:12 TF_NDMPGetResult(): MediaServer thread done, returning TFLE 0
07/07/05 06:46:12 NDMPEngine::MessagePumpAndWaitForResults(): TF_NDMPGetResult() returned 0
07/07/05 06:46:13 data halted: SUCCESSFUL
07/07/05 06:46:13 NDMPEngine: Shutting down.
07/07/05 06:46:14 WriteEndSet( 1 ) returning 0
07/07/05 06:46:16 WriteEndSet( 1 ) returning 0
07/07/05 06:46:16 WriteEndSet( 0 ) returning 0
07/07/05 06:46:16 HARDWARE COMPRESSION ===> Setting compression off.
07/07/05 06:46:16 TF_CloseSet
07/07/05 06:46:16 ndmpConnect : Control Connection information : connection established between IP 172.20.24.25, port 2935 and IP 172.20.24.76, port 10000
07/07/05 06:46:16 NDMP version 3 connection CONNECTED
07/07/05 06:46:16 BESC: Parsing OS version info -
07/07/05 06:46:16 ndmpcSnapshotPrepare2: Warning. No devices to snap. Returning with NDMP_SNAPSHOT_NO_DEVICES2SNAP
07/07/05 06:46:16 Media Server to initiate connection for data transfer
07/07/05 06:46:16 TF_OpenSet( )
07/07/05 06:46:16 Requested Set: ID = ffffffff Seq = -1 Set = -1
07/07/05 06:46:16 Current VCB: ID = 32e5396a Seq = 2 Set = 23
07/07/05 06:46:16 PositionAtSet( :( TF Msg = 6
07/07/05 06:46:16 UI Msg = 8002
07/07/05 06:46:16 HARDWARE COMPRESSION ===> Compression is configurable.
07/07/05 06:46:16 GET_DRV_INF: bsize = 8192
07/07/05 06:46:16 SetupFormatEnv( fmt=0 )
07/07/05 06:46:16 End of TF_OpenSet: Ret_val = 0 Buffs = 2 HiWater = 0
07/07/05 06:46:16 HARDWARE COMPRESSION ===> Setting compression on.
07/07/05 06:46:16 Current Block is = 7d59f
07/07/05 06:46:16 TF_InitMediaServerReverseConnection : Data Connection information : connection established between IP 172.20.24.25, port 2938 and IP 172.20.24.76, port 4754
07/07/05 06:46:16
dataStartBackup: ndmpSendRequest returned: 0x0, 0
07/07/05 06:46:43 TF_NDMPGetResult(): MediaServer thread done, returning TFLE 0
07/07/05 06:46:43 NDMPEngine::MessagePumpAndWaitForResults(): TF_NDMPGetResult() returned 0
07/07/05 06:46:43 data halted: SUCCESSFUL
07/07/05 06:46:43 NDMPEngine: Shutting down.
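
For anyone who needs to post only the end of a long debug log like the one above, here is a minimal Python sketch that keeps just the entries from the last minute or so before the log stops. It assumes every entry starts with an MM/DD/YY HH:MM:SS stamp, as in this excerpt. The name tail_window and the 60-second default are illustrative choices, not anything that comes from Backup Exec.

```python
import re
from datetime import datetime, timedelta

# Matches the "MM/DD/YY HH:MM:SS" prefix seen in the excerpt above.
STAMP = re.compile(r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\s?(.*)$")


def tail_window(lines, seconds=60):
    """Return (timestamp, text) pairs within `seconds` of the last stamped line."""
    entries = []
    for line in lines:
        line = line.rstrip("\n")
        match = STAMP.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%m/%d/%y %H:%M:%S")
            entries.append((stamp, match.group(2)))
        elif entries and line.strip():
            # A line with no stamp of its own is treated as a continuation
            # of the previous entry (the log sometimes breaks after the stamp).
            prev_stamp, prev_text = entries[-1]
            entries[-1] = (prev_stamp, (prev_text + " " + line.strip()).strip())
    if not entries:
        return []
    cutoff = entries[-1][0] - timedelta(seconds=seconds)
    return [(stamp, text) for stamp, text in entries if stamp >= cutoff]
```

To use it on a real file, something like `tail_window(open(path, errors="replace"))` should work, where `path` is wherever the debug log was written on your media server.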

Renuka_-
Level 6
Employee
Hello Greg,

1. Is this the entire debug log that was created?
2. Also, verify whether the services exhibit the same behaviour when the jobs back up to disk folders instead of tape.
3. Have you also split the job if it is too large, as suggested before?
4. Try a repair installation:
http://support.veritas.com/docs/253199

Additional Information :
For information on the recent VERITAS Backup Exec security vulnerabilities, including links to the downloads for the necessary hotfixes, please refer to the following document:
Patch summary for Security Advisories VX05-001, VX05-002, VX05-003, VX05-005, VX05-006, VX05-007

http://seer.support.veritas.com/docs/277429.htm


Greg_Huntzinger
Level 4
The entire log is about 8000 lines long. I don't think that this web-forum thing can handle it. I would be happy to post it somewhere if you like.

We don't back up to file folders.

I don't know what you mean about a job being "too long". What's the difference? What's the limit? Where is this documented? In what version was this restriction added? I have about 50 jobs spread over 3 servers. I don't believe I have ever seen the same job fail twice.

If you had even read this thread you would know that we have already done a repair installation several times on each of the servers. It has made no difference.

I am getting job engine crashes on all three of my servers and I do not feel like I am receiving any reasonable help.

Kelly_Harper
Level 4
Greg, I feel your pain. For us, the issue ended up being the "password protected tape" feature. We have been running the option to password protect all of our tapes in every job we run. Once we disabled that option, the job engine crashes went away.

Greg_Huntzinger
Level 4
Kelly,

Here's the source of my irritation: I write Windows device drivers and services for a living. The rule with all of this stuff is that the driver/service _cannot_ fault.... ever. It is up to the writer to handle every Bad Thing that can happen gracefully. Previous versions of this application were bulletproof, and now it is just junk and none of these folks seem to care.

It really bothers me when someone tries to tell me that the reason that the service is crashing is because "my job is too long", or to run the repair install yet again. What nonsense!

Greg

Kelly_Harper
Level 4
I can't say that I blame you there and agree completely with your comments about services; I've never had another service crash like this.

I've found that these Veritas forums are great for general issues, but not once it gets complicated, like this one. I finally went with phone support after seeing where this thread kept going: different technicians coming on with different solutions, yet always referring back to repair installs or clean installs. I went through it all before an engineer up the Veritas chain suggested I disable the password protection.

I'll keep my eye on this thread and offer any info I run across...maybe you should try a repair install. Sorry, that was low :)

Deepali_Badave
Level 6
Employee
Hello,

Is an antivirus service running during the backup?
If so, stop all third party applications during the backup.

In the application event log, are you getting an event ID of "4097"?


Greg_Huntzinger
Level 4
Yes, there certainly are event number 4097 entries in the log. Those events are Dr Watson events, which occur every time the job engine crashes.

I can try to reschedule the virus scan runs to see what happens.

As to "stopping all third party applications", this IS a server after all. The whole point of a server is to run those applications. If I did not have those applications, I would not have any data that I needed to back up and I would not need BackupExec.

Renuka_-
Level 6
Employee
Hello Greg,

Sorry for the repetition that you have been facing on this thread. It is true that your problem can be resolved faster via voice support.
If you are indeed receiving event ID 4097 in the application log, refer to the following technote, which explains why you require the personal attention of a Symantec engineer.
http://support.veritas.com/docs/276242

Due to the complexity of your issue, resolution will require the personal attention of a Symantec representative.
Please contact us through your local support number. You can see a list of support numbers at our support web site:

http://support.veritas.com/prodlist_path_phone.htm

Please note that you may be charged for this service.

For the latest details on pricing, please visit http://support.veritas.com/srv_portfolio/srvc_pricing_matrix.htm

Thanks,


Amruta_Purandar
Level 6
Hello,

Please apply the above-mentioned solution and update us on the status of this issue.



Richard_Fleming
Level 3
I'm having the exact same problem as you with BEWS 10.0 SP1.

I've been experiencing the exact same thing, but for me I was trying to back up 2TB of data to tape. I have two DLT220, 16 tape libraries.

I leave the console open, everything is up-to-date... I've repaired my database and I still get these frustrating errors. What's worse is the error about the Job Engine will pop up on the screen... everything will work fine until you click the OK button... then everything comes to a grinding halt with a "Communication Stalled" message.

I'm feelin' your pain as well Greg... and I believe the only way to get answers out of these forums is for people like you and I to answer the problems.

Having a service fault is a definite no-no, and the product's quality has dropped... I miss Cheyenne!

I hope you get an answer... as opposed to mine (http://forums.veritas.com/discussions/thread.jspa?forumID=101&threadID=45708&messageID=4356219?)

The sad thing is that even if you don't get an appropriate answer or solution... you still have to say that this topic has been answered.

Rich

Greg_Huntzinger
Level 4
Thanks for the input Richard. I'm just back from a few days off. As soon as I catch up, I'll be making a service call to Veritas/Symantec about the job engine thing. I can already imagine how that is going to go....

Service: "Can't you stop that SQL server service while backups are running?"

Me: "Ugh no, our business sort of depends on it."

Service: "Okay, then let's just reboot the machine and try a test".

Me: "No, see we have these people that are using these systems to do actual work. The machines have to keep working. That's why we go to all of this trouble to back them up."

I'm really looking forward to it.

Greg_Huntzinger
Level 4
By the way, I changed the schedule for the virus scan to not conflict with the backup run and it made no difference. I understand the job engine stopped twice while I was gone.

Patrick_Prather
Not applicable
I'm butting in here, but if you're running the Symantec AntiVirus Corp. Edition client on your VeritasBE server you might consider disabling the Filesystem Realtime Protection.

Other AntiVirus products may have a similar feature.

Richard_Fleming
Level 3
Greg,

Lately I've been having problems with the DLO option with BEWS 10.0 SP1 ... The technician on the phone suggested that I upgrade to the latest build (5520) ... I am now running 5484.

This is supposed to fix a whole slew of bugs, security holes, etc.

I'll paste the link for ya... but unfortunately it's an in-place upgrade. Veritas has a history of painful upgrades but I'm going to try it. I'm sure this will mean you'll have to upgrade all your AOFO clients... but I'll let you know how my upgrade goes...

Link: http://seer.support.veritas.com/docs/277181.htm

Rich

Oh... one other thing. I was able to stop the job engine service from crashing... the stupidest thing too ... a bad tape. Would crash my libraries, stall my jobs... and kill the job engine. I doubt it's your problem... but that's what a bad DLT tape did to me.

Greg_Huntzinger
Level 4
Thanks for your input Richard and Patrick.

I'll consider disabling realtime file protection, however at least one of my servers is a file server.

Interesting about the bad tape problem. Back to my initial rant: bad tapes happen, but the service just shouldn't crash because of it. Tapes are removable media, just like floppy disks. What would you say if a service on your desktop crashed when you inserted a bad floppy or CDROM?

Richard_Fleming
Level 3
Hehe... if that were to happen, I'd check to see if I was running Windows 95 :)

Rich