Job completion issues

Dave_Secker
Not applicable
I've looked through the forums here as well as other newsgroups, but information on Replication Exec seems to be lacking. Hopefully someone can help me straighten out some problems we've been having with RE.
 
We have 3 servers running RE with the latest service pack and hotfix. Two of them are at our main site and the third is at a remote site. The two sites are connected via a hardware-based site-to-site VPN tunnel. Each night the two main-site servers run replication jobs that replicate data to the offsite server. There are 3 jobs that run simultaneously (2 of which are on the same server). The offsite server's own data never changes; it is only in place for disaster recovery.
 
For quite a while this all worked properly, with only the occasional error. Recently, though, the jobs have started failing a good deal of the time. The failures are almost always one of the following errors:
 
First error:
Description:
Job is disconnected on pair.  Canceling the job on this pair.
Context:
Job '<job_name>'on pair '<pair_name>' is disconnected.  Cancelling the job on this pair. : Job '<job_name>' on pair '<pair_name>' is disconnected.  Cancelling the job on this pair. : Connection to the Windows socket is lost.
Note: the repetition isn't a typo. For whatever reason, the context of this error repeats the same message twice (same job name and pair name).
 
Second error:
Description:
Pair did not stop successfully.
Context:
Pair <pair_name> on Job '<job_name>' did not stop successfully. : Pair '<pair_name> failed to stop. : Unable to complete RPC to server '<server_name>'.  The RPC failed with NT error 'The remote procedure call failed.'.  Please check network connectivity and settings.
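
Both errors point at the connection between the sites (a lost socket, a failed RPC), so one way to narrow things down is to check whether the offsite server stays reachable across the VPN tunnel during the nightly replication window. The following is only a minimal sketch of such a probe, not anything taken from Replication Exec itself: the host name and port are placeholders (assumptions) and need to be replaced with the offsite server's address and a TCP port it is known to listen on.

```python
import socket
import time


def probe(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def monitor(host, port, attempts=5, interval=1.0, timeout=3.0):
    """Probe repeatedly; return a list of (unix_timestamp, reachable) tuples."""
    results = []
    for i in range(attempts):
        results.append((time.time(), probe(host, port, timeout)))
        if i < attempts - 1:
            time.sleep(interval)
    return results


if __name__ == "__main__":
    # Hypothetical values: substitute the real offsite server and a port
    # that is known to be open on it.
    for stamp, ok in monitor("offsite-server.example", 445, attempts=3):
        print(time.strftime("%H:%M:%S", time.localtime(stamp)),
              "reachable" if ok else "UNREACHABLE")
```

Running something like this on a schedule overnight and comparing the timestamps of any unreachable results against the job failure times would show whether the failures line up with tunnel drops or happen while the link is up.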
 
Any help or insight would be greatly appreciated!
 
 
Edit:
Additionally, the following error sometimes shows up in the event log when a job fails:
Event Type: Error
Event Source: Replication Exec RSA
Event Category: None
Event ID: 1036
Date:  7/18/2007
Time:  2:45:29 AM
User:  N/A
Computer: <server_name>
Description:
Internal error: Unexpected condition encountered in file \build_e\macTemp\c31370a\mule\rxbase\pi\pie.c line 364

Message Edited by Dave Secker on 07-18-2007 09:31 AM

1 REPLY

MKnoll
Level 3
I get the same thing quite often. It is an issue with the software. I think having SP1HF7 on both the RMS side and the RSA side is supposed to fix it, but I have yet to get the hotfix deployed to all my clients, so I cannot be 100% sure.