Source master: Red Hat Linux, version 7.7.3
Target master: same OS and version
Not all replication jobs are failing; some are failing with EC 84 (media write error) in the job details.
Thanks for the status, but many different scenarios and specific circumstances can end in status 84, which is itself a fairly generic final status.
Any actual job detail that you can show?
Any initial analysis which you can offer?
When you say some replications are failing, at first glance I suspect you are running out of space, or are low on space, at the target site.
If this is not the case, then we need to investigate it with the logs.
On the same note, you can run a verify against the images which are failing. If the verify fails and the image is bad, then there is no point in duplicating or replicating it. So run the command "bpverify -backupid <backupid>" and see the results.
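If there are several failing images, the verify step can be scripted. This is only a rough sketch, not an official tool: it assumes the default admincmd path on a UNIX/Linux master server, and the function names are made up for illustration. It records whatever exit code the command returns; the actual verify outcome should still be confirmed in Activity Monitor.

```python
import subprocess

# Assumption: default install path on a UNIX/Linux master server.
BPVERIFY = "/usr/openv/netbackup/bin/admincmd/bpverify"


def build_verify_command(backup_id, bpverify=BPVERIFY):
    """Return the argument list that verifies a single image."""
    return [bpverify, "-backupid", backup_id]


def verify_images(backup_ids, runner=subprocess.run):
    """Run bpverify once per backup ID and map each ID to its exit code.

    runner is injectable so the logic can be exercised without NetBackup.
    """
    results = {}
    for backup_id in backup_ids:
        completed = runner(build_verify_command(backup_id))
        results[backup_id] = completed.returncode
    return results
```

Call verify_images() with the list of backup IDs taken from the failing replication jobs, then look at the matching verify jobs before retrying the replication.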
We would like to see the bptm/bpdm logs with pdplugin logging enabled, the spoold log from the source MSDP, and of course the job status from Activity Monitor.
We see that the replication fails after many hours with this error:
Error bpdm (pid=28875) wait failed: error 174
Difficult to read Job Details with important stuff blanked out. Can you not replace names with dummy names? e.g. server1, server2, etc, diskpool1, diskpool2, etc.
PID 28875 seems to be running on the local media server.
Check bpdm log on local media server - you need to follow PID 28875 in the log.
Also check bpdm and bptm logs on remote media server.
Is this only happening for long-running replications?
How big are these images, and what is the bandwidth between the sites?
Is this a first-time replication for the backup, meaning that remote server is not yet seeded with full copy?
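For following a PID such as 28875 through a bpdm or bptm log, a small filter saves a lot of scrolling. A minimal sketch, assuming the legacy log layout where each line carries the process ID in square brackets after the timestamp; the function names are hypothetical.

```python
import re


def lines_for_pid(lines, pid):
    """Yield the lines whose bracketed PID field equals pid."""
    # Matching the closing bracket keeps [288750] from matching 28875.
    marker = re.compile(r"\[%d\]" % int(pid))
    for line in lines:
        if marker.search(line):
            yield line.rstrip("\n")


def follow_pid(log_path, pid):
    """Return every line for pid from one log file."""
    # errors="replace" avoids a crash on stray non-UTF-8 bytes in the log.
    with open(log_path, errors="replace") as handle:
        return list(lines_for_pid(handle, pid))
```

Point follow_pid() at the bpdm log file for the day of the failure (on UNIX/Linux the legacy logs normally live under /usr/openv/netbackup/logs/bpdm, assuming that directory was created) and read the result from the bottom up to see what happened just before the "wait failed" message.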
Please see the attached job details.
Yes, this is happening for long-running replications.
The images are not that big. The WAN link is 35mb.
Yes, this is the first time we have set up replication. Before that, duplication was to tape.
I have created a new SU and DP with new names for the replication activity, because I was not allowed to change the old SU to tick the replication source check box. So my question is: will it replicate only the new data that is being backed up to the new SU, or will it replicate the old data as well?
If it is replicating only the new data, why is it taking so much time? Some of the jobs have been running for the last 75 hours.
If only new data will be replicated, how can I replicate the old data that is on the old SU?
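To put the 75 hours and the 35mb link into perspective, some back-of-the-envelope arithmetic helps. This sketch assumes "35mb" means 35 megabits per second and that the link is fully available to replication; both are assumptions, and the efficiency parameter is a hypothetical knob for protocol overhead or shared use.

```python
def transfer_hours(size_gb, link_mbps, efficiency=1.0):
    """Hours needed to push size_gb (decimal GB) over a link_mbps link."""
    size_bits = size_gb * 1e9 * 8
    seconds = size_bits / (link_mbps * 1e6 * efficiency)
    return seconds / 3600


def max_gb_in_hours(hours, link_mbps, efficiency=1.0):
    """Upper bound on decimal GB that fit through the link in the given hours."""
    bits = link_mbps * 1e6 * efficiency * hours * 3600
    return bits / 8 / 1e9
```

Under those assumptions, max_gb_in_hours(75, 35) comes to roughly 1,180 GB, and transfer_hours(100, 35) to a little over 6 hours. Optimized duplication normally sends only data the target does not already hold, so an unseeded target is the worst case; if the images really are small, a job that runs for 75 hours is more likely stalled than busy.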
One more thing to mention here:
Source DDOS version: 5.6, OST plugin version: 3.2
The destination has 2 DDs: the one the replication is going to has DDOS version 5.7 and the other has DDOS 5.4, and the OST plugin version is 3.0.
Is the OST plugin version mismatch the reason for the failed replication?
If yes, does DDOS 5.4 support OST plugin version 3.2, or do we have to update DDOS to 5.6 or the current version?
Guys, please help me here. The issue is still there. I am working with support as well. They are asking me to downgrade the DDBoost version to the target-side version, or to upgrade DDOS and DDBoost to the source-side version.
If support is telling you to get to the same DD levels, there is likely a reason. There are differences in the plugin code, and Veritas may not know what all of them are, since it's EMC's code. The fact that you're going from a higher to a lower version is likely a source of the problem, since 3.2 will have things in it that 3.0 doesn't, and may restructure or fundamentally change the way some functions are performed. I'd start there and see if that fixes things.
You clearly do not have a connectivity or name lookup issue.
If you did, nothing would've worked.
I have made some recommendations over here (note there is no quick or magic solution):
You will need to work with DD support, network and firewall team to trace and track comms between the sites.
Thanks for your reply.
I am not getting proper support from the vendors. I am working with both Veritas and EMC, and each is blaming the other.
In your post you say to check the firewall timeout values and the network route. You found these problems in the logs which I attached. Can you please highlight those points in the log file itself and attach it? I will then check the same with EMC and the network team. This would be a great help. I appreciate it.
Check your TCP keepalive and DSP_Proxy timeouts on the Data Domains, if not already done.
When we had similar issues, we were told that these 2 timeouts were key reasons for these errors.
Check if this helps
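The Data Domain timeouts have to be checked on the appliances with DD support, but the same idea can be sanity-checked on the Linux media servers. A minimal sketch, assuming the standard Linux /proc keepalive settings; the firewall idle timeout is a value you would have to get from the network team, and the function names are hypothetical.

```python
from pathlib import Path

# Standard location of the kernel TCP keepalive settings on Linux.
PROC_IPV4 = Path("/proc/sys/net/ipv4")


def read_keepalive(proc_dir=PROC_IPV4):
    """Read the three kernel TCP keepalive settings as integers."""
    names = ("tcp_keepalive_time", "tcp_keepalive_intvl", "tcp_keepalive_probes")
    return {name: int((Path(proc_dir) / name).read_text().strip()) for name in names}


def idle_connection_at_risk(keepalive_time_s, firewall_idle_timeout_s):
    """True when the first keepalive probe would be sent only after the
    firewall has already dropped the idle session."""
    return keepalive_time_s >= firewall_idle_timeout_s
```

The Linux default for tcp_keepalive_time is 7200 seconds, so with a firewall that drops idle sessions after, say, one hour, idle_connection_at_risk(7200, 3600) is True. A control connection that sits quiet for a long time during a long-running replication can then be cut silently, which would fit failures that only show up after many hours.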