Re: AIR replication failed with media write error ...

lovee · ‎11-16-2016

hi

Source master: Red Hat Linux, version 7.7.3

Target master: same OS and version

Not all replication jobs are failing; some fail with EC 84 (media write error) in the job details.

Thanks

sdo · ‎11-16-2016

Thanks for the status, but many different scenarios and specific circumstances can lead to a final status of 84, which is itself a fairly generic status.

Any actual job detail that you can show?

Any initial analysis which you can offer?

PatS729 · ‎11-17-2016

When you say some replications are failing, at first glance I suspect you are running out of space, or are low on space, at the target site.

If this is not the case, then we need to investigate with the logs.

lovee · ‎11-17-2016

Hi

The target storage has enough space to accommodate this. Which logs do I need to share?

lovee · ‎11-17-2016

hi,

Please see the attached job details.

PatS729 · ‎11-17-2016

Hi,

On the same note, you can run a verify against the images which are failing. If the verify fails and the image is bad, then there is no point in duplicating or replicating it. So run the command "bpverify -backupid <backupid>" and see the results.

We would like to see the bptm/bpdm logs with pdplugin enabled and the spoold log from the source MSDP, and of course the job status from Activity Monitor.

Marianne · ‎11-17-2016

We see that the replication fails after many hours with this error:

Error bpdm (pid=28875) wait failed: error 174

Difficult to read Job Details with important stuff blanked out. Can you not replace names with dummy names? e.g. server1, server2, etc, diskpool1, diskpool2, etc.

PID 28875 seems to be running on local media server

Check bpdm log on local media server - you need to follow PID 28875 in the log.
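
Following one PID through a busy log can be sketched roughly as below. This is only a minimal Python illustration, not a NetBackup tool. It assumes each log line carries the process ID in square brackets (for example `[28875]`); check that against your actual bpdm log before relying on it. The sample lines are made up for illustration and are not real bpdm output.

```python
import re

def lines_for_pid(lines, pid):
    """Return only the log lines that carry the given PID in square brackets."""
    # Assumption: the PID appears as "[<pid>]" somewhere on each line.
    marker = re.compile(r"\[" + str(int(pid)) + r"\]")
    return [line for line in lines if marker.search(line)]

# Made-up sample lines, for illustration only (not real bpdm output).
sample = [
    "10:00:01.000 [28875] <2> example: starting",
    "10:00:01.500 [31002] <2> example: unrelated process",
    "10:05:42.250 [28875] <16> example: wait failed",
]

for line in lines_for_pid(sample, 28875):
    print(line)
```

To use it on a real file, pass the open file object as `lines`; the function only needs an iterable of strings.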

Also check bpdm and bptm logs on remote media server.

Is this only happening for long-running replications?
How big are these images and what is the bandwidth between sites?
Is this a first-time replication for the backup, meaning that remote server is not yet seeded with full copy?

Handy NetBackup Links

lovee · ‎11-18-2016

hi

please see attached job detail.

Yes, this is happening for long-running replications.

The images are not that big. The WAN link is 35mb.

Yes, this is the first time we have set up replication. Before that, duplication was to tape.

I created a new SU and DP with new names for the replication, because I was not allowed to tick the replication source check box on the old SU. So my question is: will it replicate only the new data being backed up to the new SU, or will it replicate the old data as well?

If it is replicating only the new data, why is it taking so long? Some of the jobs have been running for the last 75 hours.

If only new data will be replicated, how can I replicate the old data that is on the old SU?

lovee · ‎11-18-2016

Sorry, I forgot to attach the job details. Please see the attachment.

lovee · ‎11-18-2016

one more thing to tell here:

Source: DDOS version 5.6, OST plugin version 3.2

The destination has two DDs: the one receiving the replication runs DDOS 5.7 and the other runs DDOS 5.4. The OST plugin version there is 3.0.

Is the OST plugin version mismatch the reason for the failed replication?

If yes, can DDOS 5.4 support OST plugin version 3.2, or do we have to update DDOS to 5.6 or the current version?

Marianne · ‎11-19-2016

First time you are mentioning DD.
My guess is that replication from lower to higher version will be okay, but not from higher to lower level. You need to confirm with DD.
Remember to also check NBU compatibility in HCL.

The time taken to replicate small images seems excessive. You need to investigate network route between them with assistance from your network team and DD support.
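
As a rough sanity check on "excessive", the raw transfer time over the link can be estimated as below. This is a back-of-the-envelope Python sketch under stated assumptions: the "35mb" link is taken to mean 35 megabits per second, the 100 GiB image size is purely hypothetical (no sizes were given in this thread), and deduplication, protocol overhead and other traffic on the link are ignored.

```python
def transfer_hours(image_gib, link_mbit_per_s, efficiency=1.0):
    """Hours needed to push image_gib over the link at the given efficiency."""
    bits = image_gib * 1024**3 * 8
    usable_bits_per_s = link_mbit_per_s * 1_000_000 * efficiency
    return bits / usable_bits_per_s / 3600

def gib_movable(hours, link_mbit_per_s, efficiency=1.0):
    """GiB that fit through the link in the given number of hours."""
    bits = hours * 3600 * link_mbit_per_s * 1_000_000 * efficiency
    return bits / 8 / 1024**3

# Hypothetical 100 GiB image over an assumed 35 Mbit/s link.
print(f"{transfer_hours(100, 35):.1f} hours")
# Upper bound on what a fully used link could carry in 75 hours.
print(f"{gib_movable(75, 35):.0f} GiB")
```

If a job runs for 75 hours while its image is far smaller than the second figure, raw bandwidth alone is unlikely to explain the duration, which points back at the route, the firewall, or the devices themselves.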

Also check for firewall timeouts terminating connections after certain time. Read up about Veritas recommended KeepAlive settings on all servers involved in the process.
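
The keepalive-versus-firewall comparison can be sketched as follows. This is a minimal Python illustration for Linux hosts, not the vendor's procedure: it reads the kernel's TCP keepalive settings from `/proc/sys/net/ipv4` and checks whether the first keepalive probe would be sent before a firewall idle timeout expires. The 3600-second firewall timeout is a hypothetical value; get the real one from your firewall team. Keepalive also only applies to sockets that have it enabled, so this is a necessary check rather than a sufficient one.

```python
from pathlib import Path

PROC_DIR = Path("/proc/sys/net/ipv4")
SETTINGS = ("tcp_keepalive_time", "tcp_keepalive_intvl", "tcp_keepalive_probes")

def read_keepalive(proc_dir=PROC_DIR):
    """Read Linux TCP keepalive settings; a value is None if the file is missing."""
    values = {}
    for name in SETTINGS:
        path = Path(proc_dir) / name
        values[name] = int(path.read_text().strip()) if path.exists() else None
    return values

def keepalive_protects(keepalive_time_s, firewall_idle_timeout_s):
    """True if the first keepalive probe goes out before the firewall gives up."""
    return keepalive_time_s < firewall_idle_timeout_s

FIREWALL_IDLE_TIMEOUT_S = 3600  # hypothetical value, for illustration only

settings = read_keepalive()
idle = settings["tcp_keepalive_time"]
if idle is None:
    print("keepalive settings not found (not a Linux host?)")
else:
    print(settings, "protected:", keepalive_protects(idle, FIREWALL_IDLE_TIMEOUT_S))
```

With the Linux default `tcp_keepalive_time` of 7200 seconds, an idle connection would not be refreshed before a one-hour firewall timeout, which is the kind of mismatch worth ruling out on every server involved.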


Marianne · ‎11-19-2016

First time replication is best done with source and destination at same site to 'seed' the destination. If then moved to remote site, only changed blocks will be sent over the WAN.

Only new backups done with new SLP will be replicated after the backup.
If source data is on the same DD, you can read up on how to use nbreplicate to manually replicate old data (quite a number of forum posts about it).


lovee · ‎11-28-2016

Guys, please help me here. The issue is still there. I am working with support as well. They are asking me to downgrade the DDBoost version to the target-side version, or to upgrade DDOS and DDBoost to the source-side version.

Marianne · ‎11-28-2016

Not sure what kind of help you are looking for. I see that I have made a couple of suggestions a week ago with no response from you.


vtas_chas · ‎11-28-2016

If support is telling you to get to the same DD levels, there is likely a reason. There are differences in the plugin code and Veritas may not know what all of them are, since it's EMC's code. The fact that you're going from a higher to a lower version is likely a source of the problem, since 3.2 will have things in it that 3.0 doesn't, and may restructure or fundamentally change the way some functions are performed. I'd start there and see if that fixes things.

Charles
VCS, NBU & Appliances

lovee · ‎12-08-2016

Replication jobs are still failing with EC 191.

1. Connectivity is good (ping, bptestbpcd, bpclntcmd).

2. system_nfs_filecopy_timeout has been set to 600000.

3. Entries in both directions have been added to bp.conf and the /etc/hosts file.

4. Ports 111, 2049, 2051, 2052 and 1556 are enabled.
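
A port check like the one in item 4 can be sketched as below. This is a minimal Python illustration using only the standard library, not a NetBackup or Data Domain utility. It covers TCP only, and a successful connect shows the port is reachable at that moment; it says nothing about whether a long-lived session survives for hours. The host name in the usage note is a placeholder.

```python
import socket

# Ports listed in this thread.
PORTS = [111, 2049, 2051, 2052, 1556]

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def check_ports(host, ports=PORTS):
    """Map each port to True (reachable) or False (refused or timed out)."""
    return {port: port_open(host, port) for port in ports}
```

Calling `check_ports("target-dd.example.com")` from the source media server returns a dictionary such as `{111: True, ...}`; replace the placeholder host with your own target.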

Marianne · ‎12-08-2016

You clearly do not have connectivity or name lookup issue.
If you did, nothing would've worked.

I have made some recommendations over here (note there is no quick or magic solution):
https://vox.veritas.com/t5/NetBackup/AIR-replication-failed-with-media-write-error-84/m-p/821336/hig...

You will need to work with DD support, network and firewall team to trace and track comms between the sites.


lovee · ‎12-15-2016

Thanks for your reply.

I am not getting proper support from the vendors. I am working with both Veritas and EMC, and both are blaming each other.

In your post you say to check firewall timeout values and the network route. You identified these problems from the logs I attached. Could you please highlight those points in the log file itself and attach it? I will then check the same with EMC and the network team. This would be a great help. Appreciate it.

Thanks

Abhishek_Jadhav · ‎12-21-2016

Check your TCP keepalive and DSP_Proxy timeouts on the Data Domains, if not already done.

When we had similar issues, we were told that these 2 timeouts were key reasons for these errors.

Check if this helps
