Please check if spoold is running on the media server (prd-nbu7, I guess)?
A restart of the NBU services won't hurt (check with bpps to confirm that all the services are stopped before starting them back; if some are hanging, kill them; see the sketch below).
The other AIR replications that are running, are they from/to the same media servers?
Also, check the logs: bpdm, spad, spoold, storaged, replication...
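A minimal sketch of that check/restart sequence, assuming default UNIX/Linux paths (on Windows, bpdown and bpup under install_path\NetBackup\bin do the stop/start):

```
/usr/openv/netbackup/bin/bpps -x         # list NetBackup processes; confirm spoold/spad are up
/usr/openv/netbackup/bin/bp.kill_all     # stop all NetBackup processes
/usr/openv/netbackup/bin/bpps -x         # re-check: anything still listed is hanging, kill -9 it
/usr/openv/netbackup/bin/bp.start_all    # start the services back up
```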
I checked PRD-NBU7 and PRD-NBU02; spoold is running on both.
I already used the "bpdown" command to kill the NBU services and the "bpup" command to restart them.
Please check the attached details.
I checked: the two masters do not have the same media servers.
You told me to check the logs, but I could not find the log locations. Thank you!
As per the detailed status, it clearly says:
__sosend: _crStreamWrite failed: connection reset by peer. Look at the replication logs on the source storage server for more information.
Then, to find the replication logs:
On the media server, which is prd-nbu7 (in your case also the master), navigate to: /storage_path/log/spad/replication.log
More log files are in the /storage_path/log/spoold directory, as follows:
The spoold.log file is the main log file.
The storaged.log file is for queue processing.
The rest of the spad log files are alongside replication.log in the /storage_path/log/spad directory.
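If it helps, a minimal sketch for pulling those logs on the storage server (same placeholder paths as above; the grep pattern is just an example):

```
# run on the MSDP storage server (prd-nbu7); /storage_path = your configured storage path
tail -f /storage_path/log/spad/replication.log               # watch the replication log live
grep -iE 'error|fail' /storage_path/log/spoold/spoold.log    # scan the main spoold log
grep -iE 'error|fail' /storage_path/log/spoold/storaged.log  # and the queue-processing log
```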
**edit**: the bpdm log is on the master/media server (my bad, I thought it was bpdbm), which is prd-nbu7: install_path\veritas\netbackup\logs (if the bpdm folder doesn't exist, create it). Then go to the media server's host properties in the Administration Console and select Logging > bpdm > verbosity level (3).
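A minimal sketch of that directory step, assuming UNIX/Linux default paths (on Windows, create install_path\veritas\netbackup\logs\bpdm instead):

```
# create the bpdm legacy log directory on the media server
mkdir -p /usr/openv/netbackup/logs/bpdm
# or create every legacy log directory in one go with the bundled script:
/usr/openv/netbackup/logs/mklogdir
```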
Then send these logs from just after the replication job fails.
@paultang - hey Paul - I had the same errors and same symptoms as you, and for me it turned out that the problem was a failing/flapping network switch port on one of four 1Gb bonded cross-site WAN pipes. This customer had four NetBackup domains, configured as two pairs:
NBU setup 1:
one 5230 v2.7.3 Master/Media at site A
one 5230 v2.7.3 Master/Media at site B
NBU setup 2:
one Win 2016 Master v8.1.1 at site A
one Win 2016 Master v8.1.1 at site B
one 5240 v3.1.1 Media at site A
one 5340 v3.1.1 Media at site B
Both NBU setups were using the very same cross-site WAN links for their NetBackup AIR replication, but only the first, older v2.7.3 environment was affected, in both directions, by the failing/flapping network switch port. As soon as the faulty network switch port was closed, NetBackup AIR replication was OK again.
So I suspect that your problem is a network-related issue of some sort.
A very typical problem is that network switches close down TCP conversations that the switches think have failed, timed out, or gone idle, so the typical solution is to tune TCP keepalives so that probes go out before the idle timeout expires.
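For example, on a Linux host the kernel keepalive timers can be tuned so probes fire before the switch/firewall idle timeout; a sketch with illustrative values only (they apply to sockets that set SO_KEEPALIVE):

```
# Linux TCP keepalive tuning (example values, not recommendations)
sysctl -w net.ipv4.tcp_keepalive_time=600    # seconds of idle before the first probe
sysctl -w net.ipv4.tcp_keepalive_intvl=60    # seconds between unanswered probes
sysctl -w net.ipv4.tcp_keepalive_probes=9    # unanswered probes before the connection drops
```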
It was never clear to me why the v8.1.1 / v3.1.1 pair of NBU setup 2 was not affected, but I think that was either down to the src-mac/dst-mac bonding algorithm, or, more likely, down to different routing: the newer NetBackup environment's traffic was routed via one or two of the WAN links that did not have the faulty network switch port.
@CadenL makes a very good point, which I will attempt briefly to expand upon. In my travels I've seen weird behaviour on networks when even tunnelling affects MTU by, say, 20 bytes. Anyway, not all jumbo-frame devices support the same max size, and that means both the A and B ends, i.e. the TCP stacks on the servers (virtual or physical) and the TCP stacks in the actual physical network switches. E.g. some switches allow/support 9500-byte jumbo frames, whereas others only allow/support, say, 9100-byte jumbo frames. So if all points/sides/ends are not configured with the "correct size" (which does not necessarily mean the "same size" when traffic is tunnelled or encapsulated; did you get that? "same size" does not necessarily imply "correct size"), then expect bad transfers along those hops, and bad transfers mean bad backups.
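One way to sanity-check the usable MTU along the whole path is a non-fragmenting ping; a sketch assuming a Linux host and a 9000-byte target frame:

```
# probe path MTU with the don't-fragment bit set
# 8972 = 9000 bytes minus 20 (IP header) and 8 (ICMP header)
ping -M do -s 8972 <replication-target>
# if it fails, step the size down until it passes; the largest passing
# payload + 28 is the real usable MTU across every hop
```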