Error 42 (Network read error) during tape duplication, but job still active
During tape duplication, we get error 42 (Network read error) in the job details. The job stays active and nothing more happens. 96% of backup jobs are OK, but this error occurs on long jobs.
- Master and media server are able to ping each other.
- Master server is a virtual machine running Windows 2008 R2 and media server is a 5230 appliance.
- Master and media server use DNS to resolve names. On both servers, entries for the master and the media server have been added to the hosts file to reduce the risk of DNS failure.
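The name-resolution setup described above can be double-checked from each server with a small script. This is only a sketch: it uses Python's standard `socket` module, and the host names in the usage note are hypothetical placeholders, not names from this environment.

```python
import socket


def resolve_both_ways(hostname):
    """Return (ip, reverse_name) for hostname; either may be None on failure."""
    try:
        ip = socket.gethostbyname(hostname)  # forward lookup (hosts file or DNS)
    except OSError:
        return None, None
    try:
        reverse_name = socket.gethostbyaddr(ip)[0]  # reverse lookup of that address
    except OSError:
        reverse_name = None
    return ip, reverse_name


def report(hostnames):
    """Build one summary line per host name so mismatches are easy to spot."""
    lines = []
    for name in hostnames:
        ip, reverse_name = resolve_both_ways(name)
        lines.append(f"{name}: forward={ip} reverse={reverse_name}")
    return lines
```

For example, `print("\n".join(report(["master.example.com", "media.example.com"])))` run on both the master and the media server should show the same address for each name on both sides, and a reverse name that matches the name NetBackup is configured with.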
Has anyone ever experienced such a case?
Are there any error messages in the log details?
How did you set up the duplication? (Manual through catalog/CLI, SLP, or Vault)
Is the duplication from disk to tape?
Please run from .../bin/admincmd
All duplication, whether it is Vault, tape to tape, disk to disk, SLP, MSDP to tape, or any other combination, is managed and controlled by bpduplicate. It logs to /usr/openv/netbackup/logs/admin, so that is a good place to start. IMHO, a better place to start is the job details from Activity Monitor for a failing job: hopefully we can then see which part of the job failed, and work out the likely logs needed from there.
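If the admin log directory is populated, a quick scan for the relevant lines can narrow things down before reading whole files. The sketch below is an illustration only: the directory is the one named above, but the search strings are assumptions about how the error might appear in the log, so adjust them to whatever the job details actually show.

```python
from pathlib import Path

ADMIN_LOG_DIR = Path("/usr/openv/netbackup/logs/admin")  # directory named in the thread
PATTERNS = ("network read", "status 42", "status = 42")  # assumed search strings


def find_matches(log_dir, patterns=PATTERNS):
    """Return (file name, line number, line) for every line containing a pattern."""
    hits = []
    if not log_dir.is_dir():
        return hits  # nothing to scan if the directory is absent
    for path in sorted(log_dir.iterdir()):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going past undecodable bytes
        with path.open(errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                lowered = line.lower()
                if any(pattern in lowered for pattern in patterns):
                    hits.append((path.name, number, line.rstrip()))
    return hits
```

Calling `find_matches(ADMIN_LOG_DIR)` on the server that ran bpduplicate returns the matching lines with their file names, which gives the timestamps to compare against the Activity Monitor details.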
It's a duplication through the catalog from tape to disk: we are seeding around 40 GB of data between two datacenters before setting up AIR replication.
I can't publish any logs for security reasons.
It's a duplication through the catalog from tape to disk....
Are source and destination both on the same media server (Appliance)?
Was any performance tuning done on the master?
(We know that the Appliance comes fully tuned...)
For example - Adjusting the TCP KeepAliveTime parameter, Kernel thread parameters, etc.
(Info in Performance Tuning Guide.)
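As an illustration of the KeepAliveTime check on a Windows master, here is a minimal sketch. It assumes the standard TCP/IP registry location and the Windows default of two hours when the value is not set. Whether keepalive is actually involved here is a guess: the usual concern is that a firewall between sites drops connections that sit idle for less time than the keepalive interval.

```python
import sys

KEY_PATH = r"SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"
VALUE_NAME = "KeepAliveTime"
WINDOWS_DEFAULT_MS = 7_200_000  # two hours, used when the value is not set


def read_keepalive_ms():
    """Return the configured KeepAliveTime in ms, or None if unset or not on Windows."""
    if sys.platform != "win32":
        return None
    import winreg  # standard library, Windows only

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
            value, _value_type = winreg.QueryValueEx(key, VALUE_NAME)
            return int(value)
    except OSError:
        return None  # key or value missing: the default applies


def effective_keepalive_minutes(configured_ms):
    """Translate the registry value (or its absence) into minutes."""
    ms = WINDOWS_DEFAULT_MS if configured_ms is None else configured_ms
    return ms / 60000
```

Running `effective_keepalive_minutes(read_keepalive_ms())` on the master gives a number to compare with any idle timeout the network team finds on devices between the two datacenters.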
The media server is an appliance, so we don't tune it, and the master server is a virtual machine with 4 vCPUs and 8 GB of RAM. We suspect a network outage, and the network team is investigating along those lines. My biggest question is: why is the job still active with such an error message?
Could a duplicate address or something similar explain this?
Usually, a network read failure stops the job.
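While the network team investigates, a repeated TCP check between the two servers may help catch an intermittent outage, since a successful ping only shows ICMP reachability. This is a sketch under assumptions: the host name in the usage note is a placeholder, and the port to probe (for example 1556 for PBX or 13724 for vnetd) should be confirmed against the ports actually in use in this environment.

```python
import socket
import time
from datetime import datetime


def port_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def probe_repeatedly(host, port, attempts, interval_seconds):
    """Probe the port several times and return (timestamp, reachable) pairs."""
    results = []
    for attempt in range(attempts):
        stamp = datetime.now().isoformat(timespec="seconds")
        results.append((stamp, port_reachable(host, port)))
        if attempt < attempts - 1:
            time.sleep(interval_seconds)  # wait before the next probe
    return results
```

For instance, `probe_repeatedly("media.example.com", 1556, attempts=720, interval_seconds=60)` run from the master during a long duplication records one result per minute for twelve hours. Any `False` entries give timestamps to match against the moment error 42 appears in the job details.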