Oracle Backup channels failing
We are having a problem when one or more of the channels for an RMAN backup fails. The backup starts with 6 channels; if one fails, the backup continues with 5. This past weekend, 3 channels going to the same drive/tape all failed with a media error, and the backup continued with only 3 channels, which is not enough bandwidth to complete the backup in the allowed time. As a stopgap, we're going to increase the number of channels so that if something fails, we'll still have enough to continue effectively. Is there a better solution that lets a failed channel be restarted or reused?
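To put rough numbers on that stopgap, here is a minimal sketch of the headroom arithmetic. Every figure in it (database size, window length, per-channel throughput) is a made-up assumption, not a measurement from this environment, and it assumes throughput scales linearly with the channel count, which may not hold once drives, network, or CPU saturate.

```python
import math


def channels_needed(data_gb, window_hours, gb_per_hour_per_channel, tolerated_failures):
    """Channels to allocate so the backup still fits the window after losing some.

    Assumes (hypothetically) that every channel sustains the same throughput
    and that total throughput is the sum over the surviving channels.
    """
    if window_hours <= 0 or gb_per_hour_per_channel <= 0:
        raise ValueError("window and per-channel throughput must be positive")
    # Minimum channels that can move all the data inside the window.
    base = math.ceil(data_gb / (window_hours * gb_per_hour_per_channel))
    # Add spares so that losing `tolerated_failures` channels still leaves `base`.
    return base + tolerated_failures


# Hypothetical example: 6000 GB, 10-hour window, 150 GB/h per channel,
# and we want to survive losing 3 channels (one drive's worth).
print(channels_needed(6000, 10, 150, 3))
```

With those invented numbers, 4 channels are the bare minimum, so 7 would be allocated to ride out the loss of 3.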
- Have a look at this post https://vox.veritas.com/t5/NetBackup/Oracle-Application-Backups/m-p/843736 for replies by Genericus.
You will see how the RMAN script can be customized to rerun failed jobs.
If you have problematic drives or media, then it is not a good idea to have multiple channels going to the same drive/media. Look at the schedule's MPX (multiplexing) levels.
You may also want to consider introducing disk as the first stage of your backups, preferably dedupe if budget allows.
- I do not know of any mechanism to automatically restart channels; if one fails, the job continues with one fewer channel.
Our only solution is to run the backup with SYSDATE-X, so the job can be restarted without attempting to back up pieces that are already done. Also, we had our ops staff monitor the job and kill and restart it if the number of child jobs fell below a certain level.
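For anyone wanting to try the same approach, here is a rough sketch of both pieces. I am assuming "SYSDATE-X" refers to RMAN's `NOT BACKED UP SINCE TIME 'SYSDATE-X'` clause; the channel names, the `sbt_tape` device type, and the helper names `build_rman_script` and `should_restart` are my own placeholders, and the NetBackup-specific `PARMS`/`SEND` settings a real script would need are left out.

```python
def build_rman_script(channels, days_back, device_type="sbt_tape"):
    """Build an RMAN RUN block that skips files backed up in the last `days_back` days.

    Assumption: the restartable behaviour comes from RMAN's
    NOT BACKED UP SINCE TIME clause; channel names are placeholders.
    """
    if channels < 1:
        raise ValueError("need at least one channel")
    lines = ["RUN {"]
    for i in range(1, channels + 1):
        lines.append(f"  ALLOCATE CHANNEL ch{i:02d} DEVICE TYPE {device_type};")
    # On a rerun, files already backed up inside the window are skipped.
    lines.append(f"  BACKUP DATABASE NOT BACKED UP SINCE TIME 'SYSDATE-{days_back}';")
    lines.append("}")
    return "\n".join(lines)


def should_restart(active_child_jobs, minimum_child_jobs):
    """Mirror the manual rule: kill and restart once too few child jobs remain."""
    return active_child_jobs < minimum_child_jobs


if __name__ == "__main__":
    print(build_rman_script(channels=6, days_back=1))
    # Hypothetical threshold: restart if fewer than 4 of the 6 channels survive.
    print(should_restart(active_child_jobs=3, minimum_child_jobs=4))
```

How the count of active child jobs is obtained is deliberately left open; in our case it was ops staff watching the job, not a script.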
Most of our issues went away once we moved to the Data Domains; so far they accept the data as fast as we can send it, and we were able to increase the number of channels.
BEWARE! If you are sending packets over the LAN, there is CPU overhead from encapsulating the TCP/IP packets. I was able to drive my system to 100% CPU by increasing the number of channels too high. That is why we use Fibre Channel for most of our large databases. We are implementing 10G networking; we shall see how that goes...