Oracle RMAN backup status 13, timer expired
Hi all
Occasionally, but not on any regular schedule, one of our large Oracle backups ends with status 13.
Archive log backups are always successful, and so are the FULLs. The failures affect our diffs.
We have a few thousand jobs that back up to a 5330HA cluster. Our Oracle clients run NBU 8.1.2 on RHEL 7. The media server is Appliance 3.1.2, with the latest MSDP EEB bundle.
Here is an excerpt from the Job Details of the parent job:
16.mar.2020 03:39:20 - Info bphdb (pid=120725) INF - input datafile file number=00029 name=+DATA1/PXCDBL_DBM/82CDF1F23C1993F2E053C443F80AE973/DATAFILE/datex_lob.2830.1001260637
16.mar.2020 03:39:20 - Info bphdb (pid=120725) INF - input datafile file number=00024 name=+DATA1/PXCDBL_DBM/82CDF1F23C1993F2E053C443F80AE973/DATAFILE/system.2824.1001259231
16.mar.2020 03:39:21 - Info bphdb (pid=120725) INF - input datafile file number=00027 name=+DATA1/PXCDBL_DBM/82CDF1F23C1993F2E053C443F80AE973/DATAFILE/users.2828.1001259241
16.mar.2020 07:25:50 - Info bphdb (pid=120725) INF - released channel: ch00
16.mar.2020 07:25:50 - Info bphdb (pid=120725) INF - released channel: ch01
16.mar.2020 07:25:50 - Info bphdb (pid=120725) INF - RMAN-00571: ===========================================================
16.mar.2020 07:25:50 - Info bphdb (pid=120725) INF - RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
16.mar.2020 07:25:50 - Info bphdb (pid=120725) INF - RMAN-00571: ===========================================================
16.mar.2020 07:25:50 - Info bphdb (pid=120725) INF - RMAN-03009: failure of backup command on ch00 channel at 03/16/2020 07:25:33
16.mar.2020 07:25:50 - Info bphdb (pid=120725) INF - ORA-27192: skgfcls: sbtclose2 returned error - failed to close file
16.mar.2020 07:25:50 - Info bphdb (pid=120725) INF - ORA-19511: non RMAN, but media manager or vendor specific failure, error text:
16.mar.2020 07:25:50 - Info bphdb (pid=120725) INF - Failed to process backup file <bk_dPXCDBL_un2ur6tn8_s29410_p1_t1035171560>
16.mar.2020 07:25:50 - Info bphdb (pid=120725) INF - ORA-19502: write error on file "bk_dPXCDBL_un2ur6tn8_s29410_p1_t1035171560", block number 1 (block size=8192)
16.mar.2020 07:25:50 - Info bphdb (pid=120725) INF - ORA-27030: skgfwrt: sbtwrite2 returned error
16.mar.2020 07:25:50 - Info bphdb (pid=120725) INF - ORA-19511: non RMAN, but media manage
16.mar.2020 07:25:50 - Info bphdb (pid=120725) INF - Recovery Manager complete.
16.mar.2020 07:25:55 - Info bphdb (pid=120725) INF - End of Recovery Manager output.
16.mar.2020 07:25:55 - Info bphdb (pid=120725) INF - End Oracle Recovery Manager.
Here is an excerpt from the job details of the failing job:
16.mar.2020 03:39:35 - Info bptm (pid=233514) start backup
16.mar.2020 03:40:38 - Info bptm (pid=233514) backup child process is pid 235121
16.mar.2020 03:40:38 - begin writing
16.mar.2020 06:40:39 - Error bpbrm (pid=233505) socket read failed: errno = 62 - Timer expired
16.mar.2020 06:40:41 - Error bptm (pid=233514) media manager terminated by parent process
Obviously there is a timeout involved here, but where? The backup job always times out after exactly 3 hours.
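To put a number on the "exactly 3 hours", here is a minimal Python sketch that computes the gap between the "begin writing" line and the "socket read failed" line from the failing job above. The helper name parse_job_timestamp and the month table are made up for this illustration, and it assumes the job details always start with an English DD.mon.YYYY HH:MM:SS stamp.

```python
from datetime import datetime

# Month abbreviations as they appear in the job details
# (assumption: English, three letters, any case).
MONTHS = {"jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
          "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12}

def parse_job_timestamp(line):
    """Parse the leading 'DD.mon.YYYY HH:MM:SS' stamp of a job-details line."""
    date_part, time_part = line.split()[:2]
    day, mon, year = date_part.split(".")
    hour, minute, second = (int(x) for x in time_part.split(":"))
    return datetime(int(year), MONTHS[mon.lower()], int(day), hour, minute, second)

# The two lines are copied from the failing job's details.
begin = parse_job_timestamp("16.mar.2020 03:40:38 - begin writing")
fail = parse_job_timestamp(
    "16.mar.2020 06:40:39 - Error bpbrm (pid=233505) "
    "socket read failed: errno = 62 - Timer expired"
)

elapsed_seconds = int((fail - begin).total_seconds())
print(elapsed_seconds)  # 10801
```

That is 10800 seconds plus one second of slack, so a timeout configured as 10800 seconds (or 180 minutes) somewhere along the path would fit. The logs alone do not show which setting that is.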