Backup Oracle Error Code (13) - File Read Failed

Mr__Backup_Man
Level 3

Hello all,

Please, I need your help with NBU 7.6.0.4: an Oracle RMAN backup fails with error code 13 (file read failed).

NBU Version: 7.6.0.4

NBU Master Server: Microsoft Windows Server 2008 R2 Enterprise

NBU Media Server: SUSE Linux Enterprise Server 11

NBU Client: Red Hat Enterprise Linux Server 6.3

Oracle Version: 11g Release 2

28/12/2016 10:14:34 - Info bpbrm(pid=47977) bonito002 is the host to backup data from
28/12/2016 10:14:34 - Info bpbrm(pid=47977) reading file list for client
28/12/2016 10:14:34 - Info bpbrm(pid=47977) listening for client connection
28/12/2016 10:14:35 - Info bpbrm(pid=47977) INF - Client read timeout = 36000
28/12/2016 10:14:36 - Info bpbrm(pid=47977) accepted connection from client
28/12/2016 10:14:36 - Info dbclient(pid=24904) Backup started
28/12/2016 10:14:36 - Info bpbrm(pid=47977) bptm pid: 48120
28/12/2016 10:14:36 - Info bptm(pid=48120) start
28/12/2016 10:14:36 - Info bptm(pid=48120) using 262144 data buffer size
28/12/2016 10:14:36 - Info bptm(pid=48120) using 30 data buffers
28/12/2016 10:14:41 - Info bptm(pid=48120) start backup
28/12/2016 10:14:44 - Info bptm(pid=48120) backup child process is pid 48413
28/12/2016 10:15:21 - Info nbjm(pid=36668) starting backup job (jobid=632185) for client bonito002, policy ORACLE_Bonitocl_PRATA, schedule Default-Application-Backup
28/12/2016 10:15:21 - Info nbjm(pid=36668) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=632185, request id:{8E159BC5-2DC9-41B7-AC9B-47601AB68DE4})
28/12/2016 10:15:21 - requesting resource araguaia02.stu.dedup
28/12/2016 10:15:21 - requesting resource araguaia.NBU_CLIENT.MAXJOBS.bonito002
28/12/2016 10:15:21 - requesting resource araguaia.NBU_POLICY.MAXJOBS.ORACLE_Bonitocl_PRATA
28/12/2016 10:15:21 - granted resource araguaia.NBU_CLIENT.MAXJOBS.bonito002
28/12/2016 10:15:21 - granted resource araguaia.NBU_POLICY.MAXJOBS.ORACLE_Bonitocl_PRATA
28/12/2016 10:15:21 - granted resource MediaID=@aaaa4;DiskVolume=PureDiskVolume;DiskPool=DiskPool02;Path=PureDiskVolume;StorageServer=araguaia02;MediaServer=araguaia02
28/12/2016 10:15:21 - granted resource araguaia02.stu.dedup
28/12/2016 10:15:21 - estimated 0 Kbytes needed
28/12/2016 10:15:21 - Info nbjm(pid=36668) started backup (backupid=bonito002_1482927321) job for client bonito002, policy ORACLE_Bonitocl_PRATA, schedule Default-Application-Backup on storage unit araguaia02.stu.dedup
28/12/2016 10:15:23 - started process bpbrm (47977)
28/12/2016 10:15:24 - connecting
28/12/2016 10:15:25 - connected; connect time: 0:00:01
28/12/2016 10:15:33 - Info dbclient(pid=24904) dbclient(pid=24904) wrote first buffer(size=262144)
28/12/2016 10:15:34 - begin writing
28/12/2016 12:12:29 - Info bptm(pid=48120) waited for full buffer 894 times, delayed 463136 times
28/12/2016 12:12:30 - Info bptm(pid=48120) EXITING with status 0 <----------
28/12/2016 12:12:30 - Info araguaia02(pid=48120) StorageServer=PureDisk:araguaia02; Report=PDDO Stats (multi-threaded stream used) for (araguaia02): scanned: 1071650 KB, CR sent: 600996 KB, CR sent over FC: 0 KB, dedup: 43.9%, cache hits: 1 (0.0%)
28/12/2016 12:33:38 - Error bpbrm(pid=47977) socket read failed: errno = 104 - Connection reset by peer
28/12/2016 12:33:38 - Info dbclient(pid=24904) done. status: 13: file read failed
28/12/2016 12:34:28 - end writing; write time: 2:18:54
file read failed(13)

11 REPLIES

Michael_G_Ander
Level 6
Certified

This line

Error bpbrm(pid=47977) socket read failed: errno = 104 - Connection reset by peer

makes me suspect a timeout introduced by an anti-virus product or a firewall. The latter can often be mitigated by decreasing the TCP keepalive time.

The standard questions: Have you checked 1) what has changed, 2) the manual, and 3) whether there are any tech notes or VOX posts regarding the issue?
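
As a rough illustration of the keepalive point: a firewall typically drops a connection it considers idle, so the first keepalive probe has to go out before the firewall's idle timeout expires. The Python sketch below assumes a Linux host, where the kernel value is readable at /proc/sys/net/ipv4/tcp_keepalive_time (in seconds) and only affects sockets that enable SO_KEEPALIVE. The function names and the 1800-second firewall timeout are hypothetical placeholders, not values confirmed for this environment.

```python
from pathlib import Path

# Linux exposes the kernel TCP keepalive settings here (values in seconds).
PROC_DIR = Path("/proc/sys/net/ipv4")


def read_keepalive_time(proc_dir=PROC_DIR):
    """Return tcp_keepalive_time: idle seconds before the first probe is sent."""
    return int((Path(proc_dir) / "tcp_keepalive_time").read_text().strip())


def keepalive_beats_firewall(keepalive_time_s, firewall_idle_timeout_s):
    """True if the first keepalive probe goes out before the firewall
    would drop the idle connection."""
    return keepalive_time_s < firewall_idle_timeout_s


if __name__ == "__main__":
    # Example firewall idle timeout only; replace with the real value.
    firewall_idle_timeout_s = 1800
    try:
        keepalive = read_keepalive_time()
    except (OSError, ValueError):
        print("tcp_keepalive_time not readable (not a Linux host?)")
    else:
        ok = keepalive_beats_firewall(keepalive, firewall_idle_timeout_s)
        print(f"tcp_keepalive_time={keepalive}s, probes sent in time: {ok}")
```

With the common Linux default of 7200 seconds, the check would report False against an 1800-second firewall timeout, which is the situation where long idle phases of a backup can end in "Connection reset by peer".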

Nicolai
Moderator
Partner    VIP   

Set CLIENT_READ_TIMEOUT = 1800 on the master, media server and client.

This is well-known behaviour with Oracle backups; you may even need to set it to 3600.
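
On UNIX and Linux hosts, NetBackup keeps settings such as CLIENT_READ_TIMEOUT as `KEY = value` lines in /usr/openv/netbackup/bp.conf; on a Windows master the value is set through Host Properties instead. Below is a minimal Python sketch for checking what a bp.conf actually contains, assuming that simple line format. The helper names and the sample text are hypothetical, and the parser keeps only the last value per key, so it is not suitable for repeated entries such as SERVER.

```python
def parse_bp_conf(text):
    """Parse 'KEY = value' lines into a dict (last value per key wins)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not KEY = value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def client_read_timeout(text):
    """Return CLIENT_READ_TIMEOUT in seconds, or None if it is not set."""
    value = parse_bp_conf(text).get("CLIENT_READ_TIMEOUT")
    return int(value) if value is not None else None


# Hypothetical bp.conf content, for illustration only.
SAMPLE = """\
SERVER = araguaia
CLIENT_NAME = bonito002
CLIENT_READ_TIMEOUT = 1800
"""

print(client_read_timeout(SAMPLE))  # 1800
```

To check a real host, pass the contents of /usr/openv/netbackup/bp.conf instead of SAMPLE. It may also be worth comparing the result with the job details in the first post, where bpbrm reports "Client read timeout = 36000".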

Hi Nicolai.

Thanks for your reply.

I have attached the screenshots.

Hi Michael.

Thanks for your reply.

I have attached a screenshot of the firewall keepalive settings.

Nicolai
Moderator
Partner    VIP   

Do you still see the issue after setting CLIENT_READ_TIMEOUT?

Nicolai,

Yes, the error persists.

Nicolai
Moderator
Partner    VIP   

Are there still 20 minutes between these two lines?

28/12/2016 12:12:30 - Info araguaia02(pid=48120) StorageServer=PureDisk:araguaia02; Report=PDDO Stats (multi-threaded stream used) for (araguaia02): scanned: 1071650 KB, CR sent: 600996 KB, CR sent over FC: 0 KB, dedup: 43.9%, cache hits: 1 (0.0%)
28/12/2016 12:33:38 - Error bpbrm(pid=47977) socket read failed: errno = 104 - Connection reset by peer

Firewalls?

Endpoint protection applications?

What does the RMAN output say? Is it the same stage that fails every time, e.g. the backup of archive log files?
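
The gap between the two quoted lines can be computed directly from their timestamps. A small Python sketch, assuming the job details always start with a DD/MM/YYYY HH:MM:SS stamp as in this thread; the function names are hypothetical and the first sample line is shortened.

```python
from datetime import datetime

TS_FORMAT = "%d/%m/%Y %H:%M:%S"  # e.g. 28/12/2016 12:12:30
TS_LENGTH = 19                   # number of characters in the timestamp


def log_time(line):
    """Parse the leading timestamp of a job-details line."""
    return datetime.strptime(line[:TS_LENGTH], TS_FORMAT)


def gap_between(first_line, second_line):
    """Return the time elapsed between two job-details lines."""
    return log_time(second_line) - log_time(first_line)


# First line shortened; only the timestamp matters here.
stats_line = "28/12/2016 12:12:30 - Info araguaia02(pid=48120) StorageServer=PureDisk:araguaia02"
reset_line = ("28/12/2016 12:33:38 - Error bpbrm(pid=47977) socket read failed: "
              "errno = 104 - Connection reset by peer")

gap = gap_between(stats_line, reset_line)
print(gap)  # 0:21:08
```

Running the same check on each failed job shows whether the reset always arrives after a similar idle period, which would point at a fixed timeout somewhere on the network path.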

Nicolai, this is the error that appears for the database.

channel ORA_SBT_TAPE_2: backup set complete, elapsed time: 01:26:44

RMAN-03009: failure of backup command on ORA_SBT_TAPE_4 channel at 01/05/2017 00:58:37

ORA-27192: skgfcls: sbtclose2 returned error - failed to close file

ORA-19511: Error received from media manager layer, error text:

   Failed to process backup file <0qrp7bg7_1_1>

ORA-19502: write error on file "0qrp7bg7_1_1", block number 26017 (block size=8192)

ORA-27030: skgfwrt: sbtwrite2 returned error

ORA-19511: Error received from media manager layer, error text:

   VxBSASendData: Failed with error:

channel ORA_SBT_TAPE_4 disabled, job failed on it will be run on another channel
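
To see whether the same stage fails every time, it can help to pull the error codes and the failing channel out of each RMAN output file and compare runs. A minimal Python sketch, assuming the usual RMAN-nnnnn / ORA-nnnnn code format; the function names are hypothetical and the sample keeps only the codes and routine names from the output above.

```python
import re

CODE_RE = re.compile(r"\b((?:ORA|RMAN)-\d{5})\b")
CHANNEL_RE = re.compile(r"failure of \w+ command on (\S+) channel")


def error_codes(text):
    """Return the distinct RMAN/ORA codes in order of first appearance."""
    seen = []
    for code in CODE_RE.findall(text):
        if code not in seen:
            seen.append(code)
    return seen


def failed_channels(text):
    """Return the channel names reported in 'failure of ... command' lines."""
    return CHANNEL_RE.findall(text)


# Message text shortened to codes and routine names.
SAMPLE = """\
RMAN-03009: failure of backup command on ORA_SBT_TAPE_4 channel at 01/05/2017 00:58:37
ORA-27192: skgfcls: sbtclose2
ORA-19511:
ORA-19502:
ORA-27030: skgfwrt: sbtwrite2
ORA-19511:
"""

print(error_codes(SAMPLE))
# ['RMAN-03009', 'ORA-27192', 'ORA-19511', 'ORA-19502', 'ORA-27030']
print(failed_channels(SAMPLE))
# ['ORA_SBT_TAPE_4']
```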

 

Marianne
Level 6
Partner    VIP    Accredited Certified

You have not really answered any of Nicolai's last set of questions.

Please ensure that all of these log folders exist for further troubleshooting:

On master: bprd (restart NBU after creating the folder)
On media server: bpbrm and bptm
On client: dbclient (with 777 permissions) and rman output file specified in script.

After the next failure, please collect all the logs and rename them as follows:
bprd.txt
bpbrm.txt
bptm.txt
dbclient.txt
rman-out.txt

Please upload all these logs as file attachments.
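
The folder check on the Linux media server and client can be sketched in Python. This assumes the UNIX legacy log location /usr/openv/netbackup/logs/<process>; on the Windows master the logs live under the NetBackup install path instead. The function names are hypothetical.

```python
import stat
from pathlib import Path

# Assumed legacy log root on UNIX/Linux NetBackup hosts.
UNIX_LOGS_ROOT = Path("/usr/openv/netbackup/logs")


def missing_log_dirs(logs_root, names):
    """Return the log folder names that do not exist under logs_root."""
    return [name for name in names if not (Path(logs_root) / name).is_dir()]


def is_world_writable(path):
    """True if 'other' users may write to path (as with 777 permissions)."""
    return bool(Path(path).stat().st_mode & stat.S_IWOTH)


if __name__ == "__main__":
    # Run on the media server and on the client respectively.
    print("media server, missing:", missing_log_dirs(UNIX_LOGS_ROOT, ["bpbrm", "bptm"]))
    print("client, missing:", missing_log_dirs(UNIX_LOGS_ROOT, ["dbclient"]))
```

The world-writable check matters for dbclient because the backup runs as the Oracle user, which needs to be able to write its log there.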

If there is a firewall anywhere in the picture, please configure the TCP KeepAlive setting to the same value on the master, media server and client.

See these TNs:

http://www.veritas.com/docs/000083822

http://www.veritas.com/docs/000076435

 

Mr__Backup_Man
Level 3

Good morning Nicolai,
I am waiting for a maintenance window to change the firewall keepalive to 3600ms; it is currently set to 1800ms.
Thank you.

Good morning everyone,
Yesterday we changed the firewall keepalive to 3600, and so far the backup has run without errors. I will wait another 3 days to check whether any problem related to the change appears.
Thank you all.