Re: Backup Oracle Error Code (13) - File Read Failed

Mr__Backup_Man · ‎01-02-2017

Hello all,

I need your help with NBU 7.6.0.4: an Oracle RMAN backup is failing with error code 13.

NBU Version: 7.6.0.4

NBU Master Server: Microsoft Windows Server 2008 R2 Enterprise

NBU Media Server: SUSE Linux Enterprise Server 11

NBU Client: Red Hat Enterprise Linux Server 6.3

Oracle Version: 11g Release 2

28/12/2016 10:14:34 - Info bpbrm(pid=47977) bonito002 is the host to backup data from
28/12/2016 10:14:34 - Info bpbrm(pid=47977) reading file list for client
28/12/2016 10:14:34 - Info bpbrm(pid=47977) listening for client connection
28/12/2016 10:14:35 - Info bpbrm(pid=47977) INF - Client read timeout = 36000
28/12/2016 10:14:36 - Info bpbrm(pid=47977) accepted connection from client
28/12/2016 10:14:36 - Info dbclient(pid=24904) Backup started
28/12/2016 10:14:36 - Info bpbrm(pid=47977) bptm pid: 48120
28/12/2016 10:14:36 - Info bptm(pid=48120) start
28/12/2016 10:14:36 - Info bptm(pid=48120) using 262144 data buffer size
28/12/2016 10:14:36 - Info bptm(pid=48120) using 30 data buffers
28/12/2016 10:14:41 - Info bptm(pid=48120) start backup
28/12/2016 10:14:44 - Info bptm(pid=48120) backup child process is pid 48413
28/12/2016 10:15:21 - Info nbjm(pid=36668) starting backup job (jobid=632185) for client bonito002, policy ORACLE_Bonitocl_PRATA, schedule Default-Application-Backup
28/12/2016 10:15:21 - Info nbjm(pid=36668) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=632185, request id:{8E159BC5-2DC9-41B7-AC9B-47601AB68DE4})
28/12/2016 10:15:21 - requesting resource araguaia02.stu.dedup
28/12/2016 10:15:21 - requesting resource araguaia.NBU_CLIENT.MAXJOBS.bonito002
28/12/2016 10:15:21 - requesting resource araguaia.NBU_POLICY.MAXJOBS.ORACLE_Bonitocl_PRATA
28/12/2016 10:15:21 - granted resource araguaia.NBU_CLIENT.MAXJOBS.bonito002
28/12/2016 10:15:21 - granted resource araguaia.NBU_POLICY.MAXJOBS.ORACLE_Bonitocl_PRATA
28/12/2016 10:15:21 - granted resource MediaID=@aaaa4;DiskVolume=PureDiskVolume;DiskPool=DiskPool02;Path=PureDiskVolume;StorageServer=araguaia02;MediaServer=araguaia02
28/12/2016 10:15:21 - granted resource araguaia02.stu.dedup
28/12/2016 10:15:21 - estimated 0 Kbytes needed
28/12/2016 10:15:21 - Info nbjm(pid=36668) started backup (backupid=bonito002_1482927321) job for client bonito002, policy ORACLE_Bonitocl_PRATA, schedule Default-Application-Backup on storage unit araguaia02.stu.dedup
28/12/2016 10:15:23 - started process bpbrm (47977)
28/12/2016 10:15:24 - connecting
28/12/2016 10:15:25 - connected; connect time: 0:00:01
28/12/2016 10:15:33 - Info dbclient(pid=24904) dbclient(pid=24904) wrote first buffer(size=262144)
28/12/2016 10:15:34 - begin writing
28/12/2016 12:12:29 - Info bptm(pid=48120) waited for full buffer 894 times, delayed 463136 times
28/12/2016 12:12:30 - Info bptm(pid=48120) EXITING with status 0 <----------
28/12/2016 12:12:30 - Info araguaia02(pid=48120) StorageServer=PureDisk:araguaia02; Report=PDDO Stats (multi-threaded stream used) for (araguaia02): scanned: 1071650 KB, CR sent: 600996 KB, CR sent over FC: 0 KB, dedup: 43.9%, cache hits: 1 (0.0%)
28/12/2016 12:33:38 - Error bpbrm(pid=47977) socket read failed: errno = 104 - Connection reset by peer
28/12/2016 12:33:38 - Info dbclient(pid=24904) done. status: 13: file read failed
28/12/2016 12:34:28 - end writing; write time: 2:18:54
file read failed(13)

Michael_G_Ander · ‎01-02-2017

This line

Error bpbrm(pid=47977) socket read failed: errno = 104 - Connection reset by peer

makes me suspect a timeout introduced by anti-virus software or a firewall. The latter can often be mitigated by decreasing the TCP keepalive time.

The standard questions. Have you checked: 1) what has changed, 2) the manual, 3) whether there are any tech notes or VOX posts regarding the issue?
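One way to sanity-check the keepalive suggestion on a Linux media server or client is to compare the kernel's keepalive idle time with the firewall's idle-session timeout. The sketch below assumes the standard Linux tunable /proc/sys/net/ipv4/tcp_keepalive_time (in seconds, 7200 by default); the function names are made up for illustration, and the firewall timeout has to come from your own firewall configuration.

```python
from pathlib import Path

# Standard Linux location of the kernel TCP keepalive tunables.
PROC_IPV4 = Path("/proc/sys/net/ipv4")


def read_keepalive_time(proc_dir: Path = PROC_IPV4) -> int:
    """Return tcp_keepalive_time in seconds (idle time before the first probe)."""
    return int((proc_dir / "tcp_keepalive_time").read_text().strip())


def keepalive_beats_firewall(keepalive_time_s: int, firewall_idle_timeout_s: int) -> bool:
    """True if the first keepalive probe is sent before the firewall would drop an idle session."""
    return keepalive_time_s < firewall_idle_timeout_s
```

For example, keepalive_beats_firewall(read_keepalive_time(), 1800) returns False on a host that still uses the 7200-second default against a hypothetical 1800-second firewall timeout, which would match an idle connection being reset.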

Nicolai · ‎01-02-2017

Set CLIENT_READ_TIMEOUT = 1800 on the master, the media server and the client.

This is a well-known behaviour with Oracle backups; you may even need to set it to 3600.
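To confirm what is currently configured on the Linux media server and client, the value can be read from bp.conf, which on UNIX/Linux hosts normally lives at /usr/openv/netbackup/bp.conf (on a Windows master the setting is changed through Host Properties instead). This is only a rough sketch: the helper name and the sample host names are hypothetical.

```python
def client_read_timeout(bp_conf_text: str):
    """Return CLIENT_READ_TIMEOUT from bp.conf-style text, or None if it is not set."""
    for line in bp_conf_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "CLIENT_READ_TIMEOUT":
            return int(value.strip())
    return None


# Illustrative bp.conf content with placeholder host names.
sample = """SERVER = master.example.com
CLIENT_NAME = client.example.com
CLIENT_READ_TIMEOUT = 1800
"""
print(client_read_timeout(sample))  # 1800
```

A result of None means the entry is missing and the host falls back to its default.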

Mr__Backup_Man · ‎01-03-2017

Hi Nicolai.

Thanks for your reply.

I have attached the screenshots.

Mr__Backup_Man · ‎01-03-2017

Hi Michael.

Thanks for your reply.

I have attached a screenshot of the firewall keepalive settings.

Nicolai · ‎01-04-2017

Do you still see the issue after setting CLIENT_READ_TIMEOUT?

Mr__Backup_Man · ‎01-04-2017

Nicolai,

Yes, the error still continues.

Nicolai · ‎01-05-2017

Are there still about 20 minutes between these two lines?

28/12/2016 12:12:30 - Info araguaia02(pid=48120) StorageServer=PureDisk:araguaia02; Report=PDDO Stats (multi-threaded stream used) for (araguaia02): scanned: 1071650 KB, CR sent: 600996 KB, CR sent over FC: 0 KB, dedup: 43.9%, cache hits: 1 (0.0%)
28/12/2016 12:33:38 - Error bpbrm(pid=47977) socket read failed: errno = 104 - Connection reset by peer

Firewalls?

Endpoint protection applications?

What does the RMAN output say? Is it the same stage that fails every time, e.g. the backup of the archive log files?
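To measure the gap between two job-detail lines after each failure, rather than eyeballing it, a small sketch like this works. It assumes the day/month/year timestamp layout shown in the job details in this thread; the function name is illustrative.

```python
from datetime import datetime

# Timestamp layout used in the job details: day/month/year hour:minute:second.
STAMP_FORMAT = "%d/%m/%Y %H:%M:%S"


def gap_seconds(earlier_line: str, later_line: str) -> int:
    """Seconds between the leading timestamps of two job-detail lines."""
    def stamp(line: str) -> datetime:
        # The timestamp is the first 19 characters, e.g. "28/12/2016 12:12:30".
        return datetime.strptime(line[:19], STAMP_FORMAT)
    return int((stamp(later_line) - stamp(earlier_line)).total_seconds())


last_activity = "28/12/2016 12:12:30 - Info bptm(pid=48120) EXITING with status 0"
reset = "28/12/2016 12:33:38 - Error bpbrm(pid=47977) socket read failed: errno = 104 - Connection reset by peer"
print(gap_seconds(last_activity, reset))  # 1268 seconds, about 21 minutes
```

If the gap stays roughly constant from one failure to the next, that points towards a fixed idle timeout somewhere on the network path.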

Mr__Backup_Man · ‎01-05-2017

Nicolai, this is the error that appears on the database side.

channel ORA_SBT_TAPE_2: backup set complete, elapsed time: 01:26:44

RMAN-03009: failure of backup command on ORA_SBT_TAPE_4 channel at 01/05/2017 00:58:37

ORA-27192: skgfcls: sbtclose2 returned error - failed to close file

ORA-19511: Error received from media manager layer, error text:

Failed to process backup file <0qrp7bg7_1_1>

ORA-19502: write error on file "0qrp7bg7_1_1", block number 26017 (block size=8192)

ORA-27030: skgfwrt: sbtwrite2 returned error

ORA-19511: Error received from media manager layer, error text:

VxBSASendData: Failed with error:

channel ORA_SBT_TAPE_4 disabled, job failed on it will be run on another channel
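To compare failures across runs and see whether the same stage fails every time, it can help to reduce each RMAN output file to its list of message codes. The following is a rough sketch only; the function name is hypothetical and the sample text is shortened from the output above.

```python
import re

# Matches Oracle/RMAN message codes such as ORA-19502 or RMAN-03009.
CODE_PATTERN = re.compile(r"\b(?:ORA|RMAN)-\d{5}\b")


def error_codes(rman_output: str):
    """Return the ORA-/RMAN- codes in order of first appearance, without duplicates."""
    seen = []
    for code in CODE_PATTERN.findall(rman_output):
        if code not in seen:
            seen.append(code)
    return seen


# Message text shortened; only the codes matter here.
sample = """RMAN-03009: failure of backup command on ORA_SBT_TAPE_4 channel at 01/05/2017 00:58:37
ORA-27192: skgfcls: sbtclose2
ORA-19511:
ORA-19502:
ORA-27030: skgfwrt: sbtwrite2
ORA-19511:
"""
print(error_codes(sample))
# ['RMAN-03009', 'ORA-27192', 'ORA-19511', 'ORA-19502', 'ORA-27030']
```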

Marianne · ‎01-06-2017

You have not really answered any of Nicolai's last set of questions.

Please ensure that all of these log folders exist for further troubleshooting:

On master: bprd (restart NBU after creating the folder)
On media server: bpbrm and bptm
On client: dbclient (with 777 permissions) and rman output file specified in script.

After the next failure, please collect all the logs and rename them as follows:
bprd.txt
bpbrm.txt
bptm.txt
dbclient.txt
rman-out.txt

Please upload all these logs as file attachments.
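On the Linux media server and client, the log folders listed above can be created with a few lines of Python. This sketch assumes the default UNIX legacy log location /usr/openv/netbackup/logs; the helper name is made up, and on the Windows master the bprd folder goes under the NetBackup logs directory of the install path instead.

```python
import os
from pathlib import Path


def ensure_log_dirs(base: Path, names, mode: int = 0o755):
    """Create one legacy log folder per process name under base and return the paths."""
    created = []
    for name in names:
        path = base / name
        path.mkdir(parents=True, exist_ok=True)
        # chmod explicitly because the mode passed to mkdir is filtered by the umask.
        os.chmod(path, mode)
        created.append(path)
    return created


# Media server (assumed default log location):
#   ensure_log_dirs(Path("/usr/openv/netbackup/logs"), ["bpbrm", "bptm"])
# Client, with 777 on dbclient as requested above:
#   ensure_log_dirs(Path("/usr/openv/netbackup/logs"), ["dbclient"], mode=0o777)
```

Run it as root; the calls are left as comments so that nothing is created by accident.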

If there is a firewall anywhere in the picture, please configure the TCP KeepAlive setting to the same value on the master, the media server and the client.

See these TNs:

http://www.veritas.com/docs/000083822

http://www.veritas.com/docs/000076435

Mr__Backup_Man · ‎01-09-2017

Good morning Nicolai.
I am waiting for a maintenance window to change the firewall keepalive to 3600ms, because it is currently set to 1800ms.
Thank you.

Mr__Backup_Man · ‎01-11-2017

Good morning everyone.
Yesterday we changed the firewall keepalive to 3600, and so far the backup has not shown any errors. I will wait another 3 days to check whether any problem related to the change shows up.
Thank you all.
