01-02-2017 05:06 AM
Hello all,
I need your help with an Oracle RMAN backup on NBU 7.6.0.4 that fails with status code 13.
NBU Version: 7.6.0.4
NBU Master Server: Microsoft Windows Server 2008 R2 Enterprise
NBU Media Server: SUSE Linux Enterprise Server 11
NBU Client: Red Hat Enterprise Linux Server 6.3
Oracle Version: 11g Release 2
28/12/2016 10:14:34 - Info bpbrm(pid=47977) bonito002 is the host to backup data from
28/12/2016 10:14:34 - Info bpbrm(pid=47977) reading file list for client
28/12/2016 10:14:34 - Info bpbrm(pid=47977) listening for client connection
28/12/2016 10:14:35 - Info bpbrm(pid=47977) INF - Client read timeout = 36000
28/12/2016 10:14:36 - Info bpbrm(pid=47977) accepted connection from client
28/12/2016 10:14:36 - Info dbclient(pid=24904) Backup started
28/12/2016 10:14:36 - Info bpbrm(pid=47977) bptm pid: 48120
28/12/2016 10:14:36 - Info bptm(pid=48120) start
28/12/2016 10:14:36 - Info bptm(pid=48120) using 262144 data buffer size
28/12/2016 10:14:36 - Info bptm(pid=48120) using 30 data buffers
28/12/2016 10:14:41 - Info bptm(pid=48120) start backup
28/12/2016 10:14:44 - Info bptm(pid=48120) backup child process is pid 48413
28/12/2016 10:15:21 - Info nbjm(pid=36668) starting backup job (jobid=632185) for client bonito002, policy ORACLE_Bonitocl_PRATA, schedule Default-Application-Backup
28/12/2016 10:15:21 - Info nbjm(pid=36668) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=632185, request id:{8E159BC5-2DC9-41B7-AC9B-47601AB68DE4})
28/12/2016 10:15:21 - requesting resource araguaia02.stu.dedup
28/12/2016 10:15:21 - requesting resource araguaia.NBU_CLIENT.MAXJOBS.bonito002
28/12/2016 10:15:21 - requesting resource araguaia.NBU_POLICY.MAXJOBS.ORACLE_Bonitocl_PRATA
28/12/2016 10:15:21 - granted resource araguaia.NBU_CLIENT.MAXJOBS.bonito002
28/12/2016 10:15:21 - granted resource araguaia.NBU_POLICY.MAXJOBS.ORACLE_Bonitocl_PRATA
28/12/2016 10:15:21 - granted resource MediaID=@aaaa4;DiskVolume=PureDiskVolume;DiskPool=DiskPool02;Path=PureDiskVolume;StorageServer=araguaia02;MediaServer=araguaia02
28/12/2016 10:15:21 - granted resource araguaia02.stu.dedup
28/12/2016 10:15:21 - estimated 0 Kbytes needed
28/12/2016 10:15:21 - Info nbjm(pid=36668) started backup (backupid=bonito002_1482927321) job for client bonito002, policy ORACLE_Bonitocl_PRATA, schedule Default-Application-Backup on storage unit araguaia02.stu.dedup
28/12/2016 10:15:23 - started process bpbrm (47977)
28/12/2016 10:15:24 - connecting
28/12/2016 10:15:25 - connected; connect time: 0:00:01
28/12/2016 10:15:33 - Info dbclient(pid=24904) dbclient(pid=24904) wrote first buffer(size=262144)
28/12/2016 10:15:34 - begin writing
28/12/2016 12:12:29 - Info bptm(pid=48120) waited for full buffer 894 times, delayed 463136 times
28/12/2016 12:12:30 - Info bptm(pid=48120) EXITING with status 0 <----------
28/12/2016 12:12:30 - Info araguaia02(pid=48120) StorageServer=PureDisk:araguaia02; Report=PDDO Stats (multi-threaded stream used) for (araguaia02): scanned: 1071650 KB, CR sent: 600996 KB, CR sent over FC: 0 KB, dedup: 43.9%, cache hits: 1 (0.0%)
28/12/2016 12:33:38 - Error bpbrm(pid=47977) socket read failed: errno = 104 - Connection reset by peer
28/12/2016 12:33:38 - Info dbclient(pid=24904) done. status: 13: file read failed
28/12/2016 12:34:28 - end writing; write time: 2:18:54
file read failed(13)
01-02-2017 05:49 AM
This line:
Error bpbrm(pid=47977) socket read failed: errno = 104 - Connection reset by peer
makes me suspect a timeout introduced by anti-virus software or a firewall. The latter can often be mitigated by decreasing the TCP keepalive interval.
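Why a lower keepalive helps: a stateful firewall drops a connection that has been idle longer than its idle timeout, and TCP keepalive probes only prevent that if the first probe is sent before the timeout expires. Below is a minimal Python sketch for a Linux host (the media server or client here), assuming you know the firewall's idle timeout; the 1800 s value and the function names are hypothetical, and probes are only sent on sockets that enable SO_KEEPALIVE.

```python
from pathlib import Path
from typing import Optional

# On Linux: idle seconds before the kernel sends the first keepalive probe
# (the kernel default is 7200, i.e. 2 hours).
KEEPALIVE_TIME = Path("/proc/sys/net/ipv4/tcp_keepalive_time")


def read_keepalive_time(path: Path = KEEPALIVE_TIME) -> Optional[int]:
    """Return tcp_keepalive_time in seconds, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def keepalive_beats_idle_timeout(keepalive_s: int, idle_timeout_s: int) -> bool:
    """True if the first probe goes out before the firewall's idle timer expires."""
    return keepalive_s < idle_timeout_s


if __name__ == "__main__":
    # Hypothetical firewall idle timeout; substitute the value configured on the firewall.
    firewall_idle_timeout_s = 1800
    current = read_keepalive_time()
    if current is None:
        print("tcp_keepalive_time not available on this host")
    else:
        ok = keepalive_beats_idle_timeout(current, firewall_idle_timeout_s)
        print(f"tcp_keepalive_time={current}s, probe sent before idle timeout: {ok}")
```

With the Linux default of 7200 s and a 1800 s firewall timeout the check fails; the kernel value can be lowered with `sysctl -w net.ipv4.tcp_keepalive_time=<seconds>`.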
01-02-2017 06:56 AM
Set CLIENT_READ_TIMEOUT = 1800 on the master, the media server and the client.
This is well-known behaviour with Oracle backups; you may even need to set it to 3600.
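On the UNIX/Linux hosts in this setup (media server and client), CLIENT_READ_TIMEOUT is an entry in /usr/openv/netbackup/bp.conf; on the Windows master it is set through Host Properties (Timeouts) instead. A minimal Python sketch that sets the entry in bp.conf-style text, assuming one `KEY = VALUE` entry per line; the function name is hypothetical and it does not read or write the real file.

```python
import re


def set_bp_conf_entry(text: str, key: str, value: str) -> str:
    """Set 'KEY = VALUE' in bp.conf-style text, replacing the first existing entry or appending one."""
    pattern = re.compile(rf"^[ \t]*{re.escape(key)}[ \t]*=.*$", re.MULTILINE)
    new_line = f"{key} = {value}"
    if pattern.search(text):
        # Use a function so the replacement text is inserted literally.
        return pattern.sub(lambda _m: new_line, text, count=1)
    if text and not text.endswith("\n"):
        text += "\n"
    return text + new_line + "\n"


current = "SERVER = araguaia\nCLIENT_READ_TIMEOUT = 300\n"
print(set_bp_conf_entry(current, "CLIENT_READ_TIMEOUT", "1800"), end="")
# SERVER = araguaia
# CLIENT_READ_TIMEOUT = 1800
```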
01-03-2017 07:15 AM
Hi Nicolai.
Thanks for your reply.
Attached are the screenshots.
01-03-2017 09:15 AM
Hi Michael.
Thanks for your reply.
Attached is a screenshot of the firewall keepalive setting.
01-04-2017 12:56 AM
Do you still see the issue after setting CLIENT_READ_TIMEOUT?
01-04-2017 09:18 AM
Nicolai,
Yes, the error still continues.
01-05-2017 03:07 AM - edited 01-05-2017 03:08 AM
Is there still a gap of about 20 minutes between these two lines?
28/12/2016 12:12:30 - Info araguaia02(pid=48120) StorageServer=PureDisk:araguaia02; Report=PDDO Stats (multi-threaded stream used) for (araguaia02): scanned: 1071650 KB, CR sent: 600996 KB, CR sent over FC: 0 KB, dedup: 43.9%, cache hits: 1 (0.0%)
28/12/2016 12:33:38 - Error bpbrm(pid=47977) socket read failed: errno = 104 - Connection reset by peer
Firewalls?
Endpoint protection applications?
What does the RMAN output say? Does the same stage fail every time, e.g. the backup of the archive log files?
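To measure the gap asked about here, a minimal Python sketch that computes the time between two job-log lines, assuming the `dd/mm/yyyy HH:MM:SS` prefix shown in the log above; the function names are hypothetical.

```python
from datetime import datetime

# Timestamp prefix used in the NetBackup job details above, e.g. "28/12/2016 12:12:30"
TS_FORMAT = "%d/%m/%Y %H:%M:%S"


def line_time(line: str) -> datetime:
    """Parse the leading 'dd/mm/yyyy HH:MM:SS' timestamp (19 characters) of a job-log line."""
    return datetime.strptime(line[:19], TS_FORMAT)


def gap_seconds(first: str, second: str) -> int:
    """Seconds elapsed between two job-log lines (second minus first)."""
    return int((line_time(second) - line_time(first)).total_seconds())


last_bptm = "28/12/2016 12:12:30 - Info araguaia02(pid=48120) StorageServer=PureDisk:araguaia02; ..."
reset = "28/12/2016 12:33:38 - Error bpbrm(pid=47977) socket read failed: errno = 104 - Connection reset by peer"

gap = gap_seconds(last_bptm, reset)
print(f"{gap} s ({gap // 60} min {gap % 60} s)")  # 1268 s (21 min 8 s)
```

The same job log reports `Client read timeout = 36000` (seconds), so an idle gap of about 21 minutes is well below it, which is consistent with the firewall suspicion raised earlier in the thread.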
01-05-2017 07:13 AM
Nicolai, this is the error reported for the database:
channel ORA_SBT_TAPE_2: backup set complete, elapsed time: 01:26:44
RMAN-03009: failure of backup command on ORA_SBT_TAPE_4 channel at 01/05/2017 00:58:37
ORA-27192: skgfcls: sbtclose2 returned error - failed to close file
ORA-19511: Error received from media manager layer, error text:
Failed to process backup file <0qrp7bg7_1_1>
ORA-19502: write error on file "0qrp7bg7_1_1", block number 26017 (block size=8192)
ORA-27030: skgfwrt: sbtwrite2 returned error
ORA-19511: Error received from media manager layer, error text:
VxBSASendData: Failed with error:
channel ORA_SBT_TAPE_4 disabled, job failed on it will be run on another channel
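To check whether the same stage fails every time, a minimal Python sketch that pulls the error codes and the failed channel out of an RMAN output file so that runs can be compared; the regex patterns are assumptions based on the message formats above, and the function name is hypothetical.

```python
import re
from typing import List, Tuple

# Oracle/RMAN message codes look like "RMAN-03009" or "ORA-19511".
ERROR_CODE = re.compile(r"\b(?:RMAN|ORA)-\d{5}\b")
# Matches lines such as "failure of backup command on ORA_SBT_TAPE_4 channel at ..."
FAILED_CHANNEL = re.compile(r"failure of backup command on (\S+) channel")


def summarize_rman(output: str) -> Tuple[List[str], List[str]]:
    """Return (unique error codes in first-seen order, channels whose backup command failed)."""
    codes = list(dict.fromkeys(ERROR_CODE.findall(output)))
    channels = FAILED_CHANNEL.findall(output)
    return codes, channels


sample = """\
RMAN-03009: failure of backup command on ORA_SBT_TAPE_4 channel at 01/05/2017 00:58:37
ORA-27192: skgfcls: sbtclose2 returned error - failed to close file
ORA-19511: Error received from media manager layer, error text:
ORA-19502: write error on file "0qrp7bg7_1_1", block number 26017 (block size=8192)
ORA-27030: skgfwrt: sbtwrite2 returned error
ORA-19511: Error received from media manager layer, error text:
"""

codes, channels = summarize_rman(sample)
print(codes)     # ['RMAN-03009', 'ORA-27192', 'ORA-19511', 'ORA-19502', 'ORA-27030']
print(channels)  # ['ORA_SBT_TAPE_4']
```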
01-06-2017 04:29 AM
You have not really answered any of Nicolai's last set of questions.
Please ensure that all of these log folders exist for further troubleshooting:
On the master server: bprd (restart NetBackup after creating the folder)
On the media server: bpbrm and bptm
On the client: dbclient (with 777 permissions), plus the RMAN output file specified in the backup script.
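The folder setup above can be sketched in Python, assuming the standard UNIX/Linux legacy log root /usr/openv/netbackup/logs; the function name is hypothetical. dbclient is made world-writable because it is written by the Oracle software owner rather than root.

```python
import os
from pathlib import Path
from typing import Iterable, List

# Assumed default legacy-log root on UNIX/Linux NetBackup hosts.
UNIX_LOG_ROOT = Path("/usr/openv/netbackup/logs")


def ensure_log_dirs(root: Path, names: Iterable[str],
                    world_writable: Iterable[str] = ()) -> List[str]:
    """Create missing log folders under root; return the names that were newly created."""
    writable = set(world_writable)
    created = []
    for name in names:
        folder = root / name
        if not folder.is_dir():
            folder.mkdir(parents=True)
            created.append(name)
        if name in writable:
            # Explicit chmod: mkdir's mode argument is filtered by the umask.
            os.chmod(folder, 0o777)
    return created
```

On the media server this would be `ensure_log_dirs(UNIX_LOG_ROOT, ["bpbrm", "bptm"])`, and on the client `ensure_log_dirs(UNIX_LOG_ROOT, ["dbclient"], world_writable=["dbclient"])`. On the Windows master, pass the logs folder under the NetBackup install path with `["bprd"]`.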
After the next failure, please collect all the logs and rename them as follows:
bprd.txt
bpbrm.txt
bptm.txt
dbclient.txt
rman-out.txt
Please upload all these logs as file attachments.
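Collecting and renaming the logs can be sketched in Python as below. Legacy log file names vary by platform and logging mode, so this sketch simply takes the most recently modified file in each folder (an assumption). The function names and all paths are hypothetical placeholders.

```python
import shutil
from pathlib import Path
from typing import Dict, Optional


def newest_file(folder: Path) -> Optional[Path]:
    """Most recently modified regular file in folder, or None if it has none."""
    files = [p for p in folder.iterdir() if p.is_file()]
    return max(files, key=lambda p: p.stat().st_mtime, default=None)


def collect_logs(sources: Dict[str, Path], dest: Path) -> Dict[str, Path]:
    """Copy each source (a log folder or a single file) to dest under its .txt name."""
    dest.mkdir(parents=True, exist_ok=True)
    copied = {}
    for txt_name, src in sources.items():
        if src.is_dir():
            src = newest_file(src)  # pick the latest log in the folder
        if src is None or not src.exists():
            continue  # folder empty or missing on this host
        target = dest / txt_name
        shutil.copy2(src, target)
        copied[txt_name] = target
    return copied
```

For example, on the client: `collect_logs({"dbclient.txt": Path("/usr/openv/netbackup/logs/dbclient"), "rman-out.txt": Path("<rman output file>")}, Path("/tmp/nbu-logs"))`.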
If there is a firewall anywhere in the path, please configure the TCP keepalive setting with the same value on the master, the media server and the client.
See these TNs:
http://www.veritas.com/docs/000083822
http://www.veritas.com/docs/000076435
01-09-2017 03:01 AM
Good morning Nicolai.
I am waiting for a maintenance window to change the firewall keepalive to 3600 ms; it is currently set to 1800 ms.
Thank you.
01-11-2017 02:07 AM
Good morning, everyone.
Yesterday we changed the firewall keepalive to 3600, and so far the backup has not failed. I will wait another 3 days to check whether any problem related to the change appears.
Thank you all.