Solved: Backup failing with error code media write error (84) for Oracle backups

dugga · ‎02-20-2017

Sometimes a few Oracle backup jobs fail with the following error for a few clients.

Can someone point me to the article I should follow to fix these errors?

http://www.veritas.com/docs/000009598

I'm unable to troubleshoot this myself. Kindly help.

02/19/2017 00:46:47 - Info bptm (pid=5912) start backup
02/19/2017 00:46:49 - Info bptm (pid=5912) backup child process is pid 5964
02/19/2017 00:46:49 - begin writing
02/19/2017 00:46:50 - Info dbclient (pid=36897928) dbclient(pid=36897928) wrote first buffer(size=262144)
02/19/2017 10:23:52 - Critical bptm (pid=5912) Storage Server Error: (Storage server: PureDisk:xxxxx) mtstrm_write_segment: Fatal error occured in Multi-Threaded Agent: Timed out after waiting 1200s to send data to mtstrmd on stream /xxxxxx/Policy_xxxx/xxx_xxyyyzzz_AB_F20.img V-454-95
02/19/2017 10:23:52 - Critical bptm (pid=5912) image write failed: error 2060019: error occurred on network socket
02/19/2017 10:59:17 - Error bptm (pid=5912) cannot write image to disk, Invalid argument
02/19/2017 10:59:17 - Info bptm (pid=5912) EXITING with status 84 <----------
02/19/2017 10:59:17 - Info dbclient (pid=36897928) done. status: 6
02/19/2017 10:59:17 - Info s60013 (pid=5912) StorageServer=PureDisk:s60013; Report=PDDO Stats (multi-threaded stream used) for (s60013): scanned: 454472727 KB, CR sent: 273648201 KB, CR sent over FC: 0 KB, dedup: 39.8%, cache hits: 0 (0.0%)
02/19/2017 10:59:19 - Info dbclient (pid=36897928) done. status: 84: media write error
02/19/2017 10:59:19 - end writing; write time: 10:12:30
media write error (84)
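
The timing in the job details above can be worked out with a short script. This is only a sketch: the function names are hypothetical, the sample holds three lines from the log above with the long storage server message abbreviated, and it assumes every entry starts with an `MM/DD/YYYY HH:MM:SS - ` timestamp as shown in this post.

```python
import re
from datetime import datetime

# Assumed line layout, taken from the job details in this post:
# "MM/DD/YYYY HH:MM:SS - <Severity> <process> (pid=N) <message>"
LINE_RE = re.compile(r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) - (.*)$")
TIMEOUT_RE = re.compile(r"Timed out after waiting (\d+)s")
SEVERITIES = ("Info", "Warning", "Error", "Critical")

# Three lines from the log above; the Critical message is abbreviated here.
SAMPLE = """\
02/19/2017 00:46:50 - Info dbclient (pid=36897928) dbclient(pid=36897928) wrote first buffer(size=262144)
02/19/2017 10:23:52 - Critical bptm (pid=5912) Storage Server Error: Timed out after waiting 1200s to send data to mtstrmd
02/19/2017 10:59:17 - Info bptm (pid=5912) EXITING with status 84 <----------
"""


def parse_events(text):
    """Return a list of (timestamp, severity, message) tuples."""
    events = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines without a timestamp, e.g. the final summary
        when = datetime.strptime(match.group(1), "%m/%d/%Y %H:%M:%S")
        rest = match.group(2)
        first_word = rest.split(" ", 1)[0]
        severity = first_word if first_word in SEVERITIES else ""
        events.append((when, severity, rest))
    return events


def time_to_first_critical(events):
    """Elapsed time between the first buffer written and the first Critical entry."""
    start = next((t for t, _, msg in events if "wrote first buffer" in msg), None)
    critical = next((t for t, sev, _ in events if sev == "Critical"), None)
    if start is None or critical is None:
        return None
    return critical - start


def agent_timeout_seconds(events):
    """Extract the wait time reported in the 'Timed out after waiting Ns' message."""
    for _, _, msg in events:
        match = TIMEOUT_RE.search(msg)
        if match:
            return int(match.group(1))
    return None


if __name__ == "__main__":
    events = parse_events(SAMPLE)
    print("Time until first Critical entry:", time_to_first_critical(events))
    print("Reported wait before timeout (s):", agent_timeout_seconds(events))
```

For these lines the script reports 9:37:02 between the first buffer and the first Critical entry, and a reported wait of 1200 seconds. In other words, the job had been writing for more than nine and a half hours before the timeout was logged.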

Marianne · ‎02-20-2017

Please always mention the NBU version and OS on the NBU master and media server.
This is extremely relevant, as various improvements have been introduced across different NBU versions.

Is PureDisk used here as NBU MSDP or as a separate PureDisk server?

Are you using media server dedupe or client-side dedupe?

What is the Client Read Timeout on the media server?

Do you have bptm, bpdm and bpbrm log folders on the media server?
And is the logging level set to 3 (minimum)?

Are you experiencing problems only with long-running backups, while smaller backups complete successfully?

Have you checked the KeepAlive settings on the master, media server and Oracle client?

dugga · ‎02-20-2017

Master Server : NetBackup-RedHat 2.6.18 7.6.0.3

Netbackup Appliance 5220 (Media Server) : NetBackup-SuSE2.6.16 7.6.0.3

Appliance Version is 2.6.0.3.

We are using a Media Server Deduplication Pool.

The client connect timeout is set to 9000 seconds.

The client read timeout is set to 7200 seconds.

The logging level is set to minimum.

This is mostly seen during full backups, which are taken on weekends.

Where and how can I check the KeepAlive settings?

Also, in these cases, how can I validate the data that has been backed up? From what I can see, the Oracle jobs run in batches triggered by the RMAN script, and the subsequent jobs run successfully. How are they being run in batches: is this defined on the NetBackup side or in RMAN?

I have collected the logs only for the 19th and have attached them.

Thanks in advance.

Marianne · ‎02-21-2017

Seems you forgot to attach the logs. When you do, please copy them to .txt files (e.g. bptm.txt) and then upload. To check KeepAlive settings, have a look at the last 2 tables in this TN: http://www.veritas.com/docs/000005752
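
As a rough aid for the KeepAlive check, the current kernel values on a Linux host can be read directly. This sketch assumes a Linux system (the master and media server in this thread are RedHat and SuSE; the Oracle client's OS is not stated) and reads the standard `/proc/sys/net/ipv4/tcp_keepalive_*` files. The function names are hypothetical, and the values recommended in the TN are not reproduced here, so compare the output against the tables in that document.

```python
from pathlib import Path

# Standard Linux TCP keepalive tunables (values are in seconds, except probes).
KEEPALIVE_KEYS = (
    "tcp_keepalive_time",    # idle time before the first keepalive probe
    "tcp_keepalive_intvl",   # interval between unanswered probes
    "tcp_keepalive_probes",  # unanswered probes before the connection is dropped
)


def read_keepalive(base="/proc/sys/net/ipv4"):
    """Return the current keepalive settings; a value is None if it cannot be read."""
    values = {}
    for key in KEEPALIVE_KEYS:
        try:
            values[key] = int((Path(base) / key).read_text().strip())
        except (OSError, ValueError):
            values[key] = None
    return values


def dead_peer_detection_seconds(values):
    """Idle time plus all unanswered probes: how long a dead peer can go unnoticed."""
    if any(values.get(key) is None for key in KEEPALIVE_KEYS):
        return None
    return (
        values["tcp_keepalive_time"]
        + values["tcp_keepalive_intvl"] * values["tcp_keepalive_probes"]
    )


if __name__ == "__main__":
    settings = read_keepalive()
    for key in KEEPALIVE_KEYS:
        print(f"{key} = {settings[key]}")
    print("Seconds until a dead peer is detected:", dead_peer_detection_seconds(settings))
```

Run it on each host (master, media server and, if it is Linux, the Oracle client) and compare the three values with the TN.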

dugga · ‎02-28-2017

I thought those were already attached.

I will attach them.

Thanks for the links :)
