Oracle backup failure with status code : 84

dugga · ‎03-20-2017

In my environment, full backups (Oracle RMAN backups) sometimes fail with status 84.

Master server : 7.6.0.3
Media Server 5220 Appliance : 2.6.0.3

Host : AIX 7 (where the Oracle databases are running)

Does this article even apply to Oracle backups?

STATUS CODE: 84 "media write error." When a disk storage unit in a storage unit group is 100% full, ...

NetBackup detailed log:

Critical bptm (pid=2374) Storage Server Error: (Storage server: PureDisk:Media1) mtstrm_write_segment: Fatal error occured in Multi-Threaded Agent: Timed out after waiting 1200s to send data to mtstrmd on stream /ProductionDUO/Oracle_Production_IBM/ProductionDUO_1489275180_C1_F15.img V-454-95

Critical bptm (pid=2374) image write failed: error 2060019: error occurred on network socket

Error bptm (pid=2374) cannot write image to disk, Invalid argument

Info bptm (pid=2374) EXITING with status 84 <----------

Info dbclient (pid=57737564) done. status: 6

Info Media1 (pid=2374) StorageServer=PureDisk:Media1; Report=PDDO Stats (multi-threaded stream used) for (Media1): scanned: 370495507 KB, CR sent: 206379237 KB, CR sent over FC: 0 KB, dedup: 44.3%, cache hits: 0 (0.0%)

Info dbclient (pid=57737564) done. status: 84: media write error

end writing; write time: 7:55:28

media write error (84)
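
As a rough sanity check on the job details above, the PDDO statistics line and the reported write time can be turned into an average throughput. The Python sketch below uses only the two values copied from the log. It assumes that "KB" in the stats line means 1024 bytes and that the write time covers the whole transfer, including any time spent waiting on mtstrmd, so the result is a coarse average and not a diagnosis.

```python
import re

# Values copied verbatim from the job details above.
STATS_LINE = (
    "Report=PDDO Stats (multi-threaded stream used) for (Media1): "
    "scanned: 370495507 KB, CR sent: 206379237 KB, CR sent over FC: 0 KB, "
    "dedup: 44.3%, cache hits: 0 (0.0%)"
)
WRITE_TIME = "7:55:28"


def kb_field(label: str, line: str) -> int:
    """Return the integer KB value that follows 'label: ' in the stats line."""
    match = re.search(rf"{re.escape(label)}: (\d+) KB", line)
    if match is None:
        raise ValueError(f"field {label!r} not found")
    return int(match.group(1))


def to_seconds(hms: str) -> int:
    """Convert an H:MM:SS duration into seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


scanned_kb = kb_field("scanned", STATS_LINE)
sent_kb = kb_field("CR sent", STATS_LINE)
elapsed_s = to_seconds(WRITE_TIME)

# Assumption: 1 KB = 1024 bytes, so KB / 1024 gives MiB.
scanned_mib_per_s = scanned_kb / 1024 / elapsed_s
sent_mib_per_s = sent_kb / 1024 / elapsed_s
dedup_pct = (1 - sent_kb / scanned_kb) * 100

print(f"elapsed: {elapsed_s} s")
print(f"scanned: {scanned_mib_per_s:.1f} MiB/s, sent: {sent_mib_per_s:.1f} MiB/s")
print(f"dedup recomputed from the two KB fields: {dedup_pct:.1f}%")
```

Running the same calculation on a successful full backup of the same database would show whether the failing jobs are unusually slow or simply large.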

RMAN log:

RMAN-03009: failure of backup command on ch00 channel at 03/12/2017 08:28:21

ORA-27192: skgfcls: sbtclose2 returned error - failed to close file
ORA-19511: Error received from media manager layer, error text:
Failed to process backup file <u9rutepi_1_1>
ORA-19502: write error on file "u9rutepi_1_1", block number 48331457 (block size=8192)
ORA-27030: skgfwrt: sbtwrite2 returned error
ORA-19511: Error received from media manager layer, error text:
VxBSASendData: Failed with error:
channel ch00 disabled, job failed on it will be run on another channel
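
The block number in the ORA-19502 line can be converted into an approximate position within the backup piece, which makes it easier to compare one failure with another. The Python sketch below parses only the line shown in the RMAN log. It assumes that blocks are counted from the start of the backup piece, so block number times block size approximates the byte offset at which the write failed.

```python
import re

# Line copied verbatim from the RMAN log above.
ORA_LINE = (
    'ORA-19502: write error on file "u9rutepi_1_1", '
    "block number 48331457 (block size=8192)"
)

PATTERN = re.compile(
    r'write error on file "(?P<piece>[^"]+)", '
    r"block number (?P<block>\d+) \(block size=(?P<size>\d+)\)"
)


def failure_offset(line: str) -> tuple[str, int]:
    """Return (backup piece name, approximate byte offset of the failed write)."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not an ORA-19502 write error line")
    # Assumption: offset = block number * block size, counted from the piece start.
    offset = int(match.group("block")) * int(match.group("size"))
    return match.group("piece"), offset


piece, offset_bytes = failure_offset(ORA_LINE)
offset_gib = offset_bytes / 2**30
print(f"{piece}: write failed about {offset_gib:.1f} GiB into the backup piece")
```

Applying `failure_offset` to the ORA-19502 line from each failed job gives a list of offsets that can be compared directly.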

RiaanBadenhorst · ‎03-20-2017

Hi

Are all your backups failing with 84 or just this Oracle job? If it's only one, then it could be that you've enabled client-side dedupe and this is actually a comms error. You need to look further into the bptm log to see why the 84 occurs.

dugga · ‎03-20-2017

Not all of my jobs failed, only 3 out of 52. I don't know whether I should change any settings in NetBackup to avoid such errors.

I couldn't find anything in the bptm logs; moreover, the verbose setting was at minimum logging on that day.

I have now set it to 2.

Would I gain anything by changing the Max. concurrent jobs setting?

I'm also not sure how to validate this backup against the other jobs that ran at the same time. I don't know whether the failed jobs are retried by the subsequent jobs; how can we confirm this? As it is an RMAN backup, I'm unable to verify the data that was backed up.

Tape_Archived · ‎03-20-2017

Please check your timeout settings. One of the previous solutions may help: https://vox.veritas.com/t5/NetBackup/sts-close-handle-failed-2060019-error-occurred-on-network-socke...

Tousif · ‎03-22-2017

Hello,

Does the backup fail on the same block number every time?

ORA-19502: write error on file "u9rutepi_1_1", block number 48331457 (block size=8192)

This is the error that leads to the write error on disk.

Are backups of other clients to that media server working?

What happens if we try an incremental backup for the problematic client?

Are there any I/O errors in the system log on that media server?

Is there any difference in block size between the successfully written blocks and the problematic block?

Thanks & Regards,

dugga · ‎03-22-2017

Does the backup fail on the same block number every time? No, it fails on different blocks.

ORA-19502: write error on file "u9rutepi_1_1", block number 48331457 (block size=8192)

This is the error that leads to the write error on disk.

Are backups of other clients to that media server working? Yes, backups work well for other clients on the same media server, with no failures reported.

What happens if we try an incremental backup for the problematic client? Incremental backups are successful.

Are there any I/O errors in the system log on that media server? Our media server is just a NetBackup 5220 disk appliance.

Is there any difference in block size between the successfully written blocks and the problematic block? I'm not sure about this.

Tousif · ‎03-22-2017

Hello,

So it is the Oracle full backup that has the issue.

Are you using client-side deduplication? If yes, can we change it to media server deduplication and retry the backup?

How many streams are triggered at the same time for the backup?

How many streams are active at the same time? Can we reduce the number of streams and retry the backup?

Can you share the complete dbclient log and the detailed status of the failed job with us?

Before making the change and restarting the backup, enable the logs below for further investigation.

Media:

bptm

bpdm

bpbrm

Client:

dbclient

Script output

bphdb

Thanks & Regards

dugga · ‎03-24-2017

Hi, thanks for the response.

We are not using client-side deduplication.

Where can I find the number of streams allowed to run for a client?

I can see that at least 4 job streams are always running, and sometimes up to 8 streams are active. I got this from the Activity Monitor by checking the start times.

To enable the logs you mentioned, what should the logging level be? Currently I have set it to 2.
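
Counting overlapping streams by eye from start times is error-prone, so the same check can be done with a small sweep over start and end times. The Python sketch below is only an illustration: the job list is hypothetical sample data, not taken from this environment, and it assumes the start and end times have been copied by hand from the Activity Monitor in the format shown.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M"

# Hypothetical (start, end) pairs; replace with the real values from the job list.
SAMPLE_JOBS = [
    ("2017-03-12 00:30", "2017-03-12 08:28"),
    ("2017-03-12 00:30", "2017-03-12 03:10"),
    ("2017-03-12 01:00", "2017-03-12 04:00"),
    ("2017-03-12 03:10", "2017-03-12 05:00"),
    ("2017-03-12 03:30", "2017-03-12 06:00"),
]


def max_concurrent(jobs: list[tuple[str, str]]) -> int:
    """Return the highest number of jobs that were active at the same moment."""
    events = []
    for start, end in jobs:
        events.append((datetime.strptime(start, TIME_FORMAT), 1))   # job begins
        events.append((datetime.strptime(end, TIME_FORMAT), -1))    # job ends
    # Sorting puts -1 before +1 at equal timestamps, so a job that ends at the
    # same minute another one starts is not counted as overlapping.
    events.sort()
    active = peak = 0
    for _, delta in events:
        active += delta
        peak = max(peak, active)
    return peak


print("peak concurrent streams:", max_concurrent(SAMPLE_JOBS))
```

With the sample data the peak is 4 streams, reached between 03:30 and 04:00.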

Tousif · ‎03-24-2017

Hello

Basically, the streams are handled by RMAN in Oracle, but the number of streams that are active can be controlled by NBU.

You can select the "Limit jobs per policy" check box to define the number of active jobs.

To isolate the issue, we can try a backup to a different storage unit (a different disk or tape).

That will give us a clear idea of where to troubleshoot further (client side or media server side).

Example: if the backup completes on a different storage unit, it means we need to work on the media server side.

Verbose 5 on both servers is enough to troubleshoot :)

Thanks & Regards
