
Oracle Application Backups

Trf1-BRA
Level 4

Hi everyone,

We need help understanding how NetBackup Oracle policies work and how effective they are. The situation is as follows: we have many NetBackup policies that run RMAN scripts, with Full Backup and Application Backup schedules.

The question is: if a parent job fails because of a problem in one of its many children (application backups), do I have to discard all the other successful children and lose the whole backup? I ask because space and resource limitations make it hard to keep many images of these databases.

I hope this is clear.

Regards,
Marcio Oliveira.

9 REPLIES

Marianne
Moderator
Impossible to say without knowing what the error is for each failed stream.

You need to troubleshoot each failed stream.

List the backups afterwards in RMAN and with bplist. Let the DBA decide whether the successful jobs are any good.

Genericus
Moderator

I would check with the Oracle DBA, but my scripts call the RMAN catalog to run a backup, and if child jobs fail, the RMAN catalog tracks those and re-running the backup should just run those failed pieces.

There are lines in the script like:

backup
incremental level 0
skip inaccessible
tag EDL_ICCONTR
filesperset 3
# recommended format
format 'bk_%s_%p_%t'
database NOT BACKED UP SINCE TIME 'SYSDATE-.1';
debug off;

The line "database NOT BACKED UP SINCE TIME 'SYSDATE-.1';" can be adjusted, so it will not redo pieces backed up since SYSDATE-(VALUE) - you may need to change the value depending on how long your backup takes.
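
As a rough illustration of what that value does, here is a minimal Python sketch (not RMAN itself) of the selection rule as I understand it: a file is backed up again only if it has no backup newer than the cutoff. The datafile names and timestamps are made up, and `cutoff` and `needs_backup` are hypothetical helpers, not an Oracle or NetBackup API.

```python
from datetime import datetime, timedelta

def cutoff(now, days):
    """Return the point in time that SYSDATE-<days> refers to."""
    return now - timedelta(days=days)

def needs_backup(last_backup, now, days):
    """True if the file has no backup at or after the cutoff."""
    return last_backup is None or last_backup < cutoff(now, days)

now = datetime(2021, 3, 1, 12, 0, 0)

# Hypothetical datafiles and the time of their last successful backup.
last_backups = {
    "users01.dbf": now - timedelta(hours=1),   # finished earlier in this run
    "data01.dbf": now - timedelta(hours=30),   # from an older run
    "data02.dbf": None,                        # its stream failed, no backup yet
}

for days in (0.1, 2):
    todo = sorted(f for f, t in last_backups.items() if needs_backup(t, now, days))
    print(f"SYSDATE-{days}: rerun would back up {todo}")
```

SYSDATE-.1 is a window of 2 hours 24 minutes, so it only protects pieces written very recently. A backup that runs for many hours needs a larger value, otherwise a rerun repeats pieces that already succeeded.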

 

 

NetBackup 8.3.0.1 on Solaris 11, writing to Data Domain 9800 6.2.1.0
duplicating via SLP to LTO5 in SL8500 via ACSLS

Hi Marianne,

Thanks for your fast answer!

You said "You need to troubleshoot each failed stream.", but if only one child stream fails, the parent fails too. Is that correct?

It's complicated to explain because we have a lot of Oracle RMAN policies. My doubt arises because a parent sometimes launches many child streams over days of execution and, all of a sudden, one job fails because of a network problem, for example. It's difficult for me to be sure whether the other successful child jobs have backup images that could be used in a future restore, or whether I can discard those images and free space on the disks/tapes.
So, should I check this only with the DBA's support (with RMAN/bplist)?

Could Oracle Intelligent Policies be a solution to these doubts?

(Our NetBackup is at 7.6.0.4.)

Thanks!

Marcio Oliveira

Hi Genericus,

Thanks,

I'll suggest this to our DB team.

 

Marcio Oliveira

Marianne
Moderator

" ...  if only one child stream fails the parent fails too. Is it correct? "

Correct. So, NBU may (or may not) keep successful streams. I don't know for sure.
The DBA can check with RMAN queries; you can check with bplist on the master server.

@Genericus's suggestion seems to be the correct solution here - get rman to check and rerun failed backups. 

 

Genericus
Moderator

There is a lot of internal recovery within RMAN.

We have several large DBs (large to me, 40TB). I run 6 child jobs over Fibre Channel to a Data Domain at 200MB/sec each. When a child job fails (not often, and not recently), the backup simply drops that child stream and continues with 5 child jobs. In the past, we have had our OPS let it run until it gets down to 4 or fewer streams, then just restart it. When it restarts, it determines what to do and picks up any failed child jobs.

However, when the backup starts and passes control to RMAN, RMAN can take a while to figure out what to back up before it starts, so the more often you stop and restart it, the more time you waste.

Interestingly this works the same for restore/recovery - if you have a recovery fail due to tape contention timeout, the DBA can just re-try the recovery and it will only retry the failed pieces.
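
To put rough numbers on the cost of losing a stream, here is a back-of-the-envelope Python sketch using the figures above (40TB, 200MB/sec per stream). It assumes the whole 40TB is moved at full rate, uses decimal units, and ignores RMAN startup time and deduplication, so treat it as an estimate only. `hours_to_move` is a hypothetical helper name.

```python
def hours_to_move(total_tb, streams, mb_per_sec_per_stream=200):
    """Hours needed to move total_tb (decimal TB) over parallel streams."""
    total_mb = total_tb * 1_000_000  # 1 TB = 1,000,000 MB in decimal units
    seconds = total_mb / (streams * mb_per_sec_per_stream)
    return seconds / 3600

for streams in (6, 5, 4):
    print(f"{streams} streams: {hours_to_move(40, streams):.1f} h")
```

Under those assumptions the run takes roughly 9.3 hours on 6 streams, 11.1 on 5, and 13.9 on 4, which is why letting a degraded backup continue can still be cheaper than restarting it early.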

 


 

Thanks Marianne! I'll recommend this.

 

Marcio Oliveira

When you said "I run 6 child jobs", where did you configure this? In RMAN, or in the Oracle policies on the NetBackup master server?

So, I understand that it's important to control failures and backup restarts within RMAN.

It's good to learn a little about your environment; it will help us improve ours.

 

Marcio Oliveira

Genericus
Moderator

Well, you do have to make sure you do not limit it within NetBackup - in the client or storage definitions, but the number of child jobs is set within the scripts:

Essentially, the Oracle DBA sets the number of child jobs by how many allocate channel commands are in the script. There are 6 in the lines below; if I commented out 4, 5 and 6, it would only run 3. You cannot restore with more channels than you backed up with, so it is a balance of backup performance and restore performance.

connect target rcat/rcat@ORADB
run {
# Hot database level 0 whole backup
allocate channel t1 type 'SBT_TAPE';
allocate channel t2 type 'SBT_TAPE';
allocate channel t3 type 'SBT_TAPE';
allocate channel t4 type 'SBT_TAPE';
allocate channel t5 type 'SBT_TAPE';
allocate channel t6 type 'SBT_TAPE';
send 'NB_ORA_CLASS=POLICYNAME, NB_ORA_CLIENT=CLIENTNAME';
sql 'alter system archive log current';
debug io;
backup
incremental level 0
skip inaccessible
tag ORADB
# I use 1 to optimize the deduplication of my data domains, I use 3 for speed
filesperset 1
# recommended format
format 'bk_%s_%p_%t'
database NOT BACKED UP SINCE TIME 'SYSDATE-2';
#database ;
debug off;
sql 'alter system archive log current';
}
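
Since the stream count is just the number of allocate channel lines, those lines can be generated instead of edited by hand. This is only a sketch in Python: `allocate_channels` is a hypothetical helper, and the channel names (t1, t2, ...) and the 'SBT_TAPE' device type are simply copied from the script above.

```python
def allocate_channels(count, device_type="SBT_TAPE", prefix="t"):
    """Return one RMAN 'allocate channel' line per requested stream."""
    if count < 1:
        raise ValueError("need at least one channel")
    return [
        f"allocate channel {prefix}{i} type '{device_type}';"
        for i in range(1, count + 1)
    ]

# Three channels means three child jobs, as long as NetBackup does not cap it.
print("\n".join(allocate_channels(3)))
```

The NetBackup side still has to allow that many concurrent jobs in the client and storage settings, as noted above.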

 

 

 

P.S. These scripts are simply the basic ones that came with NetBackup: a shell script and an rcv script, updated with our policy and client names and database info.
