
After NetBackup upgrade from 7.7.3 to 8.1.1 RMAN backup logs are not getting deleted after completion of backup

bkrishna11
Level 3

After upgrading NetBackup from 7.7.3 to 8.1.1, RMAN backup logs are not getting deleted after the backup completes.

We have a NetBackup master server on Windows Server 2012 R2 running NBU 8.1.1 and two Linux (RHEL) media servers running NBU 8.1.1. The RMAN clients are Solaris containers, most on OS version 5.10 and a few on 5.11.

We run backups over a separate backup NIC.

Below is the RMAN log message provided by the Oracle team. In our NetBackup console, however, all parent and child jobs show as successful.

 

RMAN-03009: failure of backup command on ORA_SBT_TAPE_7 channel at 03/10/2019 20:36:06
ORA-19509: failed to delete sequential file, handle="ARCH_Backup_IOP_set247628_piece1_copy1_20190310_qcts43g4_1_1", parms="ENV=(NB_ORA_POLICY=DB_RMAN_QA1_ORAARCH,NB_ORA_SERV=backupmaster01.domain.com)"
ORA-27027: sbtremove2 returned error
ORA-19511: Error received from media manager layer, error text:
Failed to remove, ARCH_Backup_IOP_set247628_piece1_copy1_20190310_qcts43g4_1_1, from image catalog.
ORA-27191: sbtinfo2 returned error

14 REPLIES

Mouse
Level 6
Partner    VIP    Accredited Certified

I am wondering which part of the RMAN script deletes the logs. Is it DELETE OBSOLETE?

Can you check whether an NBU catalog backup was running at the same time? I have seen this issue when a catalog backup is running and RMAN cleans out some old backups through expiry. This is not allowed during catalog backups.

Mouse
Level 6
Partner    VIP    Accredited Certified

Oh, there is also a relevant technote https://www.veritas.com/support/en_US/article.100015746

It happens all the time, with the same error, even when no catalog backup is running. We enabled bprd logging and reviewed the logs, but we do not see any clues. This started after upgrading the master and media servers from 7.7.3 to 8.1.1.

The NetBackup clients are at versions 7.6 and 7.7.

Mouse
Level 6
Partner    VIP    Accredited Certified

Ok, but you still did not say which RMAN command gives you grief. It is important to establish which command gives the error.

We checked with the Oracle team; they are not using the DELETE OBSOLETE command in the script. The script log is attached.

We are facing the same issue on older Oracle 9i databases, where the backup is scheduled through crontab. Once the log backup completes, no "backup completed successfully" message appears to trigger deletion of the backed-up logs, so the hourly background cron jobs never terminate.

 

ps -ef | grep script
oracle 19999 19978 0 11:20:01 ? 0:09 /bin/ksh /u01/home/oracle/scripts/arch_to_disk/arch_netback_1.ksh tidq D
oracle 16797 16777 0 15:20:01 ? 0:06 /bin/ksh /u01/home/oraclel/scripts/arch_to_disk/arch_netback_1.ksh kidq
oracle 24893 24852 0 07:20:01 ? 0:12 /bin/ksh /u01/home/oracle/scripts/arch_to_disk/arch_netback_1.ksh kiqr D

 

 

Mouse
Level 6
Partner    VIP    Accredited Certified

Sorry, but troubleshooting does not work this way. Somebody needs to do the groundwork: analyse the bphdb log and check which part of the script produces the error. The script itself is useful, but it does not point to the piece that breaks.

Marianne
Moderator
Partner    VIP    Accredited Certified

@bkrishna11 

Could you please share the script, the full RMAN log as well as dbclient log on the Oracle client?

Marianne
Moderator
Partner    VIP    Accredited Certified

Wait!

I think you have realized where the problem is: 


@bkrishna11 wrote:

We checked with the Oracle team; they are not using the DELETE OBSOLETE command in the script. The script log is attached.

We are facing the same issue on older Oracle 9i databases, where the backup is scheduled through crontab. Once the log backup completes, no "backup completed successfully" message appears to trigger deletion of the backed-up logs, so the hourly background cron jobs never terminate.

ps -ef | grep script
oracle 19999 19978 0 11:20:01 ? 0:09 /bin/ksh /u01/home/oracle/scripts/arch_to_disk/arch_netback_1.ksh tidq D
oracle 16797 16777 0 15:20:01 ? 0:06 /bin/ksh /u01/home/oraclel/scripts/arch_to_disk/arch_netback_1.ksh kidq
oracle 24893 24852 0 07:20:01 ? 0:12 /bin/ksh /u01/home/oracle/scripts/arch_to_disk/arch_netback_1.ksh kiqr D



You need to trace the script output and the NBU logs to see where it is getting stuck.
NBU logs:
- dbclient on the Oracle client
- bprd and bpdbm on the master server (NBU needs to be restarted after these log directories are created)
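
For the client side, a minimal sketch of creating the dbclient log directory could look like the following. It assumes the default UNIX legacy log root /usr/openv/netbackup/logs and that wide-open permissions are acceptable so the Oracle user can write there; the function name `enable_dbclient_log` is made up for illustration. Check both assumptions against your own installation before using it.

```shell
# Sketch only: create the legacy dbclient log directory on a UNIX Oracle client.
# The default root below is an assumption; pass another root to try it safely.
enable_dbclient_log() {
    log_root=${1:-/usr/openv/netbackup/logs}
    mkdir -p "$log_root/dbclient" || return 1
    # The log is written by the Oracle user rather than root,
    # so the directory has to be writable by that user.
    chmod 777 "$log_root/dbclient"
}
```

Calling `enable_dbclient_log` with no argument (as root) targets the default location; the next RMAN run should then start writing a log file there.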

Have a look at this TN : https://www.veritas.com/content/support/en_US/article.100000534.html

This TN explains why image cleanup takes a long time and is not directly related to 8.x version:

The length of time taken for NetBackup to locate an image for RMAN depends upon the following factors:
- The number of RMAN backup pieces retained in the RMAN catalog and the retention settings for those pieces.
- The number of these RMAN backup pieces that are past their NetBackup retention period, i.e. expired by NetBackup but not by RMAN.
- The format of the RMAN backup piece name - does it have the Veritas recommended ' _%t' as the end of the format statement?
- The number of Oracle clients performing simultaneous RMAN operations; especially crosscheck and delete expired.
- The number of NetBackup policies of any kind.
- The length of time backups are retained by NetBackup and the number of backup images for the client in the NetBackup catalog.
- The length of time between RMAN catalog maintenance operations.
- The frequency of the RMAN catalog maintenance operations.
- The speed and accuracy of hostname resolution available to the NetBackup master server.
- The number and complexity of other operations performed by the NetBackup master server during a catalog request.
- Normal performance considerations due to processors, networks and other hardware and infrastructure.
 
Looking at the list above, the problem is going to be compounded if one catalog maintenance job is still running when another one is started... and an hour later another one...
Maybe add a check to the script: if the script is already running, exit.
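
One way to sketch such a check is a lock directory, since `mkdir` either creates the directory or fails if it already exists. The function names and the lock path in the usage note are hypothetical, not taken from the attached script, and a run that is killed hard will leave a stale lock that has to be removed by hand.

```shell
# Sketch only: skip a run while a previous one still holds the lock.
acquire_lock() {
    # mkdir succeeds for exactly one caller; later callers fail until it is removed.
    if mkdir "$1" 2>/dev/null; then
        echo $$ > "$1/pid"   # record the owner to help spot stale locks
        return 0
    fi
    return 1
}

release_lock() {
    rm -rf "$1"
}

# run_once LOCKDIR COMMAND [ARGS...]
run_once() {
    lockdir=$1
    shift
    if ! acquire_lock "$lockdir"; then
        echo "previous run still holds $lockdir; skipping this run" >&2
        return 0
    fi
    "$@"
    status=$?
    release_lock "$lockdir"
    return $status
}
```

In the cron script this could wrap the existing backup steps, for example `run_once /tmp/arch_netback_1.lock do_backup`, where `do_backup` stands for whatever the script runs today.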
 
You also need to explain to the DBAs that you can only offer them 'best effort', as Oracle 9 went out of support from Oracle and Veritas some years ago.
Even NBU 7.7 did not support Oracle 9 and 10, but it still seems to have 'tolerated' the NBU 7.6 client and Oracle agent.

Looking through the script you attached, it appears that you delete archive logs that have been backed up once, and later on perform a crosscheck followed by deleting the expired images.

To isolate where the problem lies, I would suggest starting by commenting out the last line, i.e. "delete expired", from your script. Since the archive log crosscheck would still run after the backup, you can connect to RMAN, do a list backup to determine how many pieces were marked as expired, then trigger a manual delete expired archivelog command and monitor the time taken.
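
A rough sketch of that manual check follows. It assumes the pieces in question are backups of archived logs on the SBT device (the failing handle in the RMAN log is a backup piece) and that an SBT maintenance channel is already configured; the exact commands in the attached script may differ. The function name and file paths are placeholders.

```shell
# Sketch only: print a non-destructive RMAN command file for the manual check.
# RMAN treats lines starting with '#' as comments.
expired_check_cmds() {
    cat <<'EOF'
# Report archived log backup pieces that the last crosscheck marked EXPIRED.
LIST EXPIRED BACKUP OF ARCHIVELOG ALL;
# Run the delete by hand afterwards and note how long it takes:
# DELETE EXPIRED BACKUP OF ARCHIVELOG ALL;
EOF
}

# Hypothetical usage on the Oracle client (paths are placeholders):
#   expired_check_cmds > /tmp/expired_check.rman
#   time rman target / cmdfile=/tmp/expired_check.rman log=/tmp/expired_check.log
```

The delete is left commented out on purpose, so the first pass only reports how many pieces are expired and how long the listing takes.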

Hi Mikail,

 

Thanks for your suggestion. I will generate the required logs and share them.

Hi Marianne,

I have gone through the provided link on catalog and image expiry. But this is not a new configuration: we have been using the same RMAN script for years, and the issue started immediately after upgrading the master and media servers from 7.7.3 to 8.1.1.

The client has multiple NICs, but we use the backup NIC for data transfer.

I will generate the dbclient log and provide the output.

Hi Amol

(To isolate where the problem lies, I would suggest starting by commenting out the last line, i.e. "delete expired", from your script. Since the archive log crosscheck would still run after the backup, you can connect to RMAN, do a list backup to determine how many pieces were marked as expired, then trigger a manual delete expired archivelog command and monitor the time taken.)

I will check with the DBA team on this.

 

I have attached the dbclient log error messages in the attached text file. I can see error messages about images not being deleted from the catalog.

We have configured the client with its backup interface name in the policy, so what should the order of entries in bp.conf be?

Should it be

backupmaster01

backupmaster01-backup

or vice versa? I mean, the first entry in bp.conf should be the master server name, but in this case should it be the production interface or the backup interface?

Michal_Mikulik1
Level 6
Partner    VIP    Accredited Certified

Hello,

This type of problem is better solved with support, especially when the only change that led to the error was the upgrade.

I have quickly browsed the discussion and have several notes:

- Use backup NIC names in the REQUIRED_INTERFACE parameter only. Don't use them as policy client names, NB_ORA_CLIENT values, etc.

- AFAIK the backup NIC is used for data flow only, not for metadata flow, so it makes no sense to use it for the master server as well.
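
To see what a client is currently set to, a small sketch like the one below prints the first SERVER entry (the one NetBackup treats as the master server) and the REQUIRED_INTERFACE value from a bp.conf file. The function names are made up for illustration, and the sketch assumes the usual `KEY = value` layout with the key at the start of the line.

```shell
# Sketch only: read two settings from a bp.conf file given as $1.
# On UNIX clients the file is normally /usr/openv/netbackup/bp.conf.
first_server() {
    # The first SERVER line names the master server; later ones are additional servers.
    sed -n 's/^SERVER *= *//p' "$1" | head -n 1
}

required_interface() {
    sed -n 's/^REQUIRED_INTERFACE *= *//p' "$1" | head -n 1
}
```

For example, `first_server /usr/openv/netbackup/bp.conf` should print the production name of the master (backupmaster01 in this thread) if the notes above are followed, while `required_interface` shows the backup NIC name.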

Regards

Michal