07-06-2010 03:19 AM
07-06-2010 04:24 AM
07-06-2010 04:51 AM
07-06-2010 09:16 AM
07-19-2010 03:46 AM
Please find below a document that gives a clear picture of the RMAN background processes.
RMAN & Media Manager
Troubleshooting Guide
An Oracle White Paper
March 2005
RMAN & Media Manager Troubleshooting Guide
Executive Overview
Overview of RMAN & Media Manager
Media Manager Components
How RMAN works with Media Manager Software
Troubleshooting Steps
1. Check Media Manager Initialization
Media Manager Integration Troubleshooting
Additional Media Manager Resources
2. Determine if OS Backups Using the Media Manager work
3. Perform Disk and SBT Test Disk Backups
4. Isolate the Server Process from the SBT Library
5. Check the RMAN Error Message Stack
6. Troubleshoot OS Error Codes
7. Check RMAN and Media Manager Logs
Trace files in USER_DUMP_DEST directory of target database
Sbtio.log in USER_DUMP_DEST directory
Media Manager Logs
8. Diagnose SBT Function Errors
SBTINIT and SBTINIT2 Failures
SBTOPEN or SBTBACKUP Failures
SBTOPEN or SBTRESTORE Failures
SBTWRITE or SBTREAD Failures
EXECUTIVE OVERVIEW
Recovery Manager (RMAN) is the single tool for guaranteed, efficient, manageable Oracle database backup and recovery. RMAN integrates with leading 3rd party backup vendors for backup to tape, via the System Backup to Tape (SBT) API. These backup vendors have implemented interfaces to their products using the API, allowing RMAN to drive backups to and restores from tape. More information on these vendors can be found in the Oracle Backup Solutions Program.
This paper presents an overview of the RMAN environment and how it interacts with the media manager, followed by common troubleshooting procedures when integrating and working with 3rd party media managers. For media manager-specific troubleshooting, skip to the Media Manager Integration Troubleshooting section.
OVERVIEW OF RMAN & MEDIA MANAGER
The high-level environment can be depicted as follows:
To store backups on tape, RMAN requires media manager software, a third party software program that writes, reads, and manages sequential media such as tapes to back up and recover data. In the case of backup, RMAN starts the Oracle Server session, which reads data and sends it to the media manager, which writes the data to the tape device. In the case of restore, the media manager software reads the data from the tape and sends it to the Oracle Server session, which restores data to the disk.
The SBT API is the interface through which RMAN interacts with the media manager. The API defines the functions that create backup files, write to/read from the backup media, and search for/remove backup files. Management of backup devices and media is handled by the media manager, and is outside the scope of the SBT API.
Media Manager Components
Media manager software products can vary widely in breadth of functionality. However, there is a set of typical components shared by most media managers, and they are described below.
Device Agent
This component writes/reads data to/from a backup device (e.g., tape drive).
Robot Agent
This component receives commands from the device agent and controls the robotic interface to load and unload tapes.
Disk Agent
This component is responsible for reading the data from disk and sending it to the Device Agent. When integrated with RMAN, this component is the media management library (known in Unix as libobk.so and in Windows as ORASBT.DLL).
Media Management Database
This is a database which contains metadata about backups and media. For example, it stores Media IDs, bar-codes, and location of the tapes.
Session Manager
This component controls the transfer of backup and restore data. Some products have this component in a separate process and some integrate it with the Device Agent.
How RMAN works with Media Manager Software
The core of the RMAN and media manager software integration is the media management library, supplied by the backup vendor. This library contains an implementation of the SBT API functions, and is linked in with the Oracle server binary. In Oracle 8.1.6 and earlier, the library must be explicitly linked with the Oracle server binary before starting RMAN. In Oracle 8.1.7, the library is implicitly loaded by the OS when the Oracle server session starts. In Oracle9i and 10g, the library is automatically loaded when RMAN channels are allocated.
Oracle calls these SBT functions to back up and restore data files to and from media controlled by the media manager. When performing backups or restores, the RMAN client connects to the target instance and directs the instance to talk to the media manager. No direct communication occurs between the RMAN client and the media manager; all communication occurs on the target instance.
Backup Session Flow (SBT v2.0)
The following steps are taken during an SBT backup to a tape device:
1. RMAN connects to the Oracle server and starts a server session for each allocated channel.
2. When started, the Oracle server session initializes the media management software by calling sbtinit(). This function gives the media management software an opportunity to initialize itself, acknowledge this to RMAN, and return any initialization text. For example, the media management software can return text about its version, which is then displayed in the RMAN output (message RMAN-08526):
RMAN-08030: allocated channel: ORA_SBT_TAPE_1
RMAN-08500: channel ORA_SBT_TAPE_1: sid=8 devtype=SBT_TAPE
RMAN-08526: channel ORA_SBT_TAPE_1: VERITAS NetBackup for Oracle8 - Release 3.4GA
Sbtinit() also returns the version of the SBT API supported by this media manager. This information is displayed in the RMAN output (message RMAN-08503):
RMAN-08503: piece handle=3mckj4i8_1_2 comment=API Version 2.0,MMS Version 3.2.0.0
3. After a successful sbtinit(), Oracle calls sbtinit2() to supply additional information to the media management software that was not supplied to sbtinit(). For example, this additional information could be ENV values in an ALLOCATE CHANNEL … PARMS statement.
4. The Oracle session calls sbtbackup() to create a backup piece. Specifically, the Session Manager reads the Media Management Database, starts the appropriate Device Agents for writing, and loads the tapes for backup.
5. The Oracle session starts reading the input files (datafiles, archive logs or control files). The data is sent to the media manager via sbtwrite2() API call, which writes the data to tape. A typical media manager library may copy data to its internal buffers. When the buffers are exhausted, the data is sent to the Device Agent which writes it to tape.
6. When the Oracle session finishes the backup piece, it calls sbtclose2() to close the write operation. The media manager flushes any buffered data to the tape, and all data previously written via sbtwrite2() is permanently stored on the tape. In other words, sbtclose2() instructs the media manager to commit the data to tape. At this step, a typical media manager library sends all data still in its internal buffers to the Device Agent and waits until all the data is on the tape.
7. The Oracle session then calls sbtinfo2() to check whether the backup piece is stored in the media manager database. The sbtinfo2() function asks the media manager to return the mediumID, location, and expiration time of the tape where the backup piece was stored.
8. When sbtinfo2() finishes, RMAN records the name of the backup piece:
RMAN-08045: channel ORA_SBT_TAPE_1: finished piece 1 at MAR 13 2001 08:48:12
RMAN-08503: piece handle=41ckj865_1_1 comment=API Version 2.0,MMS Version 3.2.0.0
9. If there are additional pieces to back up, then the algorithm continues from step 4.
10. After the Oracle session finishes backing up all its piece data, it calls sbtend(). In this function, the media management software cleans up and releases resources. After sbtend() returns, the RMAN channel is released and the server session ends. A typical media manager will then instruct the Session Manager to end the backup session and unload the tapes.
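The order of calls in the steps above can be sketched as a small model. This is only an illustration of the sequence described in this paper, not a binding to any real SBT library; the function name backup_call_sequence and the way piece names are shown in parentheses are assumptions made for the example.

```python
def backup_call_sequence(pieces):
    """Return the order of SBT v2.0 calls made on one channel.

    `pieces` maps a backup piece name to the number of buffers written
    for it. Both the mapping and the output format are hypothetical.
    """
    calls = ["sbtinit", "sbtinit2"]            # steps 2-3: once per channel
    for name, buffers in pieces.items():
        calls.append(f"sbtbackup({name})")     # step 4: create the backup piece
        calls.extend(["sbtwrite2"] * buffers)  # step 5: send data to the media manager
        calls.append("sbtclose2")              # step 6: flush and commit to tape
        calls.append(f"sbtinfo2({name})")      # step 7: confirm the piece is recorded
    calls.append("sbtend")                     # step 10: clean up, release resources
    return calls
```

For two pieces, the sbtbackup/sbtwrite2/sbtclose2/sbtinfo2 group repeats once per piece, while sbtinit, sbtinit2 and sbtend each appear exactly once.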
Restore Session Flow (SBT v2.0)
The following steps are taken during an SBT restore from a tape device to disk:
1. RMAN connects to the Oracle server and starts a server session for each allocated channel in use.
2. When started, the Oracle Server session initializes the media management software. This is done by calling sbtinit(). This is the same as step 2 from Backup Session Flow above.
3. The Oracle session then calls sbtrestore() to request the backup piece from the media manager. Sbtrestore() tells the media manager to find and load the tape containing the requested backup piece.
4. The Oracle server starts reading data from the media manager library by calling sbtread2(). The data received from sbtread2() is then written to the disk.
5. When finished reading the backup piece, the Oracle server calls sbtclose() to close it. Sbtclose() instructs the media manager to stop reading data from the tape.
6. If there is more data to be restored, then the algorithm will continue from step 4.
7. After each server session restores all of its data, it calls sbtend(). In this function, the media management software cleans up and releases resources. After sbtend() returns, the RMAN channel is released and the server session ends. A typical media manager will instruct its Session Manager to end the restore session and unload the tapes.
TROUBLESHOOTING STEPS
1. Check Media Manager Initialization
Get the entire RMAN log, and not just the RMAN Error Message Stack. Look for messages identifying the Media Manager and version.
In Oracle8/8i, look for message RMAN-08526, for example:
RMAN-08526: channel t1: VERITAS NetBackup for Oracle8 – Release 3.4GA (030800)
RMAN-08526: channel dev1: BMO v3.0
RMAN-08526: channel t1: Tivoli Data Protection for Oracle: version 2.2.0.0
In Oracle9i, text messages will identify the Media Manager, for example:
channel dev_0: HP Open View OmniBack II A.04.10/PHSS_28582/PHSS_28583
If the media manager is correctly identified in these messages, then it has been successfully loaded and initialized. Continue to Step 5, Check the RMAN Error Message Stack.
Otherwise, the following types of errors indicate that the Media Manager is not correctly integrated with Oracle:
In Oracle8:
ORA-19506: failed to create sequential file, name="X", parms=""
ORA-27006: sbtremove returned error
Additional information: 7086
In Oracle8i:
ORA-19506: failed to create sequential file, name="X", parms=""
ORA-27006: sbtremove returned error
Additional information: 4110
In Oracle9i:
ORA-19557: device error, device type: SBT_TAPE, device name:
ORA-27211: Failed to load Media Management Library
Additional information: 2
Any errors in SBT routines sbtinit(), sbtinit2(), sbtopen() or sbtbackup() also indicate possible media manager integration or configuration problems:
RMAN-10035 - ORA-19506: failed to create sequential file,
ORA-27007: failed to open file
Additional information: 7009
Additional information: 1
ORA-19511: SBT error = 7009, errno = 0, sbtopen: can't connect with media manager
RMAN-10031 - ORA-19624 occurred during call to
DBMS_BACKUP_RESTORE.BACKUPPIECECREATE
Another indication of a problem with media manager integration is the following message:
RMAN-08526: channel t1: WARNING: Oracle Test Disk API
ORA-19511: SBT error= 4110, errno = 0, BACKUP_DIR environment variable is not set
The disk API can be used to test SBT routines without involving the media manager. Here, however, the message indicates an error: the channel was allocated for the media manager, but because of a media manager integration problem, the disk API was used by default.
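The check described in this step can be scripted against a saved RMAN log. The sketch below assumes the Oracle8/8i message format shown above, where each channel is identified by an RMAN-08526 line; the function names are made up for the example, and Oracle9i logs (which do not carry this message number) would need a different pattern.

```python
import re

# Matches lines such as "RMAN-08526: channel t1: BMO v3.0"
BANNER = re.compile(r"RMAN-08526: channel (\S+): (.*)")

def media_manager_banners(log_text):
    """Map each channel name to the identification text from RMAN-08526."""
    banners = {}
    for line in log_text.splitlines():
        match = BANNER.search(line)
        if match:
            banners[match.group(1)] = match.group(2).strip()
    return banners

def channels_using_test_disk_api(log_text):
    """Return channels whose banner shows the Oracle Test Disk API."""
    return sorted(
        channel
        for channel, text in media_manager_banners(log_text).items()
        if "Oracle Test Disk API" in text
    )
```

A channel reported by channels_using_test_disk_api() was meant for the media manager but fell back to the disk API, which points to the integration problems discussed in the next section.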
Media Manager Integration Troubleshooting
Ensure that the media management library is linked into Oracle correctly.
In Oracle8/8i:
RMAN expects the media manager module to be located in $ORACLE_HOME/lib and to be called libobk.<suffix> (suffix varies according to platform).
The exception to this is on Solaris platforms running Oracle 8.1.6 where the module is called libdsbtsh8.so. If running 32-bit Oracle on a 64-bit platform check that a 32-bit version of the media manager module is being used.
The following steps will help troubleshoot:
a. Check the symbolic link:
Go to $ORACLE_HOME/lib. There should be a symbolic link between libobk.<suffix> and the media manager module.
If there is not one, create one using:
% mv libobk.<suffix> libobk.sav
% ln -s <pathname to vendor’s MML module> $ORACLE_HOME/lib/libobk.<suffix>
b. If Oracle8, relink the Oracle executable:
Solaris:
cd $ORACLE_HOME/rdbms/lib
make -f ins_rdbms.mk ioracle LLIBOBK=/usr/lib/libobk.so LIBMM= LLIBMM=
HP:
cd $ORACLE_HOME/rdbms/lib
make -f ins_rdbms.mk ioracle "LLIBOBK=/usr/lib/libobk.sl -lC" LIBMM= LLIBMM=
Digital Unix:
cd $ORACLE_HOME/rdbms/lib
make -f ins_rdbms.mk ioracle LLIBOBK="/usr/lib/libobk.so" LIBMM= LLIBMM=
AIX:
cd $ORACLE_HOME/rdbms/lib
make -f ins_rdbms.mk ioracle LLIBOBK=/usr/lib/libobk.a LIBMM= LLIBMM=
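The symbolic link check in step a can also be scripted. This is a minimal sketch that assumes the library lives at $ORACLE_HOME/lib/libobk.<suffix> as described above; the function name and the status strings it returns are invented for the example.

```python
import os

def check_libobk_link(oracle_home, suffix):
    """Describe the state of <oracle_home>/lib/libobk.<suffix>.

    Returns a (status, target) tuple; the status values are hypothetical labels.
    """
    path = os.path.join(oracle_home, "lib", f"libobk.{suffix}")
    if not os.path.lexists(path):
        return ("missing", None)        # nothing there: create the link
    if not os.path.islink(path):
        return ("not_a_symlink", None)  # a regular file, not a link to the vendor module
    target = os.path.realpath(path)
    if not os.path.exists(target):
        return ("dangling", target)     # link exists but the vendor module does not
    return ("ok", target)
```

An "ok" result only shows that the link resolves to an existing file. It does not confirm that the target is the intended vendor module or that it matches the 32-bit or 64-bit Oracle binary.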
In Oracle9i:
Media management libraries (libobk.x) will be dynamically loaded when a channel is allocated with type SBT_TAPE. Ensure that the libobk.x (where x is the appropriate extension for the operating system) exists in the $ORACLE_HOME/lib directory and that the $ORACLE_HOME/lib directory is first in the LIBPATH (or LD_LIBRARY_PATH).
Alternatively, media managers can be explicitly specified using the PARMS parameter SBT_LIBRARY.
Media manager environment variables are passed to the Media Manager Layer via the PARMS option of the ALLOCATE CHANNEL command, for example:
allocate channel t1 type 'sbt_tape' PARMS 'ENV=(TDPO_OPTFILE=/tmp/tdopt.opt)';
Ensure that media manager environment variables, passed to the media manager via the PARMS option of the ALLOCATE CHANNEL command, are complete and syntactically correct.
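One way to reduce syntax mistakes is to generate the ENV=(...) string instead of typing it by hand. The helper below is a hypothetical sketch: it joins variables with commas inside ENV=( ), and it rejects characters that would likely break the quoting of the PARMS string. The exact set of rejected characters is a conservative assumption, not a rule taken from Oracle documentation.

```python
def build_env_parms(variables):
    """Build a PARMS value of the form ENV=(NAME=value,NAME=value).

    `variables` is a dict of environment variable names to string values.
    """
    parts = []
    for name, value in variables.items():
        if not name or any(ch in name for ch in "=,()' "):
            raise ValueError(f"invalid variable name: {name!r}")
        if any(ch in value for ch in ",()'"):
            raise ValueError(
                f"value for {name} contains a character that may break the PARMS string"
            )
        parts.append(f"{name}={value}")
    return "ENV=(" + ",".join(parts) + ")"
```

For the Tivoli example above, build_env_parms({"TDPO_OPTFILE": "/tmp/tdopt.opt"}) produces the string that goes between the single quotes after PARMS.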
Additional Media Manager Resources
Tivoli Storage Manager
• TDP and RMAN Problem Resolution Tips (requires IBM.com registration and IBM Tivoli customer number)
• Note 125219.1: Integration of RMAN and ADSM Connect Agent for Oracle
Legato Networker
• Note 208914.1: Quick Start for Legato Storage Manager Configuration and Troubleshooting
HP Omniback
• Note 77552.1: RMAN: Configuring HP Omniback with RMAN
VERITAS NetBackup
• Note 209117.1: How to Install, Configure, Check and Troubleshoot VERITAS NetBackup 4.5 for Oracle
• NetBackup linking instructions
2. Determine if OS Backups Using the Media Manager work
This determines whether the basic components of the media manager are correctly installed and configured. If the media manager operating system backup does not work, then the problem is not related to the Oracle media manager module. Rather, the problem is in the media manager installation and configuration. Check with the media manager vendor for troubleshooting.
3. Perform Disk and SBT Test Disk Backups
If media manager backups work, confirm that RMAN disk backup works:
run {
allocate channel d1 type disk;
backup validate datafile <datafile #, etc>;
}
As a secondary check, take a backup using the SBT disk library. This makes the same tape API calls that would normally be made to the Media Manager but the actual physical backup is written to disk. The PARMS parameter BACKUP_DIR must be set to the disk location where the backup pieces will be written.
For Oracle8/8i:
a. Shut down all Oracle instances that use this $ORACLE_HOME.
b. Create symbolic link:
% cd $ORACLE_HOME/lib
% mv libobk.<suffix> libobk.<suffix>.save
% ln -s libdsbtsh8.<suffix> libobk.<suffix>
For Oracle8, relink Oracle as described in Media Manager Integration Troubleshooting.
c. Perform disk backup
run {
allocate channel t1 type sbt format '%U'
parms='ENV=(BACKUP_DIR=/<backup_directory>)';
backup datafile <datafile #, etc>;
}
For Oracle 9i:
The SBT disk library can be loaded dynamically:
run {
allocate channel t1 type sbt format '%U'
parms='SBT_LIBRARY=oracle.disksbt, ENV=(BACKUP_DIR=/<backup_directory>)';
backup datafile <datafile #, etc>;
}
If the backup succeeds, all RMAN API calls are made correctly, and the errors lie on the media manager side.
4. Isolate the Server Process from the SBT Library
The media manager can interfere with the reading of files performed by the Oracle server session. This problem occurs because the media manager library is loaded by the Oracle server process and the library shares all operating system resources dedicated to the Oracle server process. For example, on rare occasions, the MML can mistakenly close data file descriptors opened by Oracle code in the server process.
You should attempt to separate the process responsible for reading from the process responsible for writing (performed by SBT functions in the media management library). You can separate these processes by setting the BACKUP_TAPE_IO_SLAVES initialization parameter to TRUE.
5. Check the RMAN Error Message Stack
At this point, the Media Manager module has been successfully loaded and initialized, but the backup is still failing.
Look for ORA-19511. This is always returned from the Media Manager Layer.
Media Manager errors are also characterized by the following:
• ORA-19506: failed to create sequential file
• any error on a read, write, open or close of a sequential file
• any error in an SBT routine: sbtinit, sbtinit2, sbtopen, sbtread, sbtwrite, sbtclose, sbtinfo, sbtend
Here is an example of a failed SBT routine:
RMAN-00571: =======================================
RMAN-00569: ======= ERROR MESSAGE STACK FOLLOWS=======
RMAN-00571: =======================================
RMAN-03015: error occurred in stored script bkfIST(1)
RMAN-03006: non-retryable error occurred during execution of command: backup (2)
RMAN-07004: unhandled exception during command execution on channel ch1 (3)
RMAN-10035: exception raised in RPC: ORA-27015: skgfcls: failed to close the file
ORA-19511: SBT error = 7023, errno = 29, sbtclose: system error (4)
RMAN-10031: ORA-19583 occurred during call to
DBMS_BACKUP_RESTORE.BACKUPPIECECREATE(5)
Here is what can be deduced from the errors:
1. Stored scripts are being used: RMAN script bkfIST failed
2. The failing RMAN command: backup
3. The failing channel: ch1
4. Media manager layer error: ORA-19511
SBT error: 7023 (OS error)
OS error: 29 (The system cannot write to the specified device)
Failing routine: sbtclose
5. Failing RMAN RPC: DBMS_BACKUP_RESTORE.BACKUPPIECECREATE
Conclusion: the media manager function sbtclose failed to create the backup piece due to OS error 29.
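Pulling the SBT error, OS error and failing routine out of a long error stack can be automated. The sketch below assumes the ORA-19511 line follows the "SBT error = N, errno = N, routine: text" layout seen in the examples in this paper; the function name and dictionary keys are invented for the example, and the textual SBT API 2.0 form of ORA-19511 is not covered.

```python
import re

# Layout seen in this paper: "ORA-19511: SBT error = 7023, errno = 29, sbtclose: system error"
SBT_ERROR = re.compile(
    r"ORA-19511: SBT error\s*=\s*(\d+), errno\s*=\s*(\d+), (?:(\w+): )?(.*)"
)

def parse_sbt_errors(stack_text):
    """Extract SBT error details from every ORA-19511 line in an error stack."""
    results = []
    for line in stack_text.splitlines():
        match = SBT_ERROR.search(line)
        if match:
            results.append({
                "sbt_error": int(match.group(1)),
                "errno": int(match.group(2)),
                "routine": match.group(3),  # None when no routine name is given
                "message": match.group(4).strip(),
            })
    return results
```

The errno value returned here is the number to look up in the next step.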
The example below shows the failure of an SBT API 2.0 sbtbackup() function:
ORA-19506: failed to create sequential file, name="4fckrhkv_1_1",
parms=""
ORA-27028: skgfqcre: sbtbackup returned error
ORA-19511: Error received from media manager layer, error text:
sbtbackup: Failed to open for backup. # SBT API 2.0 textual error message
If an error message refers to a sequential file, then you have identified an SBT API error. In the example above, the problem occurs in the SBT API because the ORA-19506 error refers to a sequential file. All errors involving the write, read, open, or close of a sequential file indicate that an SBT function has failed. The text after the ORA-19511 message explains the error based on data received from the media manager.
Any OS errors, RMAN trace files, sbtio.log, and media manager logs should now be investigated. If needed, proceed to Step 8 to research SBT function errors and common causes.
For additional help on ORA-19511 errors, consult Note 227517.1: Main Index of Common Causes for ORA-19511.
6. Troubleshoot OS Error Codes
For a quick reference, see Note 28778.1: Unix Error Codes.
For port specific errors, see /usr/include/sys/errno.h.
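If Python is available on the database server, the standard errno and os modules give a quick lookup for the local platform. This is a minimal sketch; describe_os_error is a name made up for the example, and the text returned depends on the operating system on which it runs, so run it on the same platform that reported the error.

```python
import errno
import os

def describe_os_error(code):
    """Return (symbolic name, message) for an OS error number on this platform."""
    name = errno.errorcode.get(code, "UNKNOWN")  # e.g. "ENOENT" for 2 on POSIX systems
    return name, os.strerror(code)
```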
7. Check RMAN and Media Manager Logs
Trace files in USER_DUMP_DEST directory of target database
This trace file is created by the Oracle server session process that is performing the backup, and can indicate which SBT function failed, particularly for hanging issues or core dumps. Oracle records every entry to and exit from the SBT API functions in the trace file.
Enable general tracing using the TRACE=1 parameter in ALLOCATE CHANNEL command:
RUN {
ALLOCATE CHANNEL tst TYPE DISK TRACE=1;
BACKUP VALIDATE DATAFILE <datafile #, etc>;
}
The trace file will include something like the following:
*** SESSION ID:(9.17) 2002-03-16 13:50:21.945
skgfalo(se=0x815ff8c8, ctx=0x262bf98, dev=0x264861c, devparms=, flags=33554432)
skgfidev(se=0x815ff8c8 ctx=0x262bf98, dev=0x264861c)
entering sbtinit on line 2203
return from sbtinit on line 2213
skgfqsbi(ctx=0x262bf98, vtapi=API Version 1.1, id=MMS Version 2.2.0.1)
skgfqcre(se=0x815ff8c8, ctx=0x262bf98, dev=0x264861c,
file=0x2647f08, fparms=, flags=0x0)
entering sbtopen on line 683
return from sbtopen on line 704
skgfwrt(ctx=0x262bf98, file=0x2647f08, iosb=0x2647cf4,
buf=0x815b0000, numblks=1)
skgfwrt(data=13020000 00000001 0003A1B1 00000104)
entering sbtwrite on line 903
...
The trace output indicates exactly which environment variables are set in the Oracle Server session.
For example, if the channel has the option:
PARMS='ENV=(NB_ORA_CLASS=class1)'
then the trace output displays the following:
skgfidev(): processing: ENV=(NB_ORA_CLASS=fdfa)
skgfidev(): setting environment variable: NB_ORA_CLASS=fdfa
The trace file can indicate if a function is hanging. For example:
skgfwrt(ctx=0x262bf98, file=0x2647f08, iosb=0x2647cf4, buf=0x815b0000, numblks=1)
skgfwrt(data=13020000 00000001 0003A1B1 00000104)
entering sbtwrite on line 903
If there is nothing after this, then the function sbtwrite() is hung.
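The same check can be run over a whole trace file. The sketch below assumes the "entering ... on line N" and "return from ... on line N" wording shown in the trace excerpts above; the function name is made up for the example.

```python
import re

ENTER = re.compile(r"entering (sbt\w+) on line \d+")
RETURN = re.compile(r"return from (sbt\w+) on line \d+")

def unfinished_sbt_calls(trace_text):
    """Return SBT functions that were entered but never returned, in entry order."""
    pending = []
    for line in trace_text.splitlines():
        entered = ENTER.search(line)
        if entered:
            pending.append(entered.group(1))
            continue
        returned = RETURN.search(line)
        if returned and returned.group(1) in pending:
            # Drop the most recent entry of the function that just returned.
            index = len(pending) - 1 - pending[::-1].index(returned.group(1))
            del pending[index]
    return pending
```

A non-empty result only means the trace ends inside that function. If the backup is still running when the trace is read, the call may simply be in progress, so compare against the file's last modification time before concluding that it is hung.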
If level 1 tracing does not help identify the cause of the errors, enable verbose tracing with TRACE=5, which also provides performance statistics.
Additional tracing levels are as follows:
Level  Name                     Description
~~~~~  ~~~~~~~~~~~~~~~~~~~~~~~  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2      KRB_TRACE_THREAD         Trace krbbpc thread switches
3      KRB_TRACE_IO             Trace I/O
4      KRB_TRACE_INCR           Trace incremental restore
5      KRB_TRACE_PERF           Performance tracing
6      KRB_TRACE_KRBBPC_OUTPUT  Detailed backup piece output
Note: Trace level 0 traces all error conditions that result in a return code of -1 from an SBT function, except SBT_ERROR_EOF and SBT_ERROR_NOTFOUND, which are handled by the client. Trace level 2 traces the entry to and exit from each SBT function, the value of all function parameters, and the first 32 bytes of each read/write buffer, in hexadecimal. The write buffer is traced upon function entry, and the read buffer is traced after it has been filled with data, prior to returning to the client.
Sbtio.log in USER_DUMP_DEST directory
This log is the only one written exclusively by the Media Manager.
Here is an example of sbtio.log written by Legato Storage Manager:
(24677) LSM 2.2.0.1: 03/19/02 10:26:27 Sbtopen: unable to start save session with server dlsun1556: There is no pool named 'fdfa'.
This error message shows that the backup failed because the pool named 'fdfa' does not exist.
Note that some media managers (e.g. NetBackup) do not write to this file. Refer to the log files created by the media manager for debug information.
Media Manager Logs
A sample listing of log locations is provided below:
Media Manager | Directory | File | Metalink
Legato Networker | /nsr/log /nsr | Daemon.log | Note: 208914.1
Omniback | /var/opt/omni/log | oracle8.log debug.log media.log | Note: 77552.1
NetBackup | <install_path>/netbackup/logs | bphdb dbclient bpdbsbora | Note: 209117.1
Tivoli | Logdirectory (in dsm.opt or tdpo.opt) parameter: tracefile | | Note: 125219.1
07-19-2010 07:33 AM
07-19-2010 12:35 PM
07-19-2010 01:15 PM
09-02-2010 02:09 PM
Hi AAlmroth,
I think you have the solution to my problem...
1) The Oracle DB admin wants to save the instance with a hot backup.
2) I have two policies: the first, for the hot DB backup, has two schedules (daily and weekly); the second runs every hour and saves the archive logs.
3) The Oracle DB admin says that the archive log backup that runs close to the hot DB backup must have the same retention as the DB backup. How can I implement this with NB? What is BTW? Is it a command that I can insert in the pre-schedule script?
Thanks a lot,
Sergio