
Oracle Database backup hangs!

Arshad_Khateeb
Level 5
Certified

Configuration Setup

Master Server - SunOS 5.9     UNIX    Master Server    6.5.6    Connected

Media Server - SunOS 5.10     UNIX    Media Server    6.5.6    Connected

Client - SunOS 5.9     UNIX    Client    6.5.6    Connected

 

For the past month we have been having a problem with our Oracle DB backups. All the child jobs complete, but the parent job never does. It just hangs.

dbclient log

> tail log.051915 
10:52:20.027 [10505] <2> int_DumpSbtInfo: INF - Media Information for Backup File : <iwhprod_20150518214547_db_a8q7ao2l_1_1>
10:52:20.027 [10505] <2> int_DumpSbtInfo: INF - Media Sharing Mode : <Multiple Concurrent Users>
10:52:20.027 [10505] <2> int_DumpSbtInfo: INF - File Ordering Mode : <Sequential file access>
10:52:20.027 [10505] <2> int_DumpSbtInfo: INF - Media ID : <AL4417>
10:52:20.027 [10505] <2> int_DumpSbtInfo: INF - File Creation Date and Time : <1432045250>
10:52:20.027 [10505] <2> int_DumpSbtInfo: INF - File Expiration Date and Time : <1440080450>
10:52:20.027 [10505] <2> int_DumpSbtInfo: INF - Comment : <Backup ID : rome_pres-bu_1432045250>
10:52:20.027 [10505] <2> int_DumpSbtInfo: INF - File Creation Method : <Stream>
10:52:20.027 [10505] <2> int_DumpSbtInfo: INF - leaving
10:52:20.027 [10505] <2> sbtinfo2: INF - leaving
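A side note for anyone reading the dbclient log above: the "File Creation" and "File Expiration" values look like Unix epoch seconds (the same number appears in the Backup ID). A minimal Python sketch, assuming that interpretation is right, to decode them and work out the retention; the function name is only for illustration:

```python
from datetime import datetime, timezone

def decode_sbt_times(created: int, expires: int):
    """Convert the two epoch values from the log into UTC datetimes
    and return the retention between them in whole days."""
    created_utc = datetime.fromtimestamp(created, tz=timezone.utc)
    expires_utc = datetime.fromtimestamp(expires, tz=timezone.utc)
    retention_days = (expires_utc - created_utc).days
    return created_utc, expires_utc, retention_days

# Values copied from the int_DumpSbtInfo lines in the log.
created_utc, expires_utc, retention_days = decode_sbt_times(1432045250, 1440080450)
print(created_utc.isoformat(), expires_utc.isoformat(), retention_days)
```

Times are printed in UTC; the log timestamps themselves are in the client's local time zone, which the thread does not state.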

 

rman log

> tail iwhprod_20150518214547_db_bkup_inc0.log
RMAN-06731: command backup:100.0% complete, time left 00:00:00
RMAN-06731: command backup:100.0% complete, time left 00:00:00
RMAN-06731: command backup:100.0% complete, time left 00:00:00
RMAN-06731: command backup:100.0% complete, time left 00:00:00
RMAN-06731: command backup:100.0% complete, time left 00:00:00
RMAN-06731: command backup:100.0% complete, time left 00:00:00
RMAN-06731: command backup:100.0% complete, time left 00:00:00
RMAN-06731: command backup:100.0% complete, time left 00:00:00
RMAN-06731: command backup:100.0% complete, time left 00:00:00
RMAN-06731: command backup:100.0% complete, time left 00:00:00

 

 

 


Will_Restore
Level 6

Have the DBA check the Oracle database alert file

 

Nicolai
Moderator
Partner    VIP   

There is nothing you can do from the NetBackup side if RMAN hangs. Ask the DBA to look into it.

And by the way: NetBackup 6.5 has been out of support for a long time. Time for an upgrade.

Marianne
Moderator
Partner    VIP    Accredited Certified
With OS and NBU versions this old and no longer supported, my guess is that the same may be true for the Oracle version. It is probably a good idea to check physical and logical resources on the client. With the environment this old, chances are that the aging Oracle server is battling with data growth.

Arshad_Khateeb
Level 5
Certified

Thanks Nicolai and Marianne for your inputs!!!

I do agree that the NBU environment is old. Our proposal to upgrade NBU to 7.6.x is on the table, and we plan to do the upgrade next quarter.

As for the DBAs checking from their end: they are checking, but please let me know what they should check or what information we need from them.

Yesterday we upgraded the memory on the client because it has lots of jobs running daily. We thought that might be preventing the backups from completing.

As stated above, all child streams complete and it is only the parent job that hangs every time.

Will_Restore
Level 6

Note that NetBackup 7.6 does not support Solaris 9 SPARC

 

Marianne
Moderator
Partner    VIP    Accredited Certified
Get them to check resources as per my previous post.

Arshad_Khateeb
Level 5
Certified

They did their checks and nothing appeared to be holding the server except the backup.

connected; connect time: 0:00:00   -    This is where the parent backup job hangs in the GUI

The DBAs also have a case open with Oracle. Looks like we can't do much from NBU :(

 

Nicolai
Moderator
Partner    VIP   

They will need to run a trace on the RMAN process to see what is going on. Google "RMAN trace".

Arshad_Khateeb
Level 5
Certified

The DBA ran an RMAN trace while testing the database backups. Here is where it started showing errors:

----------------------------------------------------------------------------------------------------------------------------------------

RMAN-06731: command backup:4.1% complete, time left 07:53:47
RMAN-03009: failure of backup command on CH01 channel at 05/19/2015 22:31:16

ORA-27192: skgfcls: sbtclose2 returned error - failed to close file
ORA-19511: Error received from media manager layer, error text:
   Failed to process backup file <iwhprod_20150519221015_db_a9q7c1nb_1_1>
ORA-19502: write error on file "iwhprod_20150519221015_db_a9q7c1nb_1_1", blockno 138650625 (blocksize=512)
ORA-27030: skgfwrt: sbtwrite2 returned error
ORA-19511: Error received from media manager layer, error text:
   VxBSASendData: Failed with error:
   Server
RMAN-12018: channel CH01 disabled, job failed on it will be run on another channel
RMAN-03009: failure of backup command on CH02 channel at 05/19/2015 22:31:16

ORA-27192: skgfcls: sbtclose2 returned error - failed to close file
ORA-19511: Error received from media manager layer, error text:
   Failed to process backup file <iwhprod_20150519221015_db_aaq7c1nb_1_1>
ORA-19502: write error on file "iwhprod_20150519221015_db_aaq7c1nb_1_1", blockno 169535489 (blocksize=512)
ORA-27030: skgfwrt: sbtwrite2 returned error
ORA-19511: Error received from media manager layer, error text:
   VxBSASendData: Failed with error:
   Server
RMAN-12018: channel CH02 disabled, job failed on it will be run on another channel
RMAN-03009: failure of backup command on CH03 channel at 05/19/2015 22:31:16

ORA-27192: skgfcls: sbtclose2 returned error - failed to close file
ORA-19511: Error received from media manager layer, error text:
   Failed to process backup file <iwhprod_20150519221015_db_abq7c1nb_1_1>
ORA-19502: write error on file "iwhprod_20150519221015_db_abq7c1nb_1_1", blockno 143012865 (blocksize=512)
ORA-27030: skgfwrt: sbtwrite2 returned error
ORA-19511: Error received from media manager layer, error text:
   VxBSASendData: Failed with error:
   Server
RMAN-12018: channel CH03 disabled, job failed on it will be run on another channel
RMAN-08031: released channel: CH01
RMAN-08031: released channel: CH02
RMAN-08031: released channel: CH03
RMAN-08031: released channel: CH04
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on CH04 channel at 05/19/2015 22:31:17

ORA-27192: skgfcls: sbtclose2 returned error - failed to close file

ORA-19511: Error received from media manager layer, error text:
   Failed to process backup file <iwhprod_20150519221015_db_acq7c1nb_1_1>
ORA-19502: write error on file "iwhprod_20150519221015_db_acq7c1nb_1_1", blockno 129326081 (blocksize=512)
ORA-27030: skgfwrt: sbtwrite2 returned error
ORA-19511: Error received from media manager layer, error text:
   VxBSASendData: Failed with error:
   Server

RMAN> 

Recovery Manager complete.
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Database INC0 Backup of database iwhprod on server rome_pres
The RMAN script completed with a RETURN CODE of 1
End Time is 05/19/2015 22:31:19
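Since the error stack above repeats per channel, it can help to boil it down. Here is a rough Python sketch that groups the ORA- codes under each failed channel; the regular expressions are built only from the log lines shown in this thread, and the function name is made up for illustration:

```python
import re
from collections import defaultdict

CHANNEL_FAILURE = re.compile(r"^RMAN-03009: failure of backup command on (\S+) channel")
ORA_CODE = re.compile(r"^(ORA-\d{5}):")

def ora_codes_by_channel(log_text: str) -> dict:
    """Map each failed channel to the distinct ORA- codes that follow
    its RMAN-03009 line, in the order they first appear."""
    codes = defaultdict(list)
    channel = None
    for raw in log_text.splitlines():
        line = raw.strip()
        failure = CHANNEL_FAILURE.match(line)
        if failure:
            channel = failure.group(1)
            continue
        ora = ORA_CODE.match(line)
        if ora and channel is not None and ora.group(1) not in codes[channel]:
            codes[channel].append(ora.group(1))
    return dict(codes)

# Excerpt copied from the RMAN output above (CH01 only).
sample = """\
RMAN-03009: failure of backup command on CH01 channel at 05/19/2015 22:31:16
ORA-27192: skgfcls: sbtclose2 returned error - failed to close file
ORA-19511: Error received from media manager layer, error text:
   Failed to process backup file <iwhprod_20150519221015_db_a9q7c1nb_1_1>
ORA-19502: write error on file "iwhprod_20150519221015_db_a9q7c1nb_1_1", blockno 138650625 (blocksize=512)
ORA-27030: skgfwrt: sbtwrite2 returned error
RMAN-12018: channel CH01 disabled, job failed on it will be run on another channel
"""
result = ora_codes_by_channel(sample)
print(result)
```

Run against the full log, this makes it easier to see that all four channels failed with the same set of codes within a second or two of each other.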

Will_Restore
Level 6

Similar problem in this old thread

https://www-secure.symantec.com/connect/forums/oracle-fullbackups-status-6

A few suggestions were offered but the originator never came back to mark a solution. 

 

Arshad_Khateeb
Level 5
Certified

Thanks Will !

We are going through the solutions in the technote to see whether any of them can be implemented in our setup.

Nicolai
Moderator
Partner    VIP   

The trace does not point to a possible cause, but "media manager layer" does mean NetBackup in this case.

As a test, can you add CLIENT_READ_TIMEOUT = 1800 to bp.conf on the master/media server and on the client?

 

 

Pritesh_Pisal
Level 5
Partner Accredited

Hi Arshad,

Just a suggestion: look at NET_BUFFER_SZ, as suggested in the link below, where a user had an issue similar to yours.

http://grokbase.com/t/freelists.org/oracle-l/075x8xmade/rman-error-ora-19502

This is pretty old, but have a look at the post from Jun 4, 2007 at 9:18 pm:

We have resolved the issue. Bouncing database server doesn't help neither.
What changes we did was that we looked into the value of NET_BUFFER_SZ file
on the client (database host) and on the netbackup server side. The value in
the file is different, in fact, client file has higher value than the server
file value. Upon synchronizing client value with Server side file value
resolved our issue and backup started running without any problems
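The check described in that quote is simple to script. A minimal Python sketch, assuming you have already read the contents of the NET_BUFFER_SZ file from each host (on UNIX it normally sits under /usr/openv/netbackup/); the host labels and sizes below are hypothetical, not values from this environment:

```python
def net_buffer_mismatch(sizes_by_host: dict) -> bool:
    """Return True when the hosts do not all carry the same NET_BUFFER_SZ value.

    sizes_by_host maps a host label to the raw text of its NET_BUFFER_SZ file.
    """
    values = {int(text.strip()) for text in sizes_by_host.values()}
    return len(values) > 1

# Hypothetical file contents for illustration only.
example = {"media_server": "262144\n", "client": "524288\n"}
mismatch = net_buffer_mismatch(example)
print(mismatch)
```

If the file does not exist on a host, NetBackup falls back to its default, so a missing file on one side is worth noting separately.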

 

Pritesh_Pisal
Level 5
Partner Accredited

Hi Arshad,

I read on a forum that you should check the NET_BUFFER_SZ value, if you have configured it on the master/media server and the client. The value on the client should be the same as the one specified on the master/media server.

 

Seems my first post showed up late... sorry

Arshad_Khateeb
Level 5
Certified

@Nicolai: It looks like this is not going to make any difference, but I will certainly try modifying the value on the master/media server as well.

On Master CLIENT_READ_TIMEOUT = 300

On Client CLIENT_READ_TIMEOUT = 30000
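To make the difference between the two values above explicit, here is a small Python sketch that pulls CLIENT_READ_TIMEOUT out of bp.conf-style text ("NAME = value" lines) so the hosts can be compared. The SERVER line and its host name are placeholders, and the function name is only illustrative:

```python
def read_timeout(bp_conf_text: str, key: str = "CLIENT_READ_TIMEOUT"):
    """Return the integer value of `key` from bp.conf-style text,
    or None when the entry is not present."""
    for line in bp_conf_text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            return int(value.strip())
    return None

# Timeout values as posted above; "master1" is a placeholder host name.
master = read_timeout("SERVER = master1\nCLIENT_READ_TIMEOUT = 300\n")
client = read_timeout("SERVER = master1\nCLIENT_READ_TIMEOUT = 30000\n")
print(master, client, master == client)
```

The values are in seconds, so the master gives up after 5 minutes while the client is set to wait more than 8 hours.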

 

@ Pritesh: Thanks for your input. I'll try this as well.

---------------------------------------------------------------------

BTW, we had a good DB incremental backup yesterday after the DBA modified the RMAN script per Oracle.

We'll be testing the full DB backup tomorrow. I would like to see how it goes before making any changes on the NBU end.

I'll keep you updated after tomorrow's full DB backup.

Arshad_Khateeb
Level 5
Certified

The DBAs made two changes to the RMAN script, after which we had a good incremental DB backup.

Today we will be testing the full DB backup with the same changes in the RMAN script. Let us see how it goes :)

I'll keep you posted.

DBA Comment:

i changed 2 things
not sure which one did the trick
i commented out the alter statement for the control file to write the file to disk. 
i also changed the default configuration device to sbt_tape instead of disk
I'm thinking the 2nd change is the one that made the difference
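To confirm the second change actually took effect, the DBA can capture the RMAN "SHOW ALL" output and look at the default device type line. A hedged Python sketch that extracts it from captured text; the two sample lines are written from the usual SHOW ALL format and are assumptions, not output from this database:

```python
import re

# Matches e.g.  CONFIGURE DEFAULT DEVICE TYPE TO DISK; # default
DEFAULT_DEVICE = re.compile(
    r"^CONFIGURE DEFAULT DEVICE TYPE TO\s+'?(\w+)'?\s*;", re.IGNORECASE
)

def default_device_type(show_all_output: str):
    """Return the configured default device type in upper case,
    or None when no such line is found in the text."""
    for line in show_all_output.splitlines():
        match = DEFAULT_DEVICE.match(line.strip())
        if match:
            return match.group(1).upper()
    return None

# Assumed sample lines, before and after the DBA's change.
before = default_device_type("CONFIGURE DEFAULT DEVICE TYPE TO DISK; # default")
after = default_device_type("CONFIGURE DEFAULT DEVICE TYPE TO 'SBT_TAPE';")
print(before, after)
```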

Arshad_Khateeb
Level 5
Certified

The full DB backup hung again over the weekend. Again it was only the parent job that was hung in the active state. All child jobs completed successfully.

sbtio.log is now set up to capture additional log information. We are planning one more attempt at the full DB backup.

Discussing the timing with the business...

Nicolai
Moderator
Partner    VIP   

Thanks for the update !

Varunthilak_B
Level 3
Certified

- Are you taking an offline or an online backup?

- Are you using a template or a script?

- What is the Oracle version?

- Can you provide the progress log?

- Did you try the backup to disk (I mean SBT_DISK) instead of tape?