oracle db backup failure

Master: 8.1, Linux
Client: 8.0, Oracle on OVM, Windows OS
Issue: OIP (Oracle Intelligent Policy) backup. The backup runs fine for some period, for example two months, and then suddenly starts failing with a socket error. It looks like the client is unable to connect to the master, yet the filesystem backup still works. None of the logs give any clue. If we reboot the client, the DB backup works again, so it clearly points to something on the client side. We have around 20 masters; this is happening at one particular site, for a few different clients, on different days. The Windows team confirms there are no issues at their end. We asked the DB team to trigger the backup from their end and saw the same issue: only after a client reboot does the backup work. The reboot is just a workaround, and we need to know what causes the issue. Client uptime is usually two months or more.

Re: oracle db backup failure

Bear in mind that no client -> master connection is needed for filesystem backup.

These logs will help with troubleshooting:
bprd on the master (NBU must be restarted after the folder is created).
dbclient on the Oracle client.
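A minimal Python sketch of enabling these legacy logs, assuming the default NetBackup install paths (/usr/openv/netbackup/logs on the Linux master, C:\Program Files\Veritas\NetBackup\logs on the Windows client; adjust if yours differ). Legacy debug logging for a process is switched on by creating a folder with that process name under the logs directory. The helper names are hypothetical, and per the note above you still need to restart NetBackup on the master after creating the bprd folder.

```python
import sys
from pathlib import Path


def default_log_root() -> Path:
    """Assumed default NetBackup legacy log root for this OS."""
    if sys.platform == "win32":
        return Path(r"C:\Program Files\Veritas\NetBackup\logs")
    return Path("/usr/openv/netbackup/logs")


def enable_legacy_logs(log_root: Path, processes: list[str]) -> list[Path]:
    """Create one folder per process under log_root.

    Returns only the folders that did not exist before.
    """
    created = []
    for name in processes:
        folder = log_root / name
        if not folder.exists():
            folder.mkdir(parents=True)
            created.append(folder)
    return created


if __name__ == "__main__":
    # Hypothetical usage on the master:  python enable_nbu_logs.py bprd
    # and on the Oracle client:          python enable_nbu_logs.py dbclient bpdbsbora bpcd
    if len(sys.argv) < 2:
        print("usage: enable_nbu_logs.py PROCESS [PROCESS ...]")
    else:
        for folder in enable_legacy_logs(default_log_root(), sys.argv[1:]):
            print(f"created {folder}")
```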

Re: oracle db backup failure

Yes Marianne, I'm aware that a filesystem backup doesn't need client-to-master connectivity. The dbclient log is not being written at all. The bprd log was created, but it gives no specific clue. As mentioned earlier, backups of other clients are completing without issues.

Re: oracle db backup failure

@ravin_a 

With any kind of backup failure, it is important to follow the process flow diagram and logs in order to see where exactly in the process the failure occurred.

For OIP, there should be a connection request in the client's bpcd log. 

The start of the OIP process will be logged in the bpdbsbora log on the client. 

Let us know if it at least gets to this stage. 
If so, the process is then handed off to dbclient and subsequently to bprd on the master.

You should at least have dbclient and bprd logs for successful backups (after reboot), right? 
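The hand-off order described above can be written down as a simple checklist. A small Python sketch, assuming you have already opened each log folder for the failed attempt and noted whether it has any entries; the function name and the dict input are illustrative only, and the stage order is taken from this reply rather than from Veritas documentation.

```python
from typing import Dict, Optional

# OIP hand-off order as described in this thread:
# bpcd (client) -> bpdbsbora (client) -> dbclient (client) -> bprd (master)
OIP_STAGES = [
    ("client", "bpcd"),
    ("client", "bpdbsbora"),
    ("client", "dbclient"),
    ("master", "bprd"),
]


def first_silent_stage(logged: Dict[str, bool]) -> Optional[str]:
    """Return the first stage whose log shows nothing for the failed attempt.

    `logged` maps a process name to True if its log recorded the attempt.
    Missing keys count as "nothing logged". Returns None if every stage logged.
    """
    for host, process in OIP_STAGES:
        if not logged.get(process, False):
            return f"{process} on {host}"
    return None


if __name__ == "__main__":
    # Example: bpcd and bpdbsbora logged the attempt, dbclient stayed empty.
    print(first_silent_stage({"bpcd": True, "bpdbsbora": True, "dbclient": False}))
```

With the dbclient log not being written at all (as reported earlier in the thread), this checklist would point at the hand-off from bpdbsbora to dbclient on the client as the place to dig.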

Re: oracle db backup failure

The issue is largely similar to the one in this technote:

https://www.veritas.com/support/en_US/article.TECH37307 

We also followed the article below, together with a reboot:

https://support.microsoft.com/default.aspx?scid=kb;en-us;170359

The backup worked that time because the reboot had been done; afterwards, the issue recurred.

Re: oracle db backup failure

The bpdbsbora log shows an error about being unable to start the Oracle command, or something similar.

02:53:51.319 [15116.11220] <2> debuglog: <16> bphdb: ERR - send() to server failed: The established connection was aborted by software on the host computer.
02:53:51.319 [15116.11220] <2> debuglog: <16> bphdb: ERR - could not write progress log message to the NAME socket


Nov 8, 2018 11:23:16 AM - Info bphdb (pid=12345) INF - RMAN-00571: ===========================================================
Nov 8, 2018 11:23:16 AM - Info bphdb (pid=14188) INF - RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
Nov 8, 2018 11:23:16 AM - Info bphdb (pid=12345) INF - RMAN-00571: ===========================================================
Nov 8, 2018 11:23:16 AM - Info bphdb (pid=12345) INF - RMAN-03009: failure of backup command on ch01 channel at 11/08/2018 14:53:09
Nov 8, 2018 11:23:16 AM - Info bphdb (pid=12345) INF - ORA-19506: failed to create sequential file, name="arch_3405_u39t5987_s59890_p1_t8948945095", parms=""
Nov 8, 2018 11:23:16 AM - Info bphdb (pid=12345) INF - ORA-27028: skgfqcre: sbtbackup returned error
Nov 8, 2018 11:23:16 AM - Info bphdb (pid=12345) INF - ORA-19511: Error received from media manager layer, error text:
Nov 8, 2018 11:23:16 AM - Info bphdb (pid=12345) INF - VxBSAValidateFeatureId: Failed with error:
Nov 8, 2018 11:23:16 AM - Info bphdb (pid=12345) INF - Recovery Manager complete.
Nov 8, 2018 11:23:16 AM - Info bphdb (pid=12345) INF - End of Recovery Manager output.
Nov 8, 2018 11:23:16 AM - Info bphdb (pid=12345) INF - End Oracle Recovery Manager.
Nov 8, 2018 11:23:17 AM - Error bpbrm (pid=1234) from client client.com: ERR - exit status: <1>
Nov 8, 2018 11:23:17 AM - Error bpbrm (pid=1234) from client client: ERR - bphdb exit status = 6: the backup failed to back up the requested files

Re: oracle db backup failure

Log snippets do not help. 
The snippet gives no clue whether the 'aborted connection' is related to an OS, TCP, or NBU timeout, or whether the client is simply running out of TCP sockets.

Full logs are needed from the same backup attempt.
The timestamps in the log snippet and the RMAN output do not match. 

Re: oracle db backup failure

Next time, before you reboot the client machine, try running netstat -ano and check how many sockets are in the TIME_WAIT or CLOSE_WAIT state. If the number is very large, reach out to your network team to tune the TCP keepalive parameters and see if that helps.
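To make that check repeatable, here is a small Python sketch that runs netstat -ano on the Windows client, counts TCP sockets per state, and lists which PIDs hold CLOSE_WAIT sockets (a CLOSE_WAIT socket stays open until the owning process closes it, so the PID hints at which process is leaking). It assumes the standard Windows column layout (Proto, Local Address, Foreign Address, State, PID); the function names are hypothetical. TIME_WAIT sockets usually show PID 0, so only CLOSE_WAIT is grouped by PID.

```python
import subprocess
import sys
from collections import Counter


def _tcp_rows(netstat_text: str):
    """Yield (local, foreign, state, pid) for TCP rows of Windows `netstat -ano` output."""
    for line in netstat_text.splitlines():
        fields = line.split()
        # TCP rows have exactly 5 columns; UDP rows have no State column,
        # and the header line splits into more than 5 words.
        if len(fields) == 5 and fields[0].upper() == "TCP":
            yield fields[1], fields[2], fields[3].upper(), fields[4]


def count_tcp_states(netstat_text: str) -> Counter:
    """Number of TCP sockets in each state (LISTENING, ESTABLISHED, TIME_WAIT, ...)."""
    return Counter(state for _, _, state, _ in _tcp_rows(netstat_text))


def close_wait_by_pid(netstat_text: str) -> Counter:
    """Number of CLOSE_WAIT sockets held by each PID."""
    return Counter(pid for _, _, state, pid in _tcp_rows(netstat_text) if state == "CLOSE_WAIT")


if __name__ == "__main__":
    if sys.platform != "win32":
        print("Run this on the Windows client; it parses Windows netstat output.")
    else:
        out = subprocess.run(["netstat", "-ano"], capture_output=True, text=True, check=True).stdout
        states = count_tcp_states(out)
        print("TIME_WAIT:", states["TIME_WAIT"], "CLOSE_WAIT:", states["CLOSE_WAIT"])
        for pid, n in close_wait_by_pid(out).most_common(5):
            print(f"PID {pid} holds {n} CLOSE_WAIT sockets")
```

The PIDs can then be mapped to process names with tasklist /FI "PID eq <pid>".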

It's possible that all the available ports on the client are in use and it is running out of ports for establishing new connections for the backup.
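A rough way to test the port-exhaustion theory is to count how many distinct local TCP ports in the netstat -ano output fall inside the client's dynamic (ephemeral) port range. The sketch below assumes the Windows Vista / Server 2008 and later default range of 49152 to 65535 (16384 ports); confirm the actual range on the client with netsh int ipv4 show dynamicport tcp and pass it in. Counting distinct local ports is an approximation, not exactly how Windows allocates ports, and the function names are hypothetical.

```python
import sys

# Assumed Windows default dynamic port range (Vista / Server 2008 and later).
DEFAULT_START = 49152
DEFAULT_COUNT = 16384  # 49152..65535 inclusive


def local_port(address: str) -> int:
    """Port number from 'ip:port' or '[ipv6]:port' as printed by netstat."""
    return int(address.rsplit(":", 1)[1])


def dynamic_port_usage(netstat_text: str, start: int = DEFAULT_START,
                       count: int = DEFAULT_COUNT) -> tuple[int, int]:
    """Return (distinct ports in use, ports still free) within [start, start + count)."""
    used = set()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Only TCP rows (5 columns) carry a local TCP port.
        if len(fields) == 5 and fields[0].upper() == "TCP":
            port = local_port(fields[1])
            if start <= port < start + count:
                used.add(port)
    return len(used), count - len(used)


if __name__ == "__main__":
    # Hypothetical usage, with output saved from cmd.exe: netstat -ano > netstat.txt
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
            in_use, free = dynamic_port_usage(f.read())
        print(f"{in_use} dynamic ports in use, {free} free")
```

If the free count is close to zero just before the failures start, that would support the exhaustion theory; if it is nowhere near, the cause is likely elsewhere.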

Rebooting would release any stale sockets and ensure all ports are available for communication.

Apart from rebooting the client machine, have you tried just restarting the NetBackup services and then running a backup?

Re: oracle db backup failure

Hi Amol,

I tried restarting the services, but no luck. I'll check next time. Just FYI, we logged a support case last time. Support said they were unable to find the exact cause; for troubleshooting, they asked us to reboot the server and proceed further. After the reboot, the issue vanished.

Re: oracle db backup failure

I often had similar errors with Oracle RAC on some servers. When Oracle errors appeared, the solution was for the DBA to run a crosscheck and relaunch the backup, and also to make sure the database backup and the redo log backup do not run in parallel, since the control files may be locked. I hope it helps.