Forum Discussion

Arshad_Khateeb
9 years ago

parent job fails for database backup

Master    SunOS 5.9     UNIX    Master Server    6.5.6    Connected
Media     SunOS 5.10    UNIX    Media Server     6.5.6    Connected

 

I know this NBU version is no longer supported, but to let you all know, we are already in discussion with Veritas and a couple of other backup software vendors such as CommVault.

NetBackup has been our preferred product for many years, but this time management is also talking to a couple of other vendors.

Okay ... regarding the issue we have. It would be great if someone could assist.

We're having an issue with database backups: the parent job fails with status 40 (network connection broken), while all the child jobs complete without any issue. Sometimes the archive fails as well, but with status code 6. The actual problem is with the database backup. I have done the troubleshooting that is usually required for such issues, but the problem still persists.

Let me know what you guys need.

  • KeepAlive interval should be reduced and should match on master, media server and Oracle client. PS: I hope there is a plan to upgrade or replace outdated Solaris 9 master server?
  • Each 'intermittent' failure needs to be looked at individually to find the exact reason for that particular failure.

    My guess is aging infrastructure is the root of the problem.

    Hopefully a complete hardware, software and network refresh is planned before you consider new backup software.
    Any backup software relies on the underlying environment.

    About the StorageTek 9740 - it is VERY VERY old. Older than the L700.
    Last time I saw one was with DLT7000 tape drives!

    This library (although good, reliable hardware) is no longer supported by Oracle.
    You will battle to find someone to support and maintain it.
    You will not be able to use 'current' tape drive technology and media in it.

    You can Google for online documentation and support options.

    If you want to replace the L700, look in NBU Hardware Compatibility Guide for supported make/model tape libraries and the tape drives supported in them.
    Look online for local resellers, look at various make/model specs and datasheets.
    Use Google to search for potential issues with hardware and/or local support partners.

    If you have shortlisted a couple of options, ask in a new forum discussion for user experience.

     

  • We reduced the keepalive value to 900 secs on the master and all media servers. One Oracle database backup is now working, but I'm not sure this is the reason, as it also fails intermittently, and a couple of others fail intermittently as well. We're working with the DBAs, and they are reviewing their scripts.

     

    Apart from this, one of our sites has a StorageTek L700 tape library, and we have the option to replace it with a spare StorageTek 9740. Would this make a good replacement for the library? Any suggestions or references?

  • The recommended KeepAlive interval is 300000. See http://www.veritas.com/docs/000083822. Don't forget the client as well. As per my previous post: reduce the interval to matching values on master, media and client.
  • Thanks all for getting back to me on this issue. I'm sorry for the delay on my end.

    Marianne, the keepalive is the same (1800000) on master and media, but the parent still fails with status code 40. And yes, the Unix team will be upgrading the OS in the next month or so.

  • The client read timeout and other parameters such as the keepalive interval are already set higher than the suggested values.

  • The post https://www.veritas.com/community/forums/general-database-backup-error-troubleshooting might be able to help you out.

    For an Oracle backup, I would ensure that the _%t parameter is used in the RMAN scripts.

     

  • If any firewalls are involved, try setting the TCP keepalive to 15 minutes.

    https://docs.oracle.com/cd/E19120-01/open.solaris/819-2724/fsvdg/index.html

    Set CLIENT_READ_TIMEOUT = 1800 on master, media and client.
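
The advice in this thread comes down to keeping two settings identical on the master, media server and Oracle client: the TCP keepalive interval and CLIENT_READ_TIMEOUT. As a rough illustration only, the Python sketch below compares values that have been collected by hand from each host and reports any that differ. The host names, dictionary keys and sample numbers are hypothetical; the sketch assumes the keepalive interval is expressed in milliseconds (as the 300000 and 1800000 figures above suggest) and CLIENT_READ_TIMEOUT in seconds. It does not query NetBackup or the operating system.

```python
from typing import Dict, List


def ms_to_minutes(milliseconds: int) -> float:
    """Convert a keepalive interval from milliseconds to minutes."""
    return milliseconds / 60_000


def find_mismatches(settings: Dict[str, Dict[str, int]]) -> List[str]:
    """Return one message for every setting whose value differs between hosts.

    `settings` maps a host label to a dict of setting name -> value.
    A setting missing on a host is shown as None and counts as a mismatch.
    """
    problems = []
    # Collect every setting name seen on any host.
    names = sorted({name for cfg in settings.values() for name in cfg})
    for name in names:
        values = {host: cfg.get(name) for host, cfg in settings.items()}
        if len(set(values.values())) > 1:
            detail = ", ".join(f"{host}={value}" for host, value in sorted(values.items()))
            problems.append(f"{name} differs: {detail}")
    return problems


# Hypothetical sample data, entered by hand; not taken from a real environment.
hosts = {
    "master": {"tcp_keepalive_interval_ms": 1_800_000, "client_read_timeout_s": 1800},
    "media": {"tcp_keepalive_interval_ms": 1_800_000, "client_read_timeout_s": 1800},
    "oracle_client": {"tcp_keepalive_interval_ms": 7_200_000, "client_read_timeout_s": 300},
}

for message in find_mismatches(hosts):
    print(message)

# Unit check for the figures quoted in the thread.
print(ms_to_minutes(300_000), ms_to_minutes(900_000), ms_to_minutes(1_800_000))
```

With this sample data both settings are reported as differing, because the hypothetical client still carries different values from the master and media server. The last line shows that 300000, 900000 and 1800000 milliseconds correspond to 5, 15 and 30 minutes, which helps when comparing figures quoted in different units.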