
SQL Jobs are failing with timeout errors

onepranav
Level 4

Hello Experts!

Recently, thousands of DB jobs for my SQL DB Windows clients have started failing with error 54 (child) / 2 (parent).

I tried some basic troubleshooting: limiting the number of jobs per client to 10, extending the overall backup window, limiting the number of jobs per STU to 10, and then restricting the jobs to only 4 and later 3 STUs.

Environment -

1 Master : HP-UX 11.31 NBU 7505  [Client connect timeout: 300, Client read timeout: 3600, Media server connect timeout: 90]

6 Media : 3 x HP-UX 11.31 NBU 7505 + 3 x Red Hat 6.2 NBU 7505 [Client connect timeout: 300, Client read timeout: 3600, Media server connect timeout: 90]

Multiple Clients : Win 2003 R2 and Win 2008 R2 X64

The failed jobs do run and complete on the second attempt, but the idea is to get them working on the first attempt.

Maybe I am missing something basic here, but I am not sure.

Any help will be greatly appreciated.

 

9 REPLIES 9

Marianne
Level 6
Partner    VIP    Accredited Certified

You forgot to mention the NBU version on the SQL clients.

Is there a firewall anywhere in the picture?

Have a look at this TN and see if anything may be relevant:
http://www.symantec.com/docs/TECH138071

 

onepranav
Level 4

Thanks for the reply Marianne. The agent versions are 7.5.

There is no firewall in that particular environment setup.

Michael_G_Ander
Level 6
Certified

The CLIENT_READ_TIMEOUT and CLIENT_CONNECT_TIMEOUT DWORD registry keys have helped me with a lot of timeout issues. The default of 300 (seconds) is often insufficient for database backups, and especially for restores.
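As a rough illustration of the two DWORD values mentioned above, here is a minimal Python sketch that renders the text of a .reg file. The registry path is an assumption on my part and should be checked on the client before importing anything, the helper name `render_reg_file` is made up for this example, and the numbers are simply the ones quoted earlier in this thread, not a recommendation.

```python
# Assumed location of the NetBackup client settings; verify it on your client.
ASSUMED_KEY = r"HKEY_LOCAL_MACHINE\SOFTWARE\Veritas\NetBackup\CurrentVersion\Config"


def render_reg_file(timeouts, key=ASSUMED_KEY):
    """Return .reg file text that sets each timeout (in seconds) as a DWORD."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key}]"]
    for name, seconds in timeouts.items():
        if not 0 <= seconds <= 0xFFFFFFFF:
            raise ValueError(f"{name}={seconds} does not fit in a DWORD")
        # .reg files store DWORD values as 8 hexadecimal digits.
        lines.append(f'"{name}"=dword:{seconds:08x}')
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    # Values taken from the timeouts quoted in the original post.
    print(render_reg_file({
        "CLIENT_CONNECT_TIMEOUT": 300,
        "CLIENT_READ_TIMEOUT": 3600,
    }))
```

Review the generated text and back up the key before importing it on a client.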

Regards

Michael

The standard questions: Have you checked: 1) What has changed. 2) The manual 3) If there are any tech notes or VOX posts regarding the issue

onepranav
Level 4

The Timeout is currently set to 1800 on the client and 3600 on the media servers.

NathanNieman
Level 6

Windows 2008 has the firewall turned on. Have you tried adding rules on the Windows server to allow communication? This was an issue I ran into.

onepranav
Level 4

I confirmed with our Windows team that an exception rule for the NetBackup ports already exists. While troubleshooting, I found that the failures happen randomly on Saturdays between 6 AM and 10 PM, which is when most of the jobs run. A failed job completes if it is rerun later on Sunday or Monday. I am coming to the conclusion that a resource crunch might be causing the timeouts. I also observed that the parent job is initiated and fails, after which the child enters the queue and then dies.
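One way to check whether the failures really cluster in the Saturday window is to count them per weekday and hour. The sketch below assumes the job history has already been exported into (start time, status) pairs; the sample records and the function name `failures_by_slot` are hypothetical, and treating every non-zero status as a failure is a simplification.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def failures_by_slot(jobs):
    """Count failed jobs per (weekday, hour) from (start_time, status) pairs."""
    counts = Counter()
    for started, status in jobs:
        if status != 0:  # simplification: any non-zero status counts as a failure
            counts[(DAYS[started.weekday()], started.hour)] += 1
    return counts


# Hypothetical sample records, not real job data.
sample = [
    (datetime(2014, 3, 1, 6, 15), 54),  # Saturday morning, child job
    (datetime(2014, 3, 1, 6, 40), 2),   # Saturday morning, parent job
    (datetime(2014, 3, 1, 21, 5), 54),  # Saturday evening
    (datetime(2014, 3, 2, 9, 0), 0),    # Sunday rerun, successful
]

if __name__ == "__main__":
    for (day, hour), n in sorted(failures_by_slot(sample).items()):
        print(f"{day} {hour:02d}:00  {n} failed")
```

If the busiest slots line up with the hours when most jobs are queued, that supports the resource-crunch theory better than a firewall or configuration cause would.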

NathanNieman
Level 6

Is there any way to test by running on a different day, or by moving some of the other jobs to a different time? I fire off my fulls all weekend, starting Friday night and running until Sunday morning, to help with speed.

onepranav
Level 4

I tried, but the problem is that I have ~5000 SQL policies along with other ones, and I have to accommodate them somewhere during the weekend. Wherever I moved them, they failed on their first run. Later I somehow got them running in sub-groups.
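To put rough numbers on the sub-group approach, here is a small sketch using the figures mentioned in this thread: about 5000 policies, 10 jobs per STU across 3 STUs, and the 6 AM to 10 PM Saturday window. It assumes one job per policy and equal-sized waves, which is a simplification, and the policy names and function names are made up for the example.

```python
import math


def split_into_groups(policies, group_size):
    """Split the policy list into consecutive sub-groups of at most group_size."""
    return [policies[i:i + group_size] for i in range(0, len(policies), group_size)]


def minutes_per_wave(total_jobs, concurrent_jobs, window_minutes):
    """Average time each wave of concurrent jobs may take to fit in the window."""
    waves = math.ceil(total_jobs / concurrent_jobs)
    return window_minutes / waves


if __name__ == "__main__":
    policies = [f"SQL_policy_{n}" for n in range(5000)]  # hypothetical names
    concurrent = 10 * 3          # 10 jobs per STU on 3 STUs
    window = 16 * 60             # 6 AM to 10 PM, in minutes

    groups = split_into_groups(policies, concurrent)
    budget = minutes_per_wave(len(policies), concurrent, window)
    print(f"{len(groups)} waves, about {budget:.1f} minutes per wave")
```

If the real jobs take longer than that per-wave budget, the queue cannot drain inside the window no matter how the policies are shuffled around.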

Marianne
Level 6
Partner    VIP    Accredited Certified

2-month-old unresolved post....

It seems as if this environment has outgrown the infrastructure... more backups than servers and network can handle?

Have a look at this Best Practice session at Vision this year: 

NetBackup 7.6 Best Practices: Optimizing Performance