sql job stream delay

bmaro · ‎07-30-2014

NetBackup Appliance 5230, 2.6.0.2, acting as master with a Windows 2008 media server for offsite tape duplication. This is a new setup, and on all of our SQL agent backups there is a 5-7 minute delay between application job streams starting. For example, the parent job starts and then starts the child job for database X; that stream finishes, and then the next stream doesn't start for another 5-7 minutes to back up the second database, and so on. This happens on every stream. We checked all settings and couldn't find anything; the DBAs checked and they don't have page verification set or anything like that. We could add a batch group size to start more than one database backup at a time, but that doesn't explain the wait time. This isn't occurring on any of our other master servers. We currently have a ticket open with Symantec for this. Has anyone experienced this? Thanks.

RiaanBadenhorst · ‎07-30-2014

Do you have a reverse lookup zone configured in DNS? I've seen this many times; without it, even general GUI response is a lot slower. Once it's created, the response improves. If you don't have one, you can set reverse lookup to Prohibited in the master server properties.
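
A quick way to see whether reverse lookups are slow or failing from a given host is to time them. The following is a minimal Python sketch, not a NetBackup tool; the function name `timed_reverse_lookup` and the sample address are assumptions for illustration only.

```python
import socket
import time


def timed_reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return (hostname or None, seconds elapsed) for a reverse lookup of ip.

    `resolver` defaults to the system resolver; it is a parameter only so
    the function can be exercised without touching DNS.
    """
    start = time.monotonic()
    try:
        # gethostbyaddr returns (hostname, aliaslist, ipaddrlist)
        hostname = resolver(ip)[0]
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError
        hostname = None
    return hostname, time.monotonic() - start


if __name__ == "__main__":
    # Hypothetical list: replace with the master, media server and client IPs.
    for ip in ["127.0.0.1"]:
        name, secs = timed_reverse_lookup(ip)
        print(f"{ip} -> {name or 'no reverse record'} ({secs:.2f}s)")
```

Run it on the master and on a SQL client. A lookup that returns nothing, or takes many seconds before failing, points at a missing or broken reverse zone.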

Marianne · ‎07-30-2014

I would start troubleshooting on the SQL client, since that is where the SQL jobs originate.

Check Event Viewer Application log and SQL VDI and errorlog, and lastly dbclient log.

Compare job generation in dbclient log with receipt of backup requests in master's bprd log.

My gut feeling is that the delay is on the SQL client side. The master can only process backup requests as and when they are received from the client.

bmaro · ‎07-31-2014

Thanks Marianne and Riaan.

Marianne, thanks, I will check those things out. I noticed something right off the bat in the dbclient log.

3:08:23.523 [14436.5528] <4> serverResponse: Not a candidate for alternate buffer method: shared memory is not in use.
03:08:37.627 [14436.5528] <2> vnet_pbxConnect: pbxConnectEx Succeeded

03:39:49.624 [14436.10792] <2> logconnections: BPRD CONNECT FROM 172.26.255.251.57875 TO 10.198.32.20.1556 fd = 2116
03:42:03.639 [14436.10792] <8> file_to_cache_item: [vnet_addrinfo.c:6574] fopen() failed ERRNO=2 FILE=C:\Program Files\VERITAS\NetBackup\var\host_cache\14f\6cf9fd4f+57894,1,400,2,1,0+127.0.0.1.txt
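
Pauses like the ones in this excerpt can be located mechanically by comparing consecutive timestamps. The sketch below is a rough Python illustration, assuming each log line starts with an `HH:MM:SS.mmm` timestamp as shown above and that the log does not cross midnight; `find_gaps` and the 60-second threshold are hypothetical choices, not part of NetBackup.

```python
import re
from datetime import timedelta

# Leading timestamp, e.g. "03:08:37.627 " (the hour may have one or two digits).
TS = re.compile(r"^(\d{1,2}):(\d{2}):(\d{2})\.(\d{3})\s")


def find_gaps(lines, threshold_seconds=60):
    """Return a list of (gap_seconds, previous_line, current_line) tuples
    for consecutive timestamped lines at least threshold_seconds apart.

    Lines without a leading timestamp are skipped. Midnight rollover is
    not handled.
    """
    gaps = []
    prev_time = prev_line = None
    for line in lines:
        match = TS.match(line)
        if not match:
            continue
        h, m, s, ms = map(int, match.groups())
        current = timedelta(hours=h, minutes=m, seconds=s, milliseconds=ms)
        if prev_time is not None:
            gap = (current - prev_time).total_seconds()
            if gap >= threshold_seconds:
                gaps.append((gap, prev_line, line))
        prev_time, prev_line = current, line
    return gaps


if __name__ == "__main__":
    import sys
    # Usage: python find_gaps.py <path to dbclient log>
    with open(sys.argv[1], errors="replace") as fh:
        for gap, before, after in find_gaps(fh):
            print(f"{gap:8.1f}s gap before: {after.strip()[:100]}")
```

On the four lines quoted above, roughly 134 seconds pass between the BPRD CONNECT line and the host_cache fopen() failure. The larger jump from 03:08 to 03:39 spans a part of the log that is not quoted here.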

The full dbclient log is below; I also sent it to Symantec. Thanks.

See attachment.

bmaro · ‎07-31-2014

Thanks Riaan and Marianne,

Marianne, I'm checking on those, thanks. One thing I noticed right off the bat in the dbclient log is below:

"serverResponse: Not a candidate for alternate buffer method: shared memory is not in use."

I tried to post the whole log here, but I got a message saying the post is being reviewed...

Marianne · ‎07-31-2014

I have edited your quarantined post to put the dbclient log into a file and add it as an attachment.

Will have a look a bit later....

bmaro · ‎07-31-2014

Thanks Marianne, you guys have nailed the issue! There were a couple of things going on here, and the dbclient log really helped us get to the bottom of it.

1. We added NOSHM to the master server. That helped some, but things were still inconsistent.

2. Once we were able to gain access to the SQL servers, we noticed that reverse DNS lookup by IP was not working on any of them. We were told this had been fixed, but I guess we always need to double-check. So we added entries to the hosts files and things took off right away: about a 30-second wait between each stream, sometimes less.

3. Instead of adding hosts entries on every SQL client while we wait for our Windows group to fix DNS, we set reverse lookup to Prohibited on the master server as Riaan suggested, and things also improved greatly for the other SQL servers without hosts entries. Thank you so much for your help, and you too Riaan, really appreciate it.
