
Many Jobs delayed "granting resources" and "connecting"

Saul_Grysman
Level 2
We recently upgraded from 6.5.3 to 6.5.4 and added another Media Server to the environment (all Solaris based). We are experiencing significant delays in starting jobs (Unix, Windows, SQL and Oracle/RMAN) and in writing to disk or tape. The example below is typical: a 12 minute delay in granting resources and a 15 minute delay in connecting. With the prior version, these times were in seconds. We've restarted our entire environment to no avail. Any help would be appreciated.

03/03/2010 11:14:51 - requesting resource nburobot
03/03/2010 11:14:51 - requesting resource nbucore1.NBU_CLIENT.MAXJOBS.sqlmisp

03/03/2010 11:14:51 - requesting resource nbucore1.NBU_POLICY.MAXJOBS.sql-sqlmisp

03/03/2010 11:27:27 - granted resource  nbucore1.NBU_CLIENT.MAXJOBS.sqlmisp
03/03/2010 11:27:27 - granted resource  nbucore1.NBU_POLICY.MAXJOBS.sql-sqlmisp
03/03/2010 11:27:27 - granted resource  T01171
03/03/2010 11:27:27 - granted resource  sl8500_drv20
03/03/2010 11:27:27 - granted resource  nbucore1_sl8500
03/03/2010 11:27:29 - estimated 0 kbytes needed
03/03/2010 11:27:30 - started process bpbrm (pid=4546)
03/03/2010 11:27:31 - connecting
03/03/2010 11:27:42 - connected; connect time: 0:00:00
03/03/2010 11:42:43 - mounting T01171
03/03/2010 11:43:30 - mounted T01171; mount time: 0:00:47
03/03/2010 11:43:31 - positioning T01171 to file 596
03/03/2010 11:44:53 - positioned T01171; position time: 0:01:22
03/03/2010 11:44:53 - begin writing
03/03/2010 11:45:14 - end writing; write time: 0:00:21
the requested operation was successfully completed (0)
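
For anyone comparing several of these Detailed Status dumps, here is a rough Python sketch that flags the long pauses between consecutive timestamped lines. It assumes the `MM/DD/YYYY HH:MM:SS - message` layout shown above. The function name `find_gaps` and the 5 minute threshold are illustrative choices of mine, not anything NetBackup provides.

```python
from datetime import datetime, timedelta


def find_gaps(log_text, threshold=timedelta(minutes=5)):
    """Return (gap, earlier_message, later_message) for each pair of
    consecutive timestamped lines that are at least `threshold` apart."""
    events = []
    for line in log_text.splitlines():
        # Split "03/03/2010 11:14:51 - requesting resource ..." at the first " - ".
        stamp, sep, message = line.strip().partition(" - ")
        if not sep:
            continue  # no timestamp separator, e.g. the final status line
        try:
            when = datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S")
        except ValueError:
            continue  # left-hand side was not a timestamp
        events.append((when, message.strip()))

    gaps = []
    for (t0, m0), (t1, m1) in zip(events, events[1:]):
        if t1 - t0 >= threshold:
            gaps.append((t1 - t0, m0, m1))
    return gaps


# A few lines copied from the job details above.
sample = """\
03/03/2010 11:14:51 - requesting resource nbucore1.NBU_POLICY.MAXJOBS.sql-sqlmisp
03/03/2010 11:27:27 - granted resource  nbucore1.NBU_CLIENT.MAXJOBS.sqlmisp
03/03/2010 11:27:31 - connecting
03/03/2010 11:27:42 - connected; connect time: 0:00:00
03/03/2010 11:42:43 - mounting T01171
03/03/2010 11:43:30 - mounted T01171; mount time: 0:00:47
"""

for gap, before, after in find_gaps(sample):
    print(f"{gap}  between '{before}' and '{after}'")
```

On the excerpt above this reports two pauses: 0:12:36 between the resource request and the grant, and 0:15:01 between "connected" and "mounting T01171".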
29 REPLIES

cgoliver
Level 5
Upgraded to 6.5.5 today, we shall see what progress has been made.

CRZ
Level 6
Employee Accredited Certified

I posted it earlier, but here it is again:

DOCUMENTATION: Issues resolved in the nbrb package for NetBackup Resource Broker, included in Etrack 1924099.
 http://support.veritas.com/docs/346920


venkat6_k
Level 2

On Windows, go to the <install path>\veritas\netbackup\logs directory, where you will find the xxx.bat file. Executing that file creates all of the log files for NetBackup.
In the registry, set the value debuglogs = 5 for detailed logs.

cgoliver
Level 5
Just grabbed it and will patch today. Came in this morning to find jobs still queued up from last night and, looking at the logs, it appears that we had many jobs that didn't finish.

Back to the drawing board.

CRZ --- Can you answer this? We have been told that 30 tape drives should not be shared amongst 25 media servers. True or False?

cgoliver
Level 5

The EEB, in addition to the 6.5.5 update on the master, has not done much of anything to alleviate our resource problems. The conversions are slow and utilization of the tape drives is suboptimal.

We spent three months working with third level support and got nowhere on resolving this issue. Still limping along with a million dollar infrastructure that will not back up more than about 20TB per day.

The media servers are still running 6.5.4; maybe we need to get those up to 6.5.5 and add the EEB.


cgoliver
Level 5

We are still experiencing resource allocation problems with mount times averaging 15 minutes to one hour.

I have updated most of our media servers to 6.5.5, which has not helped much with the daily incrementals.

Last weekend we sent ~ 36TB to tape, but during the week our incrementals struggle to finish. The process of mounting tapes takes more than 15 minutes and in many cases it takes an hour or more.

Thanks,
Chris

bcblake
Level 4
Partner
cgoliver:

Have you tried splitting off your EMM db, db/images, log files directories (both legacy and VxUL), and maybe even your EMM transaction logs onto separate storage LUNs and file systems? This helped us tremendously when we were originally having nbrb and EMM performance problems.

Also, what kind of disk is EMM sitting on (local attached, SAN, etc.)?

When you say you're seeing a 15 minute delay in mounting tapes, is this a delay in waiting for the resources to be granted, or in actually physically (or virtually, in the case of a VTL) mounting a tape? If the latter, have you looked at syslog/messages file/EventLog with ltid in verbose mode to see if it's actually getting delayed in the tape mount request being sent/acknowledged by the tape library? If it's a TLD type library, usually you'll see those messages in the syslog (opening/closing robotic path blah-blah-blah)...

Just trying to think of any other possibilities why there would be delays...

cgoliver
Level 5

The EMM database is local. db/images is split into two different mount points (one local, one SAN), and some logging is placed on the same LUN as db/images.

Amit_Karia
Level 6
We are facing the same issue while duplicating from DSSU to tape, even after upgrading all our media servers to 6.5.5. Please post an update on whether applying the nbrb binary has helped.

The following is the error we are facing:


Copy Num: 2, NBU status: 2005000, EMM status: No media is available

crowe
Not applicable
Master Server is Linux based NB 6.5.4

Symptoms: Client jobs sit at "started process bpbrm (pid = xxxxxx)", where xxxxxx is the pid number. I view this from Activity Monitor: double-click the job, then select the Detailed Status tab. Here is the output:

04/27/2010 20:03:16 - requesting resource Any
04/27/2010 20:03:16 - requesting resource TPUSA-BACKUP3.NBU_CLIENT.MAXJOBS.TWI-BB
04/27/2010 20:03:16 - requesting resource TPUSA-BACKUP3.NBU_POLICY.MAXJOBS.LEX-WIN2003
04/27/2010 20:03:16 - granted resource  TPUSA-BACKUP3.NBU_CLIENT.MAXJOBS.TWI-BB
04/27/2010 20:03:16 - granted resource  TPUSA-BACKUP3.NBU_POLICY.MAXJOBS.LEX-WIN2003
04/27/2010 20:03:16 - granted resource  WZE401
04/27/2010 20:03:16 - granted resource  HP.ULTRIUM2-SCSI.002
04/27/2010 20:03:16 - granted resource  tpusa-backup2-hcart2-robot-tld-1
04/27/2010 20:03:16 - estimated 1137440 kbytes needed
04/27/2010 20:03:16 - begin Parent Job
04/27/2010 20:03:16 - begin Snapshot: Start Notify Script
04/27/2010 20:03:16 - started process RUNCMD (pid=14974)
04/27/2010 20:03:16 - ended process 0 (pid=14974)
Operation Status: 0
04/27/2010 20:03:16 - end Snapshot: Start Notify Script; elapsed time 0:00:00
04/27/2010 20:03:16 - begin Snapshot: Step By Condition
Operation Status: 0
04/27/2010 20:03:16 - end Snapshot: Step By Condition; elapsed time 0:00:00
04/27/2010 20:03:16 - begin Snapshot: Stream Discovery
Operation Status: 0
04/27/2010 20:03:17 - end Snapshot: Stream Discovery; elapsed time 0:00:01
04/27/2010 20:03:17 - begin Snapshot: Read File List
Operation Status: 0
04/27/2010 20:03:17 - end Snapshot: Read File List; elapsed time 0:00:00
04/27/2010 20:03:17 - begin Snapshot: Create Snapshot
04/27/2010 20:03:20 - begin Create Snapshot
04/27/2010 20:03:18 - started process bpbrm (pid=11314)

Jobs like this and a few others will sit in this status (with the State type of "active" on the Activity Monitor main page) until the backup window expires, which is 10 hours. This occurs with different clients in different policies. All the while, other jobs complete around these. All of the clients that this occurs on are Windows boxes doing flat file backups. Some Windows clients with flat file backups still run normally. Linux backups run normally. Windows SQL backups all run normally.

This behavior was not observed until all Windows flat file policies were set to Multiplexing 2 and "allow multiple data streams". Tonight I am going to change all of these Windows policies back to multiplexing 1 and uncheck "allow multiple data streams" to see if this re-occurs.

In writing this post I also found a pattern.  It appears that all of the clients stuck in this status have NB client 6.0.  I am going to put all of these 6.0 clients experiencing the issue in a separate policy without streaming and multiplexing and see if the issue re-occurs.  I have not yet found a 6.5 client that behaves this way.  Needs more investigation.  Will post more as I discover it.