
getting Info nbjm (pid=xxxx) Waiting in NetBackup scheduler work queue on server <master server>

Raaavan
Level 5

Hi,

I am facing a serious issue in my environment with RMAN Oracle backups:
newly generated streams remain in the queue for a long time even though NetBackup resources are available.

01/07/2015 23:19:45 - requesting resource nbu_master.NBU_POLICY.MAXJOBS.RMAN_FCRM10DB_ARCHIVE
<= skipped =>
01/07/2015 23:45:45 - Info nbjm (pid=13731) Waiting in NetBackup scheduler work queue on server xxxxxxxx

It waited approximately 25 minutes. If a backup has 50 streams and each stream waits this long, the result is a very large backup window. Please suggest.

I found the tech notes below but cannot correlate them to this issue:
http://www.symantec.com/business/support/index?pag...
http://www.symantec.com/business/support/index?pag...
 

Master server - Solaris 10 / NetBackup 7.5.0.6

Total RMAN policies - 800

Thanks 


Marianne
Moderator
Partner    VIP    Accredited Certified

There are many reasons why jobs are queued, such as:

MPX setting for STU or policy has been reached
All drives are busy
Available media are in use
Max jobs per client has been reached
Max jobs per policy has been reached
etc...

The log extract above suggests that Max Jobs per Policy was reached.

Best to have a look at the queued jobs in the Activity Monitor to see the reason for them being queued.

Another possibility is that the Resource Broker on the master server is battling to keep up with the number of jobs.

Have a look at these performance tuning docs:

NetBackup 7.6 Best Practices: Optimizing Performance

Updated NetBackup Backup Planning and Performance Tuning Guide for Release 7.5 and Release 7.6

mph999
Level 6
Employee Accredited

Personally, I would ask for the mds, nbpem, nbjm, nbrb and emm logs at Debug/Diag levels 6 and 6, in raw format. These should cover a restart and be left until the system slows down and the problem is seen.

In fact, nbrb should only be at debug 4 (hope I've remembered that right) as any higher could cause issues itself.

So, 111, 116, 117, 118 + 143

143 logs into 111, so only 4 to collect.

vxlogcfg -a -p 51216 -o 143 -s DebugLevel=6 -s DiagnosticLevel=6

vxlogcfg -a -p 51216 -o 111 -s DebugLevel=6 -s DiagnosticLevel=6

vxlogcfg -a -p 51216 -o 116 -s DebugLevel=6 -s DiagnosticLevel=6 

vxlogcfg -a -p 51216 -o 117 -s DebugLevel=6 -s DiagnosticLevel=6

vxlogcfg -a -p 51216 -o 118 -s DebugLevel=4 -s DiagnosticLevel=6

Leave the system running long enough that a job starts and the delay is seen after the logs have been enabled.

Collect the nbjm, nbpem, nbrb and nbemm logs.

The nbrb log would be the place to start; the raw log needs to be run through vxlogview with the -d all option.

If you look for lines containing

[ResBroker_i::doEvaluation]

these should show how long the evaluation cycle is taking.
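
For example, something like this should pull those lines out of the raw log (a sketch, using the product ID 51216 and the nbrb originator 118 from the commands above):

vxlogview -p 51216 -o 118 -d all | grep doEvaluation

Compare the timestamps between successive evaluation lines to see how long each cycle runs.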

Use the same commands but with DebugLevel=1 to set the levels back to the defaults.

There are numerous possible causes, but the above may give more clues and, if nothing else, will give an insight into other potential performance issues.

Is there any nbrb/ nbemm tuning applied ?

I'd also stop the services and see if there are any 'stale' files in /usr/openv/netbackup/db/media/drives and /usr/openv/netbackup/db/media/tpreq (these files would be on the media server).
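
A minimal way to check, once the services are stopped (paths as above, run on the media server):

ls -l /usr/openv/netbackup/db/media/drives /usr/openv/netbackup/db/media/tpreq

With nothing running, any files left behind in these directories may be stale.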

Nicolai
Moderator
Partner    VIP   

mph999 is onto something.

If EMM has too little memory to play with, it can cause ridiculously long resource handling in general. You might consider setting the EMM memory configuration to Large:

Run /usr/openv/db/bin/dbadm

Select

2) Database Space and Memory Management

4) Adjust Memory Settings,

3) Large.

A restart of NetBackup is required.

That said, Marianne also has a very valid point, but with the information available right now we can't tell which it is. Consider also increasing the policy priority to 99999; this will let the Oracle jobs come first in the queue.
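
As a sketch, the priority can be set from the command line like this (policy name taken from the log extract in the first post; check bpplinfo usage on your version first):

/usr/openv/netbackup/bin/admincmd/bpplinfo RMAN_FCRM10DB_ARCHIVE -modify -priority 99999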

Raaavan
Level 5

Dear All

Apologies for the delay in replying.


Tech support suggested removing the offline servers from EMM and the unwanted entries from bp.conf (a proper decommission; 18 was the count at that time), since NBRB, while scanning for resources to satisfy a request, tries to contact each entry in bp.conf, and that takes a significant amount of time.

I did that; the issue is not totally resolved, but there is an improvement.


I still have 10 offline servers. On another front, a team is doing a POC and servers are not being gracefully added to and removed from EMM.
I am trying to get the count below 4-5, but there are some application dependencies (switch-backs/switch-overs, LDOM servers, etc.), so it has become a task to achieve.
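
For reference, the cleanup is roughly along these lines (a sketch; the host name below is hypothetical):

/usr/openv/netbackup/bin/admincmd/nbemmcmd -listhosts
/usr/openv/netbackup/bin/admincmd/nbemmcmd -deletehost -machinename old_media01 -machinetype media

plus removing the matching SERVER entries from bp.conf on the master.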


In parallel, I am looking into Oracle backup tuning, since this environment has 723 Oracle-type and 584 MS-SQL-Server-type policies, which is quite a large number.
There are some flaws on the DBA end: RMAN and NetBackup repository maintenance is not being done (CROSSCHECK is not run, and obsolete backup pieces have built up over the period), and there is also the "%t" factor in the RMAN FORMAT template (NBU 7.0).

I also observed that the minimum Solaris patch level required to run NetBackup 7.5.0.6 was missing; very soon we will patch the master to the latest patch level.

I can't see any rule of thumb to get rid of this.

@Nicolai - Checked; there is enough memory.

@mph999 - I have not collected those logs yet; I'll check and update on this.

@Marianne - I have cross-checked all the settings you suggested and they are appropriate.

Raaavan
Level 5

The issue has started recurring. At the moment I don't have any offline media servers in EMM, but allocation is still being delayed.

I have 300+ RMAN backups running during production hours. I have observed that, though the DBAs are using the RMAN template that comes with NBU,

they have additional statements like

DELETE NOPROMPT ARCHIVELOG ALL BACKED UP 2 TIMES TO 'SBT_TAPE' COMPLETED BEFORE 'SYSDATE -2';

and other maintenance commands such as CROSSCHECK, etc.

I suspect these requests must be hitting EMM and keeping it busy, which might be causing the allocation delay, but I am not sure. So I need help: what shows up in the bprd or bpdbm logs when a maintenance statement is run by RMAN?
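
For reference, I believe the legacy debug log directories have to exist before those processes will write anything (a sketch of the standard locations; VERBOSE in bp.conf raises the detail level, and the daemons may need a restart to pick it up):

mkdir -p /usr/openv/netbackup/logs/bprd /usr/openv/netbackup/logs/bpdbm
echo "VERBOSE = 5" >> /usr/openv/netbackup/bp.conf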

Marianne
Moderator
Partner    VIP    Accredited Certified

Have a look at this post dated 12 Jan :

https://www-secure.symantec.com/connect/forums/getting-info-nbjm-pidxxxx-waiting-netbackup-scheduler... 

Have you enabled and collected logs as suggested?

And the answer to this question from the same post?

Is there any nbrb/ nbemm tuning applied ? 

Have you gone through Performance Tuning Guides such as these?

NetBackup Backup Planning and Performance Tuning Guide  

NetBackup 7.6 Best Practices: Optimizing Performance 
(Will be valid for 7.5 as well).

Raaavan
Level 5

I have enabled the logs today; there was a space crunch, so the unified logs have now been moved to another filesystem. I will share the details with you after today's production hours.

Answer to the question

Is there any nbrb/ nbemm tuning applied ?

I don't think it is tuned explicitly; here is the conf file:

cat /usr/openv/var/global/nbrb.conf
SECONDS_FOR_EVAL_LOOP_RELEASE = 180
RESPECT_REQUEST_PRIORITY = 0
DO_INTERMITTENT_UNLOADS = 1
 

Raaavan
Level 5

Please find the logs attached.

Job details:
07/15/2015 14:40:51 - Info nbjm (pid=15859) starting backup job (jobid=10051329) for client IHYDIBMPF-C3B4-CL7, policy RMAN_XXY_WLY, schedule VTL_MLY
07/15/2015 14:40:51 - Info nbjm (pid=15859) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=10051329, request id:{63F0A0B4-2AD1-11E5-B45F-002128825C08})
07/15/2015 14:40:51 - requesting resource DD01_XXY_VTL
07/15/2015 14:40:51 - requesting resource hydt5440-nbms.NBU_CLIENT.MAXJOBS.IHYDIBMPF-C3B4-CL7
07/15/2015 14:40:51 - requesting resource hydt5440-nbms.NBU_POLICY.MAXJOBS.RMAN_XXY_WLY
07/15/2015 14:42:51 - Info nbjm (pid=15859) Waiting in NetBackup scheduler work queue on server hydt5440-nbms
07/15/2015 14:44:51 - Info nbjm (pid=15859) Waiting in NetBackup scheduler work queue on server hydt5440-nbms
07/15/2015 14:46:51 - Info nbjm (pid=15859) Waiting in NetBackup scheduler work queue on server hydt5440-nbms
07/15/2015 14:48:51 - Info nbjm (pid=15859) Waiting in NetBackup scheduler work queue on server hydt5440-nbms
07/15/2015 14:50:51 - Info nbjm (pid=15859) Waiting in NetBackup scheduler work queue on server hydt5440-nbms
07/15/2015 14:52:51 - Info nbjm (pid=15859) Waiting in NetBackup scheduler work queue on server hydt5440-nbms
07/15/2015 14:54:51 - Info nbjm (pid=15859) Waiting in NetBackup scheduler work queue on server hydt5440-nbms
07/15/2015 14:56:51 - Info nbjm (pid=15859) Waiting in NetBackup scheduler work queue on server hydt5440-nbms
07/15/2015 14:58:51 - Info nbjm (pid=15859) Waiting in NetBackup scheduler work queue on server hydt5440-nbms
07/15/2015 15:00:51 - Info nbjm (pid=15859) Waiting in NetBackup scheduler work queue on server hydt5440-nbms
07/15/2015 15:02:51 - Info nbjm (pid=15859) Waiting in NetBackup scheduler work queue on server hydt5440-nbms
07/15/2015 15:04:51 - Info nbjm (pid=15859) Waiting in NetBackup scheduler work queue on server hydt5440-nbms
07/15/2015 15:06:51 - Info nbjm (pid=15859) Waiting in NetBackup scheduler work queue on server hydt5440-nbms
07/15/2015 15:07:56 - granted resource  hydt5440-nbms.NBU_CLIENT.MAXJOBS.IHYDIBMPF-C3B4-CL7
07/15/2015 15:07:56 - granted resource  hydt5440-nbms.NBU_POLICY.MAXJOBS.RMAN_XXY_WLY
07/15/2015 15:07:56 - granted resource  QL0106
07/15/2015 15:07:56 - granted resource  IBM.ULTRIUM-TD3.514
07/15/2015 15:07:56 - granted resource  DD01_XXY_VTL
07/15/2015 15:07:57 - estimated 0 kbytes needed
07/15/2015 15:07:57 - Info nbjm (pid=15859) started backup (backupid=IHYDIBMPF-C3B4-CL7_1436953076) job for client IHYDIBMPF-C3B4-CL7, policy RMAN_XXY_WLY, schedule VTL_MLY on storage unit DD01_XXY_VTL
07/15/2015 15:07:59 - Info bpbrm (pid=27621) starting bptm
07/15/2015 15:07:59 - Info bpbrm (pid=27621) Started media manager using bpcd successfully
07/15/2015 15:07:59 - started process bpbrm (pid=27621)
07/15/2015 15:08:01 - Info bpbrm (pid=27621) IHYDIBMPF-C3B4-CL7 is the host to backup data from
07/15/2015 15:08:01 - Info bpbrm (pid=27621) telling media manager to start backup on client
07/15/2015 15:08:01 - Info bptm (pid=27633) using 262144 data buffer size
07/15/2015 15:08:01 - Info bptm (pid=27633) using 128 data buffers
07/15/2015 15:08:02 - Info bpbrm (pid=27621) spawning a brm child process
07/15/2015 15:08:02 - Info bpbrm (pid=27621) child pid: 27643
07/15/2015 15:08:02 - Info bptm (pid=27633) start backup
07/15/2015 15:08:02 - Info bptm (pid=27633) Waiting for mount of media id QL0106 (copy 1) on server XXY.
07/15/2015 15:08:02 - mounting QL0106
07/15/2015 15:08:03 - Info bpbrm (pid=27621) sending bpsched msg: CONNECTING TO CLIENT FOR IHYDIBMPF-C3B4-CL7_1436953076
07/15/2015 15:08:03 - Info bpbrm (pid=27621) listening for client connection
07/15/2015 15:08:03 - connecting
07/15/2015 15:08:06 - Info bptm (pid=27642) setting receive network buffer to 524288 bytes
07/15/2015 15:08:07 - Info bpbrm (pid=27621) INF - Client read timeout = 32000
07/15/2015 15:08:07 - Info bpbrm (pid=27621) accepted connection from client
07/15/2015 15:08:07 - connected; connect time: 0:00:00
07/15/2015 15:08:11 - Info bptm (pid=27633) media id QL0106 mounted on drive index 66, drivepath /dev/rmt/91cbn, drivename IBM.ULTRIUM-TD3.514, copy 1
07/15/2015 15:08:12 - mounted QL0106; mount time: 0:00:10
07/15/2015 15:08:24 - positioning QL0106 to file 157
07/15/2015 15:08:25 - Info bpbkar (pid=9961730) Backup started
07/15/2015 15:08:25 - positioned QL0106; position time: 0:00:01
07/15/2015 15:08:25 - begin writing
07/15/2015 15:08:25 - Info bpbrm (pid=27621) Sending the file list to the client
07/15/2015 15:08:25 - Info bpbkar (pid=9961730) dbclient(pid=9961730) wrote first buffer(size=262144)
07/15/2015 15:08:26 - Info bpbkar (pid=9961730) done. status: 0
07/15/2015 15:08:30 - Info bpbrm (pid=27621) media manager for backup id IHYDIBMPF-C3B4-CL7_1436953076 exited with status 0: the requested operation was successfully completed
07/15/2015 15:08:31 - end writing; write time: 0:00:06
the requested operation was successfully completed  (0)

Logs including nbrb -

As per the log, the total evaluation time taken by nbrb is 25 minutes, during which DD VTL tape drives were available. Please see the subsequent vmoprcmd output from the respective media server and the allocation details.
I request you to check why nbrb takes so long to satisfy such a request.

07/15/15 14:42:28.778 V-118-227 [ResBroker_i::requestResources] received resource request ID {63F0A0B4-2AD1-11E5-B45F-002128825C08}, priority 0, secondary priority 26,278, description THE_BACKUP_JOB-10051329-{63F0A0B4-2AD1-11E5-B45F-002128825C08}
07/15/15 14:42:28.779 [ResBroker_i::requestResources] RequestSeq:
Request=0 provider=MPXProvider resourcename=MpxRequest-10051329 MPXGroupRequest maxmpx=2
    SubRequest=0 provider=DriveOperationProvider resourcename=DD01_XXY_VTL userSequence=0 RB::StorageUnitRequest { StorageUnitRequest { storageUnit=DD01_XXY_VTL mediaPool=QUATERLY_VTL retentionLevel=0 mustUseLocalMediaServer=no failOnError=no mpxRequired=no mustBeNdmp=no getMaxFreeSpace=no minFreeSpaceKBytes=0 usageType=1 client=IHYDIBMPF-C3B4-CL7 shareGroup=*ANY* isLifeCycle=true NDMPHostName= } }
Request=1 provider=NamedResourceProvider resourcename=hydt5440-nbms.NBU_CLIENT.MAXJOBS.IHYDIBMPF-C3B4-CL7 CountedResourceRequest { max=200 }
Request=2 provider=NamedResourceProvider resourcename=hydt5440-nbms.NBU_POLICY.MAXJOBS.RMAN_XXY_WLY CountedResourceRequest { max=2147483647 }
07/15/15 14:42:28.780 [ResBroker_i::evaluateNow] scheduling evaluation immediately
07/15/15 14:42:28.780 [ResBroker_i::evaluateNow] evaluation cycle is in progress
07/15/15 15:07:54.022 V-118-226 [ResBroker_i::evaluateOne] Evaluating request ID {63F0A0B4-2AD1-11E5-B45F-002128825C08}

07/15/15 15:07:56.628 [EMMProvider::setDriveAllocated] persistDriveInfo: status=0, DriveInfo=DriveInfo: allocationKey=92850418 driveName=IBM.ULTRIUM-TD3.514 driveKey=2019618 STU=DD01_XXY_VTL mediaServer=XXY mediaId=QL0106 abstime=0 delay=0 state=inuse driveIsReusable=1 ejectUponUnload=0 jobtype=1 pendingRequestWasAssigned=0 unloadType=0 flags=0 reallocating=0 deallocating=3 mediaRequestDelayAbsTime=0 externalFile= waitingRequest={00000000-0000-0000-0000-000000000000} pendingGuid={00000000-0000-0000-0000-000000000000} waitingAllocation=0
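
The 25-minute gap is visible above between "evaluation cycle is in progress" at 14:42:28 and "Evaluating request" at 15:07:54. To extract just the evaluation-cycle markers for that window from the raw log (a sketch; the -b/-e timestamp format may need adjusting for your locale):

vxlogview -p 51216 -o 118 -d all -b "07/15/15 14:40:00" -e "07/15/15 15:10:00" | egrep "evaluateNow|evaluateOne|doEvaluation"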

STU details
Label:                DD01_XXY_VTL
Storage Unit Type:    Media Manager
Host Connection:      XXY
Number of Drives:     30
On Demand Only:       no
Max MPX/drive:        2
Density:              hcart3 - 1/2 Inch Cartridge 3
Robot Type/Number:    TLD / 16
Max Fragment Size:    512000 MB

vmoprcmd -d -h  XXY
 56 IBM.ULTRIUM-TD3.504   No       XXY                          
 57 IBM.ULTRIUM-TD3.505   No       -                                        
 58 IBM.ULTRIUM-TD3.506   No       -                                        
 59 IBM.ULTRIUM-TD3.507   No       -                                        
 60 IBM.ULTRIUM-TD3.508   No       -                                        
 61 IBM.ULTRIUM-TD3.509   No       XXY                          
 62 IBM.ULTRIUM-TD3.510   No       -                                        
 63 IBM.ULTRIUM-TD3.511   No       -                                        
 64 IBM.ULTRIUM-TD3.512   No       -                                        
 65 IBM.ULTRIUM-TD3.513   No       -                                        
 66 IBM.ULTRIUM-TD3.514   No       -                                        
 67 IBM.ULTRIUM-TD3.515   No       -                                        
 68 IBM.ULTRIUM-TD3.516   No       -                                        
 69 IBM.ULTRIUM-TD3.517   No       XXY                          
 70 IBM.ULTRIUM-TD3.518   No       XXY                          
 71 IBM.ULTRIUM-TD3.519   No       -                                        
 72 IBM.ULTRIUM-TD3.520   No       -                                        
 73 IBM.ULTRIUM-TD3.521   No       -                                        
 74 IBM.ULTRIUM-TD3.522   No       -                                        
 75 IBM.ULTRIUM-TD3.523   No       XXY                          
 76 IBM.ULTRIUM-TD3.524   No       -                                        
 77 IBM.ULTRIUM-TD3.525   No       -                                        
 78 IBM.ULTRIUM-TD3.526   No       -                                        
 79 IBM.ULTRIUM-TD3.527   No       -                                        
 80 IBM.ULTRIUM-TD3.528   No       -                                        
 81 IBM.ULTRIUM-TD3.529   No       XXY                          
 82 IBM.ULTRIUM-TD3.530   No       XXY                          
 83 IBM.ULTRIUM-TD3.531   No       -                                        
 95 IBM.ULT3580-TD5.029   Yes      -
