
nbjm behaviour

deamon
Level 4
Partner
Hi,
I'm running NBU 6.5.2A on a Solaris master and noticed that nbjm was constantly using over 40% of CPU. I ran vxlogview for nbjm for the past 24 hours and found the following entries repeating from 06:47 onwards:

11/18/09 13:26:16.543 [Debug] NB 51216 nbjm 117 PID:12294 TID:8 File ID:117 [No context] 1 [JobAttr:FindJobFile] (fb67dfc8) cannot find param file for jobid=3339913(JobParam.cpp:540)
11/18/09 13:26:16.549 [Debug] NB 51216 nbjm 117 PID:12294 TID:1 File ID:117 [No context] 1 [JobManager_i::getJobById] (968098) job not found in map, jobid=3339913(JobManager_i.cpp:791)
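
For anyone wanting to confirm which job ID is behind a repeating pattern like this, here is a minimal sketch in Python. It assumes the vxlogview output has already been saved as plain text in the same layout as the lines quoted above. The function name `count_repeats` and the marker list are my own choices for illustration, not anything provided by NetBackup.

```python
import re
from collections import Counter

# Matches "jobid=<digits>" as it appears in the nbjm debug lines quoted above.
JOBID_RE = re.compile(r"jobid=(\d+)")

# Message fragments copied from the two repeating log lines in this post.
REPEATING_MARKERS = (
    "cannot find param file",
    "job not found in map",
)

# The two repeating lines from the post, used here as sample input.
SAMPLE_LINES = [
    "11/18/09 13:26:16.543 [Debug] NB 51216 nbjm 117 PID:12294 TID:8 File ID:117 "
    "[No context] 1 [JobAttr:FindJobFile] (fb67dfc8) cannot find param file for "
    "jobid=3339913(JobParam.cpp:540)",
    "11/18/09 13:26:16.549 [Debug] NB 51216 nbjm 117 PID:12294 TID:1 File ID:117 "
    "[No context] 1 [JobManager_i::getJobById] (968098) job not found in map, "
    "jobid=3339913(JobManager_i.cpp:791)",
]


def count_repeats(lines):
    """Count, per job ID, the lines that contain one of the repeating markers."""
    counts = Counter()
    for line in lines:
        if not any(marker in line for marker in REPEATING_MARKERS):
            continue
        match = JOBID_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Print job IDs with the most repeats first.
    for jobid, hits in count_repeats(SAMPLE_LINES).most_common():
        print(jobid, hits)
```

Against a full 24-hour log, a single job ID with a count far above the rest would point at the job nbjm keeps retrying.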

So I looked for job 3339913 in the Activity Monitor, but it was not there. The Problems report showed this job had failed at 02:05 with status 86. I ran vxlogview on the job ID and it showed entries only up until 02:05:

11/18/09 02:05:32.901 nbpem PID:12445 File ID:116 (ID:1774820) Active subtask count=0(PemTask.cpp:528)
11/18/09 02:05:32.910 nbpem PID:12445 File ID:116 [Info] CLIENT ***************  POLICY BIB-S1-eTrust-7_Year_Logs  SCHED Daily_Dinc  EXIT STATUS 86 (media position error)
11/18/09 02:05:32.921 nbpem PID:12445 File ID:116 [Error] backup of client *************** exited with status 86 (media position error)
11/18/09 02:05:50.553 nbrb PID:11659 File ID:118 received release of mediaId=011449, driveName=S1SL85-16-9940B, STU=S1MED1-S1-9940B
11/18/09 02:05:50.845 nbrb PID:11659 File ID:118 received release of mediaId=020428, driveName=S2SL85-13-9940B, STU=S1MED1-S2-9940B
11/18/09 02:05:50.972 mds PID:11538 File ID:111 [Info] Drive S2SL85-13-9940B cannot be made available, it has pending actions: 1

Looking back at the nbjm log, it showed the following for the job ID:

11/18/09 02:05:32.831 [Debug] NB 51216 nbjm 117 PID:12294 TID:4 File ID:117 [No context] 1 [CallbackQueue::queueRequest] queueing BPJobdExpireJob jobid=3339913, secondary jobid count=2 -- retry count=0(CallbackQueue.cpp:1268)
11/18/09 02:05:32.832 [Debug] NB 51216 nbjm 117 PID:12294 TID:4 File ID:117 [No context] 1 [JobManager_i::doForgottenJobCleanup] (968098) job has been forgotten, perform cleanup, jobid=3339913(JobManager_i.cpp:2026)
11/18/09 02:05:32.898 [Debug] NB 51216 nbjm 117 PID:12294 TID:4 File ID:117 [No context] 1 [JobManager_i::deleteFrozenImage] (968098) frozen image delete failed, no snapid for jobid=3339913(JobManager_i.cpp:1961)
11/18/09 02:05:33.177 [Debug] NB 51216 nbjm 117 PID:12294 TID:5 File ID:117 [No context] 1 [CallbackQueue::handle_input] sending BPJobdExpireJob jobid=3339913, secondary jobid count=2(CallbackQueue.cpp:1391)
11/18/09 02:05:41.841 [Debug] NB 51216 nbjm 117 PID:12294 TID:1 File ID:117 [No context] 1 [JobMapper::startDelayedJob] (75ef58) job not found in the delayed job map, jobid=3339913(JobMapper.cpp:1026)
...........................................................
11/18/09 06:47:53.681 [Debug] NB 51216 nbjm 117 PID:12294 TID:7 File ID:117 [No context] 1 [CallbackQueue::queueRequest] queueing PEMretryJob : jobid=3339913, parentid=3339913, retryType=2 -- retry count=-1(CallbackQueue.cpp:1268)
11/18/09 06:47:58.031 [Debug] NB 51216 nbjm 117 PID:12294 TID:1 File ID:117 [No context] 1 [CallbackQueue::handle_input] sending PEMretryJob : jobid=3339913(CallbackQueue.cpp:1391)
11/18/09 06:47:58.038 [Debug] NB 51216 nbjm 117 PID:12294 TID:4 File ID:117 [No context] 1 [JobManager_i::getJobById] (968098) job not found in map, jobid=3339913(JobManager_i.cpp:791)
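
To make the quiet period between the cleanup at 02:05 and the first PEMretryJob at 06:47 easier to spot, here is a rough Python sketch that finds the largest time gap between consecutive entries for one job ID. It assumes the log is plain text in the layout quoted above, with the `mm/dd/yy hh:mm:ss.mmm` timestamp at the start of each line. The regular expression and the name `largest_gap` are illustrative assumptions, and the sample holds only three lines from the excerpt, so lines elided from the post are not accounted for.

```python
import re
from datetime import datetime, timedelta

TS_FORMAT = "%m/%d/%y %H:%M:%S.%f"

# Captures the timestamp, the [Class::method] tag and the first jobid= value.
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) .*?"
    r"\[(?P<func>\w+::\w+)\].*?jobid=(?P<jobid>\d+)"
)

# Three lines taken from the nbjm excerpt in this post.
SAMPLE_LINES = [
    "11/18/09 02:05:32.831 [Debug] NB 51216 nbjm 117 PID:12294 TID:4 File ID:117 "
    "[No context] 1 [CallbackQueue::queueRequest] queueing BPJobdExpireJob "
    "jobid=3339913, secondary jobid count=2 -- retry count=0(CallbackQueue.cpp:1268)",
    "11/18/09 02:05:41.841 [Debug] NB 51216 nbjm 117 PID:12294 TID:1 File ID:117 "
    "[No context] 1 [JobMapper::startDelayedJob] (75ef58) job not found in the "
    "delayed job map, jobid=3339913(JobMapper.cpp:1026)",
    "11/18/09 06:47:53.681 [Debug] NB 51216 nbjm 117 PID:12294 TID:7 File ID:117 "
    "[No context] 1 [CallbackQueue::queueRequest] queueing PEMretryJob : "
    "jobid=3339913, parentid=3339913, retryType=2 -- retry count=-1(CallbackQueue.cpp:1268)",
]


def largest_gap(lines, jobid):
    """Return (gap, func_before, func_after) for the longest pause, or None."""
    events = []
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group("jobid") == jobid:
            stamp = datetime.strptime(match.group("ts"), TS_FORMAT)
            events.append((stamp, match.group("func")))
    events.sort(key=lambda event: event[0])

    best = None
    best_gap = timedelta(0)
    # Compare each entry with the one that follows it in time.
    for (t0, f0), (t1, f1) in zip(events, events[1:]):
        if t1 - t0 > best_gap:
            best_gap = t1 - t0
            best = (best_gap, f0, f1)
    return best


if __name__ == "__main__":
    print(largest_gap(SAMPLE_LINES, "3339913"))
```

On the three sample lines this reports the pause between `JobMapper::startDelayedJob` and the later `CallbackQueue::queueRequest`, which is where the retry loop appears to begin.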

Can anyone advise what has happened here? Restarting the NBU services has dropped the nbjm CPU utilisation to a normal level.

TIA
4 REPLIES

Mike_Gavrilov
Moderator
Partner    VIP    Accredited Certified
It would be better to upgrade to 6.5.4 first.

deamon
Level 4
Partner
Is this a known issue/bug in 6.5.2A?
I couldn't see a technote relating to this.

Karthikeyan_Sun
Level 6
There is a known bug in 6.5.3:

 ------------------------------------------------

BugID: 1455909
Product: NetBackup_6.5.3
Problem description:
   nbjm.exe Application Faults under heavy load.
Fix:  6.5.4

------------------------------------------------

 After applying NetBackup Release Update 6.5.3, nbjm.exe terminates unexpectedly. 

   See  http://seer.entsupport.symantec.com/docs/316270.htm

SOLUTION

============================

Apply  6.5.4  to the master

              NB_6.5.4.winnt.x86.exe
              http://support.veritas.com/docs/326385

              (clients do not need to be patched)


Mike_Gavrilov
Moderator
Partner    VIP    Accredited Certified
I can't find a technote, but upgrading is still the fastest way to rule out possible known issues. If you are seeing a memory leak, it can only be fixed by a binary upgrade, and you'll end up with an up-to-date NetBackup.