nbjm behaviour

deamon · ‎11-18-2009

Hi,
I'm running NBU 6.5.2A on a Solaris master server and noticed nbjm was constantly using over 40% CPU. I ran vxlogview for nbjm for the past 24 hours and noticed the following entries repeated from 06:47 onwards:

11/18/09 13:26:16.543 [Debug] NB 51216 nbjm 117 PID:12294 TID:8 File ID:117 [No context] 1 [JobAttr:FindJobFile] (fb67dfc8) cannot find param file for jobid=3339913(JobParam.cpp:540)
11/18/09 13:26:16.549 [Debug] NB 51216 nbjm 117 PID:12294 TID:1 File ID:117 [No context] 1 [JobManager_i::getJobById] (968098) job not found in map, jobid=3339913(JobManager_i.cpp:791)

...so I looked for 3339913 in the Activity Monitor, but it was not there. The Problems report showed this job had failed at 02:05 with status 86. I ran vxlogview on the job ID, and it showed entries up until 02:05:

11/18/09 02:05:32.901 nbpem PID:12445 File ID:116 (ID:1774820) Active subtask count=0(PemTask.cpp:528)
11/18/09 02:05:32.910 nbpem PID:12445 File ID:116 [Info] CLIENT *************** POLICY BIB-S1-eTrust-7_Year_Logs SCHED Daily_Dinc EXIT STATUS 86 (media position error)
11/18/09 02:05:32.921 nbpem PID:12445 File ID:116 [Error] backup of client *************** exited with status 86 (media position error)
11/18/09 02:05:50.553 nbrb PID:11659 File ID:118 received release of mediaId=011449, driveName=S1SL85-16-9940B, STU=S1MED1-S1-9940B
11/18/09 02:05:50.845 nbrb PID:11659 File ID:118 received release of mediaId=020428, driveName=S2SL85-13-9940B, STU=S1MED1-S2-9940B
11/18/09 02:05:50.972 mds PID:11538 File ID:111 [Info] Drive S2SL85-13-9940B cannot be made available, it has pending actions: 1

Looking back at the nbjm log, it showed the following for the job ID:

11/18/09 02:05:32.831 [Debug] NB 51216 nbjm 117 PID:12294 TID:4 File ID:117 [No context] 1 [CallbackQueue::queueRequest] queueing BPJobdExpireJob jobid=3339913, secondary jobid count=2 -- retry count=0(CallbackQueue.cpp:1268)
11/18/09 02:05:32.832 [Debug] NB 51216 nbjm 117 PID:12294 TID:4 File ID:117 [No context] 1 [JobManager_i::doForgottenJobCleanup] (968098) job has been forgotten, perform cleanup, jobid=3339913(JobManager_i.cpp:2026)

11/18/09 02:05:32.898 [Debug] NB 51216 nbjm 117 PID:12294 TID:4 File ID:117 [No context] 1 [JobManager_i::deleteFrozenImage] (968098) frozen image delete failed, no snapid for jobid=3339913(JobManager_i.cpp:1961)

11/18/09 02:05:33.177 [Debug] NB 51216 nbjm 117 PID:12294 TID:5 File ID:117 [No context] 1 [CallbackQueue::handle_input] sending BPJobdExpireJob jobid=3339913, secondary jobid count=2(CallbackQueue.cpp:1391)

11/18/09 02:05:41.841 [Debug] NB 51216 nbjm 117 PID:12294 TID:1 File ID:117 [No context] 1 [JobMapper::startDelayedJob] (75ef58) job not found in the delayed job map, jobid=3339913(JobMapper.cpp:1026)

...........................................................

11/18/09 06:47:53.681 [Debug] NB 51216 nbjm 117 PID:12294 TID:7 File ID:117 [No context] 1 [CallbackQueue::queueRequest] queueing PEMretryJob : jobid=3339913, parentid=3339913, retryType=2 -- retry count=-1(CallbackQueue.cpp:1268)

11/18/09 06:47:58.031 [Debug] NB 51216 nbjm 117 PID:12294 TID:1 File ID:117 [No context] 1 [CallbackQueue::handle_input] sending PEMretryJob : jobid=3339913(CallbackQueue.cpp:1391)

11/18/09 06:47:58.038 [Debug] NB 51216 nbjm 117 PID:12294 TID:4 File ID:117 [No context] 1 [JobManager_i::getJobById] (968098) job not found in map, jobid=3339913(JobManager_i.cpp:791)
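To see how often entries like these repeat, the vxlogview output can be saved to a file and tallied per job ID. The following is only a minimal sketch: the function name `tally_entries` and the regular expression are illustrative assumptions based on the line layout shown above, not part of any NetBackup tooling.

```python
import re
from collections import Counter

# Assumed layout (taken from the excerpts above): a bracketed tag such as
# [JobManager_i::getJobById] or [JobAttr:FindJobFile], followed later on the
# same line by "jobid=<digits>". Tags like [Debug] or [No context] contain no
# colon-separated name and are therefore skipped.
ENTRY_RE = re.compile(r"\[(?P<func>\w+:{1,2}\w+)\].*?jobid=(?P<jobid>\d+)")


def tally_entries(lines):
    """Count log entries per (jobid, function tag) pair."""
    counts = Counter()
    for line in lines:
        match = ENTRY_RE.search(line)
        if match:
            counts[(match.group("jobid"), match.group("func"))] += 1
    return counts


# Sample lines copied from the excerpts in this post.
SAMPLE = [
    "11/18/09 13:26:16.543 [Debug] NB 51216 nbjm 117 PID:12294 TID:8 File ID:117 "
    "[No context] 1 [JobAttr:FindJobFile] (fb67dfc8) cannot find param file for "
    "jobid=3339913(JobParam.cpp:540)",
    "11/18/09 13:26:16.549 [Debug] NB 51216 nbjm 117 PID:12294 TID:1 File ID:117 "
    "[No context] 1 [JobManager_i::getJobById] (968098) job not found in map, "
    "jobid=3339913(JobManager_i.cpp:791)",
    "11/18/09 06:47:58.038 [Debug] NB 51216 nbjm 117 PID:12294 TID:4 File ID:117 "
    "[No context] 1 [JobManager_i::getJobById] (968098) job not found in map, "
    "jobid=3339913(JobManager_i.cpp:791)",
]

for (jobid, func), count in tally_entries(SAMPLE).most_common():
    print(jobid, func, count)
```

Run against a full 24-hour log (for example by passing an open file object instead of `SAMPLE`), a job ID with an unusually high count for a single tag would point at the job nbjm keeps retrying.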

Can anyone advise what has happened here? Restarting the NBU services dropped the nbjm CPU utilisation back to a normal level.

TIA

Mike_Gavrilov · ‎11-19-2009

It would be better to upgrade to 6.5.4 first.

deamon · ‎11-19-2009

Is this a known issue/bug in 6.5.2A?
I couldn't see a technote relating to this.

Karthikeyan_Sun · ‎11-20-2009

There is a known bug in 6.5.3:

------------------------------------------------

BugID: 1455909
Product: NetBackup_6.5.3
Problem description:
nbjm.exe Application Faults under heavy load.
Fix: 6.5.4

------------------------------------------------

After applying NetBackup Release Update 6.5.3, nbjm.exe terminates unexpectedly.

See http://seer.entsupport.symantec.com/docs/316270.htm

SOLUTION

============================

Apply 6.5.4 to the master

NB_6.5.4.winnt.x86.exe
http://support.veritas.com/docs/326385

(clients do not need to be patched)

Mike_Gavrilov · ‎11-21-2009

I can't find a technote, but upgrading is still the fastest way to rule out possible known issues. If you are seeing a memory leak, it can only be fixed by a binary upgrade, and you'll end up with an up-to-date NetBackup.
