Netbackup Policy Job not progressing and stuck

khaniqshahid
Level 2

Dear Team

As I am new to NBU, I have an issue where a policy job is stuck and has not progressed for more than 15 hours. It is continuously in the ACTIVE state.

HARDWARE SOLARIS
VERSION NetBackup 7.0.1
RELEASEDATE Thu Jul 08 00:13:47 CDT 2010
BUILDNUMBER 2010070

SunOS  5.10 Generic_147440-12 sun4u sparc SUNW,SPARC-Enterprise

My policy runs a BMR backup for a Solaris x86 server.

172216    Active    Image Cleanup        Mon Feb 25 10:21:18 CST 2013                            15109            1    Mounting                0    8798    root                Mon Feb 25 10:21:18 CST 2013    15109                                txslep18        
172215    Active    Backup        Sun Feb 24 23:07:38 CST 2013            txslep43-backup_FILES    Weekly_Full    txslep43-backup    txslep18    55529    EP18-DSU12        1    Connecting        1    /etc/dfs/sharetab    0    20304    root        172212        Sun Feb 24 23:07:40 CST 2013    55527                        Standard        txslep18    0    
172214    Active    Backup        Sun Feb 24 23:07:38 CST 2013            txslep43-backup_FILES    Weekly_Full    txslep43-backup    txslep18    55529    EP18-DSU12        1    Connecting        583    /devices/xsvc/    0    20298    root        172212        Sun Feb 24 23:07:39 CST 2013    55528                        Standard        txslep18    0    
172213    Active    Backup        Sun Feb 24 23:07:38 CST 2013            txslep43-backup_FILES    Weekly_Full    txslep43-backup    txslep18    55529    EP18-DSU12        1    Connecting        1    /    0    20291    root        172212        Sun Feb 24 23:07:38 CST 2013    55529                        Standard        txslep18    0    
172212    Active    Backup        Sun Feb 24 23:07:19 CST 2013            txslep43-backup_FILES    -    txslep43-backup    txslep18    55548    EP18-DSU12        1                    0        root        172212        Sun Feb 24 23:07:19 CST 2013    55548                        Standard        txslep18    0  

 

It is stuck at this point:

Feb 24, 2013 11:07:19 PM - requesting resource EP18-DSU12
Feb 24, 2013 11:07:19 PM - requesting resource txslep18.NBU_CLIENT.MAXJOBS.txslep43-backup
Feb 24, 2013 11:07:19 PM - requesting resource txslep18.NBU_POLICY.MAXJOBS.txslep43-backup_FILES
Feb 24, 2013 11:07:19 PM - granted resource  txslep18.NBU_CLIENT.MAXJOBS.txslep43-backup
Feb 24, 2013 11:07:19 PM - granted resource  txslep18.NBU_POLICY.MAXJOBS.txslep43-backup_FILES
Feb 24, 2013 11:07:19 PM - granted resource  MediaID=@aaaab;Path=/ep18-dsu12;MediaServer=txslep18
Feb 24, 2013 11:07:19 PM - granted resource  EP18-DSU12
Feb 24, 2013 11:07:19 PM - estimated 16470259 kbytes needed
Feb 24, 2013 11:07:19 PM - begin Parent Job
Feb 24, 2013 11:07:19 PM - begin Stream Discovery: Start Notify Script
Feb 24, 2013 11:07:19 PM - started process RUNCMD (pid=20201)
Feb 24, 2013 11:07:20 PM - ended process 0 (pid=20201)
Operation Status: 0
Feb 24, 2013 11:07:20 PM - end Stream Discovery: Start Notify Script; elapsed time 0:00:01
Feb 24, 2013 11:07:20 PM - begin Stream Discovery: Stream Discovery
Feb 24, 2013 11:07:22 PM - collecting BMR information
Feb 24, 2013 11:07:22 PM - connecting
Feb 24, 2013 11:07:22 PM - connected; connect time: 0:00:00
Feb 24, 2013 11:07:22 PM - transfering BMR information to the master server
Feb 24, 2013 11:07:22 PM - connecting
Feb 24, 2013 11:07:22 PM - connected; connect time: 0:00:00
Feb 24, 2013 11:07:20 PM - started process bpmount (pid=25647)
Operation Status: 0
Feb 24, 2013 11:07:20 PM - end Stream Discovery: Stream Discovery; elapsed time 0:00:00
Feb 24, 2013 11:07:20 PM - begin Stream Discovery: Bare Metal Restore Save
Feb 24, 2013 11:07:21 PM - started process bpbrm (pid=20263)
Feb 24, 2013 11:07:38 PM - BMR information transfer successful
Feb 24, 2013 11:07:38 PM - end writing
Operation Status: 0
Feb 24, 2013 11:07:38 PM - end Stream Discovery: Bare Metal Restore Save; elapsed time 0:00:18
Feb 24, 2013 11:07:38 PM - begin Stream Discovery: Policy Execution Manager Preprocessed

 

What should I do to check what is actually going on in the background, or whether these processes are stale?

Please tell me which logs I should check further, or how to clear all of these and restart fresh.

Note: all of these policy backups were running fine and suddenly stopped a few days ago.

 

///K
 

 

6 REPLIES

khaniqshahid
Level 2

Guys,

Any ideas?

How can I kill the job processes?

# /usr/openv/netbackup/bin/bpps -a

This lists hundreds of processes, and when I stop NetBackup and try to kill the jobs, I get the following error:

------------
#  /usr/openv/netbackup/bin/bpjobd -r 1195
Removing entry for the jobdid 1195
Requested ( for cleaning ) job ( 1195 ) wasn't found

Can anyone guide me, please?

Mark_Solutions
Level 6
Partner Accredited Certified

The job you have posted is the parent job which deals with the BMR data collection.

The actual backup job is separate: it is one or both of the other two jobs you list for that client.

Until that child job has completed the parent job will not complete.

When you have identified the related jobs, each will have a job PID associated with it. This is the bpbrm process for the job, which is running on the media server.

If you cannot cancel the job, then kill that PID, but first you should investigate what is wrong with the child job (which is the actual backup).
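As a rough aid for finding that PID, the sketch below pulls the "started process ... (pid=...)" entries out of pasted Job Details text. It is only an illustration: the function names `started_processes` and `pids_for` are made up for this example, and the sample lines are taken from the parent job posted above, so in practice you would paste the Job Details of the child job instead.

```python
import re

# Matches Job Details lines such as:
#   Feb 24, 2013 11:07:21 PM - started process bpbrm (pid=20263)
STARTED = re.compile(r"started process (\S+) \(pid=(\d+)\)")


def started_processes(job_details):
    """Return (process_name, pid) pairs in the order they appear in the text."""
    return [(m.group(1), int(m.group(2))) for m in STARTED.finditer(job_details)]


def pids_for(job_details, name="bpbrm"):
    """Return the PIDs logged for one process name (hypothetical helper)."""
    return [pid for proc, pid in started_processes(job_details) if proc == name]


# Sample lines copied from the Job Details posted in this thread.
sample = """\
Feb 24, 2013 11:07:19 PM - started process RUNCMD (pid=20201)
Feb 24, 2013 11:07:20 PM - started process bpmount (pid=25647)
Feb 24, 2013 11:07:21 PM - started process bpbrm (pid=20263)
"""

if __name__ == "__main__":
    print(pids_for(sample))
```

The PID it reports can then be compared against the bpps -a output on the media server before deciding whether anything needs to be killed.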

khaniqshahid
Level 2

The problem is that whenever I cancel the job and restart it, it gets stuck at:

Feb 24, 2013 11:07:20 PM - end Stream Discovery: Stream Discovery; elapsed time 0:00:00
Feb 24, 2013 11:07:20 PM - begin Stream Discovery: Bare Metal Restore Save
Feb 24, 2013 11:07:21 PM - started process bpbrm (pid=20263)
Feb 24, 2013 11:07:38 PM - BMR information transfer successful
Feb 24, 2013 11:07:38 PM - end writing
Operation Status: 0
Feb 24, 2013 11:07:38 PM - end Stream Discovery: Bare Metal Restore Save; elapsed time 0:00:18
Feb 24, 2013 11:07:38 PM - begin Stream Discovery: Policy Execution Manager Preprocessed

After this it shows no further progress and no further Operation Status.

When I check with /usr/openv/netbackup/bin/bpps -a

there is a huge list of processes, some dating back to Feb 21 as well as the latest ones. I can't kill them because it gives an error:

/usr/openv/netbackup/bin/bpjobd -r 1195
Removing entry for the jobdid 1195
Requested ( for cleaning ) job ( 1195 ) wasn't found

 

How can I clean up these hanging processes and start fresh?

I tried the following:

  • /opt/openv/netbackup/bin/goodies/netbackup stop
  • /opt/openv/netbackup/bin/bp.kill_all
  • I got the following output:

                  root@obms> /opt/openv/netbackup/bin/bp.kill_all

Looking for NetBackup processes that need to be terminated.

There may be backups and/or restores active.
Do you still want to terminate all processes? [y,n] (n) y
Killing bptm processes...
Killing bpdm processes...
Waiting for active bptm processes to terminate
Waiting for active bpdm processes to terminate
Waiting for active bptm processes to terminate
Waiting for active bpdm processes to terminate
Waiting for active bptm processes to terminate
Waiting for active bpdm processes to terminate
Waiting for active bptm processes to terminate
Waiting for active bpdm processes to terminate
Waiting for active bptm processes to terminate
Waiting for active bpdm processes to terminate
Waiting for active bptm processes to terminate
Waiting for active bpdm processes to terminate
Waiting for active bptm processes to terminate
Waiting for active bpdm processes to terminate

  • bpps -a

(I verified all the processes and they are still there, almost 266 of them.)
 

  • rm /usr/openv/netbackup/db/jobs/bpjobd.act.db
  • netbackup start

I need guidance; this is urgent for me now. Or should I restart the whole server to remove these processes?

regards

///K

Stumpr2
Level 6

I have had to go to the media server and kill the processes for an active job that was hung and could not be cancelled on the master server.

Mark_Solutions
Level 6
Partner Accredited Certified

As I said before, the job you are looking at is a parent job.

It will have no further progress until its child job(s) have completed.

It is the child jobs that you need to locate - so look for what else is running for the same client - those jobs will have a PID which will relate to the bpbrm process ID on the media server running the job.

But in summary, the job you list is not hanging. It is waiting for the completion notification from another job for the same client, and it is those jobs that you need to investigate.
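To make the parent/child relationship concrete, here is a small sketch that scans pasted job listing rows and reports which jobs reference a given parent job ID. It is a heuristic, not a parser for the real listing format: it assumes the first whitespace-separated token of each row is the job ID and treats any other token equal to the parent ID as the parent reference, so a PID that happens to equal the parent ID would be a false match. The function name `child_jobs` is hypothetical, and the rows are shortened from the listing in the original post.

```python
def child_jobs(rows, parent_id):
    """Return job IDs of rows that mention parent_id in a field other than the first.

    Heuristic only: assumes the first token of each row is the job ID.
    """
    children = []
    for row in rows:
        tokens = row.split()
        if not tokens:
            continue  # skip blank lines
        job_id = tokens[0]
        if job_id != parent_id and parent_id in tokens[1:]:
            children.append(job_id)
    return children


# Rows shortened from the job listing posted in this thread.
rows = [
    "172216 Active Image Cleanup 15109 1 Mounting 0 8798 root 15109 txslep18",
    "172215 Active Backup txslep43-backup_FILES Weekly_Full txslep43-backup 20304 root 172212",
    "172214 Active Backup txslep43-backup_FILES Weekly_Full txslep43-backup 20298 root 172212",
    "172213 Active Backup txslep43-backup_FILES Weekly_Full txslep43-backup 20291 root 172212",
    "172212 Active Backup txslep43-backup_FILES - txslep43-backup root 172212",
]

if __name__ == "__main__":
    print(child_jobs(rows, "172212"))
```

For the listing in this thread, the jobs it reports are the ones stuck in the Connecting state, which is where the investigation should start.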
