02-25-2013 12:38 PM
Dear Team
As I am new to NBU, I need help with an issue: a policy is stuck and has not progressed for more than 15 hours. It stays continuously in the ACTIVE state.
HARDWARE SOLARIS
VERSION NetBackup 7.0.1
RELEASEDATE Thu Jul 08 00:13:47 CDT 2010
BUILDNUMBER 2010070
SunOS 5.10 Generic_147440-12 sun4u sparc SUNW,SPARC-Enterprise
The policy runs a BMR backup for a Solaris x86 server.
172216 Active Image Cleanup Mon Feb 25 10:21:18 CST 2013 15109 1 Mounting 0 8798 root Mon Feb 25 10:21:18 CST 2013 15109 txslep18
172215 Active Backup Sun Feb 24 23:07:38 CST 2013 txslep43-backup_FILES Weekly_Full txslep43-backup txslep18 55529 EP18-DSU12 1 Connecting 1 /etc/dfs/sharetab 0 20304 root 172212 Sun Feb 24 23:07:40 CST 2013 55527 Standard txslep18 0
172214 Active Backup Sun Feb 24 23:07:38 CST 2013 txslep43-backup_FILES Weekly_Full txslep43-backup txslep18 55529 EP18-DSU12 1 Connecting 583 /devices/xsvc/ 0 20298 root 172212 Sun Feb 24 23:07:39 CST 2013 55528 Standard txslep18 0
172213 Active Backup Sun Feb 24 23:07:38 CST 2013 txslep43-backup_FILES Weekly_Full txslep43-backup txslep18 55529 EP18-DSU12 1 Connecting 1 / 0 20291 root 172212 Sun Feb 24 23:07:38 CST 2013 55529 Standard txslep18 0
172212 Active Backup Sun Feb 24 23:07:19 CST 2013 txslep43-backup_FILES - txslep43-backup txslep18 55548 EP18-DSU12 1 0 root 172212 Sun Feb 24 23:07:19 CST 2013 55548 Standard txslep18 0
It is stuck at this point:
Feb 24, 2013 11:07:19 PM - requesting resource EP18-DSU12
Feb 24, 2013 11:07:19 PM - requesting resource txslep18.NBU_CLIENT.MAXJOBS.txslep43-backup
Feb 24, 2013 11:07:19 PM - requesting resource txslep18.NBU_POLICY.MAXJOBS.txslep43-backup_FILES
Feb 24, 2013 11:07:19 PM - granted resource txslep18.NBU_CLIENT.MAXJOBS.txslep43-backup
Feb 24, 2013 11:07:19 PM - granted resource txslep18.NBU_POLICY.MAXJOBS.txslep43-backup_FILES
Feb 24, 2013 11:07:19 PM - granted resource MediaID=@aaaab;Path=/ep18-dsu12;MediaServer=txslep18
Feb 24, 2013 11:07:19 PM - granted resource EP18-DSU12
Feb 24, 2013 11:07:19 PM - estimated 16470259 kbytes needed
Feb 24, 2013 11:07:19 PM - begin Parent Job
Feb 24, 2013 11:07:19 PM - begin Stream Discovery: Start Notify Script
Feb 24, 2013 11:07:19 PM - started process RUNCMD (pid=20201)
Feb 24, 2013 11:07:20 PM - ended process 0 (pid=20201)
Operation Status: 0
Feb 24, 2013 11:07:20 PM - end Stream Discovery: Start Notify Script; elapsed time 0:00:01
Feb 24, 2013 11:07:20 PM - begin Stream Discovery: Stream Discovery
Feb 24, 2013 11:07:22 PM - collecting BMR information
Feb 24, 2013 11:07:22 PM - connecting
Feb 24, 2013 11:07:22 PM - connected; connect time: 0:00:00
Feb 24, 2013 11:07:22 PM - transfering BMR information to the master server
Feb 24, 2013 11:07:22 PM - connecting
Feb 24, 2013 11:07:22 PM - connected; connect time: 0:00:00
Feb 24, 2013 11:07:20 PM - started process bpmount (pid=25647)
Operation Status: 0
Feb 24, 2013 11:07:20 PM - end Stream Discovery: Stream Discovery; elapsed time 0:00:00
Feb 24, 2013 11:07:20 PM - begin Stream Discovery: Bare Metal Restore Save
Feb 24, 2013 11:07:21 PM - started process bpbrm (pid=20263)
Feb 24, 2013 11:07:38 PM - BMR information transfer successful
Feb 24, 2013 11:07:38 PM - end writing
Operation Status: 0
Feb 24, 2013 11:07:38 PM - end Stream Discovery: Bare Metal Restore Save; elapsed time 0:00:18
Feb 24, 2013 11:07:38 PM - begin Stream Discovery: Policy Execution Manager Preprocessed
What should I do to check what is actually going on in the background, or whether these processes are stale?
Please tell me which logs I should check, or how to clear all of this and start fresh.
Note: all of these policy backups were running fine and suddenly stopped a few days ago.
///K
02-25-2013 06:30 PM
Guys,
Any ideas?
How can I kill the processes for these jobs?
# /usr/openv/netbackup/bin/bpps -a
This gives me hundreds of processes, and when I stop NetBackup and try to remove the jobs, I get the following error:
------------
# /usr/openv/netbackup/bin/bpjobd -r 1195
Removing entry for the jobdid 1195
Requested ( for cleaning ) job ( 1195 ) wasn't found
Can anyone guide me, please?
02-26-2013 02:03 AM
The job you have posted is the parent job, which deals with the BMR data collection.
The actual backup runs as a separate child job: one or more of the other jobs you list for that client.
The parent job will not complete until that child job has completed.
Once you have identified the related jobs, each will have a job PID associated with it. This is the bpbrm process for the job, running on the media server.
If you cannot cancel the job, then kill that PID, but first investigate what is wrong with the child job (which is the actual backup).
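To pick the child jobs and their PIDs out of a job listing like the one posted earlier in this thread, something along these lines might help. It is only a rough sketch: find_child_jobs is a made-up helper name, not a NetBackup command, and it assumes every child row has the same layout as the rows pasted in the thread (job ID first, and somewhere later "<pid> root <parent job ID>"). If your listing uses different columns it will simply print nothing. On Solaris 10 the default /usr/bin/awk may not accept -v, so nawk or /usr/xpg4/bin/awk is probably needed there.

```shell
#!/bin/sh
# find_child_jobs: hypothetical helper, not part of NetBackup.
# Reads a saved job listing on stdin and prints "<child job ID> <pid>"
# for every row whose parent job ID matches the argument.
# Assumption: child rows contain "... <pid> root <parent job ID> ...",
# as in the listing pasted in this thread.
find_child_jobs() {
    parent="$1"
    awk -v parent="$parent" '
        # Skip the parent row itself (its first field is the parent ID).
        $1 != parent {
            for (i = 2; i < NF; i++) {
                # The field before "root" is the PID, the one after is the parent ID.
                if ($i == "root" && $(i + 1) == parent) {
                    print $1, $(i - 1)
                    break
                }
            }
        }'
}
```

Usage would be something like `find_child_jobs 172212 < jobs.txt`, where jobs.txt is just a placeholder name for a file holding the saved listing. For the rows posted above this gives 172215 20304, 172214 20298 and 172213 20291, which are the jobs and PIDs to look at on the media server.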
02-26-2013 10:26 AM
The problem is that whenever I cancel the job and restart it, it gets stuck at:
Feb 24, 2013 11:07:20 PM - end Stream Discovery: Stream Discovery; elapsed time 0:00:00
Feb 24, 2013 11:07:20 PM - begin Stream Discovery: Bare Metal Restore Save
Feb 24, 2013 11:07:21 PM - started process bpbrm (pid=20263)
Feb 24, 2013 11:07:38 PM - BMR information transfer successful
Feb 24, 2013 11:07:38 PM - end writing
Operation Status: 0
Feb 24, 2013 11:07:38 PM - end Stream Discovery: Bare Metal Restore Save; elapsed time 0:00:18
Feb 24, 2013 11:07:38 PM - begin Stream Discovery: Policy Execution Manager Preprocessed
After this it shows no further progress and no further Operation Status.
When I check with /usr/openv/netbackup/bin/bpps -a,
there is a huge list of processes, some dating back to Feb 21 as well as the latest ones. I can't remove them because I get this error:
/usr/openv/netbackup/bin/bpjobd -r 1195
Removing entry for the jobdid 1195
Requested ( for cleaning ) job ( 1195 ) wasn't found
How can I clean up these hanging processes and start fresh?
I tried the following:
root@obms> /opt/openv/netbackup/bin/bp.kill_all
Looking for NetBackup processes that need to be terminated.
There may be backups and/or restores active.
Do you still want to terminate all processes? [y,n] (n) y
Killing bptm processes...
Killing bpdm processes...
Waiting for active bptm processes to terminate
Waiting for active bpdm processes to terminate
Waiting for active bptm processes to terminate
Waiting for active bpdm processes to terminate
Waiting for active bptm processes to terminate
Waiting for active bpdm processes to terminate
Waiting for active bptm processes to terminate
Waiting for active bpdm processes to terminate
Waiting for active bptm processes to terminate
Waiting for active bpdm processes to terminate
Waiting for active bptm processes to terminate
Waiting for active bpdm processes to terminate
Waiting for active bptm processes to terminate
Waiting for active bpdm processes to terminate
(I verified all the processes and they are still there, almost 266 of them.)
I need guidance; this is urgent for me now. Or should I restart the whole server to get rid of these processes?
regards
///K
02-26-2013 02:18 PM
I have had to go to the media server and kill the processes for an active job that was hung and could not be cancelled from the master server.
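If you do end up killing a process by hand on the media server, a cautious approach is to send TERM first and check whether the process actually went away before trying anything harsher. The following is only a sketch under that assumption: stop_pid is a hypothetical helper name (not a NetBackup tool), and the PID you pass in would be the bpbrm PID shown against the hung job.

```shell
#!/bin/sh
# stop_pid: hypothetical helper, not part of NetBackup.
# Sends TERM to one PID, waits up to N seconds (default 10),
# and reports whether the process has gone away.
stop_pid() {
    pid="$1"
    wait_secs="${2:-10}"

    # kill -0 sends no signal; it only checks that the PID exists.
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "pid $pid is not running"
        return 0
    fi

    kill -TERM "$pid" 2>/dev/null

    i=0
    while [ "$i" -lt "$wait_secs" ]; do
        if ! kill -0 "$pid" 2>/dev/null; then
            echo "pid $pid terminated"
            return 0
        fi
        sleep 1
        i=$((i + 1))
    done

    echo "pid $pid still running after ${wait_secs}s"
    return 1
}
```

For example, `stop_pid 20304 30` would give PID 20304 up to 30 seconds to exit. A non-zero return means the process ignored TERM, and that is the point to decide whether a harder kill is really warranted.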
02-27-2013 01:26 AM
As I said before, the job you are looking at is a parent job.
It will make no further progress until its child job(s) have completed.
It is the child jobs that you need to locate, so look for what else is running for the same client. Those jobs will have a PID, which corresponds to the bpbrm process ID on the media server running the job.
In summary, the job you list is not hanging; it is waiting for the completion notification from another job for the same client. Those are the jobs you need to investigate.
05-01-2017 08:31 AM