NDMP backups still active from a little under a month ago

gkman
Level 5

Greetings.

We have an EMC NAS storage system (Isilon), and we use NetBackup with NDMP policies to back up selected directories and shares.

Today is 20/3/16 and the Activity Monitor is showing active jobs from 26/02/2016. Looking at the detailed status I see:

26/02/2016 02:00:01 - Info nbjm(pid=4472) starting backup job (jobid=983051) for client ism, policy HomeDirectory, schedule Weekly  
26/02/2016 02:00:01 - Info nbjm(pid=4472) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=983051, request id:{05861C9E-820B-4665-9CB8-39B47C6DD49D})  
26/02/2016 02:00:01 - requesting resource stu_disk_media-app
26/02/2016 02:00:01 - requesting resource puredisk.NBU_CLIENT.MAXJOBS.ism
26/02/2016 02:00:01 - requesting resource puredisk.NBU_POLICY.MAXJOBS.HomeDirectory
26/02/2016 02:00:02 - granted resource puredisk.NBU_CLIENT.MAXJOBS.ism
26/02/2016 02:00:02 - granted resource puredisk.NBU_POLICY.MAXJOBS.HomeDirectory
26/02/2016 02:00:02 - granted resource MediaID=@aaaaY;DiskVolume=PureDiskVolume;DiskPool=dp_disk_media-app;Path=PureDiskVolume;StorageServer=media-app;MediaServer=media-app
26/02/2016 02:00:02 - granted resource stu_disk_media-app
26/02/2016 02:00:10 - estimated 134317882 Kbytes needed
26/02/2016 02:00:10 - Info nbjm(pid=4472) started backup (backupid=ism_1456444810) job for client ism, policy HomeDirectory, schedule Weekly on storage unit stu_disk_media-app
26/02/2016 02:00:12 - started process bpbrm (112727)
26/02/2016 02:00:14 - Info bpbrm(pid=112727) ism is the host to backup data from     
26/02/2016 02:00:14 - Info bpbrm(pid=112727) reading file list for client        
26/02/2016 02:00:14 - connecting
26/02/2016 02:00:15 - Info bpbrm(pid=112727) starting ndmpagent on client         
26/02/2016 02:00:15 - connected; connect time: 0:00:01
26/02/2016 02:00:16 - Info ndmpagent(pid=112789) Backup started           
26/02/2016 02:00:16 - Info bpbrm(pid=112727) bptm pid: 112808          
26/02/2016 02:00:16 - Info bptm(pid=112808) start            
26/02/2016 02:00:21 - Info bptm(pid=112808) using 30 data buffers         
26/02/2016 02:00:21 - Info bptm(pid=112808) using 262144 data buffer size        
26/02/2016 02:00:23 - Info bptm(pid=112808) start backup           
26/02/2016 02:00:31 - begin writing
26/02/2016 02:00:34 - Info ndmpagent(pid=112789) 0 entries sent to bpdbm        
26/02/2016 02:03:32 - Info ndmpagent(pid=112789) 5000 entries sent to bpdbm        
26/02/2016 02:06:37 - Info ndmpagent(pid=112789) 10000 entries sent to bpdbm        
26/02/2016 02:08:35 - Info ndmpagent(pid=112789) 15000 entries sent to bpdbm        
26/02/2016 02:41:16 - Error bpbrm(pid=112727) socket read failed: errno = 62 - Timer expired    
26/02/2016 03:11:17 - Error bpbrm(pid=112727) socket read failed: errno = 62 - Timer expired    
26/02/2016 03:11:18 - Error bptm(pid=112808) media manager exiting because bpbrm is no longer active    

I tried looking for the bpbrm logs (on both the media server and the master) but there are none that go back that far (26/02/2016).

Should I stop the backup? (I am still fairly new to this.)

This is the first time this has happened to me. How can I determine the cause and avoid having it happen again?

5 REPLIES

Marianne
Moderator
Partner    VIP    Accredited Certified
Yes. The backup already failed on 26 Feb. It is very unusual that it is still showing as Active in Activity Monitor; it should have failed with a network-related status code. What happened with other backups for this policy in the meantime?

gkman
Level 5

The policy hasn't started any new backup since 26 Feb.

Marianne
Moderator
Partner    VIP    Accredited Certified
Probably because this backup is still showing Active. Please kill it and try a manual backup. Keep an eye on Job Details and ensure bpbrm and bptm log folders exist on the media server.
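
Before rerunning, it may help to confirm that the log folders mentioned above are really there. The snippet below is only a sketch: it assumes a UNIX/Linux media server with the legacy logs under /usr/openv/netbackup/logs (adjust the path for your install), and `missing_log_dirs` is a made-up helper name, not a NetBackup tool.

```python
import os

# Assumption: default legacy log location on a UNIX/Linux media server.
DEFAULT_LOG_BASE = "/usr/openv/netbackup/logs"


def missing_log_dirs(base_dir, processes=("bpbrm", "bptm")):
    """Return the process names that have no log folder under base_dir."""
    return [p for p in processes
            if not os.path.isdir(os.path.join(base_dir, p))]


if __name__ == "__main__":
    # Report every expected folder that does not exist yet.
    for name in missing_log_dirs(DEFAULT_LOG_BASE):
        print("missing log folder:", os.path.join(DEFAULT_LOG_BASE, name))
```

If nothing is printed, both folders exist and the next run should leave bpbrm and bptm logs behind.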

Nicolai
Moderator
Partner    VIP   

This is normal behavior: NetBackup will not start a new backup if it thinks one is already running.

NetBackup OpsCenter has a "long running backup jobs" report. You should have OpsCenter mail that report to you every day so you can spot jobs like this one.

There can be a million reasons why the backup hung. Unless you have debug logging set up, the likelihood of finding a root cause is very small.
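
To illustrate what a long-running-jobs check boils down to, here is a rough sketch. It does not talk to NetBackup or OpsCenter at all: the job list is typed in by hand (the first entry is the job from this thread, the other two are hypothetical), the time of day for "now" is assumed, and the 24-hour threshold is just an example value.

```python
from datetime import datetime, timedelta


def long_running(jobs, now, max_age=timedelta(hours=24)):
    """Return (jobid, age) for jobs that are still Active after max_age."""
    flagged = []
    for jobid, started, state in jobs:
        age = now - started
        if state == "Active" and age > max_age:
            flagged.append((jobid, age))
    return flagged


jobs = [
    (983051, datetime(2016, 2, 26, 2, 0, 1), "Active"),   # job from this thread
    (990001, datetime(2016, 3, 19, 23, 0, 0), "Active"),  # hypothetical recent job
    (990002, datetime(2016, 3, 1, 2, 0, 0), "Done"),      # hypothetical finished job
]
now = datetime(2016, 3, 20, 12, 0, 0)  # date from the thread, time of day assumed

for jobid, age in long_running(jobs, now):
    print(f"job {jobid} has been active for {age.days} days")
```

Only the first job is flagged; the recent job is under the threshold and the finished one is ignored.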

watsons
Level 6

The job details show "socket read failed: errno = 62 - Timer expired".

Increase the "Client read timeout" on the media server media-app to at least 3600 sec, and retry the backup.
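
To see how long the job sat silent before each error, the timestamps in the job details can be compared. This is only a sketch that works on lines pasted from the detailed status (dd/mm/yyyy format, as in the original post); the function names are made up for the example.

```python
import re
from datetime import datetime

# Matches "26/02/2016 02:41:16 - <message>" as shown in the job details.
LINE_RE = re.compile(r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) - (.*)$")


def parse(lines):
    """Yield (timestamp, message) for each job-detail line that matches."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            stamp = datetime.strptime(m.group(1), "%d/%m/%Y %H:%M:%S")
            yield stamp, m.group(2)


def gaps_before_errors(lines):
    """Return (seconds since the previous entry, message) for each Error line."""
    result = []
    previous = None
    for stamp, message in parse(lines):
        if message.startswith("Error") and previous is not None:
            result.append((int((stamp - previous).total_seconds()), message))
        previous = stamp
    return result


sample = [
    "26/02/2016 02:08:35 - Info ndmpagent(pid=112789) 15000 entries sent to bpdbm",
    "26/02/2016 02:41:16 - Error bpbrm(pid=112727) socket read failed: errno = 62 - Timer expired",
    "26/02/2016 03:11:17 - Error bpbrm(pid=112727) socket read failed: errno = 62 - Timer expired",
    "26/02/2016 03:11:18 - Error bptm(pid=112808) media manager exiting because bpbrm is no longer active",
]

for seconds, message in gaps_before_errors(sample):
    print(f"{seconds:>5} s of silence before: {message}")
```

For the lines above, the first two errors each follow roughly half an hour of silence. The log alone does not show which timeout value is configured, so treat this as a hint rather than proof.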