03-20-2016 06:14 AM
Greetings.
We have an EMC Isilon NAS, and we use NetBackup with NDMP policies to back up selected directories and shares.
Today is 20/03/2016 and the Activity Monitor is still showing active jobs from 26/02/2016. Looking at the detailed status, I see:
26/02/2016 02:00:01 - Info nbjm(pid=4472) starting backup job (jobid=983051) for client ism, policy HomeDirectory, schedule Weekly
26/02/2016 02:00:01 - Info nbjm(pid=4472) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=983051, request id:{05861C9E-820B-4665-9CB8-39B47C6DD49D})
26/02/2016 02:00:01 - requesting resource stu_disk_media-app
26/02/2016 02:00:01 - requesting resource puredisk.NBU_CLIENT.MAXJOBS.ism
26/02/2016 02:00:01 - requesting resource puredisk.NBU_POLICY.MAXJOBS.HomeDirectory
26/02/2016 02:00:02 - granted resource puredisk.NBU_CLIENT.MAXJOBS.ism
26/02/2016 02:00:02 - granted resource puredisk.NBU_POLICY.MAXJOBS.HomeDirectory
26/02/2016 02:00:02 - granted resource MediaID=@aaaaY;DiskVolume=PureDiskVolume;DiskPool=dp_disk_media-app;Path=PureDiskVolume;StorageServer=media-app;MediaServer=media-app
26/02/2016 02:00:02 - granted resource stu_disk_media-app
26/02/2016 02:00:10 - estimated 134317882 Kbytes needed
26/02/2016 02:00:10 - Info nbjm(pid=4472) started backup (backupid=ism_1456444810) job for client ism, policy HomeDirectory, schedule Weekly on storage unit stu_disk_media-app
26/02/2016 02:00:12 - started process bpbrm (112727)
26/02/2016 02:00:14 - Info bpbrm(pid=112727) ism is the host to backup data from
26/02/2016 02:00:14 - Info bpbrm(pid=112727) reading file list for client
26/02/2016 02:00:14 - connecting
26/02/2016 02:00:15 - Info bpbrm(pid=112727) starting ndmpagent on client
26/02/2016 02:00:15 - connected; connect time: 0:00:01
26/02/2016 02:00:16 - Info ndmpagent(pid=112789) Backup started
26/02/2016 02:00:16 - Info bpbrm(pid=112727) bptm pid: 112808
26/02/2016 02:00:16 - Info bptm(pid=112808) start
26/02/2016 02:00:21 - Info bptm(pid=112808) using 30 data buffers
26/02/2016 02:00:21 - Info bptm(pid=112808) using 262144 data buffer size
26/02/2016 02:00:23 - Info bptm(pid=112808) start backup
26/02/2016 02:00:31 - begin writing
26/02/2016 02:00:34 - Info ndmpagent(pid=112789) 0 entries sent to bpdbm
26/02/2016 02:03:32 - Info ndmpagent(pid=112789) 5000 entries sent to bpdbm
26/02/2016 02:06:37 - Info ndmpagent(pid=112789) 10000 entries sent to bpdbm
26/02/2016 02:08:35 - Info ndmpagent(pid=112789) 15000 entries sent to bpdbm
26/02/2016 02:41:16 - Error bpbrm(pid=112727) socket read failed: errno = 62 - Timer expired
26/02/2016 03:11:17 - Error bpbrm(pid=112727) socket read failed: errno = 62 - Timer expired
26/02/2016 03:11:18 - Error bptm(pid=112808) media manager exiting because bpbrm is no longer active
I tried looking for the bpbrm logs (on both the media server and the master), but none of them go back as far as 26/02/2016.
Should I stop the backup? (I am still fairly new to this.)
This is the first time this has happened to me. How can I determine the cause and keep it from happening again?
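With the bpbrm logs gone, the detailed status is the only evidence left, so it can help to pull out the error lines and the longest silent gap between entries. The following is a minimal sketch that does this for the tail of the status text pasted above; the function names `parse` and `summarize` are illustrative and not part of NetBackup.

```python
from datetime import datetime

# Tail of the detailed status, copied from the job above.
LOG = """\
26/02/2016 02:08:35 - Info ndmpagent(pid=112789) 15000 entries sent to bpdbm
26/02/2016 02:41:16 - Error bpbrm(pid=112727) socket read failed: errno = 62 - Timer expired
26/02/2016 03:11:17 - Error bpbrm(pid=112727) socket read failed: errno = 62 - Timer expired
26/02/2016 03:11:18 - Error bptm(pid=112808) media manager exiting because bpbrm is no longer active
"""

def parse(line):
    """Split one status line into (timestamp, message) at the first ' - '."""
    stamp, _, message = line.partition(" - ")
    return datetime.strptime(stamp, "%d/%m/%Y %H:%M:%S"), message

def summarize(log):
    """Return the error entries and the longest gap (seconds) between lines."""
    entries = [parse(line) for line in log.splitlines() if line.strip()]
    errors = [(t, m) for t, m in entries if m.startswith("Error ")]
    gaps = [(b[0] - a[0]).total_seconds() for a, b in zip(entries, entries[1:])]
    return errors, max(gaps)

errors, longest_gap = summarize(LOG)
for when, message in errors:
    print(when, message)
print("longest gap between entries (s):", longest_gap)
```

On this excerpt the two "socket read failed" errors are 1801 seconds apart, which would be consistent with a 30-minute read timeout expiring, although the status text alone does not confirm which setting was involved.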
03-20-2016 08:31 AM
03-21-2016 12:58 AM
The policy hasn't started any new backups since 26 February.
03-21-2016 01:18 AM
03-21-2016 01:20 AM
This is normal behavior: NetBackup will not start a new backup if it thinks one is already running.
NetBackup OpsCenter has a "long running backup jobs" report. You should have OpsCenter mail that report to you every day so you can monitor for this.
There can be a million reasons why the backup hung; unless you have debug logging set up, the likelihood of finding a root cause is very small.
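The idea behind such a report is simple: flag any job that is still active well past a sensible run time. As a rough illustration only (this is not how OpsCenter implements it), the check might look like the sketch below. The job records, the 24-hour limit, and the second job ID are made-up assumptions; in practice the data would come from OpsCenter or from an export of the Activity Monitor.

```python
from datetime import datetime, timedelta

def long_running(jobs, now, limit=timedelta(hours=24)):
    """Return the IDs of jobs that are still active and older than `limit`."""
    return [
        job["jobid"]
        for job in jobs
        if job["state"] == "active" and now - job["started"] > limit
    ]

# Hypothetical job records; only job 983051 comes from this thread.
jobs = [
    {"jobid": 983051, "state": "active", "started": datetime(2016, 2, 26, 2, 0, 1)},
    {"jobid": 990001, "state": "done", "started": datetime(2016, 3, 19, 2, 0, 0)},
]

print(long_running(jobs, now=datetime(2016, 3, 20, 6, 14)))
```

A daily run of a check like this would have flagged the stuck job on 27 February instead of three weeks later.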
03-21-2016 08:17 PM
The detailed status shows "socket read failed: errno = 62 - Timer expired".
Increase the "Client read timeout" on the media server media-app to at least 3600 seconds, then retry the backup.
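Before changing the value, it is worth checking what is currently configured. On a UNIX or Linux media server this setting is normally stored as a `CLIENT_READ_TIMEOUT` entry in `bp.conf`; the sketch below assumes that layout and parses a sample string rather than the real file. The sample contents and the helper name `client_read_timeout` are illustrative only.

```python
def client_read_timeout(bp_conf_text):
    """Return CLIENT_READ_TIMEOUT in seconds, or None if it is not set."""
    for line in bp_conf_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "CLIENT_READ_TIMEOUT":
            return int(value.strip())
    return None

# Made-up bp.conf contents, for illustration only.
sample = "SERVER = master\nCLIENT_READ_TIMEOUT = 1800\n"

current = client_read_timeout(sample)
if current is None:
    print("CLIENT_READ_TIMEOUT is not set explicitly")
elif current < 3600:
    print(f"CLIENT_READ_TIMEOUT is {current} s; consider raising it to 3600 s")
else:
    print(f"CLIENT_READ_TIMEOUT is already {current} s")
```

The same value can also be viewed and changed in the media server's host properties in the administration console, which avoids editing the file by hand.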