12-04-2014 12:35 PM
I have an older NB (NetBackup) environment that backs up workstations. The jobs run, but after about 10 hours they get suspended. I need help finding out why and how to solve this.
The master server is the only server, and it backs up to a DataDomain appliance using OST.
Again, we are backing up workstations, the speeds are horrible, and the backups run for many hours, if not days. I suspect it could be a timeout setting, but I need help diagnosing it.
Thanks in advance!
12-04-2014 03:26 PM
NetBackup does not suspend backups based on the Start Window. It is possible that the backups are being suspended by a command or by an error. Please check the bperror output, the bpdbjobs debug log, and the admin debug log for any clues.
12-04-2014 09:22 PM
I agree with Yasuhisa.
There is nothing in NBU that will automatically suspend jobs - someone is doing it manually.
To troubleshoot slow backups, you have to find where the bottleneck is:
- Slow read from client filesystem
- Poor network transfer rate
- Underpowered media server or lack of tuning parameters.
This document explains how to test each of these components:
Symantec NetBackup™ 7.0 - 7.1 Backup Planning and Performance Tuning Guide
http://www.symantec.com/docs/DOC4483
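To compare a real job against the component tests in that guide, it helps to express the job as a throughput figure. The Python below is a minimal sketch, assuming you take the kilobytes written and the write time (H:MM:SS) from the job's Details tab; the function names and the 50 GB job size are hypothetical.

```python
def parse_hms(text: str) -> int:
    """Convert an 'H:MM:SS' duration, as shown in the job Details tab, to seconds."""
    hours, minutes, seconds = (int(part) for part in text.strip().split(":"))
    return hours * 3600 + minutes * 60 + seconds


def throughput_kb_per_sec(kilobytes: int, write_time: str) -> float:
    """Average rate in KB/s for a job that wrote `kilobytes` during `write_time`."""
    seconds = parse_hms(write_time)
    if seconds == 0:
        raise ValueError("write time is zero; cannot compute a rate")
    return kilobytes / seconds


# Hypothetical job: 50 GB (52,428,800 KB) written in 14:20:35.
rate = throughput_kb_per_sec(52_428_800, "14:20:35")
print(f"{rate:.1f} KB/s ({rate / 1024:.2f} MB/s)")
```

Comparing that figure with the rates you measure for the client filesystem read, the network, and the media server shows which one the job is actually limited by.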
12-05-2014 06:58 AM
Thanks for the replies.
To clarify: for the active jobs, Suspend is greyed out, so we cannot suspend them manually. I did notice that queued jobs can be suspended, though. Is there a timeout setting that will automatically suspend a queued job?
Also, this is a 6.5 environment. I know it is no longer supported, and I know they should upgrade, which I am pushing for!
12-05-2014 07:38 AM
Here is a log snippet from a suspended job:
06:59:09.089 [19787] <2> vnet_vnetd_service_socket: vnet_vnetd.c.2048: VN_REQUEST_SERVICE_SOCKET: 6 0x00000006
06:59:09.089 [19787] <2> vnet_vnetd_service_socket: vnet_vnetd.c.2062: service: bpjobd
06:59:09.147 [19787] <2> job_connect: SO_KEEPALIVE set on socket 5 for client dctcnbup001
06:59:09.149 [19787] <2> logconnections: BPJOBD CONNECT FROM 53.233.6.156.58974 TO 53.233.6.156.13724
06:59:09.154 [19787] <2> job_authenticate_connection: ignoring VxSS authentication check for now...
06:59:09.154 [19787] <2> job_connect: Connected to the host dctcnbup001 contype 11 jobid <0> socket <5>
06:59:09.154 [19787] <2> job_connect: Connected on port 58974
06:59:09.211 [19787] <2> job_monitoring_ex: Got JOBENDOFSUMMARY
06:59:09.221 [19787] <2> job_monitoring_exex: ACK disconnect
06:59:09.221 [19787] <2> job_disconnect: Disconnected
06:59:09.221 [19787] <4> main: 00:00:00
06:59:10.490 [19790] <4> main: NetBackup 6.5: 2010.04.24
06:59:10.490 [19790] <4> main: VERBOSE = 0
06:59:10.490 [19790] <4> main: switches
06:59:10.490 [19790] <4> cmnlogPARAMS: -suspend 7905302
06:59:10.490 [19790] <2> bpdbjobs: VERBOSE = 0
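The telling line is the cmnlogPARAMS entry, which shows bpdbjobs being run with "-suspend 7905302", i.e. a suspend request for that job ID. If you keep several days of these logs, a small Python filter like the one below lists every suspend request with its time and process ID, which makes a recurring time of day easy to spot. This is a sketch that assumes the line format shown above; the function name is hypothetical.

```python
import re
import sys

# Matches lines like: "06:59:10.490 [19790] <4> cmnlogPARAMS: -suspend 7905302"
SUSPEND_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) \[(?P<pid>\d+)\] <\d+> "
    r"cmnlogPARAMS: -suspend (?P<jobid>\d+)"
)


def find_suspend_requests(lines):
    """Return (time, pid, jobid) tuples for every '-suspend' call in a bpdbjobs log."""
    hits = []
    for line in lines:
        match = SUSPEND_RE.match(line.strip())
        if match:
            hits.append((match["time"], int(match["pid"]), int(match["jobid"])))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_suspends.py <bpdbjobs debug log file>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for time, pid, jobid in find_suspend_requests(log):
            print(f"{time} pid={pid} suspended job {jobid}")
```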
12-08-2014 06:34 PM
Please show us all the text in the Details tab of the 'suspended' job.
12-09-2014 09:42 AM
Dec 8, 2014 4:45:12 PM - requesting resource TCEMDD03_WS_dctcnbup001
Dec 8, 2014 4:45:12 PM - requesting resource dctcnbup001.NBU_CLIENT.MAXJOBS.LH461KSR
Dec 8, 2014 4:45:12 PM - requesting resource dctcnbup001.NBU_POLICY.MAXJOBS.LINUX_Workstations_w
Dec 8, 2014 4:45:13 PM - awaiting resource TCEMDD03_WS_dctcnbup001.
Dec 8, 2014 4:45:40 PM - granted resource dctcnbup001.NBU_CLIENT.MAXJOBS.LH461KSR.
Dec 8, 2014 4:45:40 PM - granted resource dctcnbup001.NBU_POLICY.MAXJOBS.LINUX_Workstations_w
Dec 8, 2014 4:45:40 PM - granted resource MediaID=@aaaac;DiskVolume=tcemdd03_ws_dctcnbup001;DiskPool=tcemdd03_ws_dctcnbup001;Path=tcemdd03_ws_...
Dec 8, 2014 4:45:40 PM - granted resource TCEMDD03_WS_dctcnbup001
Dec 8, 2014 4:45:41 PM - estimated 0 kbytes needed
Dec 8, 2014 4:45:41 PM - started process bpbrm (pid=17420)
Dec 8, 2014 4:45:54 PM - Error bpbrm (pid=17420) from client LH461KSR: ERR - Error occurred during initialization. Could not read logging configuration file.
Dec 8, 2014 4:45:52 PM - connecting
Dec 8, 2014 4:45:54 PM - connected; connect time: 0:00:00
Dec 8, 2014 4:45:59 PM - begin writing
Dec 9, 2014 7:06:34 AM - end writing; write time: 14:20:35
suspend requested by administrator (157)
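The Details tab timestamps can be checked the same way. The Python below is a minimal sketch, assuming the "Mon D, YYYY H:MM:SS AM/PM - event" format shown above (the helper name is hypothetical). It confirms the 14:20:35 write time and shows that writing stopped on a weekday morning just after 7:00, which is worth comparing against other suspended jobs.

```python
from datetime import datetime

DETAILS_FMT = "%b %d, %Y %I:%M:%S %p"  # e.g. "Dec 8, 2014 4:45:59 PM"


def event_time(line: str) -> datetime:
    """Parse the timestamp in front of a Details-tab line ('<timestamp> - <event>')."""
    stamp, _sep, _event = line.partition(" - ")
    return datetime.strptime(stamp.strip(), DETAILS_FMT)


begin = event_time("Dec 8, 2014 4:45:59 PM - begin writing")
end = event_time("Dec 9, 2014 7:06:34 AM - end writing; write time: 14:20:35")
print("write time:", end - begin)
print("writing stopped at", end.strftime("%A %H:%M"))
```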
12-09-2014 07:53 PM
I have never seen this except when someone suspends the job manually.
Are you saying that checkpoints are not selected/enabled in policy LINUX_Workstations_w?
12-10-2014 06:59 AM
Checkpoints are enabled and set to 15 min.
12-10-2014 07:02 AM
It appears the job is running or queued and then gets disconnected:
06:59:09.221 [19787] <2> job_monitoring_exex: ACK disconnect
06:59:09.221 [19787] <2> job_disconnect: Disconnected
Then something appears to suspend the job:
06:59:10.490 [19790] <4> cmnlogPARAMS: -suspend 7905302
I have never seen this either.
12-10-2014 07:04 AM
I stick to what I've said from the start. Someone is doing it.
Speak to your colleagues.
12-10-2014 09:14 AM
Based on the times shown above, I agree with Marianne. The job is being suspended at around 7:00 AM, right when I would expect someone to be arriving at the office for the day.
12-11-2014 08:06 AM
OK, so what about the possibility of a script that runs at 7 AM and suspends any running jobs? Any suggestions on how to find such a script? This master runs Sun Solaris 5.1.
12-11-2014 01:16 PM
On the master server, run "crontab -l" to list the scheduled cron jobs. Google can help you with the crontab syntax, but just looking at the script paths/filenames should tell you which one might be suspending the backups.
If the jobs are being suspended on the clients, you would have to check each client's Scheduled Tasks.
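Note that "crontab -l" only lists the crontab of the user who runs it, so also check root and any NetBackup admin accounts (on Solaris, the per-user crontab files live in /var/spool/cron/crontabs). As a minimal Python sketch (Python 3.7+; the keyword list and function names are hypothetical), you can filter crontab text for active entries that mention suspending jobs or calling bpdbjobs:

```python
import subprocess

# Hypothetical search terms; extend with any script names you suspect.
KEYWORDS = ("suspend", "bpdbjobs")


def suspicious_cron_entries(crontab_text: str, keywords=KEYWORDS):
    """Return active crontab lines that mention any keyword (case-insensitive)."""
    hits = []
    for line in crontab_text.splitlines():
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue  # skip blank lines and comments
        if any(word in entry.lower() for word in keywords):
            hits.append(entry)
    return hits


def current_user_crontab() -> str:
    """Return 'crontab -l' output for the invoking user, or '' if there is none."""
    result = subprocess.run(["crontab", "-l"], capture_output=True, text=True)
    return result.stdout if result.returncode == 0 else ""
```

For example, suspicious_cron_entries(current_user_crontab()) checks the current user's crontab; for other accounts, read their files from the directory above and pass the contents in.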
12-11-2014 03:09 PM
Hmmm... I might have found it:
59 6 * * 1-5 /home/nbadmin/scripts/suspend_workstations_backups >/dev/null 2>&1
Does this mean the backups are suspended at 6:59 AM, Monday through Friday?
12-11-2014 08:29 PM
You found it!
Yes, the script that looks for active backups and suspends them is kicked off by cron at 6:59 AM, Monday through Friday.
Now to find the culprit...
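For reference, the five schedule fields in that crontab line are minute (59), hour (6), day of month (*, any), month (*, any) and day of week (1-5, where 0 is Sunday), so it fires at 06:59 every Monday through Friday. The small Python matcher below is a sketch that checks a datetime against such a line; it handles only "*", single values, ranges and comma lists (no step syntax), and its function names are hypothetical.

```python
from datetime import datetime


def field_matches(field: str, value: int) -> bool:
    """True if one cron field ('*', '6', '1-5', '0,30', ...) allows `value`."""
    for part in field.split(","):
        if part == "*":
            return True
        if "-" in part:
            low, high = (int(x) for x in part.split("-"))
            if low <= value <= high:
                return True
        elif int(part) == value:
            return True
    return False


def cron_matches(entry: str, when: datetime) -> bool:
    """Check the first five fields of a crontab line against `when`."""
    minute, hour, dom, month, dow = entry.split()[:5]
    cron_dow = when.isoweekday() % 7  # cron numbering: 0 = Sunday ... 6 = Saturday
    dom_ok = field_matches(dom, when.day)
    dow_ok = field_matches(dow, cron_dow)
    # Classic cron rule: if both day fields are restricted, either one may match.
    day_ok = (dom_ok or dow_ok) if (dom != "*" and dow != "*") else (dom_ok and dow_ok)
    return (field_matches(minute, when.minute) and field_matches(hour, when.hour)
            and field_matches(month, when.month) and day_ok)


entry = "59 6 * * 1-5 /home/nbadmin/scripts/suspend_workstations_backups >/dev/null 2>&1"
```

With that entry, cron_matches(entry, datetime(2014, 12, 9, 6, 59)) is True for Tuesday, 9 December 2014, the morning the job in the Details tab above was suspended, while the same time on Saturday, 13 December 2014 gives False.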
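You may also want to go back to my first post above and troubleshoot the slow backups.
A quick win is usually to split the Backup Selections into multiple streams (enable "Allow multiple data streams" in the policy attributes).
Just 2 streams (as opposed to 1) per client can roughly halve the backup time, provided the per-stream read or transfer rate is the bottleneck.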
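The arithmetic behind that estimate: duration is data size divided by throughput, and extra streams only help until some shared component (client disk, network link, media server or the DataDomain) is saturated. A minimal idealized Python model, with hypothetical sizes and rates:

```python
def backup_hours(data_gb: float, per_stream_mb_s: float, streams: int,
                 aggregate_cap_mb_s: float = float("inf")) -> float:
    """Idealized backup duration in hours.

    Assumes each stream moves `per_stream_mb_s` and all streams together can
    never exceed `aggregate_cap_mb_s` (the shared bottleneck).
    """
    if streams < 1:
        raise ValueError("need at least one stream")
    rate_mb_s = min(per_stream_mb_s * streams, aggregate_cap_mb_s)
    return data_gb * 1024 / rate_mb_s / 3600


# Hypothetical client: 100 GB, 5 MB/s per stream.
one = backup_hours(100, 5, 1)
two = backup_hours(100, 5, 2)
capped = backup_hours(100, 5, 2, aggregate_cap_mb_s=6)  # shared limit of 6 MB/s
print(f"1 stream: {one:.1f} h, 2 streams: {two:.1f} h, 2 capped: {capped:.1f} h")
```

In this model the second stream halves the time only in the uncapped case; once the shared limit is hit, adding streams stops helping, which is why it is worth finding the bottleneck first.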
12-12-2014 12:17 PM
Thanks all!