Backing up workstations and jobs are being suspended

hanskniep
Level 4

I have an older NetBackup environment that backs up workstations. The jobs run, but they get suspended after about 10 hours. I need help finding out why this happens and how to fix it.

The master server is the only server, and it backs up to a DataDomain appliance using OST.

The backup speeds are horrible, and the backups run for many hours, if not days. I suspect a timeout setting, but I need help diagnosing it.

Thanks in advance!


Yasuhisa_Ishika
Level 6
Partner Accredited Certified

NetBackup does not suspend backups based on the start window. It is possible that the backups are being suspended by a command or because of an error. Please check the bperror output, the bpdbjobs debug log, and the admin debug log for clues.

Marianne
Moderator
Partner    VIP    Accredited Certified

I agree with Yasuhisa. 

There is nothing in NBU that will automatically suspend jobs - someone is doing it manually.

To troubleshoot slow backups, you have to find where the bottleneck is:
- Slow read from client filesystem
- Poor network transfer rate
- Underpowered media server or lack of tuning parameters.

This doc will guide you on how to test each of these components:

Symantec NetBackup™ 7.0 - 7.1 Backup Planning and Performance Tuning Guide 
http://www.symantec.com/docs/DOC4483

 

hanskniep
Level 4

Thanks for the replies.

To clarify, for the active jobs, suspend is greyed out, so we cannot manually suspend them. I did notice that queued jobs can be suspended, though. Is there a timeout setting that will automatically suspend a queued job?

Also, this is a 6.5 environment. I know it is no longer supported and that they should upgrade, which I am pushing for!

Marianne
Moderator
Partner    VIP    Accredited Certified
Queued jobs fail with status 196 when the backup window closes; they are not suspended. There has never been an automatic suspend in NBU. Suspend is greyed out on active jobs when checkpoints are not selected in the backup policy.

hanskniep
Level 4

Here is a log snippet from a suspended job:

06:59:09.089 [19787] <2> vnet_vnetd_service_socket: vnet_vnetd.c.2048: VN_REQUEST_SERVICE_SOCKET: 6 0x00000006
06:59:09.089 [19787] <2> vnet_vnetd_service_socket: vnet_vnetd.c.2062: service: bpjobd
06:59:09.147 [19787] <2> job_connect: SO_KEEPALIVE set on socket 5 for client dctcnbup001
06:59:09.149 [19787] <2> logconnections: BPJOBD CONNECT FROM 53.233.6.156.58974 TO 53.233.6.156.13724
06:59:09.154 [19787] <2> job_authenticate_connection: ignoring VxSS authentication check for now...
06:59:09.154 [19787] <2> job_connect: Connected to the host dctcnbup001 contype 11 jobid <0> socket <5>
06:59:09.154 [19787] <2> job_connect: Connected on port 58974
06:59:09.211 [19787] <2> job_monitoring_ex: Got JOBENDOFSUMMARY
06:59:09.221 [19787] <2> job_monitoring_exex: ACK disconnect
06:59:09.221 [19787] <2> job_disconnect: Disconnected
06:59:09.221 [19787] <4> main:  00:00:00
06:59:10.490 [19790] <4> main: NetBackup 6.5:  2010.04.24
06:59:10.490 [19790] <4> main: VERBOSE = 0
06:59:10.490 [19790] <4> main: switches
06:59:10.490 [19790] <4> cmnlogPARAMS: -suspend 7905302
06:59:10.490 [19790] <2> bpdbjobs: VERBOSE = 0
 

 

 

Marianne
Moderator
Partner    VIP    Accredited Certified

Please show us all text in Details tab of 'suspended' job.

hanskniep
Level 4

Dec 8, 2014 4:45:12 PM - requesting resource TCEMDD03_WS_dctcnbup001
Dec 8, 2014 4:45:12 PM - requesting resource dctcnbup001.NBU_CLIENT.MAXJOBS.LH461KSR
Dec 8, 2014 4:45:12 PM - requesting resource dctcnbup001.NBU_POLICY.MAXJOBS.LINUX_Workstations_w
Dec 8, 2014 4:45:13 PM - awaiting resource TCEMDD03_WS_dctcnbup001.
Dec 8, 2014 4:45:40 PM - granted resource  dctcnbup001.NBU_CLIENT.MAXJOBS.LH461KSR.
Dec 8, 2014 4:45:40 PM - granted resource  dctcnbup001.NBU_POLICY.MAXJOBS.LINUX_Workstations_w
Dec 8, 2014 4:45:40 PM - granted resource  MediaID=@aaaac;DiskVolume=tcemdd03_ws_dctcnbup001;DiskPool=tcemdd03_ws_dctcnbup001;Path=tcemdd03_ws_...
Dec 8, 2014 4:45:40 PM - granted resource  TCEMDD03_WS_dctcnbup001
Dec 8, 2014 4:45:41 PM - estimated 0 kbytes needed
Dec 8, 2014 4:45:41 PM - started process bpbrm (pid=17420)
Dec 8, 2014 4:45:54 PM - Error bpbrm (pid=17420) from client LH461KSR: ERR - Error occurred during initialization.  Could not read logging configuration file.
Dec 8, 2014 4:45:52 PM - connecting
Dec 8, 2014 4:45:54 PM - connected; connect time: 0:00:00
Dec 8, 2014 4:45:59 PM - begin writing
Dec 9, 2014 7:06:34 AM - end writing; write time: 14:20:35
suspend requested by administrator  (157)
 

Marianne
Moderator
Partner    VIP    Accredited Certified

I have never seen this other than someone manually suspending the job.

Are you saying that checkpoint in policy LINUX_Workstations_w is not selected/enabled?

hanskniep
Level 4

Checkpoints are enabled and set to 15 min.

hanskniep
Level 4

It appears the job is running or queued then gets disconnected:

06:59:09.221 [19787] <2> job_monitoring_exex: ACK disconnect
06:59:09.221 [19787] <2> job_disconnect: Disconnected

then appears to suspend the job:

06:59:10.490 [19790] <4> cmnlogPARAMS: -suspend 7905302

I have never seen this either.
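
One way to see how regular the suspensions are is to pull every suspend request out of the debug log. The sketch below assumes the line layout shown in the snippet above (timestamp, PID, level, then `cmnlogPARAMS: -suspend <jobid>`). The function name `find_suspends` and the log path are hypothetical. Note that the disconnect (PID 19787) and the suspend (PID 19790) are logged by different processes, so the first does not necessarily cause the second.

```shell
# Sketch: list every "-suspend <jobid>" request recorded in a debug log.
# Assumed line layout (taken from the snippet in this thread):
#   HH:MM:SS.mmm [pid] <level> cmnlogPARAMS: -suspend <jobid>
find_suspends() {
    # $1 = path to a debug log file (hypothetical; supply your own)
    # Prints: timestamp, PID, job ID for each suspend request
    awk '$4 == "cmnlogPARAMS:" && $5 == "-suspend" { print $1, $2, $6 }' "$1"
}
```

Run it as `find_suspends /path/to/debug.log`. If the requests cluster at the same time on each working day, something scheduled is a more likely cause than a timeout.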

Marianne
Moderator
Partner    VIP    Accredited Certified

I stick to what I've said from the start. Someone is doing it.
Speak to your colleagues.

RonCaplinger
Level 6

Based on the time you see above, I agree with Marianne. The suspension happens around 7:00 am, right when I would expect someone to be coming into the office for the day.

hanskniep
Level 4

Ok, so how about the possibility of a script that runs at 7am to suspend any running jobs?  Any suggestions on how to find such a script?  This master runs Sun Solaris 5.1.

RonCaplinger
Level 6

You can run "crontab -l" on the master server to list the scheduled jobs. Google can help you with the cron syntax, but just looking at the paths/filenames should tell you which job might be suspending the backups.

If the jobs are being suspended on the clients, you would have to check each client's Scheduled Tasks.
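
Keep in mind that `crontab -l` only lists the crontab of the user who runs it, so a job owned by another account would not show up. A minimal sketch for searching every user's crontab follows. It assumes the crontab files live under `/var/spool/cron/crontabs` (the usual Solaris location) and that you have root access to read them; the function name `find_suspend_cron` is hypothetical.

```shell
# Sketch: search all crontab files for entries that mention "suspend".
# Assumption: crontabs are stored one file per user under
# /var/spool/cron/crontabs; reading other users' files normally needs root.
find_suspend_cron() {
    # $1 = crontab directory (defaults to the assumed Solaris path)
    dir=${1:-/var/spool/cron/crontabs}
    # -i ignores case; /dev/null forces grep to print the filename,
    # which is the name of the user who owns the entry
    grep -i 'suspend' "$dir"/* /dev/null
}
```

A script that suspends jobs may not have "suspend" in its name, so if this finds nothing, widen the pattern or review each entry that runs around the time the jobs stop.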

hanskniep
Level 4

Hmmm.. might have found it:

 

59 6 * * 1-5 /home/nbadmin/scripts/suspend_workstations_backups >/dev/null 2>&1

 

Does this mean suspend the backups at 6:59am M-F?

Marianne
Moderator
Partner    VIP    Accredited Certified

You found it! 

Yes, cron kicks off that script at 6:59 am, Monday to Friday. It evidently looks for active backups and suspends them.
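
For anyone unfamiliar with cron syntax, the first five fields of an entry are minute, hour, day of month, month, and day of week (0 is Sunday, so 1-5 is Monday to Friday). The sketch below simply labels those fields for a given entry; the function name `explain_cron` is hypothetical and it does no validation.

```shell
# Sketch: label the five schedule fields of a crontab entry.
explain_cron() {
    # $1 = one crontab line, passed as a single quoted string
    printf '%s\n' "$1" | {
        # read splits on whitespace without glob-expanding the "*" fields;
        # everything after the fifth field lands in cmd
        read -r min hour dom mon dow cmd
        printf 'minute=%s hour=%s day-of-month=%s month=%s day-of-week=%s\n' \
            "$min" "$hour" "$dom" "$mon" "$dow"
        printf 'command=%s\n' "$cmd"
    }
}
```

Calling `explain_cron '59 6 * * 1-5 /home/nbadmin/scripts/suspend_workstations_backups >/dev/null 2>&1'` shows minute 59, hour 6, any day of the month, any month, and days of the week 1-5.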

Now to find the culprit...

You may also want to have a look at my 1st post above and troubleshoot slow backups.
A quick win is normally to break up Backup Selection into multiple streams.
Just 2 streams (as opposed to 1 stream) per client should halve the backup time.

hanskniep
Level 4

Thanks all!