Jobs are not "RETRYING"

Randy_Samora · ‎07-11-2006

On the Master Server, under Global Attributes, there is an option for "Schedule backup attempts". I have my master configured for 4 tries per 12 hours. We had issues with our tapes once, and NB won't try a new tape until it gets 3 errors on a piece of media, so I set my retries to 4, thinking that if a job hit a bad tape, it still had a shot at completing successfully when NB finally grabbed a new tape.

Anyway... I had a job that failed last night but it only ran once; it never requeued or tried again. Is there another setting somewhere else that is causing the job to fail after one attempt? The job ran for almost 3 hours, backed up quite a bit of data and then failed with a Status 23, socket read failed. Are there certain failures that won't retry? I've seen this before and thought I'd ask the experts this time. They were out to lunch so I'll ask y'all :)

Any ideas?
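
The reasoning behind setting 4 tries can be sketched in a few lines of Python. This is only a toy model of the rule described in the post above (a tape is swapped after 3 errors on it), not NetBackup's actual scheduler logic; the function name and the `errors_before_swap` parameter are hypothetical.

```python
def first_attempt_on_new_tape(max_attempts, errors_before_swap=3):
    """Toy model: every attempt on the bad tape fails, and a new tape is
    only used once the bad tape has logged `errors_before_swap` errors."""
    media_errors = 0
    for attempt in range(1, max_attempts + 1):
        if media_errors >= errors_before_swap:
            return attempt      # this attempt runs on a fresh tape
        media_errors += 1       # attempt failed on the bad tape
    return None                 # attempts ran out before the tape was swapped

print(first_attempt_on_new_tape(3))  # None: all 3 tries hit the bad tape
print(first_attempt_on_new_tape(4))  # 4: the fourth try gets a new tape
```

Under that assumption, 3 tries would never reach a good tape, while 4 tries leave exactly one attempt on new media.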

Stumpr2 · ‎07-11-2006

Check the "end time" and then check the schedule's window. If the window closed then the job won't retry.
The only failure that I know of that is not influenced by the retry parameter is error code 134.

Randy_Samora · ‎07-11-2006

The job failed at 11:00 PM and the window doesn't close until 7:00 AM the following morning. Here's a little more info.

This is part of a Microsoft cluster backup. There's A LOT of data on the cluster and I have my "jobs per client" set to 2 because that's all a lot of my clients can handle. I needed to get more streams from this cluster in order to get all of the data off within the backup window, so I created two policies: one for the physical node and one for the virtual node, which allowed me to extract 4 streams at once from the same data. One job failed on the virtual node around the same time that the failure on the physical node occurred. The virtual node failure requeued and kept going; the physical node failure quit after one try. I'm guessing, but maybe NetBackup thought the virtual node was the same server and got confused. If I go to client properties on the virtual node, it has the active physical node's name and there's no way to change that. So maybe NetBackup thought, "Hey, I already reran that client" and bailed out.

Just a thought.

Lance_Hoskins · ‎07-11-2006

What version of NBU are you running? Ever since we upgraded our Master and Media servers to NBU 6.0 we've been fighting this very problem. We actually have custom binaries for NBPEM.exe and NBJM.exe and we still have the problem with clients failing and not restarting.

Do the jobs go into an "Incomplete" state or do they possibly go into a "Waiting for Retry" state while the parent job is sitting there in a failed state?

If these are your symptoms and you're at NBU 6.0, I'll apologize to you from Symantec from the bottom of my heart. :)

Randy_Samora · ‎07-11-2006

I wish it were that easy; I've read some of the posts regarding the issue and NB 6. I'm still on NB 5.1 MP4. The jobs go to true failures, not incomplete. It is a rare occurrence but I thought I'd ask in case it was an easy fix.

Lance_Hoskins · ‎07-11-2006

Have you tried re-creating the policy?

Stumpr2 · ‎07-11-2006

Do you have checkpoint restart enabled on the policy?

How about just posting the output from
bppllist -U

but please, clean it up and don't post the actual server names!

Randy_Samora · ‎07-12-2006

I'm not sure how much "cleanup" I could do but I did remove the server name.

------------------------------------------------------------

Policy Name: TITLESEARCH_II

Policy Type: MS-Windows-NT
Active: yes
Effective date: 06/09/2003 11:45:41
Backup network drvs: no
Collect TIR info: no
Mult. Data Streams: yes
Client Encrypt: no
Checkpoint: yes
Interval: 30
Policy Priority: 99
Max Jobs/Policy: Unlimited
Disaster Recovery: 0
Residence: ASP
Volume Pool: DataStore
Keyword: (none specified)

HW/OS/Client: PC Windows2000

Include: F:\
G:\
H:\
I:\
J:\

Schedule: Quarterly
Type: Full Backup
Maximum MPX: 4
Synthetic: 0
PFI Recovery: 0
Retention Level: 8 (1 year)
Number Copies: 1
Fail on Error: 0
Residence: (specific storage unit not required)
Volume Pool: (same as policy volume pool)
Calendar sched: Enabled
Allowed to retry after run day
SPECIFIC DATE 0 - 12/30/2005
SPECIFIC DATE 1 - 03/31/2006
SPECIFIC DATE 2 - 06/30/2006
SPECIFIC DATE 3 - 09/29/2006
SPECIFIC DATE 4 - 12/29/2006
Daily Windows:
Friday 20:00:00 --> Sunday 17:00:00

Schedule: Weekly
Type: Full Backup
Frequency: every 3 days
Maximum MPX: 4
Synthetic: 0
PFI Recovery: 0
Retention Level: 5 (13 weeks)
Number Copies: 1
Fail on Error: 0
Residence: (specific storage unit not required)
Volume Pool: (same as policy volume pool)
Daily Windows:
Friday 20:00:00 --> Sunday 17:00:00

Schedule: Daily
Type: Cumulative Incremental Backup
Frequency: every 14 hours
Maximum MPX: 4
Synthetic: 0
PFI Recovery: 0
Retention Level: 3 (28 days)
Number Copies: 1
Fail on Error: 0
Residence: (specific storage unit not required)
Volume Pool: (same as policy volume pool)
EXCLUDE DATE 0 - 06/08/2003
EXCLUDE DATE 1 - 06/14/2003
Daily Windows:
Sunday 20:00:00 --> Monday 07:00:00
Monday 20:00:00 --> Tuesday 07:00:00
Tuesday 20:00:00 --> Wednesday 07:00:00
Wednesday 20:00:00 --> Thursday 07:00:00
Thursday 20:00:00 --> Friday 07:00:00
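
The window check suggested earlier in the thread can be sketched in Python against the Daily schedule windows listed above. This is a minimal illustration, not NetBackup's own logic: the helper names are hypothetical, the window list is transcribed by hand from the output above, and the failure timestamp assumes the 11:00 PM failure happened on Monday, 10 July 2006 (the night before the first post) and that the Daily schedule was the one running.

```python
from datetime import datetime, timedelta

# (opening weekday, opening hour, hours open); Monday=0 ... Sunday=6,
# matching datetime.weekday(). Transcribed from the Daily schedule above.
DAILY_WINDOWS = [
    (6, 20, 11),  # Sunday    20:00 --> Monday    07:00
    (0, 20, 11),  # Monday    20:00 --> Tuesday   07:00
    (1, 20, 11),  # Tuesday   20:00 --> Wednesday 07:00
    (2, 20, 11),  # Wednesday 20:00 --> Thursday  07:00
    (3, 20, 11),  # Thursday  20:00 --> Friday    07:00
]

def in_window(when, open_weekday, open_hour, hours_open):
    """True if `when` falls inside one weekly window (may cross midnight)."""
    # Step back to the most recent opening time at or before `when`.
    days_back = (when.weekday() - open_weekday) % 7
    opened = (when - timedelta(days=days_back)).replace(
        hour=open_hour, minute=0, second=0, microsecond=0)
    if opened > when:
        opened -= timedelta(days=7)
    return when < opened + timedelta(hours=hours_open)

def window_open(when):
    """True if any of the Daily windows is open at `when`."""
    return any(in_window(when, *w) for w in DAILY_WINDOWS)

failed_at = datetime(2006, 7, 10, 23, 0)  # assumed failure time (see above)
print(window_open(failed_at))  # True
```

With these assumptions the window was still open at the time of the failure, which matches the statement that it does not close until 7:00 AM, so a closed window alone would not explain the missing retry.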
