Scheduled jobs are not starting

Dan_Giberson
Level 5

I've been working on a problem for a little over a month and have managed to stump even the fine folks at Symantec so far, so I thought I would reach out to all of you and see if I can get this fixed.

Back in March, my scheduled jobs suddenly just stopped working.  They just don't start. If I restart my servers, or services, the jobs will queue up and run fine for approximately 24 hours, then not run again. I have approximately 150 policies that will show up if I type in:

nbpemreq -predict -date (next 24 hours)

If I run the same command the next day, only 4 policies show up. These policies have worked seamlessly throughout the problem, which doesn't make troubleshooting any easier. The policies that work are wide ranging: flat file backups, Exchange, and RMAN. Some write to a Data Domain, some write direct to tape. New backup policies behave in the same manner. Manually starting a job works fine.

Master - W2K8 running NBU 7.5.0.3

Clients - vary between Windows and Redhat (both fail)

Any suggestions would be greatly appreciated. I will post any log files that you need.

1 ACCEPTED SOLUTION

Dan_Giberson
Level 5

Ok....I think we have a fix...I hope. It turns out there was a wildcard (*) in a policy that caused NBPEM to try to calculate over 700,000 files into new streams, which caused it to hang constantly. As such, I have now gone through and removed all wildcards from my policies.

I will continue to monitor this through the weekend. Thanks everyone for all the suggestions.
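For anyone wanting to run a similar wildcard audit, here is a minimal sketch. It assumes you have already exported each policy's backup selections into a plain Python mapping yourself; the `selections` structure and the policy names are hypothetical, not NetBackup output. The set of characters treated as wildcards (`*`, `?`, `[`) is also an assumption you may want to adjust.

```python
# Hypothetical helper: flag backup-selection entries that contain wildcards.
# The input mapping is an assumption; fill it from your own policy export.
WILDCARD_CHARS = set("*?[")


def find_wildcard_entries(selections):
    """Return {policy: [entries containing a wildcard character]}."""
    flagged = {}
    for policy, entries in selections.items():
        hits = [entry for entry in entries if WILDCARD_CHARS & set(entry)]
        if hits:
            flagged[policy] = hits
    return flagged


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    example = {
        "Policy_A": ["D:\\Data\\*", "C:\\Logs"],
        "Policy_B": ["/export/home"],
    }
    for policy, hits in find_wildcard_entries(example).items():
        print(policy, hits)
```

This only tells you where wildcards exist. Whether a given wildcard expands to enough files to trouble nbpem still has to be checked on the client.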

35 REPLIES

LucSkywalker195
Level 4
Certified

Do you see anything in your Windows system and application event logs that might indicate something going on with the server? Did you make or introduce any changes or patches last March?

Dan_Giberson
Level 5

Nope, I looked at that too.....the only work I was doing was troubleshooting SLP replication.

Marianne
Level 6
Partner    VIP    Accredited Certified

What type of scheduling are you using? Calendar or Frequency?

If Calendar - do the schedules span midnight?

Were any backups kicked off manually after the backup window closed?

Dan_Giberson
Level 5

I only use Calendar based scheduling in my environment. Some of these jobs will span midnight, but not all. I can manually kick a backup off at any time and it will run perfectly fine.

Marianne
Level 6
Partner    VIP    Accredited Certified

The problem with manual backups is that they affect scheduled backups. If a daily has run during the day, another backup will not be started that night.

LucSkywalker195
Level 4
Certified

When's the last time you rebooted your master?

Jaykullar
Level 5

I have seen this previously, though in 7.1.0.4.

Can you pick a policy that does not auto run and delete your schedules from it? Then run nbpemreq -updatepolicies. Then re-create your schedules in the same policy and run nbpemreq -updatepolicies again.

Then run nbpemreq -predict_all -date **/**/**

Let us know how that figures.

Dan_Giberson
Level 5

Luc, I have rebooted my master several times in the last couple of weeks. If I reboot, everything appears to work fine for the first 24 hours, then it halts again.

 

Jaykullar, tried that with the same result. The only jobs in the predict list are the ones that have never stopped working, but good suggestion. If I create a new policy, it's the same as well. Even if I copy from one of the policies that still work, it fails....

mph999
Level 6
Employee Accredited

" ... and have managed to even stump the fine folks at Symantec "

Yikes, this will be fun then ...

I doubt the type of job is relevant, can't see why pem would care.

Anyhows ...

In the pem log, for a given client/policy, you will see lines like this:

 

[PolicyClientTask::cancel] scheduling for policy <policy name>, client <client name>, has been abandoned because of image expiration, will recalculate
 
then, a bit later you should see :
 
[PolicyClientTask::run] policy <policy name>, client <client name>, schedule <schedule name> will be submitted for execution at <date /time> 
 
For a given policy/ client do you find that at some point, the "submitted for execution" lines suddenly stop ?
 
Cheers,
 
M
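The check described above (do the "submitted for execution" lines stop at some point for a given policy/client?) could be scripted roughly as follows. This is only a sketch: the regular expression is built from the example line quoted in the post, and real log lines may carry extra prefix fields, so it searches within each line rather than matching the whole line. The sample policy, client, and date values in the demo are made up.

```python
import re
from collections import defaultdict

# Pattern derived from the example line in the post above (an assumption
# about the real log layout, not a documented format).
SUBMIT_RE = re.compile(
    r"\[PolicyClientTask::run\] policy (?P<policy>.+?), client (?P<client>.+?), "
    r"schedule (?P<schedule>.+?) will be submitted for execution at (?P<when>.+)"
)


def submissions_by_pair(lines):
    """Map (policy, client) -> list of submission times, in log order."""
    found = defaultdict(list)
    for line in lines:
        match = SUBMIT_RE.search(line)
        if match:
            key = (match["policy"], match["client"])
            found[key].append(match["when"].strip())
    return dict(found)


if __name__ == "__main__":
    # Made-up sample lines; replace with the lines of your saved pem log text.
    sample = [
        "[PolicyClientTask::run] policy PolA, client host1, schedule Daily "
        "will be submitted for execution at 05/01/13 20:00:00",
        "some unrelated log line",
    ]
    for pair, times in sorted(submissions_by_pair(sample).items()):
        print(pair, "count:", len(times), "last:", times[-1])
```

A policy/client pair whose last entry is days old, or which never appears at all, is one where scheduling has stopped.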

Jaykullar
Level 5

Sounds like a bit of a problem there, bud.

Have Symantec recommended MP5 at all?

I know this sounds silly, but are you using Java or Admin Console? Have you tried creating policies in both?

Dan_Giberson
Level 5

MPH, are you talking about PEM logs on the client itself? If not, then none of my logs for the last 6 days have a line that says "submitted for execution".

The one odd thing I have noticed in my troubleshooting is that I can't get full output from "nbpemreq subsystems screen all"; it fails on screen 1, which I found out is the Task / Job Factory. If I do a "nbpemreq subsystems screen 1" it just fills up about 2-3 log files and says it can't connect to nbpem. I have attached a sample for your light reading.

So far that's all I have to go on.....

 

Jaykullar, no one has suggested going up to MP5 yet, and I would prefer to avoid it if possible as I'm waiting for 7.6 to be released. However, if it will fix it, I might have to do it. And, I just use the Admin console.

Dan_Giberson
Level 5

Hey,

Just noticed this line, which exists for every failed policy...any thoughts?

Prediction data not available for WestNet_MKSC/srvmksq01 because schedule calculation is pending(PolicyClientTask.cpp:1325),34:PolicyClientTask::formatPrediction,1
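To see how many policy/client pairs are stuck in this "schedule calculation is pending" state, the lines can be tallied with a small script. This is a sketch under the assumption that every such line follows the same `policy/client` layout as the one quoted above, and that policy names contain no `/` or whitespace.

```python
import re

# Pattern taken from the single line quoted above; treat it as an assumption
# about the general format rather than a documented one.
PENDING_RE = re.compile(
    r"Prediction data not available for (?P<policy>[^/\s]+)/(?P<client>\S+) "
    r"because schedule calculation is pending"
)


def pending_pairs(lines):
    """Return the sorted, de-duplicated (policy, client) pairs reported as pending."""
    pairs = set()
    for line in lines:
        match = PENDING_RE.search(line)
        if match:
            pairs.add((match.group("policy"), match.group("client")))
    return sorted(pairs)


if __name__ == "__main__":
    sample = [
        "Prediction data not available for WestNet_MKSC/srvmksq01 because schedule "
        "calculation is pending(PolicyClientTask.cpp:1325),34:PolicyClientTask::formatPrediction,1",
    ]
    print(pending_pairs(sample))
```

Comparing this list against the full policy list shows whether the pending state covers exactly the policies that have stopped running.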

Dyneshia
Level 6
Employee

There was an issue with nbpem in 7.5.0.3; please see the 7.5 news bulletin: http://www.symantec.com/docs/TECH178334

The fix is in 7.5.0.4, and since you are going to patch up, you might as well go to 7.5.0.5

http://www.symantec.com/docs/TECH199269

(ET2838857) <<Fixed in 7.5.0.4>> NB_7.5.0.3_ET2838857_4.zip is an Emergency Engineering Binary (EEB) replacement for nbpem for NetBackup 7.5.0.3.
 http://www.symantec.com/docs/TECH192530

This EEB includes resolutions for the following issues:

(ET2836511) <<Fixed in 7.5.0.4>> After upgrading to NetBackup 7.5.0.3 virtual machine (VMware) backups run multiple times, eventually failing with status code 196 reported.
 http://www.symantec.com/docs/TECH192104

(ET2746518) <<Fixed in 7.5.0.4>> Calendar schedules will be run multiple times in the backup window in 7.5 if the Backup window spans midnight and the backup starts prior to midnight and finished on the next day.
 http://www.symantec.com/docs/TECH189216

(ET2836015) <<Fixed in 7.5.0.4>> Query base VMware Backup using Calendar schedule may fail with Status: 196 (client backup was not attempted because backup window closed)
 http://www.symantec.com/docs/TECH190338

Dan_Giberson
Level 5

I will be upgrading to 7.5.0.5 in hopes that it will fix it; however, the issues that are listed are not quite the problem I'm having. It's not just policies that span midnight that are failing. I will keep everyone posted on the Symantec root cause if they find one.

Dyneshia
Level 6
Employee

7.5.0.5 includes a later version of nbpem, which I hope resolves your issue. I know the issues are not an exact match, but we did have numerous issues. In addition, if support needs to escalate your case, backline will push back until you are at 7.5.0.5. Please let us know how it goes.

mph999
Level 6
Employee Accredited

There are no pem logs on the client, master only.

You'll need to process the logs for me, I haven't got access to a machine at the moment;

vxlogview -p 51216 -i 116 -d all -t 07:00:00

... for example, would give the last 7 hours of logs.

Thanks,

Martin

Dyneshia
Level 6
Employee

You could give the following a try. (You will interrupt backups.)

1. Shut down the NBU services.

2. Use bpps or Services to check that all the NBU services, including the PBX service, are not running.

3. Delete, or move to a safe directory, the files below:
C:\program files\veritas\netbackup\bin\bpsched.d\pempersist
C:\program files\veritas\netbackup\bin\bpsched.d\retirepersist
C:\program files\veritas\netbackup\bin\dbdbm.lock
C:\program files\veritas\netbackup\db\jobs\restart\*
C:\program files\veritas\netbackup\db\jobs\pempersist
C:\program files\veritas\netbackup\db\jobs\pempersist2
C:\program files\veritas\netbackup\var\TaoNotifSvc*.*
C:\program files\veritas\netbackup\db\failure_history\*

Rename the following files by adding ".old" to the end:

C:\program files\veritas\netbackup\var\nbproxy_jm.ior
C:\program files\veritas\netbackup\var\nbproxy_pem.ior
C:\program files\veritas\netbackup\var\nbproxy_pem_email.ior

 

4. Start up NBU.

5. Run the following commands:
C:\program files\veritas\netbackup\bin\admincmd\nbrbutil -resetAll
C:\program files\veritas\netbackup\bin\admincmd\nbpemreq -updatepolicies
C:\program files\veritas\netbackup\bin\admincmd\nbpemreq -tables screen

For more info, please refer to the technote below:

http://www.symantec.com/docs/TECH62714
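If you prefer the "move to a safe directory" option in step 3 over deleting, a cautious way to script it is sketched below. The paths are the ones listed in step 3; the `quarantine` helper and the destination folder `C:\nbu_quarantine` are hypothetical names. It defaults to a dry run that only reports what would be moved. It does not cover the ".old" renames or the commands in step 5, and it should only be run with the NBU services stopped.

```python
import glob
import os
import shutil

NBU = r"C:\program files\veritas\netbackup"

# Paths from step 3 of the post above.
STEP3_PATTERNS = [
    NBU + r"\bin\bpsched.d\pempersist",
    NBU + r"\bin\bpsched.d\retirepersist",
    NBU + r"\bin\dbdbm.lock",
    NBU + r"\db\jobs\restart\*",
    NBU + r"\db\jobs\pempersist",
    NBU + r"\db\jobs\pempersist2",
    # Python's glob needs a literal dot to match "*.*", so "*" is used here.
    NBU + r"\var\TaoNotifSvc*",
    NBU + r"\db\failure_history\*",
]


def quarantine(patterns, dest_dir, dry_run=True):
    """Move every path matching the glob patterns into dest_dir.

    With dry_run=True (the default) nothing is moved; the planned
    (source, target) pairs are only returned for review.
    """
    planned = []
    for pattern in patterns:
        for src in sorted(glob.glob(pattern)):
            # Prefix with the parent folder name so same-named files
            # (such as the two pempersist files) do not collide.
            parent = os.path.basename(os.path.dirname(src))
            name = f"{parent}__{os.path.basename(src)}"
            planned.append((src, os.path.join(dest_dir, name)))
    if not dry_run:
        os.makedirs(dest_dir, exist_ok=True)
        for src, target in planned:
            shutil.move(src, target)
    return planned


if __name__ == "__main__":
    # Dry run only: prints what would be moved and touches nothing.
    for src, target in quarantine(STEP3_PATTERNS, r"C:\nbu_quarantine"):
        print(src, "->", target)
```

Review the dry-run output first, then call `quarantine(..., dry_run=False)` once the list looks right. Keeping the files rather than deleting them leaves a way back if the reset does not help.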

LucSkywalker195
Level 4
Certified

I know you likely know, but it's worth saying: make sure you get a 100% successful catalog backup before you upgrade. :)

Jaykullar
Level 5

If support have not suggested an upgrade to MP5, then they are unaware of any fixes for this in MP5. Is your case with backline?

I've had some very weird problems with NBU that took time to resolve, but backline have always come good.