I've been working on a problem for a little over a month and have managed to stump even the fine folks at Symantec so far, so I thought I would reach out to all of you and see if I can get this fixed.
Back in March, my scheduled jobs suddenly just stopped working. They just don't start. If I restart my servers, or services, the jobs will queue up and run fine for approximately 24 hours, then not run again. I have approximately 150 policies that will show up if I type in:
nbpemreq -predict -date (next 24 hours)
If I run the same command the next day, only 4 policies show up. Those 4 policies have worked seamlessly throughout the problem, which doesn't make troubleshooting any easier. The policies that work are wide-ranging: flat file backups, Exchange, and RMAN. Some write to a Data Domain, some write direct to tape. New backup policies behave in the same manner. Manually starting a job works fine.
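For anyone wanting to track exactly which policies drop out of the prediction from one day to the next, here is a minimal Python sketch. It assumes you have already pulled the policy names out of each day's nbpemreq -predict output by hand (the output format is not parsed here), and the policy names shown are hypothetical.

```python
def missing_policies(day_one, day_two):
    """Return the policies predicted on day one but absent on day two, sorted."""
    return sorted(set(day_one) - set(day_two))


# Hypothetical policy names copied by hand from two days of predict output
first_day = ["Exchange_Daily", "RMAN_Prod", "Flat_Files", "SQL_Weekly"]
second_day = ["Exchange_Daily", "RMAN_Prod"]

for name in missing_policies(first_day, second_day):
    print(name)
```

Keeping the dropped-out list from each day makes it easier to see whether the same policies disappear every time.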
Master - W2K8 running NBU 126.96.36.199
Clients - vary between Windows and Redhat (both fail)
Any suggestions would be greatly appreciated. I will post any log files that you need.
Ok....I think we have a fix...I hope. It turns out there was a wildcard (*) in a policy that caused NBPEM to try and calculate over 700,000 files into new streams which caused it to constantly hang. As such, I have now gone through and removed all wildcards from my policies.
I will continue to monitor this through the weekend. Thanks everyone for all the suggestions.
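If you want to check your own policies for the same thing, a rough Python sketch of the audit might look like the following. It assumes you have already exported each policy's backup selections into a dictionary yourself; the policy names and paths are hypothetical, and only the asterisk is checked because that is the wildcard that caused the hang here.

```python
def find_wildcard_entries(selections, wildcard_chars="*"):
    """Return {policy: [paths]} for backup selections containing a wildcard.

    selections: dict mapping a policy name to its list of backup selection paths.
    wildcard_chars: characters treated as wildcards (only '*' by default).
    """
    hits = {}
    for policy, paths in selections.items():
        flagged = [p for p in paths if any(ch in p for ch in wildcard_chars)]
        if flagged:
            hits[policy] = flagged
    return hits


# Hypothetical policies and paths, for illustration only
example = {
    "Flat_Files": ["/data/exports/*", "/etc"],
    "Exchange_Daily": ["Microsoft Information Store:\\"],
}

for policy, paths in find_wildcard_entries(example).items():
    print(policy, paths)
```

Pass a longer wildcard_chars string if you also want to flag other pattern characters.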
I have seen this previously, however in 188.8.131.52.
Can you pick a policy that does not auto run and delete your schedules from it? Then run nbpemreq -updatepolicies. Then re-create your schedules in the same policy and again run nbpemreq -updatepolicies.
Then run nbpemreq -predict_all -date **/**/**
Let us know how that figures.
Luc, I have rebooted my master several times in the last couple of weeks. If I reboot, everything appears to work fine for the first 24 hours, then it halts again.
Jaykullar, tried that with the same result. The only jobs in the predict list are the ones that have never stopped working, but good suggestion. If I create a new policy, it's the same as well. Even if I copy from one of the policies that still work, it fails....
" ... and have managed to even stump the fine folks at Symantec "
Yikes, this will be fun then ...
I doubt the type of job is relevant, can't see why pem would care.
In the pem log, for a given client/policy you will see lines like this:
MPH, are you talking about PEM logs on the client itself? If not, then none of my logs for the last 6 days have a line that says "submitted for execution".
The one odd thing I have noticed in my troubleshooting is that I can't get full output from "nbpem subsystems screen all"; it fails on screen 1, which I found out is the Task / Job Factory. If I do a "nbpemreq subsystems screen 1" it just fills up about 2-3 log files and says it can't connect to nbpem. I have attached a sample for your light reading.
So far that's all I have to go on.....
Jaykullar, no one has suggested going up to MP5 yet, and I would prefer to avoid it if possible as I'm waiting for 7.6 to be released. However, if it will fix it, I might have to do it. And, I just use the Admin console.
Just noticed this line, which exists for every failed policy...any thoughts?
Prediction data not available for WestNet_MKSC/srvmksq01 because schedule calculation is pending(PolicyClientTask.cpp:1325),34:PolicyClientTask::formatPrediction,1
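To get a quick list of every policy/client pair stuck in this state, something like the Python sketch below could be run over the saved log text. The pattern is based only on the single line quoted above, so treat the exact wording as an assumption; other builds may phrase it differently.

```python
import re

# Pattern derived from the one log line quoted in this thread (an assumption)
PENDING = re.compile(
    r"Prediction data not available for (?P<policy>[^/\s]+)/(?P<client>\S+)"
    r" because schedule calculation is pending"
)


def pending_pairs(lines):
    """Return sorted unique (policy, client) pairs whose schedule calculation is pending."""
    pairs = set()
    for line in lines:
        match = PENDING.search(line)
        if match:
            pairs.add((match.group("policy"), match.group("client")))
    return sorted(pairs)


sample = [
    "Prediction data not available for WestNet_MKSC/srvmksq01 because schedule "
    "calculation is pending(PolicyClientTask.cpp:1325),34:PolicyClientTask::formatPrediction,1",
]

for policy, client in pending_pairs(sample):
    print(policy, client)
```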
There was an issue in 184.108.40.206 with nbpem; please see the 7.5 news bulletin: http://www.symantec.com/docs/TECH178334
The fix is in 220.127.116.11, and since you are going to patch up, you might as well go to 18.104.22.168
(ET2838857) <<Fixed in 22.214.171.124>> NB_126.96.36.199_ET2838857_4.zip is an Emergency Engineering Binary (EEB) replacement for nbpem for NetBackup 188.8.131.52.
This EEB includes resolutions for the following issues:
(ET2836511) <<Fixed in 184.108.40.206>> After upgrading to NetBackup 220.127.116.11 virtual machine (VMware) backups run multiple times, eventually failing with status code 196 reported.
(ET2746518) <<Fixed in 18.104.22.168>> Calendar schedules will be run multiple times in the backup window in 7.5 if the Backup window spans midnight and the backup starts prior to midnight and finished on the next day.
(ET2836015) <<Fixed in 22.214.171.124>> Query base VMware Backup using Calendar schedule may fail with Status: 196 (client backup was not attempted because backup window closed)
I will be upgrading to 126.96.36.199 in hopes that it will fix it, however the issues that are listed are not quite the problem I'm having. It's not just policies that span midnight that are failing. I will keep everyone posted on the Symantec root cause if they find one.
188.8.131.52 includes a later version of nbpem, which I hope resolves your issue. I know the issues are not an exact match, but we did have numerous issues. In addition, if support needs to escalate your case, backline will push back until you are at 184.108.40.206. Please let us know how it goes.
There are no pem logs on the client, master only.
You'll need to process the logs for me, I haven't got access to a machine at the moment;
vxlogview -p 51216 -i 116 -d all -t 07:00:00
... for example, would give the last 7 hours of logs.
You could give the following a try. ( you will interrupt backups )
1. shutdown NBU services
2. use bpps or Services to check that all the NBU services, including the PBX service, are not running.
3. delete or move to a safe directory the below files:
Rename the following files by adding ".old" to the end:
4. startup NBU
5. Run the following command:
C:\program files\veritas\netbackup\bin\admincmd\nbrbutil -resetAll
C:\program files\veritas\netbackup\bin\admincmd\nbpemreq -updatepolicies
C:\program files\veritas\netbackup\bin\admincmd\nbpemreq -tables screen
For more info, please refer to the technote below:
If support have not suggested an upgrade to MP5, then they are unaware of any fixes for this in MP5. Is your case with backline?
I've had some very weird problems with NBU that have taken time to resolve, but backline have always come good.