
Big issue on the activity monitor with 7.5.0.5

Fabrice_P_
Level 4
Certified

Hello,

We installed 7.5.0.5 last Friday, and since then we have had a lot of issues with the "Activity Monitor", which seems to be inconsistent.

Basically, we have a lot of backup jobs (more than 200) that are either queued, active or "unknown". If we look more closely at the details, we can see that the jobs have actually finished.

  • We can see the info "done, status 0" but the job is stuck at "validating image for client xxx".
  • The backup is nonetheless successful: it is validated in the catalog and we can restore data from it.
  • OpsCenter does not see those jobs at all.

Also, if I check "Report/Problems" on the console, we see a lot of "socket open failed" critical errors. It seems that some NBU processes fail to communicate with each other.

We did not have any issue of this type before 7.5.0.5 (we were on 7.5.0.4).

I logged a support case more than 2 days ago and was supposed to be contacted within 2 hours.

...I don't have any answer yet.

 

Regards,

1 ACCEPTED SOLUTION


CRZ
Level 6
Employee Accredited Certified

Wide-spread panic?  A small (and vocal) group of users posted about this issue here, in addition to opening Support cases.  Support and Engineering followed their processes (which, admittedly, never seem to go fast enough when YOU'RE the one experiencing the problem) and through those processes, we're close to being able to say that we know with certainty what this issue is, how it arises and how to resolve it.

Keep in mind that all the indications we've had thus far are that despite how they are appearing in the Activity Monitor, the backups in question SUCCEEDED.  No data has been lost.  NetBackup is not indicating that your systems are protected when they're not - in fact, you could interpret it as the exact opposite - we're indicating that your system may not be protected when in fact it is!  And, as has been said previously in this thread, a very very large majority of the folks who have applied 7.5.0.5 haven't seen this issue at all.  As problems go...well, we'd prefer to have zero problems, obviously, but this one - while aggravating to those experiencing it - is rather tame when you place it on the "catastrophe" scale.  I'm not trying to make any excuses, and I would much have preferred to say "7.5.0.5 is defect free!" but I think we all knew I was never going to be able to say that despite all our best efforts prior to this release.  What's the saying?  "Stuff happens."  This sure happened!  It MAY have hit the fan as well.  We're working hard to fix it AND get you feeling better.

Of course, we will release a TechNote - but we can't do it until we have all of the information.  We want to be sure we've fixed it before we tell you we've fixed it!  We also want to be sure we've narrowed the scope as much as we can, because as has been noted, this only affected a small number of folks who have applied 7.5.0.5.  We want to be able to provide a specific set of conditions so you can quickly determine if you might be affected (or not!) and if you'd need to call us for the EEB.

When the TechNote is released, the Late Breaking News will also be updated to point to the document.  I'm absolutely sure someone will let you know in this thread as well.  I don't want to give any specific dates on WHEN that'll happen, but know that it will be as soon as humanly possible.

Finally, you've probably already figured this out, but we're not pulling 7.5.0.5 over this issue.  Some people may find they'll need to install an EEB after they upgrade to 7.5.0.5.  Some people may want to wait on a 7.5.0.5 upgrade until we announce the availability of that EEB.  Some folks may apply 7.5.0.5 anyway and discover that they're not affected.  (Most folks, I hope!)  Some folks may apply 7.5.0.5, find their Activity Monitor IS affected, and live with it until they can call us up for an EEB.  And some folks will do something completely different which I haven't even dreamed up, because that's how the world works.  :)

What I WILL say is the ONLY way to get a defect addressed is to open a Support case and get Symantec working on it.  You can post here - and posting here is a great way to see if other people are sharing your trouble, and if they ARE, to get more people aware there are some cases that need to be resolved - but if you don't have that Support case, Connect alone will probably not get you help if you're experiencing a well and true defect in NetBackup.  This is probably obvious to everybody, but I add it here because there are some folks who try to use Connect as a substitute for our technical support - and sometimes you can even get away with that, but the truth is while Connect is great for some stuff, it should only be a SUPPLEMENT to a true tech support case when you have a "real" issue.  Let us help you!


50 REPLIES

Mark_Solutions
Level 6
Partner Accredited Certified

I cannot assist with the issue (but would like to be updated), but I would suggest phoning support back and asking to speak to the manager, or asking for the case to be "escalated" to get a faster response.

Keep us updated please

Fabrice_P_
Level 4
Certified

Sure, I will. I also noticed we have the same issue on the duplication (OST - optimized) jobs. The status is 50 ("client process aborted") but they seem to be successful as well...

Overall, 10% of our weekend jobs were affected.

angus_the_bull
Level 3
Certified

I upgraded to 7.5.0.5 last weekend and have managed to hit both issues. I have logged a call with support and will update here when I get a fix.

Fabrice_P_
Level 4
Certified

I have an open case with support too. Maybe it's time to remove the download link?

The issue is totally random: I did not get any problems with Monday evening's backups, but yesterday I did.

angus_the_bull
Level 3
Certified

Does anyone know of a fix yet? Three days have passed on my support call and, apart from me sending screenshots of the issues, I have heard nothing.

Fabrice_P_
Level 4
Certified

Nothing yet, I'm just sending logs...

TimWillingham
Level 5

Seeing the same issue here after the upgrade to 7.5.0.5.

angus_the_bull
Level 3
Certified

I sent an email first thing Friday morning asking for an update and Symantec can't even be bothered to respond. Why haven't they pulled this software? Would anyone from Symantec like to respond?

Ankit_Maheshwar
Level 5

Same issue with one of my upgraded servers.

Can we roll back to 7.5.0.4?

dthor
Level 3

I have been using NBU 7.5.0.5 for some time now. Can you send me a screenshot of your Activity Monitor showing the problems you are seeing? It seems I might have had the same issue and might be able to help.

Fabrice_P_
Level 4
Certified

They are still investigating the log files on my case. More than 250 jobs were affected this weekend. This bug is annoying because it makes the whole platform almost unmanageable, and it is very hard to be 100% sure that all the jobs completed successfully.

dthor, there is nothing particular to see on a screenshot. The issue is that the jobs are in the running or queued state, but if you check the details you will see that the job has actually completed (status 0) and is stuck at the "validating client image" step. Duplication jobs exit with status 50 ("client process aborted"). If you have the same issue, please open a ticket.
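The symptom described above (status 0 reported while the job state is not "done") could in principle be checked from a comma-separated job listing rather than by opening each job. The following is a minimal sketch under stated assumptions: `find_stuck_jobs` is a hypothetical helper name, and it assumes field 1 is the job ID, field 3 the state (with 3 meaning done), field 4 the status and field 7 the client, which is the layout I would expect from `bpdbjobs -report -most_columns`. Please verify the column order on your own release; the thread does not confirm that the status column is populated for the affected jobs.

```shell
#!/bin/sh
# Sketch only. find_stuck_jobs is a hypothetical helper, not a NetBackup command.
# Input:  comma-separated job records on stdin.
# Assumed columns: 1 = job ID, 3 = state (3 = done), 4 = status, 7 = client.
# Output: "jobid,state,client" for every job that reports status 0
#         while its state is something other than done.
find_stuck_jobs() {
    # String comparisons keep an empty status field from matching "0".
    awk -F, '$4 == "0" && $3 != "3" { print $1 "," $3 "," $7 }'
}
```

On a master server it might be fed with something like `bpdbjobs -report -most_columns | find_stuck_jobs`; treat that invocation as an assumption to check against the Commands Reference Guide for your version.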

Marianne
Level 6
Partner    VIP    Accredited Certified

I suggest all of you exchange case numbers via PM and forward to your Symantec case engineers.

If all the relevant engineers start talking to one another, the escalation to back line may be expedited.

dthor
Level 3

OK, I just thought I might have had the same issue before and opened a case for it. I can look back through my cases and see if it is the same problem you are having.

angus_the_bull
Level 3
Certified

I also have a third issue with the monitor at 7.5.0.5: I have a job that is finished and the GUI shows the job state as DONE, but the type column is empty, as is the job policy. When you check the job details, the client is blank.

Fabrice_P_
Level 4
Certified

Yes, I have this issue too and some "unknown" jobs as well...

dthor
Level 3

This has been happening since 6.5.x. Here is what I have done:

To clear up the error 50s:
1. Stop NetBackup (netbackup stop)
2. Run bpps -a and kill any remaining processes
3. cd to /usr/openv/netbackup/db/jobs
4. rm bpjobd.act.db
5. cd to the restart folder
6. pwd to verify, then rm all files
7. cd ../trylogs
8. pwd to verify, then rm all files
9. cd ../ffilelogs
10. pwd to verify, then rm all files
11. Restart NetBackup (netbackup start)
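The file-removal part of the steps above (steps 3 to 10) can be sketched as a small shell function. This is only an illustration under stated assumptions: `clear_job_db` is a hypothetical helper name, it deliberately does not stop or start NetBackup, and it should only be run after steps 1 and 2 have left no NetBackup processes running. It wipes the Activity Monitor's job history, so check with Support before using anything like it.

```shell
#!/bin/sh
# Sketch only. clear_job_db is a hypothetical helper, not a NetBackup command.
# It removes the job database files named in the steps above and must only
# be run while NetBackup is fully stopped.
clear_job_db() {
    jobs_dir=${1:-/usr/openv/netbackup/db/jobs}

    # Refuse to continue if the directory is missing, so a mistyped path
    # never turns into an rm somewhere else.
    if [ ! -d "$jobs_dir" ]; then
        echo "not a directory: $jobs_dir" >&2
        return 1
    fi

    # Step 4: remove the active job database file.
    rm -f "$jobs_dir/bpjobd.act.db"

    # Steps 5-10: empty restart, trylogs and ffilelogs but keep the folders.
    for sub in restart trylogs ffilelogs; do
        if [ -d "$jobs_dir/$sub" ]; then
            find "$jobs_dir/$sub" -type f -exec rm -f {} +
        fi
    done
}
```

Calling `clear_job_db` with no argument uses the path from step 3. Passing a scratch copy of the directory first is a cheap way to see what would be removed.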

I have also installed the Windows admin console, and instead of doing the above I am able to cancel the jobs there.

I believe this is a case I opened a few years ago. Support might be able to reference it: 411-938-339

TimWillingham
Level 5

I opened a case with them last week and escalated it this morning.  Support has found nothing on the issue so far.  Make sure you are opening cases on the problems.

I discovered another issue in the process of troubleshooting which is minor but irritating nonetheless: You can no longer resize the details window for Disk Pools under Devices.

I upgraded my Java version to 7.5.0.5 as well just to see if it made a difference, but no luck.

dthor
Level 3

This is what support had me do:

1. Stop NetBackup (netbackup stop)
2. Run bpps -a and kill any remaining processes
3. cd to /usr/openv/netbackup/db/jobs
4. rm bpjobd.act.db
5. cd to the restart folder
6. pwd to verify, then rm all files
7. cd ../trylogs
8. pwd to verify, then rm all files
9. cd ../ffilelogs
10. pwd to verify, then rm all files
11. Restart NetBackup (netbackup start)

I have found that another way to do this, without having to stop and start NBU, is to install the Windows Administrator Console. I have been able to clear these up from there.

Jonathan_D_
Level 3

Hi, I have the same problem. Any answers yet?

Can you send me the reference number of the call you opened with Symantec? I will reference it in mine.

 

Thanks