Big issue on the activity monitor with 7.5.0.5

Fabrice_P_
Level 4
Certified

Hello,

We installed 7.5.0.5 last Friday, and since then we have had a lot of issues with the "Activity Monitor", which seems to be inconsistent.

Basically, we have a lot of backup jobs (more than 200) that are either queued, active or "unknown". If we look more closely at the details, we can see that the jobs have actually finished.

  • We can see the info "done, status 0", but the job is stuck at "validating image for client xxx".
  • The backup is evidently successful, because it is validated in the catalog and we can restore data from it.
  • OpsCenter does not see those jobs at all.

Also, if I check "Report/Problems" on the console, we see a lot of "socket open failed" critical errors. It seems that some NBU processes fail to communicate with each other.

We did not have any issue of this type before 7.5.0.5 (we were on 7.5.0.4).

I logged a support case more than 2 days ago and was supposed to be contacted 2 hours later.

...I don't have any answer yet.

 

Regards,

50 REPLIES

TimWillingham
Level 5

That clears up existing jobs, but does it prevent future occurrences?

Mark_Solutions
Level 6
Partner Accredited Certified

Interesting comment by dthor ...

So are you saying that this is simply a Java Console issue rather than a NetBackup issue? (i.e. the Java Console is not receiving all updates but the Windows one does?)

If that is the case then that would not be quite so bad!

Is anyone who uses the Windows Admin Console seeing this issue?

Thanks

Fabrice_P_
Level 4
Certified

The solution that support proposed to dthor only cures the symptom, not the root cause. I bet the issue will return on the next backup run! I'm using the Windows console *only* and I have run into those issues since day one.

My finding so far is that it seems to be an inter-process communication issue during the NBU activity spike (basically when the backup window kicks in). At that particular time I get a lot of "socket open failed" entries in the report/errors console log.

Mark_Solutions
Level 6
Partner Accredited Certified

Fabrice .. thanks for that .. exactly what I needed to know .. so it really is a "NetBackup" issue with processes.

Have they asked you to do a netstat -a? Just wondering if this version uses a lot more ports for whatever reason and the system is just running out.

I already greatly increase the number of ports on a Master anyway when I do installations (at least on Windows Masters), but I am wondering whether the demand has increased even more.
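
One rough way to test the port-exhaustion theory is to tally TCP connections by state from saved `netstat -an` output. The Python sketch below is only an illustration: the function name and the sample text are made up, and it assumes plain `netstat -an` output where TCP rows start with "TCP"/"tcp" and the state is the last column (extra columns from flags such as -o or -p would break that assumption).

```python
from collections import Counter

def count_tcp_states(netstat_output: str) -> Counter:
    """Tally TCP connections by state from `netstat -an` style text."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows start with "TCP" (Windows) or "tcp"/"tcp4"/"tcp6" (Unix);
        # headers, UDP rows and Unix-domain sockets are skipped.
        if len(fields) < 4 or not fields[0].lower().startswith("tcp"):
            continue
        # Assumption: the state is the last column of the row.
        states[fields[-1]] += 1
    return states

# Made-up sample in the Windows layout, for illustration only.
SAMPLE = """\
  Proto  Local Address          Foreign Address        State
  TCP    0.0.0.0:1556           0.0.0.0:0              LISTENING
  TCP    10.0.0.5:1556          10.0.0.9:51234         ESTABLISHED
  TCP    10.0.0.5:49877         10.0.0.9:1556          TIME_WAIT
  TCP    10.0.0.5:49878         10.0.0.9:1556          TIME_WAIT
  UDP    0.0.0.0:123            *:*
"""

if __name__ == "__main__":
    for state, n in count_tcp_states(SAMPLE).most_common():
        print(state, n)
```

Comparing the totals taken during the backup window with those taken at a quiet time would show whether the connection count really climbs toward the size of the dynamic port range.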

dthor
Level 3

I opened up a case about this on 4/13/2010 (411-938-339)... It has not stopped the problem from occurring again. I know I also opened another case about this after upgrading to 7.5.0.3, but I cannot find it. I know they gave me the same answer as before. Let me ask my co-worker if he remembers anything about this.

I don't know whether this is a Java problem or not, but I am not able to cancel them through the GUI; I have been able to cancel them using the Windows Console.

 

Fabrice_P_
Level 4
Certified

FYI, my case has been internally escalated and I got confirmation that similar situations have been encountered with 7.5.0.5. The whole thing is currently being reviewed by engineering.

Stay tuned (and away from 7.5.0.5 for the time being)!

Mark, I was also wondering about an issue with TCP ports, but I think in that case we would probably see other, more serious side effects (jobs failing or timing out, etc.).

Mark_Solutions
Level 6
Partner Accredited Certified

Fabrice .. thanks for that .. keep us updated .. wonder if a memory leak has crept in again?

I think my money is on nbjm, but it could be pbx_exchange or nbsl.

Is any process using unusually large amounts of memory (or CPU)?

It would be interesting to see what the file versions say on those files. I will take a look to see which ones were replaced in 7.5.0.5 and which versions they are.

Omar_Villa
Level 6
Employee

Regarding the socket errors when the backup window starts: have you tried raising the file descriptor limit, maybe doubling it, to see if this goes away? NBU 7.5 uses PBX more than past versions did, so it is probable that on 7.5.0.5 the demand for file descriptors is higher.

 

Just a thought.

Regards.

TimWillingham
Level 5

Symantec recommends a minimum of 8000.  Our master was set to 65535:

[root]# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 399360
max locked memory       (kbytes, -l) 3145728
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65535
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 32768
cpu time               (seconds, -t) unlimited
max user processes              (-u) 16384
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
 

Do you think that could still be too low?
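
For anyone who wants to check the same limit from inside a process rather than from the shell, here is a small Python sketch using the standard `resource` module (Unix only). The helper name is made up, and 8000 is just the minimum quoted above.

```python
import resource

RECOMMENDED_MIN = 8000  # minimum quoted in this thread

def nofile_ok(soft_limit: int, minimum: int = RECOMMENDED_MIN) -> bool:
    """Return True if the soft open-files limit meets the minimum."""
    # RLIM_INFINITY means "unlimited", which always satisfies the minimum.
    return soft_limit == resource.RLIM_INFINITY or soft_limit >= minimum

if __name__ == "__main__":
    # Reports the limit of this Python process, inherited from its parent shell.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"open files: soft={soft} hard={hard} ok={nofile_ok(soft)}")
```

Note that the limit is per process and inherited at start-up, so the value seen in an interactive shell is not necessarily what the running NetBackup daemons have; on Linux, /proc/<pid>/limits shows the limit of a running process.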

Fabrice_P_
Level 4
Certified

How can I increase the file descriptor limit on my NBU Windows 2008 master server?

I saw this but am not sure if it applies: http://www.ehow.com/how_6103386_increase-file-handles.html

 

Dip
Level 4

Thank you to this forum... We were looking to upgrade from 7.1.0.4 to 7.5.0.5 in a week, but now we will wait and watch...

RDG
Level 3
Employee Accredited

Symantec is currently investigating the problem with high urgency. We will keep you posted.

Fabrice_P_
Level 4
Certified

I think the link to the download should be removed...

fdassonville
Level 4
Partner Accredited

I have the same problem in my NetBackup infrastructure.

TimWillingham
Level 5

Got a call from support moments ago confirming engineering was working on this. The good news is that backups are working; you just can't tell.

Get ready for 7.5.0.5.1 or 7.5.0.5a!!

TimWillingham
Level 5

Etrack Et3106719 has been created to address this issue.

Fabrice_P_
Level 4
Certified

An update: I'm currently testing an EEB with support to fix the issue. I will keep you informed.

TimWillingham
Level 5

I installed the EEB last night also and AM looks normal this morning after a full slate of backups.  Symantec owned this problem!  Great job!

Fabrice_P_
Level 4
Certified

Good to know! Full weekend backups will be the real test!

RDG
Level 3
Employee Accredited

We have found an issue which causes the Activity Monitor to misreport the success of some backups. We are testing a fix and will be releasing it shortly. The backups are successful and usable, but the Activity Monitor reports them as not completing. A very small number of customers have reported this issue, and a lot of customers have successfully installed 7.5.0.5. Thank you for contributing to this forum and thank you for your patience.