
RonCaplinger · 07-26-2013

Does anyone who has upgraded to OCA 7.5.0.6 have an opinion, good or bad, on whether the upgrade is stable? I have a problem with data collection on 7.5.0.5 showing "partially connected", and some of the collectors seem to have stopped working over time. I expect Support to tell me to upgrade OCA to 7.5.0.6, but in my experience they usually can't identify an etrack that references the problem I'm calling about, and there's no guarantee that the upgrade will actually fix OCA.

tom_sprouse · 07-26-2013

Ron,

The behavior you are experiencing was seen in 7.5.0.5, and I have had several customers upgrade to 7.5.0.6.

I would recommend 7.5.0.6; it includes multiple improvements. Here are the relevant Technotes / Etracks:

(ET3103018) <<Fixed in 7.5.0.6>> OpsCenter NetBackup Data Collection appears to hang and Java.exe process consumes allocated memory
http://www.symantec.com/docs/TECH205951

(ET2898214) <<Fixed in 7.5.0.6>> OpsCenter causing high CPU utilization
http://www.symantec.com/docs/TECH204943

(ET3106719) <<Fixed in 7.5.0.6>> <<Fix Downloadable>> After upgrading a master server to NetBackup 7.5.0.5 (or applying NetBackup 5200/5220 Appliances 2.5.2 to an Appliance running as a master server), the Activity Monitor reports inconsistent information on job status for some jobs. Additionally, some jobs may not display in OpsCenter.
http://www.symantec.com/docs/TECH203521

--Tom

RonCaplinger · 07-26-2013

Awesome, thanks, Tom!

chirag_bhalodiy · 07-28-2013

Hi Ron,

Which collectors are failing in your case?

Can you tell me the collector names and their last exception messages?

Thanks in advance for your help,

Chirag.

RonCaplinger · 07-29-2013

I just came in this morning, and data collection again shows "partially connected".

Data Type "Hold": error message "ErrorCode - 25 : cannot connect on socket ()". Reports "Failed" status.

Data Type "Index": error message "com.symantec.nbu.nom.scl.common.agent.exception.AgentException: IDL:Symantec/NetBackup/SL/NBSLOpException:1.0". Reports "Failed" status.

Data Type "SubJobs": no error message; last successful load and last run date both show "not reported", and status shows "not started".

Data Type "Storage Unit" and Data Type "Storage Unit Group" show last successful load and last run date of July 5.

Data Type "Images" also shows last successful load and last run date of July 5, but it also shows "Running", and has done so since July 5.

Data Type "Media Server" and Data Type "Service" show last successful load and last run date of July 23.

Data Type "Policy & Schedule" and Data Type "Host Properties" both show last successful load and last run date of July 26.

All other Data Types show today, July 29, as their last successful load and last run date.

RonCaplinger · 07-29-2013

I upgraded to version 7.5.0.6 a few moments ago and will watch for further problems with data collection. I have noticed that the reports render quite a bit faster than they had been. At least one of the "features" that is "working as designed" is still reporting the wrong information, though.

RonCaplinger · 07-30-2013

Data collection was showing "partially connected" again this morning. The "Hold" and "Index" data types are both still showing the messages mentioned above.

I just opened a case with Support.

MilanPalian-A · 09-04-2013

I had all three of the issues mentioned above and upgraded to 7.5.0.6. The upgrade was smooth and the product is improved; however, we still have "partially connected" problems with appliances, and the "high CPU" issue, which actually maxes out 6 cores permanently. For now, Support is unable to solve the problem. There seems to be an issue with Alerts and/or Appliances that causes the database to max out the CPUs.

Ramazan_Cakir · 09-17-2013

I still have data collection issues with 7.5.0.6, and it is getting worse. You can restart nbsl, OpsCenter, and data collection and hope that it somehow collects, but I cannot actually rely on the data it reports.

It is annoying that data collection shows 'Completed' when it is not. I opened a case about three months ago and there is still no solution; I think all the OpsCenter developers have been fired because of the 'unquality' of this product.

It is also annoying that Support wants to force me to apply some EEB to NetBackup 7.0.1 to solve an nbsl issue when they don't know what it will fix. Sending logs that fill up the system, with much pain, and never getting an answer about the analysis is the normal behaviour; and whenever the Support engineer changes, the first thing they ask is: please set the verbose level to 6!

I think Symantec hires somebody, puts a 'Checklist' in front of them, and they have to repeat the same damn phrases on every call, without any knowledge of the customer and their history.

So if Symantec says:

Symantec helps consumers and organizations secure and manage their information-driven world.

I would expect that they can manage their organization, but probably not with OpsCenter!

So I'm waiting for a solution that needs yet more solutions...

fozz · 09-17-2013

One of our largest OpsCenter customers recently upgraded to 7.5.0.6 and all of their data collection issues went away.

Some of the things they had to do were tuning nbsl and the server.conf to perform better. They are a very large environment (almost 100K jobs per day), so they have 96GB of RAM on their OCA box, with half of it dedicated to the database cache.

So the key in this case was tuning nbsl on the master servers and the server.conf settings. If you haven't already done that, I would recommend working with Support on tuning those items.

Ramazan_Cakir · 09-18-2013

Thanks for this information.

Support did not tell me to tune nbsl, so I would be interested in that (no EEB, please). Do you mean server.conf in OpsCenter? (I have already tuned that with Support.)

Thanks

MilanPalian-A · 09-18-2013

There are quite a lot of possible tuning entries in scl.conf in the form "nbu.scl...", but you need Support's help with them, as they are not published.

Ramazan_Cakir · 09-18-2013

I have also checked with Symantec, but so far I haven't heard how to configure nbsl on the NetBackup side.

sgt_why · 11-04-2013

Concerning the collection failure issues: IMHO, upgrading OpsCenter doesn't resolve the actual cause of the problems.

Sometime during the life of 7.x they removed the "collection agents"; collection is now part of the binaries on the master servers.

In the process, they also hard-coded the checks for CLOUD collections, i.e. they cannot be removed and will fail repeatedly until the master servers themselves are upgraded to 7.5.0.6, or upgraded to 7.5.0.5 with the EEB applied.

Upgrading the OpsCenter server from 7.5.0.3 to 7.5.0.6 (in our case) made no difference. Collections would still halt and not restart until we bounced the OpsCenter daemons.

So we retired the original Linux VM and did a fresh install of 7.5.0.6 under Windows. I think it runs much better on a Windows platform. The cloud collections still fail, but that no longer causes all collections to halt: just the cloud collections fail, and it moves on.

But we have yet to get approval to upgrade our master servers from 7.5.0.3 to 7.5.0.6, which is supposed to permanently fix the collector issues.

sgt_why · 11-04-2013

And regarding the maxed-out CPU behavior: check your server for (zombie) SQL queries. If you get lots of hung SQL jobs, they can really slow things down and muck things up. If so, try rebooting the OpsCenter server and re-running a query to see if it improves.

Also make sure you increase the memory used by OpsCenter; don't leave the defaults. Even in our environment with only two master servers, we increased it to start up with 4GB of RAM (8GB max), and that made a big improvement in performance.

MilanPalian-A · 11-05-2013

In our case, 7.5.0.6 seems to drop data collection after the daily purge. It still says "connected", but it is not collecting. At that point the services need to be bounced. It would be interesting to know whether anyone else is seeing this.

MilanPalian-A · 11-05-2013

Yes, we have this issue: a few hanging SQL jobs each block a CPU core at 100%. We upgraded the hardware so that the offending jobs no longer affect other operations until a solution is found. Since there are never many of these at the same time, having enough CPU cores helps.


Opinion of OpsCenter Analytics 7.5.0.6