Has anyone who upgraded to OCA 126.96.36.199 formed an opinion, good or bad, on whether the upgrade is stable? I've got a problem with data collection on 188.8.131.52 showing "partially connected", and some of the collectors seem to have stopped working over time. I'm expecting support to tell me to upgrade OCA to 184.108.40.206, but previous experience has shown they usually can't identify an Etrack that references the problem I'm calling about, and there's no guarantee that the effort to upgrade will actually fix OCA.
The behavior you are experiencing was seen in 220.127.116.11, and I have had several customers upgrade to 7506.
I would recommend 18.104.22.168 - there are multiple improvements... and here are Technotes / Etracks for it...
I just came in this morning and data collection again shows "partially connected".
Data Type "Hold" is reporting "Failed" status with the error message "ErrorCode - 25 : cannot connect on socket ()".
Data Type "Index" is reporting "Failed" status with the error message "com.symantec.nbu.nom.scl.common.agent.exception.AgentException: IDL:Symantec/NetBackup/SL/NBSLOpException:1.0".
Data Type "SubJobs" has no error message; it is simply not reported and not started.
Data Type "Storage Unit" and Data Type "Storage Unit Group" show last successful load and last run date of July 5.
Data Type "Images" also shows last successful load and last run date of July 5, BUT it also shows "Running" and has since July 5.
Data Type "Media Server" and Data Type "Service" show last successful load and last run date of July 23.
Data Type "Policy & Schedule" and Data Type "Host Properties" both show last successful load and last run date of July 26.
All other Data Types show today, July 29, as their last successful load and last run date.
Upgraded to version 22.214.171.124 a few moments ago. Will watch for further problems with data collection. I have noticed that the reports render quite a bit faster than they had been. At least one of the "features" that is "working as designed" is still reporting the wrong information, though.
I had all three of the issues mentioned above and upgraded to 126.96.36.199. The upgrade is smooth and the product is improved; however, we still have "partially connected" problems with appliances and the "high CPU" issue, which actually maxes out 6 cores permanently ... for now, support is unable to solve the problem. There seems to be an issue with Alerts and/or Appliances that causes the database to max out CPUs.
I still have data collection issues with 188.8.131.52, and it is getting worse. You can restart nbsl, OpsCenter, and data collection and hope that it somehow collects; all I can say is that I cannot rely on the data it is actually reporting.
It is annoying that data collection shows 'Completed' when it is not. I have had a case open for roughly 3 months and a solution is still not there; I think all the OpsCenter developers have been fired because of the 'unquality' of this product.
It is also annoying that Support wants to force some EEB into NetBackup 7.0.1 to solve an nbsl issue when they don't know what it will fix. Sending logs, which fills up the system with much pain, and never getting an answer about the analysis is the normal behaviour; and if the Support guy changes, the first thing they ask is -- please set verbose level to 6!
I think Symantec hires somebody, puts a 'Checklist' in front of them, and has them go through the same damn phrases on every call, without any knowledge of the customer and his history.
So if Symantec says :
Symantec helps consumers and organizations secure and manage their information-driven world.
I would expect that they can manage their own organization; but apparently not with OpsCenter!
So, waiting for a solution that itself needs a solution ......
One of our largest OpsCenter customers recently upgraded to 184.108.40.206 and all of their data collection issues went away.
Some of the things they had to do were tuning nbsl and also the server.conf so it would perform better. They are a very large environment (almost 100K jobs per day), so they have 96GB of RAM on their OCA box with half of it dedicated to database cache.
So the key in this case was tuning nbsl on the master servers and the server.conf settings. If you haven't already done that, I would recommend you work with support on tuning those items.
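For orientation, OpsCenter's Sybase SQL Anywhere database reads its startup switches from server.conf, and the cache tuning described above usually translates into the cache switches. This is a sketch only: the switches below are standard SQL Anywhere options, but which ones your release honors, and the right values for your environment, should come from support; the 48G figure merely mirrors "half of 96GB" above and is not a recommendation.

```
-c  48G   # initial database cache (illustrating "half of 96GB of RAM")
-ch 48G   # upper limit the cache is allowed to grow to
```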
Thanks for this information.
Support did not tell me to tune nbsl, so I would be interested in that (no EEB please). Do you mean server.conf in OpsCenter? (I have already tuned that with support.)
Concerning the collection failure issues ... imho upgrading OpsCenter doesn't resolve the actual cause of the problems.
Sometime in the life of 7.x they removed the "collection agents"; collection is part of the binary on the master servers now.
In the process, they also hard-coded the checks for CLOUD collections; i.e., the cloud collection cannot be removed and will fail repeatedly until the master servers themselves are upgraded to 220.127.116.11, or upgraded to 18.104.22.168 with an EEB applied.
Upgrading the OpsCenter server from 22.214.171.124 to 126.96.36.199 (in our case) made no difference. Collections would still halt and not restart until we bounced Ops Center daemons.
So, we retired the original Linux VM and created a fresh install of 188.8.131.52 under Windows ... I think it runs much better on a Windows platform. Now, the cloud collections still fail, but they no longer cause all collections to halt; just the cloud collections fail ... and it moves on.
But we have yet to get approval to upgrade our master servers from 184.108.40.206 to 220.127.116.11 ... which is supposed to permanently fix the collector issues.
And RE: the maxed-out CPUs behavior ... check your server for (zombie) SQL queries. If you get lots of hung SQL jobs, it can really slow down and muck things up. If so, try a reboot of the Ops Center server and re-run a query to see if it improves.
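One quick way to spot those hung queries from the shell is to filter processes by elapsed run time. A sketch only: the "dbsrv" process name in the usage line is an assumption, so substitute whatever your OpsCenter database server binary is actually called.

```shell
#!/bin/sh
# Sketch: flag long-running OpsCenter database processes so hung SQL
# jobs stand out. Reads "PID ELAPSED COMMAND..." lines where ELAPSED is
# in ps etime format ([[dd-]hh:]mm:ss) and prints only entries that
# have been running for an hour or more (a dash means days; two colons
# mean hh:mm:ss).
flag_long_running() {
  awk '$2 ~ /-/ || $2 ~ /^[0-9]+:[0-9]+:[0-9]+$/'
}

# Example usage (uncomment to run against live processes; "dbsrv" is an
# assumed process name -- adjust to your database server binary):
# ps -eo pid,etime,args | grep '[d]bsrv' | flag_long_running
```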
And make sure you raise the memory used by Ops Center as well ... don't leave the defaults. Even in our environment with (2) master servers, we increased it to start up using 4GB of RAM (with an 8GB max), and that made a big improvement in performance.
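For what it's worth, the 4GB startup / 8GB maximum figures map onto the standard JVM heap switches; exactly where OpsCenter picks these up (a service wrapper or a startup script) varies by release and platform, so treat this as an illustration of the values, not of the file to edit:

```
-Xms4g   # initial Java heap: start up using 4 GB
-Xmx8g   # maximum Java heap: allow growth up to 8 GB
```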
In our case, 18.104.22.168 seems to drop data collection after the daily purge. It still says "connected", but is not collecting. This is when services need to be bumped. It would be interesting to know whether anyone else is seeing this.
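For anyone scripting that "bump the services" step: a minimal sketch, assuming a default Linux install where opsadmin.sh in the OpsCenter bin directory stops and starts the services; adjust the path and the delay for your platform.

```shell
#!/bin/sh
# Sketch of the "bump the services" workaround after the daily purge.
# Assumes opsadmin.sh in the OpsCenter bin directory handles stop/start;
# the default Linux location is only assumed in the example below.

bounce_opscenter() {
  # $1 = OpsCenter bin directory, $2 = seconds to wait between stop and start
  "$1/opsadmin.sh" stop && sleep "${2:-30}" && "$1/opsadmin.sh" start
}

# Example (assumed default install location):
# bounce_opscenter /opt/SYMCOpsCenterServer/bin
```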
Yes, we have this issue: a few hanging SQL jobs each block a CPU core at 100%. We upgraded the hardware, so until a solution is found the offending jobs no longer affect other operations. As there are never many of these at the same time, having enough CPU cores helps.