08-15-2013 09:03 PM
Hi all,
I'm experiencing the same issue as https://www-secure.symantec.com/connect/forums/heavy-load-emm-db-high-cpu-utilization... with NBU 7.1.0.4 on an HP-UX master server: a lot of jobs start but then go to the queued state, showing "Waiting in NetBackup scheduler work queue on server ...". I noticed that NB_dbsrv is consuming 100% of CPU. I do not have a big EMM_DATA.db file, as you can see below:
ebrbsnp05 >> find /nbdb_catalog -name EMM_DATA.db
/nbdb_catalog/data/EMM_DATA.db
ebrbsnp05 >> ls -l /nbdb_catalog/data/EMM_DATA.db
-rw------- 1 root sys 51531776 Aug 15 12:26 /nbdb_catalog/data/EMM_DATA.db
ebrbsnp05 >> du -k /nbdb_catalog/data/EMM_DATA.db
50324 /nbdb_catalog/data/EMM_DATA.db
ebrbsnp05 >>
But I have 100 active jobs and 140 queued, and all of the queued jobs are "Waiting in NetBackup scheduler work queue on server <master server name>".
I have opened a case with Symantec, but it looks like they do not have a clear understanding of the issue.
Have you finally got a solution for this case? How does the nbdb rebuild/reorganize that mph999 suggested above work? Did this help? Have you tried this?
Let me mention that I tried to perform this rebuild/reorganize yesterday, but I got an error while trying to take a backup of the NBDB before doing the first rebuild (I wanted a backup of the NBDB just to be safe). See below what I got.
I'm receiving the error "Segmentation fault (core dumped)" when trying to perform a backup of the NBDB. Could you please give us a hand?
Check below please…
ebrbsnp05 >> /usr/openv/netbackup/bin/nbdbms_start_stop start
ebrbsnp05 >> ../bpps -x
NB Processes
------------
root 11363 1 0 11:22:32 ? 0:00 /usr/openv/db//bin/NB_dbsrv @/usr/openv/var/global/server.conf @/usr/openv/var/global/databases.conf -hn 7
MM Processes
------------
Shared Symantec Processes
-------------------------
root 11324 1 0 11:22:27 ? 0:00 /opt/VRTSpbx/bin/pbx_exchange
ebrbsnp05 >> /usr/openv/db/bin/nbdb_ping
Database [NBDB] is alive and well on server [NB_ebrvmsnp01].
ebrbsnp05 >> mkdir /nbdb_catalog/backup
ebrbsnp05 >> /usr/openv/db/bin/nbdb_backup -dbn NBDB -online /nbdb_catalog/backup/backup1
Segmentation fault (core dumped)
ebrbsnp05 >>
Symantec is looking into some logs now... could anybody please help? Thanks in advance...
Let me add that I have also rebooted the master server yesterday, but the issue is still there... the reboot did not help. :(
Seba.
08-16-2013 12:25 PM
The crash of the nbdb_backup is a bit disturbing. A segmentation violation is due to some sort of memory allocation/deallocation or reference error. Check the system error logs to see what it is reporting for that.
The NB_dbsrv process performs SQL actions against all of the NBU databases (NBDB, EMMDB, BMRDB). If it is consuming 100% of the CPU, then a query action is looping rather badly. This can cause lock-out conditions against other database connections. It may be the result of some data corruption internal to the DB. Check the server.log file to see if it shows any sort of DB access errors or problems; that is always a good place to start. The NB_dbsrv process logs its activity to the file noted in its 'server.conf' file. The location is specified by the "-o " option. The default location is "/usr/openv/db/log/server.log".
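For a quick first pass over that log, something like the following could help. It is only a sketch: the function name scan_server_log and the search pattern are my own assumptions, not an official list of messages, and the default path is the one mentioned above.

```shell
# Sketch: list the most recent suspicious lines from the NB_dbsrv server log.
# The default path is the one quoted in this post; the search pattern is an
# assumption and should be adjusted to whatever your log actually contains.
scan_server_log() {
    log_file="${1:-/usr/openv/db/log/server.log}"
    pattern="${2:-error|assert|fatal|corrupt}"
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 1
    fi
    # -i: ignore case, -n: prefix line numbers, -E: extended regex
    grep -inE "$pattern" "$log_file" | tail -n 20
}
```

Calling scan_server_log with no arguments reads the default log; a different file or pattern can be passed as the first and second argument.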
To make a copy of the DB files without the use of the nbdb_backup command:
1. Stop all NBU services: ../netbackup/bin/rc.kill_all
This will quiesce the databases and all connections to them.
2. Manually copy the contents of the directory to another location. Based on your initial input:
cd /nbdb_catalog/data
cp * /nbdb_catalog/backup/backup1
Make any changes you want to the "server.conf" file at this time as well. If something goes awry, you can revert by stopping all services and copying the files back to their original location.
3. Restart services: ./netbackup/bin/rc.start_all
4. Perform the nbdb rebuild/reorganize noted previously.
5. The NB_dbsrv is multi-threaded and by default can have up to 20 open connections. The "-gn ##" modifies this. Larger values will help under specific loads. However, stay below 40.
6. The "-ch ###" sets the high water mark of cached shared memory for the process. I consider the value "-ch 3G" to be a bit too aggressive as it can tie up overall system resources. The information submitted does not indicate memory resources of the server. I try to limit this to "-ch 1G" under most circumstances. To see if NB_dbsrv is actively allocating more shared memory, look in the '/usr/openv/db/log/server.log' file.
Let's see how things go after making the changes.
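As a rough illustration of the copy in step 2 and of checking the current "-gn" and "-ch" values, here is a small sketch. The function names copy_db_files and show_db_option are hypothetical helpers of mine, and the second one assumes server.conf holds its options separated by whitespace. Run the copy only while all NBU services are stopped.

```shell
# Sketch: copy every file in the NBDB data directory and verify each copy.
# Only meaningful while all NetBackup services are stopped (step 1).
copy_db_files() {
    src_dir="$1"
    dst_dir="$2"
    [ -d "$src_dir" ] || { echo "no such directory: $src_dir" >&2; return 1; }
    mkdir -p "$dst_dir" || return 1
    for f in "$src_dir"/*; do
        [ -f "$f" ] || continue            # skip subdirectories
        cp -p "$f" "$dst_dir"/ || return 1
        # cmp -s exits non-zero if the copy differs from the original
        cmp -s "$f" "$dst_dir/$(basename "$f")" || {
            echo "copy mismatch: $f" >&2
            return 1
        }
    done
}

# Sketch: print the value that follows an option such as -gn or -ch.
# Assumes options in server.conf are separated by spaces or tabs.
show_db_option() {
    conf_file="$1"
    opt="$2"
    # Put one token per line, then print the token right after the option
    tr -s ' \t' '\n\n' < "$conf_file" |
        awk -v o="$opt" 'prev == o { print; exit } { prev = $0 }'
}
```

With the paths from this thread, that would be "copy_db_files /nbdb_catalog/data /nbdb_catalog/backup/backup1" and "show_db_option /usr/openv/var/global/server.conf -ch".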
08-15-2013 10:57 AM
@Seba
A year ago we moved from an HP-UX master server to an AIX system because of memory, CPU and many CORBA problems.
I suggest you do the same. Move to another platform; Linux is a good choice.
08-16-2013 05:29 AM
Changing from HPUX to AIX or Linux is not an option for me as I work on HP :)
......
@mph999 I've already modified some files, on Symantec's advice, and the issue went away for almost 3 weeks... but last Monday the issue came back, and today I have 300 queued jobs and 150 active. I'm already using USE_HASH=1 as you mentioned.
The changes I did 3 weeks ago, when the issue was "fixed" (for 3 weeks) are these:
My friends, I'll continue on the new post --> https://www-secure.symantec.com/connect/forums/nbd...
Please continue on this new post.
Thanks in advance!
Seba.
08-16-2013 04:26 PM
The error I got when I tried to back up the NBDB was due to bad command syntax :) I should not specify a file name, just the directory in which to save the backup. For instance:
/usr/openv/db/bin/nbdb_backup -dbn NBDB -online /nbdb_catalog/backup/
This one worked perfectly :)
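To avoid repeating that mistake, a small guard like the one below could check the destination first. This is just a sketch: backup_nbdb and the NBDB_BACKUP_CMD override are hypothetical names of mine (the override exists only to allow a dry run), and the nbdb_backup arguments are the ones from the command that worked above.

```shell
# Sketch: refuse to call nbdb_backup unless the destination is a directory.
# NBDB_BACKUP_CMD is a hypothetical override, useful only for dry runs.
backup_nbdb() {
    dest_dir="$1"
    cmd="${NBDB_BACKUP_CMD:-/usr/openv/db/bin/nbdb_backup}"
    if [ ! -d "$dest_dir" ]; then
        echo "destination must be an existing directory: $dest_dir" >&2
        return 2
    fi
    # Same arguments as the command that worked in this post
    $cmd -dbn NBDB -online "$dest_dir"
}
```

For example, "backup_nbdb /nbdb_catalog/backup/" runs the backup, while passing a path that is not an existing directory, such as a file name, stops before nbdb_backup is called.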
I'm still having the issue. Symantec suggested some more tuning, but the issue is still there. I did the rebuild and reorganize of the NBDB, changed the server.conf file and installed the following EEB patch (at Symantec's suggestion), but the issue has not gone away :(
08-16-2013 05:48 PM
Seba:
I do not think I said anything about lack of resources for this problem. I tried to explain that an NB_dbsrv connection thread appears to be just spinning inside a database. Understand that NB_dbsrv handles connections from the NBU processes to all of the databases. That would be NBDB, EMMDB, and (if configured) BMRDB. NB_dbsrv is multi-threaded and as such will have multiple threads running within its process scope. The "spinning" thread, if one exists as I suspect, will be operating strictly in memory, using shared memory space, so nothing will be stopping it from taking over a CPU time slice. For me that is the best guess.
Did you look at the file "/usr/openv/db/log/server.log " to see what is being written to it? As I said before, the information may be valuable for this.
Also, when do you see the CPU hit the 100% load value? Is it that way after stopping and then starting NBU processes?
08-16-2013 06:08 PM
Hi Jaime,
I do not see anything wrong in this log. I cut the latest 100 lines (I currently have 103 queued jobs and 98 active).
On the other hand we do not have BMRDB as we do not use Bare Metal Restore option... most of our backups are just "standard" "windows" or "sap" or "oracle"...
08-26-2013 12:20 AM
Hi,