NB_dbsrv consuming 100% of CPU
Hi all,
I'm experiencing the same issue as https://www-secure.symantec.com/connect/forums/heavy-load-emm-db-high-cpu-utilization... with NBU 7.1.0.4 on an HP-UX master server: a lot of jobs start but go into a queued state showing "Waiting in NetBackup scheduler work queue on server ...". I noticed that NB_dbsrv is consuming 100% of the CPU. My EMM_DATA.db file is not big, as you can see below:
ebrbsnp05 >> find /nbdb_catalog -name EMM_DATA.db
/nbdb_catalog/data/EMM_DATA.db
ebrbsnp05 >> ls -l /nbdb_catalog/data/EMM_DATA.db
-rw------- 1 root sys 51531776 Aug 15 12:26 /nbdb_catalog/data/EMM_DATA.db
ebrbsnp05 >> du -k /nbdb_catalog/data/EMM_DATA.db
50324 /nbdb_catalog/data/EMM_DATA.db
ebrbsnp05 >>
But I have 100 active jobs and 140 queued, and all of the queued jobs show "Waiting in NetBackup scheduler work queue on server <master server name>".
I have opened a case with Symantec, but they do not seem to have a clear understanding of the issue.
Did you ever find a solution for this case? How does the nbdb rebuild/reorganize that mph999 suggested above work? Have you tried it, and did it help?
Let me mention that I tried to perform this rebuild/reorganize yesterday, but I got an error while taking a backup of the NBDB before doing the first rebuild (I wanted a backup of the NBDB just to be safe). See below what I got.
I'm receiving the error "Segmentation fault (core dumped)" when trying to perform a backup of the NBDB. Could you please give us a hand?
Please check below...
ebrbsnp05 >> /usr/openv/netbackup/bin/nbdbms_start_stop start
ebrbsnp05 >> ../bpps -x
NB Processes
------------
root 11363 1 0 11:22:32 ? 0:00 /usr/openv/db//bin/NB_dbsrv @/usr/openv/var/global/server.conf @/usr/openv/var/global/databases.conf -hn 7
MM Processes
------------
Shared Symantec Processes
-------------------------
root 11324 1 0 11:22:27 ? 0:00 /opt/VRTSpbx/bin/pbx_exchange
ebrbsnp05 >> /usr/openv/db/bin/nbdb_ping
Database [NBDB] is alive and well on server [NB_ebrvmsnp01].
ebrbsnp05 >> mkdir /nbdb_catalog/backup
ebrbsnp05 >> /usr/openv/db/bin/nbdb_backup -dbn NBDB -online /nbdb_catalog/backup/backup1
Segmentation fault (core dumped)
ebrbsnp05 >>
Symantec is looking into some logs now... could anybody please help? Thanks in advance...
Let me add that I also rebooted the master server yesterday, but the issue is still there; the reboot did not help. :(
Seba.
The crash of nbdb_backup is a bit disturbing. A segmentation fault is caused by some sort of memory allocation/deallocation or memory reference error. Check the system error logs to see what they report about it.
The NB_dbsrv process performs SQL actions against all of the NBU databases (NBDB, EMMDB, BMRDB). If it is consuming 100% of the CPU, then a query is looping rather badly, which can cause lock-out conditions for other database connections. This may be the result of some data corruption internal to the DB. Check the server.log file to see whether it shows any DB access errors or problems; it is always a good place to start. The NB_dbsrv process logs its activity to the file named in its 'server.conf' file, specified by the "-o" option. The default location is "/usr/openv/db/log/server.log".
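A minimal sketch for finding and scanning that log, assuming the options in server.conf are whitespace-separated (as on the NB_dbsrv command line shown earlier) and the path contains no spaces. The function names and the keyword list are hypothetical; the keywords are only a starting point, not the exact message text NB_dbsrv writes.

```shell
# Hypothetical helper: print the log path named by "-o" in server.conf,
# falling back to the default location mentioned above.
# ASSUMPTION: whitespace-separated options, path without spaces.
dbsrv_log_path() {
    conf=${1:-/usr/openv/var/global/server.conf}   # path from the NB_dbsrv command line
    path=""
    if [ -r "$conf" ]; then
        # take the token after the first "-o" and strip any double quotes
        path=$(awk '{ for (i = 1; i < NF; i++) if ($i == "-o") { print $(i + 1); exit } }' "$conf" | tr -d '"')
    fi
    echo "${path:-/usr/openv/db/log/server.log}"
}

# Hypothetical helper: print numbered log lines that look like DB access problems.
# ASSUMPTION: the keyword list is a guess; adjust it after reading your own log.
dbsrv_log_errors() {
    log=$(dbsrv_log_path "$1")
    grep -n -i -E 'error|assert|fatal|deadlock' "$log"
}
```

On the master server, `dbsrv_log_errors` with no argument reads the default server.conf; pass another file to check a copy.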
To make a copy of the DB files without the use of the nbdb_backup command:
1. Stop all NBU services: ../netbackup/bin/rc.kill_all
This will quiesce the databases and all connections to them.
2. Manually copy the contents of the directory to another location. Based on your initial input:
mkdir -p /nbdb_catalog/backup/backup1
cd /nbdb_catalog/data
cp * /nbdb_catalog/backup/backup1
Make any changes you want to the "server.conf" file at this time as well. If something goes awry, you can revert by stopping all services and copying the files back to their original location.
3. Restart services: ./netbackup/bin/rc.start_all
4. Perform the nbdb rebuild/reorganize noted previously.
5. NB_dbsrv is multi-threaded and by default can have up to 20 open connections. The "-gn ##" option modifies this. Larger values will help under specific loads; however, stay below 40.
6. The "-ch ###" option sets the high-water mark of cached shared memory for the process. I consider the value "-ch 3G" to be a bit too aggressive, as it can tie up overall system resources, and the information submitted does not indicate the server's memory resources. I try to limit this to "-ch 1G" under most circumstances. To see whether NB_dbsrv is actively allocating more shared memory, look in the '/usr/openv/db/log/server.log' file.
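Steps 1 to 3 can be sketched as a single function. This is a minimal sketch, assuming it runs as root on the master server; the function name is hypothetical, and the stop/start commands are passed in because the steps above give them only as relative paths.

```shell
# Hypothetical helper sketching steps 1-3: stop NetBackup, copy the database
# files, then restart. ASSUMPTIONS: run as root on the master server; the
# stop/start commands are supplied by the caller.
copy_nbdb() {
    src=$1         # e.g. /nbdb_catalog/data
    dest=$2        # e.g. /nbdb_catalog/backup/backup1
    stop_cmd=$3    # e.g. the rc.kill_all script from step 1
    start_cmd=$4   # e.g. the rc.start_all script from step 3

    # Step 1: quiesce the databases. Left unquoted on purpose so a command
    # with arguments is split into words.
    $stop_cmd || return 1

    # cp needs the target directory to exist when copying several files.
    mkdir -p "$dest" || return 1

    # Step 2: -p keeps modes and timestamps (and ownership when run as root).
    if ! cp -p "$src"/* "$dest"/; then
        echo "copy failed; services left stopped for inspection" >&2
        return 1
    fi

    # Step 3: restart services.
    $start_cmd
}
```

If you want to edit server.conf before restarting, pass `true` as the start command and run step 3 by hand afterwards. Restoring is the same copy in the opposite direction while services are stopped.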
Let's see how things go after making the changes.
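Before and after changing items 5 and 6, it may help to confirm what server.conf actually contains. A minimal sketch, assuming the options are whitespace-separated as on the NB_dbsrv command line; both function names are hypothetical, and the 40 ceiling is the one suggested in item 5.

```shell
# Hypothetical check for items 5 and 6: report the -gn and -ch values in
# server.conf and warn when -gn is 40 or more.
# ASSUMPTION: options are whitespace-separated, as on the NB_dbsrv command line.

# Print the token that follows option $1 in file $2 (nothing if absent).
conf_opt() {
    awk -v opt="$1" '{ for (i = 1; i < NF; i++) if ($i == opt) { print $(i + 1); exit } }' "$2"
}

check_dbsrv_tuning() {
    conf=${1:-/usr/openv/var/global/server.conf}
    gn=$(conf_opt -gn "$conf")
    ch=$(conf_opt -ch "$conf")
    printf '%s %s\n' -gn "${gn:-unset (default 20 connections)}"
    printf '%s %s\n' -ch "${ch:-unset}"
    case $gn in
        ''|*[!0-9]*) ;;  # unset or not a plain integer: nothing to compare
        *) if [ "$gn" -ge 40 ]; then
               printf 'WARNING: -gn %s is not below 40\n' "$gn" >&2
           fi ;;
    esac
    return 0
}
```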