02-26-2013 07:01 AM
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
12426 root      18   0 6647m 463m 7344 S 20.5  1.4 304:27.37 NB_dbsrv
13148 root      16   0  572m 174m  27m S  5.9  0.5  81:56.29 nbemm
In the nbemm log I have a huge number of entries like the ones below:
0,51216,111,111,1894847,1361811878412,13148,1089464640,0:,65:Preallocated <5> elements in curViewSeq for <c-sapbwp_1361345386>,22:ImageObject::fetchView,1
0,51216,111,111,1894848,1361811878413,13148,1089464640,0:,12:retval - <0>,36:ImageCatalogImpl::getImageWithCopies,1
1,51216,111,111,1894849,1361811878413,13148,1106905408,0:,0:,37:ImageCatalogImpl::getImageCopyDetails,1,(1061|A65:HOST=<tygrys.xxxx.xx> VER=<700000> APP=<nbstserv> PID=<13415> |)
1,51216,111,111,1894850,1361811878421,13148,1106905408,0:,0:,37:ImageCatalogImpl::getImageCopyDetails,1,(1062|u32:2020006|)
1,51216,111,111,1894851,1361811878422,13148,47815214316672,0:,0:,37:ImageCatalogImpl::getImageCopyDetails,1,(1061|A65:HOST=<tygrys.xxxx.xx> VER=<700000> APP=<nbstserv> PID=<13415> |)
0,51216,111,111,1894852,1361811878428,13148,47815214316672,0:,12:retval - <0>,37:ImageCatalogImpl::getImageCopyDetails,1
1,51216,111,111,1894853,1361811878429,13148,1089464640,0:,0:,41:ImageCatalogImpl::getTotalSizeAndMediaIds,1,(1061|A65:HOST=<tygrys.xxxx.xx> VER=<700000> APP=<nbstserv> PID=<13415> |)
0,51216,111,111,1894854,1361811878429,13148,1089464640,0:,35:<Need to send unique media sequnce>,41:ImageCatalogImpl::getTotalSizeAndMediaIds,1
0,51216,111,111,1894855,1361811878435,13148,1089464640,0:,12:retval - <0>,41:ImageCatalogImpl::getTotalSizeAndMediaIds,1
1,51216,111,111,1894856,1361811878517,13148,1106905408,0:,0:,36:ImageCatalogImpl::getImageWithCopies,1,(1061|A65:HOST=<tygrys.xxxx.xx> VER=<700000> APP=<nbstserv> PID=<13415> |)
0,51216,111,111,1894857,1361811878524,13148,1106905408,0:,68:Preallocated <5> elements in curViewSeq for <ichmura-025_1361345428>,22:ImageObject::fetchView,1
0,51216,111,111,1894858,1361811878525,13148,1106905408,0:,12:retval - <0>,36:ImageCatalogImpl::getImageWithCopies,1
1,51216,111,111,1894859,1361811878525,13148,47815214316672,0:,0:,37:ImageCatalogImpl::getImageCopyDetails,1,(1061|A65:HOST=<tygrys.xxxx.xx> VER=<700000> APP=<nbstserv>
02-26-2013 07:36 AM
Try running /usr/openv/db/bin/dbadm (default password nbusql) and select 2) Database Space and Memory Management and select 4) Adjust Memory Settings, 3) Large.
A restart of NetBackup is required (yes, I know, you will need to schedule it).
But you may also have connection-oriented errors as well.
02-26-2013 01:34 PM
Thanks for the response.
I have cache size as follows:
(Setting)  (Initial)  (Minimum)  (Maximum)
Current    500M       500M       6000M
Small      25M        25M        500M
Medium     200M       200M       750M
Large      500M       500M       1G
The cache size is not the problem. With the same settings NBU worked well on the primary site. They were recommended by support a few months ago.
How can I diagnose what type of activity is causing this strange behavior? Sometimes it is impossible to run the vxlogview command. It returns:
]~# /usr/openv/netbackup/bin/vxlogview -o 111 -t 00:10:00 > /tmp/111.txt
/var/log/VRTSpbx/50936-103-8323328-130226-0000000201.log does not exists
/var/log/VRTSpbx/50936-103-8323328-130226-0000000201.log does not exists
/var/log/VRTSpbx/50936-103-8323328-130226-0000000201.log does not exists
/var/log/VRTSpbx/50936-103-8323328-130226-0000000201.log does not exists
/var/log/VRTSpbx/50936-103-8323328-130226-0000000201.log does not exists
/var/log/VRTSpbx/50936-103-8323328-130226-0000000201.log does not exists
/var/log/VRTSpbx/50936-103-8323328-130226-0000000201.log does not exists
/var/log/VRTSpbx/50936-103-8323328-130226-0000000201.log does not exists
/var/log/VRTSpbx/50936-103-8323328-130226-0000000201.log does not exists
/var/log/VRTSpbx/50936-103-8323328-130226-0000000201.log does not exists
/var/log/VRTSpbx/50936-103-8323328-130226-0000000201.log does not exists
/var/log/VRTSpbx/50936-103-8323328-130226-0000000201.log does not exists
/var/log/VRTSpbx/50936-103-8323328-130226-0000000201.log does not exists
I am looking for confirmation that it is safe to restart NBU, clear the cache, and restart PBX. I am afraid that database inconsistency could occur after this operation.
Regards
Madej
02-28-2013 11:50 AM
The problem looks serious, and even technical support has no idea what is going on.
NBU and PBX have been restarted and the cache has been cleared, but the symptoms are still the same.
When there are no running jobs, CPU utilization is 20-30%. But when jobs are running (~200 active and ~100 queued), CPU utilization is about 140%-180% (4 CPUs).
  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM     TIME+ COMMAND
12426 root      18   0 6859m 1.1g 8940 S 170.5  3.4   2225:35 NB_dbsrv
13148 root      15   0  640m 186m  29m S  56.8  0.6 684:40.26 nbemm
13337 root      15   0  422m 161m  16m S   9.5  0.5  41:16.39 nbjm
In addition, I have found that logs are not being created. On the primary site the logs were growing very quickly.
Support will analyze the NBSU logs. I hope that this helps.
Regards
Madej
02-28-2013 12:04 PM
cd /usr/openv/db/data
ls -l (post the output)
02-28-2013 12:16 PM
Just wondering whether the NetBackup logs folder has not failed over with the rest of the cluster, or has a write issue, so that NetBackup is desperately trying and failing to write its log files and this is causing a blockage.
Check everything on the node you are on, in case that node has a configuration file or similar that points the logs to a non-existent or write-protected area.
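One quick way to test this on the active node is to try creating a scratch file in each log directory. The following is only a minimal sketch: `check_log_dir` is a hypothetical helper (not a NetBackup tool), and the directories to pass in are whatever this node's configuration actually points to.

```shell
#!/bin/sh
# check_log_dir DIR
# Hypothetical helper: reports whether DIR exists and accepts new files.
check_log_dir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "MISSING $dir"
        return 1
    fi
    # Creating a scratch file also catches read-only mounts,
    # which a plain permission check can miss when running as root.
    if probe=$(mktemp "$dir/.write_probe.XXXXXX" 2>/dev/null); then
        rm -f "$probe"
        echo "OK $dir"
        return 0
    fi
    echo "NOT-WRITABLE $dir"
    return 1
}
```

For example, `check_log_dir /usr/openv/netbackup/logs` (assuming the default legacy log location) should print `OK` followed by the path if the area is present and writable.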
02-28-2013 03:01 PM
Yikes nasty one ...
I would add USE_HAS=1 into /usr/openv/var/global/nbemm.conf
(create this if not already there).
1. Stop NBU
2. Start just the DB - /usr/openv/db/bin/nbdbms_start_server
3. nbdb_unload -rebuild
4. nbdb_admin -reorganize
5. nbdb_unload -rebuild
Think of this as a 'defrag' on the NBDB.
(Not sure how long this will take. As an idea, I ran this a while back on a 15GB DB (the size of the SQL DB, not the image DB) and it took about 2.5 hours, but that was on a powerful machine with lots of memory. It runs in memory, so the more the better.)
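Steps 2 to 5 above could be wrapped in a small script so they run in order and stop at the first failure. This is a rough sketch only: the `run` and `rebuild_nbdb` names are hypothetical, the `/usr/openv/db/bin` location for all three commands is an assumption, and step 1 (stopping NBU) is deliberately left to be done by hand. By default it only prints the commands.

```shell
#!/bin/sh
# Assumed install location of the NBDB utilities; override if different.
DB_BIN=${DB_BIN:-/usr/openv/db/bin}

# run CMD [ARGS...]: print the command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# rebuild_nbdb: steps 2-5 from the list above, aborting on the first error.
# NetBackup must already be stopped (step 1) before calling this.
rebuild_nbdb() {
    run "$DB_BIN/nbdbms_start_server" || return 1
    run "$DB_BIN/nbdb_unload" -rebuild || return 1
    run "$DB_BIN/nbdb_admin" -reorganize || return 1
    run "$DB_BIN/nbdb_unload" -rebuild || return 1
}
```

Calling `rebuild_nbdb` as-is gives a dry run that lists the four commands; setting `DRY_RUN=0` first makes it execute them.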
Martin
02-28-2013 06:48 PM
Some quick questions come up. What is the size of NBDB? If you look at the contents of /usr/openv/db/data (or its appropriate link), does it require the cache values set in server.conf (-c/-cl/-ch)? Have you added -gn <value>? (I normally recommend setting this to 30 for initial testing.) Can you check how the system's semaphore values are set? The following command shows the current values:
sysctl -a | grep kernel.sem
Examine current settings, and if they are under the recommended values, you can increase them. This process is described in (http://www.symantec.com/docs/TECH203066). The other nice thing about changing semaphore values is that it can be done without a restart.
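The comparison against the recommended values can be scripted. Below is a small sketch, assuming the four `kernel.sem` fields are SEMMSL, SEMMNS, SEMOPM and SEMMNI in that order; `sem_below` is a hypothetical helper, and the recommended values must be taken from the tech note rather than from this example.

```shell
#!/bin/sh
# sem_below "CURRENT" "RECOMMENDED"
# Each argument holds four numbers: SEMMSL SEMMNS SEMOPM SEMMNI.
# Prints one line per field where the current value is lower.
sem_below() {
    echo "$1 $2" | awk '{
        split("SEMMSL SEMMNS SEMOPM SEMMNI", name, " ")
        for (i = 1; i <= 4; i++)
            if ($i + 0 < $(i + 4) + 0)
                print name[i] ": " $i " < " $(i + 4)
    }'
}
```

A typical call would be `sem_below "$(sysctl -n kernel.sem)" "<recommended values>"`; no output means every field already meets the recommendation.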
If you see that EMM_DATA.db is excessively large (multiple GB or more), then you will need to perform the rebuild/reorganize steps referenced by mph999. If you end up having to use rebuild/reorganize, you may want to run it multiple times. If a rebuild/reorg run doesn't complete processing, *.dbR files will be left over. If that happens, you will want to run another rebuild/reorganize.
One last thing I would recommend looking at is pack.summary from both sites. In NetBackup 7.0.1 there were some issues regarding EMM growth and deadlocks that required EEBs to fix. The NetBackup Late Breaking News article may be of use in identifying specific failure points.
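To see whether an earlier run left anything behind, the data directory can be scanned for *.dbR files. A minimal sketch follows; `leftover_dbr` is a hypothetical helper name and the directory argument is whatever path (or link) holds the NBDB files on your system.

```shell
#!/bin/sh
# leftover_dbr DIR
# Hypothetical helper: lists *.dbR files found directly in DIR, one per line.
leftover_dbr() {
    for f in "$1"/*.dbR; do
        # When nothing matches, the pattern stays literal; skip it.
        [ -e "$f" ] && echo "$f"
    done
    return 0
}
```

For example, `leftover_dbr /usr/openv/db/data` prints nothing when the directory is clean.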
03-01-2013 02:20 AM
ll /opt/netbck/db/data
total 7265632
-rw------- 1 root root   26218496 May  9  2011 DARS_DATA.db
-r-------- 1 root root     135168 May  8  2012 DARS_DATA.dbR
-rw------- 1 root root   26218496 May  9  2011 DARS_INDEX.db
-r-------- 1 root root      36864 May  8  2012 DARS_INDEX.dbR
-rw------- 1 root root  314798080 Mar  1 11:18 DBM_DATA.db
-r-------- 1 root root  216436736 May  8  2012 DBM_DATA.dbR
-rw------- 1 root root   35803136 Mar  1 11:18 DBM_INDEX.db
-r-------- 1 root root   16977920 May  8  2012 DBM_INDEX.dbR
-rw------- 1 root root 4837781504 Mar  1 11:18 EMM_DATA.db
-r-------- 1 root root   97587200 May  8  2012 EMM_DATA.dbR
-rw------- 1 root root   26218496 Mar  1 11:18 EMM_INDEX.db
-r-------- 1 root root     724992 May  8  2012 EMM_INDEX.dbR
-rw------- 1 root root   36724736 Mar  1 11:16 NBDB.db
-r--r--r-- 1 root root   39100416 May  8  2012 NBDB.dbR
-rw------- 1 root root 1752170496 Mar  1 11:18 NBDB.log
-r-------- 1 root root     327680 May  8  2012 NBDB.logR
-rw------- 1 root root        460 Mar  1 00:01 vxdbms.conf
Linked to the shared volume.
Regards
Madej
03-01-2013 02:28 AM
I have found that the log in the nbdb directory is old and no new one is being created.
Other logs are in place.
The volume for logs is not in the cluster configuration. It is standalone and is mounted administratively. The directory structure is correct; it was created by the mklogdir script.
I have to analyze the nbsu logs to find differences in configuration between the primary and secondary servers.
Regards
Madej
03-01-2013 02:45 AM
Database is not very big (without images):
du -sk /opt/netbck/db/data/*
25608    /opt/netbck/db/data/DARS_DATA.db
136      /opt/netbck/db/data/DARS_DATA.dbR
25608    /opt/netbck/db/data/DARS_INDEX.db
40       /opt/netbck/db/data/DARS_INDEX.dbR
307432   /opt/netbck/db/data/DBM_DATA.db
211376   /opt/netbck/db/data/DBM_DATA.dbR
34968    /opt/netbck/db/data/DBM_INDEX.db
16584    /opt/netbck/db/data/DBM_INDEX.dbR
4724408  /opt/netbck/db/data/EMM_DATA.db
95304    /opt/netbck/db/data/EMM_DATA.dbR
25608    /opt/netbck/db/data/EMM_INDEX.db
712      /opt/netbck/db/data/EMM_INDEX.dbR
46912    /opt/netbck/db/data/NBDB.db
38184    /opt/netbck/db/data/NBDB.dbR
1728808  /opt/netbck/db/data/NBDB.log
320      /opt/netbck/db/data/NBDB.logR
8        /opt/netbck/db/data/vxdbms.conf
Semaphores are set the same as on the primary site:
sysctl -a | grep kernel.sem
kernel.sem = 250 32000 32 128
Pack summary looks as follows:
# DO NOT EDIT THIS FILE !
# * means installed patch was preceded by this patch.
# + means that the installed patch installed this patch as a dependency.
NB_CLT_7.0.1 installed.
+NB_7.0.1
+NB_JAV_7.0.1
NB_7.0.1 installed.
*NB_CLT_7.0.1
NB_JAV_7.0.1 installed.
*NB_CLT_7.0.1
EEB_NetBackup_7.0.1_PET2140767_SET2140749_EEB2
EEB_NetBackup_7.0.1_PET2201350_SET2200966_EEB2
It is identical to the primary site.
I have to shut down NBU to rebuild/reorganize the database. I need to discuss this with the administrator.
Regards
Madej
08-15-2013 10:41 AM
Hi all,
I'm experiencing the same issue as you, with NBU 7.1.0.4 on an HP-UX master server: a lot of jobs start but go to the queued state showing "Waiting in NetBackup scheduler work queue on server ...". I noted that NB_dbsrv is consuming 100% of CPU. I do not have a big EMM_DATA.db file, as you can see below:
ebrbsnp05 >> find /nbdb_catalog -name EMM_DATA.db
/nbdb_catalog/data/EMM_DATA.db
ebrbsnp05 >> ls -l /nbdb_catalog/data/EMM_DATA.db
-rw------- 1 root sys 51531776 Aug 15 12:26 /nbdb_catalog/data/EMM_DATA.db
ebrbsnp05 >> du -k /nbdb_catalog/data/EMM_DATA.db
50324 /nbdb_catalog/data/EMM_DATA.db
ebrbsnp05 >>
But I have 100 active jobs and 140 queued... all of the queued jobs are "Waiting in NetBackup scheduler work queue on server <master server name>"
I have opened a case with Symantec, but it looks like they do not have a clear understanding of the issue.
Did you finally get a solution for this case? How does the nbdb rebuild/reorganize that mph999 suggested above work? Did it help? Have you tried it?
Let me mention that I tried to perform this rebuild/reorganize yesterday, but I got an error while trying to take a backup of the NBDB before doing the first rebuild (I wanted a backup of the NBDB just to be safe)... see below what I got...
I'm receiving the error "Segmentation fault (core dumped)" when trying to perform a backup of the NBDB... could you please give us a hand?
Check below please…
ebrbsnp05 >> /usr/openv/netbackup/bin/nbdbms_start_stop start
ebrbsnp05 >> ../bpps -x
NB Processes
------------
root 11363 1 0 11:22:32 ? 0:00 /usr/openv/db//bin/NB_dbsrv @/usr/openv/var/global/server.conf @/usr/openv/var/global/databases.conf -hn 7
MM Processes
------------
Shared Symantec Processes
-------------------------
root 11324 1 0 11:22:27 ? 0:00 /opt/VRTSpbx/bin/pbx_exchange
ebrbsnp05 >> /usr/openv/db/bin/nbdb_ping
Database [NBDB] is alive and well on server [NB_ebrvmsnp01].
ebrbsnp05 >> mkdir /nbdb_catalog/backup
ebrbsnp05 >> /usr/openv/db/bin/nbdb_backup -dbn NBDB -online /nbdb_catalog/backup/backup1
Segmentation fault (core dumped)
ebrbsnp05 >>
Symantec is looking into some logs now... could anybody please help? Thanks in advance...
Let me add that I have also rebooted the master server yesterday, but the issue is still there... the reboot did not help. :(
Seba.
08-15-2013 11:09 AM
Hi my friend... just a little detail... if you look at my email, it is @hp.com, so... I work at HP, and my management will not like your suggestion LOL :)
Changing from HP to IBM is not an option for me... anyway, thank you for your advice... I'll keep looking...
08-15-2013 09:08 PM
I have moved this new issue to a new discussion.
The original thread is almost 6 months old and the OP stopped responding.
Can I ask everyone who replied to this new post to retype their replies in the new discussion?
https://www-secure.symantec.com/connect/forums/nbdbsrv-consuming-100-cpu
08-16-2013 05:27 AM
@mph999 I've already modified some files, per Symantec's advice, and the issue was gone for almost 3 weeks... but last Monday the issue came back, and today I have 300 queued jobs and 150 active... I'm already using USE_HASH=1 as you mentioned...
The changes I did 3 weeks ago, when the issue was "fixed" (for 3 weeks) are these:
My friends, I'll continue on the new post --> https://www-secure.symantec.com/connect/forums/nbdbsrv-consuming-100-cpu
Please continue on this new post.
Thanks in advance!
Seba.
08-16-2013 06:35 AM
Please continue discussion in this new thread:
https://www-secure.symantec.com/connect/forums/nbdbsrv-consuming-100-cpu