Forum Discussion

mrmadej
Level 4
12 years ago

Heavy load on EMM db, high CPU utilization

 

Hello,
 
 
 
I would like to ask for suggestions regarding heavy load on the EMM database after switching to the second cluster site using VCS GCO, with catalog replication by VVR.
 
The physical hostname and the virtual IP address of the master server have changed. The virtual hostname is the same. In bp.conf I have the parameter ANY_CLUSTER_INTERFACE = 1.
 
NBU version is 7.0.1 on Linux (Red Hat 5.6).
 
Yesterday I switched NBU to the backup site and everything looked good except for high CPU utilization by the NB_dbsrv process. Utilization stays at about 20% CPU and reaches 100% at peaks. I thought this would last only a few minutes, but it is still the same.

 

PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
12426 root      18   0 6647m 463m 7344 S 20.5  1.4 304:27.37 NB_dbsrv
13148 root      16   0  572m 174m  27m S  5.9  0.5  81:56.29 nbemm

In the nbemm log I have a huge number of entries like the ones below:

 

0,51216,111,111,1894847,1361811878412,13148,1089464640,0:,65:Preallocated <5> elements in curViewSeq for <c-sapbwp_1361345386>,22:ImageObject::fetchView,1
0,51216,111,111,1894848,1361811878413,13148,1089464640,0:,12:retval - <0>,36:ImageCatalogImpl::getImageWithCopies,1
1,51216,111,111,1894849,1361811878413,13148,1106905408,0:,0:,37:ImageCatalogImpl::getImageCopyDetails,1,(1061|A65:HOST=<tygrys.xxxx.xx> VER=<700000> APP=<nbstserv> PID=<13415> |)
1,51216,111,111,1894850,1361811878421,13148,1106905408,0:,0:,37:ImageCatalogImpl::getImageCopyDetails,1,(1062|u32:2020006|)
1,51216,111,111,1894851,1361811878422,13148,47815214316672,0:,0:,37:ImageCatalogImpl::getImageCopyDetails,1,(1061|A65:HOST=<tygrys.xxxx.xx> VER=<700000> APP=<nbstserv> PID=<13415> |)
0,51216,111,111,1894852,1361811878428,13148,47815214316672,0:,12:retval - <0>,37:ImageCatalogImpl::getImageCopyDetails,1
1,51216,111,111,1894853,1361811878429,13148,1089464640,0:,0:,41:ImageCatalogImpl::getTotalSizeAndMediaIds,1,(1061|A65:HOST=<tygrys.xxxx.xx> VER=<700000> APP=<nbstserv> PID=<13415> |)
0,51216,111,111,1894854,1361811878429,13148,1089464640,0:,35:<Need to send unique media sequnce>,41:ImageCatalogImpl::getTotalSizeAndMediaIds,1
0,51216,111,111,1894855,1361811878435,13148,1089464640,0:,12:retval - <0>,41:ImageCatalogImpl::getTotalSizeAndMediaIds,1
1,51216,111,111,1894856,1361811878517,13148,1106905408,0:,0:,36:ImageCatalogImpl::getImageWithCopies,1,(1061|A65:HOST=<tygrys.xxxx.xx> VER=<700000> APP=<nbstserv> PID=<13415> |)
0,51216,111,111,1894857,1361811878524,13148,1106905408,0:,68:Preallocated <5> elements in curViewSeq for <ichmura-025_1361345428>,22:ImageObject::fetchView,1
0,51216,111,111,1894858,1361811878525,13148,1106905408,0:,12:retval - <0>,36:ImageCatalogImpl::getImageWithCopies,1
1,51216,111,111,1894859,1361811878525,13148,47815214316672,0:,0:,37:ImageCatalogImpl::getImageCopyDetails,1,(1061|A65:HOST=<tygrys.xxxx.xx> VER=<700000> APP=<nbstserv> 
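To see what dominates entries like these, a rough tally per function and per calling application can be scripted. The following is only a minimal sketch in Python, assuming the log lines keep the shape shown in the excerpt above; the regular expressions and the `tally` helper are illustrative and not part of NetBackup, and the sample lines are copied from the excerpt.

```python
import re
from collections import Counter

# A few lines copied from the nbemm excerpt above.
SAMPLE = """\
0,51216,111,111,1894848,1361811878413,13148,1089464640,0:,12:retval - <0>,36:ImageCatalogImpl::getImageWithCopies,1
1,51216,111,111,1894849,1361811878413,13148,1106905408,0:,0:,37:ImageCatalogImpl::getImageCopyDetails,1,(1061|A65:HOST=<tygrys.xxxx.xx> VER=<700000> APP=<nbstserv> PID=<13415> |)
1,51216,111,111,1894853,1361811878429,13148,1089464640,0:,0:,41:ImageCatalogImpl::getTotalSizeAndMediaIds,1,(1061|A65:HOST=<tygrys.xxxx.xx> VER=<700000> APP=<nbstserv> PID=<13415> |)
"""

# Assumption: the function name appears as "<length>:Class::method," in each line.
FUNC_RE = re.compile(r",\d+:(\w+::\w+),")
# Assumption: the caller is recorded as "APP=<name>" when present.
APP_RE = re.compile(r"APP=<([^>]+)>")


def tally(lines):
    """Count log entries per function name and per calling application."""
    funcs, apps = Counter(), Counter()
    for line in lines:
        func = FUNC_RE.search(line)
        if func:
            funcs[func.group(1)] += 1
        app = APP_RE.search(line)
        if app:
            apps[app.group(1)] += 1
    return funcs, apps


if __name__ == "__main__":
    funcs, apps = tally(SAMPLE.splitlines())
    for name, count in funcs.most_common():
        print(count, name)
    for name, count in apps.most_common():
        print(count, "APP", name)
```

Feeding the real log file to `tally` line by line instead of `SAMPLE` would show whether a single caller accounts for most of the requests.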

 

 

It is difficult to log in to the Java GUI, but after a few minutes the GUI starts responding. However, it is impossible to open Device Monitor, Devices, and other tabs where I can monitor devices. This is a big inconvenience for administrators.
 
Most policies run and work well, and restores can also be started. Unfortunately, not all media servers are working as expected: some of them are active only for disk, when they should be active for disk and tape. When I try to activate the host via "vmoprcmd -activate_host -h <host>" I receive a "network protocol error (39)" message.
 
I can query the EMM db using the nbemmcmd command (e.g. nbemmcmd -listhosts). I can also change some settings using this command. EMM responds immediately.
Is it possible that the database has to validate itself? The NBU database is about 1.1 TB. I am afraid to restart NBU, as this is a very critical system for other production databases.
 
Thanks for any suggestions.
 
Regards
Madej
  • Just wondering whether the NetBackup logs folder has not failed over with the rest of the cluster, or has a write issue, so that NetBackup is desperately trying and failing to write its log files, causing a blockage.

    Check everything on the node you are on, in case that node has a configuration file or similar that points the logs to a non-existent or write-protected area.
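One way to test the suggestion above is to try creating a file in each log directory on the active node. This is a minimal sketch in Python; the `check_log_dir` helper is hypothetical, and the two paths are assumed default UNIX log locations that may differ on a clustered installation.

```python
import os
import tempfile


def check_log_dir(path):
    """Return 'ok', 'missing', or 'not writable' for a log directory."""
    if not os.path.isdir(path):
        return "missing"
    try:
        # Create and remove a real file: permission bits alone can be
        # misleading on a read-only or stale mount.
        with tempfile.NamedTemporaryFile(dir=path):
            pass
    except OSError:
        return "not writable"
    return "ok"


if __name__ == "__main__":
    # Assumed locations; adjust to where the logs live on this node.
    for log_dir in ("/usr/openv/netbackup/logs", "/usr/openv/logs"):
        print(log_dir, check_log_dir(log_dir))
```

Running it as the same user as the NetBackup daemons on the node that currently holds the service group gives the most meaningful result.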
