EMM_DATA.db - huge size can't be reduced

asg2ki · ‎04-21-2020

Dear All,

I recently got stuck with an issue where the size of my EMM_DATA.db has grown enormously and I can't find a way to reduce it at all. As a short description of my environment (LAB), I have 2 NBU master servers participating in an AIR scenario where the first one (which is the source for AIR) has 3 separate MSDP based media servers connected to it. The second master server also has an MSDP pool plus a tape library for offloading purposes. I'm replicating all backed up images between my two NBU master domains regularly as backup images get produced by the clients.

The EMM_DATA.db file on the source master server somehow managed to grow to the enormous size of 8.9GB while on the target master server it is not larger than 30MB. Although the infrastructure works well enough and I have no major failure out of this DB size except maybe for the increased size of the catalog backups, I wanted to optimize the DB to a somewhat normal size.

After trying to validate, reorganize and rebuild the DB as per the tech articles via the regular "nbdb_admin" and "nbdb_unload" commands, nothing really changed in the DB size, so I started digging a bit further. I noticed one particular discrepancy in the NBDBAdmin utility, which showed that the EMM_DATA portion is indicating "-1 bytes" for both the "Free DBspace" and "Total DBspace" metrics. Also the size of the DB is indicated as 4096MB while in reality it's twice as big, which makes me think that there is some kind of corruption throughout NBDB or perhaps just the EMM_DATA portion, although all the validation activities I did via CLI indicated that there are no errors at all. Reorganizing the DB via CLI indicated very few fragments and it also didn't indicate any errors. The rebuild command didn't give me any output but just returned to the command prompt; however, I noticed a slight modification to the NBDB log files (they got truncated almost immediately), so I assume that it didn't fail but was doing its job.

On the second master server everything works absolutely fine and the sizes are definitely in order. Even though there is no relationship with this I would like to mention that all backup images in the catalogs for both master servers are exactly the same including the granularity factor for some of the backups where I use this feature (in my case Active Directory).

Strangely enough, when I tried to run the "Reorganize All" command through the GUI utility it returned an "Operation Failed" status message. The result was the same for the "Rebuild" command. On the contrary, "Validation" went through and didn't indicate any issues.

My environment is based on Windows Server 2019 + NetBackup v8.2 for all NBU servers. One of my very old catalog backups that I could dig out (from exactly two years ago) clearly indicates that the EMM_DATA.db file at the time was a little less than 30MB. If I remember right, my servers were running NBU v8.1 binaries at the time, which I have upgraded to v8.1.1, v8.1.2 and up to the current v8.2 through the years. I suppose this could be some sort of a bug with any of the intermediate or latest binaries, but right now I'm clueless on how this issue can be solved.

I would appreciate any thoughts from experts around and I hope to find a solution to this pesky little problem.

Cheers

Systems_Team · ‎04-21-2020

Hi Asg2ki,

It seems this is a known issue under 8.2. Here's the KB doc for it:

The EMM_DATA.db file can grow very large when running NetBackup 8.2 https://www.veritas.com/support/en_US/article.100047315

Hopefully you have current maintenance, which you would need if you want to get the EEB for it. As the article states, if there are no negative symptoms then corrective action is not required. It also states that applying the EEB and performing all the other required steps will not reduce the size of the database if it has already experienced this growth.

Hope this helps,

Steve

asg2ki · ‎04-22-2020

Thanks Steve,

I looked at the article you provided and it definitely seems to have some similarities with the scope of my case, but only as related to the DB growth. I'm not really sure it falls under exactly the same symptoms though, since the article refers more to a situation where the system is experiencing huge stress. In my case this looks more like a problem with the structure / schema of the DB, but as you already pointed out, even if the EEB is applied the growth is not likely to get reduced. Unfortunately I have no way to get this EEB and give it a try, but thanks anyway for trying to help.

BTW as a test I also managed to add additional free space (+25MB) via "NbDbAdmin.exe" utility to the EMM_DATA.db portion specifically but the GUI utility didn't reflect these changes at all and it is still showing me the "-1 bytes" usage as well as 4096MB of allocation. I definitely see the 25MB increase on the file itself from the OS point of view though, so it looks like a nasty bug with that file only.

The negative side effect on my end is the excessive size of backup images produced by the NBU catalog policies, since the incremental jobs are also producing 8GB of data during the staging operations. Although my Catalog backups are working absolutely fine I can't say the situation is ideal, so hopefully somebody will come up with a way to fix or even re-create the EMM_DATA.db without having to rebuild the system from scratch.

Cheers

mph999 · ‎04-22-2020

I would run nbdb_unload, and then look to see which tables are taking the most space - make a note of the xxx.dat file

nbdb_unload /some/output/dir

If you then open reload.sql and search forward for 'INPUT INTO' and then from there search forward for xxx.dat, you will find the table name.

For example:

INPUT INTO "DBM_MAIN"."DBM_HoldImageMedia"
FROM '/netbackup/db/3130.dat'

If you know the Windows version of this, it's even easier ...

grep -B1 '\.dat' reload.sql
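
For anyone on Windows without grep, here is a rough Python sketch of the same lookup. It is not a NetBackup tool, just a text search over reload.sql: it pairs each INPUT INTO statement with the .dat file named on its FROM line and lists the largest .dat files first. The function names (map_dat_to_tables, largest_tables) are made up for illustration, and it assumes reload.sql sits in the same directory as the .dat files.

```python
import re
from pathlib import Path

# Matches: INPUT INTO "OWNER"."TABLE" followed by FROM '<some path>.dat'
INPUT_RE = re.compile(
    r'INPUT INTO\s+"(?P<owner>[^"]+)"\."(?P<table>[^"]+)"\s+'
    r"FROM\s+'(?P<path>[^']+\.dat)'",
    re.IGNORECASE,
)


def map_dat_to_tables(reload_sql_text):
    """Return {dat file name: 'OWNER.TABLE'} for every INPUT INTO statement."""
    mapping = {}
    for m in INPUT_RE.finditer(reload_sql_text):
        # Keep only the file name, whichever slash style the path uses.
        dat_name = m.group("path").replace("\\", "/").rsplit("/", 1)[-1]
        mapping[dat_name] = f'{m.group("owner")}.{m.group("table")}'
    return mapping


def largest_tables(unload_dir, top=10):
    """List the biggest .dat files in unload_dir with their table names.

    Assumption: reload.sql lives in unload_dir next to the .dat files.
    Returns a list of (dat file name, size in bytes, table name).
    """
    unload_dir = Path(unload_dir)
    text = (unload_dir / "reload.sql").read_text(encoding="utf-8", errors="replace")
    mapping = map_dat_to_tables(text)
    sizes = sorted(
        ((p.stat().st_size, p.name) for p in unload_dir.glob("*.dat")),
        reverse=True,
    )
    return [(name, size, mapping.get(name, "?")) for size, name in sizes[:top]]
```

Calling largest_tables on the unload directory and printing the rows should show which tables account for most of the unloaded data.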

asg2ki · ‎04-22-2020

Thanks for the suggestion mph999,

So I did that and I can see the largest file (in my case 837.dat) being 10MB in total. The second largest file (854.dat) is 7MB, followed by five more files of approx. 3.5MB each, and the rest are literally just a few KB. In total there are 230 dat files with a combined size of 40MB.

Here are the "reload.sql" references to the 837 and 854 dat files:

-----------------------------------------------------------------------------------------------------------------------------------------------

INPUT INTO "DBM_MAIN"."DBM_Image"
FROM 'S:/1/837.dat'
FORMAT TEXT
ESCAPE CHARACTER '\\'
BY ORDER("ImageKey","MasterServerKey","PreviousBackupImageKey","ParentImageKey","CatArcImageKey","FullBackupImageKey","CSScheduleKey","ClientMachineName","ClientKey","PolicyName","BackupTime","ViewID","ScheduleType","ClientType","ProtocolVersion","StartTime","EndTime","SnapTime","KiloBytes","NumberFragments","NumberCopies","NumberSnapshotDevices","ImageVersion","RetentionLevel","Compression","Encryption","FilesFileCompressed","Mpx","TirInfo","TirExpiration","PrimaryCopy","ImageType","ElapsedTime","Expiration","NumberFiles","ExtSecInfo","RequestPID","IndividualFileRestoreFromRaw","ImageDumpLevel","FileSystemOnly","PrevBlkIncrTime","BlkIncrFullTime","StreamNumber","BackupCopy","BackupStatus","JobID","NumResumes","ResumeExpiration","PfiType","ImageAttribute","Creator","ScheduleName","ProxyClient","KeywordPhrase","FilesFile","FilesFileSize","SoftwareVersion","ObjectDescriptor","EstimatedKiloBytes","VmType","IsSynthetic","RunTime","LastBirthTime","ClientCharacterSet","BmrInfo","OriginMasterServer","OriginMasterServerID","StorageServiceName","StorageServiceState","StorageServiceIndexOffset","StorageServiceIsInactive","TimeInProcess","ClassificationID","StorageServiceVersionNumber","RequiredExpirationDate","ImportFromReplicaTime","IrEnabled","IsQuiesced","SLPOperationsMask","IsScheduled","HasDependees","HasDependents","ValidationCount","ExpirationCount","IndexingStatus","NumChildren","DumpHost","KiloBytesDataTransferred","AppType","CreatedDateTime","LastModifiedDateTime","ProviderGeneratedId")
ENCODING 'UTF-8'
go

-----------------------------------------------------------------------------------------------------------------------------------------------

INPUT INTO "ADTR_MAIN"."ADTR_AuditRecord"
FROM 'S:/1/854.dat'
FORMAT TEXT
ESCAPE CHARACTER '\\'
BY ORDER("AuditRecordKey","MasterServerName","UserName","UserDomain","UserDomainType","AuditTimestamp","MessageId","Category","Operation","NumPHs","NumNVTs","TieInId","Reason","MetaData","RecordId","ParentRecordId","CreatedDateTime","LastModifiedDateTime")
ENCODING 'UTF-8'
go

-----------------------------------------------------------------------------------------------------------------------------------------------

Cheers

mph999 · ‎04-23-2020

So the total size of the .dat files doesn't correspond to the reported size of NBDB. There is an issue with the audit records at 8.2 (or can be) but you don't appear to be suffering from that. I'm guessing you may have had this issue for a while, so catalog recovery isn't the best option as it means 'going back in time'. I would suggest extending the expire time if you still have a catalog image prior to this issue, just in case.

The nbdb log in /usr/openv/netbackup/logs reports what the unload / re-org did, so it could be worth a look there, but to be honest, if that has failed to reduce the size, the only thing I can think of is that this requires Engineering assistance to see if they can fix the issue. Are you able to log a support call? The nbdb log is not created by default; it needs the directory adding and then the services restarting.

For a support call you would need:

nbsu
nbdb log covering a rebuild/ reorg (steps below)
nbdb_unload output
(Reset NBDB passwd (nbdb_admin -dba nbusql) then)
nbdb_backup -online /some/output/dir

(Don't miss the 'hidden' .yekcnedwssap file in the -online backup output), also tell support what you reset the NBDB passwd to (nbusql is the old default but you can use what you want)

Rebuild steps:

/usr/openv/netbackup/bin/bp.kill_all
cd /usr/openv/db/bin
./nbdbms_start_server
./nbdb_admin NBDB -validate
./nbdb_admin NBDB -validate -full
(I would only proceed if validate is clean)
mkdir /usr/openv/netbackup/logs/nbdb_backup
./nbdb_backup -dbn NBDB -online /usr/openv/netbackup/logs/nbdb_backup
ls /usr/openv/netbackup/logs/nbdb_backup (check files exist)
./nbdb_unload -rebuild -verbose
./nbdb_admin -reorganize
./nbdb_unload -rebuild -verbose
bp.kill_all
/usr/openv/netbackup/bin/nbdbms_start_stop start

asg2ki · ‎04-23-2020

Thanks mph999

Unfortunately I can't raise support calls since this is just my personal LAB and I don't have any active maintenance contracts with Veritas, but I agree in normal circumstances that would be the way forward at this point unless someone else here would have further suggestions.

I also think this issue might have existed in my NBU master for quite some time, although I can't say for how long. The previous catalog backup I mentioned from two years ago was located in a side folder on one of my file servers and probably remained there for emergency purposes prior to one of the product upgrades. I usually keep a spare copy once in a while of the entire "NetBackupDB/data" folder for emergency recovery purposes, since in the past I had situations where I lost my "vxdbms.conf", but that's just a side note. In any case the old catalog won't be usable because I made a lot of modifications to my environment throughout the years, so it won't justify the time required to restore it and then re-import images, storage pools, etc., which would be almost as if I rebuilt the entire NBU master from scratch. Obviously I'm trying to avoid that part since the backup processes are working fine at the moment; it's just this catalog size thing that keeps bugging me.

BTW until yesterday I never knew that there is this built-in option to do the "nbdb_unload" of the database into a folder (I'm not a DB guy anyway). I kinda understood this process as dumping the individual tables out of the DB's into separate DAT files + providing the corresponding SQL rebuild script for it. Correct me if I'm wrong but I'm thinking there should be a way then to reverse the same activity and recreate all EMM_DATA, EMM_INDEX, JOB_DATA, etc DB's out of the source folder where the DAT files were dumped. Maybe I'm totally wrong on this one and perhaps such actions would be possible only via specialized engineering tools but I'm still curious to know if this wouldn't be too crazy idea :)

Also if you have any further suggestions, those would be very much appreciated.

Cheers

mph999 · ‎04-23-2020

"I kinda understood this process as dumping the individual tables out of the DB's into separate DAT files + providing the corresponding SQL rebuild script for it"

.. exactly correct - it creates a .dat file for each table and the reload.sql file also.

It is also true that you can reverse the process (hence the reload file). One way to do this is via SQL Anywhere: you load up the nbdb_unload output and then 'replay' the reload file, which turns it back into a running DB. The slightly bad news is that it's not quite as easy as that: you have to edit the reload file (as it's now on a PC, which is different to the server it came from). I've not done it for ages, but at the latest versions of NBU with secure comms, 'I think' (not 100% sure) that there are additional step(s) also.
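
As a rough illustration of one such edit, the Python sketch below repoints the FROM '....dat' paths in reload.sql at the directory the unload was copied to. This is an assumption about what needs changing: the path rewrite is only the obvious candidate, other edits may well be required, and the function name repoint_dat_paths is made up for the example. It writes forward slashes because the reload file declares the backslash as its escape character.

```python
import re

# Captures: "FROM '", the directory part, and the "<name>.dat'" file part.
FROM_RE = re.compile(r"(FROM\s+')([^']*?)([^'/\\]+\.dat')", re.IGNORECASE)


def repoint_dat_paths(reload_sql_text, new_dir):
    """Return reload.sql text with every .dat path moved to new_dir.

    Only the directory part of each FROM '<dir>/<name>.dat' line changes;
    file names and everything else in the script are left untouched.
    """
    # Normalise to forward slashes and drop any trailing slash.
    new_dir = new_dir.replace("\\", "/").rstrip("/")
    return FROM_RE.sub(
        lambda m: f"{m.group(1)}{new_dir}/{m.group(3)}",
        reload_sql_text,
    )
```

For example, repoint_dat_paths(text, "C:/unload") would turn FROM '/netbackup/db/3130.dat' into FROM 'C:/unload/3130.dat'. Work on a copy of reload.sql rather than the original.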

TBH, if it's a test environment, I'd just be tempted to rebuild it personally ...

asg2ki · ‎04-23-2020

Thanks for the clarification.

This additional overhead on the catalog is not critical for the moment except for the incremental catalog jobs but as a workaround I'll simply reduce the schedules to happen less often. My initial concern was if the EMM_DATA file would keep growing further but from what I can tell so far the chances are extremely limited and perhaps I'll simply rebuild the environment at some point.

As per your remarks on the difficulties around recreating the DB via the DAT files, I don't think it would be worth the investment of time and effort. Still, it was a very good troubleshooting session and a great discussion too.

Thanks again for your help and patience :)

mph999 · ‎04-23-2020

You are most welcome - catalog issues are quite good fun to do ... maybe I need to get out more.

I think the time required may be fairly significant, with no promise of success. You can for certain make certain repairs with SQL Anywhere, but with this, it sounds like it may be in quite a state. Like your goodself, I'm not a DB guy, I've just picked it up / worked things out when needed. For anything more complex I have colleagues who can help, or of course Engineering.

Another thought is that I think you need a license for SQL Anywhere, and if I remember it's hard to find the download link (why ???). The version of it you need depends on the SQL version, which changes within NBU from time to time.
