Can't remove PureDisk fragments, and have tried the normal methods

Hostasaurus2
Level 2

We had a RAID controller failure (thanks Dell!) on an NB 7.1 media server that was doing both tape and disk-based deduplication backups.  It of course corrupted the OS and PureDisk volumes.  Not a huge deal; we use PureDisk for staging backups and then write them to tape for safekeeping.  Anyway, the hardware has been fixed, a new OS installed, NB 7.1 reinstalled, and the server has the same name and IP.  It's working just fine for tape activities, but the disk pool is of course offline.  My goal now is to delete all remnants of the disk pool, then reconfigure PureDisk on the rebuilt media server and add back a new pool.

I successfully deleted most disk images from the catalog by running:

bpimmedia -l -stype PureDisk

to list them, then taking the backup id component of the output and running that through:

bpexpdate -force -d 0 -copy 1 -backupid <backup id>
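
If you have a lot of images, a loop saves some typing.  This is just a sketch: it assumes the backup ID is the fourth field of the IMAGE lines in the bpimmedia -l output, so sanity-check that against your own listing before letting it loose.

# list PureDisk images, pull what should be the backup ID from each IMAGE line,
# and force-expire copy 1 of each one
bpimmedia -l -stype PureDisk | awk '/^IMAGE/ {print $4}' | while read bid; do
  bpexpdate -force -d 0 -copy 1 -backupid "$bid"
done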

Deleted the storage unit, then tried to delete the disk pool and got the classic error that it can't be deleted while fragments exist.  Re-ran bpimmedia and yep, still there.  Ran "nbstlutil list -U" to confirm, and yep, still there.  Each image shows a single copy, instance 1, and two fragments, both copy 1: one with fragment ID -1 and one with fragment ID 1.  The backup ID field shows the same value under Image, Copy, and each of the two Fragments.
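
A quick way to keep score between attempts is just counting the IMAGE lines that still come back:

# how many PureDisk images are still in the catalog
bpimmedia -l -stype PureDisk | grep -c '^IMAGE'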

I tried to delete them again using bpexpdate and got back "no entity was found".  I then tried to delete them using nbdelete:

/usr/openv/netbackup/bin/admincmd/nbdelete -backup_id <backup id> -force

It just returns with no output, but they're still there.  The debug logs show:

13:03:38.084 [45397] <2> image_db: Q_IMAGE_CHGEXP
13:03:38.096 [45397] <2> Orb::init: initializing ORB EMMlib_Orb with: connect_bpdbm_to_emm -ORBSvcConfDirective "-ORBDottedDecimalAddresses 0" -ORBSvcConfDirective "static PBXIOP_Factory '-enable_keepalive'" -ORBSvcConfDirective "static EndpointSelectorFactory ''" -ORBSvcConfDirective "static Resource_Factory '-ORBProtocolFactory PBXIOP_Factory'" -ORBSvcConfDirective "static Resource_Factory '-ORBProtocolFactory IIOP_Factory'" -ORBDefaultInitRef '' -ORBSvcConfDirective "static PBXIOP_Evaluator_Factory '-orb EMMlib_Orb'" -ORBSvcConfDirective "static Resource_Factory '-ORBConnectionCacheMax 1024 '" -ORBSvcConf /dev/null -ORBSvcConfDirective "static Server_Strategy_Factory '-ORBMaxRecvGIOPPayloadSize 268435456'"(../Orb.cpp:824)
13:03:38.102 [45397] <2> Orb::init: caching EndpointSelectorFactory(../Orb.cpp:839)
13:03:38.111 [45397] <4> connect_bpdbm_to_emm: succesfully initialized EMM interface
13:03:38.112 [45397] <2> expdate_by_backupid: /usr/openv/netbackup/db/images/web1 is empty or not accessible
13:03:38.112 [45397] <4> expdate_by_backupid: problem modifying backupid web1_1337143940: no entity was found (227)
13:03:38.115 [45397] <2> process_request: request complete: exit status 227 no entity was found; query type: 87

Contrary to the error, the directory is there; there just isn't a date directory or any image files matching the fragment in question, so I'm not sure where the software is getting its knowledge of them from.
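
For anyone checking the same thing, something like this is enough to confirm nothing on disk matches the backup ID from the log (path and ID taken from the debug output above):

# look for any image files under the client's catalog directory that mention the stuck backup ID
find /usr/openv/netbackup/db/images/web1 -name '*1337143940*'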

I've run "nbdelete -allvolumes -force" repeatedly, they still don't go away.  The images in question actually should have gone away a long time ago as they're well past when they should have expired but they didn't then and trying to force it doesn't seem to get rid of them either.  I'm currently running "bpdbm -consistency 2 -move" but based on our backup image directory size and the speed it's moving, it's looking like that could take up to 48 hours to run which is a no-go when we have regular backups that need to run.  We can of course use direct to tape as a stop gap but does anyone know of any other commands I can look at, or perhaps even manual files I can edit, delete, create fake versions of to let NB feel like it deleted them, etc.?

Thanks!

2 REPLIES

Hostasaurus2
Level 2

I should add that I'm doubtful the bpdbm -consistency 2 scan is going to resolve the issue; it appears to be working in alphabetical order, and fragments for servers it has already passed are still there.

Hostasaurus2
Level 2

Finally found and removed the fragment data; what a pain.  I knew a few of the image names were for machines that no longer existed, so with some recursive grepping I was able to find matches on those names under /usr/openv/db/data/, which told me the rogue fragments must live in the NBDB/EMM database.  Now I had to figure out how you interact with that database.
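
The recursive grep was nothing fancy; something along these lines, where oldclient is a stand-in for one of the decommissioned machine names:

# list files under the NBDB data directory that still mention the dead client;
# -l prints just the file names, since most of these are binary database files
grep -rl 'oldclient' /usr/openv/db/data/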

Found the nbdb_backup command; almost what I was looking for, but not quite.  It backs up the NB database, but to binary files, so you can't use it to change the data and bring it back in.

Found the nbdb_unload command, which is apparently just an NB-specific copy of the SQL Anywhere dbunload command.  This one was more useful.  I ran:

nbdb_unload -dbn NBDB /path/to/backup

It output a SQL file plus individual .dat data files for every table in the NBDB database.  I was able to use those to figure out where my rogue fragments were living: they were in the EMM_ImageFragment table, which is owned by EMM_MAIN.
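
If you go this route, grepping the unloaded .dat files for one of the stuck backup IDs should point you straight at the offending table; roughly:

# find which unloaded table file still contains the stuck backup ID
grep -l 'web1_1337143940' /path/to/backup/*.dat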

Now to figure out how to get them out of there.  I spent hours looking at all the db/bin/ commands to see if one would give me an interface like the mysql client, where I could simply run a delete or truncate on that table.  I couldn't find anything that would work, or that I could figure out the right arguments for to get an interactive shell.  I started digging around on a SQL Anywhere forum and eventually figured out the command I wanted was dbisqlc, but then I needed the connection string to talk to the NBDB database.  I finally worked that out backwards through posts in the Symantec forums.

So the solution was to re-run nbdb_unload with just the table I wanted:

nbdb_unload -dbn NBDB -t EMM_MAIN.EMM_ImageFragment /path/to/backup

It output a reload.sql and a 786.dat.  The reload.sql had the SQL statements to recreate that table, its indexes, triggers, etc., and to load the data from the 786.dat file.  I verified that the only data in that file, and therefore in that table, was the PureDisk fragments that no longer existed, i.e. no tape data.  If there had been tape data, I could have just pruned the unneeded rows out of 786.dat.  So basically all I did was edit the reload.sql down to two lines:

TRUNCATE TABLE "EMM_MAIN"."EMM_ImageFragment"
go
 

Then I ran:

/usr/openv/db/bin/dbisqlc -c "CS=utf8;UID=dba;PWD=nbusql;ENG=NB_backup1;DBN=NBDB" reload.sql
 

Success; fragments are gone, puredisk disk pool deleted.
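
If you want to double-check before moving on, the same connection string can run a quick count; a minimal sketch, using a throwaway SQL file of my own naming:

# the fragment table should now be empty
printf 'SELECT COUNT(*) FROM "EMM_MAIN"."EMM_ImageFragment"\ngo\n' > /tmp/check.sql
/usr/openv/db/bin/dbisqlc -c "CS=utf8;UID=dba;PWD=nbusql;ENG=NB_backup1;DBN=NBDB" /tmp/check.sql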

Oh yeah, I had to add /usr/openv/db/lib/ to my system's ld.so.conf, since the SQL Anywhere binaries are linked against libraries in there.
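
In case it helps, that change boils down to something like this on a standard glibc system (don't forget ldconfig afterwards):

# let the dynamic linker find the SQL Anywhere libraries the db/bin tools use
echo /usr/openv/db/lib >> /etc/ld.so.conf
ldconfig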