Images expired by NBU are not deleted by Data Domain for some reason

montecito
Level 3

Our Data Domain has become full. A lot of images have been expired by NBU, but they still appear on the Data Domain. Could someone please let me know how to delete these images manually from NBU, and any guess as to why they might not have been deleted by the Data Domain?

I can provide more information or logs if required.
11 REPLIES

Anton_Panyushki
Level 6
Certified

I think it is quite normal that the DD tries to keep images as long as possible and eventually sits at nearly 100% full all the time.

What happens to those stale images when NetBackup wants to write new images to the DD appliance? Are they overwritten?

NetBackup DSSUs work in the same manner. Check whether there is an equivalent of a "high watermark" for the DD box.

montecito
Level 3

Well, it won't let NetBackup overwrite them. SQL backups start failing once the DD is about 93% full. I have already contacted DD support, and they say that the DD doesn't have the intelligence to decide on its own what to delete and what not to delete.

It depends on NBU for this task. They were quite sure about it. We also logged a call with Symantec, but they were likewise of the view that it's not an NBU issue.


Marianne
Moderator
Partner    VIP    Accredited Certified

Which version of NBU? Which OS is on the backup server connected to the DD?

Can you see that Image Cleanup jobs are running regularly? (The default is every 12 hours.)

Check the bpdm log on the media server for cleanup attempts.

Also have a look at the solution in this post:

https://www-secure.symantec.com/connect/forums/cant-delete-disk-pool




Run this:

/usr/openv/netbackup/bin/admincmd/nbdelete -allvolumes -force 

It's a safe command. It only deletes fragments that have already expired.

From the man pages:

The nbdelete command removes all deleted fragments from the disk volumes that are specified on the command line. The -allvolumes option removes the fragments from all volumes that contain deleted fragments.

RonCaplinger
Level 6

If you are manually expiring images in the Catalog interface, watch the Activity Monitor after deleting an image and you should see an "Image Cleanup" job appear. 

Similarly, a scheduled "Image Cleanup" job should be running automatically in the Activity Monitor to automatically expire old images from the catalog, based on the images' expiration date.  This is set in the "Host Properties" for the master server.

With Data Domains, all the above processes do is tell the Data Domain what files to delete.  It doesn't erase any files from the disk immediately, like you might expect.

There is a separate process on the Data Domain that should already be automatically scheduled to run. Log on to your Data Domain using SSH and enter "filesys clean show schedule". This will show how often the Data Domain's automatic cleanup process runs. If you want to start the cleaning process right now, enter "filesys clean start". Note that this may take anywhere from 5 to 23 hours to run, depending on the Data Domain model, number of images stored, amount of space on it, etc. You can check on its progress using "filesys clean status". There are 10 phases to the cleaning process; some run very fast and some run very slow, but once it is in phase 10, you should start to see some space free up for more backups.

The automatic cleaning process's default schedule is once a week, but I find that to be too long and have set it to twice a week.  In some cases, I clear off a bunch of space in the Catalog interface in NetBackup, wait for the Image Cleanup job to finish in the Activity Monitor, then manually start the Data Domain cleaning process.

montecito
Level 3

My apologies for not updating this thread for some time. Here is what Data Domain suggested, along with the error they found in the bpdbm log:

=========================================================

After doing some additional research on your case, I am seeing some error messages in these logs about fragmentation.  This is from the bpdbm.log file.  This particular error message <16> indicates that there is a problem with this particular function - which is - creating an image copy - copy already exists - adding fragments.

04:33:46.460 [6488.6688] <16> emmlib_CreateImageCopy: (0) Copy already exists. adding fragmnets


I have found some information pertaining to cleaning up fragments from your system - via the NBU command.  

/usr/openv/netbackup/bin/admincmd/nbdelete -allvolumes -force

It's a safe command. It only deletes fragments that have already expired.

======================

 

We have run the nbdelete command and there are improvements, but there are still expired images that are not being deleted.

 

Ron, we have the default cleaning schedule in place. We also run one manual session after the default one and, if needed, set the throttle to 75%, depending on how full the DD is.

 

I have another question. Lots of 1 KB files are generated when an image is created on the Data Domain. Is it possible to change this default behaviour? DD support says this is hampering compression a lot.

 

Please suggest anything else that can be done apart from nbdelete. Could it be that NBU is simply not marking the images for cleanup properly?

 

Is there a patch available for NBU to work more effectively with DD?

montecito
Level 3

Details of the environment :

 

NBU 6.5.6

Windows 2003

Data Domain DD690, firmware 4.7.4

vrtseman
Level 3

Check nbcatsync, which is available with 6.5.6. The tool reconciles the media ID records within the image fragments with the Data Domain disk pool media ID. Its primary purpose is partial DR recovery situations, where a newly created DD disk pool does not match the media ID of the original disk pool.

In any case, one of the things it also does is remove orphaned images (images that don't exist in the catalog) from the DD. Of course this is not a fix, just a workaround. I would suggest contacting support before using this command, though.

By the way, does nbdelete complete successfully?

montecito
Level 3

Yes, nbdelete completes successfully. We contacted Symantec support along with DD support and, as usual, each of them says nothing is wrong on their side, so basically we are trying everything we can from our end.

We have upgraded the DD firmware to 4.7.4 and in another few weeks will take it to 4.9.2. Hopefully that will help, because I have noticed that this week the DD only reached 83% over the weekend, compared to the usual 93%. Not sure whether nbdelete or the firmware helped.

Will get the backup guys to check the bpdm logs and will update this.

 

We do not use the DD as a VTL, so not sure if nbcatsync would help.

vrtseman
Level 3

nbcatsync is for the disk media ID. When you configure a disk pool, NetBackup assigns a disk media ID in the form of @XXXXX. This media ID is recorded in the image header. Anyway, it may just be easier to NFS-mount the OST LSU (/backup/ost/<lsu_name>) and manually delete the images that no longer exist in NetBackup, using a script or something. SYMC tech support should be able to help you clean the DD, though.
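A minimal sketch of the "script or something" idea, assuming the LSU is NFS-mounted and that each file name on the Data Domain begins with the NetBackup backup ID (`<client>_<ctime>`) followed by an underscore. That file-naming pattern, the function name `find_orphans`, and however you export the backup IDs from the catalog are all assumptions, not something confirmed in this thread, so check a handful of real file names first. It only lists candidates and deletes nothing.

```python
import re

# ASSUMPTION: file names start with "<client>_<10-digit ctime>_", e.g.
# "sqlsrv01_1300000000_C1_F1.img". Adjust after looking at real names.
BACKUP_ID_RE = re.compile(r"^(?P<backup_id>.+_\d{10})_")


def find_orphans(file_names, catalog_backup_ids):
    """Return file names whose backup ID is not in the NetBackup catalog.

    Names that do not match the assumed pattern are skipped, never flagged.
    """
    known = set(catalog_backup_ids)
    orphans = []
    for name in file_names:
        match = BACKUP_ID_RE.match(name)
        if match and match.group("backup_id") not in known:
            orphans.append(name)
    return sorted(orphans)
```

Feed it `os.listdir()` of the mount point plus the backup IDs still in the catalog, then review the resulting list by hand (or with support) before removing anything.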

cheers,

-v

Ray_Esperanzate
Level 4

Are you using SLPs by any chance? We had a similar issue, and it was due to the SLPs not completing properly. We have a very short retention period for all our backups (mostly 35 days), so when we dumped the SLP worklist we found a ton of images that should have been expired but were not, because they had never been marked "complete" in the SLP. When that happens, the image basically stays in limbo (per Symantec support). We had to go in manually and force-cancel the SLP processing on those images. Once that was done, the images expired, image cleanup (NetBackup) happened, then DD cleanup.

 

As stated previously:

1) The image expires in NBU.

2) The image cleanup process runs in NetBackup, which tells the Data Domain that a file can be deleted.

To see this in action, go to the Data Domain and run "filesys show space". Note down the "Cleanable GiB" number.

On NetBackup, run bpimage -cleanup -allclients

While that runs, do another "filesys show space" on the Data Domain and you should see the Cleanable GiB number increasing.

3) Data Domain cleanup (filesys clean start) takes the files marked for deletion and actually frees up the space. (This clears out the Cleanable GiB number I talked about in the previous step.)
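The three steps can be turned into a rough triage, sketched here in Python. The function name `diagnose` and its three inputs (Cleanable GiB readings noted down by hand from "filesys show space") are hypothetical, and nothing here talks to NetBackup or the Data Domain.

```python
def diagnose(cleanable_before, cleanable_after_nbu, cleanable_after_dd_clean):
    """Guess which stage is stuck from three hand-recorded Cleanable GiB readings.

    cleanable_before         -- before "bpimage -cleanup -allclients"
    cleanable_after_nbu      -- after the NetBackup image cleanup finished
    cleanable_after_dd_clean -- after "filesys clean start" finished
    """
    if cleanable_after_nbu <= cleanable_before:
        # NetBackup gave the Data Domain nothing new to delete.
        return "check NetBackup: images are not expiring or cleanup is not reaching the DD"
    if cleanable_after_dd_clean >= cleanable_after_nbu:
        # Deletes arrived, but the DD cleaning run did not reclaim them.
        return "check Data Domain: cleaning did not reclaim the cleanable space"
    return "both stages appear to work"
```

Treat the result as a hint only: the first branch also fires when there was simply nothing due to expire, and backups running at the same time can shift the numbers.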

 

----------------------

The other part of your question was about the number of small files making the Data Domain inefficient. We ran into this issue too, but because of the sheer number of small images we have (we do hourly transaction log backups on a lot of SQL servers). We started seeing weirdness like image cleanups taking days to run, as well as the Data Domain devices not responding properly. We were constantly at 100% nfsproc on the Data Domain. It turned out we had over 1 million files in our LSU, and when NetBackup ran its image cleanup process, the Data Domain was constantly searching that huge directory for files to delete.

Long story short, we were advised to create multiple LSUs (and then disk pools on the NBU side) on a single Data Domain to get around that. It's against the best practices that they publish (which I think is BS, because all it does is mess up the space reporting back to NBU), but it did the trick for us.
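A quick way to check for the same situation is to count the files in an NFS-mounted LSU directory. This is a sketch only: the function name `count_files` is hypothetical, and the figure of roughly 1 million files is one site's experience from this thread, not a documented limit.

```python
import os


def count_files(directory):
    """Count regular files directly inside directory (no recursion)."""
    # scandir streams entries, so a huge directory is not loaded into a list.
    with os.scandir(directory) as entries:
        return sum(1 for entry in entries if entry.is_file())
```

Run it against the mount point of each LSU; on a directory with hundreds of thousands of entries over NFS, the count itself may take a while.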

bectaylor76
Level 3

Hello Ray,

 

Can you tell me how to dump the SLP worklist? 

 


Thanks,

Bec