
Data Domain pools filling up, backups failing

Albatross_
Level 5

Hi experts,

Our setup has a Data Domain as destination storage (OS: 5.5.3.1-509919, Model: DD890).

We are facing a serious concern with disk space utilization.

We have configured the high water mark to 95% and the low water mark to 85%.

Full backups kick off on Friday evenings, and some of them fail due to a storage-full issue.

Currently the DD has reached 96.4% disk utilization, but I don't see any duplicated images being deleted. It has been almost 48 hours and we don't see any change in the available-space numbers.

 

Environment:

NB master server 7.7.1

STU: DD890 (primary STU / duplications to tape library)

I think once the high water mark is reached, the duplicated images on the disk should be deleted automatically.

In our case this does not seem to be happening, and it could be one of the reasons for the failures.

Can someone help me out with solving this issue?

 

 

Cheers.

 


16 REPLIES

Marianne
Moderator
Partner    VIP    Accredited Certified
How are the DD and duplication configured in NBU? Basic disk with staging to tape, or OST with backup and duplication controlled by an SLP? If SLP, then expiration is 100% according to the retention set in the SLP. Cleanup will take place according to the DD maintenance cycle.

sdo
Moderator
Partner    VIP    Certified

1) First, prove that the DD is operating normally/correctly.

2) Then prove that the DD is indeed expunging from disk the backups that NetBackup thinks it has expired.

3) Check your retentions, and SLPs.

4) Check the NetBackup logs to prove that the DD is being informed of 'expired' images, which the DD can now expunge.

5) Check the NetBackup logs to prove that image cleanup is working correctly.

.

Start with step 1 above.  There are posts in this forum on how to check that old DD data is expunging.  Search the forum for the answer, or wait for someone to tell you.  Sorry, but I don't know much about DD. 

sdo
Moderator
Partner    VIP    Certified

Check this... and I hope you haven't got a NOexpire file too... surely not?

https://www.veritas.com/community/forums/images-are-not-deleting-disk-even-after-expiration
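A quick way to check from the command line (a minimal sketch; the path is the default touch-file location mentioned later in this thread):

# Look for any NOexpire touch file in the default location:
ls -l /usr/openv/netbackup/bin/*NOexpire* 2>/dev/null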

 

Albatross_
Level 5

Hi Marianne,

Yes, with OST and duplication controlled by SLP.

Marianne
Moderator
Partner    VIP    Accredited Certified
OST and SLP - then my previous post stands. HWM and LWM do not work with SLP and fixed retention.

Check the backup retention in the SLPs. Verify that tape duplications are successful and that the SLP backlog is within limits. Without successful duplication, the retention for disk backups remains infinite.

Then perform the checks suggested by sdo - image cleanup jobs at regular intervals, and bpdm logs on the media servers, which will show nbdelete (a quick grep sketch follows below). This is the command that sends the delete instruction to the DD.

Then over to the DD to check the maintenance/cleanup job. This normally runs once a week, on a Tuesday morning. In very busy environments this is a problem. I have seen a very busy environment where cleanup on the DD ran further and further behind, leaving large amounts of orphaned images on the DD. Needless to say, when it was time to buy more storage, they bought a dedupe appliance from another vendor...
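To see whether the delete instructions mentioned above are actually reaching the DD, something like this on a media server can help (a minimal sketch, assuming legacy bpdm logging is enabled in the default location):

# Search the bpdm legacy logs for nbdelete activity:
grep -i nbdelete /usr/openv/netbackup/logs/bpdm/log.*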

Nicolai
Moderator
Partner    VIP   

I agree with the statement above - furthermore, what is the average deduplication ratio on your Data Domain?

If you back up compressed or encrypted data to a Data Domain, the available space will take a serious plunge.

Finding badly deduplicating data is a cat-and-mouse chase. You never know when someone will suddenly place a large amount of, for example, video files or compressed SQL dumps on a file system you protect.

Albatross_
Level 5

Hi sdo,

 

Yes, I found a file named 'NOexpire-052713' under /usr/openv/netbackup/bin.

Please suggest what to do. I am new to this backup environment and have no idea how or why the NB environment was configured this way.

 

Albatross_
Level 5

Hi Marianne,

Check backup retention in SLPs.

The backup retention is 2 weeks, with duplication to tape retained for 7 years. There are two SLPs with these retention periods. We have another couple of SLPs without duplication where the backup retention is again two weeks.

Verify that tape duplications are successful and that SLP backlog is within limits

How do I check the duplications? I have checked in the Activity Monitor and I see there are some duplication jobs running.

Then perform the checks as suggested by sdo:

There are image cleanup jobs running after every successful backup, and I see a few batches of image cleanup jobs in the Activity Monitor.

I could not find any nbdelete entries in the bpdm logs on any of the media servers.

Yes, the DD cleanup job is running as scheduled, but since it keeps filling up we are cleaning it manually.

sdo
Moderator
Partner    VIP    Certified

The presence of a file named:

/usr/openv/netbackup/bin/NOexpire-052713

...should not affect you, but maybe NetBackup does a partial match on file names.

Personally, I would remove that file. It looks like someone renamed the file several years ago.

.

The way to check whether NetBackup "image cleanup" is requesting expirations is to open the Activity Monitor and look at an "image cleanup" job. Do you see counts of expired images? If so, then NetBackup is expiring images, so move on to the next check in the list.
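If you want a cleanup job to watch, you can also kick one off manually; a minimal sketch, assuming a default install path on the master server:

# Trigger an image cleanup for all clients:
/usr/openv/netbackup/bin/admincmd/bpimage -cleanup -allclients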

Marianne
Moderator
Partner    VIP    Accredited Certified

Check the bpdm logs on a media server after completion of an Image Cleanup job.
Please share one such bpdm log for a media server that should have expired images (as per the Details of the Image Cleanup job).
Please copy the log to bpdm.txt and upload it as a file attachment.

About checking SLP backlog:

Download the SLP Best Practice doc from the Download link in this TN: http://www.veritas.com/docs/000018455 

Look for this section and read through the relevant pages: 

Avoid increasing backlog 
  Monitoring SLP progress and backlog growth

Use 'nbstlutil stlilist -image_incomplete -U' at least once a day to verify that NOT_STARTED images are not older than 24 hours and not increasing.
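A minimal daily check, assuming the -U output format shown further down in this thread (one NOT_STARTED line per outstanding copy):

# Count outstanding SLP copies that have not started yet:
/usr/openv/netbackup/bin/admincmd/nbstlutil stlilist -image_incomplete -U | grep -c NOT_STARTED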

You may want to ask EMC to re-do a sizing exercise to verify that the amount and type of data being backed up can fit onto the DD and be kept for at least 2 weeks.

Also ask EMC for ways to identify data with poor dedupe rates as well as orphaned images.

Albatross_
Level 5

Hi Marianne,

I have executed the command nbstlutil stlilist -image_incomplete -U, and there are around 6000 entries like:

Copy to NYC_TLD1_LTO6 of type DUPLICATE is NOT_STARTED
Copy to NYC_TLD1_LTO6 of type DUPLICATE is NOT_STARTED

Does that mean there are around 6000 images still to be duplicated?

Let me know how to tackle this.

 

 

Marianne
Moderator
Partner    VIP    Accredited Certified

Please go through the SLP Best Practice Guide.

How many backup jobs per day?

The first thing to check is that you have sufficient tape drives and media, and sufficient bandwidth between the DD and the media servers (10Gb should be the minimum requirement...).

You (and your management) will need to make a decision about the backlog - old outstanding duplications will prevent newer backups from being duplicated.

Images not duplicated cannot be expired.

 

Albatross_
Level 5

Hi Marianne,

Around 1500+ backup jobs run per day.

We have configured SLPs (full backups to the DD pool, with duplication to tape).

We have around 13 tape drives and more than enough tapes.

I believe we also have sufficient bandwidth between DD and media servers.

Is there any way to duplicate the older images (using some kind of script)?

I am new to this project, there is not enough documentation, and the ironic part is that the person who worked on and configured NB has left the project.

I am really worried and sad.

 

 

Marianne
Moderator
Partner    VIP    Accredited Certified

You can make a list of the image IDs with timestamps older than 2 weeks (more or less 1452816000 and below).
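For example, something like the following could build that list. This is a rough sketch: the date range and file name are illustrative, and it assumes the backup ID is the last field of bpimagelist's -idonly output - verify the column layout in your environment before trusting the file:

# List backup IDs for images created before the 2-week cutoff (1452816000 = 15 Jan 2016 00:00 UTC)
# and save them to a bidfile for later use:
/usr/openv/netbackup/bin/admincmd/bpimagelist -idonly \
  -d "01/01/2016 00:00:00" -e "01/15/2016 00:00:00" | awk '{print $NF}' > /tmp/old_images.txt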

Your management will need to make a decision here - if you cancel the SLPs, the normal expiration date will be applied and images on the DD will expire without being duplicated to tape.

You can delay the older images by setting them to Inactive. This will give newer images a chance to be submitted.
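A minimal sketch of deactivating one image (the backup ID is hypothetical; loop over your list for bulk changes):

# Suspend SLP processing for an old image so newer duplications can run:
/usr/openv/netbackup/bin/admincmd/nbstlutil inactive -backupid client01_1452816000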

Newer timestamps can be cancelled and then manually duplicated using bpduplicate with a -Bidfile that contains the image list.
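Something like this, per image and then for the whole list (the backup ID and file name are hypothetical; the destination storage unit name is taken from the nbstlutil output earlier in this thread - substitute your own):

# Release a newer image from SLP control:
/usr/openv/netbackup/bin/admincmd/nbstlutil cancel -backupid client01_1453420800
# Then duplicate everything in the bidfile to the tape storage unit:
/usr/openv/netbackup/bin/admincmd/bpduplicate -Bidfile /tmp/newer_images.txt -dstunit NYC_TLD1_LTO6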

Again - care must be taken that the images can be duplicated before the 2-week expiration is reached.

We cannot make this decision for you, but as long as you have a backlog that is bigger than the number of backups per day, image expiration and cleanup are not going to happen.

You can only manually duplicate images if they are no longer under SLP control.

The problem with a backlog is that it becomes more and more difficult to catch up.
Every time a duplication is attempted and does not complete successfully, the retry interval is pushed out further and further.
SLP will always run oldest outstanding jobs before newer ones.

The only way around this is to cancel older jobs or set them to inactive....

PLEASE PLEASE PLEASE read through the Best Practice Guide....

Albatross_
Level 5

Thanks a lot Marianne,

I will check with my manager and get back as soon as possible.

 

Thanks

Nicolai
Moderator
Partner    VIP   

What is the duplication speed to tape?

Data Domain has something called "locality". Locality is how Data Domain places data on its file systems: with good locality, restore/duplication speed is good; with bad locality, restore/duplication speed can drop down to 50MB/sec. Bad locality is usually seen on data with a high change rate.

Using a 50G fragment size on tape (take a look in the storage unit configuration):

6:30 per fragment = 136MB/sec

10:00 per fragment = 83MB/sec

20:00 per fragment = 40MB/sec

40:00 per fragment = 20MB/sec.

From an Activity Monitor job of type duplication: speed = 136MB/sec, because 50GB was duplicated in 6 minutes and 30 seconds.

02/02/2016 08:00:00 - begin reading
02/02/2016 08:01:45 - Info bptm (pid=27316) waited for full buffer 202 times, delayed 344 times
02/02/2016 08:06:30 - end reading; read time: 0:06:30
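The arithmetic behind that kind of figure, as a quick sanity check (fragment size and read time come straight from the log above; the exact MB/sec depends on how the fragment size is rounded):

# 50GB fragment = 50 * 1024 = 51200 MB; read time 6:30 = 390 seconds
echo $(( 50 * 1024 / 390 ))   # prints 131, i.e. in the same ballpark as the figure quoted above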

Also check on the media server that SIZE_DATA_BUFFERS/NUMBER_DATA_BUFFERS are configured correctly. SIZE_DATA_BUFFERS should be 262144 and NUMBER_DATA_BUFFERS should be 256 or higher.
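A minimal sketch of setting those touch files, assuming a default install path on the media server (new jobs pick up the values when they start):

# Tune tape I/O buffers on the media server:
echo 262144 > /usr/openv/netbackup/db/config/SIZE_DATA_BUFFERS
echo 256 > /usr/openv/netbackup/db/config/NUMBER_DATA_BUFFERS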

http://www.veritas.com/docs/000016306

http://www.veritas.com/docs/000004792