Forum Discussion

Albatross_
Level 5
9 years ago

Data Domain pools getting filled up, backups failing

Hi experts,

Our setup uses a Data Domain (OS 5.5.3.1-509919, model DD890) as the destination storage.

We are facing a serious concern with disk space utilization.

The high water mark is configured at 95% and the low water mark at 85%.

Full backups kick off on Friday evenings, and some backups fail with storage-full errors.

The DD is currently at 96.4% disk utilization, but I don't see any duplicated images being deleted. It has been almost 48 hours and the available-space numbers have not changed.

 

Environment:

NetBackup master server 7.7.1

STU: DD890 (primary STU; duplications to tape library)

My understanding is that once the high water mark is reached, the duplicated images on the disk should be deleted automatically.

In our case this does not seem to be happening, and it could be one of the reasons for the failures.

Can someone help me solve this issue?
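If it helps, these are the kinds of commands I can run on the master server to gather more detail (a minimal sketch - the -stype value is an assumption for our DD Boost/OST plug-in, so adjust as needed):

    # disk volume state and capacity as NetBackup sees it
    nbdevquery -listdv -stype DataDomain -U

    # images still under SLP control (these cannot be expired by image cleanup)
    nbstlutil stlilist -image_incomplete -U

    # manually trigger catalog image cleanup
    bpimage -cleanup -allclients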


Cheers.

 


  • Hi Marianne,

    I have executed the command nbstlutil stlilist -image_incomplete -U and there are around 6000 entries like:

    Copy to NYC_TLD1_LTO6 of type DUPLICATE is NOT_STARTED
    Copy to NYC_TLD1_LTO6 of type DUPLICATE is NOT_STARTED

    Does that mean there are around 6000 images waiting to be duplicated?

    Let me know how to tackle this.
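    To count the outstanding copies, something like this should work (assuming one NOT_STARTED line per outstanding copy):

        nbstlutil stlilist -image_incomplete -U | grep -c "NOT_STARTED"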


  • Please go through the SLP Best Practice Guide.

    How many backup jobs per day?

    The 1st thing to check is that you have sufficient tape drives and media, and sufficient bandwidth between DD and media servers (10Gb should be the minimum requirement...) - see the quick checks at the end of this post.

    You (and your management) will need to make a decision about the backlog - old outstanding duplications will prevent newer backups from being duplicated.

    Images that have not been duplicated cannot be expired.
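    A couple of quick checks, assuming a standard master/media server setup:

        # tape drive status as seen by the device monitor
        vmoprcmd -d

        # storage unit configuration (fragment size, max concurrent drives)
        bpstulist -U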

     

  • Hi Marianne,

    Around 1500+ backup jobs run per day.

    We have configured an SLP (full backups to the DD pool, then duplication to tapes).

    We have around 13 tape drives and more than enough tapes.

    I believe we also have sufficient bandwidth between the DD and the media servers.

    Is there any way to duplicate the older images (using some kind of script)?

    I am new to this project, there is not enough documentation, and the ironic part is that the person who configured NetBackup has left the project.

    I am really worried and sad.

  • You can make a list of the image-ids with timestamps older than 2 weeks (roughly 1452816000 and below) - see the sketch at the end of this post.

    Your management will need to make a decision here - if you cancel the SLPs, the normal expiration date will be applied and images on the DD will expire without being duplicated to tape.

    You can delay the older images by setting them to Inactive. This will give newer images the chance to be submitted.

    Newer timestamps can be cancelled and manually duplicated using bpduplicate with a -Bidfile that contains the image list.

    Again - care must be taken that images can be duplicated before the 2-week expiration is reached.

    We cannot make this decision for you, but as long as you have a backlog that is larger than the number of backups per day, image expiration and cleanup is not going to happen.

    You can only manually duplicate images if they are no longer under SLP control.

    The problem with a backlog is that it becomes more and more difficult to catch up.
    Every time a duplication is attempted and does not complete successfully, the retry interval is pushed out further and further.
    SLP will always run oldest outstanding jobs before newer ones.

    The only way around this is to cancel older jobs or set them to inactive....

    PLEASE PLEASE PLEASE read through the Best Practice Guide....
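    A minimal sketch of the kind of script this could be (check the exact stlilist output format on your version before trusting the awk parsing; the STU name is taken from your output, and the volume pool name is a placeholder):

        # keep backup IDs of incomplete SLP images whose epoch suffix is <= 1452816000
        nbstlutil stlilist -image_incomplete -U |
          awk '/Image/ { n = split($2, a, "_"); if (a[n] != "" && a[n] <= 1452816000) print $2 }' > /tmp/old_bids.txt

        # take the old images out of SLP control (management decision - they will NOT reach tape)
        while read bid; do nbstlutil cancel -backupid "$bid"; done < /tmp/old_bids.txt

        # or postpone them instead of cancelling, so newer images get submitted first:
        # while read bid; do nbstlutil inactive -backupid "$bid"; done < /tmp/old_bids.txt

        # images that are no longer under SLP control can then be duplicated manually:
        bpduplicate -Bidfile /tmp/newer_bids.txt -dstunit NYC_TLD1_LTO6 -dp NetBackup_pool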

  • Thanks a lot Marianne,

    I will check with my manager and get back as soon as I can.

     

    Thanks

  • What is the duplication speed to tape?

    Data Domain has something called "locality" - the way Data Domain places data on its file systems. With good locality, restore/duplication speed is good; with bad locality it can drop to around 50MB/sec. Bad locality is usually seen on data with a high change rate.

    Using a 50GB fragment size on tape (take a look at the storage unit configuration):

    6:30 per fragment = ~128MB/sec

    10:00 per fragment = 83MB/sec

    20:00 per fragment = 40MB/sec

    40:00 per fragment = 20MB/sec.

    From the Activity Monitor, for a job of type duplication: speed = ~128MB/sec, because 50GB was duplicated in 6 minutes and 30 seconds:

    02/02/2016 08:00:00 - begin reading
    02/02/2016 08:01:45 - Info bptm (pid=27316) waited for full buffer 202 times, delayed 344 times
    02/02/2016 08:06:30 - end reading; read time: 0:06:30
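    A quick sanity check of the arithmetic (50GB expressed as 50,000MB, as in the table above):

        awk 'BEGIN { printf "%.0f MB/sec\n", 50000 / (6*60 + 30) }'    # prints 128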

    Also check on the media server that SIZE_DATA_BUFFERS/NUMBER_DATA_BUFFERS are configured correctly. SIZE_DATA_BUFFERS should be 262144 and NUMBER_DATA_BUFFERS should be 256 or higher.
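    On a UNIX media server these are plain text touch files (the paths below are the standard defaults; adjust for a Windows install):

        echo 262144 > /usr/openv/netbackup/db/config/SIZE_DATA_BUFFERS
        echo 256 > /usr/openv/netbackup/db/config/NUMBER_DATA_BUFFERS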

    http://www.veritas.com/docs/000016306

    http://www.veritas.com/docs/000004792