wnatale1
Level 3
7 years ago
Solved

Storage Lifecycle Policies never run duplications

Hello, we have Master server 7.6.1.2 on RedHat Linux 6.3 and Media server 7.6.0.4 on RedHat Linux 6.5. These versions are past EOL support, but we are looking for any help digging into the SLP issue while we work on upgrades.

We have defined many Storage Lifecycle Policies to first back up to an AdvancedDisk (no MSDP), then duplicate twice to tape with separate volume pools and retention. This worked fine in the past, but about a week ago we had some backups stuck queued and restarted the master applications. Since then, some SLPs have never run their duplications. How do we force duplications to run or check the status of these SLPs?

The SLPs are active and have images "IN_PROCESS" as below:

# nbstlutil stlilist -image_incomplete -U -mediaserver xxxxx
Image xxxxx_1532931165 for Lifecycle PDC_Standard_2 is IN_PROCESS
Image xxxxx_1532932029 for Lifecycle PDC_Standard_2 is IN_PROCESS
Image xxxxx_1532932187 for Lifecycle PDC_Standard_2 is IN_PROCESS
Image xxxxx_1532934683 for Lifecycle PDC_NFS is IN_PROCESS
Image xxxxx_1532965377 for Lifecycle PDCNEW_Standard is IN_PROCESS
Image xxxxx_1532965378 for Lifecycle PDCNEW_Standard is IN_PROCESS
Image xxxxx_1532965379 for Lifecycle PDCNEW_Standard is IN_PROCESS

# nbstlutil report
SLP Name: (state)                                 Number of copies: Size:
PDCNEW_Standard (active)                                  12       44073 MB
PDC_NFS (active)                                          16      424036 MB
PDC_Standard_2 (active)                                   46     9331523 MB

Thank you,
Will

  • If you have a Business Critical Services (BCS) Premier contract with Veritas, you will get support for 7.6.

    Support Extensions - Covered Products
    https://www.veritas.com/content/support/en_US/business-critical-services/support-extensions-program.html

    Now the technical part. NetBackup will not submit duplication jobs that it expects to fail for known reasons, for example problems with tape drives (a drive needs cleaning, drives going down, etc.) or with media (retention, available media, media owner, etc.). Shared drives can also contribute to drive problems. So before you start digging through logs, here are a few things that you should try.

    1. Check for tape drive and media related errors from the NetBackup console -> Reports -> Media logs, Tape logs.

    2. Stop services on the media servers that share the tape drives (SSO), then start them one by one.

    3. Power cycle the library to remove SCSI reservations. You can do this through a NetBackup command as well, but I suggest a power cycle because SCSI reservations can also be set from outside NetBackup.

    4. Check the media in the volume pool: that enough media is available, the retention level of the media (by default, only images with the same retention level share a single media), the media owner, and the scratch pool.

    5. If the items listed above are working fine, then change the SLP parameter: NetBackup Administration Console -> Host Properties -> Master Servers -> Properties -> SLP Parameters -> Tape resource multiplier (the default value is 2; change it to 6 or more). Please note that step 5 overcommits resources and acts as a band-aid. You will need to increase tape resources or reduce the backlog.

    If nothing helps, then let's dig into the logs.

    Process the nbstserv logs for the last 24 hours: vxlogview -p nb -i nbstserv -d all -t 24:00:00

    Look for this keyword and read the lines above and below each match:

    remov (this covers the words remove, removing and removed). These lines will give you the reason why the duplication job is not being submitted.
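
A minimal sketch of that keyword search, assuming the vxlogview output is saved to a file first. The path /tmp/nbstserv_24h.log and the function name show_removals are illustrative choices, not NetBackup names, and the amount of context (two lines each side) is arbitrary.

```shell
#!/bin/sh
# Hypothetical helper: print every line containing "remov" (case-insensitive)
# with two lines of context on each side, prefixed with line numbers.
show_removals() {
    grep -i -n -B 2 -A 2 'remov' "$1"
}

# Assumed location for the saved log; adjust as needed.
LOG=/tmp/nbstserv_24h.log

# Only run the NetBackup command on a host where it is installed.
if command -v vxlogview >/dev/null 2>&1; then
    vxlogview -p nb -i nbstserv -d all -t 24:00:00 > "$LOG"
    show_removals "$LOG" || echo "no 'remov' lines in the last 24 hours"
fi
```

Saving the output to a file first means the same log can be searched again later (for example by backup ID) without rerunning vxlogview.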

    • wnatale1
      Level 3

      RE: Anshu

      1. No errors

      2. Not a shared tape library, but see #3

      3. I power cycled the tape library and NetBackup server

      4. Plenty of Scratch tapes

      5. There is no backlog or competition for these drives - they are literally doing nothing. I don't believe this would help.

      nbstserv log does not contain "remov" string in the last 24 hours. I will try to query it back further.
      Edit: I queried 500 hours back, which predates the issues seen with these SLPs, and there are no "remov" results in the output.

      • Anshu_Pathak
        Level 5

        Run the "nbstlutil list -lifecycle <slp_name> -image_incomplete" command. This will list all backup IDs that are pending duplication for that particular SLP. Now pick a couple of backup IDs and search for them in the processed nbstserv logs (which you have already generated).

        This will show you all lines from the nbstserv logs for that particular backup ID and will provide some information about why it was not part of a duplication batch.

        Sometimes I have seen such issues resolved after a service restart on the master server.
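
A rough sketch of that per-image search. It assumes the listing looks like the "-U" output posted in the question ("Image <backupid> for Lifecycle <name> is IN_PROCESS") and that the processed nbstserv log was saved to /tmp/nbstserv_24h.log. The path and both function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: pull backup IDs out of "-U" style lines such as
#   Image xxxxx_1532931165 for Lifecycle PDC_Standard_2 is IN_PROCESS
extract_backupids() {
    awk '$1 == "Image" && $3 == "for" { print $2 }'
}

# Hypothetical helper: show the log lines that mention one backup ID.
# -F treats the ID as a literal string rather than a regular expression.
grep_backupid() {
    grep -F -n -- "$1" "$2"
}

LOG=/tmp/nbstserv_24h.log   # assumed path of the saved vxlogview output

# Only run on a host where nbstlutil exists and the log has been saved.
if command -v nbstlutil >/dev/null 2>&1 && [ -r "$LOG" ]; then
    nbstlutil stlilist -image_incomplete -U |
        extract_backupids | head -n 3 |
        while read -r id; do
            echo "== $id =="
            grep_backupid "$id" "$LOG" || echo "(no log lines for $id)"
        done
fi
```

If an ID has no lines at all in the log window, that is itself a hint that nbstserv never considered the image for a batch during that period, so widening the vxlogview time range may be the next step.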

  • Can you try deactivating the SLP and then activating it again after half an hour to see if that makes any difference?

    nbstlutil inactive -lifecycle PDC_Standard_2

    nbstlutil active -lifecycle PDC_Standard_2
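
The deactivate, wait, reactivate sequence could be wrapped roughly as below. This is only a sketch: the function name cycle_slp and the DRY_RUN switch are hypothetical, the option is spelled -lifecycle, and the SLP name must match the case of the defined policy (PDC_Standard_2 in the original post). With DRY_RUN left at its default of 1, the commands are printed rather than executed.

```shell
#!/bin/sh
# Hypothetical wrapper: deactivate an SLP, wait, then reactivate it.
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0
# on the master server to actually run them.
cycle_slp() {
    slp=$1
    wait_secs=${2:-1800}          # half an hour unless overridden
    run=""
    if [ "${DRY_RUN:-1}" = "1" ]; then
        run="echo"
    fi
    $run nbstlutil inactive -lifecycle "$slp" || return 1
    $run sleep "$wait_secs"
    $run nbstlutil active -lifecycle "$slp"
}

cycle_slp PDC_Standard_2
```

Running the script as is prints the three commands, which gives a chance to check the SLP name before rerunning with DRY_RUN=0.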

    • wnatale1
      Level 3

      RE: Tape_Archived
      I've done this with no change to the behavior. I've created a brand new SLP, run backups to it, and it will not ever run duplications to tape.