
Storage Lifecycle Policies never run duplications

wnatale1
Level 3

Hello, we have a Master server at 7.6.1.2 on RedHat Linux 6.3 and a Media server at 7.6.0.4 on RedHat Linux 6.5. These versions are past EOL support, but we are looking for any help digging into the SLP issue while we work on upgrades.

We have defined many Storage Lifecycle Policies that first back up to an AdvancedDisk (no MSDP), then duplicate twice to tape with separate Volume Pools and retention. This worked fine in the past, but about a week ago we had some backups stuck queued and restarted the master applications. Since then, some SLPs have never run their duplications. How do we force duplications to run, or check the status of these SLPs?

The SLPs are active and have images "IN_PROCESS", as shown below:

# nbstlutil stlilist -image_incomplete -U -mediaserver xxxxx
Image xxxxx_1532931165 for Lifecycle PDC_Standard_2 is IN_PROCESS
Image xxxxx_1532932029 for Lifecycle PDC_Standard_2 is IN_PROCESS
Image xxxxx_1532932187 for Lifecycle PDC_Standard_2 is IN_PROCESS
Image xxxxx_1532934683 for Lifecycle PDC_NFS is IN_PROCESS
Image xxxxx_1532965377 for Lifecycle PDCNEW_Standard is IN_PROCESS
Image xxxxx_1532965378 for Lifecycle PDCNEW_Standard is IN_PROCESS
Image xxxxx_1532965379 for Lifecycle PDCNEW_Standard is IN_PROCESS

# nbstlutil report
SLP Name: (state)                                 Number of copies: Size:
PDCNEW_Standard (active)                                  12       44073 MB
PDC_NFS (active)                                          16      424036 MB
PDC_Standard_2 (active)                                   46     9331523 MB

Thank you,
Will

1 ACCEPTED SOLUTION

Run the "nbstlutil list -lifecycle <slp_name> -image_incomplete" command. This will list all backup IDs that are pending duplication for that particular SLP. Now pick a couple of backup IDs and search for them in the processed nbstserv logs (which you have already processed).

This will show you all lines from nbstserv logs for that particular backupid and will provide some information why it was not part of duplication batch.

Sometimes I have seen such issues get resolved after service restart on Master server.
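
The search described above can be sketched roughly as follows. This is only an illustration, not a NetBackup tool: the function name `grep_pending_ids` and the file arguments are hypothetical. It assumes the incomplete-image listing has the `Image <backupid> for Lifecycle <slp> is IN_PROCESS` shape shown in the question, and that the vxlogview output for nbstserv has been saved to a text file.

```shell
#!/bin/sh
# Hypothetical helper: print every saved nbstserv log line that mentions
# one of the backup IDs still pending duplication.
#   $1 = file with lines like "Image <backupid> for Lifecycle <slp> is IN_PROCESS"
#   $2 = file holding saved vxlogview output for nbstserv
grep_pending_ids() {
    ids=$(mktemp) || return 1
    # The backup ID is the second whitespace-separated field of each "Image" line.
    awk '$1 == "Image" { print $2 }' "$1" | sort -u > "$ids"
    # Nothing to search for if no backup IDs were found.
    if [ ! -s "$ids" ]; then
        rm -f "$ids"
        return 1
    fi
    # -F: treat each ID as a fixed string; -f: read one pattern per line.
    grep -F -f "$ids" "$2"
    status=$?
    rm -f "$ids"
    return $status
}
```

For example, redirect the incomplete-image listing to `incomplete.txt` and the vxlogview output to `nbstserv.txt`, then run `grep_pending_ids incomplete.txt nbstserv.txt`.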


9 REPLIES

Anshu_Pathak
Level 5

If you have a Business Critical Support (BCS) premier contract with Veritas, you will get support for 7.6.

Support Extensions - Covered Products
https://www.veritas.com/content/support/en_US/business-critical-services/support-extensions-program....

Now the technical part. NetBackup will not submit duplication jobs that it expects to fail for known reasons: issues with tape drives (a drive needs cleaning, drives going down, etc.), issues with media (retention, available media, media owner, etc.), and shared drives, which also contribute to drive issues. So before you start digging into logs, here are a few things you should try.

1. Check for tape drive and media related errors from NetBackup console -> reports -> Media logs, Tape logs.

2. Stop services on the media servers that share the tape drives (SSO), then start them one by one.

3. Power cycle the library to remove SCSI reservations. You can do this through a NetBackup command as well, but I suggest a power cycle because SCSI reservations can also be set from outside NetBackup.

4. Check the media in the volume pool: enough available media, retention level of media (by default only images with the same retention share a single media), media owner and scratch pool.

5. If the items listed above are working fine, change the SLP parameter: NetBackup Admin console -> Host Properties -> Master servers -> Properties -> SLP Parameters -> Tape resource multiplier (the default value is 2; change it to 6 or more). Please note that step 5 overcommits resources and acts as a band-aid. You will still need to increase tape resources or reduce the backlog.

If nothing helps, then let's get into the logs.

Process nbstserv logs for the last 24 hours: vxlogview -p nb -i nbstserv -d all -t 24:00:00

Look for this keyword and read the lines above and below it.

remov (it covers the words remove, removing and removed). These lines will give you the reason why the duplication job is not getting submitted.
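
A minimal sketch of that keyword search, assuming the vxlogview output above has been redirected to a text file and that a grep with the `-C` context option is available (GNU grep on Red Hat has it). The function name `show_removals` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: show case-insensitive "remov" matches together with
# the surrounding lines, so the reason can be read in context.
#   $1 = file holding saved vxlogview output for nbstserv
#   $2 = number of context lines before and after each match (defaults to 3)
show_removals() {
    # -i: ignore case, so "Removing", "removed" and "REMOVE" all match.
    # -n: prefix each line with its line number in the file.
    # -C: print N lines of context around each match.
    grep -i -n -C "${2:-3}" 'remov' "$1"
}
```

Calling `show_removals nbstserv.txt 5` would print each match with five lines on either side. The function returns a non-zero status when nothing matches.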

 

 

Tape_Archived
Moderator

Can you try deactivating the SLP and then activating it again after half an hour to see if that makes any difference?

nbstlutil inactive -lifecycle PDC_Standard_2

nbstlutil active -lifecycle PDC_Standard_2

 

RE: Tape_Archived
I've done this with no change to the behavior. I've created a brand new SLP, run backups to it, and it will not ever run duplications to tape.

RE: Anshu

1. No errors

2. Not a shared tape library, but see #3

3. I power cycled the tape library and NetBackup server

4. Plenty of Scratch tapes

5. There is no backlog or competition for these drives - they are literally doing nothing. I don't believe this would help.

nbstserv log does not contain "remov" string in the last 24 hours. I will try to query it back further.
Edit: I queried 500 hours back which predates issues seen with these SLP and there are no "remov" results in the output.

Searching for the backup id in the vxlog led to searching for the batch numbers and this result:
Removing batch 34535. Submitting this batch would exceed STU job limit

I don't think there are any limits in place but I increased the "Tape resource multiplier" and "Disk resource multiplier" SLP settings and the SLP pretty much immediately ran, so the immediate issue is resolved at least. Thank you Anshu!
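
To see at a glance which removal reasons dominate, the matching lines can be tallied. This is a rough sketch that assumes other removal lines follow the same "Removing batch <number>. <reason>" shape as the single line quoted above, which may not hold for every message. The function name `removal_reasons` and the saved log file are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count how often each reason appears on
# "Removing batch <number>. <reason>" lines.
#   $1 = file holding saved vxlogview output for nbstserv
removal_reasons() {
    grep -i 'removing batch [0-9]' "$1" |
        # Drop everything up to and including "Removing batch <number>. ",
        # leaving only the reason text.
        sed 's/.*[Rr]emoving batch [0-9]*\. *//' |
        # Group identical reasons and list the most frequent first.
        sort | uniq -c | sort -rn
}
```

A line such as the one above would then be reported as a count followed by "Submitting this batch would exceed STU job limit".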

@wnatale1

The Tape/Disk resource multiplier solves a lot of such issues. I forgot to mention that you should do a case-insensitive search for the word remov; we would have caught it on Tuesday itself.

Removing batch 34535. Submitting this batch would exceed STU job limit

mph999
Level 6
Employee Accredited

He he ... always use grep -i .....  ;0)

The nice thing about the nbstserv log is that you can search for backup IDs, and this should get you near the relevant lines.

Once files are added to the batch, or BID file, you can then look for 'nbduplicate' and see if the command is constructed, and then the admin log should show the bpduplication command being 'run' along with the batch file name.

I believe I did originally grep -i for remov, but the main thing missing there was logging verbosity:

vxlogcfg -a -p 51216 -o 226 -s DebugLevel=6 -s DiagnosticLevel=6

Once this was added, we had many more useful results.