
NetBackup lifecycle duplication didn't run

AlGon
Level 6

Hello,

Netbackup 7.1.0.4

I’m hoping someone can help with this one.

We use storage lifecycle policies to handle the duplication of images from our VTL to several physical tape libraries.  The backup images are retained on the VTL for a few weeks and the tape copy is retained for several weeks, months or years, depending on when the backup is taken.

Recently I needed to restore a file from a month in the past. I discovered that a number of weeks were missing from the restore selection.  This was because the VTL images had already expired and there was no tape copy available. 

In short, what seemed to be happening was the backup to VTL was working OK but the duplication to tape just hadn’t occurred.

When I check our backup logs for failed backups, everything looks OK.  There is nothing to tell me that a lifecycle didn’t initiate the duplication.  Also, because a large number of tapes get ejected each day, it is harder to tell whether a lifecycle skipped its duplication.

In the end the problem was because of a corrupt policy, and to fix the issue I just created a new policy.  It’s now writing to tape.

We want to make sure this hasn’t happened to any other servers.

Is there a way I can tell if a server has written its second copy to tape, basically, confirm the lifecycle performed its duplication?  Maybe a script I can run or possibly something in OpsCenter?  I just need to make sure the duplication to tape for a given server actually occurred.

Hope this makes sense.

Many thanks,

12 REPLIES

AntBar
Level 5

Hi,

I'm not sure, but did you try nbstlutil.exe to list the images affected by the SLP?

AlGon
Level 6

Hi,

Thanks for the reply.

nbstlutil will only tell me about lifecycles that are either running or due to run.

What I need is a way to find out which duplication jobs didn't actually run.  I could use the NBU catalog GUI, enter each server and check whether copy 2 exists.

Of course this would take ages, but this is essentially what I need.

mph999
Level 6
Employee Accredited

nbstlutil list -image_incomplete 

nbstlutil stlilist -image_incomplete

Martin

watsons
Level 6

So you're saying that duplication did happen in the past, but not sure if it really made it to the tape copy?

You can have a script run daily to check that the images meant for duplication really have a 2nd copy, and that their expiration dates are being maintained as they should be.

bpimagelist -backupid <backupID> -U 

From the output, extract the copy number and its media ID with the expiration date. Keep them in your report, and run random restores to confirm the tape copy can do its job.

An important reminder: check whether your SLP duplication setting changes the retention period of your backup copy. The tape copy's retention should be at least as long as the disk copy's.
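
The check described above could be sketched roughly as follows. It assumes you have already reduced the bpimagelist output to a report with one line per copy in a made-up layout (backup ID, copy number, media ID, expiration as epoch seconds). That layout and the name check_second_copy are illustrative assumptions, not bpimagelist's native format.

```shell
#!/bin/sh
# Sketch: flag backups whose copy 2 is missing or expires before copy 1.
# Input (hypothetical report layout, one line per copy):
#   backupid copy_number media_id expiration_epoch_seconds
check_second_copy () {
    awk '
        $2 == 1 { exp1[$1] = $4; seen[$1] = 1 }   # disk/VTL copy
        $2 == 2 { exp2[$1] = $4; seen[$1] = 1 }   # tape copy
        END {
            for (id in seen) {
                if (!(id in exp2))
                    print id, "NO_COPY_2"
                else if ((id in exp1) && exp2[id] + 0 < exp1[id] + 0)
                    print id, "COPY_2_EXPIRES_FIRST"
            }
        }
    ' "$@" | sort
}
```

Called as `check_second_copy report.txt` (or with the report on stdin), it prints one line per problem backup ID and nothing when every image has a tape copy that outlives the disk copy.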

Marianne
Moderator
Partner    VIP    Accredited Certified

Did you remember to change STU in all policies to use SLP instead?

If an STU is specified in any schedule, you need to change it there too.

The SLP Best Practice Guide tells you how to use the nbstlutil command to keep an eye on duplications, to ensure that you have sufficient resources and that duplications are actually going through. It will also report on NOT_STARTED image duplications.

You should also see duplication jobs on a regular basis in Activity Monitor. Is this your experience?

Genericus
Moderator
   VIP   

I have aliases on my Solaris master to check these, as well as to modify the parameters to speed up or slow down the duplication process:

 

The default time window is 24 hours - you have to specify -hoursago if you want to check further back...

 

This checks incomplete SLP:

slpck='bpimagelist -L -idonly -hoursago 120 -stl_incomplete | sort +2Mr +3nr +4r'
 

This gives me a count:

slpcksum='date;bpimagelist -L -idonly -hoursago 120 -stl_incomplete|wc'
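
The count can also be turned into a simple alert. This is only a minimal sketch: it assumes, as the wc in the alias above does, that each non-empty output line corresponds to one incomplete image. The function name slp_backlog_check and the default threshold of 100 are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads a list of SLP-incomplete images on stdin
# (assumed one image per line) and warns when the backlog exceeds a
# threshold. The name and the default threshold are illustrative only.
slp_backlog_check () {
    threshold=${1:-100}
    # Count non-empty lines only; grep -c exits non-zero on a count of 0,
    # hence the "|| true".
    count=$(grep -c '[^[:space:]]' || true)
    if [ "$count" -gt "$threshold" ]; then
        echo "WARNING: $count SLP-incomplete images (threshold $threshold)"
        return 1
    fi
    echo "OK: $count SLP-incomplete images (threshold $threshold)"
    return 0
}
```

Usage would look something like `bpimagelist -L -idonly -hoursago 120 -stl_incomplete | slp_backlog_check 50`, with the non-zero return code available for cron or a monitoring wrapper to act on.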
 

This copies the LONG lifecycle parameters - large size and check only once every 6 hours:

slplong='cp -p /usr/openv/netbackup/db/config/LIFECYCLE_PARAMETERS.LONG /usr/openv/netbackup/db/config/LIFECYCLE_PARAMETERS'


 

This copies the SHORT lifecycle parameters - regular size and check every 20 minutes:

slpshort='cp -p /usr/openv/netbackup/db/config/LIFECYCLE_PARAMETERS.SHORT /usr/openv/netbackup/db/config/LIFECYCLE_PARAMETERS'
 

This copies the SMALL lifecycle parameters - small size and check every 5 minutes, with MAX_MINUTES_TIL_FORCE_SMALL_DUPLICATION_JOB set to 2:

slpsmall='cp -p /usr/openv/netbackup/db/config/LIFECYCLE_PARAMETERS.SMALL /usr/openv/netbackup/db/config/LIFECYCLE_PARAMETERS'
 

This activates any inactive SLP images. The alias points to a script:

slpgo=/var/adm/scripts/slpjob.activate.ksh

The script contains:

for i in `/usr/openv/netbackup/bin/admincmd/bpimagelist -L -idonly -hoursago 120 -stl_incomplete | sort +2M +3n +4 | /usr/bin/cut -d" " -f10`
do
 DATE=`date +%Y%m%d.%H%M`
 echo "Now activating "$i "at "$DATE
 /usr/openv/netbackup/bin/admincmd/nbstlutil active  -wait -backupid $i
 echo "------------------------------------------------"
done
 

This restarts the SLP scan for jobs to run:

slpscan='date; /usr/openv/netbackup/bin/admincmd/nbstlutil new_session'

 

NetBackup 9.1.0.1 on Solaris 11, writing to Data Domain 9800 7.7.4.0
duplicating via SLP to LTO5 & LTO8 in SL8500 via ACSLS

AlGon
Level 6

 

 
Hi Marianne
 
Yes, absolutely.  This has only happened with one server (that we know of) and that was in a policy of its own.  In the end I just recreated the same policy, set it to the same SLP, and it’s been duplicating fine ever since.  Really bizarre.
 
All our policies use SLPs, so we always see SLP duplication jobs in the Activity Monitor.
Because this happened to one server, I've been asked to make sure this isn't happening to others. 
 
I wanted to know if there might be an easy way to find out if all our servers in policies are definitely running the duplication part of the lifecycle.  I'm sure it’s a one off, but I want to be sure.
 
Thanks,
 

Marianne
Moderator
Partner    VIP    Accredited Certified

Are you checking 'nbstlutil stlilist -image_incomplete -U' output on a regular basis?

Any chance this SLP was modified while outstanding duplications existed?

AlGon
Level 6

 

Are you checking 'nbstlutil stlilist -image_incomplete -U' output on a regular basis?
 
Yes, we have to in order to stay on top of things.
 
Any chance this SLP was modified while outstanding duplications existed?
 
Hmm, this is quite possible :(
 
Is there a way I can find out whether all the recently run backup jobs, say from the last couple of days, have made a second backup copy?  If they have, it means that duplication jobs have run for all these servers.
 
Thanks,
 
Al

 

 

Marianne
Moderator
Partner    VIP    Accredited Certified

Just remembered something else - SLP guarantees successful duplication by assigning INFINITY retention to backup copy until such time as duplicate has completed successfully.

The fact that your VTL copy has expired is a clear indication that backup was not done with SLP as destination.

See this topic in chapter 14 of NBU Admin Guide I:

About ensuring successful copies using lifecycles

In a storage lifecycle policy, all copies must be completed. A lifecycle initially tries three times to create a copy. If no copy is created, NetBackup continues to try, but less frequently.

The successful completion of copies is important because a lifecycle does not allow a copy to be expired before all copies are completed to each destination in the lifecycle. Expiration is necessary to free up space on the storage unit for new backups. NetBackup changes the retention period of an image to Infinity until all copies are created. After all copies are complete, the retention returns to the level as set in the policy that writes to the storage destination.

 

AlGon
Level 6

Yes Marianne, good point, this helps clarify things a little more.

For my manager's peace of mind, he's still asking if I can verify that all the servers have a 'copy two' from a recent backup.

Again, thanks for all your help with this.

GlenG
Level 4

AlGon,

 

I do not use Life-Cycles but I use the following bash script code in a much larger script to get details on each "copy".
 
Here's an example of the contents of the file $NBU_preview_out:
 
07/03/2012 19:01:12 PolicyName Incr host_1341360072 master "/stage2fs" 1 1
07/03/2012 19:01:12 PolicyName Incr host_1341360072 master 4245L4 2 0
06/26/2012 22:16:10 PolicyName Incr host_1340766970 master 4204L4 2 1
06/22/2012 19:00:37 PolicyName Weekly host_1340409637 master 4234L4 2 1 4216L4 4010L4 4017L4 4175L4
 
In the "1 1", "2 0" and "2 1" pairs, the first digit is the copy number and the second is the primary flag (1=primary, 0=copy).
/stage2fs is the staging file and the nnnnL4 values are the tape numbers containing the backup.
 
###########################################################################
#
# preview all the backups to get info for report
#
# input  array POLICYTOCOPY
# output files $NBU_preview_out and $NBU_err
###########################################################################
previewallbackups () {               # get info on the requested policies' current backups in the catalog

printf "%-s\n" "    Policy                    Status"
  for i in "${POLICYTOCOPY[@]}"; do
    ES=0                             # work variable for command "exit status"
    CN=1                             # start with copy number 1
    while [ "$ES" -eq 0 ] ; do       # get all backup "copies" in the catalog
      /opt/openv/netbackup/bin/admincmd/bpduplicate -PM -cn "$CN" -s 01/01/05 -policy "$i" >>"$NBU_preview_out" 2>>"$NBU_err"
      ES=$?
#     if [ ${CN} -eq 1 ]             # report on status for copy number 1 only
#       then
          if [ "${ES}" -eq 0 ]
            then
              statusmsg="backup image found"
            else
              statusmsg="backup image not found - bpduplicate exit status=$ES"
          fi
          printf "%-25s %-s\n" "${i}" "${statusmsg}"
#     fi
      CN=$((CN + 1))                 # point to next copy
    done                             # keep going until not found
  done                               # do all policies

echo ">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> previewallbackups <<<<<<<<<<<<<<<<<<<<<<<<<<<<"
cat "$NBU_preview_out"
echo ">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> previewallbackups errors        <<<<<<<<<<<<<<"
cat "$NBU_err"
}
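
To answer the "does every backup have a copy two" question from a file in that layout, something like the following might work. It is only a sketch: the field positions (backup ID in field 5, copy number in field 8) are read off the sample lines above rather than taken from NetBackup documentation, the function name list_missing_copy2 is made up, and a media path containing spaces would throw the fields off.

```shell
#!/bin/sh
# Sketch: list backup IDs that have no copy 2, given lines laid out as
#   date time policy schedule backupid master media copy primary [media...]
# Field positions are an assumption based on the sample output only.
list_missing_copy2 () {
    awk '
        NF >= 9 {
            policy[$5] = $3            # remember the policy for each backup ID
            if ($8 == 2) has2[$5] = 1  # field 8 is the copy number
        }
        END {
            for (id in policy)
                if (!(id in has2))
                    print id, policy[id]
        }
    ' "$@" | sort
}
```

Run as `list_missing_copy2 "$NBU_preview_out"`. With the four sample lines above it prints nothing, because every backup ID listed there has a copy 2; any backup ID that shows up only with copy 1 would be printed along with its policy name.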
 
 
HTH,
GlenG