06-26-2012 03:03 AM
Hello,
Netbackup 7.1.0.4
I’m hoping someone can help with this one.
We use storage lifecycle policies to handle the duplication of images from our VTL to several physical tape libraries. The backup images are retained on the VTL for a few weeks and the tape copy is retained for weeks, months or years, depending on when the backup is taken.
Recently I needed to restore a file from a month in the past. I discovered that a number of weeks were missing from the restore selection. This was because the VTL images had already expired and there was no tape copy available.
In short, what seemed to be happening was the backup to VTL was working OK but the duplication to tape just hadn’t occurred.
When I check our backup logs for failed backups, everything looks OK. There is nothing to tell me that a lifecycle didn’t initiate the duplication. Also, because a large number of tapes get ejected each day, it is harder to tell when a lifecycle didn’t run its duplication.
In the end the problem was caused by a corrupt policy, and to fix the issue I just created a new policy. It’s now writing to tape.
We want to make sure this hasn’t happened to any other servers.
Is there a way I can tell if a server has written its second copy to tape, basically, confirm the lifecycle performed its duplication? Maybe a script I can run or possibly something in OpsCenter? I just need to make sure the duplication to tape for a given server actually occurred.
Hope this makes sense.
Many thanks,
06-26-2012 03:32 AM
Hi,
I'm not sure, but did you try nbstlutil.exe to list the images affected by the SLP?
06-26-2012 03:43 AM
Hi,
Thanks for the reply.
nbstlutil will only tell me about lifecycles that are either running or due to run.
What I need is a way to find out which duplication jobs didn't actually run. I could use the NBU catalog GUI, enter each server and check whether copy 2 exists.
Of course this would take ages, but this is essentially what I need.
06-26-2012 04:13 AM
nbstlutil list -image_incomplete
nbstlutil stlilist -image_incomplete
Martin
06-26-2012 04:15 AM
So you're saying that duplication did happen in the past, but you're not sure if it really made it to the tape copy?
You can have a daily script check that the images meant for duplication really have a 2nd copy, and that their expiration dates are being maintained as they should be.
bpimagelist -backupid <backupID> -U
From the output, extract each copy number and its media ID with the expiration date. Keep them in your report and run a random restore to confirm the tape copy is usable.
An important reminder: check whether your SLP duplication setting changes the retention period of the tape copy. It should be at least as long as that of the disk copy.
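A rough sketch of that daily check, for what it's worth. It is not a tested NetBackup tool: the function name missing_copy2 is made up, and it assumes that `bpimagelist -L` prints a "Backup ID:" line at the start of each image and a "Copy number:" line for each fragment. Check those labels against the output on your own master before relying on it.

```shell
#!/bin/sh
# Sketch: print the backup IDs that have no copy 2.
# ASSUMPTION: stdin is `bpimagelist -L` style text where every image
# starts with a "Backup ID:" line and every fragment has a
# "Copy number:" line. These labels are assumed, so verify them first.
# On Solaris, use nawk or /usr/xpg4/bin/awk (awk functions are needed).
missing_copy2() {
    awk '
        # Print the image collected so far if it never showed a copy 2.
        function report() {
            if (id != "" && !has2) print id
        }
        /^Backup ID:/         { report(); id = $3; has2 = 0 }
        /^[ \t]*Copy number:/ { if ($3 + 0 == 2) has2 = 1 }
        END                   { report() }
    '
}
```

You would feed it one client at a time, for example `bpimagelist -L -client <server> -hoursago 168 | missing_copy2`. Any backup ID it prints is an image with no second copy in that window.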
06-26-2012 05:02 AM
Did you remember to change STU in all policies to use SLP instead?
If an STU is specified in any schedule, you need to change it there too.
The SLP Best Practice Guide tells you how to use the nbstlutil command to keep an eye on duplications, to ensure that you have sufficient resources and that duplications are actually going through. It will also report on NOT_STARTED image duplications.
You should also see duplication jobs on a regular basis in Activity Monitor. Is this your experience?
06-26-2012 12:15 PM
I have aliases on my solaris master to check these as well as to modify the parameters to speed up or slow down the duplication process:
The default look-back time is 24 hours, so you have to specify -hoursago if you want to check further back.
This checks incomplete SLP:
slpck='bpimagelist -L -idonly -hoursago 120 -stl_incomplete | sort +2Mr +3nr +4r'
This gives me a count:
slpcksum='date;bpimagelist -L -idonly -hoursago 120 -stl_incomplete|wc'
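If you want cron to flag a backlog rather than eyeball the wc output, something like the following might do. It is only a sketch: the function name slp_backlog_check and the limit argument are invented here, and it assumes the -idonly listing prints one image per line.

```shell
#!/bin/sh
# Sketch: count the lines on stdin and fail when there are too many.
# ASSUMPTION: each non-empty input line is one incomplete image.
# $1 is the highest acceptable count (defaults to 0).
slp_backlog_check() {
    limit=${1:-0}
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    count=$(grep -c . || true)
    echo "incomplete images: $count"
    [ "$count" -le "$limit" ]
}
```

Pipe the same bpimagelist command used in slpck into `slp_backlog_check 50` and mail yourself when it returns non-zero.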
This copies the LONG lifecycle parameter - large size and check only once every 6 hours
slplong='cp -p /usr/openv/netbackup/db/config/LIFECYCLE_PARAMETERS.LONG /usr/openv/netbackup/db/config/LIFECYCLE_PARAMETERS'
This copies the SHORT lifecycle parameter - regular size and check every 20 minutes
slpshort='cp -p /usr/openv/netbackup/db/config/LIFECYCLE_PARAMETERS.SHORT /usr/openv/netbackup/db/config/LIFECYCLE_PARAMETERS'
This copies the SMALL lifecycle parameter - small size and check every 5 minutes, and MAX_MINUTES_TIL_FORCE_SMALL_DUPLICATION_JOB set to 2
slpsmall='cp -p /usr/openv/netbackup/db/config/LIFECYCLE_PARAMETERS.SMALL /usr/openv/netbackup/db/config/LIFECYCLE_PARAMETERS'
This activates any inactive SLP images (the alias points at a script, whose contents follow it):
slpgo=/var/adm/scripts/slpjob.activate.ksh
# Activate every image whose SLP processing is still incomplete (last 120 hours).
for i in `/usr/openv/netbackup/bin/admincmd/bpimagelist -L -idonly -hoursago 120 -stl_incomplete | sort +2M +3n +4 | /usr/bin/cut -d" " -f10`
do
DATE=`date +%Y%m%d.%H%M`
echo "Now activating $i at $DATE"
/usr/openv/netbackup/bin/admincmd/nbstlutil active -wait -backupid "$i"
echo "------------------------------------------------"
done
This restarts the SLP scan for jobs to run:
slpscan='date; /usr/openv/netbackup/bin/admincmd/nbstlutil new_session'
06-26-2012 12:26 PM
06-26-2012 12:28 PM
Are you checking 'nbstlutil stlilist -image_incomplete -U' output on a regular basis?
Any chance this SLP was modified while outstanding duplications existed?
06-26-2012 11:02 PM
06-27-2012 12:41 AM
Just remembered something else - SLP guarantees successful duplication by assigning INFINITY retention to backup copy until such time as duplicate has completed successfully.
The fact that your VTL copy has expired is a clear indication that backup was not done with SLP as destination.
See this topic in chapter 14 of NBU Admin Guide I:
About ensuring successful copies using lifecycles
In a storage lifecycle policy, all copies must be completed. A lifecycle initially tries
three times to create a copy. If no copy is created, NetBackup continues to try, but
less frequently.
The successful completion of copies is important because a lifecycle does not
allow a copy to be expired before all copies are completed to each destination in
the lifecycle. Expiration is necessary to free up space on the storage unit for new
backups. NetBackup changes the retention period of an image to Infinity until all
copies are created. After all copies are complete, the retention returns to the level
as set in the policy that writes to the storage destination.
06-27-2012 01:30 AM
Yes Marianne, good point, this helps clarify things a little more.
For my manager's peace of mind, he's still asking if I can verify that all the servers have a 'copy two' from a recent backup.
Again, thanks for all your help with this.
07-05-2012 11:04 AM
AIGon,