10-07-2015 08:38 AM
Somewhat tough to explain what I am seeing, but basically, the first weekend of each month we run what we call a Monthly Full. This is where our long-term retention requirements are met: we run full backups, replicate the data via SLP, and then duplicate those backups to tape via SLP, where they are retained for 1 to 7 years depending on requirements. Every Monday after this monthly full weekend we see hundreds of jobs queued up, including the incrementals that should be running, and they then start failing with status 196 (backup window closed) because they have been queued for so long. Those backups are written to disk. This is a busy time, since there is obviously a lot of reading from and writing to this disk, but the SLP replication jobs still run quickly and without issue. I think what we are seeing is a problem with the duplication jobs. I have 4 dedicated LTO-6 drives that duplicate this data to tape, so we aren't fighting for resources between the backups and the duplications.
Not sure where to begin with this one, so I am all ears and looking everywhere. I have noticed that we don't have any lifecycle parameters set at all:
/usr/openv/netbackup/db/config ->more LIFECYCLE_PARAMETERS
#
# Beginning with the NetBackup 7.6 release, Storage Lifecycle configuration
# values are now stored as part of the NetBackup system configuration data
# and can be viewed or changed using the Storage Lifecycle Parameters node
# under Host Properties in the Administration Console or via the bpsetconfig
# and bpgetconfig commands. The prior contents of this file have been automatically
# migrated to the NetBackup system configuration storage. That prior content
# can be viewed in the LIFECYCLE_PARAMETERS.deprecated file for historical purposes.
#
# When modifying values via the bpsetconfig command, be aware that all
# Storage Lifecycle parameter names are now prepended by 'SLP.'. In
# addition, the following parameter names have been changed:
#
# CLEANUP_SESSION_INTERVAL_HOURS is now SLP.CLEANUP_SESSION_INTERVAL
# IMAGE_EXTENDED_RETRY_PERIOD_IN_HOURS is now SLP.IMAGE_EXTENDED_RETRY_PERIOD
# MAX_KB_SIZE_PER_DUPLICATION_JOB is now SLP.MAX_SIZE_PER_DUPLICATION_JOB
# MAX_GB_SIZE_PER_DUPLICATION_JOB is now SLP.MAX_SIZE_PER_DUPLICATION_JOB
# MAX_GB_SIZE_PER_BACKUP_REPLICATION_JOB is now SLP.MAX_SIZE_PER_BACKUP_REPLICATION_JOB
# MAX_MINUTES_TIL_FORCE_SMALL_DUPLICATION_JOB is now SLP.MAX_TIME_TIL_FORCE_SMALL_DUPLICATION_JOB
# MIN_KB_SIZE_PER_DUPLICATION_JOB is now SLP.MIN_SIZE_PER_DUPLICATION_JOB
# MIN_GB_SIZE_PER_DUPLICATION_JOB is now SLP.MIN_SIZE_PER_DUPLICATION_JOB
# VERSION_CLEANUP_DELAY_HOURS is now SLP.VERSION_CLEANUP_DELAY
#
# Size and time values are now specified using units like 'minutes' or 'gigabytes'.
#
# The following parameters have been deprecated due to changes in SLP processing
# and are no longer recognized:
#
# = DUPLICATION_SESSION_INTERVAL_MINUTES
# = IMPORT_SESSION_TIMER
#
# See the NetBackup Adminstrator's Guide, Volume 1 for more information.
#
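The header above lists how the old parameter names map to the new 'SLP.'-prefixed ones. As a minimal sketch of that mapping, the shell function below rewrites old-style lines into the new form. It assumes the legacy file held one "NAME value" pair per line and that unit words such as 'kilobytes', 'gigabytes', 'hours' and 'minutes' are accepted (based on the header's wording); legacy_to_slp is a hypothetical helper, not a NetBackup command. Since 7.6 already migrated the old contents automatically, this is mainly a way to read LIFECYCLE_PARAMETERS.deprecated in the new vocabulary.

```shell
# Hypothetical helper: translate old-style LIFECYCLE_PARAMETERS lines
# ("NAME value", assumed format) into 7.6-style "SLP.NAME = value unit" lines,
# using the renames listed in the file header. Unit words are an assumption.
legacy_to_slp() {
  awk '
    BEGIN {
      # old name -> new name and unit, per the header above
      ren["CLEANUP_SESSION_INTERVAL_HOURS"]              = "CLEANUP_SESSION_INTERVAL hours"
      ren["IMAGE_EXTENDED_RETRY_PERIOD_IN_HOURS"]        = "IMAGE_EXTENDED_RETRY_PERIOD hours"
      ren["MAX_KB_SIZE_PER_DUPLICATION_JOB"]             = "MAX_SIZE_PER_DUPLICATION_JOB kilobytes"
      ren["MAX_GB_SIZE_PER_DUPLICATION_JOB"]             = "MAX_SIZE_PER_DUPLICATION_JOB gigabytes"
      ren["MAX_GB_SIZE_PER_BACKUP_REPLICATION_JOB"]      = "MAX_SIZE_PER_BACKUP_REPLICATION_JOB gigabytes"
      ren["MAX_MINUTES_TIL_FORCE_SMALL_DUPLICATION_JOB"] = "MAX_TIME_TIL_FORCE_SMALL_DUPLICATION_JOB minutes"
      ren["MIN_KB_SIZE_PER_DUPLICATION_JOB"]             = "MIN_SIZE_PER_DUPLICATION_JOB kilobytes"
      ren["MIN_GB_SIZE_PER_DUPLICATION_JOB"]             = "MIN_SIZE_PER_DUPLICATION_JOB gigabytes"
      ren["VERSION_CLEANUP_DELAY_HOURS"]                 = "VERSION_CLEANUP_DELAY hours"
      # deprecated in 7.6 and no longer recognized
      gone["DUPLICATION_SESSION_INTERVAL_MINUTES"] = 1
      gone["IMPORT_SESSION_TIMER"] = 1
    }
    /^[ \t]*#/ || NF == 0 { next }     # skip comments and blank lines
    $1 in gone { next }                # drop deprecated parameters
    $1 in ren {
      split(ren[$1], p, " ")
      print "SLP." p[1] " = " $2 " " p[2]
      next
    }
    { print "SLP." $1 " = " $2 }       # assumed: other names only gain the prefix
  ' "$@"
}
```

For example, `legacy_to_slp /usr/openv/netbackup/db/config/LIFECYCLE_PARAMETERS.deprecated` prints the translated lines for review.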
I also noticed that our OFFSITE SLPs, which do the duplication, are not set to preserve multiplexing...
This is my current SLP backlog...
Backlog of incomplete SLP Copies
In Process (Storage Lifecycle State: 2):
Number of copies: 974
Total expected size 114655594 MB
SLP Name:                (state)   Number of copies:           Size:
BR_OFFSITE_P09           (active)                 14       861128 MB
XP53TAPE008_SLP          (active)                  9       758711 MB
XP53TAPE008_SLP_OFFSITE  (active)                394     29658783 MB
XP53TAPE009_SLP          (active)                 20      4380659 MB
XP53TAPE009_SLP_OFFSITE  (active)                227     12631405 MB
XP53TAPE010_SLP          (active)                 18      8309785 MB
XP53TAPE010_SLP_OFFSITE  (active)                292     58055121 MB
Total:                                           974    114655592 MB
The _OFFSITE ones are the ones doing the duplication operations.
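To put the backlog above in perspective, here is a rough sketch that totals the report per SLP group and estimates pure streaming time for the duplications. It assumes that SLPs whose names end in _OFFSITE are the tape duplications (so BR_OFFSITE_P09 counts as "other"), that each of the four LTO-6 drives sustains its native 160 MB/s, and it ignores mounts, positioning, and read contention on the disk pool; backlog_summary is a hypothetical helper.

```shell
# Hypothetical helper: total an SLP backlog report per group and estimate
# streaming time. Assumes names ending in _OFFSITE are the tape duplications
# and that every drive sustains the given MB/s with no mount or positioning time.
backlog_summary() {
  awk -v drives="$1" -v rate="$2" '
    $2 ~ /^\(/ && $NF == "MB" {          # per-SLP rows: NAME (state) COPIES SIZE MB
      kind = ($1 ~ /_OFFSITE$/) ? "offsite" : "other"
      copies[kind] += $3
      mb[kind] += $4
      total += $4
    }
    END {
      n = split("offsite other", order, " ")
      for (i = 1; i <= n; i++) {
        k = order[i]
        printf "%s copies=%d size_mb=%d share=%.1f%%\n", k, copies[k], mb[k], 100 * mb[k] / total
      }
      # hours = MB / (drives * MB per second) / 3600
      printf "offsite_hours=%.1f\n", mb["offsite"] / (drives * rate) / 3600
    }'
}

# The report from the post, 4 LTO-6 drives at an assumed native 160 MB/s.
backlog_summary 4 160 <<'EOF'
BR_OFFSITE_P09 (active) 14 861128 MB
XP53TAPE008_SLP (active) 9 758711 MB
XP53TAPE008_SLP_OFFSITE (active) 394 29658783 MB
XP53TAPE009_SLP (active) 20 4380659 MB
XP53TAPE009_SLP_OFFSITE (active) 227 12631405 MB
XP53TAPE010_SLP (active) 18 8309785 MB
XP53TAPE010_SLP_OFFSITE (active) 292 58055121 MB
EOF
```

With these numbers the _OFFSITE SLPs hold 913 of the 974 copies and about 87.5% of the size, which works out to roughly 43.6 hours of uninterrupted writing across four drives: a best case before any real-world overhead.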
10-07-2015 08:41 AM
Also worth noting: I don't see any data buffer files (or anything else) in /usr/openv/netbackup/db/config on the media servers that have the disk pools attached.
NetBackup Version: 7.6.0.2
Media Server OS: Linux 2.6.32-504.16.2.el6.x86_64 #1 SMP Tue Mar 10 17:01:00 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux
Master Server OS: Solaris 10
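On the missing data buffer files: NetBackup reads buffer tuning from optional touch files such as SIZE_DATA_BUFFERS and NUMBER_DATA_BUFFERS (plus the _DISK variants) in that config directory, and falls back to its defaults when they are absent. A minimal sketch to report what each media server has set; check_buffer_files is a hypothetical helper, and it makes no claim about what values to use.

```shell
# Hypothetical helper: show which optional buffer-tuning touch files exist
# in a media server config directory (defaults to the path from the post).
check_buffer_files() {
  dir=${1:-/usr/openv/netbackup/db/config}
  for f in SIZE_DATA_BUFFERS NUMBER_DATA_BUFFERS \
           SIZE_DATA_BUFFERS_DISK NUMBER_DATA_BUFFERS_DISK; do
    if [ -f "$dir/$f" ]; then
      printf '%s=%s\n' "$f" "$(cat "$dir/$f")"   # file present: show its value
    else
      printf '%s=unset (default applies)\n' "$f"
    fi
  done
}
```

Running `check_buffer_files` on each of the three media servers gives a quick side-by-side view.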
10-07-2015 09:09 AM
Grabbed the bptm logs from 10-3, when the monthly full backups started, from all 3 of the media servers with disk attached.
10-07-2015 09:13 AM
I also grabbed the bptm logs from 10-4 through today, in case there is any useful information in there.
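When pulling bptm logs for a range like 10-3 onward, a small helper can list the files per day. This sketch assumes legacy debug logs named log.MMDDYY under /usr/openv/netbackup/logs/bptm and GNU date (present on the RHEL 6 media servers, not on the Solaris 10 master); bptm_logs_between is hypothetical.

```shell
# Hypothetical helper: list legacy bptm debug logs for an inclusive date range.
# Assumes files named log.MMDDYY and GNU date (Linux media servers only).
bptm_logs_between() {
  start=$1 end=$2
  dir=${3:-/usr/openv/netbackup/logs/bptm}
  last=$(date -d "$end" +%Y%m%d)
  d=$(date -d "$start" +%Y-%m-%d)
  while [ "$(date -d "$d" +%Y%m%d)" -le "$last" ]; do
    f="$dir/log.$(date -d "$d" +%m%d%y)"   # e.g. log.100315 for 2015-10-03
    if [ -f "$f" ]; then
      echo "$f"
    fi
    d=$(date -d "$d + 1 day" +%Y-%m-%d)
  done
}
```

For example, `tar czf bptm_$(hostname).tgz $(bptm_logs_between 2015-10-03 2015-10-07)` bundles the logs from the monthly full weekend onward.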
10-07-2015 11:45 AM
Since you are on 7.6, I would implement SLP windows so that duplication occurs during the daytime. I would also ensure backups have a higher priority than duplications. Reading and writing on the same disk at the same time will impact overall performance, so it is best to separate the two workloads.
LIFECYCLE_PARAMETERS is deprecated in 7.6. The same settings can now be found under the master server's host properties. You won't find a setting that solves your issue, but tweaking the workload may ease it.
There is a technote describing the meaning of the SLP settings. It was written for 7.5, which still uses the LIFECYCLE_PARAMETERS file; just ignore that part.
http://www.veritas.com/docs/000023582
Duplication from disk to tape is by definition non-multiplexed. Preserve multiplexing applies only when duplicating tapes.
10-08-2015 09:49 AM
This makes sense. Thanks again for your help.
10-08-2015 11:50 AM
Be aware: SLP windows do not just set start times; they kill duplications that are not complete by the end of the window. OW.
You can still set SLP parameters at the OS level (at least I do, on Solaris with NBU 7.6.0.4).
I keep two config files in /usr/openv/var/global/ so I can switch between configurations on the fly:
# cat /usr/openv/var/global/nbcl.conf.big
SLP.MIN_SIZE_PER_DUPLICATION_JOB = 64 GB
SLP.MAX_SIZE_PER_DUPLICATION_JOB = 512 GB
SLP.JOB_SUBMISSION_INTERVAL = 20 minutes
SLP.IMAGE_PROCESSING_INTERVAL = 20 minutes
SLP.IMAGE_EXTENDED_RETRY_PERIOD = 1 hour
SLP.MAX_TIME_TIL_FORCE_SMALL_DUPLICATION_JOB = 20 minutes
# cat /usr/openv/var/global/nbcl.conf.small
SLP.MIN_SIZE_PER_DUPLICATION_JOB = 32 GB
SLP.MAX_SIZE_PER_DUPLICATION_JOB = 64 GB
SLP.JOB_SUBMISSION_INTERVAL = 5 minutes
SLP.IMAGE_PROCESSING_INTERVAL = 5 minutes
SLP.IMAGE_EXTENDED_RETRY_PERIOD = 10 minutes
SLP.MAX_TIME_TIL_FORCE_SMALL_DUPLICATION_JOB = 2 minutes
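Since an SLP window kills duplications that have not finished when it closes, the maximum duplication job size matters. A rough sketch of how long one job at each profile's SLP.MAX_SIZE_PER_DUPLICATION_JOB (512 GB and 64 GB above) takes on a single drive, and whether it fits in the time left in a window. Assumptions: one LTO-6 drive streaming at its native 160 MB/s, 1 GB = 1024 MB, no mount or positioning time; both function names are hypothetical.

```shell
# Hypothetical helpers: rough time for one duplication job of a given size on
# one drive, and whether it would finish in the minutes left in an SLP window.
# Assumptions: 1 GB = 1024 MB, constant streaming rate, no mount/position time.
minutes_per_job() {   # usage: minutes_per_job SIZE_GB RATE_MB_PER_S
  awk -v gb="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", gb * 1024 / rate / 60 }'
}

fits_in_window() {    # usage: fits_in_window SIZE_GB RATE_MB_PER_S MINUTES_LEFT
  # exit status 0 when the job is expected to finish before the window closes
  awk -v gb="$1" -v rate="$2" -v left="$3" \
    'BEGIN { exit !(gb * 1024 / rate / 60 <= left) }'
}

# The two MAX_SIZE_PER_DUPLICATION_JOB values, one LTO-6 drive at 160 MB/s
minutes_per_job 512 160
minutes_per_job 64 160
```

At 160 MB/s a 512 GB job needs about 54.6 minutes and a 64 GB job about 6.8 minutes, so a job started late in a window is far more likely to be killed under the "big" profile.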
I also set up aliases, so I can type slpbig or slpsmall to change the config on the fly:
alias slpbig='/usr/openv/netbackup/bin/admincmd/bpsetconfig /usr/openv/var/global/nbcl.conf.big'
alias slpsmall='/usr/openv/netbackup/bin/admincmd/bpsetconfig /usr/openv/var/global/nbcl.conf.small'
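A minimal sketch of the same switch as a single shell function instead of two aliases, which rejects unknown profile names and unreadable files before calling bpsetconfig. BPSETCONFIG and SLP_CONF_DIR are hypothetical overrides (setting BPSETCONFIG=echo gives a dry run); the defaults are the paths used in the aliases.

```shell
# Hypothetical alternative to the two aliases: one function that validates the
# profile name and file before calling bpsetconfig. BPSETCONFIG and
# SLP_CONF_DIR are optional overrides (BPSETCONFIG=echo gives a dry run).
slp_profile() {
  case "$1" in
    big|small) ;;
    *) echo "usage: slp_profile big|small" >&2; return 2 ;;
  esac
  cmd=${BPSETCONFIG:-/usr/openv/netbackup/bin/admincmd/bpsetconfig}
  conf="${SLP_CONF_DIR:-/usr/openv/var/global}/nbcl.conf.$1"
  if [ ! -r "$conf" ]; then
    echo "slp_profile: cannot read $conf" >&2
    return 1
  fi
  "$cmd" "$conf"   # same call the aliases make, with the chosen file
}
```

`slp_profile big` or `slp_profile small` then does what slpbig or slpsmall does, but fails loudly on a typo instead of silently doing nothing.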
10-08-2015 11:57 AM
In my case, I have an entire month before the duplication jobs queue up again, so if I can't get them all done in a week, it's no big deal if a few get stopped here and there.