Slow Duplication Jobs Which Then Slow Other Jobs Down

backup-botw
Level 6

This is somewhat tough to explain, but basically on the first weekend of each month we do what we call a Monthly Full. This is where our long-term retention requirements are met: we run full backups, replicate the data via SLP, and then duplicate these backups to tape via SLP, where they are retained for 1 to 7 years based on requirements. Each Monday after this monthly full weekend we see hundreds of jobs queued up, including the incrementals that should be running, and then they start failing with status 196 because they have been queued for so long. Those backups are written to disk. This is a busy time because there is obviously a lot of reading from and writing to this disk, but the replication SLP jobs still run quickly and without issue. I think what we are seeing is a problem with the duplication jobs. I have 4 dedicated LTO 6 drives that dupe this data to tape, so we aren't fighting for resources between the backups and duplications.

Not sure where to begin with this one, so I am all ears and just looking all over. I have noticed that we don't have any lifecycle parameters set at all.

/usr/openv/netbackup/db/config ->more LIFECYCLE_PARAMETERS

#
# Beginning with the NetBackup 7.6 release, Storage Lifecycle configuration
# values are now stored as part of the NetBackup system configuration data
# and can be viewed or changed using the Storage Lifecycle Parameters node
# under Host Properties in the Administration Console or via the bpsetconfig
# and bpgetconfig commands. The prior contents of this file have been automatically
# migrated to the NetBackup system configuration storage. That prior content
# can be viewed in the LIFECYCLE_PARAMETERS.deprecated file for historical purposes.
#
# When modifying values via the bpsetconfig command, be aware that all
# Storage Lifecycle parameter names are now prepended by 'SLP.'. In
# addition, the following parameter names have been changed:
#
# CLEANUP_SESSION_INTERVAL_HOURS              is now SLP.CLEANUP_SESSION_INTERVAL
# IMAGE_EXTENDED_RETRY_PERIOD_IN_HOURS        is now SLP.IMAGE_EXTENDED_RETRY_PERIOD
# MAX_KB_SIZE_PER_DUPLICATION_JOB             is now SLP.MAX_SIZE_PER_DUPLICATION_JOB
# MAX_GB_SIZE_PER_DUPLICATION_JOB             is now SLP.MAX_SIZE_PER_DUPLICATION_JOB
# MAX_GB_SIZE_PER_BACKUP_REPLICATION_JOB      is now SLP.MAX_SIZE_PER_BACKUP_REPLICATION_JOB
# MAX_MINUTES_TIL_FORCE_SMALL_DUPLICATION_JOB is now SLP.MAX_TIME_TIL_FORCE_SMALL_DUPLICATION_JOB
# MIN_KB_SIZE_PER_DUPLICATION_JOB             is now SLP.MIN_SIZE_PER_DUPLICATION_JOB
# MIN_GB_SIZE_PER_DUPLICATION_JOB             is now SLP.MIN_SIZE_PER_DUPLICATION_JOB
# VERSION_CLEANUP_DELAY_HOURS                 is now SLP.VERSION_CLEANUP_DELAY
#
# Size and time values are now specified using units like 'minutes' or 'gigabytes'.
#
# The following parameters have been deprecated due to changes in SLP processing
# and are no longer recognized:
#
# = DUPLICATION_SESSION_INTERVAL_MINUTES
# = IMPORT_SESSION_TIMER
#
# See the NetBackup Adminstrator's Guide, Volume 1 for more information.
#
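
The comments in that file say the values now live in the system configuration and can be read with bpgetconfig, so an empty file does not by itself mean nothing is set. A minimal sketch for listing them, assuming the SLP settings appear in the bpgetconfig output as lines beginning with `SLP.` (the exact output format is an assumption, and `slp_settings` is a made-up helper name):

```shell
# Keep only Storage Lifecycle settings from configuration text on stdin.
# Assumption: each setting is printed on its own line starting with "SLP.".
slp_settings() {
    grep -E '^[[:space:]]*SLP\.'
}

# Possible usage on the master server:
#   /usr/openv/netbackup/bin/admincmd/bpgetconfig | slp_settings
```

If the filter prints nothing, the parameters are most likely still at their defaults, which can be checked in the Storage Lifecycle Parameters node under Host Properties.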

Also, I noticed that our OFFSITE SLPs that do the duplication are not set to preserve multiplexing...

mpx.png

 

This is my current SLP backlog...

Backlog of incomplete SLP Copies
        In Process (Storage Lifecycle State: 2):
                Number of copies:       974
                Total expected size     114655594 MB

SLP Name: (state)                  Number of copies:  Size:
BR_OFFSITE_P09 (active)                           14     861128 MB
XP53TAPE008_SLP (active)                           9     758711 MB
XP53TAPE008_SLP_OFFSITE (active)                 394   29658783 MB
XP53TAPE009_SLP (active)                          20    4380659 MB
XP53TAPE009_SLP_OFFSITE (active)                 227   12631405 MB
XP53TAPE010_SLP (active)                          18    8309785 MB
XP53TAPE010_SLP_OFFSITE (active)                 292   58055121 MB

Total:                                           974  114655592 MB

 

The _OFFSITE ones are the ones doing the duplication operations.
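
To put that backlog in perspective, here is a rough best-case calculation of how long the tape duplications would take. It is only a sketch: it assumes the LTO-6 native (uncompressed) rate of 160 MB/s per drive, all 4 drives streaming continuously, and that "the _OFFSITE ones" means the SLPs whose names end in `_OFFSITE` (so BR_OFFSITE_P09 is left out). The function names are made up for illustration.

```shell
#!/bin/sh
# Backlog rows copied from the report above: name, state, copies, size, unit.
backlog='BR_OFFSITE_P09 (active) 14 861128 MB
XP53TAPE008_SLP (active) 9 758711 MB
XP53TAPE008_SLP_OFFSITE (active) 394 29658783 MB
XP53TAPE009_SLP (active) 20 4380659 MB
XP53TAPE009_SLP_OFFSITE (active) 227 12631405 MB
XP53TAPE010_SLP (active) 18 8309785 MB
XP53TAPE010_SLP_OFFSITE (active) 292 58055121 MB'

# Sum the size column (MB) for SLPs whose name ends in _OFFSITE.
offsite_mb() {
    printf '%s\n' "$backlog" |
        awk '$1 ~ /_OFFSITE$/ { sum += $4 } END { printf "%d\n", sum }'
}

# Best-case hours to drain a backlog: MB / (drives * MB/s per drive) / 3600.
# Arguments: backlog_mb drives mb_per_sec_per_drive
drain_hours() {
    awk -v mb="$1" -v d="$2" -v r="$3" \
        'BEGIN { printf "%.1f\n", mb / (d * r) / 3600 }'
}

total=$(offsite_mb)
echo "Offsite backlog: $total MB"
echo "Best case, 4 drives at 160 MB/s: $(drain_hours "$total" 4 160) hours"
```

With the numbers above this comes to 100345309 MB and roughly 43.6 hours, and that is a floor. Any time the disk cannot feed the drives fast enough to keep them streaming, the real figure will be higher.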

Accepted Solutions

Nicolai
Moderator, Partner, VIP

Since you are on 7.6, I would implement SLP windows so that duplication occurs during the daytime. I would also ensure that backups have a higher priority than duplications. Writing to and reading from the same disk at the same time will impact overall performance, so it is best to separate the two workloads.

LIFECYCLE_PARAMETERS is deprecated in 7.6. The same settings can now be found under the master server settings. You won't find a setting there that solves your issue, but tweaking the workload may ease it.

There is a TN describing the meaning of the SLP settings. It was written for 7.5, which still uses the LIFECYCLE_PARAMETERS file; just ignore that part.

http://www.veritas.com/docs/000023582

Duplication from disk to tape is by definition non-multiplexed. Preserve multiplexing is only for duplicating tapes.

 

7 REPLIES

backup-botw
Level 6

Also worth noting that I don't see any data buffer files or anything similar in /usr/openv/netbackup/db/config on the media servers with the disk pools attached.

NetBackup Version: 7.6.0.2

Media Server OS: Linux 2.6.32-504.16.2.el6.x86_64 #1 SMP Tue Mar 10 17:01:00 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux

Master Server OS: Solaris 10

backup-botw
Level 6

Grabbed the bptm logs from all 3 of the media servers with disk attached, starting from 10-3, which is when the full monthly backups started.

backup-botw
Level 6

I also grabbed the bptm logs from 10-4 to today just in case there may be any good information in there.

backup-botw
Level 6

This makes sense. Thanks again for your help.

Genericus
Moderator, VIP

Be aware - SLP windows are not just setting start times, they kill duplications if they are not complete by the end of the window, OW.

You can still set SLP parameters at the OS level (at least I do, using Solaris, on NBU 7.6.0.4).

The files are located in /usr/openv/var/global/

I set up aliases and files so I can change between configs on the fly.

# cat /usr/openv/var/global/nbcl.conf.big

SLP.MIN_SIZE_PER_DUPLICATION_JOB = 64 GB
SLP.MAX_SIZE_PER_DUPLICATION_JOB = 512 GB
SLP.JOB_SUBMISSION_INTERVAL = 20 minutes
SLP.IMAGE_PROCESSING_INTERVAL = 20 minutes
SLP.IMAGE_EXTENDED_RETRY_PERIOD = 1 hour
SLP.MAX_TIME_TIL_FORCE_SMALL_DUPLICATION_JOB = 20 minutes

 

# cat /usr/openv/var/global/nbcl.conf.small

SLP.MIN_SIZE_PER_DUPLICATION_JOB = 32 GB
SLP.MAX_SIZE_PER_DUPLICATION_JOB = 64 GB
SLP.JOB_SUBMISSION_INTERVAL = 5 minutes
SLP.IMAGE_PROCESSING_INTERVAL = 5 minutes
SLP.IMAGE_EXTENDED_RETRY_PERIOD = 10 minutes
SLP.MAX_TIME_TIL_FORCE_SMALL_DUPLICATION_JOB = 2 minutes

 

 

I set up aliases, so I can type slpbig or slpsmall and change the config on the fly.

slpbig='/usr/openv/netbackup/bin/admincmd/bpsetconfig /usr/openv/var/global/nbcl.conf.big'

slpsmall='/usr/openv/netbackup/bin/admincmd/bpsetconfig /usr/openv/var/global/nbcl.conf.small'
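
The same idea can also be written as one small shell function instead of an alias per profile. This is only a sketch built on the paths shown above: `slp_profile`, `SLP_CONF_DIR` and `BPSETCONFIG` are hypothetical names, and the only NetBackup call is the same `bpsetconfig <file>` invocation the aliases already use.

```shell
# Defaults taken from the aliases above; both can be overridden,
# e.g. BPSETCONFIG=echo gives a dry run that only prints the file name.
BPSETCONFIG=${BPSETCONFIG:-/usr/openv/netbackup/bin/admincmd/bpsetconfig}
SLP_CONF_DIR=${SLP_CONF_DIR:-/usr/openv/var/global}

# slp_profile NAME: apply $SLP_CONF_DIR/nbcl.conf.NAME via bpsetconfig.
# Refuses to run if the profile file is missing or unreadable.
slp_profile() {
    conf="$SLP_CONF_DIR/nbcl.conf.$1"
    if [ ! -r "$conf" ]; then
        echo "slp_profile: cannot read $conf" >&2
        return 1
    fi
    "$BPSETCONFIG" "$conf"
}
```

With this, `slp_profile big` and `slp_profile small` would do what slpbig and slpsmall do, and a mistyped profile name fails with a message instead of being passed to bpsetconfig.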

 

 

NetBackup 9.1.0.1 on Solaris 11, writing to Data Domain 9800 7.7.4.0
duplicating via SLP to LTO5 & LTO8 in SL8500 via ACSLS

backup-botw
Level 6

In my case I have an entire month before the duplication jobs would be queued up again, so if I can't get them all done in a week, it's no big deal if a few get stopped here and there.