SLP_Multiple_Lifecycle replication job???

Stanleyj
Level 6

After upgrading to 7.5.0.6 I started noticing an "SLP_Multiple_Lifecycle" job kick off that usually contains around 50 - 100 images and takes forever to replicate offsite.  I have 6 SLPs (3 for production, 3 for model (monthly, weekly, daily)) that have been in place since I set up the system 4 years ago, and they have always kicked off in a pretty decent time frame, usually directly after a collection of jobs runs.

Now this multiple lifecycle job contains clients that backed up 12 hours earlier, and it kicks off directly after my catalog backup, around 8:15am every day (the catalog backup starts at 8am)???  The job will then run for almost 5 hours on most days, transferring 2 GB with speeds lingering around 78 KB/sec.  The WAN link is a 200 Mb line, and I have seen 5 and 6 gigs of data transfer within 30 minutes on other replication jobs from the same evening.
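
As a rough sanity check on those figures, the average rate implied by a transfer size and a duration can be worked out directly. This is only a sketch: it assumes the sizes are gigabytes (taken here as 1024 * 1024 KB each), and `average_kb_per_sec` is a made-up helper name, not anything from NetBackup.

```python
def average_kb_per_sec(size_gb: float, hours: float) -> float:
    """Average throughput in KB/sec for size_gb gigabytes moved in `hours` hours."""
    size_kb = size_gb * 1024 * 1024  # assumption: 1 GB = 1024 * 1024 KB
    return size_kb / (hours * 3600)


# The slow multiple lifecycle job: about 2 GB in almost 5 hours.
slow = average_kb_per_sec(2, 5)
# A normal replication job from the same evening: about 5 GB in 30 minutes.
fast = average_kb_per_sec(5, 0.5)

print(f"slow job: {slow:.0f} KB/sec")
print(f"fast job: {fast:.0f} KB/sec")
print(f"fast job is {fast / slow:.0f}x quicker on average")
```

With these inputs the slow job averages a little over 100 KB/sec and the fast one close to 3,000 KB/sec, about a 25x gap on the same link.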

The majority of the images are from SQL jobs, but standard file server backups are mixed in as well.  I thought this could be caused by the clients running 7.5.0.6, but I have rolled back several of them and they are still replicating inside the multiple lifecycle job.

I've been trying to figure this out for several months now, and I just can't make any connection as to why these images wait so long before replicating and why they run so slowly.

 

Mark_Solutions
Level 6
Partner Accredited Certified

You can control everything with the LIFECYCLE_PARAMETERS file on the Master Server

The tech support site seems to be down at the moment but a google or search on here will give you all of the details you need

There is a grouping option, which is what allows the Multiple SLPs to run

If the SQL backups are small then I assume your current parameters file has a long time period before small duplications will run

All of your tuning and how the duplications / replications run are controlled by this file so see what its current settings are and tune to suit
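
For illustration, the file is plain text with one setting per line. The sketch below reads such text into a dictionary so the current settings can be reviewed at a glance. It assumes a `NAME value` layout separated by whitespace; the parameter names are given as recalled from the NetBackup 7.5 documentation and should be checked against the admin guide for your version; the sample values are placeholders, not defaults or recommendations; and `parse_lifecycle_parameters` is a hypothetical helper, not part of NetBackup.

```python
def parse_lifecycle_parameters(text: str) -> dict[str, int]:
    """Parse 'NAME value' lines into a dict, skipping blank or malformed lines."""
    params = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only lines that look like: NAME <integer>
        if len(parts) == 2 and parts[1].isdigit():
            params[parts[0]] = int(parts[1])
    return params


# Placeholder content for illustration only (values are NOT defaults).
SAMPLE = """\
MIN_GB_SIZE_PER_DUPLICATION_JOB 4
MAX_MINUTES_TIL_FORCE_SMALL_DUPLICATION_JOB 20
DUPLICATION_GROUP_CRITERIA 0
"""

settings = parse_lifecycle_parameters(SAMPLE)
for name, value in settings.items():
    print(f"{name} = {value}")
```

Anything not listed in the file is left out of the dictionary, which matches the idea that missing settings fall back to whatever the product uses by default.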

Hope this helps

Stanleyj
Level 6

Mark,

I did some searching just after I posted this and found another post you had commented on about this touch file.  I looked on my media server and I don't seem to have this particular file in /usr/openv/netbackup/db/config

Is there another location it might be in?  Or could this be my problem altogether?

Mark_Solutions
Level 6
Partner Accredited Certified

Should be on your Master Server only

If it is not there then it will use all defaults

You can create it, put in your selections and options, re-start NetBackup and it should take effect

Stanleyj
Level 6

Something very strange happened today.  I looked at my jobs, and the multiple lifecycle job that I opened this post about was broken out into two replication jobs like it's supposed to be????  I never found or created the file we were discussing.

I really hate these unknown "things" that go on with NetBackup.  I get this weirdo crap all the time that appears and then just starts working.

But...  (there always is one) even though the replication job was broken up, the transfer rate was still only 78 KB/sec for both streams.  One of them transferred 29 MB in 1 1/2 hours.  That doesn't make any sense.



This is from a slow job:
9/13/2013 3:06:44 AM - Info bpdm(pid=27429) started           
9/13/2013 3:06:44 AM - started process bpdm (27429)
9/13/2013 3:06:44 AM - requesting resource @aaaab
9/13/2013 3:06:44 AM - granted resource MediaID=@aaaab;DiskVolume=PureDiskVolume;DiskPool=dp_disk_nbuappl01;Path=PureDiskVolume;StorageServe...
9/13/2013 3:15:58 AM - Info nbuappl01(pid=27429) Using OpenStorage to replicate backup id T190sql01.dev_1379053838, media id @aaaab, storage server nbuappl01, disk volume PureDiskVolume
9/13/2013 3:15:58 AM - Info nbuappl01(pid=27429) Replicating images to target storage server drnbuappl01, disk volume PureDiskVolume  
9/13/2013 3:17:01 AM - Info nbuappl01(pid=27429) StorageServer=PureDisk:nbuappl01; Report=PDDO Stats for (nbuappl01): scanned: 4845 KB, CR sent: 28 KB, CR sent over FC: 0 KB, dedup: 99.4%

Here is what I should be expecting:
9/12/2013 11:18:27 PM - Info bpdm(pid=23989) started           
9/12/2013 11:18:27 PM - started process bpdm (23989)
9/12/2013 11:18:27 PM - requesting resource @aaaab
9/12/2013 11:18:28 PM - granted resource MediaID=@aaaab;DiskVolume=PureDiskVolume;DiskPool=dp_disk_nbuappl01;Path=PureDiskVolume;StorageServe...
9/12/2013 11:18:33 PM - Info nbuappl01(pid=23989) Using OpenStorage to replicate backup id P190sql01_1379040948, media id @aaaab, storage server nbuappl01, disk volume PureDiskVolume
9/12/2013 11:18:33 PM - Info nbuappl01(pid=23989) Replicating images to target storage server drnbuappl01, disk volume PureDiskVolume  
9/12/2013 11:19:33 PM - Info nbuappl01(pid=23989) StorageServer=PureDisk:nbuappl01; Report=PDDO Stats for (nbuappl01): scanned: 410349 KB, CR sent: 234 KB, CR sent over FC: 0 KB, dedup: 99.9%
9/12/2013 11:19:33 PM - replicated backup id P190sql01_1379040948 successfully
9/12/2013 11:19:33 PM - end operation

If I'm reading this right, the second job sent quadruple the data over in almost 1 minute, where the first one sent it in 15 minutes??
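
One way to check that reading is to pull the timestamps and PDDO stats out of the two excerpts and compare them. The sketch below does that with abbreviated copies of the log lines above (the `granted resource` lines are cut short, as they are in the excerpts); `summarize` is a hypothetical helper, and the phase names `wait_s` and `copy_s` are labels chosen here, not NetBackup terms.

```python
import re
from datetime import datetime

STAMP = re.compile(r"^(\d+/\d+/\d+ \d+:\d+:\d+ [AP]M) - (.*)$")
STATS = re.compile(r"scanned: (\d+) KB, CR sent: (\d+) KB")

SLOW = """\
9/13/2013 3:06:44 AM - granted resource MediaID=@aaaab;...
9/13/2013 3:15:58 AM - Info nbuappl01(pid=27429) Replicating images to target storage server drnbuappl01, disk volume PureDiskVolume
9/13/2013 3:17:01 AM - Info nbuappl01(pid=27429) StorageServer=PureDisk:nbuappl01; Report=PDDO Stats for (nbuappl01): scanned: 4845 KB, CR sent: 28 KB, CR sent over FC: 0 KB, dedup: 99.4%
"""

FAST = """\
9/12/2013 11:18:28 PM - granted resource MediaID=@aaaab;...
9/12/2013 11:18:33 PM - Info nbuappl01(pid=23989) Replicating images to target storage server drnbuappl01, disk volume PureDiskVolume
9/12/2013 11:19:33 PM - Info nbuappl01(pid=23989) StorageServer=PureDisk:nbuappl01; Report=PDDO Stats for (nbuappl01): scanned: 410349 KB, CR sent: 234 KB, CR sent over FC: 0 KB, dedup: 99.9%
"""


def summarize(log: str) -> dict:
    """Return seconds spent before and during replication, plus KB scanned and sent."""
    granted = replicating = stats_time = None
    scanned = sent = 0
    for line in log.splitlines():
        m = STAMP.match(line.strip())
        if not m:
            continue
        when = datetime.strptime(m.group(1), "%m/%d/%Y %I:%M:%S %p")
        text = m.group(2)
        if text.startswith("granted resource"):
            granted = when
        elif "Replicating images" in text:
            replicating = when
        stats = STATS.search(text)
        if stats:
            stats_time = when
            scanned, sent = int(stats.group(1)), int(stats.group(2))
    return {
        "wait_s": (replicating - granted).total_seconds(),    # resource granted -> replication starts
        "copy_s": (stats_time - replicating).total_seconds(),  # replication starts -> stats reported
        "scanned_kb": scanned,
        "sent_kb": sent,
    }


slow, fast = summarize(SLOW), summarize(FAST)
print("slow:", slow)
print("fast:", fast)
print(f"scanned ratio: {fast['scanned_kb'] / slow['scanned_kb']:.1f}x")
print(f"sent ratio:    {fast['sent_kb'] / slow['sent_kb']:.1f}x")
```

On these excerpts the fast job scanned roughly 85 times more data and sent roughly 8 times more, and both jobs spent about a minute between the "Replicating images" line and the stats line. The slow job's extra time sits earlier: a little over 9 minutes between "granted resource" and the start of replication, versus 5 seconds for the fast job.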

Maybe I'm looking too much into this, but something just doesn't seem right.

Could it possibly be due to the fact that the slower jobs are SQL incrementals whereas the faster ones are SQL fulls?  That is the only difference between the two backup policies.

Mark_Solutions
Level 6
Partner Accredited Certified

The only thing I can think of in relation to the SQL question is that SQL full backups de-dupe quite well but incrementals usually do not - but then your stats indicate they both do well!

I have seen duplications / replications run slow when very small amounts of data are involved - I know it's "another thing", but if you cancel a slow one it will often kick straight back in and go through really quickly!

Not sure if you are using any bandwidth throttling, but that can have an adverse effect on replication jobs

Stanleyj
Level 6

Mark,

I don't know if this is coincidence or not, but since I stopped "attempting" to back up our evault system, the multiple lifecycle SLPs and the issue of them starting at 8:15 every morning seem to be resolved.

I stopped backing up evault early last week because apparently I (or support) can't get it to do incrementals, only fulls.  The evault system is 2.5 TB, and from what I can tell, when it does actually do a backup the dedup rate is about 5% and it chokes the life out of my appliance while it's running for almost 36 hours.  I'm assuming that this puts a heavy burden on the logs and the purging of older images, because I was only running the evault job once a week but the issues with the other jobs were every day.  I'm still confused, but who knows??

I started trying to back up evault a few days after I upgraded to 7.5.0.6.  I may be jumping to conclusions a little too early, but things are running A LOT smoother with the system now that evault is not being backed up at the moment.

Now I don't know what to do, because I need to get evault backups working but it appears it's going to affect everything else in the process.  I wonder if sending evault over to tape instead of disk would still give me what I need?

Thank you for all your help with this.  I swear I seem to always have the strangest issues, and I'm really not doing anything super complicated.

Mark_Solutions
Level 6
Partner Accredited Certified

Have you tried using Accelerator for your EV system?

Failing that, and depending on the size of your appliance, how about using Advanced Disk for the EV jobs?

On an appliance with 2 shelves there is a nice chunk of disk left, which is pretty handy to set up as an Advanced Disk pool

If you are not getting much de-dupe anyway then do it to Advanced Disk and duplicate it off to tape - we tend to do this for customers who have data that just doesn't de-dupe for whatever reason.

Stanleyj
Level 6

I haven't tried Accelerator, and this appliance is set up completely as dedupe.  The advanced disk is only 1 GB.  Looks like I'm in quite the pickle here.  :)

Mark_Solutions
Level 6
Partner Accredited Certified

If you are using de-dupe then you should be OK to use Accelerator - just check the box on the policy

The first run will still be a total pain, but after that Accelerator should get used - you may want to try client side de-dupe too, to take the workload off the appliance when doing the fingerprinting.

#edit# If you start to use Accelerator then increase the WorkerThreads value in the /disk/etc/puredisk/contentrouter.cfg file from 64 to 128 - that can help its workload anyway!
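
As a sketch of that change, the snippet below rewrites a `WorkerThreads` entry in configuration text. It assumes the setting appears as a `WorkerThreads=<number>` line, which should be confirmed against the real file first; the sample text is made up for the example, `set_worker_threads` is a hypothetical helper, and it works on a string rather than touching /disk/etc/puredisk/contentrouter.cfg directly so a copy can be checked before anything is replaced.

```python
import re


def set_worker_threads(cfg_text: str, new_value: int) -> str:
    """Return cfg_text with every WorkerThreads=<n> line set to new_value."""
    pattern = re.compile(r"^(\s*WorkerThreads\s*=\s*)\d+", re.MULTILINE)
    updated, count = pattern.subn(lambda m: f"{m.group(1)}{new_value}", cfg_text)
    if count == 0:
        # Fail loudly rather than silently returning an unchanged file.
        raise ValueError("no WorkerThreads setting found")
    return updated


# Made-up sample content; the real file has many more settings.
SAMPLE = "WorkerThreads=64\nSomeOtherSetting=1\n"

updated = set_worker_threads(SAMPLE, 128)
print(updated)
```

Keeping a backup copy of the original file before writing the result back is a sensible precaution.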

Stanleyj
Level 6

Got the WorkerThreads option changed.  Thanks for that little tidbit.  I use Accelerator on about 170 jobs.

Stanleyj
Level 6

Mark,

I apparently found the performance threshold of a 5200 appliance.  Now that the consumed space on the appliance has fallen back, due to the evault full backups expiring and purging off, everything has been running smooth as glass!!  Replication jobs are running on time like they should, and I'm not seeing nearly the amount of backup failures that I was getting when the appliance storage was 85% consumed.

The storage consumption is down to around 65 - 70 percent and everything seems to be okay, even though I haven't touched a thing.  The worker threads are the only thing I changed, and that was after I started seeing some improvements.

At this point I don't know what to do about evault, but that's a whole different thread.  Thank you again for all your help.  I just wanted to put a conclusion to this thread.  Thanks buddy.

Mark_Solutions
Level 6
Partner Accredited Certified

Great news - glad to be of any help!