03-22-2012 05:15 AM
03-23-2012 11:09 AM
our backup to VTL was much faster than to the 5020.
i even tested a backup to a 5220 advanced disk and it was slower than the VTL.
i have a case open with support; they ran an app critical network test and i am waiting to see what they can come up with.
03-26-2012 12:54 AM
that's quite sad :(
please let me know if you find out anything more about it.
i can tell you that we've opened a TAR with oracle support (i hoped they would have some case history to help us), but they dismissed it quite quickly, telling us that the appliance alone is to blame.
03-27-2012 04:42 AM
A few things here which may assist and are worth testing (one at a time):
1. If you are doing client side de-dupe, then try using media server side.
2. Try using de-dupe compression if you are not already (client and media server side): in /usr/openv/lib/ost-plugins/pd.conf, change to COMPRESSION = 1
3. Reduce the default fragment size of the storage unit to 5000MB
4. SIZE and NUMBER_DATA_BUFFERS - see if they exist and what values are used - I have found that performance can be better without them at times.
5. Check that the appliance has been tuned - not sure if this is used on the 5020 but on the 5200 there is a tuning script that should run when first configured - if it has one it may be located at: /opt/NBUAppliance/scripts/bin/tune.pl
If it exists check with Symantec if it is OK to run it - if so do the following:
cd to: /opt/NBUAppliance/scripts/
type the following once in that directory:
../bin/perl tune.pl
Hope this helps
03-28-2012 01:55 AM
Hi Mark,
thanks for your reply, here is my setup:
i'll try varying the fragment size, though i've read that modifying SIZE and NUMBER_DATA_BUFFERS should be done really carefully.
thank you!
Alberto
03-28-2012 10:08 PM
Hi Alberto,
Could you send me the RMAN script and config you are using for your backup, please.
Also: SIZE_DATA_BUFFERS should always be a multiple of 1024, and is typically a power of two, so it should be 131072 (128 × 1024), not 132096 (129 × 1024).
Could you also make sure that the following was configured in RMAN:
03-29-2012 05:50 AM
05-01-2012 06:55 AM
We are having the same issue. Did any of these suggestions improve your Oracle backups?
Has this issue been resolved? Thanks
05-02-2012 05:42 AM
Hi,
What does the detailed job info say about waiting for empty buffers on the PureDisk jobs?
What are the contents of SIZE_DATA_BUFFERS_DISK and NUMBER_DATA_BUFFERS_DISK, which are the files used for disk pools? Not SIZE_DATA_BUFFERS and NUMBER_DATA_BUFFERS, which are used for tape communication (physical and virtual - VTL).
--jakob;
05-03-2012 11:36 AM
No waits for empty buffers are reported; instead we see, for example: waited for full buffer 12625 times, delayed 16416 times
These do not currently exist -- SIZE_DATA_BUFFERS_DISK and NUMBER_DATA_BUFFERS_DISK
SIZE_DATA_BUFFERS and NUMBER_DATA_BUFFERS are set to 262144 and 256 respectively.
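For a sense of scale, the memory these two values imply is simple arithmetic. This is only a minimal sketch: it assumes each concurrent backup stream gets its own pool of SIZE_DATA_BUFFERS × NUMBER_DATA_BUFFERS bytes, and the function name is made up for illustration.

```python
def buffer_memory_bytes(size_data_buffers: int,
                        number_data_buffers: int,
                        streams: int = 1) -> int:
    """Approximate data-buffer memory in bytes.

    Assumption: every concurrent stream allocates one pool of
    size_data_buffers * number_data_buffers bytes.
    """
    return size_data_buffers * number_data_buffers * streams


# Values quoted in the post above.
per_stream = buffer_memory_bytes(262144, 256)
print(per_stream // (1024 * 1024), "MiB per stream")
```

With several streams running at once, multiply accordingly, for example buffer_memory_bytes(262144, 256, streams=3).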
05-05-2012 09:07 PM
You mention you're going to a library. How many drives and of what type? I have seen cases where streaming to multiple tape units in a library can outperform certain disk setups.
Are you using SAN Client?
05-08-2012 01:09 AM
Hi Sebastian,
i've found this note from Symantec:
http://www.symantec.com/business/support/index?page=content&id=TECH35968
so, i've added these parameters to my media servers:
05-08-2012 09:39 PM
ok, so to clear up a couple of things: in the absence of SIZE_DATA_BUFFERS_DISK and the associated "NUMBER" file, the standard SIZE_DATA_BUFFERS is used.
I would highly recommend using 262144 as your size data buffer, with the recognition that if you exceed your hardware device's buffer size you will have a buffer overrun condition which will allow you to write the backups but not restore them. Provided you're using at least LTO3 drives, this value will ensure you're fine. If you want to get more performance out of the back end of the process (media server writing to tape), feel free to change these files.
HOWEVER, where I really see the biggest bang for my buck is on the client-side communications buffer. Windows clients have a default of like 16KB or something stupid like that. Sure, it keeps from breaking Windows NT boxes, but it's not so hot for performance. The max value is 32767 KB, or roughly 32MB of RAM that will get reserved for NBU during the backup. What this allows NBU to do is buffer more data on the client side of the backup operation prior to flushing to the network. Your traffic profile will get a LOT more bursty, but I've seen SIGNIFICANT improvements in throughput.
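The fallback described at the start of this post (use the _DISK file if it exists, otherwise the plain file) can be sketched as a small lookup. This is only an illustration of the rule as stated here, not NetBackup's own code; the directory is an assumption about where the touch files live on a UNIX media server, and the function names are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Assumption: location of the touch files on a UNIX media server.
CONFIG_DIR = Path("/usr/openv/netbackup/db/config")


def read_touch_file(path: Path) -> Optional[int]:
    """Return the integer stored in a touch file, or None if the file
    is missing, unreadable, or does not contain a number."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def effective_disk_setting(name: str,
                           config_dir: Path = CONFIG_DIR) -> Optional[int]:
    """Prefer <name>_DISK; fall back to <name> when the _DISK file is absent."""
    value = read_touch_file(config_dir / f"{name}_DISK")
    if value is None:
        value = read_touch_file(config_dir / name)
    return value


if __name__ == "__main__":
    for setting in ("SIZE_DATA_BUFFERS", "NUMBER_DATA_BUFFERS"):
        print(setting, "->", effective_disk_setting(setting))
```

Running it on a media server prints the value that would apply to disk backups under this rule, or None when neither file exists.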
Additionally, for Oracle backups to dedupe storage the best practice is to leverage an RMAN capability called "proxy copy" to make sure the data always comes out of the DB the same way. This results in higher dedupe rates which in turn will give you better performance.
Another note, backups of a single client to tape will MOST times result in faster throughput than to dedupe, particularly on data sets with very low dedupe rates. The process of deduplication inserts a non-trivial amount of CPU processing into the backup process whereas the backup to tape is a pure IO operation.....take the data in on this Ethernet port, send it out on this FC port. Of course tape's going to be faster. NOW, the REAL difference is when you start running 100, 200 or more jobs simultaneously. If you even have enough tape devices to handle that kind of load you're probably using multiplexing settings that will kill you if you have a DR event. But the disk-based backups are able to cope with this high level of parallelization leading to shorter backup windows. While each stream may not be as fast as the same stream sent to tape, the real key is how long is your backup window right? THAT'S where the dedupe helps, it's all about the parallelization of the workload.........well that and saving you a ton of disk space you would have needed to store all those backups. :)
05-09-2012 02:30 AM
Hi Chad,
as i wrote above, i've set up these parameters on my media servers:
05-09-2012 03:46 AM
So I would look at doing a couple of things:
05-10-2012 07:51 AM
Hi, I'm wondering whether the "Maximum concurrent jobs" setting on the N5020 STU could be contributing to the performance problem?
Regards
05-10-2012 08:47 PM
It's possible. I generally set the max concurrent jobs to 100.....it's a nice round number and I haven't had problems with it although I generally don't do a ton of simultaneous backups as most of what I'm doing is for POC's. With that said though, the key is to push the envelope for as many simultaneous jobs as you possibly can. The good thing with dedupe is that for about 90% of the traffic, the data is getting thrown out at either the client or media server, so the dedupe box shouldn't have to do that much other than answering the question of "have you ever seen this chunk of data before".
So for first backups I might start the max concurrent jobs setting a bit lower as you're actually going to be moving real data, but after that I would keep bumping it up until you start to see stream throughput being impacted.
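The "keep bumping it up until throughput suffers" approach can be written down as a small helper. This is a sketch under assumptions: you record per-stream throughput yourself at each setting, the 10% tolerance is an arbitrary choice, and the function name and sample numbers are made up for illustration.

```python
from typing import Sequence, Tuple


def pick_max_concurrent_jobs(samples: Sequence[Tuple[int, float]],
                             tolerance: float = 0.10) -> int:
    """Pick a max concurrent jobs setting from manual measurements.

    samples: (max concurrent jobs, per-stream throughput in MB/s),
             ordered by increasing job count; the first entry is the baseline.
    Returns the highest setting reached before per-stream throughput
    falls more than `tolerance` below the baseline.
    """
    if not samples:
        raise ValueError("need at least one measurement")
    baseline = samples[0][1]
    best = samples[0][0]
    for jobs, throughput in samples[1:]:
        if throughput < baseline * (1 - tolerance):
            break  # streams are now being slowed down; stop here
        best = jobs
    return best


# Made-up measurements, for illustration only.
measurements = [(25, 40.0), (50, 39.0), (100, 37.5), (150, 30.0)]
print(pick_max_concurrent_jobs(measurements))
```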
As for the above references with the Oracle backups, there are a lot of other factors that go into how many streams are really created and allowed to run concurrently, almost all of which is controlled by the RMAN parameters in the script file on the client.
05-10-2012 11:24 PM
Hi, it would be great to have Alberto's feedback on the concurrent jobs settings above.
Regards
05-11-2012 02:49 AM
Hi Bernard, Hi Chad
the great news is that by removing just one parameter from the RMAN script, my oracle backup (which took almost 23 hours on wednesday) completed yesterday in seven and a half hours!!!! :)
that parameter is FILESPERSET=1 (we added it because it is a best practice taken from the Symantec Admin Guide).
with it in the RMAN script we saw almost 350 netbackup jobs (each job taking an average of 4 minutes, even though some jobs were backing up only about 50MB); without it, the number of netbackup jobs dropped to about 100, each job with a 70% dedup rate.
i just don't understand why the Symantec and Oracle Admin Guides don't point out clearly that implementing it will dramatically affect backup times.
this morning i've also changed NUMBER_DATA_BUFFERS to 128 and added one more channel to the RMAN scripts (now there are 3); next monday i'll see whether these changes have improved backup time any further.
regards,
Alberto
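As a rough back-of-the-envelope check on the figures in this post, the cumulative job time can be computed directly. This is only a sketch: it assumes every job takes the quoted average and that jobs on a channel run back to back. How the jobs actually overlapped across channels is not stated in the thread, and the function name is made up for illustration.

```python
def elapsed_hours(jobs: int, avg_minutes_per_job: float,
                  channels: int = 1) -> float:
    """Estimated wall-clock hours if jobs are spread evenly over the
    channels and run one after another on each channel."""
    return jobs * avg_minutes_per_job / channels / 60


# Figures quoted in the post: about 350 jobs averaging 4 minutes each.
print(round(elapsed_hours(350, 4), 1), "hours of cumulative job time")
```

Under this simple model, many small jobs add up quickly, because each one carries its own start-up cost regardless of how little data it moves.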
05-11-2012 05:01 AM
Thanks for the technical feedback you provided.
Regards