Deduplication incremental backup is taking more than 16 hours

Dennis_S_
Level 4

I'm backing up 1.4TB of data from our project drives to an iSCSI Synology storage server, and the incremental job is still running.  How should I configure this software so it takes less time?  It started at 11:00PM yesterday and it is now 4:42PM today.  When a job completes, there's usually only about 100 gigabytes of changed data stored in the dedupe folder, from what I can tell from file dates on the iSCSI target.  But judging by how much time it takes, it is acting like it is backing up the whole thing on every incremental.  I think this run will take roughly 20 hours total, which is not what we want.

37 REPLIES

teiva-boy
Level 6

Seriously, NDMP is probably your answer, as that will only move the files that have changed since the last full backup (level 0, then level 1, etc.).

Though NDMP can't be deduplicated via Backup Exec currently.

You could also try breaking the job up into smaller selections and running them concurrently as a workaround.

Of course, you need to make sure that the filer has at least a couple of Gb links trunked together, and the same goes for your media server.  Hitting 60MB/s is close to the real-world max of a single Gb link.

Dennis_S_
Level 4

Yes, I could try breaking it down per project drive and running them concurrently.  I hadn't done that because I'd normally only reach for that option if I was getting out-of-memory errors on the backup server.

We're looking into NDMP, but we don't use that option with our current backup software, so it seems like we shouldn't need it for Symantec Backup Exec either.  Both should complete in roughly the same amount of time.

I'll try making multiple concurrent backups next.  Maybe I should break the largest project drive into two jobs and the other two drives into one job each, for a total of four, and then try running it.  Maybe it is just too many files to have inside one dedupe job.

teiva-boy
Level 6

When running off a CIFS share and not using an agent or NDMP, you're at the mercy of CIFS and per-file reads...  There are about 15 round-trip communication steps to access a file before its copy even begins...

Multiple concurrent jobs should help, though you want to ramp up slowly so that you can see a positive benefit, and back off when you stop seeing any more gain.  The bottleneck could be the NetApp disk volume, the NetApp head, your network, or the Backup Exec server.  2-4 jobs sounds about right if you have multiple Gb links trunked together.  Most of my NetApp 3000 series customers have 4 Gb links trunked; my 2040 customers use FC and/or 2 Gb links trunked.  The 2020 and 2050, IMO, can barely make use of a single Gb link.
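
If it helps to picture it, here's a rough back-of-envelope model in Python of why per-file protocol chatter hurts and why a few concurrent streams help until the link is saturated.  Every number in it is an assumption for illustration, not a measurement from this setup:

ROUND_TRIPS_PER_FILE = 15      # protocol chatter before the copy starts (as above)
RTT_SECONDS = 0.0005           # assumed 0.5 ms LAN round-trip time
LINK_MB_PER_SEC = 60.0         # assumed real-world single-GbE throughput
AVG_FILE_MB = 0.5              # assumed average file size on the share

def effective_mb_per_min(concurrent_jobs):
    """Crude model: every file pays a fixed latency cost plus transfer time.
    Concurrent jobs overlap the latency but share the same link bandwidth."""
    per_file_overhead = ROUND_TRIPS_PER_FILE * RTT_SECONDS              # seconds
    per_file_transfer = AVG_FILE_MB / LINK_MB_PER_SEC                   # seconds
    one_stream = AVG_FILE_MB / (per_file_overhead + per_file_transfer)  # MB/s
    return min(one_stream * concurrent_jobs, LINK_MB_PER_SEC) * 60

for jobs in (1, 2, 4):
    print(f"{jobs} concurrent job(s): ~{effective_mb_per_min(jobs):,.0f} MB/min")

With those made-up numbers a single stream tops out well below the link speed, and two to four streams reach the cap - which is roughly the 2-4 jobs suggested above.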

robnicholson
Level 6

Because it is best practice for you to do so with AV software.

Never really thought about the virus scanner on the target system before. Is excluding beremote.exe going to make much difference there, as it's loaded once (and therefore scanned once) at the start of each backup job?

Cheers, Rob.

robnicholson
Level 6

When running off a CIFS share and not using an agent or NDMP, you're at the mercy of CIFS and per-file reads...  There are about 15 round-trip communication steps to access a file before its copy even begins...

Just trying to understand how all of this works here. With an agent-based copy doing an incremental backup, is it the agent that scans the folder to be backed up to identify changed files based upon the archive attribute flag? If so, then yes, I can see how this would be much faster than the media server scanning for the files over a file share. It would be like running the Take Command attrib /a:a /s Folder command, which lists the files with the attribute flag set. Run it on the server itself and I'm guessing it's faster than running it remotely over a gigabit network using a CIFS share.
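
Out of curiosity, the scan itself is easy to mimic. Here's a minimal Python sketch (Windows-only, and the path is purely illustrative) of the archive-bit walk that attrib /a:a /s effectively does:

import os
import stat

def files_with_archive_bit(root):
    """Walk a tree and yield files whose archive attribute is set - roughly
    what 'attrib /a:a /s' lists. Windows-only: st_file_attributes does not
    exist on other platforms."""
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            try:
                attrs = os.stat(path).st_file_attributes
            except OSError:
                continue  # skip files we can't stat (in use, no permission)
            if attrs & stat.FILE_ATTRIBUTE_ARCHIVE:
                yield path

# Example (hypothetical path): count changed files on the local volume.
# print(sum(1 for _ in files_with_archive_bit("E:/")))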

Question - does Windows itself use the same protocol/system when accessing remote shares as Backup Exec? If not, then running the above is meaningless. Although I am doing it out of interest ;)

(Aside: this caused me to look at our main file-share backup to dedupe, which is differential and failed last night with some AOFO error. But what's worrying is that the size of the differential backup jumped from 93GB to 651GB! I can't believe users created 550GB of data yesterday...)

Cheers, Rob.

robnicholson
Level 6

A comparison with our system: our backup job is smaller at 1.1TB, but the differential backup is backing up a similar amount, 93GB. The files are the usual collection of Office documents, not many large files.

That took 04:38 to do the backup and 03:22 to verify, so ~7 hours. This is using an agent and client-side deduplication, backing up to an external USB drive (only temporary!) which has a raw speed of ~30MB/s or 1,800MB/min.

So the time the original poster is getting is a factor of 2 slower.

I'm going to guess it's because the dedupe database is being accessed remotely over iSCSI? I assume this is the case, as they said "to an iSCSI storage Synology server". My early tests of putting the deduplication system on iSCSI were a disaster - backup test speeds were pathetic. I ran out of time to test it further at the time but mean to revisit it.

I assume this is why the Symantec best practice is to put dedupe storage on local devices?

There is an unsupported option whereby you can put the dedupe PostgreSQL database on local storage but put the data somewhere else (iSCSI).

Cheers, Rob.

robnicholson
Level 6

Okay, here are some times for checking for changed files in preparation for an incremental backup:

Thu 18/08/11 09:34:52>attrib /a:a /s /e /q \\vserver003\e$
Thu 18/08/11 09:56:09>

Thu 18/08/11 10:06:21>attrib /a:a /s /e /q e:\
Thu 18/08/11 10:18:01>

The first one is from the BE media server over the gigabit network using Windows file share and it weighs in at 22 minutes.

The second one is run on the server being backed up and is roughly twice as fast at 12 minutes.

So yes, scanning for changed files across the network is, not surprisingly, slower.

But in the scheme of that 16 hour incremental backup, this is a small overhead. Reading this link, I infer that CIFS is the protocol Windows uses so it is valid to compare running attrib to the process that Backup Exec undertakes itself. In fact, one could turn on the pre-scan option and watch how long it takes.

So I would suspect the problem lies elsewhere.

Dennis_S_
Level 4

Yes, all of the above is good information - I read through it all and it makes sense.

But why is it that SmartSyncPro accomplishes this task in 2 hours with no NDMP and no "remote agent" (this software does not support either of those features)?

It makes me think that Backup Exec should be able to as well, since it is an enterprise product and SmartSyncPro is mainly a desktop product, as far as I can tell anyway.

Last night's SmartSyncPro backup covered 1.30 TB of used space on a USB 2.0 hard drive. Log: 08/17/2011 8:30:04 PM Copying source to destination... 08/17/2011 10:43:27 PM Synchronization completed.  That's roughly 2 hours, running on WinXP SP3 x64 on a Core 2 Duo desktop PC with the latest version of SmartSyncPro 3.  This is an incremental backup that overwrites changed files in the full.

On the other hand... Symantec Backup Exec 2010 right now is still plugging along at a slow 800MB/min doing my full backup data-to-disk test and I'm guessing it will take roughly 2 days to complete at that rate.

robnicholson
Level 6

Hi Dennis,

You won't find me trying to claim that Backup Exec's deduplication option is high performance, because it clearly isn't. Is it good enough? Well, for us with a 1.5TB file system to back up, it's okay - just.

I've posted elsewhere that I cannot quite understand why a backup system that does this is so slow:

  1. The remote agent scans a local file system for files with the archive attribute flag set
  2. Reads each of those files locally at full local disk speed (10,000MB/min is not unreasonable)
  3. For each 64k chunk in that file, calculates a hash (this per-chunk loop is sketched below)
  4. Communicates with the media server to see if it already has that 64k block in the database - at this point only the hash is transferred - anyone know how big the hash is?
  5. If not, transfers that 64k block across and asks the media server to write it to the database (actually it doesn't put the block in the database; that goes in a flat file)
  6. Updates a handful of database records to record this file's backup state
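
To make steps 3-5 concrete, here's a rough Python sketch of that per-chunk loop. MD5 is only an assumption, and an in-memory dictionary stands in for the media server's PostgreSQL block index - this is not Backup Exec's actual code:

import hashlib

CHUNK_SIZE = 64 * 1024   # 64k blocks, as in the steps above

block_index = {}         # stand-in for the server-side database: hash -> block location

def backup_file(path):
    """Hash each 64k chunk and 'send' only the chunks the server doesn't have.
    Returns (chunks_seen, chunks_transferred). Purely illustrative."""
    seen = sent = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            seen += 1
            digest = hashlib.md5(chunk).digest()     # step 3: 16-byte hash per chunk
            if digest not in block_index:            # step 4: one lookup per chunk
                block_index[digest] = (path, seen)   # step 5: store the new block
                sent += 1                            # only this chunk crosses the wire
    return seen, sent

Even when nothing new is transferred, step 4 still happens once per 64k chunk - which is where the million-query number below comes from.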

The last differential backup of 93GB managed 450MB/min and took 4.5 hours. NOTE: it's a differential backup, so ~70GB of the data is already in the database, and the remote agent should be able to pull that off the local disk in 7 minutes at raw speed - let's say 10 minutes with a bit of overhead. But this is maybe where the deduplication algorithm starts to struggle. There are 1,146,880 64k blocks in 70GB of data. That's over one million database queries to see if each block is in the database already. That's not a small number of queries, and we don't know right now how big the hash is.

And of course, in your scenario, those SQL queries are being made across iSCSI, so that's going to introduce a speed penalty.

With Backup Exec, we can say deduplication is more about saving space and getting a longer history than about reducing the backup window. But I'm sure that with dedicated hardware the backup window can be reduced as well.

Cheers, Rob.

robnicholson
Level 6

PS. Comparing with SmartSyncPro isn't really fair as they are doing very different things. It's just doing this:

Finding files that have different attributes, timestamps, security or size, and then copying them at full tilt to the other location.

I was going to say it isn't doing a hash computation to determine whether the file has changed, but having a look at this page, it does have that option:

http://www.smartsync.com/help/profile_properties_file_comparison.html

Do you have this turned on? If not, it would be useful to turn it on and see what speed SmartSyncPro can achieve because that's adding in a similar hash calculation.

But SmartSyncPro is definitely not doing the 64k hash lookup in a database that lets you keep a history of changes to a file going back in time, so it's a bit of an apples-and-oranges comparison.
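
For contrast, the decision a sync tool like SmartSyncPro makes is roughly file-level, something like this Python sketch (purely illustrative, not their actual logic):

import os

def needs_copy(src, dst):
    """File-level sync decision: copy when the destination is missing or the
    size/timestamp differ. No block index, no history - just overwrite."""
    if not os.path.exists(dst):
        return True
    s, d = os.stat(src), os.stat(dst)
    return s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime)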

Cheers, Rob.

 

Atur_S_
Level 3

Hi Rob, I work with Dennis.  Thanks for your input.  To answer your question, the dedupe database is being accessed over iSCSI on the Synology.  BE is a guest (VM) and its VHD resides on the NetApp 2040.  Our approach was to use cheap storage for the database.

robnicholson
Level 6

Answering my own question, but I think the hash size on PureDisk (which is what BE uses) is 128 bits, or 16 bytes. So the network saving between client-side and server-side deduplication should be a massive factor of 4,096 for 64k blocks. I have no idea of the relative performance of hash algorithms, or even which algorithm BE uses.
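
The arithmetic behind that factor, just to spell it out:

block_size = 64 * 1024   # bytes sent with server-side dedupe (the whole block)
hash_size = 16           # bytes sent with client-side dedupe (128-bit fingerprint)
print(block_size // hash_size)   # 4096 - but only for blocks the server already has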

Cheers, Rob.

robnicholson
Level 6

I really don't have time for this, but it's interesting. Okay, let's assume PureDisk uses the MD5 hash algorithm. I've just run md5sums on a 419MB file on our main file server (an eight-core ESX Server VM) and it took 21 seconds to calculate the MD5.

If my math is correct:

419,430,400 bytes in 21 seconds which is 1,198,372,571B/min (x60/21) or 1,142MB/min

Isn't that an interesting figure: assuming MD5 is used, the deduplication bottleneck is the hash algorithm itself. It doesn't matter that your B2D system can do 10,000MB/min; the maximum speed at which the hash algorithm can chew through those files is ~1,000MB/min.

With that 1.5TB initial backup, ~26 hours is spent just calculating the hash values.

This kind of application is ripe for multi-threading, but I think somebody else said that the deduplication engine isn't multi-threaded. On that 8-core file server of ours, it could throw eight threads at it, reducing those 26 hours down to about 3.2 hours (yes, I know this is all a bit rough/back-of-envelope).
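
If anyone wants to reproduce the back-of-envelope numbers on their own media server, something like this (Python; MD5 is still only an assumption for PureDisk) gives a single-threaded hashing rate and the resulting window estimates:

import hashlib, os, time

data = os.urandom(256 * 1024 * 1024)        # 256 MiB of test data in memory
start = time.perf_counter()
hashlib.md5(data).hexdigest()
elapsed = time.perf_counter() - start

mb_per_min = (len(data) / (1024 * 1024)) / elapsed * 60
print(f"Single-threaded MD5: ~{mb_per_min:,.0f} MB/min")

full_backup_mb = 1.5 * 1024 * 1024          # the 1.5TB initial full
hours = full_backup_mb / mb_per_min / 60
print(f"Hash time for 1.5TB, 1 thread:  ~{hours:.1f} h")
print(f"Hash time for 1.5TB, 8 threads: ~{hours / 8:.1f} h (ideal scaling)")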

Cheers, Rob.
 

Steven_G_
Not applicable

I too am having what seems to be excessive backup times during a dedup job.

 

Our hardware is the following:

Backup Server:
Processors: Dual Intel Xeon E5506 @ 2.13GHz
Memory: 24GB
Windows: Server 2008 R2 64-bit
Local Disks: 320GB SATA with OS and apps; 32 TByte RAID w/dedup target folder
NIC: Intel 82576 Gigabit Ethernet

Client Server:
Processors: Intel Xeon L5420 @ 2.50GHz
Memory: 8GB
Windows: Server 2003 R2 32-bit SP2
Local Disk: 4 TByte RAID connected via Fibre Channel
NIC: Broadcom BCM5708C NetXtreme II Gigabit Ethernet

 

When I first tried a job I was using client-side dedup. When I saw the slow speed (19.00 MB/min) I stopped the job after 25 hours.

I set all subsequent jobs to media-side dedup. The processors on both servers and the NICs were twiddling their thumbs, wanting something to do.

I monitored the BE server's RAID drive for disk I/O and queue times. Other than occasional spikes, the disk was also unimpressed with the workload. None of the changes in the job rate coincided with the disk I/O spikes.

I let one job complete that had an average rate of 119.00 MB/min w/verify.

I did a second pass job, where no data has changed, and it ran amazingly fast - 1,581.00 MB/min.

I then added data to the folder, a 2 GB subfolder, and ran the same job again.

It took 2 hours to back up the additional 2 GB of data, at an average speed of 904 MB/min.

The next complete job, of new data, took 12 hours to back up 22GB at a rate of 28.00 MB/min.

None of the systems are virtual or clustered. The processors, memory and drives are not being taxed in any way that I can see.

I have applied the hotfixes mentioned above. I have killed ALL caching on the backup server's array.

I have excluded the path to the dedup folder from all SEP scans.

 

The biggest problem I'm going to have in a production environment is these slow speeds with new data.

I'm going to be backing up close to 6 TB of new data when I go live with this and it's only going to grow from there.

I'd appreciate any input on what I can do to speed things up.

 

Thanks

Steve


robnicholson
Level 6

I did a second pass job, where no data has changed, and it ran amazingly fast - 1,581.00 MB/min.

1,500MB/min isn't amazingly fast when you consider a USB 2 hard disk can manage around that speed. Even LTO-3 tape can manage well over 3,000MB/min, and USB 3 can manage pretty amazing speeds of over 8,000MB/min. I've had a USB 3 drive hooked up to our test media server and the drive was idle most of the time.

As I mentioned above, on our similar spec CPU system, the hash algorithm that's key to the operation of deduplication can only manage about 1,000MB/min (with all the caveats above).

So add on top of this the overhead of millions of Postgres SQL queries being fired into the database to check whether each 64k chunk is already there, and you can start to see where it's struggling. It's akin to zipping up your entire file system (in a way!) - that's not going to be fast, is it?

Backup Exec as it stands slightly cuts down backup windows, but its main strength is getting more historical data in the backup.

I don't think it's the disk systems or networks that are slowing things down here - it's the deduplication algorithm itself, in terms of the hash calculation plus the SQL queries. And only Symantec can optimise that. I would imagine heavy optimisation of the hash algorithm and multi-threaded operation of both the agent (in hashing) and the media server (in SQL queries) are the solution. It may be that dedicated hardware is needed for that hashing bottleneck. That said, you can get six-core CPUs for not a lot these days; stick two CPUs in the media server and it can really throw itself into hashing. It could be that if you've got lots of grunt on the media server, client-side deduplication is no longer preferable to server-side, and your bottleneck becomes how fast you can get the data off the disks and across the network.
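
To illustrate the multi-threading point, here's a rough Python sketch (illustrative only, again assuming MD5) of fanning the per-block hashing out across worker processes. A real engine would stream blocks and overlap hashing with I/O rather than doing it this crudely, and the test file path is hypothetical:

from concurrent.futures import ProcessPoolExecutor
import hashlib

CHUNK = 64 * 1024

def hash_block(block):
    return hashlib.md5(block).digest()

def hash_file_parallel(path, workers=8):
    """Hash a file's 64k blocks across several processes. Note the block
    iterator is consumed eagerly by map, so this is memory-hungry on big
    files - fine for a demo, not for a 1.5TB volume."""
    with open(path, "rb") as f:
        blocks = iter(lambda: f.read(CHUNK), b"")
        with ProcessPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(hash_block, blocks, chunksize=64))

if __name__ == "__main__":                  # required on Windows for process pools
    digests = hash_file_parallel("C:/temp/bigfile.bin")   # hypothetical test file
    print(f"hashed {len(digests)} blocks")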

That said, your backup speed does seem very low.

Cheers, Rob.

Dennis_S_
Level 4

We configured LUNs on a separate host VM, installed the remote agent, and configured Backup Exec's dedupe option to use the Microsoft Change Journal after I found out it was already enabled on the VM hosting the LUNs.

So now we aren't backing up as much data, and our current tests found that the change journaling took the incrementals down to 12 minutes per day, while the full backups we run every Saturday night take 12 hours.  Both are with file verification turned on.  Turning it off cuts these times in half (6 minutes and 6 hours respectively).

This is for a data set of 671 gigs now and a delta of about 7 megabytes.

So basically all we had to do was convert over to LUNs and turn on change journaling - the two things I suspected were behind the problem from the beginning.
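
If anyone else goes down the change-journal route, it's worth confirming the journal actually exists on the volume first. One quick way (run elevated; the drive letter is just an example) is to call fsutil, e.g. wrapped like this:

import subprocess

# Check that the volume holding the LUN data has an NTFS change journal.
# Run as Administrator; "E:" is just an example drive letter.
result = subprocess.run(["fsutil", "usn", "queryjournal", "E:"],
                        capture_output=True, text=True)
if result.returncode == 0:
    print(result.stdout)       # journal ID, first/next USN, maximum size, etc.
else:
    print("No change journal (or not elevated):", result.stderr.strip())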

Other tests still conclude that backing up CIFS shares is a long task and also results in an error at the end, where Backup Exec complains that it could not find a remote agent.

Anyway we're in the process of implementing all of these changes into the business.  I would like to speed up the full backup but right now it is not a pressing need.

 

We ended up scaling back the VM that hosts Backup Exec from 4 processor cores to 2 and from 8 gigs of RAM to 4.  We have noticed no performance issues with these changes; it operates exactly the same as it did with the higher hardware specification.

Dennis_S_
Level 4

Our job rate for the full backup with the dedupe option to the iSCSI target on the Synology NAS is 2,704MB/min, which is about 1,000MB/min faster than our previous tests.  It still takes a long time, but at least the dailies are only taking 12 minutes, and I don't get a job rate listed in Backup Exec for those - it just shows Elapsed Time: 0:11:52, Byte Count: 7,791,760.

Now, finally, on to a new thread.

Turls
Level 6

@Dennis, please mark a comment that solved your issue as a "Solution"