Backup & Recovery

One of the most common and frustrating problems with backups is troubleshooting poor performance. A concept as simple as backing up a file unfortunately touches almost every part of your network and server infrastructure - everything from the source server's OS to the tape library - which makes troubleshooting a non-trivial task. This article gives some tips on how to troubleshoot performance issues and which areas deserve your attention.

The Facts of Life
There are some environments that just won't have high performance. There's a small amount of OS overhead to opening & closing files. If you're backing up a multi-gigabyte database file, you'll typically see great performance. Switch to a set of a few thousand small files, and the overhead to open & close those files will drive performance down dramatically. There's no way to fix that problem directly, but you can work around it with Differential or Synthetic backup strategies.

The same rule applies to the type of agent. The Linux/Unix Backup Exec agent, for example, won't have the same performance level as the Windows agent. There's really not much you can do about that.

Backups of non-file data are also generally slower. System State backups carry several minutes of overhead, as do Active Directory backups. Backups of active DFS links are well known to be much slower than normal file backups as well.

Context - What Happened?
If your backup jobs are suddenly experiencing a performance issue, your first question should be "What recently changed in my environment?". Did you change the time a job runs? Did you update some network card or tape drive drivers? Are there any recent errors in the Event Logs? Knowing what changed will help to identify possible sources of your problem.

Time of Day Issues
If you changed the time your backup jobs run, you might now be running during a window when the server or network is heavily used. Daytime backups are usually unwise, because your users are putting a heavy load on the servers & network. Investigate whether any automated processes generate large amounts of network traffic. Are other backups running for other departments?
Are large file copies, replication jobs or software deployments going on at the same time as your backups? In these cases, shifting your start times may be the cure.

Divide & Conquer
Your next step should be to narrow down when the performance issues happen. Is it for all jobs, or just for a specific server? If just one server is slow, then your problem probably lies with that server or with some part of the network that's unique to it.

On the other hand, if all jobs seem to run slow, you need to start eliminating parts of your setup. Try a large (1 or 2 GB) test backup of your Backup server to itself - without any network traffic. This will tell you if the problem is with your tape library or the network.
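A quick way to build that local test set is with fsutil, which creates a file of arbitrary size in seconds. This is only a sketch - the path and size below are placeholders for your environment:

```shell
:: Create a 2 GB dummy file for a local backup test (Windows cmd).
:: D:\BackupTest is a placeholder - use a local, non-system disk.
mkdir D:\BackupTest
fsutil file createnew D:\BackupTest\testdata.bin 2147483648

:: Note: the file contains all zeros, which compresses almost perfectly.
:: Disable hardware compression on the drive (or copy real data into the
:: folder instead) to keep the speed test honest.
```

Point a backup job at that folder with no network involved, and record the byte count and elapsed time from the job log as your baseline.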

Network Issues
This one's harder to nail down. Most of the time, the problem will be with the NIC on the backup server and/or the source server. If the problem were with the network infrastructure, more people than you would be complaining, right?

If your backup server has multiple network cards, consider testing using a different NIC. Check the firmware & OS drivers for your network card. When in doubt, update to the latest.

Try some XCOPY tests with large sets of files, and compare the results with a server that seems to run faster on your network. This will help identify whether the problem is on a specific server or the network in general. Note that XCOPY and backup benchmarks can't be directly compared.
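A crude but effective way to time those XCOPY tests is to echo the clock before and after the copy. A sketch - the server and share names below are made up:

```shell
:: Timed XCOPY test (Windows cmd). \\SERVER1\data is a placeholder UNC path.
echo Start: %TIME%
xcopy \\SERVER1\data\testset D:\xcopytest /E /I /Y >nul
echo End:   %TIME%
:: Repeat with the same file set against a second server and compare
:: the elapsed times.
```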

Tape & Library Issues
If you suspect your tape drive or tape library to be the source of poor performance, the first thing Symantec support will tell you is to try running NTBackup. While this advice may seem pointless and annoying, it can be a valuable test to determine whether the problem is with your tape system or with Backup Exec itself. (It's interesting to note that NTBackup shares its lineage with Backup Exec, and can be considered an ultra-light version of it.)

The general process is as follows:
-Back up a set of test files local to the backup server - ideally 1 or 2 GB. Record the elapsed time for the job.
-Stop all of the Backup Exec services.
-Run the built-in Windows program "NTBackup".
-Select and back up the exact same file set. Compare the job times.
-Restart the Backup Exec services.
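The steps above can be scripted roughly as follows. The service names are typical for Backup Exec 12.x but vary by version - run "net start" to see yours - and the media pool name passed to NTBackup is only an example:

```shell
:: Sketch of the NTBackup comparison test (Windows cmd).
:: Verify your actual Backup Exec service names first with: net start
net stop "Backup Exec Job Engine"
net stop "Backup Exec Server"
net stop "Backup Exec Device & Media Service"

:: Back up the same test folder with NTBackup. /m sets the backup type,
:: /j names the job, /p selects the media pool - "4mm DDS" is only an
:: example; substitute the pool for your drive.
ntbackup backup D:\BackupTest /m normal /j "NTBackup speed test" /p "4mm DDS"

net start "Backup Exec Device & Media Service"
net start "Backup Exec Server"
net start "Backup Exec Job Engine"
```

Compare the elapsed time NTBackup reports against the Backup Exec job's time for the same file set.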

If this test shows Backup Exec performing much slower, call Symantec support - you've probably got a bug to report. Otherwise, you're back to finding what in your environment is to blame.

Backup Exec generally works with any tape drive or library using either the vendor's drivers or Symantec's own drivers. In most cases you're better off with Symantec's. Run tests with one, then the other - it's not uncommon to have problems with either, so you'll have to experiment to see what works in your environment.

You can confirm your hardware is supported by checking Symantec's Hardware Compatibility List (HCL). The following link is for v12.5.
For other versions, search the Symantec Support site for "HCL" and select only the "Backup Exec" product.

Always download the latest drivers before doing your tests! You can find Symantec's latest device drivers here:
Select your version, platform, and select a "File Type" of "Driver".

Backup To Disk Issues
Probably the most common reason for Backup To Disk jobs to run slower than tape jobs is disk fragmentation. More specifically, the *.BKF files in your Backup to Disk folder should not be fragmented. If you have multiple jobs running and creating new files in this folder, you probably have large amounts of fragmentation - and thus slow read/write performance.
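You can check how bad the fragmentation actually is before defragmenting anything. A sketch using the built-in defrag analysis - the drive letter and file name are examples, and Sysinternals Contig is a separate free download:

```shell
:: Analyze fragmentation on the Backup-to-Disk volume without defragmenting.
:: On Windows XP/2003 the analyze switch is -a; on Vista/2008 it's /A.
defrag D: -a

:: Optionally, check the fragmentation of an individual .BKF file with
:: Sysinternals Contig (the file name below is an example).
contig -a "D:\B2D\B2D000001.bkf"
```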

It's possible to run several concurrent jobs with Backup To Disk - and that's a good thing. Be aware that it is possible to have too many jobs running, and thus saturate your network connection. A typical gigabit network should have no problem running 3 or 4 jobs at once. When in doubt, compare speeds with just one job running vs. several.

General Tips For Faster Backups
The fastest way to back up a file is to not back it up. Review your selection lists. Are there files you don't need to back up? Check your selection lists for the following, and consider excluding them:
-Page files
-Hibernation files
-Temp directories
-Recycle bin
-"Documents and Settings" folder

For servers with many smaller files, do you really need to do full backups every night? Consider Differential backups.

Check for disk fragmentation on the source files.

Try to schedule backup jobs during times of low CPU and network usage of your source servers.

Throw More Hardware At The Problem
Your last resort should be buying hardware. Major performance problems typically won't be solved with new hardware, but it can help ease bottlenecks. Some areas to consider investing in:
-Is your server network gigabit? If not, it really should be.
-Are you still using LTO1 or LTO2 tape drives? Each generation of LTO drive doubles the capacity of the previous one and is faster as well. Higher capacity also means fewer tape mounts & dismounts taking up time.
-Is it time to think about Backup-to-Disk? This option will cost more than you think, but it eliminates a lot of the problems that come with physical media and moving parts (tapes & libraries).

If you run across something unusual in your environment that causes performance issues, share your story with the rest of the community! Your quick note may save someone hours of troubleshooting.

This is valuable information
Great article! Very helpful information. The only correction or comment that I would share is that Backup Exec automatically skips Page files. So there is no need to manually exclude them.
This is helpful and, importantly, recent enough to be useful for today's technologies.

God I wish someone, somewhere would write a "best practices for backing up DFS-R" guide. Sure, you see lots of "it can be slow", but nobody seems to actually tell us how best to address the issue.

With DFS-R services started I'm facing 2.6TB backup times of nearly a week using BE. With W2k8's NTBackup it's down to 24 hours - but that's not possible to use at all when the disks are above 2TB. What an utter nightmare...
It's by no means a "Best Practice" document, but the closest things I've found are below - and I wrote one of them.  Please share if you find anything better.