
Backup to Disk - Extremely slow verify (BE 2010 R2)

Mary_Land
Level 3

Hello everyone,

I encounter extremely slow verify rate when doing backup-to-disk.

 

Media server:

-Hardware:  HP ProLiant DL360 G5 (2 x Xeon 5140, 18 GB RAM, Smart Array P400i with Battery-Backed Write Cache and brand-new battery, 4 x HP 600GB SAS-2 6Gbps 10,000 rpm drives in a RAID 5 array, 512 KB strip size => 1.5 MB stripe, 2 logical drives, HP SCSI HBA which is ATTO ExpressPCI UL5D LP)

-OS:  Windows Server 2008 R2 Standard, all updates up to and including February 2011

-Tape drive:  Tandberg 820, SCSI

 

Backup to Disk target:

-Local logical drive on media server mapped to drive letter D, 1.5 TB, NTFS file system, 64K allocation unit size

Backup Exec version:  2010 R2 (13.0 Rev. 4164), with Service Pack 1 and Hotfix 150096 installed

Backup to Disk Folder settings:

-Maximum size for backup-to-disk files:  4 GB (also tested with 1 GB)

-Allocate maximum size for backup-to-disk files:  YES

-Maximum number of backup sets per backup-to-disk file:  100 (application default)

-Concurrent:  1

Results of running B2D Compatibility Test Tool (B2DTest_64.exe) on B2D target:  ALL PASSED (I examined the log in C:\ProgramData\Symantec\BackupExec\Logs\B2DTest.log and saw that all 16 tests passed.)

 

Media sets:

-Weekly:  13-day overwrite protection, 1 hour append (just a formality, since there's no option to set it lower)

-Daily:  6-day overwrite protection, 1 hour append (same as above)

 

Remote server being backed up:

-Hardware:  HP ProLiant DL380 G7 (1 x Xeon 5620, 36 GB RAM, Smart Array P410i with 512MB Flash-Backed Write Cache, 8 x HP 300GB SAS-2 6Gbps 10,000 rpm drives in 2 arrays - RAID 1 for OS, RAID 5 for data, 2 logical drives)

-OS:  Windows Server 2008 R2 Standard, all updates up to and including February 2011

-RAWS (x64) which was push-installed from BE media server

 

Network:  Both the media server and the server being backed up have Gigabit NIC, both are connected to the same HP ProCurve 2910al switch (Layer 2/3 managed switch), both NICs and ports on the switch are forced to 1000, no auto-negotiation.

 

I originally set up a policy with a Full backup (Copy files) running each Friday evening that backs up the entire remote server described above with verify; a duplicate job linked to this Full backup then runs immediately afterwards to duplicate the sets to tape (LTO Ultrium 3).  Monday through Thursday, a recurring job backs up the Working Set (changed today).

When I was test-running the Working Set job last week (to disk), it took some 3 hours to back up/verify 24 GB.  I examined the job history and found that the verify speed was the culprit, as slow as 282 MB/min.  The rate for various sets varied between 261 MB/min - 1152 MB/min.
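
For scale, the arithmetic behind those rates can be sketched in a few lines of Python. This is only an illustration, assuming 1 GB = 1024 MB; the function names are made up for the example, and the figures are the ones quoted above.

```python
# Illustrative helpers (hypothetical names); assumes 1 GB = 1024 MB.

def mb_per_min_to_mb_per_s(rate_mb_min: float) -> float:
    """Convert a Backup Exec style MB/min rate to MB/s."""
    return rate_mb_min / 60.0


def minutes_to_process(size_gb: float, rate_mb_min: float) -> float:
    """Minutes needed to read size_gb of data at rate_mb_min."""
    return size_gb * 1024.0 / rate_mb_min


if __name__ == "__main__":
    # Verify rates reported in the job history for the 24 GB Working Set job.
    for rate in (261, 282, 1152):
        print(
            f"{rate:>5} MB/min = {mb_per_min_to_mb_per_s(rate):6.2f} MB/s, "
            f"24 GB would take {minutes_to_process(24, rate):6.1f} min"
        )
```

At 282 MB/min, a 24 GB set needs close to an hour and a half for the verify pass alone.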

In terms of backup speed, I thought the network was the bottleneck, so I tested by copying a set of 11 files (6.42 GB in total: the 3.3 GB DVD ISO for BE 2010 R2 plus the split 1of4, 2of4, etc. files) from a Windows file share on the server being backed up to the media server (into the same folder as the BKF files).  Speed was ~6600 MB/min.  Then I created a test backup job in Backup Exec with AOFO (VSS) and no compression, and that job ran at a 3988 MB/min backup rate (and decreasing), but a mere 378 MB/min for verify.  Yes, I know verify is performed locally; still, 6600 MB/min copying over SMB v1 (for testing I disabled SMB v2) vs. a 3988 MB/min backup rate shows that something is wrong, probably with BE.

Now, in terms of verify speed, I know that BE attempts to read the sets from the BKF files and calculate checksums to compare with stored checksums.  I know that RAID 5 is slow, but it can't be that slow.  Something is seriously wrong with BE if the verify speed is 378 MB/min average.
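
One way to see what a plain sequential read-and-checksum pass costs on the same volume, independent of BE, is a small script like the one below. It is a rough sketch only: SHA-256 and the 1 MB read size are assumptions (BE's actual checksum algorithm and read pattern are not documented in this thread), and a file that was just written may be served from the Windows file cache, which would inflate the number.

```python
import hashlib
import sys
import time


def timed_checksum(path, block_size=1024 * 1024):
    """Read a file sequentially, hash it, and return (hex digest, bytes read, MB/min).

    SHA-256 is an arbitrary choice for this sketch, not necessarily what BE uses.
    """
    digest = hashlib.sha256()
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(block_size)
            if not chunk:
                break
            digest.update(chunk)
            total += len(chunk)
    elapsed = time.perf_counter() - start
    megabytes = total / (1024 * 1024)
    rate = megabytes / (elapsed / 60.0) if elapsed > 0 else float("inf")
    return digest.hexdigest(), total, rate


if __name__ == "__main__":
    # Pass the path of one backup-to-disk file as the only argument.
    if len(sys.argv) == 2:
        checksum, size, mb_per_min = timed_checksum(sys.argv[1])
        print(f"{size} bytes, {mb_per_min:.0f} MB/min, sha256={checksum}")
```

If a pass like this over a BKF file runs far faster than the verify rate in the job log, raw sequential reads from the volume are unlikely to be the limiting factor.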

During verify operations, I checked the CPU and memory usage on the media server, and there's no spike whatsoever, CPU barely even hit 1%.

I have run both ATTO Disk Benchmark Tool and IOMeter to benchmark the RAID and can't see anything wrong from the results, so it can't be the disk subsystem.

At this point, I'm so frustrated that I don't even know what else to do, short of NOT verifying.  I had just rebuilt the media server from scratch and tested again, pretty much same results.

If anyone could chime in and point me to the right direction, it would be much appreciated.

18 REPLIES

pkh
Moderator

Have you tried defragmenting the drive where your B2D folder resides?

Mary_Land
Level 3

I forgot to mention that there is no dedupe involved here, since we never purchased the deduplication option for BE.

Just straight B2D.

JoaoMatos
Level 6
Partner

Hi,

what is the performance when you copy data to the B2D drive?

Also check that antivirus scanning is not active on writes; try disabling it.

Mary_Land
Level 3

Hi pkh,

I checked for fragmentation, and there is no fragmentation (since that entire logical drive is dedicated to the B2D folder, and all BKF files were created with maximum size pre-allocated).

Thanks.

Mary_Land
Level 3

Hi JoaoMatos,

No anti-virus is currently installed on either the media server or the (remote) server being backed up.

I am running Microsoft SQLIO Disk Benchmarking Tool on the B2D drive now (sequential read, sequential write, random read, random write).  I will post the results shortly.

Mary_Land
Level 3

 

The SQLIO disk benchmarking results on the drive (RAID 5) where the B2D folder resides are posted below.  I know that SQLIO and other benchmark tools are synthetic, but still the results (throughput, latency, etc.) look acceptable.
 
I still don't know why Verify takes so long if what it does primarily involves reading the data back from a drive which is local.  Had it involved deduplication, that would have made sense, because in that case a full rehydration of the files would have to be performed (at least that's how I understand it), which would have resulted in a significant slow-down.  In my case, as I have mentioned, there's no dedupe at all.
 
----------------------------------------------------------------------------------------
 
SEQUENTIAL READ 64K (64 KB I/O request size, 2 threads, multiple I/Os per thread enabled with 64 outstanding requests per thread, running continuously for 10 minutes)
 
sqlio -kR -s600 -fsequential -o64 -b64 -LS -Fparam.txt
 
CUMULATIVE DATA:
 
throughput metrics:
IOs/sec: 12647.58
MBs/sec:   790.47
 
latency metrics:
Min_Latency(ms): 1
Avg_Latency(ms): 9
Max_Latency(ms): 27
 
---------------------------------------------------------------------------------------
 
SEQUENTIAL WRITE 64K (64 KB I/O request size, 2 threads, multiple I/Os per thread enabled with 64 outstanding requests per thread, running continuously for 10 minutes)
 
sqlio -kW -s600 -fsequential -o64 -b64 -LS -Fparam.txt
 
CUMULATIVE DATA:
 
throughput metrics:
IOs/sec:  7349.04
MBs/sec:   459.31
 
latency metrics:
Min_Latency(ms): 1
Avg_Latency(ms): 17
Max_Latency(ms): 503
 
---------------------------------------------------------------------------------------
 
SEQUENTIAL READ 256K (256KB I/O request size, 2 threads, multiple I/Os per thread enabled with 64 outstanding requests per thread, running continuously for 10 minutes)
 
sqlio -kR -s600 -fsequential -o64 -b256 -LS -Fparam.txt
 
CUMULATIVE DATA:
 
throughput metrics:
IOs/sec:  3217.58
MBs/sec:   804.39
 
latency metrics:
Min_Latency(ms): 5
Avg_Latency(ms): 39
Max_Latency(ms): 67
 
---------------------------------------------------------------------------------------
 
SEQUENTIAL WRITE 256K (256KB I/O request size, 2 threads, multiple I/Os per thread enabled with 64 outstanding requests per thread, running continuously for 10 minutes)
 
sqlio -kW -s600 -fsequential -o64 -b256 -LS -Fparam.txt
 
CUMULATIVE DATA:
 
throughput metrics:
IOs/sec:  1860.14
MBs/sec:   465.03
 
latency metrics:
Min_Latency(ms): 13
Avg_Latency(ms): 68
Max_Latency(ms): 105
 
---------------------------------------------------------------------------------------
 
RANDOM READ 64K (64KB I/O request size, 2 threads, multiple I/Os per thread enabled with 64 outstanding requests per thread, running continuously for 10 minutes)
 
sqlio -kR -s600 -frandom -o64 -b64 -LS -Fparam.txt
 
CUMULATIVE DATA:
 
throughput metrics:
IOs/sec:  2928.54
MBs/sec:   183.03
 
latency metrics:
Min_Latency(ms): 0
Avg_Latency(ms): 43
Max_Latency(ms): 642
 
---------------------------------------------------------------------------------------
 
RANDOM WRITE 64K (64KB I/O request size, 2 threads, multiple I/Os per thread enabled with 64 outstanding requests per thread, running continuously for 10 minutes)
 
sqlio -kW -s600 -frandom -o64 -b64 -LS -Fparam.txt
 
CUMULATIVE DATA:
 
throughput metrics:
IOs/sec:  7340.28
MBs/sec:   458.76
 
latency metrics:
Min_Latency(ms): 0
Avg_Latency(ms): 17
Max_Latency(ms): 2089
 
---------------------------------------------------------------------------------------
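
As a sanity check on the SQLIO output above, the reported MBs/sec should equal IOs/sec times the I/O size. A short Python sketch using only the figures posted above (the 378 MB/min verify rate is the one from the test job described earlier in the thread; the names are made up for the example):

```python
# (test name, IOs/sec, I/O size in KB, MBs/sec as reported by SQLIO)
RESULTS = [
    ("sequential read 64K", 12647.58, 64, 790.47),
    ("sequential write 64K", 7349.04, 64, 459.31),
    ("sequential read 256K", 3217.58, 256, 804.39),
    ("sequential write 256K", 1860.14, 256, 465.03),
    ("random read 64K", 2928.54, 64, 183.03),
    ("random write 64K", 7340.28, 64, 458.76),
]

VERIFY_MB_PER_MIN = 378  # average verify rate from the test backup job


def mb_per_sec(iops: float, block_kb: int) -> float:
    """Throughput implied by an IOPS figure at a given I/O size."""
    return iops * block_kb / 1024.0


if __name__ == "__main__":
    for name, iops, block_kb, reported in RESULTS:
        computed = mb_per_sec(iops, block_kb)
        per_min = computed * 60.0
        print(
            f"{name:<22} computed {computed:7.2f} MB/s "
            f"(reported {reported:7.2f}), {per_min:8.0f} MB/min, "
            f"{per_min / VERIFY_MB_PER_MIN:5.0f}x the verify rate"
        )
```

Even the slowest result here, random read at 64K, works out to several thousand MB/min, well above the verify rates seen in the job history.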

Mary_Land
Level 3

 

With respect to the drive where the B2D folder lives, what I will do this weekend is to use HP Array Configuration Utility to delete the RAID 5 drive and use the same space (exact same space on same physical disks) to create a RAID 1+0.  In other words, the B2D folder will live on a RAID 1+0 drive, while everything else is kept the same.  I will manually force the same jobs to run (with Verify) to see whether it would make a difference.
 
I will also use SQLIO with the same parameters to benchmark that newly created RAID 1+0 B2D drive and will post the results here.
 
To be quite honest, if Verify is read-intensive, I don't expect much of a difference in read performance when changing the B2D drive from RAID 5 to RAID 10, because from what I understand, RAID 5 only takes a performance hit when writing (because of the parity calculation), whereas the only thing RAID 5 does when reading is some sort of block redirection (due to striping) and no calculation is involved.

Mary_Land
Level 3

When reading contents back from those BKF files (in order to calculate/compare checksums), what type of read (Sequential or Random) does a Verify operation perform on that B2D disk?

Could someone confirm?  Thanks.

Dariel_Cruz
Level 3

I am running into the same issue. It seems to be related to reading from the B2D; writing seems fine. I have a Fibre Channel disk array and tape drive combination: backups to disk are fine and backups directly to tape run fine, but B2D to tape is bad. I have tried everything I could think of, including updated drivers and firmware flash updates for the hardware, and nothing seems to work. I've run out of ideas. I see disk usage at 100% when Backup Exec reads from it, yet the speed sucks.

TadSend
Level 0

I upgraded from BE12.5 to BE 2010.  The DLT tape drive used to do 500 MB/min.  Now it's down to 115 MB/min!

I installed a removable HD.  This will back up at 1016 MB/min but degrades down to 115 MB/min or so after some hours.

Tape backup used to take 5 to 6 hours for BU and verify on the DLT tape drive.  Now it takes at least 20 hours unless I give up and cancel first.

This is a major problem for me.

I have tried to change the Quantum Tape drive driver from Symantec to Quantum.  No apparent change.

Does anyone have any suggestions please?

Tad Sendzimir

Dariel_Cruz
Level 3

After much tweaking and testing I have narrowed it down to the verify-only portion of Backup Exec. I don't know what it does when it verifies, but it is not "just" reading the data back. I have tried verifying from a B2D and my disks go nuts at 100% I/O, while the speed reported by Backup Exec is barely scratching what these disks can do and the time it takes is brutal. Yet a restore job from the same B2D gets good speeds. I think Symantec needs to look at whatever verify does other than just reading the data.

GA_Dave
Level 3

I have the same problem, but I've noticed that when the job is run to an empty B2D folder, the verify runs properly fast.  Then the second (or third or fourth, I think) time the job runs and overwrites the data, the verify takes forever.  My patchwork solution is manually emptying the B2D folders for our nightly backups (Associate to the Retired media set, delete from BE, then delete the data from the disk itself), and I'm trying to get support on the line about this.  I was working with a guy a week ago, but haven't heard back since I found that writing to an empty folder makes it go the normal speed.

I know it's an older thread, but better late than never, right?

runner724
Not applicable

I have a similar issue with BE 2010 R2.

I'm using B2D to back up a mere 5 GB of data over the network to a Buffalo NAS.  The write operation itself averages 200 MB per minute.  But the verification runs at exactly 4.00 MB per minute, as reported by the Job Monitor.  In addition, an incremental job was stuck on verify for over four days.  Verification takes so long that it exceeds the allowable 23 hours and 59 minutes for a job to complete, and none of my full backups are finishing in time.

The only way I can get the system to work is to disable verification.

Does anybody else have some insight to this?

Lars_Skogshus
Not applicable

Hello I have the same problems.

Backup at 3000-4000 MB/min and verify at 50-200 MB/min.

Did any of you get rid of the problem?

For me it is the same issue in 2010 R3

navar_holmes
Level 5

I have the same slow verify problem on multiple jobs/servers.

The big question: how to speed up verifies?

BU servers are 2008 R2 with BE 2010.

BU are Dell515 with internal (10) 2TB near-line SAS in a raid 5.  Lots of horsepower.

Started migrating my user folders to a new 2008 R2 server.

B2D file size at first was 100GB.

Fulls on Sat and Diff on other days.

1st full backup: Byte Count-142GB, Job Rate-1,623MB, Elapse Time-1:57hr.

2nd full backup: Byte Count-155GB, Job Rate-1,450MB, Elapse Time-2:30hr.

3rd full backup: Byte Count-158GB, Job Rate-1,361MB, Elapse Time-2:42hr.

4th full backup: Byte Count-163GB, Job Rate-795MB, Elapse Time-13:03hr.

All other full backups run about the same as the 4th full backup.

Then after the B2D job has run I duplicate that job to a Fiber Channel LTO-4 Dual Tape library.

1st Dup2Tape: Byte Count-142GB, Job Rate-3,310MB, Elapse Time-1:10hr.

2nd Dup2Tape: Byte Count-155GB, Job Rate-2,997MB, Elapse Time-1:17hr.

3rd Dup2Tape: Byte Count-158GB, Job Rate-2,831MB, Elapse Time-1:21hr.

4th Dup2Tape: Byte Count-163GB, Job Rate-849MB, Elapse Time-3:31hr.
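
One way to compare these runs on equal footing is to work out an end-to-end rate from the byte count and elapsed time. The sketch below is only an illustration: it assumes 1 GB = 1024 MB and that the elapsed time covers the whole job including verify (how BE derives its own Job Rate figure isn't stated here), and the function names are made up for the example.

```python
def elapsed_minutes(hhmm: str) -> int:
    """Convert an 'H:MM' elapsed time such as '13:03' to minutes."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def effective_rate(size_gb: float, hhmm: str) -> float:
    """End-to-end MB/min for a job of size_gb that took hhmm."""
    return size_gb * 1024.0 / elapsed_minutes(hhmm)


# (job, byte count in GB, elapsed time) as listed above
RUNS = [
    ("1st full", 142, "1:57"),
    ("2nd full", 155, "2:30"),
    ("3rd full", 158, "2:42"),
    ("4th full", 163, "13:03"),
    ("1st Dup2Tape", 142, "1:10"),
    ("2nd Dup2Tape", 155, "1:17"),
    ("3rd Dup2Tape", 158, "1:21"),
    ("4th Dup2Tape", 163, "3:31"),
]

if __name__ == "__main__":
    for job, size_gb, hhmm in RUNS:
        print(f"{job:<13} {size_gb:4d} GB in {hhmm:>5} -> "
              f"{effective_rate(size_gb, hhmm):7.1f} MB/min end to end")
```

By this measure the 4th full backup comes out at a little over 200 MB/min end to end, against more than 1,200 MB/min for the first.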

 

I thought about defragging, but let's face it: you can't defrag a 10TB volume.

The pattern I see on my two BU servers is that any backup job over about 150GB has the slow verify process.

 

I have another file server that is server 2003 with SQL 2005 also installed.

When I do a full backup of its data (not a SQL job) it always takes about 1 hour for 94GB at 1,706MB.

And the Dup2Tape takes about 41min at 4,243MB.

 

So the larger the data being backed up, the slower the job.  I would expect this, but going from 142GB in 1:57hr to 163GB in 13:03hr is not normal.

Here is something else.

As I said, I have been migrating user folders to a new server.

I still need to back up the old server (2003).  The backup jobs have not changed. Still slow.

full backup: Byte Count-369GB, Job Rate-815MB, Elapse Time-14:22hr

GA_Dave
Level 3

Since there are a couple newer posts, I figured I'd share our extended experience. I set up a recurring erase job to wipe the B2D files before the new job runs (since B2D files maintain the same name when overwritten, a recurring job can be set; IMG files can't be automatically erased). We didn't see any extremely slow verifying after that.

Another thing we did, more by accident when re-creating some jobs and B2D folders, was split up our destination folders. We were writing a file backup and a database backup at the same time to the same B2D folder. After splitting those into two separate folders, the write speed was still the same, but I forgot to put the recurring erase job back in place, and the verify still went at normal speeds, even when overwriting. Maybe it's just what navar_holmes observed above, that when you get above a certain threshold going into one B2D folder, the verify slows to a crawl (though I'll note that separately, our jobs are ~275 and ~325 GB).

I'm far from a BackupExec expert, but hopefully this helps people A)get around their problem or B)fix the problem itself :)

navar_holmes
Level 5

For my file server I also created a new B2D folder and set the file size to 8GB.

I am running both Full and Diff to the folder.  The only thing that is being shared is the media set.

FYI I have a B2D folder for every backup job.  But I do share the media set.

There are 60 total jobs and half of them are Dup jobs.

Here are my primary media sets.

1WeeksFull - Overwrite=1 Weeks, Append=12hours.  No vault rule.

2DaysDiff - Overwrite=2 Days, Append=12hours.  No vault rule.

 

But the two slow file server jobs don't use the media sets or B2D.

If the verify is the slow part, what is happening during a verify?

 

SkipKnees
Not applicable

I am using external B2D USB hard drives since my tape drive took a dirt nap.

Like others, my first few backups went like a breeze and suddenly I had jobs timing out during the verify operation.  I also found that when doing a test restore, the job took a very long time to start putting data back in the redirected location.  So long in fact that I stopped the job before it had processed anything.  ~10 mins of nothing...

Here is what I did; while it may not have fixed the problem, it seems to have helped.

Browse to the backup-to-disk device in Windows, right-click and select Properties, and on the General tab uncheck "index this drive for faster searching" and apply that change to the drive and subfolders.

Directly after that and a Backup Exec service restart I was able to run a restore faster than I have any record of in my job log.  I am running a test backup now to see how long the verify portion of the job takes.

I had to cancel my test backup to not interfere with live data during the work day.  I will try to get back to this board to post an update.