
BE 12.5, Backup speed normal, verify agonisingly slow after running happily for months

SSheaf
Level 3
Hi Guys,

I am hoping someone can point me in the right direction. The backup phase (to LTO4 tape - SCSI) continues to run at normal speeds. The verify is now almost randomly slowing to one quarter of the backup speed, where previously it was on the order of 5 times faster. This really causes issues, as one job blew out to over 17 hours instead of less than 4 hours (normal). This caused subsequent jobs to time out. The verify is so slow that I will require over 24 hours to complete all the backup jobs in the queue (which repeat every weekday).

Manually running the jobs shows much the same behaviour: backup always at normal speeds, but verify running at anything from normal speed down to agonisingly slow.

Up until about one week ago this slowdown in verify had not occurred. The hardware has not changed, and at least one of the tapes had been used only 5 times since new. I have run a cleaning tape through the drive (about 9 months old) even though the drive gave no indication that it needed cleaning (either under device properties or on the drive LEDs). Zero hard read and hard write errors.

The only changes I can identify are software updates, both Windows and Symantec Backup Exec (plus Trend Worry-Free anti-virus). The tape drive is an HP external SCSI unit on a dedicated controller. The disk subsystem is a local 3ware RAID 5 array (4 disks, all showing OK) on the same server.

So far I can find nothing in the event logs (etc) which might be related to this unacceptable behaviour.

Cheers
Stephen
13 REPLIES 13

Larry_Fine
Level 6
   VIP   
Any soft read or write errors?  Verify doesn't hit the original source disk, so you can basically rule that out and focus on the tape drive side.  What about temporarily turning off verify in the backup job and later doing a separate verify job?  I assume you have power cycled everything?  Does the CPU load look high during the verify?

SSheaf
Level 3
Thanks Larry,

Not only is the drive HP, so are the tapes!

You have partially confirmed what I had understood. While I had the chance I restored a chunk of the data that was slow to verify. For the record:
152 GB of data backed up, with a backup speed of 778 MB/min. BE verified that at 168 MB/min (normally around 4 GB/min).
11.7 GB of that restored at 175 MB/min (overall speed; it peaked over 400).
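
To put those rates in perspective, here is a rough sketch of the arithmetic. It assumes the rates quoted are megabytes per minute and that 1 GB = 1024 MB; both are assumptions about how the job log reports its units, and the function name is purely illustrative.

```python
def minutes_to_stream(size_gb: float, rate_mb_per_min: float) -> float:
    """Minutes needed to read or write size_gb at a steady rate_mb_per_min."""
    # Assumption: 1 GB = 1024 MB.
    return size_gb * 1024 / rate_mb_per_min


SIZE_GB = 152  # data set size quoted in the post

backup_min = minutes_to_stream(SIZE_GB, 778)              # observed backup rate
slow_verify_min = minutes_to_stream(SIZE_GB, 168)         # observed slow verify rate
normal_verify_min = minutes_to_stream(SIZE_GB, 4 * 1024)  # usual ~4 GB/min verify

for label, minutes in [
    ("backup", backup_min),
    ("slow verify", slow_verify_min),
    ("normal verify", normal_verify_min),
]:
    print(f"{label:>13}: {minutes:7.0f} min ({minutes / 60:.1f} h)")
```

On those assumptions the backup takes a little over 3 hours, the slow verify takes over 15 hours, and a normal verify would finish in well under an hour, which is roughly in line with a job blowing out from under 4 hours to over 17.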

CPU (quad core) did not appear loaded and there was about 700Meg of RAM free. 
Power cycled? Not yet - that does not go too well during the working day! I will give that a day or two - I have to come in at night to power it down fully (tape drive is external).

Soft read errors are approaching 15,000; soft write errors are at 75 (exactly). I had not been keeping my eye on read/write errors, so I cannot tell what the trends are there. However, I will be watching over the next few days. I will reconfigure the backup jobs tomorrow to separate the backup and verify stages (making the assumption that this can be done).

Will keep you informed

Cheers

Larry_Fine
Level 6
   VIP   
"Soft read errors is approaching 15,000"

That sure doesn't sound good, so definitely keep an eye on that. You could record the current number, run a verify of something today, and then check the number again, all without impacting your production environment. If that number keeps climbing, you need to contact your hardware vendor or change media.

That number could have jumped that high from one bad media a long time ago (depending upon when you installed BE or added this tape drive), or it could be steadily climbing, so monitoring it this week sounds like your best tool to help solve this mystery.

BTW, your usual verify speed of 4GB/min and a backup speed of 1/4 that indicates that you are likely bottlenecked at the source disk or network. When you are bottlenecked at the tape drive end, your verify speed matches your backup speed.
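
That rule of thumb can be written down as a small sketch. The 25% tolerance is an arbitrary assumption, and the third branch (verify much slower than backup) is an extension to cover the symptom in this thread rather than part of the rule above.

```python
def likely_bottleneck(backup_rate, verify_rate, tolerance=0.25):
    """Guess where a job is bottlenecked from its backup and verify rates.

    Both rates must be in the same unit (e.g. MB/min).
    tolerance is a hypothetical margin for treating the two rates as "matching".
    """
    ratio = verify_rate / backup_rate
    if ratio > 1 + tolerance:
        # Tape can be read much faster than it was written: the source was the limit.
        return "source disk or network"
    if ratio < 1 - tolerance:
        # Reading the tape back is slower than writing it: suspect the read side.
        return "tape read path"
    return "tape drive"


print(likely_bottleneck(778, 4096))  # normal figures from this thread
print(likely_bottleneck(778, 168))   # slow-verify figures from this thread
```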

SSheaf
Level 3
Hi Larry,

With a single tape backup last night (as per normal)  soft write errors increased by 3, but soft read errors increased by nearly 2000 (ouch!) 
Today I have
1.  Shut down BE services on the server (with the tape drive)
2.  Switched off the tape drive (an advantage of it being external) for about 10 seconds or so
3.  Tape drive back on
4.  BE services started
5.  Run a cleaning tape through it (even though the indicators did not suggest it was needed)
6. Started a verify run on a tape from last week.

I had a quick look, and at first glance it appears I cannot schedule a verify of whatever tape happens to be in the drive. What I would like to do is complete the backup of all data, then verify the tape in a single run. That way the data should be backed up, even when verify is playing silly beggars. Although that raises the question of restore reliability and speed.

Incidentally, the fastest backup speed occurs when backing up over the network! In some jobs speeds well over 1Gig/min are achieved (over the complete job).

The verify run has now been going 16 minutes and so far has recorded 50k; I suspect it is still looking for the correct tape location. However, that seems a little slow to me. Will update when I know more.

Cheers
Stephen

Larry_Fine
Level 6
   VIP   
Correct, it is not possible to schedule a verify (or a restore) of a dataset that doesn't exist yet.  I was only suggesting that the verify be done manually once to see if that made a difference, most likely due to time shifting.

Your fastest backup MAY occur over the network if your network is good/quiet and the source files are large. Large files back up faster than small files due to less overhead of traversing the file directory. Disk fragmentation may also be a small factor. There are lots of variables, but my point was that you were still running slower than the tape drive is capable of, so there is room for improvement if desired/needed/justified.

Do you have the latest DDI device driver set installed?

SSheaf
Level 3
Last night the backup/verify performed at about normal speeds (i.e. as they did before the slow verifies started). However, the soft read errors increased by about 1500 and the soft write errors by 3.

At this stage I am concerned with the soft read errors as 1500 seems to be too much considering the total soft read errors is about 18000 (the drive having been in for perhaps 8 months).

I need to check the Driver set installed, but I am at a bit of a loss to explain why it should work without problems for 7 or so months and then start playing silly beggars, particularly when it is now so unpredictable.  It does make me wonder about the head on the drive, or perhaps there was something about the latest batch of tapes.

SSheaf
Level 3
Over the last few days I have worked with HP technical staff and they suggested I load the HP driver for the tape drive. (Installing Symantec Backup Exec loaded Symantec drivers for the tape drive.) So using Windows Device Manager I "updated" the device driver to the one HP recommended. (I did detour through removing the Symantec drivers via BE 12.5, but I had to reinstall them to allow BE to write to the tape. I then re-updated the Device Manager driver to HP.) Initial results indicate that the Symantec drivers may be at fault. The question I have is why did the Symantec drivers work properly for about 6 months then fall over? Perhaps I will never know!

Thanks for the help Larry

PS.  On reviewing the slowed verify speeds I found sometimes the verify speed was less than one tenth what it had been previously, so the problem was actually worse than I had reported!

SSheaf
Level 3

The problem has returned - and the tape this time has been used a total of 15 times since new (one of the most used LTO4 tapes that I have). HP now believe that the drive itself is at fault and are sending me a replacement. I will update this when I have some more info.

SSheaf
Level 3
Hi guys,

FYI:  a new tape drive arrived yesterday.  All the backup jobs last night (to a new tape) ran as fast as they have ever run.  Verify running in the order of 5000Meg/min on all but tiny jobs.  On Saturday (with the old drive) verify got down to 86Meg/min before I killed the verify activity.  Backup speeds have remained remarkably stable throughout the whole saga.

Yes, it is only one backup session, but it has been a while since I actually got a complete backup with verify on every job!
 

dougz
Level 4
I'm having essentially the same problem.  After 2 years of great performance, all of a sudden my 12.5 Backup Exec server is doing an EXTREMELY slow verify job even though the backup job completed at normal speed.  Additionally, all other backup and verify jobs that ran this past weekend ran at normal speed, so it doesn't make sense to me why I'd be having a problem with just this one particular verify job. 

Last week I updated the Backup Exec installation, and one of the updates was a driver update, I believe, so I'll be focusing on reverting to the previous drivers as my first step in troubleshooting this issue. I vaguely recall having the exact same problem back in the days of Backup Exec 8.x, and in the end it was indeed a driver issue.

I'll let you know how it goes.

-Doug

SSheaf
Level 3
Hi Doug,

The HP technician got me to try some driver changes, but that made no discernible difference, except perhaps on the first couple of days after the change to using HP drivers for the LTO 4 drive rather than Symantec ones. The thing that I noticed is that the verify speed was highly variable: on one cycle (tape) a job would be slow, but on the next backup cycle that job might verify at normal speed. Out of six jobs within the same cycle, anything from one to all of them could verify slowly.

Overnight BE is set to do half a dozen different backup jobs to the same tape.  Tapes are changed every workday.

As you are probably aware, the verify only checks that the tape is readable; it makes no attempt to compare the contents of the tape with the original files. Hence verify speed is really only affected by a few variables.

Currently the replacement drive has only had a problem when a tape was write protected, but then that is to be expected!!  ;)  The original LTO4 drive was not even a year old.

I would keep an eye out for any deterioration. My experience was single (and not entirely random) slow verifies interspersed with normal backups. Over a month or two it got worse in both the number of slow verifies per cycle and the frequency of cycles being affected. In some cases the slow verifies got even slower (e.g. 300Meg per minute down to 89Meg per minute). I also noticed a rapid increase in the number of soft read errors.
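
One way to keep an eye on that kind of deterioration is to count slow verifies per cycle against a known-good baseline. The sketch below is only an illustration: the job names, the rates per cycle and the 50% cutoff are all hypothetical, and the rates would have to be copied by hand from the job logs.

```python
def slow_verify_jobs(verify_rates, baseline_mb_per_min, fraction=0.5):
    """Names of jobs whose verify rate fell below fraction * baseline."""
    cutoff = baseline_mb_per_min * fraction
    return sorted(job for job, rate in verify_rates.items() if rate < cutoff)


# Hypothetical verify rates (MB/min) for two nightly cycles.
cycles = {
    "Mon": {"job1": 4000, "job2": 3900, "job3": 300},
    "Tue": {"job1": 89, "job2": 4100, "job3": 250},
}

# Number of slow verifies in each cycle, against an assumed 4000 MB/min baseline.
slow_counts = {
    day: len(slow_verify_jobs(rates, baseline_mb_per_min=4000))
    for day, rates in cycles.items()
}
print(slow_counts)
```

A count that creeps upward from cycle to cycle, together with a climbing soft read error counter, matches the pattern I saw before the drive was replaced.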

I was running the latest updates of Backup Exec 12.5.

Cheers
Stephen

dougz
Level 4
Stephen, thanks for following up. The problem I'm having definitely sounds like the same issue. Like you, I'm running the latest updates of v12.5. This weekend all jobs including verify jobs ran quickly, except for one verify job which crawled for no apparent reason. I find it a bit fishy that the first time I'd have this problem would be right after updating Backup Exec with the latest hotfixes, but I'm aware that it could just be coincidental. And of course I will also continue monitoring my jobs. I just want to make sure I stay on top of this one and get it fixed before it becomes too big a problem. The reason is that I'm putting about 8 TB (compressed) to tape per week (about 20 TB uncompressed). It's quite a bit of data, and I can't afford to get behind schedule! :)

Can you confirm what I think you're saying?  What I think you're saying is that in the end the problem for you definitely seems to have been the actual tape drive, not the driver.  Is that right?

I will start monitoring soft errors on a daily basis as well.  Will update this thread when I have more info.

Thanks again,

Doug

SSheaf
Level 3
Hi Doug.

In my case the problem disappeared when the tape drive was swapped over. When I swapped the drives I did not even stop the server (external SCSI tape drive). I simply stopped the services, disconnected the old drive, connected the new drive and restarted the services. No tape/backup driver updates etc. at this point or subsequently. (Any updates and restarts since then have been the usual Windows updates etc.)

I am also pretty sure I did things such as restarting the server etc prior to the final tests with the old drive, as well as swapping to the HP driver, checking for updates etc.

Yep, watch those soft errors.  I was getting thousands per night.  Hard errors barely budged.

Cheers
Stephen