
BackupExec causes horrendous fragmentation

Ross_Smith
Level 4
I am seeing incredible fragmentation on the drive we use to store our BackupToDisk files. I'm posting more details below, but if you are using B2D, could you please check the fragmentation level for me and let me know if you are also having this problem.

We have a 570Gb NAS device used as the main target for all of our backups. This device has been in use for approximately 6 months and is used almost exclusively for B2D backups.

After numerous problems with BackupExec, I happened to run a defrag analysis on this device, with the following results: http://www.averysilly.com/Defrag%20report.jpg.

Even after deleting 350Gb of B2D files, we are still getting 99% fragmentation reported with files scattered across the entire disk, and with a 1Gb b2d file in as many as 40,000 fragments.
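
If you want to check your own volume, a rough sketch like the following should produce the same kind of report. It just shells out to the built-in XP/2003 defrag.exe in analysis-only mode; the drive letter is only an example, so point it at whatever volume holds your B2D folder.

import subprocess

# Example drive letter - change to the volume that holds your B2D folder.
B2D_VOLUME = "E:"

# Analysis-only (-a) with a verbose report (-v); this reads the volume but moves nothing.
result = subprocess.run(["defrag", B2D_VOLUME, "-a", "-v"],
                        capture_output=True, text=True)
print(result.stdout)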

I am still speaking with Veritas and we've no clear idea what caused this, but on at least one occasion I set BackupExec to allow concurrent operations to this B2D device, and we suspect this may be the cause of the fragmentation.

We are defragmenting this device today to see if it fixes our problems, but I would be very interested to know if anyone else is seeing similar effects on their B2D devices?

Ross

Subject line updated 20/6/05 by: Ross Smith

Richard_Grint
Level 2
Ross
I don't know why XP should be different, except that it may be related to how full the disk is. I can comment a bit on how NTFS works (on all Windows 5.1 variants, I think); there is a factor relating to fragmentation, but I don't think it directly affects BackupExec.

The FAT was replaced by a file called the MFT. There is one MFT record per file on your disk. To keep the MFT contiguous, it lives in a reserved (perhaps better described as shielded) region of the disk volume known as the MFT zone. This area is a percentage of your disk volume's size; the default is 12.5%, and this is the minimum that can be set.

Using the fsutil tool you can set the MFT zone to a larger reserved area, i.e. 25%, 37.5% or even 50%. You can't shrink the reserved area. Our MFT is tiny compared with this zone because all our files are 1Gb Backup Exec files; the MFT zone was designed for a small, sub-100kb average file size.
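
For reference, something like this queries and raises the zone. Treat it as a sketch: it just shells out to fsutil, and the exact mftzone syntax may differ between Windows versions, so check fsutil behavior /? on your own system first.

import subprocess

# mftzone values map to the reserved fraction of the volume:
# 1 = 12.5% (default), 2 = 25%, 3 = 37.5%, 4 = 50%.
subprocess.run(["fsutil", "behavior", "query", "mftzone"], check=True)

# Uncomment to raise the reservation to 25% (takes effect after a remount/reboot):
# subprocess.run(["fsutil", "behavior", "set", "mftzone", "2"], check=True)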

The MFT zone is that strange empty area near the beginning of the disk in the defrag GUI that files just don't seem to reorg into. It is less obvious on a mixed-use, high-utilisation disk because eventually files get forced into there.

The issue for BackupExec users is that the default/minimum seems to have been set for disks that host 'normal' file systems with many, fairly small files. It performs poorly if you have a few very large files e.g. Backup Exec or a database system because the MFT will be small but a large area of the disk is then protected from normal file placement unless all space outside the MFT zone is exhausted.

However, this issue affects the performance of defrag tools more than it affects BackupExec (at least for us): we get fragmentation even on an empty file system, caused by writing 16 backup files in parallel (Concurrent Operations is set to 16 for us). This means that each space request to the OS gets a small bit of disk space, and then 15 requests for other files occur before the next request for that file. It affects defrag because defrag can't use the MFT zone to reorg the disk.
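
If you want to see the effect in isolation, a rough sketch like this reproduces the interleaved allocation pattern on a spare volume (folder, sizes and stream count are made up for the test); running a defrag analysis afterwards should show every one of the files heavily fragmented.

import os

TARGET_DIR = r"E:\frag_test"     # made-up test folder on a spare volume
STREAMS = 16                     # mirrors Concurrent Operations = 16
CHUNK = 64 * 1024                # small allocation per request (64 KB)
CHUNKS_PER_FILE = 1024           # 64 MB per file keeps the test quick

os.makedirs(TARGET_DIR, exist_ok=True)
files = [open(os.path.join(TARGET_DIR, "stream_%02d.dat" % i), "wb")
         for i in range(STREAMS)]

# Interleave the writes: each file grows by one small chunk before the next
# file gets its turn, roughly what 16 parallel backup jobs do to the volume.
block = b"\0" * CHUNK
for _ in range(CHUNKS_PER_FILE):
    for f in files:
        f.write(block)
        f.flush()

for f in files:
    f.close()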

Also, re the MFT: yes, there will be seeks on writes to disk (and for that matter to the NTFS logfile), but for reads (during a subsequent duplication to tape), because the MFT is small it will be cached, and therefore seeks won't occur.

As far as I can see, the underlying issue is that BackupExec asks the OS for very small incremental space allocations. This should be configurable in BackupExec, in a similar manner to SQL Server, which would have the same problem if it didn't have the config setting.

We are still running with the Python script from my previous posting but have now added a Perfect Disk defrag (which runs for 12 hours) at the weekend to consolidate free space for use during the week. With this in place our daily backup performance is 1.5 hours for a 90Gb backup to disk, a 5-hour defrag, and about 2.5 hours to roll the backup off to tape. It would be good to eliminate the 5-hour defrag (and the weekend 12-hour defrag)!

Peter_Ludwig
Level 6
> Sorry Peter, there's no registry entry for this setting,

Yes, I know, but there should be one in my opinion. Like for other settings.
Most users will not care, but some could change it....

greetings
Peter

Evan_Waite
Level 5
Hey Everyone,

Sorry for rehashing a thread this long and old but I expect a number of people are interested in this topic and therefore this info.

Veritas/Symantec has officially acknowledged the fragmentation problems with Backup-To-Disk (B2D). They have released a support document with information on dealing with fragmentation and tips to avoid problems. They also state they are looking into long term solutions.

Here is the support document: http://support.veritas.com/docs/280072

Peter_Ludwig
Level 6
Yes, Evan, but their hints are not really helpful. How could I stay with one B2D folder per volume? How could I keep 30% free? How could I switch down to a single concurrent operation per B2D folder?

BTW, I had to copy a couple of hundred GB from tape to disk and back, and I observed that writing to disk comes to >1100MB/min, but the other way round was only 700MB/min.

greetings
Peter

Aaron_Payne
Not applicable
Found out we had this problem too just the other day. Opened a tech support case (...sigh) and finally got it escalated to their Advanced Support group (THANK YOU CDW!).

Anyway - we have a LeftHand IP SAN (iSCSI) and have the same darn problem everyone else here does: 45% total fragmentation and 99% file fragmentation after just one backup on a new volume (158GB on a 700GB volume). One backup shot it right up to 45% total and 99% file.

Obviously this is completely unacceptable. I don't get this kind of fragmentation with NTBackup to the same volume with the same data being backed up, so don't try and tell us it's not BE.

Anyway - if you decide to open a ticket you can reference Case 290-11-934. Maybe that will help you. My tech support guy is supposed to be working on this so we'll see. I hope he doesn't come back with a reference to that goofy tech note.

Ross_Smith_2
Level 4
Hi Aaron,

Thanks for reporting that - Veritas have very few records of people with this problem, so every single call logged helps at the moment.

I've spoken to the advanced support team in the UK a few times about this problem so I know they're aware of it. I don't think they'll have a fix yet though - this is still being looked into by Veritas' software engineers in the States and I'm still waiting for them to come back to me with an ETA for a fix.

Ross

richard_zuckerm
Level 2
Partner
Are you using virtual tapes on the REO? Overland says that you should not get a frag problem with them.

Jim_Nowotny
Level 3
Veritas,

Is a "fix" being worked on? Or do I just need to move on? If one is coming I will sit tight and deal with it.

I have done some informal D2D testing since getting burned by this. Here is what I have found so far.

BackupExec 10d, lots of fragmentation, much slower than tape
Arcserv 11.5, less fragmentation, but still only slightly faster
Netvault 7.3, much faster, space is pre-allocated in a virtual tape library, still not as fast as tape, but close.

Next up will be Netbackup 6.0.

I really like BackupExec, I guess I'm just used to the UI. It just kills me to see the marketing on this. I can only conclude that marketing is telling us a bold-faced lie or, more likely, they are just clueless. I bet the engineers in the test department knew of this problem all along, but nobody listened!

Again, if a fix is coming, I'll stick with it; if not, I've got to go.

Jim…

GovGeek__
Level 2
RANT!! It is amazing to me that vomettoss is neglecting this issue. I'd call and initiate a ticket; however, every time I call it's like a blame game, and then they try and get me to blow up my installation. I've actually spent nine straight hours on the telephone with what can only be described as a training session on Server 2003 for Veritas. I never knew I could speak three different languages.

TECHNICAL... Okay, not really, but... We have a 10TB data store, fibre-attached, that has huge fragmentation after just three days of backing up approx. 60GB each night. What usually takes 1.5 hours when backing up to AIT-3 now takes 13 hours on B2D. Rather than use a third-party app to defrag, I wrote a small VBScript that you just point Microshaft's Scheduler at. While this is a horrible solution, it gets the job done in an automated fashion.
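
For anyone curious, the idea is roughly this. It is not my actual script (that one is VBScript); this is just an equivalent Python sketch with made-up paths, which the Scheduler could run against the B2D volume overnight.

import subprocess
import datetime

B2D_VOLUME = "F:"                      # made-up drive letter
LOG_FILE = r"C:\logs\b2d_defrag.log"   # made-up log path

# Force a full defrag (-f) and capture the verbose report (-v), appending it
# to a log so the scheduled task leaves an audit trail.
result = subprocess.run(["defrag", B2D_VOLUME, "-f", "-v"],
                        capture_output=True, text=True)

with open(LOG_FILE, "a") as log:
    log.write("=== %s ===\n" % datetime.datetime.now().strftime("%Y-%m-%d %H:%M"))
    log.write(result.stdout)
    log.write(result.stderr)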

Anyway, I thought I'd chime in and let others know that I am yet another unappreciated, underestimated, and dissatisfied customer. They'll probably remove this anyway.

-GovGeek
= )

Evan_Waite
Level 5
I feel your pain GovGeek, we had to purchase a third party defrag program just to deal with this. It runs during the day (pretty much all day) and removes roughly 85,000 fragments per day. I don't imagine that's good for the long term reliability of the hard drives in that RAID set :(

As others have stated, the problem stems from Veritas treating B2D like a tape drive and sending small chunks to it. If they preallocated (or at least gave us an option to allocate) the space and then wrote to it, things would be much much better.

I'm just surprised you ran into any blame game with their support; they DID release a support document stating they have this problem.

-Evan

H_Vincent_Latus
Level 2
I've just updated to the new Symantec version, hoping that would fix the issue, but I haven't had the opportunity to investigate further.

GovGeek__
Level 2
Yes, they did release one statement regarding the issue. However the solutions are unrealistic and they still try and suggest other apps/situations are to blame. It's like they won't take full ownership of the situation. This has been my experience in my six years of using their product. This isn't my first BBQ... ;)

The real problem is in the way that they manipulate Microshaft Windows so extensively. The product is extremely intrusive to the OS.

Anyway, it would be great if there was some sort of handler or container that receives the smaller packets, repackages them properly, and streams them to the data store in one contiguous file. But then you'd have to catalog the additional process also.

Russ_Perry
Level 6
Employee
The fragmentation is caused by the method of writing the Backup to Disk files in Backup Exec. The disk writes are simple Windows file-write operations, sending a block of data at a time to the B2D destination, as opposed to pre-allocating a location on the target drive the size of a full .bkf file and then filling it up. This by itself can cause fragmentation, especially if the destination is being actively used for other file storage; a single B2D .bkf file can be spread all over the place. In the situation where you have multiple jobs or BEWS servers feeding B2D folders on the same drive, the problem is compounded, since alternating blocks of data are being written to disk by each job/server. I would imagine that the ability to pre-allocate the full B2D .bkf size will be explored for a future version of BEWS. In the meantime, short of opening support cases, enter your findings on the enhancement.veritas.com website (still correct with Symantec).

see http://seer.support.veritas.com/docs/280072.htm for more info.
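
To make the pre-allocation idea above concrete: this is not Backup Exec's actual code, just a generic Python sketch with a made-up file name and sizes. The file is extended to its full target size in one request, and the data is then streamed into the space that has already been reserved.

import os

FILE_PATH = r"E:\B2D\prealloc_demo.bkf"   # made-up path
MAX_SIZE = 1 * 1024 ** 3                  # 1 GB, roughly a full .bkf
BLOCK = 64 * 1024

with open(FILE_PATH, "wb") as f:
    # One large allocation request up front...
    f.truncate(MAX_SIZE)
    f.seek(0)
    # ...then stream the data into the space already reserved.
    data = b"\0" * BLOCK
    written = 0
    while written < MAX_SIZE:
        f.write(data)
        written += BLOCK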

Ross_Smith_2
Level 4
Hey everyone,

I'm testing the beta of BackupExec 11 right now and there's a new checkbox in the B2D folder properties: "Preallocate files to the maximum size". The help documentation states that this is to reduce fragmentation, and veritas has confirmed that this change is designed to solve this problem.

So the good news is that despite the complete lack of communication they have at least been working on this. God only knows how we're going to solve the fragmentation already on the drives though. The last time I tried a backup on a half full NAS box it took so long it was actually easier to rely on my tape backups for a week and just re-format the NAS box.

Ross

Peter_Ludwig
Level 6
Quoting Ross:
>I'm testing the beta of BackupExec 11 right now and there's a new checkbox in the B2D folder properties: "Preallocate files to the maximum size".

Sounds good and could avoid the issue of backups running out of space (because the job will not start at all if there is not enough space). But this also means that you need the pre-scan option enabled (= time!).

(BTW, I had to agree not to speak about new features in V11, but I was testing other things.)

greetings
Peter

Ross_Smith_2
Level 4
Hi Peter,

I'm not sure if it'll help with that, I'm afraid; what you're suggesting seems to be that they're pre-allocating all the .bkf files in advance.

I don't believe that is how this works. I think Veritas/Symantec are still creating .bkf files as needed during the job; they have just made a change so that as these files are created, they occupy their maximum size. That's all they need to prevent these files becoming fragmented.

I've done a fair bit of testing and haven't seen any fragmentation with this option turned on.

I had to agree to the same NDA as you, but after asking Symantec, they allowed me to post this info here since we've all been waiting for this for a long time.

Ross

H_Vincent_Latus
Level 2
So Ross, how is the new feature working? Does it seem to correct the issue? Also, do you know how far away they are from releasing it? My B2D jobs were taking so long, I've now abandoned B2D. I'm hoping that a solution can be found soon, the ease of B2D restores was very nice.

Ross_Smith_2
Level 4
I've no idea when v11 is likely to be released I'm afraid. Ultimately I'm just an end user like the rest of us here (I've just spent a lot of time pestering veritas *grin*).

However, in my experience it's unlikely that this will solve your speed problems. Generally it's possible to get a good speed out of B2D even with the fragmentation issues.

I've found real problems if you're using software RAID-5 - a couple of our NAS boxes used a Windows 2000 OS with software RAID, and that simply fell over and died with the kinds of data transfer veritas use. We had to re-format them as stripe sets.

I've also heard of some people having problems that were down to the brand of drive or raid controller. Obviously network performance can have an effect, as can the motherboard interface to your raid controller. (we're using a 64bit PCI card, standard PCI slots simply don't have a high enough throughput).

If you're seeing slow performance, the best test you can do is a straightforward file transfer using Windows Explorer. That should highlight any performance problems. I would expect you'd find it's a hardware or configuration issue external to BE.
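
If you'd rather time it than eyeball Explorer, a quick sketch like this (Python, made-up paths for the local test file and the NAS share) gives you an MB/min figure to compare against the numbers below.

import os
import shutil
import time

SOURCE = r"D:\testdata\big_test_file.dat"    # made-up local test file
DEST = r"\\nasbox\b2d\big_test_file.dat"     # made-up UNC path to the NAS

start = time.time()
shutil.copyfile(SOURCE, DEST)
elapsed_min = (time.time() - start) / 60

size_mb = os.path.getsize(SOURCE) / (1024.0 * 1024.0)
print("%.0f MB copied at %.0f MB/min" % (size_mb, size_mb / elapsed_min))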

It's a pain to troubleshoot without knowing what kind of rates you should see, so here are some examples of the rates we see here:

1. Server backup to local NAS - 1017MB/min
This is backing up from a server with a 9 disk raid array to a local 5 disk array, across a gigabit network.

2. Workstation backup to remote NAS box - 175MB/min
This is backing up from a remote machine with a single drive to a NAS box with a 4 disk array. We have a 100Mb/s network link to the backup server and gigabit to the NAS box.

3. Local backup to a tape drive (HP Ultrium 215) - 570MB/min
This is backing up from the local drive (5 disk raid array) to a local tape backup drive. We're getting the maximum expected rates from this drive.

So our fastest NAS backup runs about twice as fast as our tape drive, yet our slowest runs at about an eighth of the speed. However I believe all these figures are optimal for their situations.

As an aside, I'm very curious to see whether the NAS backups are any faster now they've solved the fragmentation issues... I bet Veritas that they'd see a significant gain, but I won't know if I'm right or not until they release v11 and I can install it on our production server.

Ross

Ross_Smith_2
Level 4
PS. After 250 backup operations on our test server, we have zero file fragmentation with BackupExec 11.

Paul_Yip
Level 4
After running some tests with Ross, I have come to the conclusion that the problem was not just the fragmentation that caused the slow backup to disk performance.

We are using a PCI-X RAID card but are running it off of just a regular PCI slot. That's why our performance isn't that great to begin with. If we were using it on a PCI-X slot, I'm sure our performance would be better backing up to disk. But the problem I was seeing was that after using BE10 for a while, our backups started to run slower.

After rebuilding our server, the speeds came back up to ~300MB/min and have stayed like that for a while. I find it odd because it was at ~300MB/min before, but the speeds dropped lower and lower as time went by. So that slowdown may be caused by the software/fragmentation, because nothing else was changed, and after wiping out the array and starting over again, the speeds were the same.