
NetBackup Accelerator's Track Log

Chris_Garrett
Level 3

Hi,

Netbackup Accelerator is excellent - we are using it along with some 5220 appliances and Netbackup 7.5 - but I do have a question about the way it works.

This rather good document covers a lot of the technical details: http://www.symantec.com/business/support/index?page=content&id=HOWTO67421

However, one bit of information that is missed out is exactly what data about the filesystem is stored in the track log, and how the track log is compared against the filesystem so quickly.  Does it use the same techniques on ext3 and NTFS for example?

Thank you if you have any information on this.  This question arises more out of curiosity than necessity.

Chris.

 

 

11 REPLIES

RLeon
Moderator
VIP

Apart from what has already been covered in the Nbu Admin guide vol1, the following is also a good read:
  http://www.symantec.com/connect/blogs/frequently-asked-questions-netbackup-accelerator

 

one bit of information that is missed out is exactly what data about the filesystem is stored in the track log

The article you linked to says this:

The NetBackup client sends to the media server a tar backup stream that consists of the following: The client's changed blocks, and the previous backup ID and data extents (block offset and size) of the unchanged blocks.

So the track log tracks block-level changes (and "unchanges") on the volume since the last backup.
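
To make that concrete, here is a rough sketch of what entries in such a stream could look like (a hypothetical structure in Python; the real NetBackup stream format is proprietary and none of these names come from NetBackup):

# Hypothetical sketch: changed extents travel as data, unchanged extents travel
# as references back to the previous backup image.
from dataclasses import dataclass
from typing import Iterable, Optional

@dataclass
class StreamEntry:
    offset: int                          # extent offset within the backup stream
    size: int                            # extent size in bytes
    data: Optional[bytes]                # raw bytes for changed extents, else None
    previous_backup_id: Optional[str]    # reference used for unchanged extents

def build_stream(extents: Iterable[tuple], previous_backup_id: str):
    """extents: (offset, size, changed_flag, data) tuples derived from the track log."""
    for offset, size, changed, data in extents:
        if changed:
            yield StreamEntry(offset, size, data, None)
        else:
            yield StreamEntry(offset, size, None, previous_backup_id)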

 

 

and how the track log is compared against the filesystem so quickly

In a traditional incremental backup without the use of Accelerator, the Nbu client would have to check every single file on the client's volume to see which ones have changed since the last backup, via the archive bit or timestamp. Say the client has a total of 100 files; then during an incremental backup, all 100 files have to be enumerated and checked. This takes a lot of time.

With Accelerator, the changed blocks since the last backup are tracked. So, in the next incremental backup, only these changed blocks will be sent. On the file system level (one level up), only the files associated with the changed blocks will be checked by Netbackup for cataloging (to record filenames, locations, etc.). Using the same 100-file example, effectively fewer than 100 files have to be checked, which makes the process very fast (see the toy illustration below).
This works with full backups too, only this time all 100 files will be cataloged. The rest is the same (only changed blocks will be sent, etc.).
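
A toy illustration of that 100-file example (the file names and the block-to-file mapping below are made up purely for illustration):

# Traditional incremental: all 100 files must be enumerated and checked.
all_files = [f"file{i:03d}" for i in range(100)]

# Accelerator-style: the track log already knows which blocks changed, so only
# the files owning those blocks are revisited for cataloging.
block_to_file = {17: "file004", 58: "file004", 91: "file037"}   # block -> owning file
changed_blocks = {17, 58, 91}                                   # tracked since last backup

files_to_recheck = {block_to_file[b] for b in changed_blocks}
print(len(all_files), "files checked traditionally vs", len(files_to_recheck), "with Accelerator")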
Accelerator uses storage-server-based "Optimized" synthetic backups to create fully cataloged backup images out of existing deduplicated data. More from this discussion here:
  https://www.symantec.com/connect/forums/difference-between-synthetic-backup-optimised-synthetic-backup-and-virtual-synthetic-backup

 

 

Does it use the same techniques on ext3 and NTFS for example

Yes, apart from when you enable the use of the NTFS change journal (USN journal), which makes Accelerator even faster. Essentially, this is NTFS's own built-in change tracking.
With the NTFS change journal, during the step where Accelerator associates the tracked changed blocks with actual files, it does not have to do the investigation and association itself; the list of changed files is readily available from the NTFS change journal.
There are special considerations for using the NTFS change journal. Please refer to the Nbu admin guide vol1 for full details.
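
Conceptually (and only conceptually; the function below is a hypothetical stand-in, not a real NetBackup or Windows API), the difference is that the changed-file list is read straight out of the journal rather than worked out by the backup client:

def read_usn_records(volume, since_usn):
    """Hypothetical placeholder for reading NTFS USN journal records newer than since_usn."""
    raise NotImplementedError("stand-in for a real USN journal reader")

def changed_files_from_journal(volume, last_backup_usn):
    # NTFS has already recorded which files changed; no filesystem walk is needed.
    changed = set()
    for usn, path, reason in read_usn_records(volume, last_backup_usn):
        changed.add(path)
    return changed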

Chris_Garrett
Level 3

Thank you for the swift reply RLeon! 

The technology is somewhat clearer to me, although my understanding was that the track log effectively records the state of the filesystem as of the last backup and gets updated whenever a new backup is run. It doesn't get bigger if you have more changes (it only grows if there are more files or bigger files), and there are no processes running in the background between backups to record filesystem changes. Surely the log therefore records metadata about the filesystem rather than changes?

I'm still not sure how Netbackup Accelerator works out exactly what has changed so quickly though.  It can't be performing a full block-level scan of the filesystem - if it was, I couldn't perform a 700GB backup in 5 minutes on SATA disks...  Maybe it checks inode tables or something similar?

RLeon
Moderator
VIP

It can't be performing a full block-level scan of the filesystem

I'd surely hope not.
The reason it can do this so effortlessly is something like the following (not 100% factual; Accelerator is proprietary, after all):
Say your volume has a total of 1000 blocks; then Accelerator would have a table of 1000 zeros. Note that this table does not store any of your data. It is just a bitmap, with each entry representing a block on the actual volume.

The important thing to note is that the bitmap does not actually hold any of your data. It only has a map that represents the actual blocks. The larger your volume is, the larger this bitmap is, proportionately.
Note that whether you have a lot or a little actual data stored on the file system does not affect the size of this bitmap. Even if you have only used 50 blocks out of the 1000 on the entire volume, the bitmap is still 1000 in size.

Every Nbu-triggered backup would reset everything back to zero on this bitmap, and from that point on, every changed block (i.e., a disk write) is recorded by having the block's associated zero changed to a "one", and so on.

Come the next backup, only those blocks that have a "one" on the bitmap will be sent.
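
A minimal sketch of that kind of per-block bitmap (assumptions: a fixed block size and one bit per block; the real track log format is proprietary):

class BlockBitmap:
    """One bit per volume block, independent of how much data is actually stored."""
    def __init__(self, total_blocks):
        self.total_blocks = total_blocks
        self.bits = bytearray((total_blocks + 7) // 8)

    def mark_changed(self, block_no):
        # called when a block is written to
        self.bits[block_no // 8] |= 1 << (block_no % 8)

    def changed_blocks(self):
        # consulted at backup time: only these blocks are sent
        for block_no in range(self.total_blocks):
            if self.bits[block_no // 8] & (1 << (block_no % 8)):
                yield block_no

    def reset(self):
        # "reset everything back to zero" at the start of each backup cycle
        self.bits = bytearray(len(self.bits))

# Example: a 1000-block volume where only blocks 3 and 42 were written since the last backup.
bitmap = BlockBitmap(1000)
bitmap.mark_changed(3)
bitmap.mark_changed(42)
assert list(bitmap.changed_blocks()) == [3, 42]
bitmap.reset()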

Chris_Garrett
Level 3

Hi RLeon,

Thank you for your further reply. The trouble is (and I'm not trying to be difficult here - I may just not have understood something correctly) that this is not how Accelerator appears to work when you look into it further, for two main reasons:

1. The track log size depends on the number of files and the total used disk space, rather than the size of the volume.  The formula I found somewhere is (number of files x 200) + ((total used disk space / 128 KB) x 20) = size in bytes (a rough worked example follows after point 2).

2. When you say "Every Nbu-triggered backup would reset everything back to zero on this bitmap, and from that point on, every changed block (i.e., a disk write) is recorded by having the block's associated zero changed to a 'one', and so on," you imply that the track log gets updated in between backups.  However, Symantec says that there are no processes or daemons running in between backups that update the track log.  The FAQ referenced earlier in this thread states "The track log comes to action during the backup and is populated with entries that are used by NetBackup Accelerator to intelligently identify changed files and segments within the changed files."
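
To put rough numbers on the formula from point 1 (the file count and used space below are made-up values, purely for illustration):

num_files = 1_000_000                     # assumed number of files on the client
used_bytes = 500 * 1024**3                # assumed 500 GB of used disk space

track_log_bytes = (num_files * 200) + ((used_bytes / (128 * 1024)) * 20)
print(round(track_log_bytes / 1024**2), "MB")   # roughly 269 MB for this example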

 

 

RLeon
Moderator
VIP

and I'm not trying to be difficult here - I may just not have understood something correctly

No worries about that. It is actually good to have found someone who is as curious as I am about the nitty-gritty of the behind-the-scenes stuff!

I do apologise if I appeared to be passing off my personal understanding as factual absolutes. It must be emphasised that what I have said about the topic is how I perceive it to work, based on the available documentation and my existing knowledge of how similar and related storage technology works.

 

The track log size depends on the number of files and the total used disk space

This and the formula are completely valid. It was the block bitmap inside the track log I was referring to. And since the track log also contains other data about the file system, understandably, the more files there are, the bigger the track log gets. However, the size of the block bitmap is still limited by the total number of blocks the volume has.

 

you imply that the track log gets updated in between backups.  However, Symantec says that there are no processes or daemons running in between backups that update the track log.

You made a very good point here and I have no answer to that. I noticed the following in the same document you got the track log size formula from:

NetBackup compares the checksums in the track log against the current file system to detect changes.

That would imply that it actually does a full scan of the file system during each subsequent backup, which contradicts the notion that one of the reasons behind its speed is that it doesn't have to.

Perhaps someone with more insight on the matter would comment on this?
I sure hope we are not getting too close to sensitive territory here.

Chris_Garrett
Level 3

This and the formula are completely valid. It was the block bitmap inside the track log I was referring to. And since the track log also contains other data about the file system, understandably, the more files there are, the bigger the track log gets. However, the size of the block bitmap is still limited by the total number of blocks the volume has.

That does make sense when you put it like that - thank you.

Perhaps someone with more insight on the matter would comment on this?
I sure hope we are not getting too close to sensitive territory here.

I would certainly appreciate this - of course, provided this isn't some special sauce that Symantec needs to keep secret for the moment :)

Chris_Garrett
Level 3

I don't suppose anyone at Symantec is following this thread and can shed some light on this?

CRZ
Level 6
Employee Accredited Certified

I try to follow every thread!  But I can't shed any light on this feature as I know nothing about it.

What I CAN suggest, which probably won't help, is revisiting (or visiting for the first time) the SymantecTV intro to Accelerator as well as Rasheed's blog entry on Accelerator from back in July.  I can tell you already I don't think it's going to answer your specific question about the contents of the track log, but there may be something in the background information which can help.  Or you can tack on a new reply there and see if Rasheed's around.  :)  Good luck!

Chris_Garrett
Level 3

Thank you CRZ.  I had seen the blog but not the SymantecTV episode.  I have taken your suggestion and posted a question on Rasheed's blog.

AbdulRasheed
Level 6
Employee Accredited Certified

 

Hi Chris, 

  I responded at the blog. I am also pasting the same thing here in the interest of other readers. 

==========

  I won't be able to reveal the actual IP behind the processes. NetBackup Accelerator is designed such that it works on any file system. In addition, it is also architected such that it can make use of an existing change journal mechanism* built into the file system (this is user configurable).

  The track log keeps essential information needed to verify whether a file has changed since a previous point in time. This includes the file metadata collected. In addition, it also includes hashes for various segments of a file, so that we only need to read the changed blocks if a file has changed. At run time, we do a file system metadata walk to detect the current status of files. There is IP involved in this area to make this process quicker. (You may recall that FlashBackup, V-Ray technology, etc. also have similar processes to optimally detect changes without a full scan.)

*Currently NTFS change journals are supported

============
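
As a minimal sketch of that idea (the segment size, hash algorithm, and layout here are assumptions for illustration, not NetBackup's actual design), a track log entry could pair a file's metadata with per-segment hashes, so unchanged files are skipped from their metadata alone and only the changed segments of changed files are re-read:

import hashlib
import os

SEGMENT_SIZE = 128 * 1024   # assumed segment size

def segment_hashes(path):
    hashes = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(SEGMENT_SIZE)
            if not chunk:
                break
            hashes.append(hashlib.sha256(chunk).hexdigest())
    return hashes

def track_log_entry(path):
    st = os.stat(path)
    return {"mtime": st.st_mtime_ns, "size": st.st_size, "segments": segment_hashes(path)}

def changed_segments(path, old_entry):
    st = os.stat(path)
    if (st.st_mtime_ns, st.st_size) == (old_entry["mtime"], old_entry["size"]):
        return []                        # metadata unchanged: no data blocks are read at all
    new = segment_hashes(path)
    old = old_entry["segments"]
    return [i for i, h in enumerate(new) if i >= len(old) or h != old[i]]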

I understand that I am not actually giving you all the details, as we cannot reveal the secret sauce that gives Symantec its competitive advantage. In relation to your question on change detection, my response to another question on my blog might help. I am pasting that below.

 

==============

 

The 'random read' overhead is mainly for the data blocks of a file. Note that most of the metadata for the file (needed in an incremental backup) comes from the directory. Although we are used to thinking of a directory as a 'folder' containing a bunch of files, the directory itself is a special file in the file system that associates file metadata (owner, group, various timestamps, data block addresses and offsets, etc.) with its file name. Your random read would have overhead when the file blocks are scattered around disk segments. When it comes to NetBackup Accelerator, there is no (or minimal) impact from such fragmentation, because it needs to seek the actual data blocks only when the file has changed, which it knows from reading the directory (directory file, to be precise).

Thus, with the exception of the very first backup, NetBackup Accelerator can help you with file systems where there is huge random read overhead. However, note that your mileage depends on how frequently files change. 
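
A small sketch of that distinction (illustrative only): the walk below touches only directory and inode metadata, never the files' data blocks, so only files later found to have changed need any data reads at all:

import os

def metadata_walk(root):
    """Collect (mtime, size) for every file using directory metadata only."""
    entries = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)           # reads metadata, not file data blocks
            entries[path] = (st.st_mtime_ns, st.st_size)
    return entries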

 

Chris_Garrett
Level 3

Thank you Rasheed,

This is useful information.

Chris.