The Technical Services team for Backup and Recovery has produced a number of documents we call "Blueprints".
These Blueprints are designed to highlight the backup and recovery challenges around specific technologies or functions and to show how NetBackup solves them.
Each Blueprint consists of:
The NetBackup Catalog describes the internal databases and files that keep track of the data NetBackup has under protection. Each time a backup is written, entries are created in the various databases and files, giving NetBackup the information it needs about the data for restore purposes. This includes file names, the client being protected, the location of the data (disk, tape, etc.), size, retention policy, the media server used, and so on. In short, all of the information needed to report on the data and ultimately recover it.
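To make the kinds of fields listed above concrete, here is a minimal sketch of a per-image catalog record. The field names are illustrative assumptions drawn from the list in the text, not the actual NetBackup catalog schema.

```python
from dataclasses import dataclass

@dataclass
class CatalogEntry:
    """Hypothetical sketch of per-backup metadata; not the real schema."""
    client: str            # the client being protected
    file_name: str         # name of the file that was backed up
    location: str          # where the data resides (disk, tape, etc.)
    size_bytes: int        # size of the protected data
    retention_policy: str  # how long the backup is kept
    media_server: str      # media server that wrote the backup

entry = CatalogEntry(
    client="dbserver01",
    file_name="/data/orders.db",
    location="tape:A00017",
    size_bytes=4_294_967_296,
    retention_policy="4-weeks",
    media_server="media01",
)
print(entry.client, entry.location)
```

At restore time, records like this are what let NetBackup answer "where is this client's file, and which media server can fetch it?"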
Since the NetBackup catalog is included with NetBackup, its business value is hard to quantify. The value comes in ease of use: the NetBackup administrator does not need to spend a great deal of time managing the catalog. The catalog structure also makes it easy to protect. Because it is not a single database, losing a portion of the catalog does not render the rest useless, whereas in some data protection solutions that use a unified database, corruption can lead to the inability to recover data at all. And because the catalog itself requires very little attention, a separate FTE is not required to manage it.
The NetBackup Catalog writes information every time a backup runs. In the first part of the catalog, the "Images" directory, approximately 120 bytes are stored per file backed up. This translates into only a handful of GB written per day, even in a large environment. Since catalog size is driven by the number of files rather than their size, a very large customer environment with a small number of large files would have a small catalog: even if Terabytes of data are protected, large files keep the catalog small. Conversely, a small customer with billions of small files, even totalling less than a TB of data, could have a very large catalog. So when architecting a NetBackup solution and determining the disk size needed for catalog storage, the total number of files is more important than the size of the overall data.
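The sizing rule above can be sketched as a quick estimator. The ~120 bytes-per-file constant comes from the text; the two scenarios are illustrative assumptions, not official sizing guidance.

```python
# Rough Images-directory sizing, using the ~120 bytes-per-file figure from the text.
BYTES_PER_FILE = 120

def catalog_size_gb(total_files: int) -> float:
    """Estimate catalog 'Images' size in GB for a given total file count."""
    return total_files * BYTES_PER_FILE / 1e9

# Multi-TB environment dominated by a million large files: tiny catalog.
large_files = catalog_size_gb(1_000_000)        # -> 0.12 GB
# Sub-TB environment with two billion tiny files: large catalog.
small_files = catalog_size_gb(2_000_000_000)    # -> 240 GB

print(f"{large_files:.2f} GB vs {small_files:.0f} GB")
```

The contrast makes the architectural point: file count, not data volume, drives the catalog disk requirement.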
The second part of the catalog is the set of databases that make up the EMM (Enterprise Media Manager) database structure. It stores information about the media used for backups and handles allocation of media resources during backups. This is a separate database and resides in a separate directory (see Figure 1 in the Test Drive below).
The NetBackup Catalog has undergone a number of changes, most of them made to speed up access time for restores and to provide a better method of protecting the data. Prior to NetBackup version 5.0 the catalog was stored in ASCII format. This worked well for small environments, but the space required to store the catalog was becoming an issue, as each file required at least 150 bytes: a catalog tracking 1,000,000,000 (one billion) files would be 150GB. In the NetBackup 3.2 – 5.x days many customers had limited disk space for the catalog and disk was very expensive, so at 150 bytes per file the catalog could quickly grow to 100GB, which at the time was a very large catalog.
In NetBackup 5.0 the files were converted from ASCII storage to binary to reduce catalog space and speed up browsing for restores. This not only increased browsing speed, it reduced the space required to 120 bytes per file, so the same 1,000,000,000 files would create a 120GB catalog: a 20% savings on disk. With today's multi-Terabyte disks this isn't an issue, but in 2004 it was a huge savings. The next challenge was protecting the catalog. In 2004, a tape drive such as the SDLT 600 was rated at around 60MB/sec, but the catalog backup was single-streamed and a single stream typically achieved only about 5MB/sec, so a 120GB catalog backup could take close to seven hours. The real challenge was that the catalog backup could only be done "cold", when no backups or restores were running, and a window that long was becoming an issue for many customers. Catalog Archiving was introduced to help, however it was difficult to use.
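The arithmetic in the last two paragraphs can be checked directly. The per-file sizes (150 and 120 bytes) and the roughly 5MB/sec single-stream tape rate come from the text; the rest is straightforward math.

```python
# Worked version of the catalog-size and backup-window arithmetic above.
FILES = 1_000_000_000             # one billion files under protection
ASCII_BYTES, BINARY_BYTES = 150, 120

ascii_gb = FILES * ASCII_BYTES / 1e9     # pre-5.0 ASCII catalog: 150 GB
binary_gb = FILES * BINARY_BYTES / 1e9   # 5.0 binary catalog: 120 GB
savings_pct = (ascii_gb - binary_gb) / ascii_gb * 100  # 20% disk savings

STREAM_MB_PER_SEC = 5                    # single-stream tape rate, circa 2004
backup_hours = binary_gb * 1000 / STREAM_MB_PER_SEC / 3600  # ~6.7 hours

print(f"savings: {savings_pct:.0f}%, cold backup window: {backup_hours:.1f} h")
```

Nearly seven hours with all backups and restores halted shows why the cold-only catalog backup became untenable as catalogs grew.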
In NetBackup 6.0 the EMM database was included as part of the catalog to track media location and allocate resources; this was mostly transparent to users. NetBackup 6.5 added hot catalog backups, allowing regular backups to proceed at the same time as a catalog backup. This has made the catalog easier to protect, but it can still be a challenge once the catalog grows past 500GB.
You can download the full Blueprint from the link below.