Seth_Bokelman
Our university runs a central backup service for our campus, though IT is largely decentralized and split among more than 20 IT groups. In the beginning, we used ARCserve to connect to the assorted NetWare and Windows servers across campus and back up their data to tape nightly. We also purchased and installed snapshot software from another vendor so that we could adequately back up open files, as the backup client didn't handle open files well at all.

Though it started off working well, the system eventually turned into a nightmare. SCSI bus resets would halt all backups and force the tape libraries to re-inventory themselves, causing long delays. The backup software didn't automatically restart the jobs after these errors, and it didn't work properly over Remote Desktop, so whoever was on monitoring duty that weekend had to drive in to work and restart the jobs by hand. We systematically replaced every piece of hardware, but the problems continued. The software vendor blamed our server vendor, who blamed our SCSI HBA vendor, who blamed our tape library vendor, who blamed our software vendor. The visible, public failures of our backup system (we notified the server administrators each time we failed to get a backup) were becoming a very large black eye for our department, and our system administrators were growing tired of spending their weekends in the data center watching over the backups. It became clear that if we couldn't rely on the system even to perform a backup, we could rely on it even less to get the data back in a restore.

Because we perform central backups for a distributed support environment, we also distribute the cost of purchasing and maintaining the backup system. Since we were charging our internal "customers" for this service, we needed to be able to justify its cost, and the old system was so unreliable that this was becoming increasingly difficult.

After evaluating several solutions, we settled on a new StorageTek/Sun backup system running NetBackup. It not only handled our Windows and NetWare servers, but also let us bring in the Solaris, Linux, and Mac OS X servers that had been using separate systems. We also liked that it even supported our OpenVMS system, which was hosting our e-mail at the time. The relationship between NetBackup and Sun made us much less worried about being left in a circle of finger-pointing if we ran into problems, and the system's ability to cope with the variety of operating systems in our environment was a huge advantage, since we could run one piece of software on both our Solaris and Windows media servers.

Putting the new NetBackup system into production was relatively easy, and the reliability of our backups improved immediately. We purchased the new Sun hardware and NetBackup software, and had our first backups running a couple of hours after uncrating the hardware. Simply mirroring the schedules and policies from the ARCserve system into NetBackup vastly improved the reliability of our backups and got the torch-and-pitchfork-wielding administrators away from our door. Learning any new system is not without its challenges, however, and while NetBackup will work in many configurations, finding the one that fits your environment best can take some time and learning.

As with any new system, it took us a while to figure out the best way to set up our scheduling. We started with frequency-based scheduling rather than calendar-based scheduling, because our old software used a frequency-based approach. We wanted to perform "weekly" full backups, with every fourth full backup designated as a "monthly" full backup and retained for a longer period. However, with the weekly schedule set to a frequency of one week and the monthly schedule set to a frequency of four weeks, both came due together every fourth weekend, and we found ourselves performing two full backups on the same weekend. To compensate, we started manually excluding the "weekly" backup on the weekends when the "monthly" backup would run. This led to a lot of time spent micro-managing the schedules until we found a better way.
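To make the double-booking concrete, here is a minimal Python sketch under a deliberately simplified, hypothetical model: both schedules first run on the same Saturday, and each fires whenever its interval (one week or four weeks) has elapsed. The function name `frequency_schedule` and the start date are illustrative assumptions, and this is not NetBackup's actual scheduler logic.

```python
from datetime import date, timedelta


def frequency_schedule(start, weeks, weekly_every=1, monthly_every=4):
    """Return a list of (saturday, [jobs due]) pairs.

    Hypothetical, simplified model: both schedules first run on `start`,
    and a schedule is due on week i whenever i is a multiple of its
    interval in weeks.
    """
    plan = []
    for i in range(weeks):
        day = start + timedelta(weeks=i)
        due = []
        if i % weekly_every == 0:
            due.append("weekly full")
        if i % monthly_every == 0:
            due.append("monthly full")
        plan.append((day, due))
    return plan


if __name__ == "__main__":
    # 3 January 2009 was a Saturday; simulate nine consecutive weekends.
    for day, due in frequency_schedule(date(2009, 1, 3), 9):
        flag = "  <-- two full backups" if len(due) > 1 else ""
        print(day.isoformat(), ", ".join(due), flag)
```

In this model, weeks 0, 4, and 8 each get both a "weekly" and a "monthly" full backup, which is the collision that the manual exclude dates were working around.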

The solution was to switch to calendar-based scheduling: we moved the "monthly" jobs to the first Saturday of each month and the "weekly" jobs to the remaining three (or four) Saturdays. This meant there would sometimes be an extra week between "monthly" backup sets, but in return we never had to manage exclude dates again. Backup scheduling largely became a matter of "set it and forget it": we had defined a schedule that really worked for us, and NetBackup took care of the rest.
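The calendar rule can be sketched in a few lines of Python. The helper names `saturdays` and `calendar_schedule` are hypothetical and only model the rule "first Saturday of the month is monthly, every other Saturday is weekly"; they are not NetBackup configuration.

```python
import calendar
from datetime import date


def saturdays(year, month):
    """Return every Saturday that falls in the given month."""
    cal = calendar.Calendar()
    return [d for d in cal.itermonthdates(year, month)
            if d.month == month and d.weekday() == calendar.SATURDAY]


def calendar_schedule(year, month):
    """Assign exactly one job to each Saturday: the first gets the monthly full."""
    return {d: ("monthly full" if i == 0 else "weekly full")
            for i, d in enumerate(saturdays(year, month))}


if __name__ == "__main__":
    for month in (1, 2):
        for day, job in sorted(calendar_schedule(2009, month).items()):
            print(day.isoformat(), job)
```

Because each Saturday maps to exactly one job, no weekend can get two full backups. The trade-off mentioned above also shows up: in 2009 the first Saturdays were 3 January and 7 February, five weeks apart, because January had five Saturdays.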

We also had a large number of policies to sort through, as we'd created one for each of our NetWare servers. By switching to NetBackup's "target" method for our legacy NetWare servers, we were able to consolidate them into a single policy instead of the dozens we used to manage. This greatly simplified operations by leaving us far fewer policies and schedules to manage and modify.

Those two optimizations alone freed up hours of administrator time each week, allowing us to take on more projects and fine-tune our backup environment even further. We also found that NetBackup's snapshot technology was so much better than our previous setup that we dropped the licenses for the separate snapshot software we'd bought to compensate and removed it from all our servers.

Once we were happy with how things were working, we enabled a feature that our "customers" (the various sysadmins spread across the university) really loved. In the past, whenever one of them needed a file restored, they had to track down one of only two of us who could perform the restore. They were usually relaying a request from an end user, and the user often didn't give them the correct path to the file. As a result, we would all waste a lot of time finding the correct path and file before we could start the restore.

Now, with NetBackup, the system administrators can just log in to their servers, open the Backup, Archive, & Restore application, browse the backups to find the correct file, select it, and start the restore themselves! They can do it in the middle of the night or on a weekend without paging us, as long as the tapes are in the library. This feature has been so popular with our administrators that when we considered changing backup applications, the alternatives' lack of it stopped us from looking any further, even though switching might have saved us money on licensing. These days, the server administrators perform over 90% of their own restores with no intervention from us.

Before NetBackup, roughly one full-time employee was dedicated simply to managing our problematic, unreliable backup system. Now that we have NetBackup and have learned how to make it work efficiently for us, managing the system takes only about a quarter of one employee's time, even though we have four times as many servers and ten times as much data as when we started!
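Taking those figures at face value, and writing N and D for the original number of servers and amount of data (symbols introduced here only for illustration), the administrative effort per server and per unit of data shrank roughly as follows:

```latex
\frac{0.25\,\text{FTE} / (4N)}{1\,\text{FTE} / N} = \frac{0.25}{4} = \frac{1}{16},
\qquad
\frac{0.25\,\text{FTE} / (10D)}{1\,\text{FTE} / D} = \frac{0.25}{10} = \frac{1}{40}
```

That is about a 16-fold reduction in effort per server and a 40-fold reduction per unit of data protected.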

Last update: 07-09-2009 09:46 AM