What is the best way to back up 100 servers? Do you use 100 backup jobs or one backup job?
Often there is a philosophical argument about which approach is better. With more jobs, the success statistics look better. A single job failure is only a 1% failure, versus a 100% failure when the one big job fails. Since the jobs are small, it is also easier to rerun any that fail. However, it is more difficult to schedule 100 jobs. There is no right or wrong answer; it boils down to installation standards and personal preference.
With the advent of BE 2012 and its server-centric approach, the pendulum seems to have swung to the side of having multiple jobs. This will come as a shock to people who are not used to handling many jobs. The aim of this article is to point out the considerations when handling many jobs. Other than a few references to BE 2012-specific features, the considerations in this article are equally valid for older versions of BE.
To arrive at a seriatim (an ordered list) of jobs to run, you need to gather some basic information first.
Your jobs are backing up your servers. Which servers are more critical than the others? You would want to back up those critical servers first. Get them out of the way fast. If there are any hardware failures, at least there is a good, current backup for these critical servers. Use this to rank your backup jobs. Jobs that back up critical servers should run before jobs that back up less critical servers.
Be precise about how long you have to back up your servers. The longer the period, the better. You need to know the backup window for each server. If a low-priority server is free from 7 p.m. and a high-priority server is only free from 10 p.m., you should back up the low-priority server at 7 p.m. rather than wait until after the high-priority server is backed up. This will relieve any time crunch that you may have later in the night.
Find out when housekeeping or other batch jobs run on each server and how long they take. It is not good to have other tasks running during a backup. This generally results in slower performance for the backup job due to resource contention. If there is a clash with your backup jobs, reschedule either the backup job or the housekeeping/batch jobs.
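The ranking rules above can be sketched in a few lines of Python. This is only an illustration, not something BE provides: the `Server` record, the 1-to-3 criticality scale and the server names are all made up for the example. It orders servers by when they become free, and uses criticality to break ties between servers that are free at the same time.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Server:
    name: str
    criticality: int        # hypothetical scale: 1 = most critical
    window_start: datetime  # earliest time the server is free for backup


def build_job_order(servers):
    """Sort by backup window start, then by criticality (most critical first)."""
    return sorted(servers, key=lambda s: (s.window_start, s.criticality))


servers = [
    Server("db01", 1, datetime(2024, 1, 1, 22, 0)),    # critical, free at 10 p.m.
    Server("file02", 3, datetime(2024, 1, 1, 19, 0)),  # low priority, free at 7 p.m.
    Server("mail01", 1, datetime(2024, 1, 1, 19, 0)),  # critical, free at 7 p.m.
]

order = [s.name for s in build_job_order(servers)]
print(order)
```

Sorting on the window first is a simplification. If a critical server and a long-running low-priority server share the same destination, you may still want to adjust the order by hand.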
Once you have your seriatim, there are a couple of ways to run the jobs.
Ignore the seriatim and run every job at once. This is possible if you are just using disk storage and/or disk cartridges. You can set the number of concurrent jobs parameter to a high number and just run your jobs. However, they may not necessarily run faster than if you ran them serially, because the jobs will be contending for resources like CPU and I/O.
If you are writing to tapes, then you need to run your jobs serially. Unlike NBU, BE does not support multi-streaming to tape.
BE has five levels of job priority. The job with the higher priority will get the resource first. For example, when two jobs need a tape drive, the job with the higher priority will get to use it first. You can set the job that needs to run first to the highest priority, give the next job the next lower priority, and so on. This will work if you have five or fewer jobs, or if there are five or fewer jobs writing to each resource, such as disk storage, disk cartridges or tape cartridges.
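As a rough sketch of this mapping, the Python below hands out the five priority levels to the jobs for one destination, in seriatim order. The priority names are written as I recall them from the job properties dialog, so treat them as an assumption and check them against your version. The function and job names are hypothetical; this does not call BE in any way.

```python
# Assumed names of the five BE priority levels, highest first.
PRIORITIES = ["Highest", "High", "Medium", "Low", "Lowest"]


def assign_priorities(jobs_in_order):
    """Map jobs (already in run order, for ONE destination) to priority levels."""
    if len(jobs_in_order) > len(PRIORITIES):
        raise ValueError(
            f"{len(jobs_in_order)} jobs share this destination, "
            f"but there are only {len(PRIORITIES)} priority levels"
        )
    return dict(zip(jobs_in_order, PRIORITIES))


tape_jobs = ["SQL backup", "Exchange backup", "File server backup"]
plan = assign_priorities(tape_jobs)
for job, priority in plan.items():
    print(f"{job}: {priority}")
```

The error for a sixth job is deliberate: once you run out of distinct levels, two jobs would share a priority and their order would no longer be guaranteed by this method.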
Note that if you choose this method, the duration of your jobs will be artificially inflated. For example, if you start all your jobs at 9 p.m. and the second-highest priority job only gets the resource it needs at 11 p.m., the two-hour waiting time will count towards the total duration of this job.
If you are not using BE 2012, your job can wait at most 24 hours after it is started. You may have to delay the start of the job if the preceding jobs take longer than 24 hours. If you are using BE 2012, make sure that your wait time is long enough for the preceding jobs to finish.
If you are using tapes, then the order of the jobs is important. Normally, you would want the first job to overwrite the tape and the subsequent jobs to append to it. You might also want the last job that writes to the tape to eject it.
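To make the arithmetic concrete, here is a small Python sketch of the example above. The 9 p.m. start and 11 p.m. resource time come from the example; the 12:30 a.m. finish time is invented for illustration, and the function name is hypothetical.

```python
from datetime import datetime


def job_durations(started_at, resource_acquired_at, finished_at):
    """Return (reported duration, actual duration without the waiting time)."""
    reported = finished_at - started_at
    waiting = resource_acquired_at - started_at
    return reported, reported - waiting


started = datetime(2024, 1, 1, 21, 0)    # job started at 9 p.m.
acquired = datetime(2024, 1, 1, 23, 0)   # got the resource at 11 p.m.
finished = datetime(2024, 1, 2, 0, 30)   # assumed finish time, 12:30 a.m.

reported, actual = job_durations(started, acquired, finished)
print("Reported duration:", reported)
print("Actual duration:  ", actual)
```

With these figures the job log would show three and a half hours, of which only one and a half hours were spent backing up.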
Simply take the job seriatim and start each job a minute apart. The subsequent jobs will wait if they are writing to the same destination and the concurrent jobs parameter is set to 1. The drawbacks of this method are that the duration of the jobs will be inflated (as discussed previously) and that there will be resource contention if the jobs run simultaneously. Also, the wait time of the subsequent jobs should be long enough that they do not auto-cancel.
If you are using tape, this method may not be suitable because you might find that your subsequent jobs do not append to the tape. See my article below for an explanation.
Once you have run each of your jobs and have worked out the actual job durations by subtracting the wait times, you might want to schedule the jobs such that they do not overlap. This is to overcome the drawbacks discussed earlier.
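Working out the start times for this method is simple enough to do by hand, but with many jobs a short script saves typing. The Python sketch below only calculates the times; it does not create or schedule anything in BE, and the job names are hypothetical.

```python
from datetime import datetime, timedelta


def staggered_starts(jobs_in_order, first_start, step=timedelta(minutes=1)):
    """Give each job in the seriatim a start time one step after the previous one."""
    return {job: first_start + i * step for i, job in enumerate(jobs_in_order)}


starts = staggered_starts(
    ["job-a", "job-b", "job-c"],
    first_start=datetime(2024, 1, 1, 21, 0),
)
for job, start in starts.items():
    print(job, start.strftime("%H:%M"))
```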
If your first job starts at 9 p.m. and finishes at 11 p.m., you can schedule your second job to start at 11:30 p.m. The half-hour gap between the two jobs caters for slight delays and for growth of data, which will lengthen the job over time. You must make sure that your backup window allows you to schedule jobs in this way.
You also have to monitor the start and stop times of each job and adjust the schedule periodically so that the jobs do not overlap. This is especially needed if the gaps between your jobs are narrow.
This method is the most difficult to manage, but most of the difficulty lies in the initial setup. Once that is done, maintaining the schedule is quite manageable.
BE exclusion dates are set at a global level; there are no job-level exclusion dates. This might present a problem when you are using this method. See my Idea below for an explanation.
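The same calculation can be sketched in Python for a longer list of jobs. The two-hour first job and the half-hour gap come from the example above; the second job's duration and the 6 a.m. end of the backup window are assumptions for illustration, as are the function names.

```python
from datetime import datetime, timedelta


def schedule_back_to_back(durations, first_start, gap=timedelta(minutes=30)):
    """Return (job, start, expected end) tuples with a safety gap between jobs."""
    schedule = []
    start = first_start
    for job, duration in durations:
        end = start + duration
        schedule.append((job, start, end))
        start = end + gap  # the next job starts after the gap
    return schedule


def fits_window(schedule, window_end):
    """True if the last job is expected to finish inside the backup window."""
    return schedule[-1][2] <= window_end


durations = [
    ("job-1", timedelta(hours=2)),               # 9 p.m. to 11 p.m.
    ("job-2", timedelta(hours=1, minutes=30)),   # assumed duration
]
plan = schedule_back_to_back(durations, datetime(2024, 1, 1, 21, 0))
for job, start, end in plan:
    print(job, start.strftime("%H:%M"), "to", end.strftime("%H:%M"))
print("Fits window:", fits_window(plan, datetime(2024, 1, 2, 6, 0)))
```

Feed it the actual durations, that is, with the wait times already subtracted, otherwise the schedule will be stretched for no reason.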
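If you keep a record of the observed start and stop times, a check like the Python sketch below can point out which neighbouring jobs have drifted too close together. The run times and the ten-minute minimum gap are invented for the example, and the script assumes you have copied the times out of the job history yourself. It only compares each job with the one that starts next.

```python
from datetime import datetime, timedelta


def find_tight_pairs(runs, min_gap=timedelta(minutes=10)):
    """Return (earlier job, later job) pairs whose gap is below min_gap.

    runs is a list of (job, start, end) tuples. A negative gap means the
    two jobs actually overlapped.
    """
    ordered = sorted(runs, key=lambda run: run[1])
    tight = []
    for (prev_job, _, prev_end), (next_job, next_start, _) in zip(ordered, ordered[1:]):
        if next_start - prev_end < min_gap:
            tight.append((prev_job, next_job))
    return tight


runs = [
    ("job-a", datetime(2024, 1, 1, 21, 0), datetime(2024, 1, 1, 23, 25)),
    ("job-b", datetime(2024, 1, 1, 23, 30), datetime(2024, 1, 2, 1, 0)),
    ("job-c", datetime(2024, 1, 2, 0, 50), datetime(2024, 1, 2, 2, 0)),
]
tight = find_tight_pairs(runs)
print(tight)
```

Here job-a and job-b are only five minutes apart, and job-c started before job-b finished, so both pairs are reported.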
Lastly, you can chain your jobs. See my blog and article below on how to chain your jobs.
BE 2010 and below - https://www-secure.symantec.com/connect/blogs/use-bemcmd-start-jobs