Backup Set Retention and Expiry Handling on Removable (RDX etc.) Media

Backup Exec 2014, 15 and 16 handle removable media in a completely different way from the initial release of Backup Exec 2012 and from BE 2010 R3 and earlier versions. This blog is an attempt to explain how this works and to describe scenarios that Support has seen with removable media. The information is based on a combination of training and developer-supplied details, plus testing I have performed myself to validate it. Some of the behaviour documented is under review and may change in the future.

NOTE: A colleague has written a parallel blog covering DLM in general, which should also be reviewed to gain a full understanding. That blog is available here:

Automated-Disk-management-and-Data-retention-in-Backup-Exec-DLM

 


A few definitions to start things off:
- RDX: a specific type of cartridge-based removable media technology. For simplicity I will use RDX to represent all removable media types in the following notes.
- Backup set: in the context of this document, a backup set represents the backup of one resource on a server from one run of a backup job. As C:, D:, System State, Mailbox Database etc. are all different resources, one backup job can create multiple backup sets, and each time the job runs further sets are created.
- DLM (Data Lifecycle Management): the processing and handling of disk-based media introduced in Backup Exec 2012. It is separate from the tape (or legacy disk handling) mechanisms, which depended on media set control.
- BKF: a media file containing all or part of a non-GRT backup set
- IMG: a media folder containing a complete GRT-enabled backup set
- SDR: Simplified Disaster Recovery


The following points outline key aspects of DLM handling against RDX (some of them also apply to standard disk storage backups):

WARNING: the points in this blog are only true for RDX drives directly attached to the BE server. If the RDX device is attached to a remote server and accessed over a share, this blog does not apply, and in fact that configuration is NOT SUPPORTED. Symptoms seen with it include unexpected offline status, the need for service restarts, having to run inventory and/or catalog jobs regularly after cartridges are inserted, and cartridges not being individually identified (because the share is the individual object). These all contribute to DLM not reclaiming space when expected.


1) RDX media actually means BKF files or IMG folders. These relate directly to backup sets and can be seen in backup job logs.
2) The FLDRnnnnnn media object seen in the BE console represents an RDX cartridge, but it should really be considered a media container, as it does not relate directly to one backup set.
3) There is no concept of APPEND to a BKF or IMG. Using existing free space on an RDX cartridge for a new backup set creates a new BKF or IMG; it does not add to an existing one.
4) Overwrite is also misleading: you do not overwrite a whole cartridge, and the DLM process does not re-use existing IMG or BKF files.
5) The process that removes expired backup sets and the associated on-disk media is known as a RECLAIM.
6) For RDX, a reclaim operation starts at the beginning of the creation of each new backup set. If a given job creates multiple backup sets, then multiple reclaim operations are triggered during the timespan of the job.
7) At the end of each backup set, database indexing takes place in the background. Until this indexing has completed, the older version of the same resource is still classified as Last Recovery Chain. If the backup sets that follow, up to the beginning of the last backup set in the job, are all small (so backup times are very short and indexing is still running), then it is likely that one or more sets will not be reclaimed during the job.
8) The default 1 hour (for current version) reclaim trigger that runs against standard disk storage and deduplication devices does not apply to RDX.
9) No reclaim is attempted when an RDX cartridge fills up; the cartridge is ejected instead.
10) Once a reclaim operation starts, the process looks for ALL expired media on the RDX cartridge to reclaim. It does not look for just a single backup set, nor only for sets belonging to the previous history of the same job.
11) Expired backup sets will not be reclaimed if they form part of the Last Recovery Chain for a given resource, unless the setting “Allow Backup Exec to delete all expired backup sets” is enabled.
12) If multiple, separate job definitions are created to back up the same resource, this creates multiple Last Recovery Chains: one per resource for each unique job definition, not just one Last Recovery Chain for the last job that ran against the resource.
13) If the backup job is SDR enabled, then all of the critical SDR backup sets must be expired and must also not be part of the Last Recovery Chain before they can be reclaimed. As with the GRT/IMG sets, this means a further job needs to run to reclaim these sets.
14) The System State Backup (which is also needed for SDR) is always the last backup set against a specific server in terms of ordering within the job.
15) If sets are expired but cannot be reclaimed they remain visible in the Storage section of the console with an Expired status.
16) No reclaim can be performed against RDX cartridges that are not inserted/mounted in a drive
17) An individual RDX cartridge (FLDRnnnnnn media container) that has been disconnected (ejected) from the media server for an extended time will be set to “Limited to Read Only” operations after a default of 30 days. This timing can be adjusted or disabled, and the setting can also be enabled manually. Once it is enabled, no DLM activity can take place on the cartridge.
18) Incremental or differential sets can also depend on existing backup sets and block the reclaim of expired media. However, as GRT-enabled incremental or differential sets need to be online at the same time as the previous sets in the chain, this scenario is less likely to be seen when using RDX and swapping cartridges from day to day. If the full and the incremental or differential sets are all on one cartridge, it might be seen, as this does extend the Last Recovery Chain.
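
To make the interaction of these rules easier to follow, here is a small Python sketch of the reclaim decision for one cartridge. It is only a toy model of points 10, 11, 13, 16 and 17 above, not Backup Exec's actual implementation. All class, field and function names are hypothetical, as are the example label and dates. Two simplifications are my own assumptions: SDR-critical sets are grouped by job definition and job run, and the “delete all expired backup sets” option is modelled as ignoring only the Last Recovery Chain check.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class BackupSet:
    resource: str                  # e.g. "C:", "System State"
    job_definition: str            # job definition that created the set
    job_run: int                   # hypothetical counter identifying one run of the job
    expires_at: datetime           # end of the retention period
    in_last_recovery_chain: bool
    sdr_critical: bool = False


@dataclass
class Cartridge:
    label: str                     # the FLDRnnnnnn media container
    inserted: bool = True
    read_only: bool = False        # "Limited to Read Only" is set
    sets: list = field(default_factory=list)


def _individually_free(s, now, delete_all_expired):
    """Expired, and not held by the Last Recovery Chain (points 10 and 11)."""
    expired = s.expires_at <= now
    held_by_chain = s.in_last_recovery_chain and not delete_all_expired
    return expired and not held_by_chain


def reclaimable(cart, now, delete_all_expired=False):
    """Return the sets a reclaim could remove from this cartridge right now."""
    # Points 16 and 17: nothing happens on ejected or read-only cartridges.
    if not cart.inserted or cart.read_only:
        return []

    # Point 13 (modelled assumption): SDR-critical sets from one job run
    # are only released together, once every one of them is free.
    sdr_group_free = {}
    for s in cart.sets:
        if s.sdr_critical:
            key = (s.job_definition, s.job_run)
            free = _individually_free(s, now, delete_all_expired)
            sdr_group_free[key] = sdr_group_free.get(key, True) and free

    result = []
    for s in cart.sets:
        if not _individually_free(s, now, delete_all_expired):
            continue
        if s.sdr_critical and not sdr_group_free[(s.job_definition, s.job_run)]:
            continue
        result.append(s)
    return result


# Example: every set is expired, but System State is still in the Last Recovery Chain.
now = datetime(2024, 1, 10)
old = datetime(2024, 1, 8)
cart = Cartridge("FLDR000001", sets=[
    BackupSet("C:", "Monday", 1, old, in_last_recovery_chain=False, sdr_critical=True),
    BackupSet("D:", "Monday", 1, old, in_last_recovery_chain=False),
    BackupSet("System State", "Monday", 1, old, in_last_recovery_chain=True, sdr_critical=True),
])
print([s.resource for s in reclaimable(cart, now)])
```

In this example only D: is returned. C: is expired and outside the chain, but in the model it is held back because System State from the same run is still part of the Last Recovery Chain.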

 


Based on some of the above we recently experienced the following scenarios within Tech Support:

Scenario 1:

Requirement was to back up one server using a different RDX cartridge for each day of the week, re-using the cartridge on the same day the following week.
The amount of data being backed up per day could potentially amount to more than 50% of cartridge size.
SDR backups were required.
There was at least one GRT-enabled set involved on the server concerned.
A separate job definition had been configured for each day of the week.
The retention was set to less than 7 days.

The first week’s jobs all ran correctly, but from the second week onwards disk space issues started to occur. Investigation showed that in this scenario the RDX cartridge would need enough space for the following:
- One copy of every backup set in the job
- A second copy of every backup set in the job that was marked as SDR critical (at least C: and System State, but potentially also EFI partitions and further volumes). This is needed because the System State is always last, which means the earlier SDR sets will not be reclaimed during the job (in fact the SDR sets will be kept until a 3rd backup job to the same cartridge is run).
- A second copy of any large backup set that is followed by very small sets (for instance an Exchange information store that is followed by unused but enabled public folder or SharePoint databases)
- A second copy of the backup set currently subject to backup activity within the job. This is needed because, until the current backup of the same set finishes, the Last Recovery Chain locks the previous one. If calculating before the backup job starts, then base the disk space requirement on the biggest non-SDR set in the job.

Depending on the quantity of backup data from day to day, one cartridge did not always have enough space, even though it could easily hold one complete backup of the server concerned.
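
As a rough illustration, the four space requirements listed for Scenario 1 can be added up as in the Python sketch below. The function and field names are hypothetical and the sizes are invented for the example; they are not taken from the support case. The sum follows the four bullets literally, so a large set that is followed by small sets may be counted in both of the last two terms, which makes this a deliberately conservative upper bound rather than an exact figure.

```python
from dataclasses import dataclass


@dataclass
class PlannedSet:
    name: str
    size_gb: float
    sdr_critical: bool = False
    followed_by_small_sets: bool = False  # large set followed by very small sets


def worst_case_space_gb(sets):
    """Conservative estimate of cartridge space needed from the second week on."""
    # One copy of every backup set in the job.
    one_copy = sum(s.size_gb for s in sets)
    # A second copy of every SDR-critical set (not reclaimed during the job).
    sdr_second = sum(s.size_gb for s in sets if s.sdr_critical)
    # A second copy of large non-SDR sets followed by very small sets.
    lingering = sum(
        s.size_gb for s in sets
        if s.followed_by_small_sets and not s.sdr_critical
    )
    # A second copy of the set currently being backed up:
    # sized on the biggest non-SDR set in the job.
    in_flight = max((s.size_gb for s in sets if not s.sdr_critical), default=0.0)
    return one_copy + sdr_second + lingering + in_flight


# Invented example sizes, in GB.
job = [
    PlannedSet("C:", 60, sdr_critical=True),
    PlannedSet("D:", 200),
    PlannedSet("Exchange information store", 300, followed_by_small_sets=True),
    PlannedSet("Public folder database", 1),
    PlannedSet("System State", 20, sdr_critical=True),
]
print(sum(s.size_gb for s in job), worst_case_space_gb(job))
```

With these invented sizes, one complete backup is 581 GB but the conservative estimate is 1261 GB, which shows how a cartridge that easily holds one backup can still run out of space.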

 

Scenario 2:

The backup administrator has two 1TB cartridges. The intention is to run a full backup once a week to Cartridge 1 and daily incremental backups to Cartridge 2 (with this second cartridge being swapped out on the day the full needs to run and replaced the day after the full). The amount of data making up the full is 700GB. The incremental sets for the whole week will fit on one cartridge. Retention of the full is set to less than 7 days, and retention of the incrementals is set to less than 1 day. Again this strategy fails, with not enough space for the second full, because the Last Recovery Chain ends up on the same cartridge and is also an SDR-enabled set, so it can only be completely reclaimed at the end of the next full backup.
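
The arithmetic behind Scenario 2 is simple enough to write out. The snippet below is just a worked example in Python. It assumes that 1TB is treated as 1000GB, that filesystem overhead is ignored, and that the second full is the same size as the first.

```python
CARTRIDGE_GB = 1000   # assumption: 1TB treated as 1000GB, no filesystem overhead
FULL_GB = 700         # size of one full backup, from the scenario

# While the second full is being written, the first full is still held on
# Cartridge 1 (Last Recovery Chain plus SDR), so both must fit at once.
needed_gb = FULL_GB + FULL_GB
shortfall_gb = needed_gb - CARTRIDGE_GB

print(f"needed: {needed_gb}GB, cartridge: {CARTRIDGE_GB}GB, shortfall: {shortfall_gb}GB")
```

Under these assumptions the second full needs 1400GB of space on a 1000GB cartridge, so it runs out of space roughly 400GB short.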

 

These problems could be avoided in multiple ways (there are pros and cons to them all):
- Use larger RDX cartridges. For simplicity I would recommend more than twice the biggest expected data size for one night’s backup cycle. As per the above, if the backup contains non-SDR data as well as SDR data, then it probably won’t always need twice the size, but it is a simple rule of thumb.
- Change the cartridges every day AND use one job definition that repeats each day and contains weekly or monthly stages as needed, instead of one job definition per day. This would have the effect of only maintaining one Last Recovery Chain across all the cartridges used. At the very start of the next job, all the expired backup sets on the available cartridge would then be reclaimed (assuming the cartridge had been changed and so did not now own the Last Recovery Chain).
- Enable the “Allow Backup Exec to delete all expired backup sets” option (not recommended, as it introduces the potential for data loss). This would mean that at the start of each night’s job all the expired sets should be removed from the cartridge. Care would need to be taken to make sure cartridges are swapped out properly, and an understanding of when you might want to set Limit to Read Only manually might be needed. You should also take care with this setting if you have standard disk storage or backup to disk (B2D) on the same media server, as the setting will affect all devices managed by DLM.
- Disable SDR in the jobs (not recommended, as it extends and complicates the recovery process). This would allow sets such as the C: drive to be reclaimed as the second backup set in the job starts, thus performing a rolling clean-up of disk space against all backup sets during the job.


Notes:
Duplicating backup sets from RDX to something else may change the expiration dependencies (this article does not currently discuss the possible effects of duplicate jobs).



 

 

1 Comment

Thank you for this information Colin!