2 x Backup Exec 2014 servers storing backup data locally to disk.
10TB of data backed up.
To duplicate backup jobs to Amazon S3 for offsite storage. This would only be accessed in a disaster scenario.
I've been playing with Amazon's Gateway-VTL device and their Gateway-Cached Volume. (I believe the Gateway-Stored option is not available for BE.)
As such I have implemented both a VTL device and a cached volume device, and made the storage available to Backup Exec via the iSCSI initiator.
I have performed test backup jobs of both solutions and both appear to work fine.
The cost of running an Amazon Storage Gateway (either VTL or cached) is roughly £80 per month, with the actual storage, read requests, etc. on top of this.
If I go down the VTL route I need to provision 2 gateways, as the iSCSI initiator and the virtual tape library can only be connected to one physical BE server.
My preferred method, to avoid having to use 2 gateways, is 1 Gateway-Cached storage device. Within AWS I can then create 2 volumes, each with a unique ID, so each BE server has its own iSCSI volume that it can use as disk-based storage. I have tested this scenario with success.
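For anyone wanting to replicate this, connecting each BE server to its own volume is a few PowerShell commands on each server. This is just a sketch — the portal IP and target IQN below are placeholders; substitute the values your gateway shows in the AWS console:

```shell
# On each BE server: register the gateway as an iSCSI portal, then log in
# to that server's own volume target only. IP and IQN are placeholders.
New-IscsiTargetPortal -TargetPortalAddress 192.168.1.50
Get-IscsiTarget                      # lists the IQNs the gateway exposes
Connect-IscsiTarget -NodeAddress "iqn.1997-05.com.amazon:be-server1-vol" `
    -IsPersistent $true              # reconnect automatically after reboot
```

The key point is that each server logs in to a different target, so neither ever sees the other's volume.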
How does Backup Exec manage the expiry of data sets that are stored on an Amazon S3 device?
What I need to avoid is costly reads of the whole Amazon volume, as this is where additional costs would be incurred.
Does anyone have any experience of this or have any tips to share?
Are the 2 BE media servers independent, or is it a CAS/MBES setup (ESO license)? If you have an ESO license and set up the CAS/MBES scenario, a single VTL can be shared.
If the media servers are independent then you've already found your solution for paying for 1 gateway by creating multiple volumes.
I'm curious about your media retention plans: are the data sets you plan to write going to be recycled, or do you plan to use this as more of an archiving solution ("Do not overwrite")?
A large incentive for the VTL is the VTS (virtual tape shelf, with Glacier storage/pricing); however, if you're going to need immediate access to the data you plan to store with AWS, then you probably want to pass on Glacier, as retrieval times can take up to 24 hours.
BE is blind to what's behind the disk or VTL, so there is no special media management for these types of devices; we just treat them like any other VTL or B2D folder. The BE database keeps all the information for the retention of media.
The big difference between the media management of the VTL vs. the Volumes is that the VTL will go on to reuse tapes that have expired, whereas the Volumes will proactively delete the expired media from the Volume (via DLM).
*I just want to note that the VTL is limited to the use of 1 drive without the 'VTL Unlimited Drive Option'.
You'd want to check with AWS on pricing, but from what I've read, delete requests are free for "Standard" S3 storage, so DLM would not impact your costs. An inventory/catalog/restore would obviously cost you (if the data wasn't available locally in the cache).
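To get a feel for the restore-cost exposure, a rough back-of-the-envelope calculation helps. The per-GB rate below is a made-up placeholder, not current AWS pricing — plug in the rates from the AWS pricing page for your region:

```python
# Rough cost sketch for a full restore from S3. The per-GB rate is an
# illustrative placeholder, NOT actual AWS pricing.
DATA_GB = 10 * 1024            # ~10 TB backed up
S3_RETRIEVAL_PER_GB = 0.09     # hypothetical transfer-out rate, USD/GB

def full_restore_cost(data_gb, per_gb_rate, cache_hit_ratio=0.0):
    """Cost of restoring data_gb; bytes served from the local cache are free."""
    gb_from_s3 = data_gb * (1.0 - cache_hit_ratio)
    return gb_from_s3 * per_gb_rate

# Worst case: nothing in the local cache, everything pulled from S3.
print(f"Cold restore: ${full_restore_cost(DATA_GB, S3_RETRIEVAL_PER_GB):.2f}")
# Cache sized to hold the full data set: restore reads cost nothing.
print(f"Warm restore: ${full_restore_cost(DATA_GB, S3_RETRIEVAL_PER_GB, 1.0):.2f}")
```

This also shows why a generously sized local cache matters: the cache-hit portion of any restore never touches S3, so it incurs no retrieval charge.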
How large would you plan to make the 'local cache' of the Gateway? The cache stores all of the most recently accessed data and when doing a read/restore, it checks the cache before heading to S3 - saving you a ton of time and the retrieval costs.
We haven't tested Gateway-Stored Volumes yet, but that model relies on your entire backup set being local, with point-in-time snapshots of the volume in S3. It is yet to be explored, which is why it is not listed on the HCL.
I hope I cleared some things up but please reply, we're very interested in helping you find the best solution here.
Thanks for the response Justin. A great post!
My setup is 2 separate media servers backing up data to local disk storage.
At the moment I also have a NAS onsite that the data is replicated to, and a matching offsite NAS that it rsyncs to, but the reliability isn't great, hence my AWS investigation.
Media retention wise I would just be recycling each set after a couple of weeks. My plan is to have 2 complete sets on AWS at any one time.
As this is disaster recovery, I hope (touch wood) that I never have any need to restore anything from AWS, bar test restores to ensure data integrity.
As I have local storage for my primary backup any restores will be done from there in the 1st instance.
Initially I did set up the gateway as a VTL solution (which worked), but I don't really need the ability to offload the tape to Glacier as I don't need to archive data for that long. Presenting BE with another native disk from the OS feels like a cleaner solution for me.
In regard to the local cache, I do have enough available disk space to cover a full copy of the data I intend to put on AWS. My calculations indicate that my maximum data stored on AWS will be 10TB (this is the amount I costed), and I could make my local cache that size if required.
As I'll be channelling 2 physical BE servers through 1 gateway, will my upload buffer be crucial?
At the moment it's set to 250GB, but I'm considering upping this to the maximum allowed (2TB?).
Sounds like you're on the right path… You'll definitely want to do some estimating for your upload buffer, since you'll have 2 targets on the gateway.
Check out the ‘sizing the upload buffer’ section in their docs: http://docs.aws.amazon.com/storagegateway/latest/userguide/GatewayCachedLocalStorage.html
- There is a formula they provide, with examples, that can help in sizing the upload buffer.
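The formula in that section boils down to simple arithmetic, sketched below. The throughput figures are made-up examples, not measurements from your environment — plug in your own numbers (and sum the application throughput across both BE servers):

```python
# Upload-buffer sizing per the AWS Storage Gateway user guide:
#   buffer = (application throughput - network throughput to AWS)
#            * compression factor * duration of writes
# Example numbers below are illustrative only.

def upload_buffer_gb(app_mb_s, net_mb_s, compression, write_seconds):
    """Required upload buffer in GB; 2:1 compression means factor 0.5."""
    needed_mb = (app_mb_s - net_mb_s) * compression * write_seconds
    return max(needed_mb, 0) / 1024.0   # never negative; uplink may keep up

# Two BE servers writing a combined 40 MB/s over an 8-hour backup window,
# with a 10 MB/s uplink to AWS and 2:1 compression on the gateway.
print(f"{upload_buffer_gb(40, 10, 0.5, 8 * 3600):.2f} GB")
```

If the result comes out above the 2TB maximum, the options are a faster uplink, a shorter write window, or staggering the two servers' jobs so their combined throughput drops.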
You can always add more upload buffer capacity as you need (up to the limit of course), so feel it out. Same goes with the local cache storage - although your intentions for the cache are pretty clear.
If there is anything else, give us a shout!