We initially tried a full backup of flat files from on-premises to the cloud; the SLP process is backup to MSDP, then duplication to AWS S3 via a CloudCatalyst appliance.
I'm confused because the raw size of the files is 9 GB, and after the backup to MSDP and duplication to AWS S3 via CloudCatalyst the size is the same. How do MSDP and CloudCatalyst work if we run a full backup of the same set of files again?
So I think your backup flow is approximately:
client -> MSDP -> CloudCatalyst (AWS).
Your full backup is reporting 9GB. This should be deduplicating (I hope - but it depends on the source data). You can see by how much by reviewing the job details towards the end - you should see a line something like this:
17/03/2021 11:21:24 PM - Info nbu2 (pid=1913118) StorageServer=PureDisk:nbu2; Report=PDDO Stats for (nbu2): scanned: 1454436 KB, CR sent: 303821 KB, CR sent over FC: 0 KB, dedup: 79.1%, cache disabled, where dedup space saving:47.5%, compression space saving:31.7%
In the above case the backup was about 1.45 GB, but due to deduplication only about 300 MB needed to be sent to the disk pool. So that's the first stage.
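If you want to pull those numbers out of the job detail automatically, a rough sketch like the following works (this is not a Veritas-supplied tool, just a regex over the PDDO Stats line shown above):

```python
import re

def parse_pddo_stats(line):
    """Extract scanned KB, CR-sent KB, and dedup % from a PDDO Stats line."""
    scanned = int(re.search(r"scanned:\s*(\d+)\s*KB", line).group(1))
    sent = int(re.search(r"CR sent:\s*(\d+)\s*KB", line).group(1))
    dedup = float(re.search(r"dedup:\s*([\d.]+)%", line).group(1))
    return {"scanned_kb": scanned, "sent_kb": sent, "dedup_pct": dedup}

log_line = ("17/03/2021 11:21:24 PM - Info nbu2 (pid=1913118) "
            "StorageServer=PureDisk:nbu2; Report=PDDO Stats for (nbu2): "
            "scanned: 1454436 KB, CR sent: 303821 KB, CR sent over FC: 0 KB, "
            "dedup: 79.1%, cache disabled, where dedup space saving:47.5%, "
            "compression space saving:31.7%")

stats = parse_pddo_stats(log_line)
print(stats)  # scanned ~1.45 GB; only ~300 MB actually sent to the pool
```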
The duplication (by SLP) would then send the backup image to AWS, and you will see similar stats to the above for that particular image (although the actual deduplication rate may vary depending on what data already exists in the cloud - at worst you would expect the amount of data sent to the cloud to be the same as from the backup).
It is difficult (if not impossible), once the backup has run, to determine (other than from the above job information) how much the backup data was deduplicated. NetBackup as a rule reports the original size of the backup.
One way to see how much storage has been consumed by the backup in the cloud is to simply calculate the size of the bucket (hopefully it will be less than the 9GB original size).
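A minimal sketch of that bucket-size calculation, assuming you have boto3 and AWS credentials configured (the bucket name below is a placeholder - substitute the bucket your CloudCatalyst writes to). The summing helper is separated out so it is easy to reason about:

```python
def total_bucket_bytes(pages):
    """Sum object sizes across S3 list_objects_v2 result pages."""
    return sum(obj["Size"] for page in pages for obj in page.get("Contents", []))

# In real use you would feed it the boto3 paginator output, e.g.:
#   import boto3
#   s3 = boto3.client("s3")
#   pages = s3.get_paginator("list_objects_v2").paginate(Bucket="my-catalyst-bucket")
#   print(total_bucket_bytes(pages) / 1024**3, "GB consumed")
```

The AWS CLI equivalent is `aws s3 ls s3://my-catalyst-bucket --recursive --summarize`, which prints a "Total Size" line at the end.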
Now if the source data doesn't deduplicate well, or at all, then the space consumed might be the same as the original source (the job detail information will tell you). So in this case the first backup may consume the full 9GB, but subsequent backups should (assuming relatively static data) deduplicate well against it (so the second full backup, although 9GB, will only increase your storage consumption by say 50MB).
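The arithmetic is worth spelling out (the 9 GB and 50 MB figures are just the illustrative numbers from above - your real change rate will vary):

```python
GB = 1024**3
MB = 1024**2

first_full = 9 * GB        # worst case: nothing in the pool to dedupe against
per_full_unique = 50 * MB  # new/changed segments each subsequent full adds

def pool_consumption(num_fulls):
    """Approximate MSDP space used after num_fulls full backups."""
    if num_fulls == 0:
        return 0
    return first_full + (num_fulls - 1) * per_full_unique

print(pool_consumption(1) / GB)  # 9.0 - the first full stores everything
print(pool_consumption(4) / GB)  # three more fulls add only ~150 MB total
```

So even though NetBackup reports each full as 9 GB, the pool (and the cloud bucket) only grows by the unique data.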
Hope this helps.
Thank you for this information. I just want to ask another question regarding the media server.
In the storage unit for the MSDP disk pool I defined the CloudCatalyst media server as the media server, not the MSDP media server, following the target-controlled configuration below.
I would like to know: if I use the CloudCatalyst media server as the media server of the MSDP storage unit for the backup operation, will it still send deduplicated data to the second operation, i.e. the duplication?
Whichever way around you configure the optimized duplication, it will be exactly that - optimized (only data segments that don't exist on the target will be sent to the cloud).
If you are asking whether you can use the CloudCatalyst as a load-balancing media server for your (on-premises) MSDP pool, sure. The data is still stored locally in the MSDP pool, deduplicated, and the further duplication to the cloud via the CloudCatalyst will be optimized (i.e. only data segments not already in the cloud are sent).
Also @StefanosM makes a good point that the CloudCatalyst has reached end of life. You can still upgrade to 8.3.x, but I don't think it is supported from NetBackup 9 and above. MSDP-Cloud is a better way to go, but will require you to upgrade to NetBackup 8.3 first. One of the big advantages of MSDP-Cloud is that it does not require a standalone media server to run on (it can run concurrently on a media server with a local MSDP - the cloud is simply a separate LSU on the storage server).
If I have 2 storage units for 1 MSDP disk pool, and the first time I run a full backup of the 9GB of data to stu_msdp1 (no SLP), then no deduplication occurs, right? Then the second time I run a full backup of the same 9GB of data to stu_msdp2 (the 2nd STU for the same MSDP disk pool, no SLP), still no deduplication occurs, right? Since they are different storage units.
Thank you for the answer
No. The deduplication occurs in the disk pool - the storage units are a logical construct used to direct the backup to a storage device, in your case MSDP.
Okay, the first backup may not dedupe very well depending on the data being protected and the data already in the disk pool. That's hard to qualify without knowing what you are backing up and what may already exist - though if we assume this is the first backup to the pool, then you would probably get some deduplication from your data set ("flat files") due to additional copies of some files and similar data.
The second backup, although to a different storage unit, will deduplicate very well, as the 9GB will match what is already stored in MSDP (it may not be 100% deduplication, but may be close to it depending on the change in the source).
Using different STUs against the same target device (such as MSDP) is a good way to balance or prioritise backups within an environment - typically using the maximum concurrent jobs setting. So you might have your MSDP limited to 20 streams total. You can then use one STU with its max jobs set to say 15 and share it among 30-40 unimportant backups that you don't mind queuing for a while. You still have 5 streams available for the other STU to use for your 1 or 2 important backups that need to run straight away. Using multiple STUs is also useful for backup and duplication operations that occur on the one pool - so one operation doesn't block the other.
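As a toy sketch of that stream-partitioning idea (this is not NetBackup's actual scheduler, and the STU names are hypothetical - it just shows why the split keeps the priority backups from queuing):

```python
def admit(queued, limits):
    """Greedy admission: start each queued (job, stu) pair while that STU
    still has a free stream; everything else waits in the queue."""
    running = {stu: 0 for stu in limits}
    started = []
    for job, stu in queued:
        if running[stu] < limits[stu]:
            running[stu] += 1
            started.append(job)
    return started

# 20 streams on the pool, split 15/5 between a bulk STU and a priority STU
limits = {"stu_msdp_bulk": 15, "stu_msdp_priority": 5}

queue = [("unimportant-%02d" % i, "stu_msdp_bulk") for i in range(30)]
queue.append(("important-db", "stu_msdp_priority"))

started = admit(queue, limits)
# 15 bulk jobs start and 15 queue, but the important backup still gets
# a stream immediately because its STU's 5 streams were never contended
```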
Hope that clears this up.