Backup Anomaly Detection in NetBackup
Introduction

NetBackup has a built-in capability to identify unusual patterns in backup metadata. Such unusual behavior can be the result of a ransomware attack on the source client system. Every day we read about companies that have been hacked and had their data held for ransom, or been threatened with exposure of sensitive data. This capability in NetBackup helps identify such events early so that action can be taken.

What is Anomaly Detection?

Anomaly detection is the ability to detect an unusual pattern, unusual data, or an unusual event: behavior that deviates from the normal trend and indicates an underlying problem such as a ransomware attack. Identifying such unusual changes early helps take prompt action against the attack.

[Figures: "Normal Data" vs. "Anomalous Data", plotting Number of files against Image size]

In the first figure above, the data follows a normal trend. In the second figure, the point highlighted in red deviates far from the rest of the normal data points. Just by looking at the plot we can see unusual behavior in that backup, with a huge number of files being added. This is what is detected as an anomaly.

Impact on backup metadata in a ransomware attack

When a system is attacked, the backup taken from that system afterwards lets us correlate the attack with the backup metadata:

- File content encryption: the deduplication rate goes down.
- File renames: the number of files in the backup increases.
- File deletions: the number of files decreases and the backup size decreases.
- Files with new extensions added: the number of files increases, and the backup size (and total data transferred) increases.

From these scenarios we arrive at the list of backup metadata attributes that can be used to identify deviations from the normal trend:

- Backup image size
- Number of files
- Data transferred
- Deduplication rate
- Total time for job completion

How are anomalies detected in NetBackup?

Let's take the example below to explain the detection logic. Data for three variables, image size, number of files, and deduplication rate, has been plotted, and the data has been standardized so that all variables are on the same scale.

[Fig 1 / Fig 2: 100 standardized observations, without and with an outlier]

Figure 1: 100 observations are plotted on a standard-deviation scale. Each observation is reachable from another observation within a distance 'd', so all of them are grouped into one cluster of similar observations. Since every observation falls inside the cluster, there is no outlier or anomaly in this data set.

Figure 2: The marked cluster again contains observations that are within distance 'd' of each other, but one observation lies far away from the rest. That observation is not part of the cluster, and such observations are considered anomalous.
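To make the idea in Figures 1 and 2 concrete, here is a minimal Python sketch of distance-based detection on standardized backup metadata. It is only a conceptual illustration, not NetBackup's code: the attribute values, the distance 'd', and the nearest-neighbour check are assumptions.

```python
# A minimal sketch of the clustering idea shown in Figures 1 and 2.
# NOT NetBackup's implementation: the attribute values, the reachability
# distance d, and the nearest-neighbour check are illustrative assumptions.
import numpy as np

def standardize(history):
    """Scale each column (attribute) to zero mean and unit standard deviation."""
    mean = history.mean(axis=0)
    std = history.std(axis=0)
    std[std == 0] = 1.0                      # avoid division by zero for constant columns
    return (history - mean) / std, mean, std

def is_anomalous(history, new_obs, d=1.5):
    """Flag new_obs if it is farther than 'd' from every past observation."""
    scaled_hist, mean, std = standardize(history)
    scaled_new = (new_obs - mean) / std
    distances = np.linalg.norm(scaled_hist - scaled_new, axis=1)
    return distances.min() > d               # not reachable from any cluster member

# 100 "normal" backups: image size (GB), number of files, deduplication rate (%)
rng = np.random.default_rng(0)
history = np.column_stack([
    rng.uniform(10, 12, 100),
    rng.uniform(48_000, 52_000, 100),
    rng.uniform(93, 97, 100),
])

normal_backup = np.array([11.0, 50_000, 95.0])
suspicious_backup = np.array([28.0, 150_000, 40.0])   # huge image, many new files, poor dedup

print(is_anomalous(history, normal_backup))       # expected: False
print(is_anomalous(history, suspicious_backup))   # expected: True
```

The key point is the same as in the figures: an observation is treated as normal if it is reachable within distance 'd' of the existing observations, and anomalous otherwise.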
Based on this explanation, let's see how NetBackup detects an anomaly for a backup job.

Baseline

NetBackup anomaly detection first captures the last 3 months of backup image data as a baseline. This ensures enough observations are available to start detection. Detection works separately for each combination of:

- Policy
- Client
- Policy type
- Schedule
- Destination storage
- Asset ID

Once the historical data has been captured, for each subsequently completed backup NetBackup tries to fit the new backup metadata into the clusters built from the training data for the same client-policy combination mentioned above. Detection requires a minimum of 30 observations, i.e., at least 30 backups must have completed before detection is performed. If a given client-policy combination does not yet have enough observations, the detection logic waits until there are at least 30; until then, all completed backups are treated as training data only.

Detection

NetBackup has three different services:

- Training data gathering
- Detection
- Alerting

As mentioned in the Baseline section, NetBackup first captures the historical data and then switches to detection mode. The overall flow of the anomaly detection solution is as follows. For a given client-policy combination (as defined in the Baseline section), once there are at least 30 observations, detection runs for each new backup job. As explained earlier, the detection logic forms clusters of observations that are near each other; if a new observation does not fit into any cluster, it is considered an anomaly (refer to Fig 2 above).

On the very first detection run, the mechanism finds a suitable cluster diameter, the distance 'd' mentioned for Fig 1 above, which is then used to form the clusters. It uses all the training data and processes it iteratively to find a distance 'd' such that all training observations fall within clusters. The resulting value of 'd' depends heavily on the training data for the given client-policy combination: the more variation in the training data, the larger 'd' will be. For example, if the backup size for a given combination varies only between 10 GB and 12 GB, the variation is small, whereas a backup size varying between 10 GB and 30 GB has high variation. Depending on the data, multiple clusters may be formed from the training data (see the figure below).

[Figure: multiple clusters formed from the training data]

This cluster information is stored and recalibrated every 14 days to accommodate variation in future data. Overall, the high-level flow to detect an anomaly in NetBackup is summarized below.

[Figure: high-level flow of anomaly detection in NetBackup]

Anomaly score calculation for a detected anomaly

Once detection has run for a given job, the detection logic determines the severity of the anomaly. Severity is simply a measure of how far the anomalous observation deviates from the normal trend: the higher the deviation, the higher the score. The logic finds the cluster nearest to the anomalous observation (on the standardized scale), and that Euclidean distance is used as the score.

[Figure: score calculation with three clusters]

The figure above illustrates the score calculation. The diagram contains three clusters; from the anomalous observation, the distance to each cluster is calculated to find the nearest one. In this example, Cluster 2 is nearest at distance 'd2', so the anomaly score is 'd2'.
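The short sketch below mirrors this scoring description: it computes the Euclidean distance from the anomalous observation to each cluster, represented here by a centroid for simplicity, and uses the smallest distance as the score. The cluster coordinates and the observation are made-up example values, not NetBackup data.

```python
# Illustrative sketch of the scoring logic: the anomaly score is the Euclidean
# distance (on the standardized scale) from the anomalous observation to the
# nearest cluster. Clusters are represented by centroids here for simplicity;
# all coordinates are made-up example values.
import numpy as np

cluster_centroids = np.array([
    [-1.0, -0.5,  0.2],   # Cluster 1
    [ 1.4, -0.9,  0.3],   # Cluster 2
    [-0.8,  2.1,  1.0],   # Cluster 3
])

anomalous_obs = np.array([2.6, -1.2, 0.6])   # already standardized

distances = np.linalg.norm(cluster_centroids - anomalous_obs, axis=1)
nearest = int(distances.argmin())
score = distances[nearest]

# In this example Cluster 2 is nearest, so its distance becomes the score.
print(f"Nearest cluster: {nearest + 1}, anomaly score: {score:.2f}")
```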
False positive treatment

An anomaly can be generated even when there is no real ransomware attack. For example, if a user carries out some activity on the system, such as deleting a huge number of files, an anomaly will be detected on that client. Since the user is aware of this activity, and there is a chance that it will happen again in the future, the user can mark the anomaly as a false positive. Marking an anomaly as a false positive prevents a 'similar' anomaly from being generated in the future for the same client-policy combination mentioned in the Baseline section.

[Figure: an anomaly marked as a false positive]

In the figure above, the user has marked one anomaly as a false positive. As a result, a new cluster is created around that observation, and any new observation falling inside that cluster is treated as non-anomalous.

How sensitivity settings work

Consider an anomaly like the one in Fig 1 below: the observation is close to the cluster, but it is still an anomaly because it is farther than distance 'd' from the other observations. Increasing the sensitivity configuration parameter by a certain percentage increases the distance 'd', so that the new observation becomes reachable from the other observations in the cluster. This can be viewed as a cluster of increased size, as shown in Fig 2 below, and the observation is no longer considered an anomaly.

[Fig 1 / Fig 2: the cluster before and after increasing the sensitivity setting]
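As a rough sketch of the behaviour just described, the snippet below grows the reachability distance 'd' by a configured percentage; the function name, parameter, and formula are illustrative assumptions, not NetBackup's actual configuration interface.

```python
# Rough illustration of the sensitivity behaviour described above: raising the
# sensitivity setting by a percentage grows the reachability distance 'd'.
# The function name and formula are assumptions for illustration only.
def effective_distance(base_d: float, sensitivity_pct: float) -> float:
    """Grow the cluster reachability distance by the configured percentage."""
    return base_d * (1 + sensitivity_pct / 100.0)

print(effective_distance(1.5, 0))     # 1.5  -> default cluster size
print(effective_distance(1.5, 20))    # ~1.8 -> borderline observations now fall inside the cluster
```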