Keeping true to my last blog post Unstructured Data – GDPR’s Wild Wild West, I will continue with the Western theme.
In my last post, I wrote about the hidden pitfalls of finding and managing data in unstructured data sources. In this post, I want to introduce the idea of determining the value of data in your environment.
When I talk with customers about intelligent data management, I usually say that over 50% of unstructured data is either stale, redundant, obsolete or trivial. Every time I discuss this topic, each person in the room not only agrees but usually says “I bet ours is higher than that!”.
Veritas’ own Databerg Report reveals that only 15% of data in most organizations is considered “clean,” while 32% is considered redundant, obsolete or trivial (ROT). That leaves the remaining 53% as “dark data”: data whose value cannot yet be determined, because it has not been identified as either “clean” or “ROT.”
So how do you mine your unstructured data sources to find the gold? How do you place value on data when you know nothing of the data beyond a file name?
The “value” of data comes in two forms, and you need both to get a complete picture and decide what to do with a file.
First, you have contextual information. This gives you insight into how the data is being used: who is using it, who is modifying it, how often it is used or modified, who owns it (actual and inferred), and how much of your data has gone stale.
Next, you have content. You can’t get a complete understanding of the data in your environment by looking at usage and utilization statistics alone. There is also value in the content of the files themselves. By determining what a file contains, you can decide how the data will be managed, how long you will keep it, and which compliance and retention settings apply to it under your corporate policy.
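As a toy illustration of content-driven policy, the sketch below maps pattern matches in a file’s text to a retention category. The rules, category names, and retention labels are all hypothetical; production classification engines use large curated pattern libraries and machine learning rather than three regular expressions.

```python
import re

# Hypothetical rules mapping content patterns to retention policies.
# The order matters: the first matching rule wins.
CLASSIFICATION_RULES = [
    ("PII", re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "retain_7_years"),
    ("financial", re.compile(r"\b(invoice|payment|account)\b", re.I), "retain_5_years"),
    ("general", re.compile(r"."), "review_after_1_year"),  # catch-all
]


def classify(text: str) -> tuple[str, str]:
    """Return (category, retention_policy) for the first matching rule."""
    for category, pattern, policy in CLASSIFICATION_RULES:
        if pattern.search(text):
            return category, policy
    return "unclassified", "review"
```

Even this crude approach shows the payoff: once a file has a category, the retention decision follows mechanically from policy instead of from someone’s guesswork.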
Once we understand both the context and the content, we can now start to place data into different logical “buckets” of information.
Data management has matured to the point where we can monitor how the data is being used, we can view the content within the file, and then manage the file accordingly. Imagine a world where you can be completely hands-off when managing your data, letting automation take care of everything. From the moment a file is created on the network we can monitor its use, look at the content, and then automatically classify and tag the file. Based on this classification we can then assign an automated retention and defensible deletion plan for the data. The file is managed throughout its lifecycle without any intervention from the IT staff.
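The hands-off lifecycle described above can be sketched as a small policy function that combines both forms of value, the contextual staleness signal and the content classification, to pick an action. The thresholds and action names here are assumptions for illustration, not any vendor’s actual policy model.

```python
def lifecycle_action(days_since_modified: float, category: str) -> str:
    """Decide a file's lifecycle action from staleness and classification.

    Assumed policy: regulated content is always retained; everything else
    is tiered or defensibly deleted as it ages.
    """
    if category == "PII":
        return "archive_with_retention"  # compliance data is always kept
    if days_since_modified > 3 * 365:
        return "defensible_delete"  # stale, non-regulated data
    if days_since_modified > 365:
        return "tier_to_cheap_storage"
    return "keep_in_place"
```

Run on every new file as it lands on the network, a function like this is the “no intervention from IT staff” loop in miniature: classify once, then let policy carry the file through its lifecycle.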
Gone are the days of simply looking at a mountain or a stream and wondering where the gold may be. I’m sure the Gold Rush would have been a lot easier if there were signs and arrows pointing to where the valuables were stored. Veritas can help you find the gold in your mountains of unstructured data.