Unstructured Data – GDPR’s Wild Wild West

I have been involved in many General Data Protection Regulation (GDPR) initiatives with my customers over the last year. One stands out as rather eye-opening for both the customer and myself.

During the “Locate” portion of the GDPR solution deployment, we give the customer a view into their unstructured environment: how the data is being used, who is using it, and how long it has been since it was last accessed. We also give them the ability to look at the content of each file and classify files accordingly.
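To make the idea of a usage view concrete, here is a minimal Python sketch of a metadata inventory. It is not the Veritas tooling, only an illustration; the function names `inventory` and `stale_files` and the record fields are my own assumptions. It also assumes the file system records access times reliably, which is not always the case (some volumes are mounted with access-time updates disabled).

```python
import os
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def inventory(root, now=None):
    """Walk `root` and record owner, size, and idle time for every file.

    Hypothetical helper for illustration; `now` can be injected for testing.
    """
    now = time.time() if now is None else now
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                st = path.stat()
            except OSError:
                continue  # file vanished or is unreadable; skip it
            records.append({
                "path": str(path),
                "owner_uid": st.st_uid,  # numeric owner id (0 on Windows)
                "size_bytes": st.st_size,
                "idle_days": int((now - st.st_atime) // SECONDS_PER_DAY),
                "extension": path.suffix.lower(),
            })
    return records


def stale_files(records, min_idle_days):
    """Return the records that have not been accessed for `min_idle_days` or more."""
    return [r for r in records if r["idle_days"] >= min_idle_days]
```

Grouping the resulting records by `extension` or `owner_uid` is usually enough to surface oddities such as a folder full of video files that nobody has opened in a year.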

During this exercise, we uncovered (among other things) the entire series of the HBO drama Deadwood saved and stored by an employee on the customer’s network. Deadwood is a TV series set in the 1870s in a mining camp in what is now South Dakota, before the area was annexed by the Dakota Territory. Some say it is a very accurate depiction of life in the wild wild west. Movies and TV have romanticized the lawless and untamed antics of the “wild wild west” for many years. I mean, what’s not to love?

  • There were very few laws (if any).
  • Towns experienced unprecedented growth, mostly without proper planning.
  • Nobody was really sure who was making the rules or enforcing them.
  • The only thing that was guaranteed was the fact that at some point your valuables would probably get stolen.

Sounds romantic to me. As bad as this sounds, I’m sure some organizations would call this tame in comparison to what they face in unstructured data sources in their data centers.

As you can imagine, having illegally copied episodes of a copyrighted TV show is not something you want on your network. This is just one of the trouble spots we typically uncover when starting GDPR assessments with customers.

Unstructured Data

This scenario sounds strangely familiar, doesn’t it? Unstructured data sources (File Servers, SAN, NAS, etc.) have been the wild wild west for many years:

  • There are few rules around storing data.
  • Organizations are experiencing unprecedented growth, especially in unstructured data sources.
  • Planning is often an afterthought, as the data repositories tend to be a “dumping ground”.
  • Many people seem to be in charge of their own portion of the environment, leaving open the question of who is actually setting the rules or enforcing them.

Most structured data sources have controls in place and search mechanisms that provide a very rich and robust method to search for personal data. The very name “structured” points to this fact. How do we provide the same search and management functionality in our unstructured data repositories? Without the right tools, it’s nearly impossible.

Given the nature of unstructured data sources, imagine how hard it would be to determine where personal data is located within your environment. Where would you start your search? Can you scale that process to the millions of files you may have on the network?

De-Structured Data

One of the arguments I often hear during GDPR discussions is “Our GDPR efforts are focused on our structured data because that is the only place we store personal data.” That is a very good practice and in theory it may be true, but what are the odds that someone has performed a database dump or backup of a structured data source and saved it to a departmental or personal share? What if someone ran a query against a structured data store and saved the results in a .CSV file on the network?

You cannot guarantee that your unstructured environments are free of personal data unless you have a way to gain insight into the unstructured environment to analyze the content of the data.
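As a rough illustration of what analyzing content can mean for the query exports described above, here is a minimal Python sketch that walks a share and flags CSV files that look like they contain personal data. Everything in it is an assumption for illustration: the names `scan_csv` and `scan_tree`, the list of suspect column headers, and the simple email pattern. A real classification engine uses far richer rules and handles many more file types.

```python
import csv
import re
from pathlib import Path

# Illustrative pattern only; it will miss some addresses and match some non-addresses.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical column names that often indicate personal data.
SUSPECT_HEADERS = {
    "name", "first_name", "last_name", "email",
    "phone", "address", "dob", "date_of_birth",
}


def scan_csv(path):
    """Return suspect header names and a count of cells containing an email address."""
    hits = {"suspect_headers": [], "email_cells": 0}
    with open(path, newline="", encoding="utf-8", errors="replace") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        hits["suspect_headers"] = [
            h for h in header if h.strip().lower() in SUSPECT_HEADERS
        ]
        for row in reader:
            hits["email_cells"] += sum(1 for cell in row if EMAIL_RE.search(cell))
    return hits


def scan_tree(root):
    """Scan every .csv file under `root` (any letter case) and keep the flagged ones."""
    flagged = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() == ".csv":
            result = scan_csv(path)
            if result["suspect_headers"] or result["email_cells"]:
                flagged[str(path)] = result
    return flagged
```

A flagged file is only a lead, not proof. Someone still has to confirm whether the rows relate to real people and decide what should happen to the file.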

There’s a New Sheriff in Town

And her name is GDPR. Now that we have determined that unstructured data is a possible minefield of personal data, we must start planning how to get a handle on the problem. GDPR lays the groundwork for how we are going to clean up personal data within our unstructured data sources.

If we take a step back and elevate the conversation to general best practices around Information Governance, we quickly find that following good Information Governance practices can greatly enhance our GDPR efforts.

GDPR takes these Information Governance processes and adds specific logic around personal data. If you have a good Information Governance process in place, GDPR efforts for your unstructured data are already underway.

So, how do we bridge the gap between a good Information Governance strategy and a focus on GDPR?

Future posts will expand on the Veritas methodology for GDPR focused on unstructured data. For now, you can review our GDPR methodology and solutions at https://www.veritas.com/gdpr.

[Figure: Veritas Methodology for Managing Personal Data]