Shining a classification light on dark data landfills

In this third post in our e-discovery series, we take a closer look at the challenges of data classification and the use of modern classification tools. Used appropriately, classification can speed up discovery, reduce risk, and cut costs.

The court rulings in Zubulake v. UBS Warburg, which culminated in a 2005 jury verdict, led many corporate counsel to issue enterprise-wide legal holds. This signaled the end of the mailbox and user directory size limits that had forced users to migrate or manually delete data. Few companies invested in the early enterprise search and classification solutions required to preserve individual items selectively. Instead, they inadvertently created massive data landfills, most of which still exist today. With corporations migrating their IT infrastructure from data centers to the cloud, corporate counsel are under increasing pressure to approve remediation and decommissioning plans.

The dark data residing in bloated departmental shares, SharePoint libraries, and enterprise archives has posed a serious challenge to corporate stakeholders. Redundant, obsolete, and trivial (ROT) data is commingled with vital records, personal data, trade secrets, and data under legal hold. The cost and effort of manually remediating hundreds of terabytes of unstructured email and files have kept these repositories around for decades. So how can a corporation tackle this legacy dark data challenge most efficiently?

Legacy dark data increases discovery cost and risk

The sheer scale of these repositories requires an automated solution to analyze, classify, and appropriately act on individual data items. Legal already does this on a smaller scale by processing custodial collections and applying sampling, search criteria, or technology-assisted review techniques, which can cost upwards of $1,600 per GB for a single relevant/non-relevant determination. Conducting expedited discovery on legacy data landfills can multiply that cost.

Enterprise content may qualify for hundreds of disparate classifications under a hierarchy of business, retention, legal, privacy, and regulatory policies. Although this sounds daunting, the Veritas Information Classifier (VIC) has 700+ prebuilt patterns that support 120+ preconfigured policies. Named Entity Recognition can identify names, organizations, and locations to support context-driven classifications and policies.

We tackle one or more repositories by starting with a macro analysis that builds a picture of the context. Veritas Data Insight provides a dashboard perspective and visualizations of file types, location, ownership, age, and user activity (stale/orphaned). The VIC’s classifications and tags then give you a micro view of the file content, automatically converting binary and image files to text. Your dark data now has the context and content classification that enable you to filter out the ROT and act on the records.
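To make the idea of patterns feeding policies more concrete, here is a minimal Python sketch of pattern-based classification. It is an illustration only and does not reflect how the VIC is implemented: the pattern names, the regular expressions, the policy names, and the `classify` function are all hypothetical and far simpler than a production pattern library.

```python
import re

# Hypothetical patterns for illustration; not the VIC's actual pattern definitions.
PATTERNS = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email_address": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "contract_term": re.compile(r"\b(?:agreement|hereinafter|indemnif\w+)\b", re.IGNORECASE),
}

# Hypothetical policies: each policy names the patterns that can trigger it.
POLICIES = {
    "privacy_pii": {"us_ssn", "email_address"},
    "legal_contracts": {"contract_term"},
}


def classify(text: str) -> dict[str, list[str]]:
    """Return {policy name: sorted names of the patterns that matched} for one document."""
    matched = {name for name, rx in PATTERNS.items() if rx.search(text)}
    result = {}
    for policy, pattern_names in POLICIES.items():
        hits = sorted(pattern_names & matched)
        if hits:  # a policy applies when at least one of its patterns matched
            result[policy] = hits
    return result


if __name__ == "__main__":
    print(classify("Employee SSN 123-45-6789, contact jane@example.com"))
    print(classify("This Agreement shall indemnify the vendor."))
```

A document that matches no pattern comes back with an empty result, which in this toy model is what makes it a candidate for ROT review rather than retention.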

Trust but verify

As with discovery, developing classifications is an iterative process that requires tools and feedback to minimize the burden and risk. VIC enables you to test your policies on known data sets or sample sets. Dynamic dashboards and reports give you fast insight into potential exceptions and additional custom classifications. You can use document matching analysis to tag specific corporate templates such as contracts, approval forms, or meeting notes. For critical custodians or classifications, you can easily export sample sets to your eDiscovery platform to test and develop classifier criteria.
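One way to picture testing a policy against a known data set is to compare the documents the policy tagged with the documents a reviewer confirmed. The Python sketch below assumes you have both as sets of document IDs. The `evaluate` function and the sample IDs are hypothetical and are not part of any Veritas product.

```python
def evaluate(predicted: set[str], expected: set[str]) -> dict[str, object]:
    """Compare IDs tagged by a policy (predicted) with reviewer-confirmed IDs (expected)."""
    true_positives = len(predicted & expected)
    # Precision: share of tagged documents that were correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of confirmed documents that the policy found.
    recall = true_positives / len(expected) if expected else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "false_positives": sorted(predicted - expected),  # candidates for tightening the policy
        "false_negatives": sorted(expected - predicted),  # candidates for a custom classification
    }


if __name__ == "__main__":
    tagged_by_policy = {"d1", "d2", "d3", "d4"}  # hypothetical sample set results
    confirmed_by_reviewer = {"d1", "d2", "d5"}
    print(evaluate(tagged_by_policy, confirmed_by_reviewer))
```

Reviewing the false positives and false negatives after each pass is what drives the next iteration of the policy.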

Acting on classified data

You can now navigate your terabytes of legacy file shares by metadata, classification, or policy visualizations. The primary business goal is to identify data with business value and migrate it into your long-term records archive. Data with PII, trade secrets, or confidential business information is tagged for protection. You can create legal hold classifications for ongoing holds of data migrated to active repositories. The remaining ‘non-record’ data on hold can be searched and migrated to a preservation repository so that legacy systems can be decommissioned.
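The disposition logic described above can be sketched as a small set of routing rules. This Python example is a simplified illustration under stated assumptions: the tag names, the action names, and the `plan_actions` function are hypothetical, and real remediation plans would be set by your own retention and legal hold policies.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    path: str
    tags: set[str] = field(default_factory=set)  # classification tags (hypothetical names)


# Hypothetical tags that mark data needing protection.
SENSITIVE_TAGS = {"pii", "trade_secret", "confidential"}


def plan_actions(item: Item) -> list[str]:
    """Return the ordered list of actions for one classified item."""
    actions = []
    if item.tags & SENSITIVE_TAGS:
        actions.append("protect")
    if "record" in item.tags:
        # Data with business value moves to the long-term records archive.
        actions.append("migrate_to_records_archive")
        if "legal_hold" in item.tags:
            # The hold follows the record into the active repository.
            actions.append("apply_legal_hold")
    elif "legal_hold" in item.tags:
        # Non-record data on hold moves to a preservation repository.
        actions.append("migrate_to_preservation_repository")
    else:
        # Nothing requires retention, so the item is reviewed as ROT.
        actions.append("review_as_rot")
    return actions


if __name__ == "__main__":
    print(plan_actions(Item("hr/offer_letter.docx", {"record", "pii"})))
    print(plan_actions(Item("tmp/old_export.csv")))
```

Keeping the rules in one function makes it easy for Legal, IT, and Compliance to review the same decision path before any data is moved or deleted.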

Going forward – Smart Data

Although we have focused on the cost and risk posed by legacy landfills, the ‘smart data’ future is much brighter. Your investment gives Legal, IT, Security, and Compliance stakeholders the tools to find, filter, and act on classified data with confidence. Counsel can navigate dashboards, ownership reports, classification graphs, and more to efficiently scope legal holds or discovery requests. Your retained counsel will receive rich, targeted collections that minimize review costs and speed up matter resolutions. You can shift the primary burden of assigning record retention from the employee to the classifier.