Time to reevaluate tools that rely on systemic data duplication
The European Union (EU) General Data Protection Regulation (GDPR) became effective in May 2018. To briefly review, the GDPR applies to the processing of “personal data” of EU citizens and residents (a.k.a. “data subjects”). “Personal data” is broadly defined to include “any information relating to an identified or identifiable natural person.” That could include email addresses and transactional business communications that are tied to a unique individual. The GDPR applies to any organization that regularly provides goods and services to individuals located in the EU, or that maintains electronic records of employees who are EU residents.
In addition to an overall framework of updated privacy policies and procedures, the GDPR requires the ability to demonstrate and prove that personal data is being protected. Essential components of such compliance are data audit and discovery capabilities that allow companies to efficiently search for and identify the necessary information, both proactively and reactively, in order to respond to requests from regulators and EU private citizens. As such, any GDPR compliance program is ultimately hollow without consistent, operational execution and enforcement through an effective eDiscovery and information governance platform.
However, some content management and archiving tool providers are simply repurposing their marketing messages around GDPR compliance. For example, an industry executive recently recounted a meeting with one such vendor whose tool involved duplicating all of the emails and documents in the enterprise and then migrating all of those copies to a central server cluster. That way, the tool could theoretically manage all of the documents and emails centrally. Putting aside the difficulty of scaling that process up to manage and sync hundreds of terabytes of data in a medium-sized company (and petabytes in a Fortune 500 company), this anecdote underscores a fundamental flaw in tools that require systemic data duplication in order to search and manage content.
Under the GDPR, data needs to be minimized, not systematically duplicated en masse. Under such an architecture, it would be extremely difficult to sync up and remediate non-compliant documents and emails at their original locations. So at the end of the day, this proposed solution would actually violate the GDPR by making duplicate copies of data sets that would inevitably include non-compliant information, without any real means to sync up remediation.
The same is true for much of the traditional eDiscovery workflow, which requires numerous steps involving data duplication at every turn. For instance, data collection is often accomplished with misapplied forensic tools that indiscriminately copy data, resulting in over-collection. As the court said in In re Ford Motor Company, 345 F.3d 1315 (11th Cir. 2003): “[E]xamination of a hard drive inevitably results in the production of massive amounts of irrelevant, and perhaps privileged, information…” Even worse, the examiner then re-duplicates the collected data one or often two more times for archival purposes. The data is then sent downstream for processing, which results in even more duplication. Load files are created for further transfers, and those are duplicated as well.
Chad Jones of D4 explains in a recent webinar, and in his follow-on blog post, how such manual and inefficient handoffs throughout the discovery process greatly increase risk as well as cost. Like antiquated factories spewing tons of pollution, outdated eDiscovery processes spin out a great deal of superfluous duplicate data. Much of that data likely contains non-compliant information, thus “polluting” your organization, including through your eDiscovery services vendors, with increased GDPR and other regulatory risk.
In light of the above, organizations evaluating their compliance and eDiscovery software should keep in mind these five key requirements for staying in line with the GDPR and good overall information governance:
- Search Data in Place. Data on laptops and file servers needs to be searched in place. Tools that require copying and migrating data to central locations in order to search and manage it are part of the problem, not the solution.
- Delete Data in Place. The GDPR requires that non-compliant data be deleted on demand. Purging data in managed archives does not suffice if other copies remain on laptops, unmanaged servers, and other unstructured sources. Your search-in-place solution should also delete in place.
- Data Minimization. The GDPR requires that organizations minimize data, as opposed to exploding it through mass duplication.
- Targeted and Efficient Data Collection. Only potentially relevant data should be collected for eDiscovery and data audits. Over-collection leads to much greater cost and risk.
- Seamless Integration with Attorney Review Platforms. Direct integration bypasses the processing steps that require manual handoffs and load files.
X1 Data Audit & Compliance is a ground-breaking platform that meets these criteria while enabling system-wide data discovery in support of the GDPR and many other information governance requirements.