Collection 2.0: Has Anything Improved Since 2008?

By Craig Carpenter
November 15, 2018

Way back in the nascent years of eDiscovery – let’s say the early 2000s – the industry was a hodge-podge of processes, many of which were still paper-based.  I distinctly remember working as the head of the legal department at a “network security” company (they’re now “cybersecurity” companies) in 2003 and asking custodians to print out anything relevant they could find on their computers and bring it to me for review and safekeeping.  As embarrassing as it is to relive 15 years later, this was our preservation and Collection 1.0 process at the time, and suffice it to say it was neither efficient nor scalable.

Flash forward several years to 2008, and the preservation and collection process had evolved significantly.  It was by then largely digital and consisted of three distinct stages that preceded being able to truly review and analyze ESI: preservation (including the issuance of legal holds), collection and processing.  Preservation and collection were handled in one of two ways: via custodial self-preservation/collection (where custodians themselves collected potentially relevant ESI) or via the imaging of entire drives, laptops and/or servers.  The former method was supported by workflow vendors like PSS Systems (acquired by IBM), Zapproved and Exterro, while the latter was the province of Guidance Software, AccessData, and RoboCopy, among others.  Lastly, the separate step of processing ESI – staging the enormous volumes of collected information for review and analysis in a separate platform (e.g. Concordance, Summation, iConect, Relativity, Recommind) – utilized technology like Law, DiscoveryCracker, Nuix, and Clearwell (brilliantly marketed as “ECA”).  This whole workflow took from a month or so for a few custodians to many months or even a year for larger matters with more custodians, and cost from tens of thousands of dollars to many millions – all before an attorney could look at the first document to begin forming an evidentiary strategy of any sort.

That was 2008.  So how are the pre-review/analysis stages handled today, a full decade removed from the time “Collection 2.0” became common practice?  Marginally better in areas, but otherwise pretty much the same way they were handled in 2008.  While custodial self-preservation and collection may be less commonplace than it was back then, companies and their service providers generally still image entire drives, laptops or servers with minimal filtering (e.g. simple date ranges and static inclusive/exclusive file-type filters), which weeds out very little data.  They then bring these large overcollections to a processing “lab” of sorts, where the large data sets are run through Nuix, Law or Relativity processing so that they can finally be reviewed and analyzed by an attorney for the first time in a review or analysis platform.
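To make concrete just how blunt that “minimal filtering” is, here is a minimal sketch in Python of the kind of static date-range and file-type filter typically applied at collection time.  This is not any vendor’s actual implementation; the cutoff date, excluded extensions and mount point are all hypothetical.  Note that nothing in the filter considers content or relevance, which is exactly why so little data gets weeded out.

```python
import os
from datetime import datetime
from pathlib import Path

# Hypothetical static filter: often the only "precision" in a
# Collection 2.0 workflow. Everything else on the drive is collected.
CUTOFF = datetime(2016, 1, 1)                      # simple date floor
EXCLUDED_EXTS = {".exe", ".dll", ".sys", ".tmp"}   # static exclusion list

def should_collect(path: Path) -> bool:
    """Return True if the file survives the (very blunt) filter."""
    if path.suffix.lower() in EXCLUDED_EXTS:
        return False
    modified = datetime.fromtimestamp(path.stat().st_mtime)
    return modified >= CUTOFF

def collect(root: str):
    """Walk an entire drive or share and keep whatever passes the filter."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if should_collect(path):
                yield path  # in practice: copied/imaged into the collection

if __name__ == "__main__":
    kept = list(collect("/mnt/custodian_laptop"))  # hypothetical mount point
    print(f"{len(kept)} files collected")
```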

While the performance of these collection and processing applications has improved over the last decade and pricing per GB of data processed has come down significantly, data volume growth has effectively kept the process at parity with 2008: it still typically takes months to go from legal hold to the first ability to analyze potential evidence in a matter.

Here are the challenges with Collection 2.0 as it is currently conducted:

Massive overcollection.  When entire drives, computers and/or servers are imaged with very little substantive precision, far more ESI is preserved and made subject to the discovery process than is necessary.  In addition to the far greater cost this creates (see below), it also creates enormous, unnecessary risk, e.g. the inadvertent production of privileged information, or the ability of parties in ancillary or unrelated matters to legally obtain evidence they otherwise couldn’t.

Time.  The interval from when a custodian is first sent a legal hold notice to when an attorney can first look at the collected ESI is typically measured in months, and at minimum several weeks.  During this time, whatever strategy inside and outside counsel are able to formulate is bereft of any evidentiary support, which can have major ramifications for not just how the discovery process is handled, but the speed and cost of every matter.

Complexity.  Ask yourself a simple question: what value does the processing stage deliver to any party?  Why does it even exist?  The (oversimplified) point of discovery is to find and analyze (for oneself) and then share all relevant, reasonably accessible and non-privileged information with the other side.  Processing is a step that only exists because too much information is collected in the first place (a simplified sketch of what this stage actually does appears after this list).  Collect precisely and comprehensively at the beginning and processing becomes unnecessary.

Cost.  Not surprisingly, a process that casts its net far more widely than necessary, then spends many weeks if not months whittling things back down before anything becomes usable, costs a lot more than it should, especially when attorney “first pass review” is factored in.  Simply put, everyone involved in the process except the client makes more money the larger the collection is, the greater the volume that must be processed and reviewed, and the longer everything takes (in collection, processing and hosting fees).  The loser in all of this?  The client.
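As promised above, here is a heavily simplified sketch of one core processing task: deduplication by content hash.  The directory names are hypothetical, and real processing engines like Nuix or Law do far more (container extraction, text and metadata extraction, de-NISTing), but the principle is the same – spend time and compute shrinking a collection that was too big in the first place.

```python
import hashlib
from pathlib import Path

def file_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """MD5 of file contents, read in chunks (MD5 as a dedupe key,
    not a security use)."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

def deduplicate(collection_root: str) -> dict[str, list[Path]]:
    """Group every file in the overcollection by content hash;
    only one copy per hash needs to be staged for review."""
    groups: dict[str, list[Path]] = {}
    for path in Path(collection_root).rglob("*"):
        if path.is_file():
            groups.setdefault(file_hash(path), []).append(path)
    return groups

if __name__ == "__main__":
    groups = deduplicate("/data/matter_1234/overcollection")  # hypothetical
    total = sum(len(paths) for paths in groups.values())
    print(f"{total} files collapse to {len(groups)} unique documents")
```

Had the collection been precise to begin with, most of this work – and the fees attached to it – would simply not exist.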

Collection 2.0 is a process that remains essentially the same as it was in 2008.  As surprising as this may sound, it becomes more understandable when one considers that only the client is incentivized to improve it, while every other player in the process has a major disincentive to do so.  But there is a better way, one which is far faster, less risky and much cheaper for clients: we’re calling it “Collection 3.0”, and it will be the subject of a blog post from my colleague John Patzakis next week.