eDiscovery Collection of Large File Shares: An Unaddressed Major Pain Point

By John Patzakis
January 6, 2021


One of the major unaddressed challenges for eDiscovery and other digital investigations involves very large file servers that host shared documents. The data volumes for these file shares are typically 10 to 20 terabytes but can be much higher. Nearly every company and government agency maintains such large file shares, sometimes hundreds of them, depending on the size of the organization. The main purpose of a file share server is to give multiple users access to the files and storage space on a common repository. These servers operate as the ubiquitous central storage place for internal company files, for both collaboration and data archiving purposes. As such, they are heavily used and invariably contain numerous documents with highly relevant or otherwise important information.

Traditional eDiscovery collection methods fail to efficiently address these large file shares, due to significant logistical challenges. The data cannot simply be searched in place by traditional forensics tools or other crawling methods. Consequently, the data is typically copied in bulk and then migrated to another location for processing, where the data is finally indexed and then searched and culled. There are many problems with this approach.

First, it is very time-consuming and expensive. The process involves the over-collection of a massive amount of data, and it typically takes weeks to copy and transfer many terabytes of data. Second, file shares are where companies’ most sensitive data typically resides. These repositories are often rife with trade secrets, intellectual property, and sensitive personal information. There is substantial risk in having such data copied in bulk and then shipped out of the company’s possession to a third party for eDiscovery processing.

A solution to these challenges is index and search in-place technology. Indexing and searching in place in this context means that a software-based indexing technology (as opposed to an expensive and cumbersome stand-alone hardware appliance) is deployed directly onto the file server or an adjacent computing resource. The indexing occurs without a bulk transfer of the data. Once the data is indexed, searches are performed in a few seconds and can combine complex Boolean operators, metadata filters, and regular expressions. The searches can be iterated and repeated without limitation, which is critical for large data sets.
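To make the three kinds of search mentioned above concrete, here is a minimal sketch of an inverted index that supports Boolean combinations, metadata filters, and regular expressions. It is an illustration only and does not represent how X1 or any other product is implemented: the class name `InPlaceIndex`, its methods, and the sample documents are all hypothetical, the index lives in memory rather than on disk, and the regular expression search simply scans the stored text.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+")


class InPlaceIndex:
    """Toy in-memory index (hypothetical; not any vendor's design)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of documents containing it
        self.docs = {}                    # document id -> (text, metadata dict)

    def add(self, doc_id, text, metadata):
        """Store the document and record every lowercase token it contains."""
        self.docs[doc_id] = (text, metadata)
        for term in TOKEN.findall(text.lower()):
            self.postings[term].add(doc_id)

    def term(self, word):
        """Ids of documents containing the word (case-insensitive)."""
        return set(self.postings.get(word.lower(), set()))

    def where(self, predicate):
        """Ids of documents whose metadata satisfies the predicate."""
        return {d for d, (_text, meta) in self.docs.items() if predicate(meta)}

    def regex(self, pattern):
        """Ids of documents whose text matches the regular expression."""
        compiled = re.compile(pattern)
        return {d for d, (text, _meta) in self.docs.items() if compiled.search(text)}


# Hypothetical sample documents standing in for files on a share.
index = InPlaceIndex()
index.add("memo.txt", "Merger pricing memo draft", {"ext": ".txt", "year": 2020})
index.add("final.docx", "Final merger pricing agreement, account 123-45-6789",
          {"ext": ".docx", "year": 2020})
index.add("lunch.txt", "Lunch menu", {"ext": ".txt", "year": 2019})

# Boolean operators map onto set operations: AND is &, OR is |, NOT is -.
boolean_hits = (index.term("merger") & index.term("pricing")) - index.term("draft")

# Metadata filter and regular expression, combined with AND.
pattern_hits = index.where(lambda m: m["year"] == 2020) & index.regex(r"\b\d{3}-\d{2}-\d{4}\b")
```

Because each query returns a plain set of document ids, a search can be refined and re-run repeatedly without touching the underlying files again, which is the property that makes iteration cheap once an index exists.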

Recently, X1 released unique and unprecedented support for large file shares to address this exact eDiscovery workflow. X1 can be deployed directly onto the large file share in question, or to a virtual machine in close proximity to the target file server or servers. Searches can be directed to a single file server, or federated across multiple file servers and other endpoints, including those in different geographic locations across the enterprise. This functionality can be deployed remotely and on demand, without physical access being required. This is essential for geographically dispersed organizations, particularly in sensitive matters overseas. Once a targeted and responsive data set is identified through this in-place search and analysis process, the data can be exported directly to Relativity, or a load file can be generated for upload to another review platform.

As mentioned, the searching can be full-text (including regular expressions) or metadata-only. In a recent matter involving over 100 terabytes of data, X1 first generated an index of metadata and hash values only, which allows for immediate de-duplication, file type filtering, and culling by date range and other parameters. This facilitated the culling of the data set by 70 percent as a first step, which then allowed for the full indexing of the remaining subset. This capability supports eDiscovery as well as data governance and privacy workflows.
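The two-pass approach described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not X1's implementation: the names `FileRecord`, `build_metadata_index`, and `cull` are hypothetical, SHA-256 is an assumed choice of hash, and the "index" is just a list held in memory. The first pass records only metadata and a content hash for each file; the culling step then drops duplicates, unwanted file types, and files outside a date range, leaving a smaller subset that could go on to full-text indexing.

```python
import hashlib
import os
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class FileRecord:
    """Metadata-and-hash-only entry for one file (hypothetical structure)."""
    path: str
    size: int
    modified: datetime
    extension: str
    sha256: str


def hash_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are never loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_metadata_index(root):
    """First pass: walk the share and record metadata plus a content hash."""
    records = []
    for folder, _dirs, names in os.walk(root):
        for name in sorted(names):
            path = os.path.join(folder, name)
            info = os.stat(path)
            records.append(FileRecord(
                path=path,
                size=info.st_size,
                modified=datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
                extension=os.path.splitext(name)[1].lower(),
                sha256=hash_file(path),
            ))
    return records


def cull(records, extensions, start, end):
    """Keep one copy per hash, restricted to file types and a date range."""
    seen = set()
    kept = []
    for record in records:
        if record.extension not in extensions:
            continue  # file type filter
        if not (start <= record.modified <= end):
            continue  # date range filter
        if record.sha256 in seen:
            continue  # de-duplication by content hash
        seen.add(record.sha256)
        kept.append(record)
    return kept
```

A call such as `cull(build_metadata_index(share_path), {".docx", ".pdf"}, start, end)`, where `share_path`, `start`, and `end` are placeholders for a mounted share and timezone-aware datetimes, returns the reduced set. Only those files would then need their full text extracted and indexed.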

The X1 large file share indexing service can be deployed on premises or in the cloud. It can also address large volumes of cloud-based data on services such as Dropbox and OneDrive. This support for large file shares is an extension of the X1 Enterprise Collect platform.

For more information about this unique capability, please contact us for a demonstration.