Machine Learning Algorithms Making the E-Discovery Process Much Less Painful

Machine learning is a type of artificial intelligence that evolved from the study of pattern recognition. Rather than following preprogrammed commands, machine learning algorithms build a model from data and use it to make predictions. What’s more, these algorithms can make independent decisions when presented with new data. As computer processing power improves, this technology can be applied to much larger data sets for reliable, repeatable results. This is the technology behind self-driving cars, fraud detection and data mining for online advertising.

The legal field has also found an application for machine learning algorithms: reducing the cost of document review. Specifically, these algorithms have been applied to doc review solutions such as Computer Assisted Review, Technology Assisted Review (TAR), and predictive coding. The growth of email and electronic document storage gave rise to the e-discovery industry in the early 2000s. Exponential data growth is now outpacing the traditional e-discovery methodology of filtering and searching to reduce data sets before they are hosted for review. Pressure to reduce these costs, including pricey attorney review hours, had solution providers looking for an answer. The potential application of TAR to legal document discovery quickly made it the industry “holy grail”. Initially, however, TAR solutions for e-discovery were mostly limited to industry conference circuits and talking heads. By 2012 the technology had matured enough to begin delivering on its promise to revolutionize doc review with a reliable alternative to traditional attorney review.

As technology advances, so do the expectations of courts when handling electronic documents. Unlike in the early 2000s, judges now have little leniency for incomplete preservation or collection efforts. Assuming a proper legal hold was in place, attorneys and third-party providers are expected to produce all relevant email, user-created e-docs, and associated metadata. If they do not, they face possible sanctions. Similarly, judges are just starting to push firms to use TAR to conserve legal spending, rather than limiting discovery because of data volume. One example of court-ordered use of automated coding in a federal case occurred recently. In Da Silva Moore v. Publicis Groupe et al., Judge Peck of the United States District Court for the Southern District of New York ordered litigants to use computer assisted review instead of relying only on keyword searches. He reasoned that the new technology was likely to produce fewer mistakes in the document review process, since it could quickly recognize relevant information that simple keyword searching would miss. The future of machine learning in document review is not only about reducing cost; it can also push review technology past the limits of keyword filters.
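To make the contrast with keyword searching concrete, here is a minimal sketch in Python. It is not how any particular TAR product works. It assumes a tiny, invented seed set of attorney-coded documents and uses a simple naive Bayes word model as a stand-in for the learning step. The function names, the sample documents, and the keyword list are all hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase a document and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def train(labeled_docs):
    """Count word frequencies per label from attorney-coded seed documents."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, relevant in labeled_docs:
        doc_counts[relevant] += 1
        word_counts[relevant].update(tokenize(text))
    return word_counts, doc_counts

def relevance_score(text, model):
    """Log-odds of relevance (naive Bayes with add-one smoothing).

    A score above 0 means the model leans toward "relevant".
    Words never seen in the seed set are ignored.
    """
    word_counts, doc_counts = model
    vocab = set(word_counts[True]) | set(word_counts[False])
    score = math.log(doc_counts[True] / doc_counts[False])
    for label, sign in ((True, 1), (False, -1)):
        total = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:
                score += sign * math.log((word_counts[label][word] + 1) / total)
    return score

# Hypothetical seed set: (document text, attorney's relevance call).
seed_set = [
    ("Please shred the pricing memo before the audit", True),
    ("We agreed with the competitor to hold prices steady", True),
    ("Delete the emails about the rate agreement", True),
    ("Lunch on Friday at the usual place?", False),
    ("The printer on the third floor is out of toner", False),
    ("Reminder: submit your timesheet before Friday", False),
]
model = train(seed_set)

# Hypothetical keyword filter an attorney might have started with.
keywords = {"pricing", "prices"}

unreviewed = "Let's keep rates steady and delete this memo afterwards"
keyword_hit = any(word in keywords for word in tokenize(unreviewed))
learned_score = relevance_score(unreviewed, model)

print("keyword filter flags it:", keyword_hit)
print("learned model leans relevant:", learned_score > 0)
```

The unreviewed message contains none of the chosen keywords, so the filter passes over it, while the learned model picks up on words such as "steady", "delete" and "memo" that appeared in documents the attorneys had already coded as relevant. A real system would train on far more documents and would be validated by sampling.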

This technology is not perfect or 100% accurate, but neither is attorney review. The biggest benefit on the horizon for law firms and their clients will be the application of TAR to an initial “first pass” review of docs. This removes the unnecessary hourly charges of young associates or doc review centers who sort documents for relevance to the case or for privileged communications. Removing this low-hanging fruit conserves attorney billable hours for the more complex review issues of e-discovery. Attorneys benefit as well: doc-intensive cases take up less of their time, leaving bandwidth for other matters.
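A “first pass” of this kind can be pictured as a simple triage step. The Python sketch below assumes each document already carries a relevance probability from some trained model. The document IDs, the scores, and the two cutoff values are hypothetical, and in practice cutoffs would be set and checked by sampling rather than picked by hand.

```python
def first_pass_triage(scored_docs, relevant_cutoff=0.8, irrelevant_cutoff=0.2):
    """Split documents into three queues by model relevance probability.

    Confident calls are routed automatically; everything in between
    goes to an attorney.
    """
    queues = {"likely_relevant": [], "attorney_review": [], "likely_irrelevant": []}
    for doc_id, probability in scored_docs:
        if probability >= relevant_cutoff:
            queues["likely_relevant"].append(doc_id)
        elif probability <= irrelevant_cutoff:
            queues["likely_irrelevant"].append(doc_id)
        else:
            queues["attorney_review"].append(doc_id)
    return queues

# Hypothetical model output: (document ID, probability of relevance).
scored = [("DOC-001", 0.95), ("DOC-002", 0.05), ("DOC-003", 0.55), ("DOC-004", 0.10)]
queues = first_pass_triage(scored)

for name, doc_ids in queues.items():
    print(name, doc_ids)
```

In this toy run only one of the four documents lands in the attorney queue, which is the point of a first pass: billable hours go to the uncertain middle instead of the obvious calls at either end.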